
Netlink program with Epoll system call

  • In this program, you are going to learn

  • How to communicate between kernel and user space?

  • How to create a socket?

  • How to send data?

  • How to receive data?

  • How to use user space APIs?

    • NLMSG_SPACE

    • NLMSG_DATA

  • How to use kernel space APIs?

    • nlmsg_new

    • nlmsg_put

    • NETLINK_CB

    • nlmsg_unicast

    • netlink_kernel_create

    • netlink_kernel_release

  • How to use socket APIs?

    • socket

    • epoll_create1

    • epoll_ctl

    • epoll_wait

    • sendmsg

    • recvmsg

Topics in this section,

  • Netlink

  • Netlink socket FAQs

  • step1 : User program Sequence Diagram nl_user.c

  • step2 : User program nl_user.c

  • step3 : Kernel program netlink_kernel.c

  • step4 : Makefile

  • step5 : Compile and Load

  • Summary

  • Netlink is used to transfer information between the kernel and user-space processes.

  • Netlink is a datagram-oriented service. Both SOCK_RAW and SOCK_DGRAM are valid values for socket_type.

Let us answer a few basic questions about Netlink sockets and epoll.

What is the purpose of the socket(AF_NETLINK, SOCK_RAW, NETLINK_TESTFAMILY) call?


It creates a raw Netlink socket with a custom family identifier (NETLINK_TESTFAMILY).

Why choose AF_NETLINK as the address family for the socket?


AF_NETLINK is specifically designed for Netlink communication between the Linux kernel and user-space.

What does the SOCK_RAW type parameter indicate in the socket creation?


It selects a datagram-style socket. Netlink does not distinguish between SOCK_RAW and SOCK_DGRAM, so either value gives the program direct access to Netlink messages.

How is NETLINK_TESTFAMILY used in socket creation?


It’s a custom Netlink family identifier, helping segregate messages for a specific application or purpose.

Can multiple sockets share the same Netlink family identifier (NETLINK_TESTFAMILY)?


Yes, multiple sockets can use the same Netlink family identifier for communication.

How can the socket be utilized for communication with the Linux kernel?


The socket can send and receive Netlink messages, facilitating communication between user-space and the kernel.

Is error checking necessary after creating the Netlink socket?


Yes, it’s essential to check for errors after creating the socket to handle potential issues.

Is cleanup necessary after using the Netlink socket?


Yes, it’s good practice to close the Netlink socket using close when it’s no longer needed.

What role does the struct sockaddr_nl play in Netlink socket creation?


It provides address information for the Netlink socket, specifying details like family and process ID.
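
As a rough illustration of those fields, the sketch below fills a struct sockaddr_nl for two cases: the kernel as the destination, and the calling process as the local address. The helper names make_kernel_addr and make_local_addr are hypothetical and do not appear in the programs on this page.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Destination address for a message aimed at the kernel:
 * port ID 0 and no multicast groups. */
struct sockaddr_nl make_kernel_addr(void)
{
    struct sockaddr_nl addr;

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = 0;
    addr.nl_groups = 0;
    return addr;
}

/* Local address that could be handed to bind(). Using the process ID
 * as the port ID is a common choice for a single-socket program. */
struct sockaddr_nl make_local_addr(void)
{
    struct sockaddr_nl addr;
    pid_t pid = getpid();

    assert(pid > 0);
    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = (unsigned int)pid;
    addr.nl_groups = 0;
    return addr;
}
```

The first address would be passed as msg_name when sending, the second to bind() if the program wants to choose its own port ID instead of letting the kernel assign one.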

What happens if the Netlink buffer becomes full during message reception?


The kernel may return an error, such as “No buffer space available” (ENOBUFS).
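
One common mitigation, which the program on this page does not use, is to ask for a larger receive buffer with the SO_RCVBUF socket option. The sketch below is a minimal illustration under that assumption; resize_rcvbuf is a hypothetical helper, and it works on any socket descriptor, so in practice fd would be the Netlink socket.

```c
#include <assert.h>
#include <sys/socket.h>

/* Request a receive buffer of `bytes` and return the size the kernel
 * actually granted, or -1 on error. The kernel may adjust the value,
 * so the result is read back with getsockopt(). */
int resize_rcvbuf(int fd, int bytes)
{
    int granted = 0;
    socklen_t len = sizeof(granted);

    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) == -1)
        return -1;
    return granted;
}
```

A larger buffer only delays ENOBUFS; the receiver still has to read messages at least as fast as the kernel produces them.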

What is the role of the sendmsg function in Netlink communication?


It is used to send a message on a Netlink socket, providing flexibility in constructing and sending messages.

What is the role of the recvmsg function in Netlink communication?


It is used to receive a message on a Netlink socket. The struct msghdr it fills in gives the caller both the message data and the address of the sender.

How can a Netlink socket be used for kernel module communication?


By defining a custom Netlink family, kernel modules can communicate with user-space applications.

What is the primary purpose of the epoll system call?


To efficiently monitor multiple file descriptors for I/O events.

What types of file descriptors can be monitored using epoll?

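
Sockets (including Netlink sockets and socketpairs), pipes and named pipes, timerfd, eventfd, signalfd, and POSIX message queue descriptors. Regular files and directories are not supported: epoll_ctl fails with EPERM for them, and this includes the file descriptor returned for a shared memory object.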
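
One way to find out whether a particular descriptor can be monitored is simply to try registering it. The sketch below does that on a throwaway epoll instance; epoll_probe is a hypothetical helper name. On Linux, pipes and sockets are accepted, while epoll_ctl(2) documents EPERM for descriptors that do not support epoll, such as regular files and directories.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Try to register fd for EPOLLIN on a temporary epoll instance.
 * Returns 0 if epoll accepts the descriptor, otherwise the errno
 * value reported by epoll_create1() or epoll_ctl(). */
int epoll_probe(int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = fd } };
    int result = 0;
    int epfd = epoll_create1(0);

    if (epfd == -1)
        return errno;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
        result = errno;     /* saved before close() can change errno */
    close(epfd);
    return result;
}
```

Calling epoll_probe on the read end of a pipe should return 0, while calling it on the descriptor of a file opened with tmpfile() should return EPERM.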

What data structure is used by epoll to store events?

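
The kernel keeps the registered file descriptors (the interest list) in a red-black tree and the descriptors that currently have events pending (the ready list) in a linked list.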

How do you handle errors when using the epoll system call?


Check each call's return value for -1 to detect errors, then inspect errno or use perror to print the error message.

How does epoll handle a set of file descriptors with different states (e.g., reading, writing, exception)?


Create the epoll instance:

Before monitoring file descriptors, the application creates an epoll instance using the epoll_create1 system call.

int epoll_fd = epoll_create1(0);

Register file descriptors:

The application registers file descriptors with the epoll instance using the epoll_ctl system call. It specifies the file descriptor, the events it is interested in (EPOLLIN for readability, EPOLLOUT for writability, etc.), and user-defined data associated with the file descriptor.

struct epoll_event event;
event.events = EPOLLIN | EPOLLOUT;  // Interested in readability and writability
event.data.fd = my_file_descriptor; // File descriptor to monitor

epoll_ctl(epoll_fd, EPOLL_CTL_ADD, my_file_descriptor, &event);

Wait for events:

The application enters a loop where it calls epoll_wait to wait for events. This call blocks until one or more registered file descriptors become ready or until a timeout occurs.

#define MAX_EVENTS 10
struct epoll_event events[MAX_EVENTS];

int num_events = epoll_wait(epoll_fd, events, MAX_EVENTS, timeout_ms);

Modify or remove file descriptors:

The application can dynamically modify or remove file descriptors from the epoll set using the epoll_ctl system call. For example, to modify events for an existing file descriptor:

struct epoll_event new_event;
new_event.events = EPOLLOUT;  // Modify to be interested in writability
new_event.data.fd = my_file_descriptor; // EPOLL_CTL_MOD replaces the stored data too

epoll_ctl(epoll_fd, EPOLL_CTL_MOD, my_file_descriptor, &new_event);

To remove a file descriptor from the epoll set:

epoll_ctl(epoll_fd, EPOLL_CTL_DEL, my_file_descriptor, NULL);
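
The register, modify, remove, and wait steps above can be wrapped in small helpers, as in the following sketch. The names watch_fd, rewatch_fd, unwatch_fd, and poll_ready are hypothetical; the underlying calls are the standard epoll_ctl and epoll_wait.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd with the epoll instance epfd for the given event mask. */
int watch_fd(int epfd, int fd, unsigned int mask)
{
    struct epoll_event ev = { .events = mask, .data = { .fd = fd } };

    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Change the event mask of an already registered fd. data.fd is set
 * again because EPOLL_CTL_MOD replaces the whole stored event. */
int rewatch_fd(int epfd, int fd, unsigned int mask)
{
    struct epoll_event ev = { .events = mask, .data = { .fd = fd } };

    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/* Remove fd from the epoll instance. */
int unwatch_fd(int epfd, int fd)
{
    return epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
}

/* Non-blocking check (timeout 0): returns the number of ready
 * descriptors stored in out[], or -1 on error. */
int poll_ready(int epfd, struct epoll_event *out, int max)
{
    assert(out != NULL && max > 0);
    return epoll_wait(epfd, out, max, 0);
}
```

Each helper returns -1 on failure with errno set, so the caller can report errors with perror in the same way as the rest of the page.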

How does an application check which file descriptors are ready?

After epoll_wait returns, the application iterates through the returned events to identify which file descriptors are ready and for what types of events.

for (int i = 0; i < num_events; ++i) {
        int fd = events[i].data.fd;  // Descriptor stored at registration time

        if (events[i].events & EPOLLIN) {
                // fd is ready for reading
        }

        if (events[i].events & EPOLLOUT) {
                // fd is ready for writing
        }
        // Check other events if needed (e.g., EPOLLERR, EPOLLHUP)
}
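
A self-contained version of this loop might look like the sketch below. It assumes each descriptor was registered with its own number in data.fd; the helper names make_watched_pipe and collect_readable are hypothetical, and a pipe stands in for the Netlink socket so that the example does not need a kernel module.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 10

/* Create a pipe and register its read end for EPOLLIN.
 * Returns 0 on success, -1 on error. */
int make_watched_pipe(int epfd, int fds[2])
{
    struct epoll_event ev = { .events = EPOLLIN };

    if (pipe(fds) == -1)
        return -1;
    ev.data.fd = fds[0];
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev);
}

/* Wait up to timeout_ms and copy the descriptors reported readable
 * into readable[]. Returns how many were stored, or -1 on error. */
int collect_readable(int epfd, int timeout_ms, int *readable, int capacity)
{
    struct epoll_event events[MAX_EVENTS];
    int count = 0;
    int n;

    assert(readable != NULL && capacity > 0);
    n = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);
    if (n == -1)
        return -1;

    for (int i = 0; i < n && count < capacity; ++i) {
        /* i only indexes the returned events; the descriptor is the
         * value that was stored in data.fd at registration time. */
        if (events[i].events & EPOLLIN)
            readable[count++] = events[i].data.fd;
    }
    return count;
}
```

With two watched pipes and one byte written to the second, collect_readable should report only the read end of the second pipe.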

What does it mean if epoll returns 0?


No file descriptors are ready within the specified timeout.
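
The timeout case can be seen with a short sketch like the one below. wait_readable is a hypothetical helper that builds a temporary epoll instance for a single descriptor; a real program would normally keep one epoll instance open for its whole lifetime.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Wait until an event is reported for fd or timeout_ms elapses.
 * Returns 1 if an event was reported, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = fd } };
    struct epoll_event out;
    int n;
    int epfd;

    assert(timeout_ms >= -1);
    epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        close(epfd);
        return -1;
    }
    n = epoll_wait(epfd, &out, 1, timeout_ms);
    close(epfd);
    return n;   /* 0 means the timeout expired with nothing ready */
}
```

On an empty pipe, wait_readable(read_end, 50) should return 0 after roughly 50 ms; once a byte has been written to the pipe it should return 1.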

https://www.plantuml.com/plantuml/svg/XL5HIyCm47xlhpXV9C4urjV9OANhOAphP2rJ4GanlRfWseucTlZlJTUAcm-QX-HoztrttvVCZ1MD_IYrTsNtM2AOWv2enQjtAdWJKyjtj2HOy2JAucMoOf1kmXNV1WyCXXJFeZHR1Ejb_4Jll0aUQgrvaSCK-b0sA2pwPFnJbLQJRno3w7uJnppCOXxCvmoaLbXPtezcS8sjkphkSQyq0XaEdr9G1iWd6kfqOfMNvetyvheSWxF1Hw7isUHJQyDW7JpoQ1SbMShWrdTmLG3fYy_Zbr4kh1XrZgQNuYYuLwi63T2j_uNqi0Pb-oSCjN4cgvfwhg4gkdqm7cRR-4OwpwIjLKwbhWrUzd5qO-k3DtCazxTV
  • To create a socket with socket(),

client_socket = socket(AF_NETLINK, SOCK_RAW, NETLINK_TESTFAMILY);
  • The nlmsg_pid field of the nlmsghdr is filled with the calling process’ own pid. The kernel module in this example replies to that port ID.

nlh->nlmsg_pid = getpid();
  • To send a netlink message to the kernel or to another user-space process, a struct sockaddr_nl addr must be supplied as the destination address, just as a destination is supplied when sending a datagram with sendmsg(). If the message is destined for the kernel, both nl_pid and nl_groups are set to 0.

addr.nl_pid = 0;
addr.nl_groups = 0;

struct msghdr msg;

msg.msg_name = (void * ) &addr;
msg.msg_namelen = sizeof(addr);
  • The netlink socket requires its own message header as well. This is for providing a common ground for netlink messages of all protocol types. Because the Linux kernel netlink core assumes the existence of the following header in each netlink message, an application must supply this header in each netlink message it sends:

struct nlmsghdr * nlh = (struct nlmsghdr * ) malloc(NLMSG_SPACE(MAX_PAYLOAD));

nlh->nlmsg_len = NLMSG_SPACE(MAX_PAYLOAD);
nlh->nlmsg_pid = getpid();
nlh->nlmsg_flags = 0;
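
As a sketch of how NLMSG_SPACE and NLMSG_DATA fit together, the hypothetical helper below builds a message sized for one string instead of a fixed MAX_PAYLOAD. It also sets nlmsg_len with NLMSG_LENGTH (header plus payload, without trailing padding), which is an assumption of this sketch and differs from the snippet above, where NLMSG_SPACE is used for both the allocation and the length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Allocate a netlink message that carries `text` as its payload.
 * The caller frees the result. Returns NULL if allocation fails. */
struct nlmsghdr *build_text_message(const char *text)
{
    size_t payload;
    struct nlmsghdr *nlh;

    assert(text != NULL);
    payload = strlen(text) + 1;             /* include the '\0' */

    /* NLMSG_SPACE: aligned header + payload rounded up to 4 bytes */
    nlh = calloc(1, NLMSG_SPACE(payload));
    if (nlh == NULL)
        return NULL;

    nlh->nlmsg_len = NLMSG_LENGTH(payload); /* header + payload */
    nlh->nlmsg_pid = (unsigned int)getpid();
    nlh->nlmsg_flags = 0;

    /* NLMSG_DATA: first byte after the aligned header */
    memcpy(NLMSG_DATA(nlh), text, payload);
    return nlh;
}
```

For the string "Hello" the payload is 6 bytes and the header is 16 bytes, so nlmsg_len is 22 and the allocation is rounded up to 24 bytes.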
  • A netlink message thus consists of an nlmsghdr followed by the message payload. Once the message has been filled in, it sits in the buffer pointed to by the nlh pointer. That buffer is then attached to the struct msghdr msg through a struct iovec:

struct iovec iov;

iov.iov_base = (void * ) nlh;
iov.iov_len = nlh->nlmsg_len;

msg.msg_iov = &iov;
msg.msg_iovlen = 1;
  • After the above steps, a call to sendmsg() kicks out the netlink message:

sendmsg(client_socket, &msg, 0);
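
The msghdr and iovec plumbing is the same for any datagram socket, so it can be tried without the kernel module. The sketch below uses hypothetical helpers send_buffer and recv_buffer and is meant for a connected AF_UNIX socketpair as a stand-in; with Netlink, msg_name would additionally point at a struct sockaddr_nl as shown above.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send len bytes from buf as one datagram. Returns bytes sent or -1. */
ssize_t send_buffer(int fd, const void *buf, size_t len)
{
    struct iovec iov;
    struct msghdr msg;

    assert(buf != NULL);
    memset(&msg, 0, sizeof(msg));
    iov.iov_base = (void *)buf;     /* sendmsg() only reads the data */
    iov.iov_len = len;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    return sendmsg(fd, &msg, 0);
}

/* Receive one datagram into buf. Returns bytes received, or -1 on
 * error or when the datagram did not fit (errno set to EMSGSIZE). */
ssize_t recv_buffer(int fd, void *buf, size_t cap)
{
    struct iovec iov = { .iov_base = buf, .iov_len = cap };
    struct msghdr msg;
    ssize_t n;

    assert(buf != NULL);
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    n = recvmsg(fd, &msg, 0);
    if (n >= 0 && (msg.msg_flags & MSG_TRUNC)) {
        errno = EMSGSIZE;           /* datagram larger than buffer */
        return -1;
    }
    return n;
}
```

Keeping the iovec and msghdr local to each call means nothing points at freed stack memory after the helper returns, which matters when the same msghdr is reused for sending and receiving.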
  • epoll_create1() creates an epoll instance. Its argument is a flags value, either 0 or EPOLL_CLOEXEC. Unlike the older epoll_create(), it takes no size hint. For example,

epoll_fd = epoll_create1(0);
  • epoll_ctl() adds file descriptors to the epoll instance once it has been created. For example,

ret = epoll_ctl(epoll_fd, EPOLL_CTL_ADD, client_socket, &event);
  • epoll_wait() blocks until an event arrives. The application calls it inside a loop. For example,

ret = epoll_wait(epoll_fd, events, MAX_EVENTS, -1);
  • recvmsg() is used to receive Netlink messages. For example,

recvmsg(client_socket, &msg, 0);
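
Before printing a received payload with %s, it can be worth checking that the buffer really holds a complete message and a terminated string. The sketch below uses the standard NLMSG_OK and NLMSG_PAYLOAD macros from linux/netlink.h; the helper name payload_as_string is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <linux/netlink.h>

/* Return the payload of the first message in buf as a C string, or
 * NULL if `received` bytes do not hold a complete netlink message
 * whose payload contains a terminating '\0'. */
const char *payload_as_string(void *buf, ssize_t received)
{
    struct nlmsghdr *nlh = buf;
    const char *data;
    size_t len;

    assert(buf != NULL);
    if (received <= 0 || !NLMSG_OK(nlh, (unsigned int)received))
        return NULL;

    len = NLMSG_PAYLOAD(nlh, 0);    /* nlmsg_len minus the header */
    data = NLMSG_DATA(nlh);
    if (len == 0 || memchr(data, '\0', len) == NULL)
        return NULL;
    return data;
}
```

Here `received` is the return value of recvmsg(), and buf is the buffer the iovec pointed to.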
  • close() closes the socket and frees the system resources associated with it. For example,

(void)close(client_socket);
  • See the full program below,

#include <linux/netlink.h>
#include <sys/socket.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <signal.h>
#include <sys/epoll.h>

#define NETLINK_TESTFAMILY 25
#define MAX_PAYLOAD 1024
#define MAX_EVENTS 2

struct msghdr msg;
int client_socket;
int epoll_fd;

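static void sigint_handler(
int signo)
{
  // Only async-signal-safe calls are used inside the handler
  static const char note[] = "Caught sigINT!\n";

  (void)signo;
  (void)close(client_socket);
  (void)close(epoll_fd);
  (void)write(STDOUT_FILENO, note, sizeof(note) - 1);
  _exit(EXIT_SUCCESS);
}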

void register_signal_handler(
int signum,
void (*handler)(int))
{
  if (signal(signum, handler) ==
  SIG_ERR) {
     printf("Cannot handle signal\n");
     exit(EXIT_FAILURE);
  }
}

struct nlmsghdr *send_message(
int client_socket, 
struct sockaddr_nl addr)
{
  // msg is global and is reused by recvmsg() in main(), so the
  // iovec and the address it points to must outlive this call
  static struct iovec iov;
  static struct sockaddr_nl dest;

  struct nlmsghdr *nlh = (
  struct nlmsghdr *) malloc(
  NLMSG_SPACE(MAX_PAYLOAD));

  if (nlh == NULL) {
    perror("malloc");
    exit(EXIT_FAILURE);
  }

  memset(nlh, 0, 
  NLMSG_SPACE(MAX_PAYLOAD));
  nlh->nlmsg_len = NLMSG_SPACE(
  MAX_PAYLOAD);
  nlh->nlmsg_pid = getpid();
  nlh->nlmsg_flags = 0;
  strcpy((char *) 
  NLMSG_DATA(nlh), "Hello");

  memset(&iov, 0, sizeof(iov));
  iov.iov_base = (void *) nlh;
  iov.iov_len = nlh->nlmsg_len;

  dest = addr;

  memset(&msg, 0, sizeof(msg));
  msg.msg_name = (void *) &dest;
  msg.msg_namelen = sizeof(dest);
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;

  printf("Sending message to kernel\n");
  printf("-------------------------\n");
  if (sendmsg(client_socket, &msg, 0) == -1) {
    perror("sendmsg");
    free(nlh);
    exit(EXIT_FAILURE);
  }
  printf("Sent message: %s\n\n", 
  (char *)NLMSG_DATA(nlh));
  
  return nlh;
}


int main()
{
  int len, ret;
  int ready_fds;
  struct nlmsghdr *nlh;
  struct sockaddr_nl addr;
  struct epoll_event 
  events[MAX_EVENTS];
  struct epoll_event event;

  register_signal_handler(SIGINT,
  sigint_handler);

  client_socket = socket(
  AF_NETLINK, SOCK_RAW, 
  NETLINK_TESTFAMILY);

  if (client_socket == -1) {
    perror("socket");
    return -1;
  }

  memset(&addr, 0, sizeof(addr));
  addr.nl_family = AF_NETLINK;
  addr.nl_pid = 0;  // For Linux kernel
  addr.nl_groups = 0;

  epoll_fd = epoll_create1(0);

  if (epoll_fd < 0) {
     perror("Epoll creation failed");
     exit(EXIT_FAILURE);
  }

  event.events = EPOLLIN;
  event.data.fd = client_socket;
  
  ret = epoll_ctl(epoll_fd, 
  EPOLL_CTL_ADD, client_socket, &event);
 
  if (ret < 0) {
    perror("Epoll_ctl failed");
    exit(EXIT_FAILURE);
  }

  while (1) {
    // One request per iteration; the buffer is freed at the end
    nlh = send_message(
    client_socket, addr);

    ready_fds = epoll_wait(epoll_fd, 
    events, MAX_EVENTS, -1);

    if (ready_fds < 0) {
      perror("Epoll wait failed");
      free(nlh);
      exit(EXIT_FAILURE);
    }

    // Only one descriptor is registered, so events[0] refers to it
    if (ready_fds > 0 &&
    events[0].data.fd == client_socket) {
      // The reply is received into the buffer that held the request
      len = recvmsg(client_socket, 
      &msg, 0);

      if (len > 0) {
        printf("Receving msg from kernel\n");
        printf("------------------------\n");
        printf("Received message: %s\n", 
        (char *)NLMSG_DATA(nlh));
      } else if (errno != ENOBUFS) {
        perror("recv");
        free(nlh);
        break;
      }
    }

    free(nlh);
  }

  (void)close(client_socket);
  (void)close(epoll_fd);

  return 0;
}
  • nlmsg_new allocates a new socket buffer (sk_buff) large enough for a netlink message with the given payload size.

struct sk_buff * skb_out;

skb_out = nlmsg_new(message_size, GFP_KERNEL);
  • nlmsg_put adds a netlink message header to the socket buffer and reserves room for the payload, which is then copied in.

struct nlmsghdr * nlh = (struct nlmsghdr * ) skb->data;

nlh = nlmsg_put(skb_out, 0, 0, NLMSG_DONE, message_size, 0);
  • nlmsg_unicast is used to send the message to the user space application identified by pid.

result = nlmsg_unicast(socket, skb_out, pid);
  • netlink_kernel_create is used to create a Netlink socket in the kernel.

socket = netlink_kernel_create(&init_net, NETLINK_TESTFAMILY, &config);
  • wake_up_interruptible is used to wake up any threads that are waiting on the specified wait queue wq. When a thread sets a condition and calls wake_up_interruptible(&wq), it signals to other threads waiting on the same condition that they can proceed.

wake_up_interruptible(&wq);
  • init_completion is used to initialize a completion variable before its first use. For example,

init_completion(&comp);
  • kthread_run is used to create and start a kernel thread. For example,

kthread = kthread_run(thread_func, NULL, "my_thread");
  • kthread_stop is used to stop and clean up a kernel thread created with kthread_run. For example,

kthread_stop(kthread);
  • In this example, kthread_run creates a kernel thread to execute thread_func and kthread_stop is used to stop and clean up the thread when it’s no longer needed. The thread checks kthread_should_stop() to determine when it should exit.

  • complete is used to signal any waiting tasks to wake up. For example,

complete(&comp);
  • wait_for_completion waits for the given completion variable to be signaled. For example,

wait_for_completion(&comp);
  • netlink_kernel_release is used to release the netlink socket created with netlink_kernel_create.

netlink_kernel_release(socket);
  • See the full program below,

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/netlink.h>
#include <linux/kthread.h>
#include <linux/completion.h>
#include <net/netlink.h>
#include <net/net_namespace.h>
#include <linux/delay.h>

#define NETLINK_TESTFAMILY 25
#define NETLINK_MYGROUP 2

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Linux_usr");
MODULE_DESCRIPTION("Netlink - Unicast");

struct sock *socket;

static struct completion comp;
static struct task_struct *thread1;

DECLARE_WAIT_QUEUE_HEAD(wq);

static void test_nl_receive_message(
struct sk_buff *skb)
{
  struct nlmsghdr *nlh = 
  (struct nlmsghdr *) skb->data;
  pid_t pid = nlh->nlmsg_pid;

  int result;
  char *message;
  size_t message_size;
  struct sk_buff *skb_out;

  pr_info("Entering: %s\n", __func__);
  pr_info("kernel Received message: %s\n", 
  (char *) nlmsg_data(nlh));

  message = "Hello from kernel unicast";
  message_size = strlen(message) + 1;
  skb_out = nlmsg_new(message_size, 
  GFP_KERNEL);

  if (!skb_out) {
    pr_err("Failed to allocate a new skb\n");
    return;
  }

  nlh = nlmsg_put(skb_out, 
  0, 0, NLMSG_DONE, message_size, 0);
  NETLINK_CB(skb_out).dst_group = 0;
  strncpy(nlmsg_data(nlh), 
  message, message_size);

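  // nlmsg_unicast() takes ownership of skb_out, so neither
  // skb_out nor nlh may be touched after this call
  result = nlmsg_unicast(socket, 
  skb_out, pid);
  if (result < 0)
    pr_err("Failed to send message: %d\n", result);
  else
    pr_info("Sent message: %s\n", message);
  wake_up_interruptible(&wq);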
}

static int thread_fun1(void *data)
{
  struct netlink_kernel_cfg config = {
         .input = test_nl_receive_message,
  };

  socket = netlink_kernel_create(
  &init_net, 
  NETLINK_TESTFAMILY, &config);
        
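  // Keep the thread alive even on failure: test_exit() calls
  // kthread_stop() and waits on comp, so returning early here
  // would leave the module unable to unload cleanly
  if (socket == NULL)
     pr_err("Failed to create netlink socket\n");
  else
     pr_info("Netlink initialized\n");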

  while (!kthread_should_stop()) {
   pr_info("Thread 1 is running\n");
   wake_up_interruptible(&wq);
   msleep(1000);
  }  
       
  complete(&comp);

  return 0;
}

static int __init test_init(void)
{
  pr_info("Driver Loaded\n");

  init_completion(&comp);

  thread1 = kthread_run(
  thread_fun1, NULL, "thread1");

  if (IS_ERR(thread1)) {
      pr_alert("Failed to create thread1\n");
      return PTR_ERR(thread1);
  }

  return 0;
}

static void __exit test_exit(void)
{
  if (socket)
    netlink_kernel_release(socket);
  kthread_stop(thread1);

  wait_for_completion(&comp);
  pr_info("Netlink released\n");
}

module_init(test_init);
module_exit(test_exit);

obj-m += netlink_kernel.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

client:
	gcc nl_user.c -o nl_user

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
	rm -f nl_user
$ make all

$ sudo insmod ./netlink_kernel.ko

$ gcc -o nl_user nl_user.c

$ ./nl_user

Sending message to kernel
-------------------------
Sent message: Hello

Receving msg from kernel
------------------------
Received message: Hello from kernel unicast
Sending message to kernel
-------------------------
Sent message: Hello

Receving msg from kernel
------------------------
Received message: Hello from kernel unicast
Sending message to kernel
-------------------------
Sent message: Hello

Receving msg from kernel
------------------------
Received message: Hello from kernel unicast
Sending message to kernel
-------------------------
Sent message: Hello

Receving msg from kernel
------------------------
Received message: Hello from kernel unicast
Sending message to kernel
-------------------------
Sent message: Hello

Receving msg from kernel
------------------------
Received message: Hello from kernel unicast
Sending message to kernel
-------------------------
Sent message: Hello

Receving msg from kernel
------------------------
Received message: Hello from kernel unicast
Sending message to kernel
-------------------------
Sent message: Hello

Receving msg from kernel
------------------------
Received message: Hello from kernel unicast
Sending message to kernel
-------------------------
Sent message: Hello

Receving msg from kernel
------------------------
Received message: Hello from kernel unicast
Sending message to kernel
-------------------------
Sent message: Hello

Receving msg from kernel
------------------------
Received message: Hello from kernel unicast
Sending message to kernel
-------------------------
Sent message: Hello

Receving msg from kernel
------------------------
Received message: Hello from kernel unicast
^CCaught sigINT!

$ sudo rmmod netlink_kernel

$ dmesg

[45587.442654] Driver Loaded
[45587.442796] Netlink initialized
[45587.442799] Thread 1 is running
[45595.620384] Thread 1 is running
[45596.644279] Thread 1 is running
[45597.668497] Thread 1 is running
[45598.696484] Thread 1 is running
[45599.369656] Entering: test_nl_receive_message
[45599.369668] kernel Received message: Hello
[45599.369675] Sent message: Hello from kernel unicast
[45599.369741] Entering: test_nl_receive_message
[45599.369744] kernel Received message: Hello
[45599.369748] Sent message: Hello from kernel unicast
[45599.369802] Entering: test_nl_receive_message
[45599.369804] kernel Received message: Hello
[45599.369808] Sent message: Hello from kernel unicast
[45599.369853] Entering: test_nl_receive_message
[45599.369856] kernel Received message: Hello
[45599.369860] Sent message: Hello from kernel unicast
[45599.369892] Entering: test_nl_receive_message
[45599.369895] kernel Received message: Hello
[45599.369899] Sent message: Hello from kernel unicast
[45599.369944] Entering: test_nl_receive_message
[45599.369947] kernel Received message: Hello
[45599.369950] Sent message: Hello from kernel unicast
[45599.369999] Entering: test_nl_receive_message
[45599.370002] kernel Received message: Hello
[45599.370006] Sent message: Hello from kernel unicast
[45599.370048] Entering: test_nl_receive_message
[45599.370051] kernel Received message: Hello
[45599.370054] Sent message: Hello from kernel unicast
[45599.370102] Entering: test_nl_receive_message
[45599.370105] kernel Received message: Hello
[45599.370108] Sent message: Hello from kernel unicast
[45599.370146] Entering: test_nl_receive_message
[45599.370149] kernel Received message: Hello
[45599.370153] Sent message: Hello from kernel unicast
[45599.716442] Thread 1 is running
[45600.740260] Thread 1 is running
[45601.764221] Thread 1 is running
[45602.788468] Thread 1 is running
[45603.812238] Thread 1 is running
[45604.836370] Thread 1 is running
[45622.244072] Thread 1 is running
[45623.268359] Netlink released

Default Domain:

The program above creates its socket in the AF_NETLINK domain, the address family used for Netlink communication between the kernel and user space.

Additional Domain Support:

The socket can also be created with PF_NETLINK. On Linux, PF_NETLINK is defined as an alias for AF_NETLINK, so both names select the same domain.

Socket Creation:

With PF_NETLINK, the socket is created using socket(PF_NETLINK, SOCK_RAW, NETLINK_TESTFAMILY).

Working Scenario:

Because the two constants are equivalent, the socket behaves exactly as it does with AF_NETLINK and exchanges Netlink messages with the kernel module in the same way.
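
The reason the behaviour does not change is that, on Linux with glibc, PF_NETLINK and AF_NETLINK expand to the same integer. A small sketch that checks this at compile time is shown below; netlink_domain is a hypothetical helper added only so the value can be inspected at run time.

```c
#include <assert.h>
#include <sys/socket.h>

/* Fails to compile if the two constants ever differ on this system. */
_Static_assert(PF_NETLINK == AF_NETLINK,
               "PF_NETLINK and AF_NETLINK are expected to be equal");

/* Returns the domain value that would be passed to socket(). */
int netlink_domain(void)
{
    return PF_NETLINK;
}
```

If the file compiles, either constant can be passed as the first argument of socket().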

User Space API    Learning

socket            To create a socket
epoll             To monitor a set of file descriptors for events such as readability, writability, and errors, using struct epoll_event and its event flags
sendmsg           To send a netlink message
recvmsg           To receive a netlink message

Kernel Space API          Learning

nlmsg_new                 To create a netlink message
nlmsg_put                 To populate the message
nlmsg_unicast             To send the message to the user space application
netlink_kernel_create     To create a Netlink socket in the kernel
netlink_kernel_release    To release the netlink socket
wake_up_interruptible     To wake up any threads that are waiting on the specified wait queue
kthread_run               To create and wake a thread
kthread_should_stop       To determine when the thread should exit
kthread_stop              To stop a thread created by kthread_create or kthread_run
init_completion           To initialize the given completion variable
complete                  To signal any waiting tasks to wake up
wait_for_completion       To wait for the given completion variable to be signaled


© Copyright 2023, c-pointers.