Blog/More random notes about network programming (Episode 2)

From ~esantoro
Revision as of 22:51, 4 November 2023 by Esantoro

As I've mentioned in the previous article (Blog/Random notes about network programming) I've decided to split these notes into multiple pages (I'll be calling them "Episodes").

In this episode, I'm taking notes about receiving and sending UDP traffic.

The goal for this page is to build up a basic UDP server and client. I'm particularly interested in using the sendmsg and recvmsg functions. The next step would then be to use the Linux-specific sendmmsg and recvmmsg.

Creating a server socket

I did skip this topic in the previous episode, mostly because it was still a bit unclear to me, but in order to create a server socket the steps are fairly simple:

  • instantiate a socket
    • use socket from sys/socket.h with the appropriate parameters
  • bind the socket to an address
    • use bind from sys/socket.h
    • in order to tell the kernel what address you want the socket to be bound to you have to use the sockaddr_* structures described in the previous episode

Optionally, if you are working on a connection-oriented socket (most often, this means if you are going to do TCP traffic) you'll also need to perform a listen and then as many accepts as you'll need.
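The steps above can be sketched as a small helper. This is only a minimal sketch for the UDP/IPv4 case: the name open_udp_server is made up for illustration, and it assumes binding to every local address (INADDR_ANY) is what you want.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from the article): returns a UDP socket bound
 * to 0.0.0.0:port, or -1 on error. Passing port 0 lets the kernel pick one. */
int open_udp_server(uint16_t port)
{
    /* step 1: instantiate the socket */
    int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (fd == -1)
        return -1;

    /* step 2: describe the local address, starting from a zeroed structure */
    struct sockaddr_in local;
    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_port = htons(port);
    local.sin_addr.s_addr = htonl(INADDR_ANY);

    /* step 3: bind the socket to that address */
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

For a TCP socket the same helper would use SOCK_STREAM and would be followed by listen and accept.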

The msghdr structure

The msghdr structure is central to both sendmsg and recvmsg. It's a fairly big structure in order to limit the number of parameters that have to be passed to those two functions.

It's also a value-result parameter, meaning that those functions will take inputs from there and also deposit some outputs in there.

Its fields are:

  • void* msg_name
    • the name is absolutely misleading
    • this is a pointer to something like a struct sockaddr_storage (but could as well be a sockaddr_in or sockaddr_in6 if you know you'll be receiving a specific type of traffic)
    • the caller is supposed to allocate/handle the memory that will be pointed to by this field
    • with recvmsg, the pointed memory will be filled with the address information of the sender; with sendmsg, it holds the address of the destination
    • it's actually optional, if you don't care about the information contained in this struct
  • socklen_t msg_namelen
    • the length of the memory area pointed to by msg_name
  • struct iovec* msg_iov
    • this is an array of struct iovec objects. this is the area of memory where recvmsg will copy the contents of packets received.
    • it's exploiting the array-pointer equivalence in c - it would have been more intuitive if the type had been struct iovec msg_iov[]
    • More on struct iovec later
  • int msg_iovlen
    • this is the length of the msg_iov array
  • void* msg_control
    • this points to a caller-provided buffer holding one or more struct cmsghdr headers, each followed by its data
    • from what i understand, it's mostly useful when using msghdr with recvmsg.
    • From the recvmsg syscall manpage: «As an example, Linux uses this ancillary data mechanism to pass extended errors, IP options, or file descriptors over UNIX domain sockets»[1]
    • it's often described as "ancillary data"
    • sys/socket.h contains some macros (CMSG_*) to correctly parse this data
    • caller allocated/managed
  • socklen_t msg_controllen
    • the length in bytes of the buffer pointed to by msg_control
  • int msg_flags
    • Flags on received message. Set by the kernel.

As you can see, some of the naming here is not really helpful. Such is life, I guess.
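To make the field list a bit more concrete, here is a minimal sketch of how a msghdr could be filled in before calling recvmsg. The helper name prepare_recv_msghdr and its parameters are my own invention; all the memory is owned by the caller.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper (not from the article): points every field of `msg`
 * at caller-owned memory, ready to be handed to recvmsg. */
void prepare_recv_msghdr(struct msghdr *msg,
                         struct sockaddr_storage *peer,
                         struct iovec *iov, size_t iov_count,
                         void *control, size_t control_len)
{
    /* start from a clean structure, so msg_flags is zero as well */
    memset(msg, 0, sizeof(*msg));

    /* where the kernel will store the sender's address, and how much room there is */
    msg->msg_name = peer;
    msg->msg_namelen = sizeof(*peer);

    /* the receive buffers and how many of them there are */
    msg->msg_iov = iov;
    msg->msg_iovlen = iov_count;

    /* room for ancillary data (may be NULL / 0 if not needed) */
    msg->msg_control = control;
    msg->msg_controllen = control_len;
}
```

Since msg_namelen and msg_controllen are overwritten by the kernel on each call, a receive loop would call something like this (or at least reset those two fields) before every recvmsg.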

The iovec structure

This structure is defined in sys/uio.h and it's supposed to support a technique called "scatter-gather I/O", where essentially a single syscall reads into, or writes from, several separate buffers. Or at least, that's what I understood.

It has two fields:

  • void* iov_base
  • size_t iov_len

Each structure of this kind is essentially a buffer, an area of memory that starts at iov_base and is iov_len bytes long.

From my understanding, when an array of these structures is used for sending/receiving (or reading/writing) they will be used in the order they appear in the array.

It seems mostly like a "standard" way to have a buffer and its length in a single structure.
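A quick way to see the "used in array order" behaviour is writev on a pipe, which takes the same struct iovec array that sendmsg uses. This is just a sketch: the function name gather_demo and the two strings are made up for the example.

```c
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical demo (not from the article): writes two separate buffers with
 * a single writev call, then reads the result back into `out` as a string.
 * Returns the number of bytes read, or -1 on error. */
ssize_t gather_demo(char *out, size_t out_size)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    char first[] = "hello ";
    char second[] = "world";

    /* two buffers living in different places, described by one array */
    struct iovec parts[2] = {
        { .iov_base = first,  .iov_len = strlen(first)  },
        { .iov_base = second, .iov_len = strlen(second) },
    };

    /* one syscall, both buffers, in array order */
    ssize_t written = writev(fds[1], parts, 2);

    ssize_t got = -1;
    if (written != -1 && out_size > 0) {
        got = read(fds[0], out, out_size - 1);
        if (got >= 0)
            out[got] = '\0';
    }

    close(fds[0]);
    close(fds[1]);
    return got;
}
```

Reading the pipe back gives the two buffers concatenated in the order they appear in the array.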

The cmsghdr structure and its usage

The msg_control field of struct msghdr is supposed to host "ancillary data". What is meant by "ancillary data" isn't well defined (or even exemplified). However I did find something.

The manpage for cmsg(3) contains a somewhat vague description of the macros and functions needed to handle such data.

The cmsghdr structure has the following definition:

struct cmsghdr {
    size_t cmsg_len;    /* Data byte count, including header
                           (type is socklen_t in POSIX) */
    int    cmsg_level;  /* Originating protocol */
    int    cmsg_type;   /* Protocol-specific type */
    unsigned char cmsg_data[]; /* the actual data*/
};

The cmsg_level field is essentially the layer ("layer" as in "layer of the ISO/OSI model"), and the cmsg_type is a layer-specific type of data.

This means, supposedly ("supposedly" as in "supposing that my understanding is correct"), that each layer a packet goes through can leave some data attached to the packet.

An example of this (from the manpage) is getting the TTL field of the IP packet received through recvmsg.

In the example from the cmsg manpage, IPPROTO_IP (from netinet/in.h) is used to select messages from the IP layer (so below UDP), and IP_TTL to select the data about the TTL value.
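Along the lines of that manpage example, this is roughly how the TTL lookup could look. It's a sketch under a couple of assumptions: the function names enable_recv_ttl and find_ttl are mine, and it relies on the Linux behaviour where the TTL is only attached to received packets after the IP_RECVTTL socket option has been turned on.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: asks the kernel (Linux) to attach the TTL of every
 * received packet as ancillary data. Returns 0 on success, -1 on error. */
int enable_recv_ttl(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_IP, IP_RECVTTL, &on, sizeof(on));
}

/* Hypothetical helper: walks the control messages of a msghdr filled in by
 * recvmsg and returns the TTL, or -1 if no IP_TTL message is present. */
int find_ttl(struct msghdr *msg)
{
    for (struct cmsghdr *c = CMSG_FIRSTHDR(msg); c != NULL;
         c = CMSG_NXTHDR(msg, c)) {
        /* level selects the layer, type selects the kind of data */
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_TTL) {
            int ttl;
            /* CMSG_DATA may be unaligned for int, so copy instead of casting */
            memcpy(&ttl, CMSG_DATA(c), sizeof(ttl));
            return ttl;
        }
    }
    return -1;
}
```

For this to find anything, the buffer behind msg_control has to be large enough for the header plus the data, e.g. CMSG_SPACE(sizeof(int)) bytes for a single TTL message.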

The question now is obvious... What other values can I use instead of IP_TTL? And what other values can I use when I swap IPPROTO_IP with another protocol?

So far I've found some other definitions near the definition of IP_TTL in linux/in.h (/usr/include/linux/in.h on my system) but no mention of anything that could answer my questions.

Putting it all together

UDP Server

This simple C program creates a UDP socket and binds it to port 5050.

It reads one packet at a time and prints its content, along with the address of the sender.

#include <stdio.h>

#include <string.h>
#include <stdlib.h>
#include <unistd.h>

#include <sys/socket.h>
#include <arpa/inet.h>
#include <sys/uio.h>

// 65535 is the largest value of the 16-bit length field in the udp header;
// over ipv4 the payload itself tops out at 65507 bytes (65535 - 20 - 8)
#define RECEIVE_BUFFER_SIZE  ((2<<15)-1)

// if we ever wanted to have more than one receive buffer..
#define RECEIVE_BUFFER_NENTRIES 1

int main(int argc, char **argv){

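  // information about where do we want to be bound
  // (zeroed first, so that no field is left uninitialized)
  struct sockaddr_in local_address ;
  memset(&local_address, 0, sizeof(local_address));
  local_address.sin_family = AF_INET;
  local_address.sin_port = htons(5050);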

  // translating "0.0.0.0" to network-byte-order
  if (inet_pton(AF_INET, "0.0.0.0", &(local_address.sin_addr)) != 1){
    printf("Failed to fill local_address structure\n");
    exit(1);
  }

  // instantiating an udp datagram socket on ipv4
  int sockfd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
  if (sockfd == -1) {
    perror("Error creating socket");
    exit(1);
  }

  // bind that socket to the address we prepared earlier
  if (bind(sockfd,
	   (struct sockaddr*)&local_address,
	   sizeof(struct sockaddr_in))) {
    perror("Error binding socket to port");
    // there is no point in receiving on an unbound socket
    exit(1);
  }
  
  
  printf("Setting up receive buffer sized at %d bytes\n", RECEIVE_BUFFER_SIZE);
  struct iovec receive_buffers[RECEIVE_BUFFER_NENTRIES];

  // if we had more than one receive buffer, they would all be set up here
  for (int i=0 ; i < RECEIVE_BUFFER_NENTRIES; i++){
    receive_buffers[i].iov_base = malloc(RECEIVE_BUFFER_SIZE);
    receive_buffers[i].iov_len = RECEIVE_BUFFER_SIZE;
  }
  
  // prepare the msghdr structure, and make sure it's empty (via memset)
  struct msghdr inmsg;
  memset(&inmsg, 0, sizeof(struct msghdr));

  // allocate memory for address information about the sender
  inmsg.msg_name = malloc(sizeof(struct sockaddr_storage)) ;
  inmsg.msg_namelen = sizeof(struct sockaddr_storage);

  // allocate and prepare memory for the "ancillary data"
  struct cmsghdr control_message;
  memset(&control_message, 0, sizeof(struct cmsghdr));
  inmsg.msg_control = &control_message;
  inmsg.msg_controllen = sizeof(struct cmsghdr);

  // tell the msghdr where the receive buffers are...
  inmsg.msg_iov = receive_buffers;
  // ...and how many of them are there
  inmsg.msg_iovlen = RECEIVE_BUFFER_NENTRIES ;

  // prepare a string large enough to hold an ipv4 address
  char *remote_addr = (char *)malloc(INET_ADDRSTRLEN);
  // and make sure it's clean
  memset(remote_addr, 0, INET_ADDRSTRLEN);
  
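  while( 1 ) {
    // msg_namelen and msg_controllen are value-result fields: the kernel
    // overwrites them on every call, so reset them before receiving
    inmsg.msg_namelen = sizeof(struct sockaddr_storage);
    inmsg.msg_controllen = sizeof(struct cmsghdr);

    // we can finally receive packets!
    // (MSG_WAITFORONE only makes sense for recvmmsg, so no flags here)
    ssize_t recv_size = recvmsg(sockfd, &inmsg, 0);
    if (recv_size == -1 ) {
      perror("Something went wrong with recvmsg");
      continue ;
    }
    
    inet_ntop(AF_INET,
	      // this was messy to get right
	      (void*) &((struct sockaddr_in*) inmsg.msg_name)->sin_addr,
	      remote_addr,
	      INET_ADDRSTRLEN);
    printf("Received 1 packet from '%s' containing %zd bytes\n", remote_addr, recv_size);

    // eh, we should not assume all the content is in the first receive buffer...
    // ... but whatever
    // the payload is not NUL-terminated, so print exactly recv_size bytes
    printf("Message in the udp packet: '%.*s'\n", (int)recv_size, (char *)inmsg.msg_iov[0].iov_base);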

    // let's make sure this string doesn't keep data from previous operations
    memset(remote_addr, 0, INET_ADDRSTRLEN);
    
    // i should probably clean other structures, but whatever...

    // i should have some kind of exit condition here...
    //sleep(1);
  }

  shutdown(sockfd, SHUT_RDWR);
  close(sockfd);

  exit(0);
}

Testing the udp server

I'll be using the good old netcat, as I haven't written the udp client yet, and also netcat works for sure.

Compiling and running the server:

[manu@astrolabio network-programming]$ ./a.out 
Setting up receive buffer sized at 65535 bytes
Received 1 packet from '127.0.0.1' containing 12 bytes
Message in the udp packet: 'hello world!'
Received 1 packet from '172.12.3.248' containing 12 bytes
Message in the udp packet: 'hello world'

Sending udp packets with netcat:

[manu@astrolabio ~]$ echo -n 'hello world!' | nc -u 127.0.0.1 5050

UDP client

The client part has a fairly simple job: assemble a UDP packet and send it.

The target IP address and the target port will be taken from command line arguments, and the payload will be read from stdin (a single line, since the code uses fgets).

Needless to say, I'm capping the possible length at 65535 bytes (the upper bound implied by UDP's 16-bit length field; over IPv4 the usable payload is at most 65507 bytes) and I'm not bothering too much with error and bounds checking (that's not really the point of this exercise, so it's low effort).

#include <stdio.h>

#include <string.h>
#include <stdlib.h>
#include <unistd.h>

#include <sys/socket.h>
#include <arpa/inet.h>
#include <sys/uio.h>

#define SEND_BUFFER_SIZE  ((2<<15)-1)

int main(int argc, char **argv){

  if (argc < 3) {
    printf("usage: ./client host port\n");
    printf("payload will be read from stdin\n");
    exit(1);
  }

  char* payload = (char*) malloc(SEND_BUFFER_SIZE);
  if (fgets(payload, SEND_BUFFER_SIZE, stdin) == NULL) {
    perror("Failed to read payload from stdin");
    exit(1);
  }
  int payload_len = (int)strlen(payload);
  if (payload_len >= SEND_BUFFER_SIZE) {
    fprintf(stderr, "error: input exceeds send buffer size\n");
    exit(1);
  }

  printf("Target server: '%s'\n", argv[1]);
  printf("Target port: '%s'\n", argv[2]);
  printf("Payload: '%s'\n", payload);
  printf("Payload length: %d\n", payload_len);

  struct sockaddr_in target = {
    .sin_family = AF_INET,
    .sin_port = htons(atoi(argv[2]))
  };
  
  // translating the target address (argv[1]) to network-byte-order
  if (inet_pton(AF_INET, argv[1], &(target.sin_addr)) != 1){
    printf("Failed to fill target structure\n");
    exit(1);
  }
  
  // instantiating an udp datagram socket on ipv4
  int sockfd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
  if (sockfd == -1) {
    perror("Failed to create socket");
    exit(1);
  }

  struct iovec send_buffer = {
    .iov_base = payload,
    .iov_len = payload_len
  };
  
  // prepare the msghdr structure; fields not listed here are zero-initialized
  struct msghdr outmsg = {
    .msg_name = &target,
    .msg_namelen = sizeof(struct sockaddr_in),
    // .msg_control = &(struct cmsghdr){},
    // .msg_controllen = sizeof(struct cmsghdr);
    .msg_iov = &send_buffer,
    .msg_iovlen = 1,
  };

  if ((sendmsg(sockfd, &outmsg, 0)) == -1) {
    perror("sendmsg failed");
    exit(1);
  }
 
  close(sockfd);
  exit(0);
}

Interestingly, setting a cmsghdr structure in the msghdr structure for sending results in sendmsg failing with "Invalid argument" as the reason (not a very helpful reason, by the way). My best guess is that an empty cmsghdr has cmsg_len set to 0, which is smaller than the header itself, so the kernel rejects the control message as malformed.

One of the most interesting things about the UDP client code, for me, is that comparing it to the previous code it seems I got a bit better at this endeavour.
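For comparison, this is a sketch of what a well-formed control message for sending might look like. It's built on assumptions: the helper name attach_ttl is made up, and it relies on Linux accepting IP_TTL as ancillary data in sendmsg (which, as far as I know, reasonably recent kernels do). The key difference from an empty cmsghdr is that cmsg_len, cmsg_level and cmsg_type are all filled in, and the buffer has room for the data.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not from the article): stores a single IP_TTL control
 * message in `buf` and points `msg` at it. `buf` should be suitably aligned
 * for a struct cmsghdr. Returns 0 on success, -1 if `buf` is too small. */
int attach_ttl(struct msghdr *msg, void *buf, size_t buf_len, int ttl)
{
    /* CMSG_SPACE = header + payload + padding for one message */
    if (buf_len < CMSG_SPACE(sizeof(ttl)))
        return -1;

    memset(buf, 0, buf_len);
    msg->msg_control = buf;
    msg->msg_controllen = CMSG_SPACE(sizeof(ttl));

    struct cmsghdr *c = CMSG_FIRSTHDR(msg);
    c->cmsg_level = IPPROTO_IP;            /* the layer this data is meant for */
    c->cmsg_type = IP_TTL;                 /* what kind of data it is */
    c->cmsg_len = CMSG_LEN(sizeof(ttl));   /* header + payload, never 0 */
    memcpy(CMSG_DATA(c), &ttl, sizeof(ttl));
    return 0;
}
```

After this call, the msghdr can be passed to sendmsg as usual, with msg_name and msg_iov set up as in the client.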

Testing the UDP client

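Fairly simple, as usual. Note that the run captured below ends with sendmsg failing with "Invalid argument", the same error described above for the case where a cmsghdr is set: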

[manu@astrolabio network-programming]$ echo -n "hello world!"  | ./client 127.0.0.1 9095
Target server: '127.0.0.1'
Target port: '9095'
Payload: 'hello world!'
Payload length: 12
sendmsg failed: Invalid argument
[manu@astrolabio network-programming]$

Closing remarks

I got carried away and I've written a very small sample udp server using io_uring :)

It will appear in episode three of this series, I guess.

References