Blog/Random notes about network programming - now with io uring

From ~esantoro

Ok, this was the real goal all along: getting to make network traffic with io_uring.

The goals for this page are the following:

  1. Have links and references I can go back to
  2. Have a simple, commented, udp server i can come back to
  3. Write notes and considerations

io_uring is one of the coolest and newest technologies in the Linux kernel. Its promises are essentially to provide a unified interface for asynchronous I/O, and to be simpler and deliver better performance than the other async APIs (mainly AIO, but also epoll etc).

I haven't used AIO/epoll/select etc so I don't have strong feelings in this sense.

Prerequisites

This article might be for you if you're trying to get some basic io_uring code to work and can't get it to fuc^H^H^H work. That was me until this afternoon, welcome to the club (i just left, sorry not sorry).

In terms of technical requirements, any recent GNU/Linux system should work. io_uring was introduced in Linux 5.1 IIRC, so it's been around for a while.

I just checked the release notes for the current Debian stable and it seems that oldstable ("bullseye") had Linux 5.10 and the current stable ("bookworm") has Linux 6.1. If Debian oldstable supports it... most likely the distro you're running on your laptop will support it too.

Regarding your server, RHEL9 (and derivatives) ships with Linux 5.14, so a number of things will work.

Regarding cloud images, Amazon Linux 2 (afaik) defaults to kernel 4.14, which predates io_uring, but also offers 5.10 as an optional kernel; on 5.10 most things except maybe the newest will work. Amazon Linux 2023 runs a kernel from the 6.x era, so again most things should work.

I don't have experience with other cloud providers' custom distros, so i can't really comment on those.

Generic stuff about io_uring

I'm not going to copy the description of io_uring here. Most other websites do that and it's quite boring, as you think you've found a nice article on io_uring but instead you end up reading the same things over and over again.

There are a few pictures I might copy over, just because it might be handy (for me) to have them in this article, ready to be referenced.

The important thing I need to say for this article is that io_uring is mainly two components:

  • the kernel side stuff (the implementation in the Linux kernel)
  • the liburing library

The liburing library allows usage of io_uring while ignoring many details and repetitive work. There will still be repetitive work but hey, that's life.

Interestingly, liburing supports being linked into C and C++ programs out of the box.

Links and references

A small rant

io_uring is one of those things that is both greatly and poorly documented, at the same time.

It is greatly documented in the sense that each new syscall and every library function from liburing has a manpage.

It is poorly documented in the sense that:

  • the manpages and stuff work very well if you already grokked it, not so much if you need to understand it (beyond intuitive understanding, i mean)
  • the examples in the examples/ folder of liburing aren't very helpful
    • basically no comments
    • no explanation
      • what is the code doing? help me understand how the code connects to the concepts you talk about in the various talks and pdf documents
    • no reasoning about the code
      • why is the code in this example written like this?
    • they do too much stuff
    • don't really exemplify simple cases
    • too much ceremony, too much setup, too little actually io_uring stuff
  • references to liburing and references to the syscalls are intertwined
    • sometimes, in order to work with liburing, you need to look up details from io_uring (the core)

Basically (as with many other things, sadly) you'll have to learn this from resources other than the original authors.

Actually links and references

Okay enough ranting, let's get to the useful stuff:

More on references later

A very basic udp server

This is a very basic udp server written in C using liburing.

It doesn't do anything advanced, it's not optimised for performance (it gets a single packet at a time) or even for strict correctness (I didn't bother doing proper command line argument parsing).

But it does work.

Important parts:

  • Lines 58-63:
    • io_uring initialisation: nothing fancy, we don't use flags.
    • 1024 as queue depth is largely oversized and basically chosen not to annoy me
    • very basic error checking is performed (no error recovery)
  • Lines 82-97 are where the actual stuff happens
    • get a submission queue entry (sqe)
    • prepare a recvmsg request (io_uring_prep_recvmsg)
      • we always reuse the same msghdr structure
      • since we're in a while (1) loop it would be better to use a multishot version, maybe in another revision
    • submit the submission queue entry (sqe)
    • wait for the completion queue event (cqe)
    • we get the payload from the iovec pointed to by the msghdr structure
      • the very same way we would have done if we were using recvmsg regularly

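Compile like this (the library goes after the source file, otherwise some linkers fail to resolve the liburing symbols): gcc udp-uring-server.c -o udp-uring-server -luring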

#include <liburing/io_uring.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

#include <arpa/inet.h>
#include <sys/socket.h>

#include <stdlib.h>
#include <unistd.h>

#include <liburing.h>

int main(int argc, char **argv){
  char *binding_address = (char*) malloc(INET_ADDRSTRLEN);
  bzero(binding_address, INET_ADDRSTRLEN);
  strncpy(binding_address, "0.0.0.0", 7);
  int binding_port = 9095 ;
  
  int opt, res;

  while ((opt = getopt(argc, argv, "hb:p:")) != -1) {
    switch (opt) {
    case 'h':
      printf("Usage: uring-server -b local-addr -p local-port\n");
      printf("Use -h to show this help\n");
      exit(0);
      break ;
      
    case 'b':
      strncpy(binding_address, optarg, INET_ADDRSTRLEN - 1); /* buffer was zeroed, so it stays NUL-terminated */
      break;

    case 'p':
      binding_port = atoi(optarg);
      break;
    }
    
  }

  struct sockaddr_in local_address = {
    .sin_family = AF_INET,
    .sin_port = htons(binding_port),
    .sin_addr = {}
  };

  inet_pton(AF_INET, binding_address, &(local_address.sin_addr));

  int sockfd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
  res = bind(sockfd,
	     (struct sockaddr*)&local_address,
	     sizeof(struct sockaddr_in));
  if (res == -1) {
    perror("bind"); exit(1);
  }

  printf("Will bind on %s:%d\n", binding_address, binding_port);

  struct io_uring ring;
  
  if ((res = io_uring_queue_init(1024, &ring, 0)) < 0){
    fprintf(stderr, "io_uring_queue_init: %s\n", strerror(-res));
    exit(1);
  }
  
  struct msghdr inmsg = {
    .msg_name = malloc(sizeof(struct sockaddr_storage)),
    .msg_namelen = sizeof(struct sockaddr_storage),
    .msg_control = malloc(sizeof(struct cmsghdr)),
    .msg_controllen = sizeof(struct cmsghdr),
    .msg_iov = &(struct iovec){
      .iov_base = malloc((2<<15)-1),
      .iov_len = (2<<15)-1
    },
    .msg_iovlen = 1
  };
  
  struct io_uring_sqe *sqe;
  struct io_uring_cqe *cqe;  
  
  while (1) {
    bzero(inmsg.msg_iov[0].iov_base, inmsg.msg_iov[0].iov_len);
    sqe = io_uring_get_sqe(&ring) ;
    io_uring_prep_recvmsg(sqe, sockfd, &inmsg, 0);
    if( (res = io_uring_submit(&ring)) < 0 ){ /* returns -errno on failure, not -1 */
      fprintf(stderr, "io_uring_submit recvmsg: '%s'\n", strerror(-res));
      exit(1);
    } else {
      printf("Submitted recvmsg\n");
    }
    
    if ((res = io_uring_wait_cqe(&ring, &cqe)) != 0) {
      fprintf(stderr, "io_uring_wait_cqe: '%s'\n", strerror(-res));
      exit(1); /* cqe is not valid here, so we cannot go on */
    }
    
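    if (cqe->res >= 0) printf("Received: '%s'\n", (char*)inmsg.msg_iov[0].iov_base) ;
    else fprintf(stderr, "recvmsg: '%s'\n", strerror(-cqe->res)); /* cqe->res is -errno on failure */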
    io_uring_cqe_seen(&ring, cqe);
  }
  
  exit(0);
}