DPDK 02

Introduction

In this section we will write a simple C application to receive packets on one port and forward them out of another.

Let’s dive in!

PreReqs

Make sure that

  • DPDK is built

  • A DPDK-compatible NIC is bound to the igb_uio driver

  • Hugepages are set up

  • Minimum system requirements are met

Please see DPDK-01 if any of these prerequisites are not met.

File Setup

From here on, we will refer to the DPDK installation directory as RTE_SDK.

  • Go into your DPDK installation directory and run the following command
export RTE_SDK=$(pwd)
  • The directory should contain the following files and folders
app/
build/
buildtools/
.ci/
config/
devtools/
doc/
drivers/
.editorconfig
examples/
.gitattributes
.gitignore
kernel/
lib/
license/
MAINTAINERS
Makefile
meson.build
meson_options.txt
README
.travis.yml
usertools/
VERSION
  • For our purposes, we are concerned with the examples directory and the usertools directory.

  • The examples directory contains the example applications, and we will also create our application here.

  • The usertools directory contains helper scripts such as

    • cpu_layout.py: prints the CPU core and socket layout,
    • dpdk-devbind.py: binds and unbinds PCI devices,
    • dpdk-hugepages.py: sets up hugepages.
  • We used some of these during setup.

  • Now, go to the examples folder and create a folder for our app

cd $RTE_SDK/examples
mkdir my_app
cd my_app

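Makefile

  • Let's create a simple Makefile. Note that make requires every recipe line (the indented commands below) to start with a tab character, not spaces.
vim Makefile
  • We are going to name our app my_app and use main.c as the C source file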
# binary name
APP = my_app

# all source are stored in SRCS-y
SRCS-y := main.c

# Build using pkg-config variables if possible
ifneq ($(shell pkg-config --exists libdpdk && echo 0),0)
$(error "no installation of DPDK found")
endif

all: shared
.PHONY: shared static
shared: build/$(APP)-shared
        ln -sf $(APP)-shared build/$(APP)
static: build/$(APP)-static
        ln -sf $(APP)-static build/$(APP)

PKGCONF ?= pkg-config

PC_FILE := $(shell $(PKGCONF) --path libdpdk 2>/dev/null)
CFLAGS += -O3 $(shell $(PKGCONF) --cflags libdpdk)
LDFLAGS_SHARED = $(shell $(PKGCONF) --libs libdpdk)
LDFLAGS_STATIC = $(shell $(PKGCONF) --static --libs libdpdk)

CFLAGS += -DALLOW_EXPERIMENTAL_API

build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build
        $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)

build/$(APP)-static: $(SRCS-y) Makefile $(PC_FILE) | build
        $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC)

build:
        @mkdir -p $@

.PHONY: clean
clean:
        rm -f build/$(APP) build/$(APP)-static build/$(APP)-shared
        test -d build && rmdir -p build || true

main.c

  • Now let's move on to the main program
#include <stdio.h>
#include <stdlib.h>

#include <rte_eal.h>
#include <rte_common.h>


int main(int argc, char* argv[]){
  printf("\n");

  

  return 0;
}

EAL: “Environment Abstraction Layer”

  • The first thing we need to do is initialise the EAL, using the rte_eal_init() function.
    • It takes its parameters from the command line and sets up, among other things:
      • cpu_init: fill the cpu_info structure
      • log_init
      • config_init: create the memory configuration in shared memory
      • pci_init: scan the PCI bus
      • memory_init (hugepages)
      • memzone_init: initialise the memzone subsystem
      • lcore_init: create a thread per lcore
      • pci_probe: probe all physical devices
  • Let's start by including rte_eal.h
    • It contains the EAL configuration functions.
    • It contains the declaration of rte_eal_init().
      • rte_eal_init() returns a negative value on error; on success it returns the number of arguments it parsed.
    • See more here.
  • We will also include rte_common.h
    • It contains generic, commonly used macro and inline function definitions for DPDK.
    • It contains the declaration of rte_exit()
      • We need it to exit in case of an error.
    • See more here
#include <stdio.h>
#include <stdlib.h>

#include <rte_eal.h>
#include <rte_common.h>


int main(int argc, char* argv[]){
        
  printf("\n");
  int ret;

  /* The EAL arguments are passed when calling the program */   
  ret = rte_eal_init(argc,argv);
  if (ret<0)
    rte_exit(EXIT_FAILURE,"EAL Init failed\n");

  argc -= ret;
  argv += ret;
  return 0;
}
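
To see why the argc -= ret; argv += ret; adjustment leaves only the application's own arguments, here is a small DPDK-free sketch. The helper eal_args_consumed() is hypothetical: it only imitates the value rte_eal_init() hands back by treating everything up to the "--" separator as EAL arguments, and is not how the EAL parses its options internally.

```c
#include <string.h>

/* Hypothetical stand-in for the count returned by rte_eal_init():
 * everything before the "--" separator is treated as an EAL argument.
 * Without a separator, all arguments after the program name are
 * assumed to belong to the EAL. */
static int eal_args_consumed(int argc, char *argv[])
{
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0)
            return i;
    }
    return argc > 0 ? argc - 1 : 0;
}
```

For a command line such as my_app -l 0-3 -n 3 -- -p 0x3, the helper returns 5, so after the adjustment the application sees three arguments, with -p and 0x3 at positions 1 and 2.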

Get Port Count

  • We are going to receive on one port and transmit on the other, so we need an even number of ports.
  • Let's first include rte_ethdev.h in our program.
    • It provides functions to set up and configure an Ethernet device.
    • Ethernet devices are represented by a generic data structure of type rte_eth_dev.
    • It contains the declaration of rte_eth_dev_count_avail()
      • It returns the number of Ethernet ports available to the application, i.e. the devices bound to a DPDK-compatible driver.
    • See more here
  • Let's do that next within our main function
  nb_ports = rte_eth_dev_count_avail();
  if(nb_ports < 2 || (nb_ports & 1))
    rte_exit(EXIT_FAILURE,"Invalid port number\n");
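
The even-port requirement and the idea of pairing ports can be sketched without any DPDK calls. The two helpers below are hypothetical names introduced for illustration: one repeats the check used above, the other shows how flipping the lowest bit of a port number gives its partner (0 pairs with 1, 2 pairs with 3, and so on).

```c
#include <stdbool.h>
#include <stdint.h>

/* Same condition as the check above: at least two ports and an even count. */
static bool port_count_is_valid(uint16_t nb_ports)
{
    return nb_ports >= 2 && (nb_ports & 1) == 0;
}

/* Partner of a port when ports are paired as (0,1), (2,3), ... */
static uint16_t peer_port(uint16_t port)
{
    return (uint16_t)(port ^ 1);
}
```

With an odd port count the last port would have no partner, which is why the application refuses to start in that case.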

Logs

  • We are going to include rte_log.h for logging.
    • It provides the log API for RTE applications.
    • It contains the RTE_LOG() macro we are going to use for logging.
  • Define the user log type as follows:
#define RTE_LOGTYPE_APP RTE_LOGTYPE_USER1
  • Now update the code as follows
/*...*/
nb_ports = rte_eth_dev_count_avail();
if(nb_ports < 2 || (nb_ports & 1))
  rte_exit(EXIT_FAILURE,"Invalid port number\n");
RTE_LOG(INFO, APP, "Number of ports:%u\n",nb_ports);
/*...*/

MBUF Pool Creation

We need to reserve some memory to hold the packets in our application. In this section we are going to do just that.

  • We are going to include rte_mbuf.h for memory management.
    • The mbuf library provides the ability to create and destroy buffers that may be used by the RTE application to store message buffers.
    • We are going to use rte_pktmbuf_pool_create() to set up a pool of buffers to hold packets.
    • It uses the mempool library.
    • Read the DPDK docs regarding mbuf here
  • A good read about general concepts here
  • First we are going to define NB_MBUFS and MBUF_CACHE_SIZE, along with the RX and TX ring sizes that we will need later.
#include <rte_mbuf.h>
#define NB_MBUFS 8191
#define MBUF_CACHE_SIZE 250
#define RX_RING_SIZE 128
#define TX_RING_SIZE 512
  • Now we are going to call rte_pktmbuf_pool_create() in the main function.
    • This function creates and initialises a packet mbuf pool.
    • This function uses rte_memzone_reserve() to allocate memory.
      • It reserves a portion of physical memory from hugepages
  • The required parameters are as follows
|---------------|-------------------------------------------------------|
| param         |                     Description                       |
|---------------|-------------------------------------------------------|
| name          | The name of the mbuf pool. We are setting it to       |
|               | "MBUF_POOL"                                           |
| n             | The number of elements in the mbuf pool. We set it to |
|               | NB_MBUFS * number of ports. The optimum size (in      |
|               | terms of memory usage) for a mempool is when n is a   |
|               | power of two minus one: n = (2^q - 1).                |
| cache_size    | Size of the per-core object cache.                    |
|               | Set to MBUF_CACHE_SIZE                                |
| priv_size     | Size of application private area between the rte_mbuf |
|               | structure and the data buffer. Set to 0.              |
| data_room_size| Size of data buffer in each mbuf,                     |
|               | including RTE_PKTMBUF_HEADROOM.                       |
| socket_id     | The socket identifier where the memory should be      |
|               | allocated. The value can be SOCKET_ID_ANY             |
|               | if there is no NUMA constraint for the reserved zone. |
|---------------|-------------------------------------------------------|
  • If the creation is unsuccessful we are going to exit the program.
/* Create a new mbuf mempool */
struct rte_mempool *mbuf_pool;

mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL",
    NB_MBUFS * nb_ports,
    MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE,
    rte_socket_id());

if (mbuf_pool == NULL)
  rte_exit(EXIT_FAILURE,"mbuf_pool create failed\n");
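
The "power of two minus one" sizing hint is easy to check with a little bit arithmetic. The sketch below is plain C with no DPDK dependency, and both helper names are hypothetical. It shows that 8191 fits the hint (2^13 - 1), while 8191 multiplied by two ports gives 16382, which does not; the nearest value that does is 16383.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when n has the form 2^q - 1, i.e. all of its low bits are set. */
static bool is_pow2_minus_one(uint32_t n)
{
    return n != 0 && (n & (n + 1)) == 0;
}

/* Smallest value of the form 2^q - 1 that is greater than or equal to n. */
static uint32_t next_pow2_minus_one(uint32_t n)
{
    uint32_t v = 1;

    while (v < n)
        v = (v << 1) | 1;   /* 1, 3, 7, 15, ... */
    return v;
}
```

Whether the few extra elements matter is a judgment call; the hint is about memory efficiency, not correctness, so a pool of NB_MBUFS * nb_ports elements still works.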

Ports Initialisation

  • We will create a function port_init for initialising the ports.
int port_init(uint16_t port,struct rte_mempool *mbuf_pool){

  return 0;
}
  • We will loop through all the ports and initialise them one by one.
int main(int argc, char* argv[]){
  /*...*/
  /* Initialize all ports */
  for (portid = 0; portid < nb_ports; portid++){
    if(port_init(portid,mbuf_pool) != 0)
      rte_exit(EXIT_FAILURE,"port init failed\n");
  }
  /*...*/
}
  • Now let's update the port_init function

  • First we set up the rte_eth_conf

    • A structure used to configure an Ethernet port
    • We can set up receive mode and transmit mode flags
    • If you want to enable RSS (Receive Side Scaling), this would be the place to start.
  struct rte_eth_conf port_conf = {
    .rxmode = { .max_rx_pkt_len = RTE_ETHER_MAX_LEN }
  };
  • Now we define the number of RX/TX queues we are going to use per port, i.e. one of each
  const uint16_t nb_rx_queues = 1;
  const uint16_t nb_tx_queues = 1;
  • Next we use the rte_eth_dev_configure() function to set up the port.
    • configures the Ethernet device.
    • this function must be invoked first before any other function in the Ethernet API.
  /* configure the ethernet device */
  ret = rte_eth_dev_configure(port,
      nb_rx_queues,
      nb_tx_queues,
      &port_conf);
  • Now we allocate one RX queue per port.
  • We will use the rte_eth_rx_queue_setup() function
    • It allocates a contiguous block of memory for the receive descriptors
    • Passing NULL as the queue configuration selects the driver's default settings
    • Read more here
  for (q=0;q<nb_rx_queues;q++){
    ret=rte_eth_rx_queue_setup(port,q,RX_RING_SIZE,
        rte_eth_dev_socket_id(port),
        NULL, mbuf_pool);
    if (ret<0)
      return ret;
  }
  • Now we allocate one TX queue per port.
  • We will use the rte_eth_tx_queue_setup() function
    • It allocates and sets up a transmit queue for an Ethernet device.
    • Read more here
  for (q=0;q<nb_tx_queues;q++){
    ret=rte_eth_tx_queue_setup(port,q,TX_RING_SIZE,
      rte_eth_dev_socket_id(port),
      NULL);
    if (ret<0)
      return ret;
  }
  • All together now:
int port_init(uint16_t port,struct rte_mempool *mbuf_pool){

  struct rte_eth_conf port_conf = {
        .rxmode = { .max_rx_pkt_len = RTE_ETHER_MAX_LEN }
        };

  const uint16_t nb_rx_queues = 1;
  const uint16_t nb_tx_queues = 1;
  uint16_t q;
  int ret;

  /* configure the ethernet device */
  ret = rte_eth_dev_configure(port,
                nb_rx_queues,
                nb_tx_queues,
                &port_conf);
  if (ret != 0)
    return ret;

  /* Allocate and setup 1 RX queue per Ethernet port */
  for (q=0;q<nb_rx_queues;q++){
    ret=rte_eth_rx_queue_setup(port,q,RX_RING_SIZE,
                        rte_eth_dev_socket_id(port),
                        NULL, mbuf_pool);
    if (ret<0)
      return ret;
  }

  /* Allocate and setup 1 TX queue per Ethernet port */
  for (q=0;q<nb_tx_queues;q++){
    ret=rte_eth_tx_queue_setup(port,q,TX_RING_SIZE,
      rte_eth_dev_socket_id(port),
      NULL);

    if (ret<0)
      return ret;
  }

  return 0;

}
  • Finally, we start the device using rte_eth_dev_start()
  • We will also enable promiscuous mode using rte_eth_promiscuous_enable()
    • In promiscuous mode the port accepts every packet it sees on the wire, not only those addressed to its own MAC address
  • Update the port_init function as follows, just before the final return 0
/* start the ethernet port */
ret = rte_eth_dev_start(port);

if (ret<0){
  return ret;
}

/* Enable RX in promiscuous mode for the Ethernet device */
rte_eth_promiscuous_enable(port);

  • We are going to create a new function to check the link status of the ports
  • It is going to tell us which of the links are up and which are down.
static int check_link_status(uint16_t nb_ports)
{
  struct rte_eth_link link;
  uint16_t port;
  int ret = 0;

  for (port=0;port<nb_ports;port++){
    rte_eth_link_get(port,&link);
    if(link.link_status == ETH_LINK_DOWN){
      RTE_LOG(INFO,APP,"Port: %u Link DOWN\n",port);
      /* remember the failure but keep checking the remaining ports */
      ret = -1;
      continue;
    }
    RTE_LOG(INFO,APP,"Port: %u Link UP Speed %u\n",
        port,link.link_speed);
  }
  /* 0 if every link is up, -1 if at least one link is down */
  return ret;
}

  • Now we can call our function in main()
ret = check_link_status(nb_ports);
if (ret < 0){
  RTE_LOG(WARNING,APP,"Some ports are down\n");
}

Workers Init

  • We will create a worker function that will run on all the cores
  • This worker will be responsible for
    • receiving packets
    • parsing packets
    • transmitting packets
int worker_main(void *arg){
  /* Run until app is killed or quit */
  for(;;){

  }
  return 0;
}
  • In the main program we will launch the worker_main() function on all the available lcores
  • We will use rte_eal_mp_remote_launch() for this
    • It launches a function on all lcores
    • The first argument is the function (worker_main) that needs to be launched on the cores
    • The second argument is the argument passed to that function. Since we have nothing to pass, we set it to NULL.
    • The third argument specifies whether the function also runs on the main core
      • CALL_MAIN: run on all cores, including the MASTER/MAIN core
      • SKIP_MAIN: run on all cores except the MASTER/MAIN core
  • Then we use rte_eal_mp_wait_lcore() so that the MASTER core waits for the worker cores to finish.
    • This keeps the program from exiting.
    • We can print stats on the master core if needed.
  ret=check_link_status(nb_ports);
  if (ret<0){
    RTE_LOG(WARNING,APP,"Some ports are down\n");
  }
  rte_eal_mp_remote_launch(worker_main,NULL,SKIP_MAIN);
  rte_eal_mp_wait_lcore();

Receive/Transmit Packets

  • Let's define BURST_SIZE first. This defines how many packets a core picks up at a time.
  #define BURST_SIZE 32
  • In worker_main we will set up an array of struct rte_mbuf pointers of size BURST_SIZE
    • It is declared as struct rte_mbuf *bufs[BURST_SIZE];
  • We will receive packets using the rte_eth_rx_burst() function
    • It requires
      • the port number on which to receive
      • the queue number on which to receive
      • the rte_mbuf pointer array to hold the packets
      • the maximum number of packets to receive
    • It returns the number of packets actually received (nb_rx)
    • We will transmit the received packets on the opposite port.
      • Receive on 1, send on 0
      • Receive on 0, send on 1
    • For transmitting we use the rte_eth_tx_burst() function, which requires
      • the port number on which to transmit
      • the queue number on which to transmit
      • the rte_mbuf pointer array that holds the packets
      • the number of packets to transmit
    • It returns the number of packets actually queued for transmission (nb_tx), which can be less than nb_rx
    • After that we loop over the remaining packets and free the ones that were not sent
      • Packets are freed using the rte_pktmbuf_free() function
int worker_main(void *arg){
  const uint16_t nb_ports = rte_eth_dev_count_avail();
  uint16_t port;
  uint16_t dest_port;

  /* NOTE: RX/TX queues are not thread-safe. Every port has a single
   * queue here, so only one worker lcore should run this loop. */

  /* Run until app is killed or quit */
  for(;;){
    /* Receive packets on port */
    for(port=0;port< nb_ports;port++){
      struct rte_mbuf *bufs[BURST_SIZE];
      uint16_t nb_rx;
      uint16_t nb_tx;
      uint16_t buf;

      /* Get burst of RX packets */
      nb_rx = rte_eth_rx_burst(port,0,
          bufs,BURST_SIZE);
      if (unlikely(nb_rx==0))
        continue;

      /* send burst of TX packets to the
       * second port
       */
      dest_port = port ^ 1;
      nb_tx= rte_eth_tx_burst(dest_port, 0,
          bufs, nb_rx);

      /* Free any unsent packets. The first nb_tx packets now
       * belong to the driver, which frees them after sending. */
      for (buf=nb_tx;buf<nb_rx;buf++)
        rte_pktmbuf_free(bufs[buf]);

    }
  }
  return 0;
}
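
The ownership rule behind "free any unsent packets" is worth spelling out: rte_eth_tx_burst() returns how many packets it accepted, the driver takes care of those, and only the packets from index nb_tx up to nb_rx remain the application's responsibility. The sketch below models that index arithmetic in plain C. The helper free_unsent() and its flag array are hypothetical stand-ins, with a flag write taking the place of the rte_pktmbuf_free() call.

```c
#include <stdint.h>

/* Marks the buffers the application must free after a TX burst and
 * returns how many there were. bufs_freed[i] = 1 stands in for
 * rte_pktmbuf_free(bufs[i]); indexes below nb_tx are left untouched
 * because those packets were handed over to the driver. */
static uint16_t free_unsent(int *bufs_freed, uint16_t nb_tx, uint16_t nb_rx)
{
    uint16_t freed = 0;

    for (uint16_t buf = nb_tx; buf < nb_rx; buf++) {
        bufs_freed[buf] = 1;
        freed++;
    }
    return freed;
}
```

Starting the loop at 0 instead of nb_tx would free packets the driver still owns, which is a double free once the driver releases them itself.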

Some Stats

  • Let's print some stats when we exit the program.
  • We'll need to add a signal handler to catch the interrupt first.
    • First let's create a flag that we will change when the signal is caught (bool needs stdbool.h, and the signal functions need signal.h)

  static volatile bool force_quit;

  • Now the handler, which updates the flag and displays the stats
/* print_stats() is defined further down, so declare it first */
static void print_stats(void);

static void signal_handler(int signum){
  if(signum== SIGINT || signum== SIGTERM){
    printf("\n\nSignal %d received, preparing to exit...\n",signum);
    force_quit = true;
    print_stats();
  }
}
  • Let's register the handler
    • Add the following in the main function
   force_quit = false;
   signal(SIGINT,signal_handler);
   signal(SIGTERM,signal_handler);
  • Update the loop condition in worker_main
    • Use while(!force_quit) instead of for(;;)
  • Lastly, let's create a function to display the stats
    • It uses the rte_eth_stats struct and the rte_eth_stats_get() function
    • The function takes the port_id for which the stats are required and a pointer to an rte_eth_stats struct
    • It fills the rte_eth_stats struct, which contains, among other fields:
      • ipackets: packets successfully received by the interface
      • opackets: packets successfully sent by the interface
      • imissed: received packets dropped by the hardware, e.g. because the RX queue was full
/* The counters are 64-bit; PRIu64 comes from <inttypes.h> */
static void
print_stats(void){
  struct rte_eth_stats stats;
  uint16_t nb_ports = rte_eth_dev_count_avail();
  uint16_t port;

  for(port=0;port<nb_ports;port++){
    printf("\nStatistics for the port %u\n",port);
    rte_eth_stats_get(port,&stats);
    printf("RX:%9" PRIu64 " Tx:%9" PRIu64 " dropped:%9" PRIu64 "\n",
        stats.ipackets,stats.opackets,stats.imissed);
  }
}
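
One design point to be aware of: printf() is not guaranteed to be safe inside a signal handler. A common alternative, sketched below under that assumption, is to let the handler do nothing but set a flag and to print the statistics from main() once the workers have returned. The names force_quit_flag, quit_handler and install_quit_handlers are hypothetical, and volatile sig_atomic_t is used because it is the flag type the C standard guarantees for this purpose.

```c
#include <signal.h>

/* Flag polled by the worker loop; set from the signal handler. */
static volatile sig_atomic_t force_quit_flag;

/* The handler only records that a quit was requested. */
static void quit_handler(int signum)
{
    if (signum == SIGINT || signum == SIGTERM)
        force_quit_flag = 1;
}

/* Resets the flag and registers the handler for both signals. */
static void install_quit_handlers(void)
{
    force_quit_flag = 0;
    signal(SIGINT, quit_handler);
    signal(SIGTERM, quit_handler);
}
```

With this layout the worker loop becomes while (!force_quit_flag), and the statistics function is called in main() right after the call that waits for the worker lcores.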

Now Let's Run This

  • First we build our program
  make
  • This should have created a build directory within our directory
  • We will pass arguments to our program just like L2FWD.
    • -l indicates the cores to run on
    • -n indicates the number of memory channels
    • -p indicates the portmask; the code shown above does not parse it yet, so for now it is accepted but ignored
      • 0x1 enables port 0, i.e. binary mask 0001
      • 0x3 enables ports 0 and 1, i.e. binary mask 0011
      • 0x7 enables ports 0, 1 and 2, i.e. binary mask 0111
  ./build/my_app -l 0-3 -n 3 -- -p 0x3
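
For readers who want to act on the -p option, here is a minimal sketch of how a hexadecimal portmask could be parsed and queried. It is plain C with hypothetical helper names and is not taken from L2FWD; it simply turns the string into a bitmask in which bit N stands for port N.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parses a hexadecimal portmask such as "0x3" or "3".
 * Returns 0 for malformed or out-of-range input. */
static uint32_t parse_portmask(const char *arg)
{
    char *end = NULL;
    unsigned long mask;

    errno = 0;
    mask = strtoul(arg, &end, 16);
    if (errno != 0 || end == arg || *end != '\0' || mask > UINT32_MAX)
        return 0;
    return (uint32_t)mask;
}

/* Number of ports enabled in the mask (number of set bits). */
static unsigned int count_enabled_ports(uint32_t mask)
{
    unsigned int count = 0;

    for (; mask != 0; mask &= mask - 1)   /* clears the lowest set bit */
        count++;
    return count;
}

/* Non-zero when bit `port` is set in the mask. */
static int port_is_enabled(uint32_t mask, uint16_t port)
{
    return port < 32 && ((mask >> port) & 1u) != 0;
}
```

In the application, the port loop could then skip every port for which port_is_enabled() returns 0.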
  • You should see output similar to the following
EAL: Detected 18 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL:   Invalid NUMA socket, default to 0
EAL:   Invalid NUMA socket, default to 0
EAL:   Invalid NUMA socket, default to 0
EAL:   Invalid NUMA socket, default to 0
EAL: No legacy callbacks, legacy socket not created
APP: Number of ports:2
APP: MAC address swapping enabled
APP: Port: 0 Link UP Speed 10000
APP: Port: 1 Link UP Speed 10000
APP: Some ports are down
APP: lcore 2 exiting
APP: lcore 3 exiting
  • Press Ctrl-C to exit the program and print some stats
Signal 2 received,preparing to exit...

Statistics for the port 0
RX: 0           Tx: 13664640    dropped: 0

Statistics for the port 1
RX: 13664640    Tx: 0           dropped: 0
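
The counters in rte_eth_stats are 64-bit unsigned integers, so the portable way to print them is the PRIu64 macro from inttypes.h. The sketch below formats one line of statistics into a buffer; format_port_stats() is a hypothetical helper and the column width of 9 is just a layout choice, so the spacing differs slightly from the sample output above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Writes "RX:<rx> TX:<tx> dropped:<dropped>" into out, each counter
 * right-aligned in a 9-character column. Returns what snprintf()
 * returns: the length of the full line, excluding the terminator. */
static int format_port_stats(char *out, size_t len,
                             uint64_t rx, uint64_t tx, uint64_t dropped)
{
    return snprintf(out, len,
                    "RX:%9" PRIu64 " TX:%9" PRIu64 " dropped:%9" PRIu64,
                    rx, tx, dropped);
}
```

Using a plain %u or %lu here would only be correct on platforms where that type happens to be 64 bits wide, which is why the macro is preferred.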

WORD

Meow

  • Give yourself a pat on the back, you are done. (FOR NOW!)
  • We will parse a few layers in the next one.
  • I followed ferruhy/dpdk-simple-app and it was a great help to me.
    • Please do check his repo out.
    • In fact, this whole thing was inspired by this simple learning repo of his.
  • You can look at the code here
  • Check out the DPDK documentation as well.