QEMU ivshmem Introducing interrupt capable virtual gpio concept


It's hard to overestimate the role of GPIOs, especially in ARM embedded systems. Apart from being very popular material for beginners, GPIOs provide a way to control many peripheral devices, act as a source of valuable interrupts, and can even be the only way for an SoC to communicate with the outside world.

Based on my modest experience, I can say that interrupts are not a well-covered topic in the Linux community. Because of their nature and their very strong binding to hardware, educational materials that cover interrupts lack real, easily reproducible examples. This obscures the fact that interrupts and GPIOs are very often inseparable, especially in the world of embedded Linux. It also makes people believe that GPIO is a very simple and boring thing (it only became so easy thanks to sysfs).

Even the LDD3 example (the snull driver) simulates an IRQ by explicitly calling a function of the paired device. On the other hand, there are examples in the USFCA courses (http://cs.usfca.edu/~cruse/cs686s08/), but they are implemented by hijacking someone else's IRQ and are bound to Intel's architecture.

The virtual GPIO concept can solve these problems. Both from user space and, largely, internally, the driver is indistinguishable from the majority of existing implementations of interrupt-capable general-purpose I/O chips. The driver currently supports interrupts on falling and rising edges, and can be used as a source of interrupts for other virtual devices.

ivshmem - Inter-VM shared memory

ivshmem is designed to share a memory region (allocated on the host through the POSIX shared memory API) between multiple QEMU guest processes, possibly emulating different platforms. To give every guest access to the shared memory area, ivshmem models a PCI device that exposes the memory as a PCI BAR.

From the viewpoint of the virtual machine, the ivshmem PCI device comprises three base address registers (BARs):

  • BAR0 is a one-kilobyte region of MMIO registers, including the interrupt registers used when MSI is not used
  • BAR1 is used for MSI-X, if MSI support is enabled
  • BAR2 provides access to the shared memory object

This mechanism was introduced in the original Cam Macdonnell report “Nahanni - a shared memory interface for KVM” (later it became known as ivshmem), in which he put forward the following points:

  • zero-copy access to the data
  • interrupt system
  • guest/guest and host/guest interaction

and analyzed the performance in total.

At the moment ivshmem is not officially supported by anyone; nevertheless, Red Hat staff have made a great contribution to its development.


ivshmem can serve as a basis for simulation and debugging of many classes of devices.

In this article we examine a virtual PCI general-purpose input/output (GPIO) card that is also a source of interrupts, along with the corresponding driver, which provides access and control through sysfs.


  • Qemu (do not take an earlier version)
  • linux-kernel 4.1 (sources or just headers)

The versatilepb board (a virtual QEMU board, ARM system) was used for development and testing.

* arm-cross-toolchain
* nairobi-embedded - Guest-side ivshmem PCI device test sources


g>> - commands executed or output on guest.

h>> - on host.

Example and original code

To start, I will present code based on the original ivshmem guest code (https://github.com/henning-schild/ivshmem-guest-code), later modified by Siro Mugabi.

h>> qemu: +=  -device ivshmem,shm=ivshmem,size=1
g>> # insmod ne_ivshmem_ldd_basic.ko
ivshmem 0000:00:0d.0: data_mmio iomap base = 0xc8c00000
ivshmem 0000:00:0d.0: data_mmio_start = 0x60000000 data_mmio_len = 1048576
ivshmem 0000:00:0d.0: regs iomap base = 0xc88ee400, irq = 27
ivshmem 0000:00:0d.0: regs_addr_start = 0x50002400 regs_len = 256
g>> # ./ne_ivshmem_shm_guest_usr -w "TEST STRING"
h>> $ xxd -l 16 /dev/shm/ivshmem
0000000: 5445 5354 2053 5452 494e 4700 0000 0000  TEST STRING.....

In principle, this is already enough to emulate a GPIO device, and in many cases it was done just so, when reading an input state or writing an output is all that is needed. However, using sysfs and interrupts calls for a small add-on on top of the I/O memory.


Note that /dev/ivshmem0 and ne_ivshmem_shm_guest_usr.c are no longer needed, as all user-space work with the device from the guest machine will be carried out through the sysfs interface.

Before laying out our device in memory, it should be noted that we simply duplicate the scheme applied in most GPIO drivers.

All input/output pins are divided into ports, usually of 8, 16 or 32 pins each. Each port has at least a pin state register (GPIO_DATA) and, if in/out switching is supported, a direction register (GPIO_OUTPUT). Then, if the device supports it, come the interrupt status register and the registers that enable interrupts on rising edge, falling edge and level (high and low). There is usually one hardware interrupt, supplied by the general interrupt controller (GIC), which is shared between all pins of the port.
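The port scheme described above can be modelled in plain user-space C. This is only a minimal sketch: the `gpio_port` structure, the helper names and the two offsets are assumptions made for illustration, not code from any real driver.

```c
#include <stdint.h>

/* Hypothetical register offsets inside one 8-pin port (assumed for this sketch). */
enum { PORT_DATA = 0x00, PORT_DIR = 0x01, PORT_NREGS = 2 };

/* One port is modelled as a small array of byte-wide registers. */
struct gpio_port {
    uint8_t regs[PORT_NREGS];
};

/* Configure pin nr as output (is_output != 0) or input (is_output == 0). */
static void port_set_direction(struct gpio_port *p, unsigned nr, int is_output)
{
    uint8_t bit = (uint8_t)(1u << nr);

    if (is_output)
        p->regs[PORT_DIR] |= bit;
    else
        p->regs[PORT_DIR] &= (uint8_t)~bit;
}

/* Drive pin nr high or low by updating a single bit of the data register. */
static void port_write_pin(struct gpio_port *p, unsigned nr, int value)
{
    uint8_t bit = (uint8_t)(1u << nr);

    if (value)
        p->regs[PORT_DATA] |= bit;
    else
        p->regs[PORT_DATA] &= (uint8_t)~bit;
}

/* Return the current state (0 or 1) of pin nr. */
static int port_read_pin(const struct gpio_port *p, unsigned nr)
{
    return (p->regs[PORT_DATA] >> nr) & 1;
}
```

Every pin costs one bit per register, which is why ports naturally come in 8, 16 or 32 pin groups.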

Examples of existing implementations with comments

Sitara am335x

better known as the SoC used in the BeagleBone

Developer: Texas Instruments
Documentation: AM335x Sitara™ Processors Technical Reference Manual (page 4865)
Corresponding gpio driver: linux/drivers/gpio/gpio-omap.c
Corresponding header: linux/include/linux/platform_data/gpio-omap.h
Number of inputs/outputs: 128 (4 gpio ports - 32 pins each)

am335x Sitara gpio register table - port A

| Register name | Offset | Name in driver | Description |
|---|---|---|---|
| GPIO_IRQSTATUS_0 | 0x02C | OMAP4_GPIO_IRQSTATUS_0 | Provides interrupt status information |
| GPIO_IRQSTATUS_1 | 0x030 | OMAP4_GPIO_IRQSTATUS_1 | Provides interrupt status information |
| GPIO_OE | 0x134 | OMAP4_GPIO_OE | Output enable |
| GPIO_DATAIN | 0x138 | OMAP4_GPIO_DATAIN | Input/Output status |
| GPIO_DATAOUT | 0x13C | OMAP4_GPIO_DATAOUT | Set (low/high) output state |
| GPIO_LEVELDETECT0 | 0x140 | OMAP4_GPIO_LEVELDETECT0 | Enable/disable low level interrupt |
| GPIO_LEVELDETECT1 | 0x144 | OMAP4_GPIO_LEVELDETECT1 | Enable/disable high level interrupt |
| GPIO_RISINGDETECT | 0x148 | OMAP4_GPIO_RISINGDETECT | Enable/disable rising edge interrupt |
| GPIO_FALLINGDETECT | 0x14C | OMAP4_GPIO_FALLINGDETECT | Enable/disable falling edge interrupt |

Notes: GPIO_IRQSTATUS_N is also used for IRQ acknowledgement. Debounce management and wake-up are outside the scope of this article.

The presence of the GPIO_CLEARDATAOUT and GPIO_SETDATAOUT registers alongside GPIO_DATAOUT, as well as GPIO_IRQSTATUS_SET_N and GPIO_IRQSTATUS_CLR_N alongside GPIO_IRQSTATUS_N, is explained by the two ways of writing the output state:

  • Standard: reading and writing the whole register
  • Set and clear (recommended by the SoC developer): two dedicated registers are used to set and clear the corresponding output pin; the same applies to interrupt management.
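The difference between the two ways can be illustrated with a small user-space model. This is a sketch under assumptions: `port_model` and the helper names are invented here, and the set/clear helpers only imitate how writes to GPIO_SETDATAOUT and GPIO_CLEARDATAOUT affect the output latch.

```c
#include <stdint.h>

/* Simulated 32-bit output latch of one port (stands in for GPIO_DATAOUT). */
struct port_model {
    uint32_t dataout;
};

/* Standard way: read the whole register, change one bit, write it back.
 * Between the read and the write another context could change the register,
 * so a real driver has to hold a lock around this sequence. */
static void dataout_rmw(struct port_model *p, unsigned nr, int value)
{
    uint32_t v = p->dataout;        /* read */

    if (value)                      /* modify */
        v |= 1u << nr;
    else
        v &= ~(1u << nr);

    p->dataout = v;                 /* write */
}

/* Set/clear way: a single write; bits written as 1 take effect,
 * bits written as 0 leave the corresponding pins untouched. */
static void write_setdataout(struct port_model *p, uint32_t bits)
{
    p->dataout |= bits;
}

static void write_cleardataout(struct port_model *p, uint32_t bits)
{
    p->dataout &= ~bits;
}
```

With set/clear registers the driver never needs to know the previous register value, which is what removes the read-modify-write window.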


ep9301

Developer: Cirrus Logic
Documentation: EP9301 User’s Guide (page 523)
Corresponding gpio driver: linux/drivers/gpio/gpio-ep93xx.c
Corresponding header: linux/arch/arm/mach-ep93xx/include/mach/gpio-ep93xx.h
Number of inputs/outputs: 56 (7 gpio ports - 8 pins each)

ep9301 gpio register table - port A

| Register name | Offset | Name in driver | Description |
|---|---|---|---|
| PADR | 0x00 | EP93XX_GPIO_REG(0x0) | high/low value status and control register |
| PADDR | 0x10 | EP93XX_GPIO_REG(0x10) | input/output direction status and control register |
| GPIOAIntEn | 0x9C | int_en_register_offset[0] | interrupts control register |
| GPIOAIntType1 | 0x90 | int_type1_register_offset[0] | level/edge interrupt type control register |
| GPIOAIntType2 | 0x94 | int_type2_register_offset[0] | sets high/rising or low/falling depending on interrupt type set |
| GPIOAEOI | 0x98 | eoi_register_offset[0] | acknowledges edge triggered interrupt |
| IntStsA | 0xA0 | EP93XX_GPIO_A_INT_STATUS | interrupt status |

* There are 7 ports with 8, 8, 1, 2, 3, 2 and 4 I/O pins, and only the first, second and fifth ports have interrupt registers.
* Only port A is presented in the table. Other ports have different register names.
* One drawback of the ep9301 is that the "both edges" interrupt type is not supported in hardware; the driver emulates it by switching the edge inside the interrupt handler.
* On port F, each pin has its own interrupt.
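Emulating a "both edges" trigger on hardware that can only watch one edge at a time can be sketched as follows. This is an illustrative user-space model, not the gpio-ep93xx code: the `edge_emul` structure and both function names are assumptions.

```c
#include <stdint.h>

/* Per-port state for the emulation (hypothetical layout). */
struct edge_emul {
    uint8_t both;    /* pins that were requested as "both edges" (software flag) */
    uint8_t rising;  /* polarity programmed into hardware: 1 = rising, 0 = falling */
};

/* Arm pin nr for both edges: program the only edge that can occur next,
 * given the current level of the pin. */
static void arm_both_edges(struct edge_emul *e, unsigned nr, int level_now)
{
    uint8_t bit = (uint8_t)(1u << nr);

    e->both |= bit;
    if (level_now)
        e->rising &= (uint8_t)~bit;  /* pin is high: next event is a falling edge */
    else
        e->rising |= bit;            /* pin is low: next event is a rising edge */
}

/* Called from the interrupt handler for pin nr: flip the polarity so that
 * the opposite edge is caught next time. Other pins are left alone. */
static void flip_edge_on_irq(struct edge_emul *e, unsigned nr)
{
    uint8_t bit = (uint8_t)(1u << nr);

    if (e->both & bit)
        e->rising ^= bit;
}
```

The price of this trick is a short window between the interrupt and the flip in which a very fast opposite edge can be missed.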


Last example: pci board Bt848, which has gpio pins.

Developer: Brooktree
Documentation: Bt848/848A/849A (page 68)
Corresponding gpio driver: linux/drivers/gpio/gpio-bt8xx.c
Corresponding header: linux/drivers/media/pci/bt8xx/bt848.h
Number of inputs/outputs: 24

Bt848 is a video acquisition board.

Bt848 gpio register table

| Register name | Offset | Name in driver | Description |
|---|---|---|---|
| BT848_GPIO_OUT_EN | 0x118 | BT848_GPIO_OUT_EN | input/output direction status and control register |
| BT848_GPIO_DATA | 0x200 | BT848_GPIO_DATA | high/low value status and control register |

Bt848 is not interrupt capable. Only 2 registers available - status and in/out.

Marking up our device in the memory.

First - we allocate space for direction and state management.

Let the device have 8 general-purpose inputs/outputs; then the memory layout is the following:

| Register name | Offset | Name in driver | Description |
|---|---|---|---|
| DATA | 0x00 | VIRTUAL_GPIO_DATA | high/low value status and control register |
| OUTPUTEN | 0x01 | VIRTUAL_GPIO_OUT_EN | input/output direction status and control register |

gpio interface quick reference


A quick reference to the struct gpio_chip fields used by our driver.

struct gpio_chip {
  /* gpio_chip name */
  const char *label;
  /* pointer to "set as input" function */
  int (*direction_input)(struct gpio_chip *chip, unsigned offset); 
  /* get pin state */
  int (*get)(struct gpio_chip *chip, unsigned offset); 
  /* pointer to "set as output" function */
  int (*direction_output)(struct gpio_chip *chip, unsigned offset, int value); 
  /* pointer to "set pin state" function */
  void (*set)(struct gpio_chip *chip, unsigned offset, int value);     
  /* number of first pin in port, dynamic if value equal to -1 */
  int base;
  /* pins number */
  u16 ngpio;
  /* ... remaining fields omitted ... */
};

Original source code:
linux-kernel 4.1

Pin state on setting as output

It is worth noting the int value parameter of the direction_output function pointer, which serves as a back end for the /sys/class/gpio/gpioN/direction file. That file accepts not only "in"/"out" but also "high"/"low", which are passed on as the value parameter (for some reason, this simple fact is rarely mentioned in manuals for beginners).

g>> /sys/class/gpio # echo low > gpio0/direction
g>> /sys/class/gpio # cat gpio0/value
0

g>> /sys/class/gpio # echo high > gpio0/direction
g>> /sys/class/gpio # cat gpio0/value
1
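How the four accepted words map onto a direction and an initial value can be sketched in user-space C. The `direction_request` structure and `parse_direction` function are hypothetical names for this illustration; treating "out" like "low" follows the sysfs GPIO documentation, which says "out" initializes the value as low.

```c
#include <string.h>

/* Result of parsing a word written to the direction file (hypothetical type). */
struct direction_request {
    int is_output;  /* 1 = output, 0 = input */
    int value;      /* initial level, meaningful only for outputs */
};

/* Translate "in", "out", "low" or "high" into a request.
 * Returns 0 on success, -1 for an unrecognised word. */
static int parse_direction(const char *word, struct direction_request *req)
{
    if (strcmp(word, "in") == 0) {
        req->is_output = 0;
        req->value = 0;
    } else if (strcmp(word, "out") == 0 || strcmp(word, "low") == 0) {
        req->is_output = 1;
        req->value = 0;     /* this becomes the value argument of direction_output */
    } else if (strcmp(word, "high") == 0) {
        req->is_output = 1;
        req->value = 1;
    } else {
        return -1;
    }
    return 0;
}
```

Writing "high" or "low" lets the pin become an output with a known level in one step, avoiding a glitch between setting the direction and setting the value.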

Dynamic int base acquisition and ARCH_NR_GPIOS legacy

Historically, the number of GPIOs in the system was limited by the ARCH_NR_GPIOS parameter, equal to 256 by default and later increased to 512 (version 3.18).

Its meaning is simple enough: the system cannot have more GPIOs than the value of ARCH_NR_GPIOS. If the planned number was greater than the default, it was overridden in the corresponding target platform file.

The reason for this behavior was that the gpio_desc namespace was defined as a static array, and each provided base was checked against ARCH_NR_GPIOS.

static struct gpio_desc gpio_desc[ARCH_NR_GPIOS];

The GPIO chips themselves were defined as static in corresponding platform files, e.g. :


#define EP93XX_GPIO_BANK(name, dr, ddr, base_gpio)                       \
       {                                                                 \
               .chip = {                                                 \
                       .label             = name,                        \
                       .direction_input   = ep93xx_gpio_direction_input, \
                       .direction_output  = ep93xx_gpio_direction_output,\
                       .get               = ep93xx_gpio_get,             \
                       .set               = ep93xx_gpio_set,             \
                       .dbg_show          = ep93xx_gpio_dbg_show,        \
                       .base              = base_gpio,                   \
                       .ngpio             = 8,                           \
               },                                                        \
               .data_reg       = EP93XX_GPIO_REG(dr),                    \
               .data_dir_reg   = EP93XX_GPIO_REG(ddr),                   \
       }

static struct ep93xx_gpio_chip ep93xx_gpio_banks[] = {
       EP93XX_GPIO_BANK("A", 0x00, 0x10, 0),
       EP93XX_GPIO_BANK("B", 0x04, 0x14, 8),
       EP93XX_GPIO_BANK("C", 0x08, 0x18, 40),
       EP93XX_GPIO_BANK("D", 0x0c, 0x1c, 24),
       EP93XX_GPIO_BANK("E", 0x20, 0x24, 32),
       EP93XX_GPIO_BANK("F", 0x30, 0x34, 16),
       EP93XX_GPIO_BANK("G", 0x38, 0x3c, 48),
       EP93XX_GPIO_BANK("H", 0x40, 0x44, 56),
};

Since 3.19 this static array has been replaced by per-chip arrays, allocated dynamically in gpiochip_add.

Nevertheless, ARCH_NR_GPIOS is still there (version 4.6) and is used for dynamic base allocation.

/* dynamic allocation of GPIOs, e.g. on a hotplugged device */
static int gpiochip_find_base(int ngpio);

The base field of the gpio_chip structure may be set to -1; in this case the base is allocated from the top of the range, at the first free position. For example, with ngpio equal to 8 and ARCH_NR_GPIOS equal to 256, the base will be 248 (ARCH_NR_GPIOS - ngpio).
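The top-down search can be sketched in user-space C. This is a simplified model loosely following the idea of gpiochip_find_base, not the kernel code itself: `NR_GPIOS`, `chip_range` and `find_base` are names assumed for this example, and the registered chips are kept in a plain array sorted by base.

```c
/* Stands in for ARCH_NR_GPIOS in this sketch. */
#define NR_GPIOS 256

/* Number range occupied by an already registered chip (hypothetical type). */
struct chip_range {
    int base;
    int ngpio;
};

/* Find a base for a new chip with ngpio pins, searching from the top of
 * the number space downwards. chips[] must be sorted by ascending base.
 * Returns the base, or -1 if there is no room. */
static int find_base(const struct chip_range *chips, int nchips, int ngpio)
{
    int base = NR_GPIOS - ngpio;    /* first candidate: the very top */
    int i;

    for (i = nchips - 1; i >= 0; i--) {
        /* Candidate range lies entirely above this chip: it is free. */
        if (chips[i].base + chips[i].ngpio <= base)
            break;
        /* Otherwise try the space right below this chip. */
        base = chips[i].base - ngpio;
    }

    return base >= 0 ? base : -1;
}
```

With no chips registered, an 8-pin chip lands at 248; if that range is already taken, the next one lands at 240, and so on downwards.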


Defining functions of our device

vgread and vgwrite are just simple macros on top of ioread8 and iowrite8:

#define vgwrite(dat, adr)  iowrite8((dat), vg->data_base_addr+(adr))
#define vgread(adr)         ioread8(vg->data_base_addr+(adr))

Set the corresponding pin as an input:

static int virtual_gpio_direction_input(struct gpio_chip *gpio, unsigned nr)
{
   struct virtual_gpio *vg = to_virtual_gpio(gpio);
   unsigned long flags;
   u8 outen, data;

   spin_lock_irqsave(&vg->lock, flags);

   /* clear the pin state */
   data = vgread(VIRTUAL_GPIO_DATA);
   data &= ~(1 << nr);
   vgwrite(data, VIRTUAL_GPIO_DATA);

   /* clear the output-enable bit: the pin becomes an input */
   outen = vgread(VIRTUAL_GPIO_OUT_EN);
   outen &= ~(1 << nr);
   vgwrite(outen, VIRTUAL_GPIO_OUT_EN);

   spin_unlock_irqrestore(&vg->lock, flags);

   return 0;
}

Reading the current status of the pin:

static int virtual_gpio_get(struct gpio_chip *gpio, unsigned nr)
{
   struct virtual_gpio *vg = to_virtual_gpio(gpio);
   unsigned long flags;
   u8 data;

   spin_lock_irqsave(&vg->lock, flags);
   data = vgread(VIRTUAL_GPIO_DATA);
   spin_unlock_irqrestore(&vg->lock, flags);

   return !!(data & (1 << nr));
}

Set the corresponding pin as an output:

static int virtual_gpio_direction_output(struct gpio_chip *gpio, unsigned nr, int val)
{
   struct virtual_gpio *vg = to_virtual_gpio(gpio);
   unsigned long flags;
   u8 outen, data;

   spin_lock_irqsave(&vg->lock, flags);

   /* set the output-enable bit: the pin becomes an output */
   outen = vgread(VIRTUAL_GPIO_OUT_EN);
   outen |= (1 << nr);
   vgwrite(outen, VIRTUAL_GPIO_OUT_EN);

   /* apply the initial value ("high"/"low" from sysfs) */
   data = vgread(VIRTUAL_GPIO_DATA);
   if (val)
       data |= (1 << nr);
   else
       data &= ~(1 << nr);
   vgwrite(data, VIRTUAL_GPIO_DATA);

   spin_unlock_irqrestore(&vg->lock, flags);

   return 0;
}

Set pin current state:

static void virtual_gpio_set(struct gpio_chip *gpio, unsigned nr, int val)
{
   struct virtual_gpio *vg = to_virtual_gpio(gpio);
   unsigned long flags;
   u8 data;

   spin_lock_irqsave(&vg->lock, flags);

   data = vgread(VIRTUAL_GPIO_DATA);

   if (val)
       data |= (1 << nr);
   else
       data &= ~(1 << nr);

   vgwrite(data, VIRTUAL_GPIO_DATA);

   spin_unlock_irqrestore(&vg->lock, flags);
}

Registration of our driver as a gpio_chip device:

static void virtual_gpio_setup(struct virtual_gpio *gpio)
{
   struct gpio_chip *chip = &gpio->chip;

   chip->label = dev_name(&gpio->pdev->dev);
   chip->owner = THIS_MODULE;
   chip->direction_input = virtual_gpio_direction_input;
   chip->get = virtual_gpio_get;
   chip->direction_output = virtual_gpio_direction_output;
   chip->set = virtual_gpio_set;
   chip->dbg_show = NULL;
   chip->base = modparam_gpiobase;
   chip->ngpio = VIRTUAL_GPIO_NR_GPIOS;
   chip->can_sleep = 0; // gpio never sleeps!
}

Passing module parameter gpiobase as a base value for our gpio_chip

Note: Such behavior is strongly discouraged since 4.2.

static int modparam_gpiobase = -1; /* dynamic */
module_param_named(gpiobase, modparam_gpiobase, int, 0444);
MODULE_PARM_DESC(gpiobase, "The GPIO base number. -1 means dynamic, which is the default.");

Probing and testing module

h>> $ rm /dev/shm/ivshmem 

h>> Adding parameters to qemu launch command line  += -device ivshmem,shm=ivshmem,size=1

g>> # ls /sys/class/gpio/
export    unexport

g>> # insmod virtual_gpio_basic.ko
PCI: enabling device 0000:00:0d.0 (0100 -> 0102)
ivshmem_gpio 0000:00:0d.0: data_mmio iomap base = 0xc8a00000
ivshmem_gpio 0000:00:0d.0: data_mmio_start = 0x60000000 data_mmio_len = 1048576
ivshmem_gpio 0000:00:0d.0: regs iomap base = 0xc88e6400, irq = 27
ivshmem_gpio 0000:00:0d.0: regs_addr_start = 0x50002400 regs_len = 256

g>> # ls /sys/class/gpio/
export       gpiochip248  unexport

g>> # cat /sys/class/gpio/gpiochip248/label

g>> # cat /sys/class/gpio/gpiochip248/base

g>> # cat /sys/class/gpio/gpiochip248/ngpio

g>> # rmmod virtual_gpio_basic
Unregister virtual_gpio device.

g>> # insmod virtual_gpio_basic.ko gpiobase=0
g>> # ls /sys/class/gpio/
export     gpiochip0  unexport

g>> # echo 0 > /sys/class/gpio/export
g>> # echo high > /sys/class/gpio/gpio0/direction

A simple check:

h>>  $ xxd -b -l 2 -c 2 /dev/shm/ivshmem
0000000: 00000001 00000001  ..


Adding interrupt generation support

Interrupt registers markup and basic irq handling

Note: Only EDGEDETECT_RISE and EDGEDETECT_FALL are considered in the virtual driver.

Note: Please use a QEMU version above 2.5.0, or qemu-linaro. ivshmem interrupts are badly broken in 2.5.0 and simply do not work in some versions below 2.5.0. Alternatively, you can apply the following patch ( http://lists.gnu.org/archive/html/qemu-stable/2015-12/msg00034.html ) to 2.5.0.

The following registers should be added:

| Register name | Offset | Name in driver | Description |
|---|---|---|---|
| INTERRUPT_EN | 0x01 | VIRTUAL_GPIO_INT_EN | Enables interrupt for given input |
| INTERRUPT_ST | 0x02 | VIRTUAL_GPIO_INT_ST | Interrupt state register |
| INTERRUPT_EOI | 0x03 | VIRTUAL_GPIO_INT_EOI | End-of-interrupt notification register |
| EDGEDETECT_RISE | 0x04 | VIRTUAL_GPIO_RISING | Enable/disable rising edge interrupt |
| EDGEDETECT_FALL | 0x05 | VIRTUAL_GPIO_FALLING | Enable/disable falling edge interrupt |

The following function handles interrupts delivered through the PCI device; for now its only job is to report that an interrupt was received:

 static irqreturn_t virtual_gpio_interrupt(int irq, void *data)
 {
       u32 status;

       struct virtual_gpio *vg = (struct virtual_gpio *)data;

       status = readl(vg->regs_base_addr + IntrStatus);

       /* nothing pending, or the device is gone: not our interrupt */
       if (!status || (status == 0xFFFFFFFF))
               return IRQ_NONE;

       printk(KERN_INFO "VGPIO: interrupt (status = 0x%04x)\n", status);

       return IRQ_HANDLED;
 }

Testing interrupt capability requires an external daemon, ivshmem-server, which is included in the standard QEMU distribution. A path to the UNIX socket is added to the QEMU command line; message exchange between the QEMU guests and the host machine is done via the eventfd mechanism.

h>> $ ivshmem-server -v -F -p ivshmem.pid -l 1M
# launching qemu with new command line parameters
h>> $ += -chardev socket,path=/tmp/ivshmem_socket,id=ivshmemid -device ivshmem,chardev=ivshmemid,size=1,msi=off

g>> # echo 8 > /proc/sys/kernel/printk
g>> # insmod virtual_gpio_basic.ko

h>> $ ivshmem-client
# each qemu guest registers itself within ivshmem-server and receives unique id
cmd> int 0 0

# Note: listing available commands is done via cmd> help

# guest machine kernel message:

g>> VGPIO: interrupt (status = 0x0001)

irq_chip and chained_interrupt concept

We are not going to dive into the details, as this topic has been covered in the patch that introduced irq_chip, in the kernel documentation and in the book "Professional Linux Kernel Architecture" (it is quite outdated, as is LDD3, but irq_chip is not a new thing either).

All we need to know for now is that GPIO chips providing interrupts cascaded from a parent interrupt controller are quite common in the Linux kernel nowadays.

This is why the part of a GPIO driver responsible for providing interrupts is built on top of irq_chip. In other words, such a driver uses two subsystems at the same time: gpio_chip and irq_chip.

A shallow look at the irq subsystem gives us the following picture:

High-Level Interrupt Service Routines (ISRs) — Perform all necessary work caused by the interrupt on the device driver’s (or some other kernel component’s) side. If, for instance, a device uses an interrupt to signal that some data have arrived, then the job of the high-level ISR could be to copy the data to an appropriate place.

Interrupt Flow Handling — Takes care of handling the various differences between different interrupt flow types like edge- and level triggering. Edge-triggering means that hardware detects an interrupt by sensing a difference in potential on the line. In level-triggered systems, interrupts are detected when the potential has a specific value — the change in potential is not relevant. From the kernel viewpoint, level-triggering is more complicated because, after each interrupt, the line must be explicitly set to the potential that indicates ‘‘no interrupt.’’

Chip-Level Hardware Encapsulation — Needs to communicate directly with the underlying hardware that is responsible to generate interrupts at the electronic level. This layer can be seen as some sort of ‘‘device driver‘‘ for interrupt controllers.

As we can see, the kernel takes care of the specific implementation and of the differences between flow types (edge and level), provided it is given the appropriate infrastructure.

IRQ Domains

The IRQ Domain subsystem, introduced in the "irq: add irq_domain translation infrastructure" patch, made it possible to separate interrupt numbers local to an interrupt controller from interrupt numbers in the kernel, providing a global pool of irq numbers. Citing the official documentation for modern kernels: "Nowadays IRQ number is just a number".

Prior to that patch, hardware irq numbers were mapped 1:1 and cascaded interrupts were not supported. A hardware irq number here means an irq number local to the interrupt controller, which in our case coincides with the local gpio number.

IRQ Domain provides following types of mapping:

  • Linear domain mapping
  • Tree mapping
  • And "No map" mapping

Since our interrupt vector is small and we really have no interest in "No map" mapping, our mapping is linear i.e. interrupts are mapped 1:1, but with an offset.
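The idea of a 1:1 mapping with an offset can be sketched in user-space C. This is a deliberate simplification: a real linear irq_domain keeps a lookup table and does not require the Linux irq numbers to be contiguous. The `linear_domain` type and both helpers are hypothetical names.

```c
/* Number of local (hardware) interrupts of our controller in this sketch. */
#define NR_HWIRQS 8u

/* Simplified linear domain: hwirq n maps to first_virq + n. */
struct linear_domain {
    unsigned first_virq;    /* Linux irq number assigned to hwirq 0 */
};

/* Translate a local irq number to a Linux irq number.
 * Returns 0 when there is no mapping, like the kernel helpers do. */
static unsigned domain_to_virq(const struct linear_domain *d, unsigned hwirq)
{
    if (hwirq >= NR_HWIRQS)
        return 0;
    return d->first_virq + hwirq;
}

/* Reverse translation; returns -1 if virq does not belong to the domain. */
static int domain_to_hwirq(const struct linear_domain *d, unsigned virq)
{
    if (virq < d->first_virq || virq >= d->first_virq + NR_HWIRQS)
        return -1;
    return (int)(virq - d->first_virq);
}
```

For a domain starting at Linux irq 64, local interrupt 2 (our gpio 2) is seen by the rest of the kernel as irq 66.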

Each irq_chip callback receives a pointer to struct irq_data, where irq_data->irq is the Linux kernel irq number and irq_data->hwirq is our local irq number, often also called the hardware interrupt number in the Linux driver context. Unsurprisingly, it also leads to our driver structure, struct virtual_gpio.

irq_chip and gpio_chip binding

If we were targeting older kernel versions, we would have to use irq_domain_add_simple for irq mapping, but since the "gpio: add IRQ chip helpers in gpiolib" patch there is no need to use the IRQ Domain interface directly.

So instead of invoking IRQ Domain functions and providing .map() ops for irq mapping, we will use the gpiochip_irqchip_add and gpiochip_set_chained_irqchip functions (these functions depend on the GPIOLIB_IRQCHIP Kconfig parameter).

The gpio-pl061 driver is a great example of how easy these helpers are to use and how to migrate to them.

Binding our irq_chip to the existing gpio_chip:


handle_edge_irq is one of the built-in flow handlers, which takes care of interrupt flow control.

Note: edge-triggered interrupts are the most common on modern hardware. The main difference from level-triggered interrupts is that no masking is required at the beginning of the handling routine.


By invoking gpiochip_set_chained_irqchip we tell the kernel that our irq_chip sits on top of the PCI device interrupt and that our irqs are cascaded from the parent interrupt, i.e. pdev->irq.

Reworking our interrupt handler virtual_gpio_interrupt to raise interrupts according to the state of our interrupt status register gives us the following:

pending = vgread(VIRTUAL_GPIO_INT_ST);
/* check if irq is really raised */
if (pending)
    for_each_set_bit(i, &pending, VIRTUAL_GPIO_NR_GPIOS)       
        generic_handle_irq(irq_find_mapping(vg->chip.irqdomain, i));

irq_find_mapping is a helper to translate our local irq number to linux kernel irq space.
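The dispatch loop itself is easy to model outside the kernel. In this sketch the `pin_handler_t` callback stands in for generic_handle_irq and the plain loop stands in for for_each_set_bit; all names are assumptions made for the illustration.

```c
#include <stdint.h>

#define NR_PINS 8u

/* Callback invoked once per pending pin (stands in for generic_handle_irq). */
typedef void (*pin_handler_t)(unsigned hwirq, void *ctx);

/* Walk the pending mask, lowest bit first, and call the handler for every
 * set bit. Returns the number of interrupts dispatched. */
static int dispatch_pending(uint8_t pending, pin_handler_t handler, void *ctx)
{
    int dispatched = 0;
    unsigned i;

    for (i = 0; i < NR_PINS; i++) {
        if (pending & (1u << i)) {
            handler(i, ctx);
            dispatched++;
        }
    }
    return dispatched;
}

/* Example handler: records which pins fired in a bit mask pointed to by ctx. */
static void record_pin(unsigned hwirq, void *ctx)
{
    *(uint8_t *)ctx |= (uint8_t)(1u << hwirq);
}
```

One parent interrupt can therefore fan out into several child interrupts if more than one status bit is set when the handler runs.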

Putting all together

First of all our irq chip interface will look like:

static struct irq_chip virtual_gpio_irq_chip = {
    .name           = "GPIO",
    .irq_ack        = virtual_gpio_irq_ack,
    .irq_mask       = virtual_gpio_irq_mask,
    .irq_unmask     = virtual_gpio_irq_unmask,
    .irq_set_type   = virtual_gpio_irq_type,
};

The ack() function is closely tied to the hardware specifics of the interrupt controller. Some devices require an acknowledgement of each interrupt request before subsequent requests can be serviced.

static void virtual_gpio_irq_ack(struct irq_data *d)
{
   unsigned long flags;
   u8 nr = d->hwirq;
   u8 mask = 1 << nr;

   struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
   struct virtual_gpio *vg = to_virtual_gpio(gc);

   spin_lock_irqsave(&vg->lock, flags);
   /* tell the device side that this interrupt has been processed */
   vgwrite(mask, VIRTUAL_GPIO_INT_EOI);
   spin_unlock_irqrestore(&vg->lock, flags);
}

In this case, a rather rough emulation of the eoi register is used in the vg_get_set program. After raising the interrupt status flag, it polls the eoi register in a loop. When the notification bit is set, indicating that the interrupt was processed by the driver, it zeroes the eoi register and clears the interrupt state.
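The handshake described above can be sketched as three small steps over a shared register array. This is a user-space model only, not the vg_get_set source: the register indices and function names are assumptions, and the polling loop is reduced to a single step that the caller repeats.

```c
#include <stdint.h>

/* Hypothetical indices into the shared register area for this sketch. */
enum { REG_INT_ST = 0, REG_INT_EOI = 1, REG_COUNT = 2 };

/* Host side: raise the interrupt status flag for pin nr. */
static void host_raise(volatile uint8_t *regs, unsigned nr)
{
    regs[REG_INT_ST] |= (uint8_t)(1u << nr);
}

/* Guest side: what the driver's ack callback does, write the pin mask
 * to the eoi register. */
static void guest_ack(volatile uint8_t *regs, unsigned nr)
{
    regs[REG_INT_EOI] = (uint8_t)(1u << nr);
}

/* Host side: one polling step. Returns 1 if the ack for pin nr was seen
 * (eoi is zeroed and the status bit cleared), 0 if polling must continue. */
static int host_poll_eoi(volatile uint8_t *regs, unsigned nr)
{
    uint8_t bit = (uint8_t)(1u << nr);

    if (!(regs[REG_INT_EOI] & bit))
        return 0;

    regs[REG_INT_EOI] = 0;
    regs[REG_INT_ST] &= (uint8_t)~bit;
    return 1;
}
```

The host would call host_poll_eoi repeatedly after host_raise until it returns 1, which is the "constant polling" the text mentions.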

Masking and unmasking are done by writing the corresponding value to the INTERRUPT_EN register.

Interrupt masking:

static void virtual_gpio_irq_mask(struct irq_data *d)
{
  u8 mask;
  unsigned long flags;
  u8 nr = d->hwirq;

  struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
  struct virtual_gpio *vg = to_virtual_gpio(gc);

  spin_lock_irqsave(&vg->lock, flags);
  mask = vgread(VIRTUAL_GPIO_INT_EN);
  mask &= ~(1 << nr);
  vgwrite(mask, VIRTUAL_GPIO_INT_EN);
  spin_unlock_irqrestore(&vg->lock, flags);
}

Interrupt unmasking:

static void virtual_gpio_irq_unmask(struct irq_data *d)
{
  u8 mask;
  unsigned long flags;
  u8 nr = d->hwirq;

  struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
  struct virtual_gpio *vg = to_virtual_gpio(gc);

  spin_lock_irqsave(&vg->lock, flags);
  mask = vgread(VIRTUAL_GPIO_INT_EN);
  mask |= (1 << nr);
  vgwrite(mask, VIRTUAL_GPIO_INT_EN);
  spin_unlock_irqrestore(&vg->lock, flags);
}

The irq_set_type callback selects the interrupt trigger type; currently the kernel supports the following types:

  • IRQ_TYPE_NONE - no irq
  • IRQ_TYPE_EDGE_RISING - rising edge irq
  • IRQ_TYPE_EDGE_FALLING - falling edge irq
  • IRQ_TYPE_EDGE_BOTH - both edges irq
  • IRQ_TYPE_LEVEL_HIGH - high level
  • IRQ_TYPE_LEVEL_LOW - low level

static int virtual_gpio_irq_type(struct irq_data *d, unsigned int type)
{
  unsigned long flags;

  struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
  struct virtual_gpio *vg = to_virtual_gpio(gc);

  u8 mask;
  u8 nr = d->hwirq;
  int retval = 0;

  spin_lock_irqsave(&vg->lock, flags);

  switch (type) {
  case IRQ_TYPE_EDGE_RISING:
      /* enable rising edge detection, disable falling */
      mask = vgread(VIRTUAL_GPIO_RISING);
      mask |= (1 << nr);
      vgwrite(mask, VIRTUAL_GPIO_RISING);

      mask = vgread(VIRTUAL_GPIO_FALLING);
      mask &= ~(1 << nr);
      vgwrite(mask, VIRTUAL_GPIO_FALLING);
      break;
  case IRQ_TYPE_EDGE_FALLING:
      /* enable falling edge detection, disable rising */
      mask = vgread(VIRTUAL_GPIO_FALLING);
      mask |= (1 << nr);
      vgwrite(mask, VIRTUAL_GPIO_FALLING);

      mask = vgread(VIRTUAL_GPIO_RISING);
      mask &= ~(1 << nr);
      vgwrite(mask, VIRTUAL_GPIO_RISING);
      break;
  default:
      /* other trigger types are not supported by this driver */
      retval = -EINVAL;
      goto end;
  }

  /* enable interrupt */
  mask = vgread(VIRTUAL_GPIO_INT_EN);
  mask |= (1 << nr);
  vgwrite(mask, VIRTUAL_GPIO_INT_EN);

end:
  spin_unlock_irqrestore(&vg->lock, flags);
  return retval;
}

Testing and results

For user-space interrupt testing we will use the vg_guest_client utility. According to the gpio sysfs kernel documentation: “If you are using select monitoring interrupts set appropriate descriptor in exceptfds”.

Corresponding code:

 FD_ZERO(&efds);
 maxfd = 0;

 for(i = 0; i < gpio_size; i++) {
     FD_SET(gpios[i].fd, &efds);
     maxfd = (maxfd < gpios[i].fd) ? gpios[i].fd : maxfd;
 }

 /* gpio sysfs interrupts are reported as exceptional conditions */
 ready = pselect(maxfd + 1, NULL, NULL, &efds, NULL, NULL);

 if(ready > 0)
     for(i = 0; i < gpio_size; i++)
         if(FD_ISSET(gpios[i].fd, &efds)) {
             read(gpios[i].fd, &value, 1);
             /* for explanation of lseek see http://lxr.free-electrons.com/source/fs/kernfs/file.c?v=4.1#L769 */
             if(lseek(gpios[i].fd, 0, SEEK_SET) == -1)
                 perror("lseek");
             printf("gpio number=%d interrupt caught\n", gpios[i].number);
         }

Preparing gpios via sysfs:

g>> # echo 504 > /sys/class/gpio/export
g>> # echo 505 > /sys/class/gpio/export
g>> # echo 506 > /sys/class/gpio/export
g>> # echo rising > /sys/class/gpio/gpio504/edge
g>> # echo rising > /sys/class/gpio/gpio505/edge
g>> # echo rising > /sys/class/gpio/gpio506/edge

Note: gpios are initialized as inputs by default in most cases.

g>> # ./vg_guest_client 504
       base: 504
       ngpio: 8
Added gpio 504 to watchlist.
Added gpio 505 to watchlist.
Added gpio 506 to watchlist.
Entering loop with 3 gpios.

h>> $ ./vg_get_set -p 1 -i 0
g>> gpio number=504 interrupt caught

Finally, for those who are interested, the chain from our irq handler to sysfs notification looks like below:

static irqreturn_t virtual_gpio_interrupt (int irq, void *data)
int generic_handle_irq(unsigned int irq);
static irqreturn_t gpio_sysfs_irq(int irq, void *priv);
static inline void sysfs_notify_dirent(struct kernfs_node *kn);
void kernfs_notify(struct kernfs_node *kn);
static void kernfs_notify_workfn(struct work_struct *work);


For me, this article is meant to serve as the basis for subsequent material whose presentation would be difficult or even impossible without some general introduction. QEMU together with ivshmem served as an excellent, easily understandable basis for this purpose. The main reasons for choosing exactly this bundle are the presence of sane documentation and the transparency of its use.

The driver itself, despite its unconditional educational value, is far from ideal in its current state in the context of the modern kernel. For such a simple driver it is worth considering the generic-gpio driver, written to make mmio gpio drivers easier and to avoid duplicating code in them, although it is not so easy to understand. The technique for notifying that an interrupt has been processed could also be more elegant.

Nevertheless, working with gpio sysfs is the same on all devices that implement gpio sysfs support, so any instruction on the use of general-purpose inputs/outputs can be successfully transferred to another device, as intended in the design of the interface. All the differences end at the level of the specific device driver implementation.

Based on this driver, the following topics, certainly worthy of study, can be covered and explained:

  • Integration with Device Tree and usage
  • Building gpio drivers on top of generic-gpio driver
  • Implementation on the top of non-typical bases, such as ADC
  • Special gpio drivers such as leds, buttons, power and reset functions

Nor should we lose sight of recent gpiolib changes: sysfs gpio is deprecated. A new ioctl-based interface is on its way to becoming the new standard for gpio interaction. But legacy kernels are still rolling; I have boards still running kernel version 2.6.34.

List of materials:

  1. http://nairobi-embedded.org/category/device-drivers.html [Siro Mugabi]
  2. http://lxr.free-electrons.com/source
  3. Professional Linux Kernel Architecture [Wolfgang Mauerer]
  4. LDD3 [Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman]

Materials recommended for review:

  1. http://derekmolloy.ie/writing-a-linux-kernel-module-part-1-introduction/ (3 parts)
  2. https://developer.ridgerun.com/wiki/index.php?title=Gpio-int-test.c
  3. http://www.assert.cc/2015/01/03/selects-exceptional-conditions.html

Source codes, Makefile and README: