On peripherals and abstractions

I’ve been working on a C++ embedded hardware abstraction layer as a side project for a little while and am trying to make use of as many zero-cost abstractions as possible. The ultimate goal is to have a low-level yet still easy-to-use HAL which lets me take full advantage of C++’s advanced type system and higher level abstractions.

An interesting question came up when I was working on implementing the UART driver for my chip as a testbed. Clearly the abstraction we wish to expose is one where bytes can eventually be read from and written to the UART peripheral, but how should the function signatures look?

The “obvious” answer is a very conventional fixed-size read or write operation, like this:

void read(uint8_t* data, std::size_t size);
void write(const uint8_t* data, std::size_t size);

This is instantly recognizable to any programmer as a request to read (up to) size bytes from the UART peripheral into data, or to write size bytes from data to it. It works, obviously, but consider that this is what every single higher-level component will have to deal with. Even if the protocols they implement have absolutely nothing to do with passing fixed-size byte streams through a pipe, they still need to operate using fixed-size byte reads and writes… why, exactly?

For instance, consider the simplest protocol of them all, where you send textual commands over UART separated by newlines. This should be trivial, yet it is already a huge pain with this interface: you basically need to issue very inefficient one-byte read commands, each of which involves preparing the UART FIFO, getting an interrupt when some data is received, and popping exactly one byte from the FIFO. You decide to “optimize” this, so now you have to read the largest command you expect, find the newline, parse the received command, and carefully prepend any leftover bytes to the incoming data on the next read. Or you can chunk the reads, making things even more complex.
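To make the pain concrete, here is a sketch of what the command reader might look like against the fixed-size interface, with read() mocked to pop bytes from a canned stream (the stream contents and the read_command name are illustrative, not from any real driver):

```cpp
#include <cstdint>
#include <cstddef>
#include <string>

// Mock of the fixed-size interface: pops bytes from a canned stream
// (an illustrative stand-in for the real UART driver).
static const char* stream = "LED ON\nLED OFF\n";
static std::size_t stream_pos = 0;

void read(uint8_t* data, std::size_t size) {
  for (std::size_t i = 0; i < size; ++i)
    data[i] = static_cast<uint8_t>(stream[stream_pos++]);
}

// Reading one newline-terminated command means issuing a one-byte
// read per character -- on real hardware, one FIFO setup and
// potentially one interrupt each time.
std::string read_command() {
  std::string cmd;
  uint8_t byte;
  do {
    read(&byte, 1);  // inefficient: one transaction per byte
    if (byte != '\n')
      cmd.push_back(static_cast<char>(byte));
  } while (byte != '\n');
  return cmd;
}
```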

Is this really the abstraction we wish to impose on everything that is going to be building on top of UART and possibly other byte-oriented peripherals? Suddenly it doesn’t seem so obvious.


I briefly considered using the C++ iterator pattern instead. This is certainly better than passing raw memory buffers, in the sense that it is strictly a superset of the latter. However, it solves a different problem, namely where the bytes should be read or written in memory (e.g. a ring buffer instead of a plain array), whereas the issue described above is of a more transactional nature: when do read/write operations end, and under which conditions?


After thinking about this for a while, it became increasingly clear that the only way forward was to give the calling code control over the read or write process, sort of like a callback. This callback could then read incoming bytes (or write out bytes it wants to send) and also indicate when the operation should complete.

We wish, of course, to use C++ lambdas, because plain C callbacks would be quite tedious to use, and also dreadfully inefficient because the compiler’s ability to inline them is limited.

template <typename T> async_t read(T callback);
template <typename T> async_t write(T callback);

Here async_t represents an object that can be waited on, kind of like a promise or a future, and T is the lambda’s type, automatically deduced when instantiating one of the functions.
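As a rough idea of what such an async_t might offer, here is a hypothetical sketch assuming a simple completion flag set from the interrupt side (not the actual HAL implementation):

```cpp
#include <atomic>

// Hypothetical sketch of the async_t surface: a completion flag
// set by the IRQ-side code and waited on by the application.
class async_t {
public:
  // called by the IRQ-side code once the transaction finishes
  void complete() { done.store(true, std::memory_order_release); }

  // called by the application; on a microcontroller the busy
  // loop could execute a wait-for-interrupt instruction instead
  void wait() const {
    while (!done.load(std::memory_order_acquire)) {}
  }

  bool is_done() const { return done.load(std::memory_order_acquire); }

private:
  std::atomic<bool> done{false};
};
```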

As an example of usage, we could have something like this:

char buffer[10];
std::size_t pos = 0;

read([&](uint8_t byte){
  buffer[pos++] = byte;
  return (byte == '\n');
}).wait();

This would read from the UART interface until a newline is encountered, at which point it would stop, the async_t returned would complete/fulfill/resolve, and the .wait() call would return. Easy! Furthermore, you could extend this so that the callback could signal failure, or you could hand the callback all the bytes currently available for improved efficiency, and so on. The point is that the caller is not limited by the semantics of the UART’s interface, because it is the one defining them. More significantly, it means that any higher-level abstraction like “write this buffer” or “read until this pattern is encountered” can be implemented as a function that returns an appropriate functor. For instance:

auto read_until(char* buffer, std::size_t& length, uint8_t pattern) {
  return [buffer, &length, pattern](uint8_t byte) {
    buffer[length++] = byte;
    return byte == pattern;
  };
}

The astute reader will observe that the char* buffer here could be substituted with an iterator to further enhance this functor pattern (also this needs bounds checking, but you get the idea).
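To sketch how this fits together, here is a bounds-checked variant of read_until; the extra capacity parameter and the drive helper (which stands in for the interrupt handler feeding bytes to the callback) are illustrative assumptions:

```cpp
#include <cstdint>
#include <cstddef>

// read_until with the bounds check mentioned above; 'capacity'
// is an added parameter for illustration.
auto read_until(char* buffer, std::size_t& length,
                std::size_t capacity, uint8_t pattern) {
  return [=, &length](uint8_t byte) {
    if (length < capacity)
      buffer[length++] = static_cast<char>(byte);
    return byte == pattern;  // true: transaction complete
  };
}

// Stand-in for the IRQ: feeds bytes to the callback until it
// signals completion.
template <typename Callback, typename Feed>
void drive(Callback callback, Feed next_byte) {
  while (!callback(next_byte())) {}
}
```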


This is nice and all, but how do we go about implementing this on a microcontroller? It is not exactly trivial, but not incredibly hard either. There are two major problems to overcome:

  1. the lambda is not in the interrupt handler’s scope, so the lambda must be kept alive;
  2. the lambda’s type T must somehow be made available within the interrupt handler;

For the first issue it suffices to note that the returned async_t object must, by necessity, live at least as long as the operation the lambda is connected to. Therefore the solution is to store the lambda in the async_t object itself. This is actually efficient, because if the lambda is written inline in the read/write call, it is a temporary and can be moved into the async_t, which itself will likely benefit from RVO.
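A hypothetical sketch of this ownership scheme, with a uart_async name and signatures that are illustrative rather than the actual HAL’s API:

```cpp
#include <cstdint>
#include <utility>

// Sketch: the returned object stores the moved-in lambda, so the
// lambda stays alive for as long as interrupts may fire.
template <typename T>
class uart_async {
public:
  explicit uart_async(T&& cb) : callback(std::move(cb)) {}

  // invoked from the interrupt handler with each received byte;
  // returns true when the transaction is complete
  bool interrupt(uint8_t byte) { return callback(byte); }

private:
  T callback;  // the lambda lives here, not on the caller's stack
};

template <typename T>
uart_async<T> read(T callback) {
  // the temporary lambda is moved in; the return benefits from RVO
  return uart_async<T>(std::move(callback));
}
```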

Now, this async_t object is the one which is going to be handling interrupt requests for the UART, so we need to get the IRQ to call into the active async_t instance (there can only be one at a time, since UART only has one channel). That means transforming it into a function pointer with an associated this pointer. In order to retain type information, we can use a static helper conceptually similar to std::mem_fn except it’s an actual function instead of a functor:

template <typename T, void (T::*Fn)()>
static void member_function(void* object) {
  (static_cast<T*>(object)->*Fn)();
}

Now you can store member_function<async_t, &async_t::interrupt> and &my_async in static variables and call async_t::interrupt from the interrupt handler. This is done in:

  1. the async_t constructor
  2. the async_t destructor (optional, for bookkeeping)
  3. the async_t move constructor and assignment operator, if you define them
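Putting the pieces together, the registration in points 1 and 2 might look roughly like this. This is a sketch, not the actual HAL code: here the handler variant takes the received byte as an argument, and all names are illustrative:

```cpp
#include <cstdint>

// A plain function pointer plus a 'this' pointer in static
// storage; the IRQ handler dispatches through them.
using irq_fn = bool (*)(void*, uint8_t);

static irq_fn active_fn = nullptr;
static void* active_object = nullptr;

// member_function variant whose target takes the received byte
template <typename T, bool (T::*Fn)(uint8_t)>
static bool member_function(void* object, uint8_t byte) {
  return (static_cast<T*>(object)->*Fn)(byte);
}

struct demo_async {
  int seen = 0;
  bool interrupt(uint8_t byte) { ++seen; return byte == '\n'; }

  demo_async() {  // point 1: register on construction
    active_fn = member_function<demo_async, &demo_async::interrupt>;
    active_object = this;
  }
  ~demo_async() {  // point 2: deregister on destruction
    active_fn = nullptr;
    active_object = nullptr;
  }
};

// what the real UART IRQ handler would do with each received byte
bool uart_irq_handler(uint8_t byte) {
  return active_fn(active_object, byte);
}
```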

The last point is important: if the async_t is moved while the read/write is in progress, we need to update the this pointer that the interrupt handler calls into. This is fine, and can easily be made atomic if it isn’t already. There is one critical gotcha, however: if the lambda captures by value, then the this pointer and the lambda must unfortunately be moved atomically together, otherwise an interrupt arriving mid-move could run the stale, moved-from lambda. There is no problem if the lambda captures everything by reference, and you don’t have to provide move semantics for this object if you don’t want to.
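A minimal sketch of point 3, assuming a reference-capturing lambda so that only the registered this pointer needs re-targeting (names illustrative):

```cpp
#include <utility>

// The move operations re-target the registered 'this' pointer.
// Safe as-is only when the lambda captures by reference; a
// by-value lambda would have to be moved together with the
// pointer update, atomically.
struct movable_async {
  static movable_async* active;  // stands in for the static 'this'

  movable_async() { active = this; }
  movable_async(movable_async&&) { active = this; }
  movable_async& operator=(movable_async&&) {
    active = this;
    return *this;
  }
};
movable_async* movable_async::active = nullptr;
```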


In terms of performance overhead, this solution is pretty good: you do have to pay one indirect function pointer call in the static interrupt handler, but after that you jump into your async_t object’s IRQ handler, which has complete compile-time type information about the lambda and can fully inline it.

Memory-wise it is also quite respectable; the static function pointer uses up a few bytes of statically allocated memory, however beyond that you don’t pay anything more than what your lambda uses. Since the lambda is never copied and goes out of scope with its async_t object, its memory is reclaimed as soon as the program acknowledges that the read/write is complete.
