Embassy is a project to make async/await a first-class option for embedded development

For more information and instructions to


Embassy is the next-generation framework for embedded applications. Write safe, correct and energy-efficient embedded code faster, using the Rust programming language, its async facilities, and the Embassy libraries.

Documentation - API reference - Website - Chat

Rust + async ❤️ embedded

The Rust programming language is blazingly fast and memory-efficient, with no runtime, garbage collector or OS. It catches a wide variety of bugs at compile time, thanks to its full memory- and thread-safety, and expressive type system.

Rust's async/await allows for unprecedentedly easy and efficient multitasking in embedded systems. Tasks are transformed at compile time into state machines that run cooperatively. This requires no dynamic memory allocation and runs on a single stack, so no per-task stack size tuning is required. It removes the need for a traditional RTOS with kernel context switching, and is faster and smaller than one!
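To make the state-machine point concrete, here is a hand-written future of the kind the compiler generates for a task with two `.await` points, driven by a tiny executor. This is a std-only sketch: `Blink`, `noop_waker` and `run` are illustrative names, the code is not generated by rustc verbatim, and unlike a real embedded executor this one spins instead of sleeping.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Hand-written equivalent of a task with two `.await` points:
/// one enum variant per place where the task can be suspended.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Blink {
    Start,
    LedOn,  // suspended at the first await
    LedOff, // suspended at the second await
    Done,
}

impl Future for Blink {
    type Output = &'static str;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let state = self.get_mut(); // allowed because `Blink` is `Unpin`
        match *state {
            Blink::Start => {
                *state = Blink::LedOn;
                cx.waker().wake_by_ref(); // ask to be polled again
                Poll::Pending
            }
            Blink::LedOn => {
                *state = Blink::LedOff;
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            Blink::LedOff => {
                *state = Blink::Done;
                Poll::Ready("blinked")
            }
            Blink::Done => panic!("polled after completion"),
        }
    }
}

/// A waker whose functions do nothing.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    // SAFETY: none of the vtable functions use the (null) data pointer.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

/// Minimal cooperative "executor": polls one task until it completes and
/// returns its output together with the number of polls it took.
fn run<F: Future + Unpin>(mut task: F) -> (F::Output, u32) {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = Pin::new(&mut task).poll(&mut cx) {
            return (out, polls);
        }
    }
}
```

The entire task state here fits in one byte (`size_of::<Blink>() == 1`), which illustrates why no per-task stack has to be sized.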

Batteries included

  • Hardware Abstraction Layers - HALs implement safe, idiomatic Rust APIs to use the hardware capabilities, so raw register manipulation is not needed. The Embassy project maintains HALs for select hardware, but you can still use HALs from other projects with Embassy.

    • embassy-stm32, for all STM32 microcontroller families.
    • embassy-nrf, for the Nordic Semiconductor nRF52, nRF53, nRF91 series.
  • Time that Just Works - No more messing with hardware timers. embassy::time provides Instant, Duration and Timer types that are globally available and never overflow.

  • Real-time ready - Tasks on the same async executor run cooperatively, but you can create multiple executors with different priorities, so that higher priority tasks preempt lower priority ones. See the example.

  • Low-power ready - Easily build devices with years of battery life. The async executor automatically puts the core to sleep when there's no work to do. Tasks are woken by interrupts, there is no busy-loop polling while waiting.

  • Networking - The embassy-net network stack implements extensive networking functionality, including Ethernet, IP, TCP, UDP, ICMP and DHCP. Async drastically simplifies managing timeouts and serving multiple connections concurrently.

  • Bluetooth - The nrf-softdevice crate provides Bluetooth Low Energy 4.x and 5.x support for nRF52 microcontrollers.

  • LoRa - embassy-lora supports LoRa networking on STM32WL wireless microcontrollers and Semtech SX127x transceivers.

  • USB - embassy-usb implements a device-side USB stack. Implementations for common classes such as USB serial (CDC ACM) and USB HID are available, and a rich builder API allows building your own.

  • Bootloader and DFU - embassy-boot is a lightweight bootloader supporting firmware application upgrades in a power-fail-safe way, with trial boots and rollbacks.
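The low-power point (the executor sleeps when there is no work, and interrupts wake tasks) can be illustrated on a host. In this std-only sketch, `thread::park` stands in for the core's sleep instruction (such as WFE on Cortex-M) and a second thread stands in for an interrupt. `Signal`, `WaitForIrq`, `fire` and `block_on` are hypothetical names; this is not Embassy's executor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the executor thread, like an interrupt waking the core.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
    fn wake_by_ref(self: &Arc<Self>) {
        self.0.unpark();
    }
}

/// State shared between the "interrupt handler" and the waiting task.
struct Signal {
    fired: AtomicBool,
    waker: Mutex<Option<Waker>>,
}

/// Future that completes once the simulated interrupt has fired.
struct WaitForIrq(Arc<Signal>);

impl Future for WaitForIrq {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        // Register the waker before checking the flag so a fire that
        // happens in between is never lost.
        *self.0.waker.lock().unwrap() = Some(cx.waker().clone());
        if self.0.fired.load(Ordering::Acquire) {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}

/// What the simulated interrupt handler does: set the flag, wake the task.
fn fire(sig: &Signal) {
    sig.fired.store(true, Ordering::Release);
    if let Some(w) = sig.waker.lock().unwrap().take() {
        w.wake();
    }
}

/// Polls `fut`; whenever it is pending, the thread sleeps (parks) instead
/// of busy-looping. Returns the output and the number of polls.
fn block_on<F: Future + Unpin>(mut fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = Pin::new(&mut fut).poll(&mut cx) {
            return (out, polls);
        }
        thread::park(); // no work to do: sleep until woken
    }
}
```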

Sneak peek


Examples are found in the examples/ folder, separated by the chip manufacturer they are designed to run on. For example:

  • examples/nrf run on the nrf52840-dk board (PCA10056) but should be easily adaptable to other nRF52 chips and boards.
  • examples/stm32xx for the various STM32 families.
  • examples/rp are for the RP2040 chip.
  • examples/std are designed to run locally on your PC.

Running examples

  • Setup git submodules (needed for STM32 examples)

  • Install probe-run with defmt support.

  • Change directory to the sample's base directory. For example:

  • Run the example

For example:

Developing Embassy with Rust Analyzer based editors

Rust Analyzer is used by Visual Studio Code and other editors. Because Embassy serves multiple targets, there is no Cargo workspace file. Instead, Rust Analyzer must be told which project to work with. For Visual Studio Code, refer to the rust-analyzer.linkedProjects setting in the .vscode/settings.json file.

Minimum supported Rust version (MSRV)

Embassy is guaranteed to compile on the latest stable Rust version at the time of release. It might compile with older versions but that may change in any new patch release.

Several features require nightly:

  • The #[embassy::main] and #[embassy::task] attribute macros.
  • Async traits

These are enabled by activating the nightly Cargo feature. If you do so, Embassy is guaranteed to compile on the exact nightly version specified in rust-toolchain.toml. It might compile with older or newer nightly versions, but that may change in any new patch release.

Why the name?

EMBedded ASYnc! :)


This work is licensed under either of

  • Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
  • MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.


Collection of the latest Issues




Hi everyone!

We're trying to set up an interrupt for flash programming with a Nucleo-f767zi, to be triggered when the flash has finished writing (the BSY bit is cleared). So far we've managed to make it work, but it only triggers once.

For now, we're trying the following:

    • Unmask the flash interrupt in the NVIC (using cortex-m).
    • Unlock the flash.
    • Write in the cr register and enable the EOPIE (End Of Operation Interrupt Enable).
    • Erase the flash. (This should trigger an interrupt and run the ISR, and it does.)
    • Write to the flash. (This should also trigger an interrupt and run the ISR, but the interrupt is only triggered; the ISR does not run.)

We know the interrupt is triggered because we are checking via a print.

  • The first interrupt fires and prints as expected.
  • When the write finishes, we print and that works, but the ISR is not called.
  • We can see that the exception is pending, though (via cortex_m::peripheral::NVIC::is_pending(embassy_stm32::pac::Interrupt::FLASH)).
  • We can also see that the EOP bit (end of operation) is set to 1 after both erase and write, as expected.

Is there something we're missing? Any help is appreciated. Thanks!!




It appears the math for the timeout in UarteWithIdle on the nRF is wrong. It attempts to calculate the time for 40 bits of idle, but the result is about half of that, at least at 115200 baud.

At 115200 baud the current timeout is 2788 cycles. Calculated with 0x8000_0000 / (30801920 / 40); 30801920 is Baudrate::BAUD115200 as a u32.

The timeout should be about 5,555 cycles if my math is correct. Math detailed below:

  • 115200 baud is 115200 bits per second
  • take 1 / 115200 to get the seconds per bit
  • multiply by 40 to get the total time in seconds
  • multiply by the timer frequency of 16 million cycles / second to get the number of cycles
  • This should be equal to 5,555.5 cycles
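The arithmetic can be checked directly. The sketch below assumes, as a possible explanation rather than a confirmed diagnosis, that the nRF BAUDRATE register value is roughly `baud * 2^32 / 16 MHz`. Under that assumption the cycle count is `40 * 2^32 / reg`, and the quoted formula's `0x8000_0000` (2^31 instead of 2^32) would account for the factor of two. The function names are hypothetical.

```rust
/// Timer frequency used in the calculation above (16 MHz).
const TIMER_HZ: u64 = 16_000_000;
/// Idle time to detect, in bit periods.
const IDLE_BITS: u64 = 40;

/// Formula as quoted: 0x8000_0000 / (reg / 40).
fn timeout_current(baud_reg: u32) -> u64 {
    0x8000_0000 / (baud_reg as u64 / IDLE_BITS)
}

/// Possible correction, assuming reg ~= baud * 2^32 / 16 MHz,
/// so that cycles = bits * 2^32 / reg.
fn timeout_corrected(baud_reg: u32) -> u64 {
    (IDLE_BITS << 32) / baud_reg as u64
}

/// Ideal value from the steps above: bits / baud * timer frequency.
fn timeout_ideal(baud: u64) -> u64 {
    IDLE_BITS * TIMER_HZ / baud
}
```

With the quoted register value, the corrected formula gives 5577 cycles, slightly above the ideal 5555, because 30801920 corresponds to a baud rate just under 115200.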

I noticed this because the UARTE would randomly drop bytes during usage. I increased the timeout by a massive amount (to 100_000 cycles) to see if it fixed the issue, and it did. Looking at the math, this seems to be the culprit.




We're looking to implement the required traits/executors for Espressif chips. Do you have any pointers on what we'll need to implement in our HALs?

So far it looks like we'll need:

  • Embassy executor for RISCV arch.
  • Embassy executor for Xtensa arch.
  • Add embedded-hal alpha traits.
  • Implement embedded-hal-async where appropriate.



We want to add some form of first-class double-buffering support, to allow endless streaming of data.

Example use cases

  • Streaming samples from ADC
  • Streaming samples to DAC
  • Streaming to/from I2S/SAI


Requirements

  1. Support double-buffering.
  2. The gap between transfers must be as small as possible (ideally none; at most the latency of an IRQ. The latency of a wake is too much.)
  3. There must not be UB even if IRQs are delayed arbitrarily long (DMA must not wrap around and start overwriting the slice the user code is touching).

How to do this?

Satisfying the requirements is tricky. Requirement 3 essentially means we can't use DMA modes that "wrap around by default". For example, with a circular buffer you might do this:

Start read onto a buffer in circular mode
Loop {
    Wait for HTIE, this means the 1st half is filled
    Hand the 1st half to the user, they process it
    Wait for TCIE, this means the 2nd half is filled
    Hand the 2nd half to the user, they process it
}

However, if the user takes too long to process the 1st half, DMA might wrap around and overwrite it from under them -> UB.

Unfortunately I believe it's "fundamentally impossible" to wrap DMA circular mode in a safe rust API :'(

The way we use DMA has to be something like "start writing to buf1, queue a write to buf2. When you're done with buf1 or buf2 tell me. but DO NOT wrap around back to buf1 until I tell you to do so", so if user code takes too long, DMA just stops (and maybe loses data) but there's no UB.

Idea 1: use M0AR/M1AR

There are some interesting ideas around on how to use M0AR/M1AR for this: write a "poison" address (like 0xFFFF_FFFF) for the next buffer to make the DMA error out and stop, then overwrite the poison with the real address when it's safe to continue.

I'm not sure whether this actually works in practice, or, if it does, whether it avoids UB in all cases.


Downsides:

  • Only works when the two buffers have the same length: the hardware has 2 address registers but only 1 length register :(
  • Only available on chips with M0AR/M1AR (F2, F4, F7, H7, L5).

Idea 2: transfer queuing

Add a way in trait Channel to queue transfers. You start one transfer, queue the next. When a transfer finishes, the IRQ handler starts the next transfer if queued. DMA stops if there's no queued transfer.

This allows code (e.g. the ADC hal) to:

  • Start transfer to buf1
  • Queue transfer to buf2
  • When buf1 is filled, hand it to user code, then queue it again
  • When buf2 is filled, hand it to user code, then queue it again
  • Repeat

If user code is slow or IRQs are delayed, DMA loses data but there's no UB.
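The transfer-queuing scheme of Idea 2 can be modeled without hardware. In this hypothetical sketch (`QueuedChannel` is not an existing Embassy type), buffers are plain indices, and the transfer-complete handler starts the queued transfer if there is one and otherwise stops. A slow consumer therefore loses data, but the DMA never wraps into a buffer the user still holds.

```rust
/// Hypothetical model of a DMA channel with a one-deep transfer queue.
/// Buffers are identified by index; nothing here touches real hardware.
#[derive(Debug, Default)]
struct QueuedChannel {
    /// Buffer the DMA is currently writing into, if running.
    active: Option<usize>,
    /// Buffer to switch to when `active` completes.
    queued: Option<usize>,
    /// Completions that found nothing queued: the DMA stopped here and
    /// incoming data is lost, but no buffer is overwritten.
    stalls: u32,
}

impl QueuedChannel {
    /// Start the first transfer.
    fn start(&mut self, buf: usize) {
        assert!(self.active.is_none(), "transfer already running");
        self.active = Some(buf);
    }

    /// Hand a buffer back to the DMA. If the channel had stopped, the
    /// transfer starts right away; otherwise it waits in the queue.
    fn queue(&mut self, buf: usize) {
        if self.active.is_none() {
            self.active = Some(buf);
        } else {
            assert!(self.queued.is_none(), "queue is only one deep");
            self.queued = Some(buf);
        }
    }

    /// What the transfer-complete IRQ handler would do: report the filled
    /// buffer, then start the queued transfer if any, else stop.
    fn on_transfer_complete(&mut self) -> Option<usize> {
        let filled = self.active.take()?;
        self.active = self.queued.take();
        if self.active.is_none() {
            self.stalls += 1;
        }
        Some(filled)
    }
}
```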


Downside:

  • The gap between transfers equals the IRQ latency; it's not zero.

Original discussion in Matrix




As mentioned by @Dirbaio in #640, writing large amounts of data (n > FORCE_COPY_BUFFER_SIZE > EASY_DMA_SIZE) with methods that use EasyDMA (SPIM, UARTE, TWIM) will panic. This happens because the internal buffers cannot hold the amount of data passed in.

Both issues can be remedied by adding another layer on top that splits the data into chunks and transmits them individually.

Originally posted by @Dirbaio in https://github.com/embassy-rs/embassy/issues/640#issuecomment-1049510130
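The chunking layer suggested in the issue boils down to a loop over `chunks()`. This is a synchronous sketch: `write_chunked` and its `send_chunk` callback are hypothetical stand-ins for a driver's single-transfer call (an async driver would `.await` each chunk), and the `EASY_DMA_SIZE` value below is illustrative only, since the real limit is chip-specific.

```rust
/// Maximum bytes per EasyDMA transfer. Illustrative value only; the real
/// limit depends on the chip.
const EASY_DMA_SIZE: usize = 255;

/// Sends `data` through `send_chunk` in pieces no larger than
/// EASY_DMA_SIZE, stopping at the first error.
fn write_chunked<E>(
    data: &[u8],
    mut send_chunk: impl FnMut(&[u8]) -> Result<(), E>,
) -> Result<(), E> {
    for chunk in data.chunks(EASY_DMA_SIZE) {
        send_chunk(chunk)?;
    }
    Ok(())
}
```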




We are using a W5500 Ethernet controller, which uses a slightly different SPI protocol. Instead of only supporting 1/2/4-byte reads and writes, it also supports N-byte reads and writes, likely to make sending and receiving bigger network packets more efficient. For example, sending a packet might look like this:

Reading incoming data looks similar, where the master sends 00 bytes and the W5500 returns a byte of packet data for each 00 byte.

If I want to send a packet, and only send a write operation, the operation seems to hang because there are unread bytes in the receive buffer. My current workaround is just reserving a large buffer and doing a read_write, and then discarding the buffer which filled up with 00 bytes:

I was wondering if it'd be possible to add a method (if there isn't one already that I missed) to only write bytes, while ignoring the bytes sent in response.
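Until such a method exists, the large throwaway buffer can at least be replaced by a small scratch buffer reused chunk by chunk. `FullDuplex` below is a hypothetical trait standing in for whatever read/write transfer the driver offers, and the sketch assumes chip select stays asserted across calls (for example, when CS is driven manually as a GPIO), which a single W5500 frame requires. For reference, the `SpiBus` trait in embedded-hal 1.0 has a `write` method documented as ignoring the incoming words.

```rust
/// Hypothetical stand-in for a full-duplex SPI driver: every byte written
/// clocks one byte back in, and the driver insists on a read buffer.
trait FullDuplex {
    type Error;
    /// Transfers `write.len()` bytes; `read` must have the same length.
    fn transfer(&mut self, read: &mut [u8], write: &[u8]) -> Result<(), Self::Error>;
}

/// Sends `data` and discards whatever the device clocks back, using a
/// small fixed scratch buffer instead of one as large as the packet.
fn write_discarding<S: FullDuplex>(spi: &mut S, data: &[u8]) -> Result<(), S::Error> {
    let mut scratch = [0u8; 64];
    for chunk in data.chunks(scratch.len()) {
        spi.transfer(&mut scratch[..chunk.len()], chunk)?;
    }
    Ok(())
}
```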




Currently, blocking impls are either only enabled when creating drivers with NoDma, or always enabled but not using DMA, which is confusing.

Instead, we should make it so that drivers can work in blocking mode with DMA too. Blocking operations would start DMA then spin-wait for it to finish.


Advantages:

  1. Data keeps being copied even while the current thread is preempted.
  2. It allows mixing blocking and async calls, which can improve code size in some cases.
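The proposed blocking-with-DMA mode (start DMA, then spin-wait for it to finish) can be simulated on a host, with a thread standing in for the DMA engine. `SimDma`, `start_dma` and `blocking_write` are hypothetical names; only the control flow is meant to carry over.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Simulated DMA engine: a destination buffer plus a completion flag.
struct SimDma {
    buf: Mutex<Vec<u8>>,
    done: AtomicBool,
}

/// "Hardware" side: copies `src` byte by byte on another thread, then
/// sets the completion flag.
fn start_dma(dma: Arc<SimDma>, src: Vec<u8>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for b in src {
            dma.buf.lock().unwrap().push(b);
        }
        dma.done.store(true, Ordering::Release);
    })
}

/// Blocking API built on DMA: start the transfer, then spin until done.
fn blocking_write(src: Vec<u8>) -> Vec<u8> {
    let dma = Arc::new(SimDma { buf: Mutex::new(Vec::new()), done: AtomicBool::new(false) });
    let hw = start_dma(Arc::clone(&dma), src);
    // The CPU only waits here; the "DMA" keeps making progress even if
    // this thread is preempted.
    while !dma.done.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    hw.join().unwrap();
    let out = dma.buf.lock().unwrap().clone();
    out
}
```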



Recently we've unified most basic drivers, like usart and spi, across different peripheral versions, adding cfgs as needed. This turned out to be a good thing, considerably reducing code duplication.

i2c v1 and v2 look quite different, but perhaps it's still worth it to try to unify them.




I believe our driver has the same bug.


ryan-summers: We found an issue in the H7 where the ethernet peripheral has by-default enabled counters that trigger an ISR whenever they're half full, so after you get 2^31 good packets, the ISR flips and your ETH handler now never exits
ryan-summers: That was a fun one to debug when it only failed after ~5 days of continuous heavy data livestreaming :^)
ryan-summers: There's a workaround in the H7 HAL now, I think it got released in 0.11
ryan-summers: Well, by workaround I mean we disable the ISRs
ryan-summers: Ref https://github.com/stm32-rs/stm32h7xx-hal/pull/295




I'm seeing a weird problem when I try to interface two STM32 Nucleo boards using a UART.

  • The application consists of a server and a client that send strings to each other in non-blocking mode. (https://github.com/titanclass/embassy-start)

  • I have wired up the boards with common ground and just an RX/TX cable. It's one Nucleo H7432zi board and one F767zi board.

  • I'm using USART2 with DMA.

When running in debug mode I see sporadic errors: sometimes the server receives the string with a leading zero, and sometimes the client receives nothing.

I have been using the trace log level to check what data goes in and out. The receiving side seems to get some garbage, while the data looks fine when it is sent.

When running in release mode, or in blocking mode, I have not seen these errors.

It feels like a timing or baud rate error.



Client: embassy-client




It might be useful to read only the data that the BufferedUart accumulated, without waiting for any data to come in.

I have found the following way to do this:

It basically creates a no-op waker, polls the read_byte future once and then drops it, cancelling the async operation. It works, but requires quite a lot of boilerplate and the executor-agnostic feature of embassy, which ¿might? produce less efficient code.

It would be nice to have a dedicated function for reading the buffer without touching the UART hardware.
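The workaround described in the issue (poll the future once with a no-op waker, then drop it) looks roughly like this in plain std Rust. `noop_waker` and `poll_once` are illustrative helpers, not Embassy APIs; whether dropping a particular driver's future after a single poll is safe depends on that driver's cancellation behavior.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// A waker whose functions do nothing.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    // SAFETY: none of the vtable functions use the (null) data pointer.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

/// Polls `fut` exactly once. Returns `Some(output)` if it was already
/// ready, otherwise `None`; the future is then dropped, cancelling it.
fn poll_once<F: Future + Unpin>(mut fut: F) -> Option<F::Output> {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    match Pin::new(&mut fut).poll(&mut cx) {
        Poll::Ready(v) => Some(v),
        Poll::Pending => None,
    }
}
```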




It loops forever here

while self.check_and_clear_error_flags()?.sb() == i2c::vals::Sb::NOSTART {}

Probably because the chip doesn't confirm that it has generated a START condition when no pull-ups are used?

This can probably be handled the way stm32f1xx-hal does it, by always putting a bound on busy waits. Maybe even port the implementation :thinking:
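A bounded busy-wait could look like the sketch below. It does not reproduce stm32f1xx-hal's implementation; `wait_bounded` and `Timeout` are hypothetical, and in the I2C driver the closure would check the SB flag instead of the test counter.

```rust
/// Error returned when a busy-wait gives up (hypothetical type).
#[derive(Debug, PartialEq)]
struct Timeout;

/// Spins until `done()` returns true, giving up after `max_polls` checks.
/// Real code might count timer ticks instead of loop iterations.
fn wait_bounded(max_polls: u32, mut done: impl FnMut() -> bool) -> Result<(), Timeout> {
    for _ in 0..max_polls {
        if done() {
            return Ok(());
        }
        std::hint::spin_loop();
    }
    Err(Timeout)
}
```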




Currently one can pass any DMA channel to Uart::new, regardless of the hardware requirements. However, no DMA functions will be available on the returned type, as the uart::{Write, Read} trait implementations do constrain the DMA channels to be correct.

The current situation is not ideal IMO; a compile-time error would be nicer. I'm not sure whether it can be expressed through Rust's type system, though.
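One way to express it is with trait bounds: implement a marker trait only for the (DMA channel, peripheral) pairs the hardware supports, and require it in the constructor. All types below are hypothetical, and the channel mapping is made up for illustration rather than taken from a datasheet.

```rust
/// Marker types for a peripheral and DMA channels (all hypothetical).
struct Usart2;
struct DmaCh5;
struct DmaCh6;

/// Implemented only for channels that can serve `U`'s TX / RX requests.
trait TxDma<U> {}
trait RxDma<U> {}

// Made-up mapping for illustration; a real HAL would generate this from
// the chip's DMA request tables.
impl TxDma<Usart2> for DmaCh6 {}
impl RxDma<Usart2> for DmaCh5 {}

struct Uart<U, Tx, Rx> {
    _uart: U,
    _tx: Tx,
    _rx: Rx,
}

impl<U, Tx: TxDma<U>, Rx: RxDma<U>> Uart<U, Tx, Rx> {
    /// Passing a channel that cannot serve `U` fails to compile.
    fn new(uart: U, tx: Tx, rx: Rx) -> Self {
        Uart { _uart: uart, _tx: tx, _rx: rx }
    }
}
```

With this, `Uart::new(Usart2, DmaCh5, DmaCh6)` is rejected at compile time because `DmaCh5` does not implement `TxDma<Usart2>`. The marker types are zero-sized, so the check has no runtime cost.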




There are two watchdogs, WDT0 and WDT1, while the existing driver assumes only one, named WDT. It needs some refactoring, but should be easy.

Information - Updated May 27, 2022

Stars: 679
Forks: 94
Issues: 56


Boilerplate for embedded development with Rust, with automatic flashing and debugging


Embedded rust HAL (hardware abstraction layer) for the STM32WL

This is a work in progress, it is unstable, incomplete, and (mostly) untested


Embedded Rust Template

This template is based on stm32f4xx-hal


A Rust embedded-hal HAL for all MCUs in the STM32 F7 family

This crate is largely inspired by the awesome work done here:



For use with the AnyLeaf pH and RTD sensors in Rust on embedded systems, and single-board computers



An embedded rust no_std driver for the AHT20 temperature and humidity sensor, forked from Anthony Romano's docs



Hardware abstraction layer - abstraction layer


cargo-pio = Cargo + PlatformIO

Build Rust embedded projects with PlatformIO!
