Stop worrying about blocking: the new async-std runtime, inspired by Go

Note: This blog post describes a proposed scheduler for async-std that did not end up being merged, for several reasons. We still believe the concept holds important insights and may circle back to it in the future, so we are keeping it online as is.

async-std is a mature and stable port of the Rust standard library to its new async/await world, designed to make async programming easy, efficient, worry- and error-free.

We announced async-std on August 16th - exactly 4 months ago. Our focus for the initial release was providing a stable and reliable API for users to build async applications on, modelled after the Rust standard library. It came with a number of innovative implementations: the first JoinHandle-based task API and single-allocation tasks.

Today, we’re introducing the new async-std runtime. It features a lot of improvements, but the main news is that it eliminates a major source of bugs and performance issues in concurrent programs: accidental blocking.

In summary:

  • The new runtime is really fast and outperforms the old one.
  • The new runtime is universal in the sense that it adapts to different workloads automatically, becoming single-threaded or multi-threaded on demand. If a single thread can handle all the work, we don’t pay the price of work-stealing.
  • The new runtime is conceptually simpler, by removing the split between the non-blocking and the blocking threadpool.
  • The new runtime detects blocking automatically. We don’t need spawn_blocking anymore and can simply deprecate it.
  • The new runtime makes blocking efficient. Rather than always paying the cost of spawn_blocking, we only offload blocking work to a separate thread if the work really blocks.

The new task scheduling algorithm and the blocking strategy adapt ideas from the Go runtime to Rust.

The problem of blocking

Until today, a constant source of frustration when writing async programs in Rust was the possibility of accidentally blocking the executor thread when running a task. To avoid that problem, async-std used to provide the unstable spawn_blocking function that moves potentially blocking work onto a special thread pool so that the executor thread can keep making progress.

However, spawn_blocking is not a bulletproof solution because we have to remember to invoke it manually every time we expect to block. But it is very difficult to reliably predict which code can block. Programmers have to carefully separate async and blocking code, an infamous problem discussed in the blog post What Color Is Your Function?

Separating code gets even harder when you consider that operations that may block the current executor thread include plain expensive computations, such as transforming an image or sorting lots of data. Conversely, sometimes we'll pessimistically assume some code blocks while in practice it is very quick, which means we're paying the price of spawn_blocking even when we don't have to.

With the new async-std runtime, such difficulties become a thing of the past: should a task execute for too long, the runtime will automatically react by spawning a new executor thread that takes over the current thread's work. This strategy eliminates the need for separating async and blocking code.

A concrete example

To illustrate what all this means in practice, let’s use std::fs::read_to_string as an example of a blocking operation. The async version of it used to be implemented as follows:

async fn read_to_string<P: AsRef<Path>>(path: P) -> io::Result<String> {
    let path = path.as_ref().to_owned();
    spawn_blocking(move || std::fs::read_to_string(path)).await
}

Note two crucial things:

  • We call spawn_blocking to isolate the blocking operation.
  • We always clone the path and execute the blocking operation on a separate threadpool, even if the file is cached and will be read in an instant.

The new runtime relieves you of these concerns and allows you to do the blocking operation directly inside the async function:

async fn read_to_string(path: impl AsRef<Path>) -> io::Result<String> {
    std::fs::read_to_string(path)
}

The runtime measures the time it takes to perform the blocking operation and if it takes a while, a new thread is automatically spawned and replaces the old executor thread. That way, only the current task is blocked on the operation rather than the whole executor thread or the whole runtime. If the blocking operation is quick, we don’t spawn a new thread and therefore no additional cost is inflicted.

If you still want to make sure the operation runs in the background and doesn’t block the current task either, you can simply spawn a regular task and do the blocking work inside it:

async fn read_to_string(path: impl AsRef<Path>) -> io::Result<String> {
    let path = path.as_ref().to_owned();
    spawn(async move { std::fs::read_to_string(path) }).await
}

Note the use of spawn instead of spawn_blocking.

Web frameworks commonly need asynchronous I/O while performing substantial work per request. The new runtime enables you to fearlessly use synchronous libraries like diesel or rayon in any async-std application.

Benchmarks

In our initial tests, the new scheduler performs better than our old one while staying small and easy to understand.

The following benchmark was run on two EC2 instances. A minihttp server based on the old runtime (master branch) and the new runtime (new-scheduler branch) runs on an m5a.8xlarge instance, while wrk (a benchmarking tool) runs on a separate m5a.16xlarge instance.

The benchmark has three different scenarios with different arguments passed to wrk. Option -t configures the number of threads, -c the number of TCP connections, and -d the duration of the benchmark in seconds. See the readme for more details on how to run the benchmark.

Graph of the benchmark results. wrk -t1 -c50 -d10: the new scheduler is 2x faster. wrk -t10 -c50 -d10: 5x faster. wrk -t1 -c500 -d10: 15x faster.

The new runtime is faster across the board and scales much better to the available resources.

Small and well-documented

The new runtime is small, uses no unsafe code, and is fully documented. Please take a look at the source to see how it works, and feel free to ask questions on the pull request! There are still plenty of optimization opportunities, and we will continue to blog about those details!

Trying it out

To try out the new scheduler before release, modify your Cargo.toml this way:

async-std = { git = 'https://github.com/stjepang/async-std', branch = 'new-scheduler' }

Please report on your experiences - and report potential bugs!

Summary

The new async-std runtime relieves programmers of the burden of isolating blocking code from async code. You simply don’t have to worry about it anymore.

The adaptive nature of the new runtime allows it to use less resources when multithreading does not bring any benefit. This improves performance in CLI tools and lowers the latency in web servers. At the same time, the runtime will scale up to use all available resources during intense workloads.

All these changes make it easier to write async programs while also making them more efficient and reliable!

We would like to thank all contributors to async-std, big and small, new and long-term and all the library authors building great stuff for the Rust async ecosystem!


Announcing async-std 1.0

async-std is a port of Rust’s standard library to the async world. It comes with a fast runtime and is a pleasure to use.

We’re happy to finally announce async-std 1.0. As promised in our first announcement blog post, the stable release coincides with the release of Rust 1.39, the release adding async/.await. We would like to thank the active community around async-std for helping get the release through the door.

The 1.0 release of async-std indicates that all relevant APIs are in place. Future additions will be made on these stable foundations.

Why async-std?

There are five core values behind building async-std:

Stability

The Rust async ecosystem has been in flux and has seen a lot of churn during the last three years. async-std takes the experience gained during that time, especially from building crossbeam and tokio, and wraps it into a package with strong stability guarantees. We are committed to lowering churn in the fundamental parts of the ecosystem.

Ergonomics

async-std should be easy to use and understand, providing a clear path to solving the problem at hand. It does so by relying on familiar and proven interfaces from the standard library, combined with an API surface that addresses all concerns related to async/.await in a single dependency.

Accessibility

async-std is an accessible project. As a start, it comes with full documentation of all functions, along with a book. We welcome contributions and especially like to assist people in writing additional supporting libraries.

Integration

async-std wants to integrate well into the wider ecosystem and is compatible with all libraries based on futures-rs. We believe futures-rs is the cornerstone of the async Rust ecosystem because it allows libraries to be implemented independently of executors.

Speed

async-std does not compromise on speed: it ships a fast executor that will be constantly improved over time and tweaked with incoming production feedback. async-std's goal is to ship an executor that gives great performance out of the box without the need for tuning.

Stability guarantees

What does 1.0 mean? All API surface that is not feature-gated is now committed and documented, and users are encouraged to rely on its stability. We will continue to add features in subsequent releases over the coming weeks.

These improvements will follow familiar patterns: a period of being feature gated through the unstable feature and then stabilisation.

Due to language changes coming down the line (mostly async closures and async streams), there is a high likelihood of a 2.0 release in the future. In that case, the 1.0 line will continue to be maintained, and we'll provide upgrade instructions for a smooth transition.

Highlights of async-std

Easy to get started

The async-std interface makes it easy to start writing async programs because it uses a familiar API. This is the classic file-reading example from the stdlib:

use std::fs::File;
use std::io::{self, Read};

fn read_file(path: &str) -> io::Result<String> {
    let mut file = File::open(path)?;
    let mut buffer = String::new();
    file.read_to_string(&mut buffer)?;
    Ok(buffer)
}

With async-std, all that's needed is to replace std with async_std, add the prelude, and sprinkle in a few .awaits:

use async_std::prelude::*;
use async_std::fs::File;
use async_std::io;

async fn read_file(path: &str) -> io::Result<String> {
    let mut file = File::open(path).await?;
    let mut buffer = String::new();
    file.read_to_string(&mut buffer).await?;
    Ok(buffer)
}

The only other addition is the prelude import.

The task system

async-std comes with an innovative task system found in the async_std::task module, shipping with an interface similar to std::thread.

use async_std::task::{self, JoinHandle};
use std::time::Duration;

fn main() {
    task::block_on(async {
        let checking: JoinHandle<()> = task::spawn(async {
            task::sleep(Duration::from_millis(1000)).await;
        });

        checking.await;
    });
}

The JoinHandle makes it easy to spawn tasks and retrieve their results in a uniform fashion. Each task is created with a single allocation, which makes spawning quick and efficient. JoinHandles are themselves futures, so you can await them directly to wait for task completion.

Futures-aware sync module

async-std ships with a number of futures-aware types in the async_std::sync module. An example:

use async_std::sync::{Arc, Mutex};
use async_std::task;

fn main() {
    task::block_on(async {
        let m1 = Arc::new(Mutex::new(10));
        let m2 = m1.clone();

        task::spawn(async move {
            *m1.lock().await = 20;
        })
        .await;

        assert_eq!(*m2.lock().await, 20);
    });
}

Note the await after lock. The main difference between the futures-aware Mutex and the std one is that locking becomes an awaitable operation: the task is descheduled until the lock becomes available.

A fully documented API surface

async-std comes with complete documentation of all available modules. We invite you to take a close look around and learn the finer details! With async/.await stable now, we want to make sure that you are fully informed on how to use it.

We also offer a book, which we will continuously expand.

The best way to use futures-rs

async-std relies on futures-rs for interfacing with other libraries and components. It re-exports the Stream, AsyncRead, AsyncWrite, and AsyncSeek traits in its standard interface and fully relies on futures-rs to define its types.

All async-std types can be used both directly and through the generic interfaces, making them play well with the wider ecosystem. For an example of what library development on top of async-std can look like, have a look at async-tls, a TLS library that works with any futures-rs-compatible library.

Benchmarks

Over the last weeks, we got a lot of requests for comparative benchmarks. We believe there is currently a hyperfocus on benchmarks over ergonomics and integration in some Rust spaces and don’t want to enter the benchmark game. Still, we think it is useful for people to know where we currently stand, which is why we wanted to publish some rough comparative numbers. Posting benchmarks usually leads to other projects improving theirs, so see those numbers as the ballpark we are playing in.

Mutex benchmarks

The speed of our concurrent structures can be compared against a number of implementations. Please note that futures-intrusive in particular offers several configuration options, so we tested against a similarly tuned Mutex.

async_std::sync::Mutex:

contention    ... bench:     893,650 ns/iter (+/- 44,336)
create        ... bench:           4 ns/iter (+/- 0)
no_contention ... bench:     386,525 ns/iter (+/- 368,903)

futures_intrusive::sync::Mutex with default Cargo options and with is_fair set to false:

contention    ... bench:   1,968,689 ns/iter (+/- 303,900)
create        ... bench:           8 ns/iter (+/- 0)
no_contention ... bench:     431,264 ns/iter (+/- 423,020)

tokio::sync::Mutex:

contention    ... bench:   2,614,997 ns/iter (+/- 167,533)
create        ... bench:          24 ns/iter (+/- 6)
no_contention ... bench:     516,801 ns/iter (+/- 139,907)

futures::lock::Mutex:

contention    ... bench:   1,747,920 ns/iter (+/- 149,184)
create        ... bench:          38 ns/iter (+/- 1)
no_contention ... bench:     315,463 ns/iter (+/- 280,223)

async_std::sync::Mutex is much faster under contention - at least 2x faster than any other implementation - while performing on par with the competition under no contention.

Task benchmarks

The benchmarks test the speed of:

  • Tasks spawning other tasks
  • Tasks sending a message back and forth
  • Spawning many tasks
  • Spawning a number of tasks and frequently waking them up and shutting them down

name           tokio (ns/iter)   async-std (ns/iter)   speedup

chained_spawn  123,921           119,706               1.04x
ping_pong      401,712           289,069               1.39x
spawn_many     5,326,354         3,149,276             1.69x
yield_many     7,640,958         3,919,748             1.95x

async-std is up to twice as fast as tokio when spawning tasks.

You can find the benchmark sources here: https://github.com/matklad/tokio/

Run them using:

$ git checkout async-std-1.0-bench
$ cargo bench --bench thread_pool
$ cargo bench --bench async_std

NOTE: There were originally build issues with the branch of tokio used for these benchmarks. The repository has since been updated, and a git tag labelled async-std-1.0-bench has been added, capturing the specific nightly toolchain and the Cargo.lock of dependencies used for reproduction.

Summary

We present these benchmarks to illustrate that async-std does not compromise in performance. When it comes to the core primitives, async-std performance is as good or better than its competitors.

Note that these are microbenchmarks and should always be checked against the behaviour of your actual application. For example, an application with little mutex contention will not benefit from a faster mutex.

Recognition

Since our first release, 59 people have contributed code, documentation fixes, and examples to async-std. We want to specifically highlight some of them:

  • taiki-e for keeping dependencies up to date, setting up continuous integration, and writing amazing crates like pin-project that make writing async libraries so much easier
  • k-nasa for work contributing stream combinators and a lot of other pull requests
  • montekki for implementing stream combinators and bringing Stream close to parity with Iterator
  • zkat for early testing, benchmarks, advice, and cacache, the first library written on top of async-std
  • sunjay for authoring almost 60 FromStream implementations, making our collect method as easy to use as std's
  • Wassasin for work on streams and implementing the path module
  • dignifiedquire for early testing, continuous feedback, implementing some async trait methods, as well as core async primitives such as Barrier
  • felipesere for their work on stream adapters
  • yjhmelody for their work on stream adapters

Thank you! ❤

Upcoming Features

Many exciting new features are currently behind the unstable feature gate. They are mainly there for final API review and can be used in production.

Fast channels

async-std implements fast async MPMC (Multiple Producer, Multiple Consumer) channels based on the experience gained in crossbeam.

use std::time::Duration;

use async_std::sync::channel;
use async_std::task;

fn main() {
    task::block_on(async {
        let (s, r) = channel(1);

        // This call returns immediately because there is enough space in the channel.
        s.send(1).await;

        task::spawn(async move {
            // This call will have to wait because the channel is full.
            // It can only complete after the first message is received.
            s.send(2).await;
        });

        task::sleep(Duration::from_secs(1)).await;
        assert_eq!(r.recv().await, Some(1));
        assert_eq!(r.recv().await, Some(2));
    });
}

MPMC channels cover all the important use cases naturally, including the common multiple-producer, single-consumer case.

All async-std channels are bounded, which means the sender has to wait if the channel is at capacity, giving natural backpressure handling.

More task spawning APIs

The task module has been extended with the spawn_blocking and yield_now functions which are now up for stabilisation.

spawn_blocking allows you to spawn tasks that are known to block the thread they run on (the current executor thread).

yield_now allows long-running computations to actively interrupt themselves during execution, cooperatively yielding time to other tasks.

Conclusion

In this post, we have presented the ergonomics, performance characteristics, and stability guarantees of async-std. We want to spend the next few weeks on the following tasks:

  • Holidays
  • Stabilizing unstable APIs at a regular cadence
  • Filling remaining API gaps
  • Extending the book, especially around general usage patterns
  • Starting to work on additional ecosystem libraries, for example async-tls

async-std is funded by Ferrous Systems and Yoshua Wuyts personally. If you'd like to support further development and keep it sustainable, we have an OpenCollective page. Thank you!

We're incredibly happy to bring async-std to stability. We hope you enjoy building on top of it as much as we enjoyed building it! async-std is a step forward for async/.await ergonomics in Rust and enables you to build both fast and maintainable asynchronous Rust programs.

