Speed up Rust Builds with Cachepot
One of the most effective ways to speed up Rust builds is to cache the compiled artifacts of crate dependencies. Cargo does this automatically for local builds, but this approach quickly breaks down in distributed scenarios.
In this post, I will share my experiences with configuring and using Cachepot, a tool which wraps the Rust compiler and automatically caches build artifacts using a variety of cloud storage options. This creates a cache which can be shared amongst teams, used in ephemeral CI/CD environments, and even used for distributed builds.
When building Rust crates locally, the simplest option is to use Cargo’s built-in caching functionality. In some cases this approach can also be used in CI/CD pipelines. Many CI/CD tools allow you to manually specify cache paths, and preserve their contents across builds. A few tools are able to configure this automatically, such as the Rust Cache action for GitHub Actions.
Unfortunately Cargo’s local cache is not intended to be used in a distributed
fashion. Additionally, depending on how your CI/CD tool persists these files, it
may not be possible to use them in parallel builds. These limitations likely
drove Mozilla to begin work on
sccache in late 2016.
sccache was designed to be similar to
ccache, but with support for
Rust and cloud storage backends. At the time of this writing, sccache has
become a mature project, and appears to be fairly well known, but is somewhat
notoriously difficult to configure. Recently, the pace of development appears to
have slowed, and a number of critical updates have not been accepted.
One important update is support for Amazon's
signature version 4
for authenticating requests to private S3 buckets. Regions launched after 2013 only
support version 4, and Amazon
stopped supporting version 2
in the remaining regions for buckets created after June 24, 2020. As a result,
it is impossible to use
sccache with buckets created today. Several PRs have
attempted to fix this issue by switching to the
rusoto crate; however, this work
was blocked in favor of waiting for the
new official AWS SDK.
While sccache is still considered to be actively developed,
Parity Technologies has forked the project under the
name Cachepot. This effort appears to have started around April 30, 2021 and to be
led by Igor Matuszewski (@Xanewok) and Bernhard
Schuster (@drahnr). In addition to the S3 patch
mentioned above, Parity says that Cachepot includes “improved security
properties and improvements all-around the code base”, which they share upstream
when possible. Given the impasse I had reached with
sccache, I decided to
give Cachepot a try. For simplicity, I will refer only to Cachepot, but many of
the features I will describe here are
sccache features which Cachepot has inherited.
Installing Cachepot can be done easily using Cargo:
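Assuming you already have a Rust toolchain, the standard `cargo install` flow works (this builds the binary from crates.io, so it takes a few minutes):

```shell
# Build and install the cachepot binary from crates.io.
cargo install cachepot
```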
For use in a CI/CD pipeline, it is preferable to install pre-compiled binaries; however, the repository does not currently include these in its releases. This appears to be coming soon, but for now I had to install locally and then upload the resulting binary to S3 for use in my pipeline.
Cachepot supports a number of backends, including AWS, GCP, and Azure object storage, Redis and Memcached, as well as local storage. Since I use AWS CodeBuild, an S3 bucket in the same region seemed to be the best choice.
- Create an S3 bucket in the region you are building in.
- Create a new IAM user and access credentials for your build jobs.
- Create an IAM policy for your user, which grants access to the bucket:
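A minimal policy only needs object read and write access on the cache bucket. The sketch below writes such a policy document to a file you can attach via the AWS CLI or console; the bucket name is a placeholder, and the exact action list is my assumption of the minimum required:

```shell
# Write a minimal IAM policy document granting object read/write on the
# cache bucket. Replace my-cachepot-bucket with your bucket's name.
cat > cachepot-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-cachepot-bucket/*"
    }
  ]
}
EOF
```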
Note: If you are using
docker build inside of CodeBuild, it may be tempting
to use the IAM role attached to your build job. This can be done, but you must
pass the credentials environment variable as a Docker build argument.
Unfortunately, this completely invalidates
your Docker cache! For this reason, I chose to use static credentials.
Now that you have Cachepot installed and storage provisioned, you can use it by setting just a few environment variables:
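The shape of that configuration is sketched below. The `CACHEPOT_*` variable names follow the Cachepot README at the time of writing (they mirror sccache's `SCCACHE_*` names), so verify them against your version; the bucket, region, and credential values are placeholders:

```shell
# Tell Cargo to invoke the compiler through cachepot.
export RUSTC_WRAPPER=cachepot

# Where to store cached artifacts (placeholder bucket and region).
export CACHEPOT_BUCKET=my-cachepot-bucket
export CACHEPOT_REGION=us-east-1

# Static credentials for the IAM user created earlier (placeholders).
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
```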
That’s it! I believe a lot of people (myself included) are thrown off by parts of the documentation related to starting the Cachepot server. This is not necessary (especially in CI), because Cachepot will automatically start the server if it is not already running.
You can verify that Cachepot is running most easily by checking that files have been written to your S3 bucket.
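For example, with the AWS CLI configured (bucket name again a placeholder), you can list the cache objects directly after a build:

```shell
# A populated cache shows keyed object entries under the bucket.
aws s3 ls s3://my-cachepot-bucket --recursive | head
```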
If you are having trouble getting Cachepot to work, you can enable debug logging
by setting the environment variable
CACHEPOT_LOG=debug. This is the best you
can do in CI, but it does not paint the full picture because the most important
logs (such as S3 authentication errors) will be from the server which is running
as a daemon. To view these, you can build locally with the Cachepot server running
in the foreground:
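The incantation I used looks like the following. The `CACHEPOT_START_SERVER` and `CACHEPOT_NO_DAEMON` variable names are my assumption, carried over from sccache's `SCCACHE_*` equivalents, so check the README for your version:

```shell
# Run the cachepot server in the foreground with debug logging, so errors
# such as S3 authentication failures print to the terminal instead of
# disappearing into a daemon.
CACHEPOT_LOG=debug CACHEPOT_START_SERVER=1 CACHEPOT_NO_DAEMON=1 cachepot
```

Then run `cargo build` from a second terminal and watch the server's output.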
There are a few things to keep in mind when using Cachepot:
- The absolute path of the build directory must match for cache hits to occur. If you wish to share a cache between local and CI, or across developer machines, be sure to use the same absolute path in all cases.
- Cachepot cannot cache “linked” crates, such as “bin” and “proc-macro” crates. Dependencies will never include binary crates, but might include some proc-macro crates.
- Cachepot does not support incremental compilation. In my experience, this is not an issue because the primary goal is to cache dependencies, which are (almost) never compiled incrementally.
- You may want to disable the use of debug information (-C debuginfo=0) to reduce the size of the cached artifacts, and reduce upload/download time.
Note that incremental compilation and debug information are already disabled in the
--release profile, so if you are only building a release binary in CI,
then you may not have to make any changes here!
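If you do want to strip debug information from non-release builds, one way is via `RUSTFLAGS` (setting `debug = 0` in a Cargo profile works equally well):

```shell
# Ask rustc to emit no debug info, shrinking the artifacts that
# cachepot uploads and downloads.
export RUSTFLAGS="-C debuginfo=0"
```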
Cachepot (like sccache) can seem daunting to set up at first, but offers
significant performance improvements in ephemeral build environments. For even
more caching goodness, I was able to use Cachepot with
cargo chef, right out of the
box! I have now been using Cachepot for several weeks, and have not run
into a single issue, with a reduction in average build times of 60%!
For persistent build servers (with plenty of memory), the Redis or Memcached backends can offer an even greater boost in performance. Finally, if you are interested in distributed builds, check out the distributed quickstart guide, which seems ripe for a follow-on post involving Kubernetes!