Optimizing the memory allocator when deploying Rust projects with musl

When I used to build web applications in Go, I liked deploying them on Alpine images because they are very small. Alpine uses musl as its C standard library implementation.

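With Rust, however, I had never actually used an Alpine image. The Rust projects I have worked on were fairly complex and had more troublesome dependencies, so cross-compiling them to x86_64-unknown-linux-musl was not very practical. For example, openssl is a very common dependency among Rust crates. If a crate does not offer a rustls option, you have to set up an OpenSSL build environment to compile it. That is a hassle, so I took the lazy route and used an Ubuntu image.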

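I had noticed that some Rust projects deployed on Alpine pull in a third-party memory allocator such as jemalloc. I never quite understood why, until I recently read articles describing performance problems in musl's default memory allocator.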

Benchmark

To see how serious the problem is, I ran a test of my own.

Hardware: AMD Ryzen 9 7945HX3D
OS: Ubuntu 24.04
Build image: ghcr.io/rust-cross/rust-musl-cross:x86_64-unknown-linux-musl
Runtime image: gcr.io/iguazio/alpine:3.20

The comparison is between musl's default allocator and mimalloc.

Both builds use the same test code:

use std::time::Instant;

fn main() {
    println!("=== Memory Allocator Benchmark ===");
    
    // Test parameters
    let num_threads = std::thread::available_parallelism().map_or(8, |x| x.get());
    let iterations = 100000;
    
    println!("Threads: {}, Iterations per thread: {}", num_threads, iterations);
    
    // Run the benchmark
    let start = Instant::now();
    
    let mut handles = vec![];
    for _ in 0..num_threads {
        let handle = std::thread::spawn(move || {
            let mut counter = 0;
            for _ in 0..iterations {
                let data = vec![1u8; counter % 1000 + 1]; // allocate a buffer whose size varies from 1 to 1000 bytes
                counter += usize::from(data.get(100).copied().unwrap_or(1));
            }
            counter
        });
        handles.push(handle);
    }
    
    let mut total_counter = 0;
    for handle in handles {
        total_counter += handle.join().unwrap();
    }
    
    let duration = start.elapsed();
    
    println!("Total counter: {}", total_counter);
    println!("Time elapsed: {:.2?}", duration);
    println!("Throughput: {:.2} operations/sec", 
             (num_threads * iterations) as f64 / duration.as_secs_f64());
}

The only difference is that the mimalloc build adds this dependency to Cargo.toml:

[dependencies]
mimalloc = { version = "0.1.48", features = ["secure","v3"] }

and registers the allocator in main.rs:

// Use mimalloc as the global allocator, but only when building for a musl target
#[cfg_attr(target_env = "musl", global_allocator)]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
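The `#[global_allocator]` attribute is what makes this a small change: every heap allocation in the program, including those made by `Vec` and `Box`, goes through the static it marks. As a minimal std-only sketch of that mechanism (not the mimalloc crate's actual code), the hypothetical `CountingAlloc` below forwards to `std::alloc::System`, which on a musl target is musl's own malloc, and counts allocations along the way. `cfg!(target_env = "musl")` reports whether the binary was built for musl, the same condition the `cfg_attr` above checks.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical wrapper allocator: forwards every call to the system allocator
// (musl's malloc on a musl target) and counts how many allocations happen.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    // realloc and alloc_zeroed use the trait's default methods,
    // which call alloc above, so they are counted as well.
}

// Registered unconditionally in this sketch; the article's version uses
// #[cfg_attr(target_env = "musl", global_allocator)] to apply it only on musl.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocation_count();
    // black_box discourages the optimizer from removing this allocation
    let v = std::hint::black_box(vec![1u8; 1000]);
    let after = allocation_count();
    println!("built for musl: {}", cfg!(target_env = "musl"));
    println!("allocations while building the Vec: {}", after - before);
    drop(v);
}
```

Swapping `CountingAlloc` for `mimalloc::MiMalloc` is essentially what the three lines above do: the rest of the program does not change.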

The results:

benchmark-test
=== Standard Allocator (musl default) ===
=== Memory Allocator Benchmark ===
Threads: 32, Iterations per thread: 100000
Total counter: 3200000
Time elapsed: 7.12s
Throughput: 449749.72 operations/sec

=== Mimalloc Allocator ===
=== Memory Allocator Benchmark (with mimalloc) ===
Threads: 32, Iterations per thread: 100000
Allocator: mimalloc with secure+v3 features
Total counter: 3200000
Time elapsed: 660.86ms
Throughput: 4842149.67 operations/sec

Comparison of results

Metric	Standard allocator (musl default)	mimalloc (secure+v3 features)	Improvement
Total operations	3,200,000	3,200,000	Same
Elapsed time	7.12 s	660.86 ms (0.66 s)	~10.8x
Throughput	449,749.72 ops/s	4,842,149.67 ops/s	~10.8x

Analysis

How much mimalloc improves on the standard allocator:

Speed: about 10.8x faster (7.12 s vs 0.66 s)
Throughput: about 10.8x higher (4,842,149 vs 449,749 ops/s)
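The ratios follow directly from the logged numbers: speedup is the baseline time divided by the new time, and throughput is total operations (32 threads x 100,000 iterations) divided by elapsed time. A small sketch of that arithmetic with the values from the secure+v3 run; `speedup` and `throughput` are just illustrative helper names. Throughput recomputed from the rounded 7.12 s can differ slightly from the logged value, because the benchmark divides by the unrounded duration.

```rust
/// How many times faster the new run is than the baseline.
fn speedup(baseline_secs: f64, new_secs: f64) -> f64 {
    baseline_secs / new_secs
}

/// Operations per second for a run of `ops` operations.
fn throughput(ops: f64, secs: f64) -> f64 {
    ops / secs
}

fn main() {
    let ops = 32.0 * 100_000.0; // 32 threads x 100,000 iterations
    // Elapsed times as printed by the two runs with the secure feature enabled
    let musl = 7.12;
    let mimalloc = 0.66086;
    println!("speedup: {:.1}x", speedup(musl, mimalloc));
    println!("musl throughput: {:.0} ops/s", throughput(ops, musl));
    println!("mimalloc throughput: {:.0} ops/s", throughput(ops, mimalloc));
}
```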

The test above was run with mimalloc's secure feature enabled. With the secure feature removed, the results were:

=== Standard Allocator (musl default) ===
=== Memory Allocator Benchmark ===
Threads: 32, Iterations per thread: 100000
Total counter: 3200000
Time elapsed: 7.05s
Throughput: 453651.14 operations/sec

=== Mimalloc Allocator ===
=== Memory Allocator Benchmark (with mimalloc) ===
Threads: 32, Iterations per thread: 100000
Allocator: mimalloc with v3 feature
Total counter: 3200000
Time elapsed: 19.62ms
Throughput: 163117709.56 operations/sec

Comparison:

Metric	Standard allocator (musl default)	mimalloc (v3 feature only)	Improvement
Total operations	3,200,000	3,200,000	Same
Elapsed time	7.05 s	19.62 ms (0.01962 s)	~359x
Throughput	453,651.14 ops/s	163,117,709.56 ops/s	~360x

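This result is even more extreme, to the point that I wondered whether the test code was flawed. Still, the result is probably reasonable: someone who ran a similar comparison on a 48-core machine reported a 700x difference.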
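If you want to double-check that the timings measure real allocations rather than work the optimizer has removed, one option is to pass each buffer through `std::hint::black_box`. The standard library documents `black_box` as a best-effort hint rather than a guarantee, so this is a sanity check, not proof. The sketch below repeats the benchmark loop in a hypothetical `alloc_loop` helper with that change applied.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Same allocation loop as the benchmark, but the length and the buffer are
/// passed through black_box so the compiler cannot easily prove them unused.
fn alloc_loop(iterations: usize) -> usize {
    let mut counter = 0usize;
    for _ in 0..iterations {
        let len = black_box(counter % 1000 + 1); // buffer size varies from 1 to 1000 bytes
        let data = black_box(vec![1u8; len]);
        counter += usize::from(data.get(100).copied().unwrap_or(1));
    }
    counter
}

fn main() {
    let threads = std::thread::available_parallelism().map_or(8, |n| n.get());
    let iterations = 100_000;

    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| std::thread::spawn(move || alloc_loop(iterations)))
        .collect();
    let total: usize = handles.into_iter().map(|h| h.join().unwrap()).sum();
    let elapsed = start.elapsed();

    println!("Total counter: {total}");
    println!("Time elapsed: {elapsed:.2?}");
    println!(
        "Throughput: {:.2} operations/sec",
        (threads * iterations) as f64 / elapsed.as_secs_f64()
    );
}
```

Building this once with and once without the mimalloc lines gives a second data point for the same comparison.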

So if you deploy Rust projects with musl, use a third-party memory allocator such as mimalloc whenever you can.

Minimizing the deployment image

Alpine images are popular because they are tiny. But if the project is simple enough, you can go further and use the empty scratch image, for example:

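# Multi-stage build that produces a fully static binary
FROM ghcr.io/rust-cross/rust-musl-cross:x86_64-unknown-linux-musl AS builder

# Set the working directory
WORKDIR /app

# Copy the Cargo config (used here to point at a mirror in China for faster downloads)
COPY .cargo/config.toml /root/.cargo/config.toml

# Copy the Cargo manifests first so the dependency layer can be cached
COPY Cargo.toml Cargo.lock ./

# Build a dummy main.rs to download and compile the dependencies in advance
RUN mkdir src && \
    echo "fn main() {}" > src/main.rs && \
    cargo build --release --target x86_64-unknown-linux-musl && \
    rm -rf src

# Copy the source code
COPY src/ ./src/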

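# Build the application for the musl target. Touching main.rs first makes sure
# Cargo rebuilds it instead of reusing the dummy binary (COPY keeps the source mtime).
RUN touch src/main.rs && cargo build --release --target x86_64-unknown-linux-musl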

# Runtime stage: use the empty scratch image to minimize image size
FROM scratch AS runtime

# Copy the CA certificates from the build image
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/

# Copy the binary from the build stage
COPY --from=builder /app/target/x86_64-unknown-linux-musl/release/tinyserver /tinyserver

# Expose the port
EXPOSE 3000

# Start the application
CMD ["/tinyserver"]

With this approach, the image is roughly the same size as the binary itself.

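The caveat is that you need to judge whether the project is simple enough; otherwise you may run into unexpected runtime errors. To be on the safe side, Alpine is the more reliable choice.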
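For a sense of what "simple enough" can mean, here is a hypothetical sketch of a `tinyserver` that would fit the scratch image above: it uses only the standard library, links no OpenSSL, and reads no files at runtime. The binary name and port 3000 come from the Dockerfile; everything else is an assumption for illustration. It binds to `0.0.0.0` rather than `127.0.0.1`, because a listener on the loopback address inside the container is not reachable through a published port.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};

/// Builds a minimal HTTP/1.1 response with a plain-text body.
fn http_response(body: &str) -> String {
    format!(
        "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        body.len(),
        body
    )
}

/// Reads the request line and headers up to the blank line, ignores them,
/// and writes a fixed response.
fn handle(mut stream: TcpStream) -> std::io::Result<()> {
    let mut reader = BufReader::new(stream.try_clone()?);
    let mut line = String::new();
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 || line.trim_end().is_empty() {
            break;
        }
    }
    stream.write_all(http_response("hello from scratch\n").as_bytes())
}

fn main() -> std::io::Result<()> {
    // 0.0.0.0 so the port published from the container is reachable;
    // 3000 matches the EXPOSE line in the Dockerfile.
    let listener = TcpListener::bind("0.0.0.0:3000")?;
    for stream in listener.incoming() {
        if let Err(e) = stream.and_then(handle) {
            eprintln!("connection error: {e}");
        }
    }
    Ok(())
}
```

Once a project starts depending on things like system certificates, time zone data, or OpenSSL, each of those has to be copied into the scratch image by hand, which is where the runtime surprises tend to come from.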