Let’s take a look at the following code:
let vb = unsafe {
VarBuilder::from_mmaped_safetensors(&["./model.safetensors"], DType::F32, &Device::Cpu)?
};
This is how Candle loads large model weight files: it achieves zero-copy loading by combining memory mapping (mmap) with the safetensors format. Which raises a question:
How do we read an extremely large file elegantly?
Suppose a service needs to read files, e.g. to handle uploads. Loading an entire file into memory at once is not viable: a naive upload reads the whole file from disk into a user-space buffer, consuming memory proportional to the file size.
(In practice, large file uploads should be chunked.)
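Before measuring anything, it helps to see what "not loading everything at once" looks like. The sketch below copies a file through a fixed 8 KiB buffer using only the standard library, so heap usage stays bounded by the buffer size no matter how large the file is (the function name, file paths, and buffer size are arbitrary choices for illustration):

```rust
use std::fs::File;
use std::io::{Read, Write};

/// Copy `src` to `dst` through a fixed-size buffer, returning bytes copied.
/// Heap usage is bounded by the buffer size, not the file size.
fn copy_in_chunks(src: &str, dst: &str) -> std::io::Result<u64> {
    let mut reader = File::open(src)?;
    let mut writer = File::create(dst)?;
    let mut buf = [0u8; 8 * 1024]; // fixed 8 KiB chunk, lives on the stack
    let mut total = 0u64;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        writer.write_all(&buf[..n])?;
        total += n as u64;
    }
    Ok(total)
}

fn main() -> std::io::Result<()> {
    // Demo on a small temporary file; a 1 GB file works the same way.
    let src = std::env::temp_dir().join("chunk_demo_src.bin");
    let dst = std::env::temp_dir().join("chunk_demo_dst.bin");
    std::fs::write(&src, vec![42u8; 100_000])?;
    let n = copy_in_chunks(src.to_str().unwrap(), dst.to_str().unwrap())?;
    println!("copied {} bytes", n);
    Ok(())
}
```

This is the same streaming idea that the Tokio version below applies asynchronously.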
Below, we compare the memory usage of different methods when handling large files.
Memory usage statistics:
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

struct TrackingAllocator;

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for TrackingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ret = unsafe { System.alloc(layout) };
        if !ret.is_null() {
            ALLOCATED.fetch_add(layout.size(), Ordering::SeqCst);
        }
        ret
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        ALLOCATED.fetch_sub(layout.size(), Ordering::SeqCst);
    }
}

#[global_allocator]
static GLOBAL: TrackingAllocator = TrackingAllocator;

fn print_memory_usage() {
    println!("Allocated memory: {} bytes", ALLOCATED.load(Ordering::SeqCst));
}
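As a quick sanity check, reading the counter before and after an allocation should show a delta equal to the allocation size. A minimal self-contained sketch (the 1 MiB buffer size and the helper function name are arbitrary):

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

struct TrackingAllocator;

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for TrackingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ret = unsafe { System.alloc(layout) };
        if !ret.is_null() {
            ALLOCATED.fetch_add(layout.size(), Ordering::SeqCst);
        }
        ret
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        ALLOCATED.fetch_sub(layout.size(), Ordering::SeqCst);
    }
}

#[global_allocator]
static GLOBAL: TrackingAllocator = TrackingAllocator;

/// Returns how many bytes the counter grew while an n-byte buffer was live.
fn allocation_delta(n: usize) -> usize {
    let before = ALLOCATED.load(Ordering::SeqCst);
    let buf = vec![0u8; n]; // heap allocation of exactly n bytes
    let during = ALLOCATED.load(Ordering::SeqCst);
    drop(buf); // freed: the counter drops back down
    during - before
}

fn main() {
    println!("delta for a 1 MiB buffer: {} bytes", allocation_delta(1 << 20));
}
```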
Generate a 1GB file:
dd if=/dev/urandom of=testfile.bin bs=1M count=1024
# -rw-rw-r-- 1 staf staf 1.0G Mar 29 07:38 testfile.bin
Using the standard library:
use std::fs::File;
use std::io::{Read, Write};

/// Read the whole source file into memory, then write it out to a new file,
/// printing heap usage along the way.
fn normal(file_path: &str) -> anyhow::Result<()> {
    let mut file = File::open(file_path)?;
    let mut buffer = Vec::new();
    file.read_to_end(&mut buffer)?; // the entire 1 GB lands in this buffer
    let mut wf = File::create("testfile222.bin")?;
    wf.write_all(&buffer)?;
    print_memory_usage();
    Ok(())
}
Memory usage when called on the 1 GB file:
Allocated memory: 1073741824 bytes
Using the asynchronous library Tokio:
async fn tokio_io(file_path: &str) -> anyhow::Result<()> {
    // Open source and destination files asynchronously
    let mut file = tokio::fs::File::open(file_path).await?;
    let mut wf = tokio::fs::File::create("testfile222.bin").await?;
    // copy streams the data through a small fixed-size internal buffer
    tokio::io::copy(&mut file, &mut wf).await?;
    print_memory_usage();
    Ok(())
}
Memory usage when called:
Allocated memory: 172952 bytes
`tokio::io::copy` streams the data through a small fixed-size internal buffer rather than buffering the whole file, which is why its heap usage stays in the kilobyte range while the standard-library version allocates the full gigabyte.
mmap
Memory mapping (mmap) is a technique where the operating system maps a file or device directly into a process's virtual address space, so the file's contents can be accessed like ordinary memory without explicit read/write system calls.
The conventional copy path moves every byte through a user-space buffer:
[Source file page cache] → [Process user-space buffer] → [Target file page cache]
              ↑--------- user-space copy ---------↑
mmap bypasses that user-space buffer, so the process allocates almost no heap memory. (The mapped pages still occupy physical memory via the page cache; the tracking allocator above only counts heap allocations.)
Using mmap:
cargo add memmap2
use memmap2::Mmap;
use std::fs::File;
use std::io::Write;

/// Map the source file into memory and write its contents to a new file.
fn io_mmap(file_path: &str) -> anyhow::Result<()> {
    let file = File::open(file_path)?;
    // Safety: the file must not be modified by another process while mapped
    let mmap = unsafe { Mmap::map(&file)? };
    let mut wf = File::create("testfile222.bin")?;
    wf.write_all(&mmap[..])?; // reads pages straight from the page cache
    print_memory_usage();
    Ok(())
}
Memory usage when called:
Allocated memory: 1024 bytes # Memory mapping
Comparison (1GB file)
| Method | Memory |
|---|---|
| Standard Library | 1073741824 B (1 GiB) |
| Tokio | 172952 B (≈169 KiB) |
| mmap | 1024 B (1 KiB) |