To access a compressed file as if it were a folder at the operating system level, we can use a virtual file system technology such as FUSE to mount the compressed file as a directory, allowing direct read and write operations.
What is FUSE?
FUSE (Filesystem in Userspace) is a framework that allows ordinary users or developers to create custom file systems in user space (rather than the operating system kernel). It delegates file system operations (such as read, write, open file, etc.) to user-space programs without modifying the operating system kernel code.
For the requirement mentioned in the title, we only need to use FUSE to mount a ZIP file as a directory to achieve this functionality.
The Three Steps to Achieve the Goal
Following the "three steps to put an elephant in the refrigerator" principle, we can break it down into:
- Retrieve the contents of the compressed file.
- Mount the directory.
- Access it like a regular directory.
Preparation
Here, we use Linux as the experimental environment. Windows has no native FUSE support, and on macOS you can try macFUSE, though I haven’t tested it.
First, FUSE is available on most Linux distributions (older kernels may not support it). For Ubuntu, install it as follows:
sudo apt-get install fuse3 libfuse3-dev
Implementation in Rust
First, create a project:
cargo new zipfs
Add dependencies related to ZIP and FUSE:
cargo add fuser zip libc anyhow
In theory, as long as we implement the Filesystem interface of fuser, the program will function as a file system. For simplicity, we aim to implement a read-only file system based on ZIP.
First, define a struct:
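use std::{collections::HashMap, ffi::OsStr, fs::File, path::Path};
use fuser::{FileAttr, FileType, Filesystem, MountOption, ReplyAttr, ReplyData, ReplyDirectory, ReplyEntry, Request};
use libc::ENOENT;
use zip::ZipArchive;

pub struct ZipFS {
    attrs: HashMap<u64, FileAttr>,
    contents: HashMap<String, (usize, Vec<u8>)>,
}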
attrs maps each inode number to that file’s attributes. contents maps each file name to a pair of inode number and decompressed content. Together they are enough to implement commands like cd, ls, and cat. The contents come from the compressed file, so we need to read the ZIP file’s data during instantiation:
impl ZipFS {
    pub fn new<P: AsRef<Path>>(zip_path: P) -> anyhow::Result<Self> {
        let file = File::open(zip_path)?;
        let mut archive = ZipArchive::new(file)?;
        let mut attrs = HashMap::new();
        let mut contents = HashMap::new();
        for i in 0..archive.len() {
            let mut entry = archive.by_index(i)?;
            let path = entry.name().to_string();
            // Decompress the whole entry into memory up front.
            let mut data = Vec::new();
            std::io::copy(&mut entry, &mut data)?;
            // Inode 1 is the mount root, so file inodes start at 2.
            let ino = i as u64 + 2;
            contents.insert(path, (i + 2, data));
            attrs.insert(
                ino,
                FileAttr {
                    ino,
                    size: entry.size(),
                    // FILE_ATTR (a constant not shown in this post) supplies the remaining fields.
                    ..FILE_ATTR
                },
            );
        }
        Ok(Self { attrs, contents })
    }
}
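The inode numbering in the constructor can be tried out without the zip or fuser crates. The following is a minimal sketch under assumptions: build_index and its input of (name, bytes) pairs are hypothetical stand-ins for the archive loop, and the second map keyed by inode number is one possible alternative layout, not what ZipFS does.

```rust
use std::collections::HashMap;

/// Inode 1 is reserved for the mount root, so file inodes start at 2.
const FIRST_FILE_INO: u64 = 2;

/// Hypothetical helper: builds a name -> inode map and an inode -> bytes map
/// from (entry name, entry bytes) pairs standing in for the ZIP entries.
fn build_index(entries: &[(&str, &[u8])]) -> (HashMap<String, u64>, HashMap<u64, Vec<u8>>) {
    let mut by_name = HashMap::new();
    let mut by_ino = HashMap::new();
    for (i, (name, data)) in entries.iter().enumerate() {
        // Same rule as in the constructor: archive index + 2.
        let ino = FIRST_FILE_INO + i as u64;
        by_name.insert(name.to_string(), ino);
        by_ino.insert(ino, data.to_vec());
    }
    (by_name, by_ino)
}
```

With a layout like this, a lookup by name and a read by inode number are both single hash lookups, at the cost of keeping two maps in sync.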
Next, implement the Filesystem trait. The Filesystem trait has a whopping 43 methods! But don’t panic. Since we’re only implementing a read-only file system, we only need to implement four methods: lookup, getattr, read, and readdir.
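Implementing only a few methods works because the trait's methods come with default bodies, which is ordinary Rust default trait methods at work. Here is a minimal sketch of that mechanism. MiniFs and ReadOnly are hypothetical names made up for illustration and are not fuser's API; the ENOSYS value is the Linux errno for "function not implemented".

```rust
/// Linux errno for "function not implemented".
const ENOSYS: i32 = 38;

/// Hypothetical miniature of a file system trait (not fuser's real trait).
trait MiniFs {
    /// Default body: report the operation as unsupported.
    fn read(&self, _ino: u64) -> Result<Vec<u8>, i32> {
        Err(ENOSYS)
    }

    /// Default body: report the operation as unsupported.
    fn write(&mut self, _ino: u64, _data: &[u8]) -> Result<usize, i32> {
        Err(ENOSYS)
    }
}

/// A read-only file system overrides `read` and inherits the default `write`.
struct ReadOnly;

impl MiniFs for ReadOnly {
    fn read(&self, _ino: u64) -> Result<Vec<u8>, i32> {
        Ok(b"hello".to_vec())
    }
}
```

Calling write on ReadOnly falls through to the default body and returns the error, without ReadOnly ever mentioning write.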
lookup
The lookup method is responsible for finding an entry in the file system based on the parent node ID and file name, and returning the result via reply.
fn lookup(&mut self, _req: &Request, parent: u64, name: &OsStr, reply: ReplyEntry) {
    // A non-UTF-8 name cannot match any archive entry, so report "not found" instead of panicking.
    let Some(name) = name.to_str() else {
        reply.error(ENOENT);
        return;
    };
    println!("[DEBUG] lookup: parent={}, name={}", parent, name); // debug log
    // Map the file name to its inode number, then the inode number to its attributes.
    let attr = self
        .contents
        .get(name)
        .and_then(|(ino, _)| self.attrs.get(&(*ino as u64)));
    match attr {
        Some(attr) => reply.entry(&TTL, attr, 0),
        None => reply.error(ENOENT),
    }
}
getattr
The getattr method handles file system attribute retrieval requests.
fn getattr(&mut self, _req: &Request, ino: u64, _fh: Option<u64>, reply: ReplyAttr) {
    match ino {
        // Inode 1 is the mount root.
        1 => reply.attr(&TTL, &DIR_ATTR),
        // Return the per-file attributes recorded in new(), so the reported size is correct.
        _ => match self.attrs.get(&ino) {
            Some(attr) => reply.attr(&TTL, attr),
            None => reply.error(ENOENT),
        },
    }
}
read
The read method is mainly used to read files. It returns the file content or an error based on the file node ID ino.
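fn read(
    &mut self,
    _req: &Request,
    ino: u64,
    _fh: u64,
    offset: i64,
    size: u32,
    _flags: i32,
    _lock: Option<u64>,
    reply: ReplyData,
) {
    println!("[DEBUG] read: ino={}", ino);
    // Find the entry with a matching inode number; borrow the bytes instead of cloning them.
    let content = self
        .contents
        .values()
        .find(|(entry_ino, _)| *entry_ino as u64 == ino)
        .map(|(_, data)| data.as_slice());
    if let Some(content) = content {
        // Clamp the requested window so an offset at or past the end yields an empty reply, not a panic.
        let start = (offset.max(0) as usize).min(content.len());
        let end = start.saturating_add(size as usize).min(content.len());
        reply.data(&content[start..end]);
    } else {
        reply.error(ENOENT);
    }
}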
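A read request carries both an offset and a size, and a large file may be fetched in several requests, so the reply should be the window of the file that the request asks for. The following is a minimal sketch of that window calculation using only the standard library. read_window is a hypothetical helper name, not part of fuser.

```rust
/// Hypothetical helper: returns the part of `content` that a read at `offset`
/// for at most `size` bytes should see. An offset at or past the end yields
/// an empty slice, which the caller can send back to signal end of file.
fn read_window(content: &[u8], offset: i64, size: u32) -> &[u8] {
    // Treat a negative offset as 0 and never start past the end.
    let start = (offset.max(0) as usize).min(content.len());
    // Never run past the end, and avoid overflow when adding the size.
    let end = start.saturating_add(size as usize).min(content.len());
    &content[start..end]
}
```

For the test file containing "i love rust" plus a newline, read_window(data, 2, 4) returns the four bytes of "love", and any offset beyond the file length returns an empty slice.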
readdir
The readdir method is mainly used to read directory contents.
fn readdir(
    &mut self,
    _req: &Request,
    ino: u64,
    _fh: u64,
    offset: i64,
    mut reply: ReplyDirectory,
) {
    // Only the root directory (inode 1) exists in this flat layout.
    if ino != 1 {
        reply.error(ENOENT);
        return;
    }
    let mut entries: Vec<(u64, FileType, &str)> = vec![
        (1, FileType::Directory, "."),
        (1, FileType::Directory, ".."),
    ];
    for (filename, entry) in self.contents.iter() {
        entries.push((entry.0 as u64, FileType::RegularFile, filename.as_str()));
    }
    // Resume from the offset requested by the kernel.
    for (i, entry) in entries.into_iter().enumerate().skip(offset as usize) {
        // i + 1 is the offset of the next entry; add() returns true once the reply buffer is full.
        if reply.add(entry.0, (i + 1) as i64, entry.1, entry.2) {
            break;
        }
    }
    reply.ok();
}
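The offset handling in readdir is easier to see with a reply buffer that fills up quickly. The following is a minimal sketch under assumptions: DirBuffer and fill are hypothetical stand-ins written with the standard library only, and DirBuffer::add copies just one convention from the code above, namely that it returns true when the buffer is full and the entry was not stored.

```rust
/// Hypothetical stand-in for the reply buffer: holds at most `capacity` entries.
struct DirBuffer {
    capacity: usize,
    /// Stored entries as (name, offset of the next entry).
    entries: Vec<(String, i64)>,
}

impl DirBuffer {
    /// Returns true when the buffer is full and the entry was NOT stored.
    fn add(&mut self, name: &str, next_offset: i64) -> bool {
        if self.entries.len() >= self.capacity {
            return true;
        }
        self.entries.push((name.to_string(), next_offset));
        false
    }
}

/// Fills `buf` with names starting at `offset`, stopping when the buffer is full.
fn fill(names: &[&str], offset: i64, buf: &mut DirBuffer) {
    for (i, name) in names.iter().enumerate().skip(offset as usize) {
        // i + 1 is where the next call should resume.
        if buf.add(name, (i + 1) as i64) {
            break;
        }
    }
}
```

With four names and a capacity of three, the first call stores three entries and the last stored offset is 3. A second call with offset 3 then returns only the remaining entry, so no name is listed twice or skipped.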
Mounting and Invocation
The remaining steps are straightforward. First, manually create a ZIP file for testing:
echo 'test1' > test1.txt
echo 'i love rust' > test2.txt
zip test.zip test1.txt test2.txt
Mount it in the main function:
fn main() {
    let mountpoint = "/tmp/zip";
    let fs = ZipFS::new("./test.zip").unwrap();
    // mount2 blocks until the file system is unmounted.
    fuser::mount2(
        fs,
        mountpoint,
        &[
            MountOption::RO,
            MountOption::FSName("zipfs".to_string()),
            MountOption::AutoUnmount,
            // For non-root users this needs user_allow_other in /etc/fuse.conf.
            MountOption::AllowRoot,
        ],
    )
    .unwrap();
}
Note: If you don’t want to manually unmount, it’s best to add MountOption::AutoUnmount.
Make sure the mount point exists (mkdir -p /tmp/zip). Then open a terminal, run cargo run, and in another terminal:
cd /tmp/zip
ls
cat test2.txt
If you see the content i love rust, it means the operation was successful.
Why Go Through All This Trouble?
In fact, many others have had the same idea, and someone has already built a solution. It’s called archivemount. On Ubuntu, simply install it:
sudo apt install archivemount
Then directly mount the compressed file:
archivemount test.zip /tmp/zip
# Directly access /tmp/zip
cd /tmp/zip && ls
test1.txt test2.txt
What is FUSE Used For?
In fact, many cloud storage and object storage clients (for example, those for Alibaba Cloud OSS or AWS S3) use FUSE to mount remote files on the local system. The difference is that they don’t read from a compressed file; they translate file operations into requests to the storage service, typically over HTTP.