Engram
A space-efficient and highly performant version control system

Engram is a fast and space-efficient version control system for portable file backups, inspired by git and rsnapshot. It creates snapshots of directories and stores them in a compressed, portable, delta-based format. Engram can be run from a cron job to back up files automatically, and the backups can be stored remotely with a tool like rclone. It only rewrites its internal files when necessary, so modification times can be taken into account by rsync-like tools. Engram does not encrypt snapshots, so encryption should be handled externally if desired.

Unlike other backup tools, Engram allows any number of previous revisions to be deleted, because it stores instructions for recreating previous snapshots from the current state. Engram is also heavily optimized for performance and is capable of processing files at multiple GB/s on modern hardware.

My work on Engram began out of personal necessity. My server previously used rsnapshot for directory backups; however, rsnapshot relies heavily on hard links to deduplicate files and save storage space. Hard links are not portable at all, so they cannot be uploaded to cloud storage services for offsite backups.

The core idea of Engram is to create patches that store instructions for returning to the previous version of a directory, rather than instructions for building the current version. Each patchfile has a list of entries storing the data and operations required to revert specific files or directories. This backwards approach has two key benefits over other version control systems:

This project sat at the perfect intersection of my personal and academic interests. I have always loved optimization and algorithmic problems, and it was extremely interesting to spend hundreds of hours perfecting this system as much as possible.
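To make the backwards-patch idea described above concrete, here is a minimal Rust sketch. It is not Engram's actual format or code: it models a directory as an in-memory map from path to contents, and each `Entry` stores an old file's full contents instead of a compressed delta. All names (`Tree`, `Entry`, `reverse_patch`, `apply`, `Store`, `checkout`, `prune_oldest`, `make_tree`) are hypothetical. What it does show is the key property: the newest snapshot is kept as-is, and each patch only knows how to step back one revision, so dropping the oldest patches never affects newer revisions.

```rust
use std::collections::BTreeMap;

// Hypothetical model: a directory snapshot as a map from path to file contents.
type Tree = BTreeMap<String, Vec<u8>>;

/// One instruction for turning a newer tree back into an older one.
/// Simplification: stores whole old contents instead of a compressed delta.
#[derive(Debug)]
enum Entry {
    /// The file existed in the older version with these contents: write them back.
    Restore { path: String, data: Vec<u8> },
    /// The file did not exist in the older version: delete it.
    Remove { path: String },
}

/// Build the reverse patch that turns `new` back into `old`.
fn reverse_patch(old: &Tree, new: &Tree) -> Vec<Entry> {
    let mut entries = Vec::new();
    for (path, data) in old {
        // Modified or deleted since `old`: restore the old contents.
        if new.get(path) != Some(data) {
            entries.push(Entry::Restore { path: path.clone(), data: data.clone() });
        }
    }
    for path in new.keys() {
        // Added since `old`: remove it when rolling back.
        if !old.contains_key(path) {
            entries.push(Entry::Remove { path: path.clone() });
        }
    }
    entries
}

/// Apply a reverse patch in place, rolling `tree` back by one revision.
fn apply(tree: &mut Tree, patch: &[Entry]) {
    for entry in patch {
        match entry {
            Entry::Restore { path, data } => { tree.insert(path.clone(), data.clone()); }
            Entry::Remove { path } => { tree.remove(path); }
        }
    }
}

/// Keeps the newest snapshot in full plus one reverse patch per older revision.
struct Store {
    current: Tree,
    /// patches[i] turns revision i + 1 back into revision i (0 = oldest kept).
    patches: Vec<Vec<Entry>>,
}

impl Store {
    /// Record a new snapshot: keep it in full and store only the way back.
    fn snapshot(&mut self, next: Tree) {
        self.patches.push(reverse_patch(&self.current, &next));
        self.current = next;
    }

    /// Rebuild revision `rev` by walking backwards from the current state.
    /// Panics if `rev` is newer than the latest revision.
    fn checkout(&self, rev: usize) -> Tree {
        let mut tree = self.current.clone();
        for patch in self.patches[rev..].iter().rev() {
            apply(&mut tree, patch);
        }
        tree
    }

    /// Delete the `n` oldest revisions; newer revisions never depend on them.
    fn prune_oldest(&mut self, n: usize) {
        let end = n.min(self.patches.len());
        self.patches.drain(..end);
    }
}

/// Hypothetical helper to build a tree from (path, contents) pairs.
fn make_tree(files: &[(&str, &str)]) -> Tree {
    files.iter().map(|(p, d)| (p.to_string(), d.as_bytes().to_vec())).collect()
}

fn main() {
    let v0 = make_tree(&[("a.txt", "hello"), ("b.txt", "old")]);
    let v1 = make_tree(&[("a.txt", "hello"), ("b.txt", "new"), ("c.txt", "added")]);
    let v2 = make_tree(&[("a.txt", "hello world"), ("c.txt", "added")]);
    let mut store = Store { current: v0.clone(), patches: Vec::new() };
    store.snapshot(v1.clone());
    store.snapshot(v2.clone());
    // Roll back one revision at a time and compare with the originals.
    assert_eq!(store.checkout(2), v2);
    assert_eq!(store.checkout(1), v1);
    assert_eq!(store.checkout(0), v0);
    // Dropping the oldest revision leaves the newer ones reconstructible.
    store.prune_oldest(1);
    assert_eq!(store.checkout(0), v1);
    assert_eq!(store.checkout(1), v2);
    println!("all rollbacks matched");
}
```

Because the newest snapshot is kept whole in this sketch, restoring it needs no patches at all, and each older revision costs one more patch application. The sketch only prunes the oldest revisions; removing a revision from the middle would require merging its two neighboring patches, which is not shown here.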
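Engram only rewrites its internal files when necessary, which keeps modification times stable for rsync-like tools. One straightforward way to get that behavior, assumed here for illustration rather than taken from Engram's source, is to compare the new bytes with what is already on disk and skip the write when they match. `write_if_changed` is a hypothetical helper name.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: write `data` to `path` only if the contents differ.
/// Returns Ok(true) if the file was written and Ok(false) if it was left untouched,
/// in which case its modification time is unchanged.
fn write_if_changed(path: &Path, data: &[u8]) -> io::Result<bool> {
    match fs::read(path) {
        // Same bytes already on disk: skip the write.
        Ok(existing) if existing == data => return Ok(false),
        // Different contents, or no file yet: fall through and write.
        Ok(_) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        // Any other error (e.g. permissions) is reported to the caller.
        Err(e) => return Err(e),
    }
    fs::write(path, data)?;
    Ok(true)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("engram_write_if_changed_demo.bin");
    let _ = fs::remove_file(&path); // start without an existing file
    assert!(write_if_changed(&path, b"patch v1")?); // new file: written
    assert!(!write_if_changed(&path, b"patch v1")?); // identical bytes: skipped
    assert!(write_if_changed(&path, b"patch v2")?); // changed bytes: rewritten
    fs::remove_file(&path)?;
    println!("ok");
    Ok(())
}
```

Comparing full contents costs an extra read per write; checking sizes or stored hashes first would be a cheaper variant, but which approach Engram itself uses is not stated here.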
I highly recommend reading the paper for the exact details, but I'll list a few of the biggest optimizations below:

There was a lot (and I mean a lot) of debugging on this project. I had actually started work on Engram over the summer, and was running it on my server before I realized it only worked half the time. My professor eventually convinced me to do more thorough testing, so I wrote a Python fuzzer to automate the process. This caught a ton of bugs. It was both extremely exciting and extremely frustrating to watch the rollbacks occur one by one and pray the next one didn't fail. Each time I thought I had found the issue, I would realize there were five other problems I hadn't accounted for.

Aside from all the technical lessons (e.g. Rust, system architecture, memory mapping, fuzzy hashing), I learned a lot about myself. This project and the other one featured here are the ones I have been most invested in in my life.