First Iteration of My Free Software Mirror

As I’m gearing towards setting up a Free Software download mirror in India, it occurred to me that I haven’t chronicled the work and motivation behind setting up the original mirror in the first place. Also, seems like it would be good to document stuff here for observing the progression, as the mirror is going multi-country now. Right now, my existing mirror i.e., mirrors.de.sahilister.net (was mirrors.sahilister.in), is hosted in Germany and serves traffic for Termux, NomadBSD, Blender, BlendOS and GIMP. For a while in between, it hosted OSMC project mirror as well.

To explain what is a Free Software download mirror thing is first, I’ll quote myself from work blog -

As most Free Software doesn’t have commercial backing and require heavy downloads, the concept of software download mirrors helps take the traffic load off of the primary server, leading to geographical redundancy, higher availability and faster download in general.

So whenever someone wants to download a particular (mirrored) software and click download, upstream redirects the download to one of the mirror server which is geographical (or in other parameters) nearby to the user, leading to faster downloads and load sharing amongst all mirrors.

Since the time I got into Linux and servers, I always wanted to help the community somehow, and mirroring seemed to be the most obvious thing. India seems to be a country which has traditionally seen less number of public download mirrors. IITB, TiFR, and some of the public institutions used to host them for popular Linux and Free Softwares, but they seem to be diminishing these days.

In the last months of 2021, I started using Termux and saw that it had only a few mirrors (back then). I tried getting a high capacity, high bandwidth node in budget but it was hard in India in 2021-22. So after much deliberation, I decided to go where it’s available and chose a German hosting provider with the thought of adding India node when conditions are favorable (thankfully that happened, and India node is live too now.). Termux required only 29 GB of storage, so went ahead and started mirroring it. I raised this issue in Termux’s GitHub repository in January 2022. This blog post chronicles the start of the mirror.

Termux has high request counts from a mirror point of view. Each Termux client, usually checks every mirror in selected group for availability before randomly selecting one for download (only other case is when client has explicitly selected a single mirror using termux-repo-change). The mirror started getting thousands of requests daily due to this but only a small percentage would actually get my mirror in selection, so download traffic was lower. Similar thing happened with OSMC too (which I started mirroring later).

With this start, I started exploring various project that would be benefit from additional mirrors. Public information from Academic Computer Club in Umeå’s mirror and Freedif’s mirror stats helped to figure out storage and bandwidth requirements for potential projects. Fun fact, Academic Computer Club in Umeå (which is one of the prominent Debian, Ubuntu etc.) mirror, now has 200 Gbits/s uplink to the internet through SUNET.

Later, I migrated to a different provider for better speeds and added LibreSpeed test on the mirror server. Those were fun times. Between OSMC, Termux and LibreSpeed, I was getting almost 1.2 millions hits/day on the server at its peak, crossing for the first time a TB/day traffic number.

Next came Blender, which took the longest time to set up of around 9–10 months. Blender had a push-trigger requirement for rsync from upstream that took quite some back and forth. It now contributes the most amount of traffic on the mirror. On release days, mirror does more than 3 TB/day and normal days, it hovers around 2 TB/day. Gimp project is the latest addition.

At one time, the mirror traffic touched 4.97 TB/day traffic number. That’s when I decided on dropping LibreSpeed server to solely focus on mirroring for now, keeping the bandwidth allotment for serving downloads only.

The mirror projects selection grew organically. I used to reach out many projects discussing the need of for additional mirrors. Some projects outright denied mirroring request as Germany already has a good academic mirrors boosting 20-25 Gbits/s speeds from FTP era, which seems fair. Finding the niche was essential to only add softwares, which would truly benefit from additional capacity. There were months when nothing much would happen with the mirror, rsync would continue to update the mirror while nginx would keep on serving the traffic. Nowadays, the mirror pushes around 70 TB/month. I occasionally check logs, vnstat, add new security stuff here and there and pay the bills. It now saturates the Gigabit link sometimes and goes beyond that, peaking around 1.42 Gbits/s (the hosting provider seems to be upping their game). The plan is to upgrade the link to better speeds.

vnstat yearly

Yearly traffic stats (through `vnstat -y`)

On the way, learned quite a few things like -

GeoIP Map of Clients from Yesterday Access Logs

GeoIP Map of Clients from Yesterday's Access Logs. Click to enlarge
Generated from IPinfo.io

In hindsight, the statistics look amazing, hundreds of TBs of traffic served from the mirror, month after month. That does show that there’s still an appetite for public mirrors in time of commercially “donated” CDNs and GitHub. The world could have done with one less mirror, but it saved some time, lessened the burden for others, while providing redundancy and traffic localization with one additional mirror. And it’s fun for someone like me who’s into infrastructure that powers the Internet. Now, I’ll try focusing and expanding the India mirror, which in itself started pushing almost half a TB/day. Long live Free Software and public download mirrors.