I think of Fetch as a reliable FTP client.
For twenty years we've tuned and tweaked its code to handle new situations, and users regularly tell us that Fetch has worked when alternatives didn't. Nonetheless, from time to time we have received a particular and troubling sort of user report.
The user would be uploading a big file, or a bunch of files, and somewhere in the process the upload would stall, or fail with an error. It didn’t happen every time, or always in the same place, and my colleagues and I could never reproduce it ourselves. Most users would never see this issue — out of the hundreds of thousands of Fetch users we were only getting two or three reports a month. But we knew that for every user who took the time to contact us there were probably several more who didn’t.
I was itching to go after this problem; writing some code to make a user’s life a little better is the best part about being a programmer. But first I needed to get a good look at it. If I couldn’t reproduce the problem, my attempted solutions would be shots in the dark, and I’d never be sure that I’d actually fixed it. I needed a reliable way to make Fetch behave unreliably.
10,000 Files To Fetch On A Wall
We never saw this problem with the server we used for much of our Fetch testing, which (like our website) is hosted at Pair Networks. For the first time I cursed Pair’s excellent reliability. Our support staff started asking users who reported these problems where they were hosting their sites, and I bought accounts from each company. We are now the proud owners of accounts at some of the worst hosting providers around. Since the problem didn’t happen all the time, or even most of the time, I wrote scripts to upload 10,000 files to each of our many test accounts, one after another, hoping that the problem would appear (preferably before the Comcast bandwidth police appeared at my door).
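A stress script of the kind described above can be sketched in a few lines of Python using the standard library's `ftplib`. This is an illustrative reconstruction, not Fetch's actual test harness; the host, username, and password below are placeholders.

```python
import io
from ftplib import FTP, all_errors

def make_test_names(count, prefix="test"):
    """Generate predictable file names: test_00000.dat, test_00001.dat, ..."""
    return [f"{prefix}_{i:05d}.dat" for i in range(count)]

def stress_upload(host, user, password, count=10_000):
    """Upload `count` small files one after another, stopping at the first failure."""
    payload = b"x" * 1024  # a small, fixed payload for every file
    ftp = FTP(host)
    ftp.login(user, password)
    try:
        for i, name in enumerate(make_test_names(count)):
            try:
                ftp.storbinary(f"STOR {name}", io.BytesIO(payload))
            except all_errors as err:
                print(f"Failed on file {i} ({name}): {err}")
                raise
    finally:
        ftp.quit()

# Example run (placeholder credentials):
# stress_upload("ftp.example.com", "testuser", "secret")
```

The point of a script like this is repetition at a scale no human tester would attempt by hand: a failure that shows up once in thousands of transfers will reliably appear somewhere in a 10,000-file run.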
Hosting From Hell
After some false alarms I finally hit the jackpot — a hosting service that randomly but reliably failed every time I tried our standard 10,000 file upload. I tried uploading with other FTP clients, and they all failed as well. The best part was that it failed in completely unpredictable ways: sometimes during a transfer, sometimes setting up the transfer, sometimes getting a file list, sometimes deleting files. I never knew how it would break, but I knew that if I tried to upload 10,000 files it was sure to fail in some way. It was a hosting service you wouldn’t wish on your worst enemy, and I was thrilled to find it.
Step by step I refined Fetch's error handling to keep it going in the face of the demonic server's errors. Several times I was sure that I'd fixed the last remaining issue, only to have another appear. Our QA engineer, Doug Grinbergs, and I varied our test routine, uploading lots of small files, lots of empty folders, and very deep hierarchies of 100,000 folders.
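Fetch's actual error-handling code isn't shown here, but the general shape of this kind of hardening is to retry a failed operation a few times, backing off between attempts, before surfacing the error. A minimal sketch, with illustrative retry counts and delays rather than Fetch's real values:

```python
import time

def with_retries(operation, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run `operation`, retrying on failure with exponential backoff.

    `sleep` is injectable so tests don't have to wait out real delays.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except OSError as err:  # network-ish failures; real code would be pickier
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** (attempt - 1))  # wait 1s, 2s, 4s, ...
```

Wrapping each individual FTP operation (connect, list, store, delete) this way means one transient server hiccup costs a short pause instead of a failed 10,000-file run.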
At last I had a Fetch version that would reliably upload 10,000 files to all of our test servers, including the cursed one. In fact I ran tests until I saw one million straight uploads to that server without an error. Big businesses like Motorola and GE talk about reducing the rate of defects to under 3.4 per million (they call it Six Sigma™). When those million uploads were done I felt that we'd earned our sigmas.
But all this work was based on a hypothesis: that the solution to the upload problems with our cursed test server was also the solution to the more elusive problems seen by some of our users. We started sending a pre-release Fetch version to every user who reported a similar-sounding problem, and asking them to try it. At first we found a few more issues, which we were able to address. And then ... nothing but positive feedback. We've now been distributing these special versions for over a year. In all that time we've yet to come across a user whose random upload problems weren't solved by the new code.
Today we’re releasing Fetch 5.5, and we’ll find out if we can keep that streak going with many, many more users. I can’t wait.