Showing posts from March, 2013

Mirrors vs. automatic backups

There was an incident recently, which involved a near-loss of some important git repository. The incident involved the distributed system of multiple servers, one being the designated master and multiple slaves. Slaves pulled data automatically, and that was apparently done quite often. You guess what happens next, the master copy got corrupt, and before anyone knows all the slaves pull the corrupt copy. In the follow up ( here ) they state that they have a backup system in place which is principially different from a RAID 1. That is obviously not so. Any system with automatic replication is subject to the following failure mode - the master copy is damaged, and the damage is then automatically replicated to slaves (mirrors). The automatic replication systems are designed around the assumption that all the master failures are fail-stop - the master either fails mechanically and ceases to perform completely, or the master can detect any and all cases of corruption in it and sutdown