For the past few weeks my colleagues and I have been testing how well our applications withstand power failures. Most of the time we convince clients to put a UPS on every computer that runs a database server, but that cannot be done everywhere.
We are using the default ReiserFS 3.6 supplied with Slackware 10.2. We powered off the machines by force, pulling out the plug. Some of the apps print to printers, so even if the filesystem lost data, we would still have a log on paper. The show begins: in about 20 power-offs, we lost parts of some files 3 times. OK, perhaps that was to be expected, since filesystem data is kept in cache and not committed to disk each time a write happens. Since we anticipated this, we weren’t very upset. The apps are built in such a way that they can ignore this and keep working.
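To illustrate the caching point: a write normally lands in the operating system's cache first, and an application that wants the data on disk before continuing has to ask for that explicitly. Below is a minimal sketch of how that might look in Python. The helper name `append_durably` is made up for this example and is not taken from our apps. Note that `fsync` only asks the kernel to flush; it does not protect against a drive's own write cache or against filesystem bugs like the ones described in this post.

```python
import os

def append_durably(path, data):
    """Append bytes to `path`, then ask the kernel to flush them to disk.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(data)
        # os.write may write fewer bytes than requested, so loop until done.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # Without this call the data may sit in the page cache
        # and be lost if the power goes out.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Calling `append_durably("app.log", b"job 1 printed\n")` returns only after the flush request has completed. The price is speed: syncing on every write is much slower than letting the cache do its job.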
However, there are bugs in ReiserFS that do some really bad things. On one of the systems, a log file we were examining after a power-off was missing a part at the end. Instead of simply ending early, it contained some garbage characters and parts of some other file! I guess we were “lucky” that a text file got “inserted”, so we noticed it. The file was a SiS graphics card include file (.h), which is (I think) part of the kernel source and lives in a completely different part of the hard disk (same filesystem, though). It wasn’t the whole file, just a part of it, approximately the same size as the missing part of the log.
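We only spotted this by eye because the foreign data happened to be readable text. One way an application could catch such a tail automatically is to store a checksum with every log record and stop reading at the first record that fails the check. The following is a rough sketch under my own assumptions: the record format (8 hex digits of CRC32, a space, the payload, a newline) and the function names are invented for this example, and payloads are assumed not to contain newlines.

```python
import zlib

def encode_record(payload):
    """Return one log line: CRC32 of the payload in hex, a space, the payload."""
    crc = zlib.crc32(payload.encode("utf-8"))
    return "%08x %s\n" % (crc, payload)

def read_valid_records(raw):
    """Return payloads from `raw` (bytes) up to the first record that fails its check."""
    records = []
    for line in raw.split(b"\n"):
        if not line:
            continue  # skip empty lines, e.g. after the final newline
        head, sep, body = line.partition(b" ")
        try:
            expected = int(head, 16)
        except ValueError:
            break  # checksum field is not hex: treat the rest as garbage
        if not sep or zlib.crc32(body) != expected:
            break  # truncated or foreign data
        records.append(body.decode("utf-8"))
    return records
```

With a log whose tail has been replaced by a piece of some other file, `read_valid_records` returns only the records written before the damage. It detects the problem; it cannot bring the lost records back.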
On another system, some files in a user’s home directory became mysteriously corrupted. For example, the file /home/omega/.ICEauthority got corrupted in such a way that we can’t read, write, rename or delete it. We keep getting “permission denied” even after setting 777 permissions on both the file and its parent directory.
It is pretty absurd that ReiserFS 3.6 is claimed to be stable when such things can occur. I have seen systems that have run Reiser for years without trouble (the notebook I’m writing this on, for example), but to be honest, those have had one or no forced power-offs so far. One more interesting thing: in all of these cases, when we ran reiserfsck (with various options), it didn’t detect the errors. It just said that everything was OK.
I urge developers and system administrators not to use ReiserFS for important data (such as databases). If there is any chance of power failures and you don’t have a UPS, use some other filesystem. Which one? I don’t know. I made a list containing ext3, jfs and xfs. We’ll try those in the following weeks and see which one proves robust enough. Stay tuned…