Though it's hard to beat the robustness against corruption of an uncompressed text file. Entire sectors can be missing, and the rest is still easily readable without requiring specialized skills.
The Journal - a proposed syslog replacement
Posted Nov 22, 2011 5:45 UTC (Tue) by slashdot (guest, #22014)
A binary format with record hashes is similarly recoverable: you can try every possible record start position until one hashes correctly. That is much more expensive computationally, but on a current CPU it won't be noticeable as long as records aren't huge.
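A rough sketch of that recovery scan, under assumptions the comment does not spell out: each record is laid out as a 2-byte big-endian length, the payload, and a SHA-256 digest truncated to 8 bytes. The layout and the names `encode_record`, `try_record_at` and `recover_records` are made up for illustration and are not the Journal's actual format.

```python
import hashlib
import struct

HASH_LEN = 8  # assumption: SHA-256 truncated to 8 bytes


def encode_record(payload: bytes) -> bytes:
    """Hypothetical layout: 2-byte length, payload, truncated hash."""
    header = struct.pack(">H", len(payload))
    # Hash the header too, so a run of zero bytes never validates.
    digest = hashlib.sha256(header + payload).digest()[:HASH_LEN]
    return header + payload + digest


def try_record_at(buf: bytes, pos: int):
    """Return (payload, end_offset) if a valid record starts at pos."""
    if pos + 2 > len(buf):
        return None
    (length,) = struct.unpack_from(">H", buf, pos)
    body_end = pos + 2 + length
    end = body_end + HASH_LEN
    if end > len(buf):
        return None
    expected = hashlib.sha256(buf[pos:body_end]).digest()[:HASH_LEN]
    if expected != buf[body_end:end]:
        return None
    return buf[pos + 2:body_end], end


def recover_records(buf: bytes) -> list:
    """Try every offset; skip one byte ahead whenever the hash fails."""
    records = []
    pos = 0
    while pos < len(buf):
        hit = try_record_at(buf, pos)
        if hit is None:
            pos += 1  # damaged or misaligned: try the next start position
        else:
            payload, pos = hit
            records.append(payload)
    return records
```

With three records written back to back and the middle one overwritten or partly missing, `recover_records` should still return the first and the third. The worst case hashes one candidate record per byte offset, which is where the extra CPU cost comes from.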
Add sync markers and tags to binary files.
Posted Nov 22, 2011 6:31 UTC (Tue) by eru (subscriber, #2753)
A binary format can easily have the same property: a synchronization marker (not necessarily a single byte) that is guaranteed not to appear in the data. This means the actual data needs some processing to avoid the marker, but that can be cheaper than a conversion to text. E.g., if your sync marker is 0x55, double it wherever it appears in the payload data. Other byte combinations starting with 0x55 could tag the type of the following data (dates, numbers of different sizes, strings, etc.), which also helps when parsing possibly corrupted files.
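The doubling scheme might look roughly like this. The tag values `TAG_STRING` and `TAG_UINT32` and the function names are invented for the example; the only parts taken from the comment are the 0x55 marker, doubling it inside payloads, and using "0x55 plus another byte" to tag what follows.

```python
import struct

SYNC = 0x55
TAG_STRING = 0x01  # hypothetical tag values; any byte except 0x55 works
TAG_UINT32 = 0x02


def encode_field(tag: int, payload: bytes) -> bytes:
    """Emit SYNC + tag, then the payload with every 0x55 doubled."""
    if tag == SYNC:
        raise ValueError("tag must differ from the sync byte")
    return bytes([SYNC, tag]) + payload.replace(b"\x55", b"\x55\x55")


def decode_fields(buf: bytes) -> list:
    """Return (tag, payload) pairs, skipping bytes before the first marker."""
    fields = []
    current = None  # (tag, bytearray) of the field being collected
    i = 0
    while i < len(buf):
        byte = buf[i]
        if byte != SYNC:
            if current is not None:
                current[1].append(byte)
            i += 1
        elif i + 1 < len(buf) and buf[i + 1] == SYNC:
            # Doubled sync byte: a literal 0x55 inside the payload.
            if current is not None:
                current[1].append(SYNC)
            i += 2
        elif i + 1 < len(buf):
            # SYNC followed by a tag byte: a new field starts here.
            if current is not None:
                fields.append((current[0], bytes(current[1])))
            current = (buf[i + 1], bytearray())
            i += 2
        else:
            i += 1  # lone trailing SYNC: the stream was truncated
    if current is not None:
        fields.append((current[0], bytes(current[1])))
    return fields
```

For example, `decode_fields(b"\x00\x9a" + encode_field(TAG_STRING, b"a\x55b"))` drops the two leading junk bytes and yields the one string field with its 0x55 intact. One caveat with this simple version: if damage cuts a doubled pair in half right before a marker, the decoder misreads the boundary and can lose that field, so a longer marker would resynchronize more reliably.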
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds