Or maybe an argument in favor of using error-correcting codes in that binary format. :)
Though it's hard to beat the robustness against corruption of an uncompressed text file. Entire sectors can be missing, and the rest is still easily readable without requiring specialized skills.
Posted Nov 22, 2011 5:45 UTC (Tue) by slashdot (guest, #22014)
This is because text has a record delimiter that is not used within the records (the newline character), making synchronization trivial.
A binary format with record hashes is similarly recoverable: you can try every possible record start position until one hashes correctly. This is computationally much more expensive, but on current CPUs it won't be noticeable unless the records are huge.
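To make the "try every start position" idea concrete, here is a minimal sketch. The record layout is an assumption made up for illustration (a 2-byte big-endian payload length, then the first 8 bytes of the payload's SHA-256, then the payload), and the names `pack_record` and `recover_records` are hypothetical, not taken from any real format.

```python
import hashlib
import struct

# Hypothetical header: payload length (2 bytes, big-endian) + truncated SHA-256 (8 bytes).
HEADER = struct.Struct(">H8s")


def pack_record(payload: bytes) -> bytes:
    """Serialize one record as header + payload."""
    digest = hashlib.sha256(payload).digest()[:8]
    return HEADER.pack(len(payload), digest) + payload


def recover_records(blob: bytes) -> list:
    """Return every payload whose stored hash verifies, skipping damaged bytes."""
    records = []
    pos = 0
    while pos + HEADER.size <= len(blob):
        length, digest = HEADER.unpack_from(blob, pos)
        start = pos + HEADER.size
        payload = blob[start:start + length]
        complete = len(payload) == length
        if complete and hashlib.sha256(payload).digest()[:8] == digest:
            # Hash matches: accept the record and jump past it.
            records.append(payload)
            pos = start + length
        else:
            # No valid record starts here: slide forward one byte and retry.
            pos += 1
    return records
```

In this sketch, losing or overwriting part of one record only costs that record; the scan picks up again at the next position where a header and hash agree. With an 8-byte digest a false match is possible in principle but extremely unlikely.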
Add sync markers and tags to binary files.
Posted Nov 22, 2011 6:31 UTC (Tue) by eru (subscriber, #2753)
This is because text has a record delimiter that is not used within the records (the newline character), making synchronization trivial.
A binary format can easily have the same property: a synchronization marker (not necessarily a single byte) that is guaranteed not to appear in the data. This means the actual data needs some processing to avoid the marker, but this can be cheaper than a conversion to text. E.g., if your sync marker is 0x55, double it wherever it appears in the payload data. Other byte combinations starting with 0x55 could tag the type of the data that follows (date, numbers of different sizes, string, etc.), which also helps when parsing possibly corrupted files.
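A rough sketch of that scheme, under some assumptions of mine: each record starts with 0x55 followed by a one-byte type tag (any value except 0x55), and every 0x55 inside the payload is doubled. The tag values and the function names `encode` and `decode` are made up for illustration.

```python
SYNC = 0x55


def encode(records):
    """records: iterable of (tag, payload) pairs; tag is an int other than 0x55."""
    out = bytearray()
    for tag, payload in records:
        if tag == SYNC:
            raise ValueError("tag must differ from the sync byte")
        out += bytes([SYNC, tag])
        # Escape the marker inside the payload by doubling it.
        out += payload.replace(b"\x55", b"\x55\x55")
    return bytes(out)


def decode(stream):
    """Return (tag, payload) pairs, ignoring bytes before the first marker."""
    records = []
    tag, payload = None, bytearray()
    i = 0
    while i < len(stream):
        byte = stream[i]
        if byte != SYNC:
            if tag is not None:
                payload.append(byte)
            i += 1
            continue
        if i + 1 >= len(stream):
            break  # stream ends in the middle of a marker or escape pair
        nxt = stream[i + 1]
        if nxt == SYNC:
            # Doubled marker: a literal 0x55 in the payload.
            if tag is not None:
                payload.append(SYNC)
        else:
            # 0x55 followed by a tag: a new record starts here.
            if tag is not None:
                records.append((tag, bytes(payload)))
            tag, payload = nxt, bytearray()
        i += 2
    if tag is not None:
        records.append((tag, bytes(payload)))
    return records
```

A reader that starts in the middle of the stream just discards bytes until it sees 0x55 followed by a tag. Two caveats for this sketch: it only resynchronizes and does not detect corrupted payload bytes (that would still need a checksum), and a cut that lands exactly between the two halves of a doubled 0x55 can be misread as a record start.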