
Preserving the global software heritage


Posted Jul 8, 2016 6:51 UTC (Fri) by robbe (guest, #16131)
Parent article: Preserving the global software heritage

> Users can search for specific files by their SHA-1 hashes, but
> cannot browse.
To be specific: all you can currently get is a yes/no answer to the question "Is this SHA-1 hash contained in the archive?"
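For readers wondering what such a query involves on the client side, here is a minimal sketch. It assumes the lookup key is the plain SHA-1 of the file's raw bytes (not git's salted variant), and it uses a hypothetical in-memory set, `archive_index`, in place of the archive's actual Web UI, which is not shown here.

```python
import hashlib

def file_sha1(path, chunk_size=1 << 16):
    """Hex SHA-1 of a file's raw bytes, read in chunks to bound memory use."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def is_archived(path, archive_index):
    """Yes/no answer: is this file's SHA-1 in the (hypothetical) index?"""
    return file_sha1(path) in archive_index
```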

(It was not clear to me.)

SHA-1, while still absolutely enough for this kind of application, seems a strange choice today. Is git compatibility an issue?



Preserving the global software heritage

Posted Jul 8, 2016 12:50 UTC (Fri) by smitty_one_each (subscriber, #28989)

"seems a strange choice today"

What else might one recommend?

Preserving the global software heritage

Posted Jul 8, 2016 20:26 UTC (Fri) by robbe (guest, #16131)

SHA-256 (from the SHA-2 family), I guess. But that would also bloat their PostgreSQL DB…

Preserving the global software heritage

Posted Jul 9, 2016 0:48 UTC (Sat) by flussence (guest, #85566)

If size is a concern, RIPEMD-160 has the same digest size as SHA-1 while being a bit less broken, and it is widely available. SHA-1 has hardware acceleration, though, which is probably significant for a dataset this huge.
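To put the size argument in numbers: SHA-1 and RIPEMD-160 both produce 160-bit (20-byte) digests, while SHA-256 produces 32 bytes, i.e. 12 extra bytes per stored checksum before any index overhead. A quick check with Python's `hashlib` is below; RIPEMD-160 availability depends on the underlying OpenSSL build, so this sketch tolerates its absence.

```python
import hashlib

def digest_bytes(name):
    """Digest size in bytes for a hashlib algorithm, or None if unavailable."""
    try:
        return hashlib.new(name).digest_size
    except ValueError:
        # hashlib.new raises ValueError for algorithms this build lacks.
        return None

for name in ("sha1", "ripemd160", "sha256"):
    print(name, digest_bytes(name))
```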

Preserving the global software heritage

Posted Jul 11, 2016 14:22 UTC (Mon) by hkario (subscriber, #94864)

The problem is that malicious users can create SHA-1 collisions. RIPEMD-160 is not much better (yes, it pushes the problem a few years into the future, but it does not eliminate it).

You simply should not use any kind of 160-bit hash nowadays, especially for a project that is just being deployed.
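The usual back-of-the-envelope reasoning behind the 160-bit rule of thumb is the birthday bound. For an ideal n-bit hash, a generic brute-force collision search needs on the order of 2^{n/2} evaluations. Below, W(n) is just shorthand for that work factor. This is an upper bound on attacker effort; dedicated cryptanalysis of SHA-1 already does better than the generic figure.

```latex
W_{\text{collision}}(n) \approx 2^{n/2},
\qquad
W(160) \approx 2^{80},
\qquad
W(256) \approx 2^{128}
```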

Preserving the global software heritage

Posted Jul 11, 2016 22:09 UTC (Mon) by flussence (guest, #85566)

I'm at a loss as to what the real security issue with weak hashes on a public dataset is. Can you give examples?

Preserving the global software heritage

Posted Jul 11, 2016 10:33 UTC (Mon) by zack (subscriber, #7062)

> SHA-1, while still absolutely enough for this kind of application, seems a strange choice today. Is git compatibility an issue?

To clarify: we offer only SHA-1 as a lookup mechanism in the current (very minimal, for now) Web UI, but we do not rely on never encountering SHA-1 collisions in the wild. (Even though I personally agree that SHA-1 is still absolutely enough for this kind of application, we are trying to be future-proof, and we know we will eventually need to move away from SHA-1 even for integrity-checking purposes.)

Internally, our DB currently stores three kinds of checksums: SHA-1, SHA-2 (SHA-256), and "salted" SHA-1 (à la git hash-object). We cross-check them to spot a collision on any single one of them.
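A minimal sketch of the cross-checking idea follows. It assumes content is indexed separately under each checksum. The class `ContentIndex` and its table layout are hypothetical illustrations, not Software Heritage's actual schema. The "salted" SHA-1 is the one git computes for blobs: SHA-1 over a `blob <size>\0` header followed by the raw bytes.

```python
import hashlib

def checksums(data):
    """Compute plain SHA-1, SHA-256 and git's blob SHA-1 for some bytes."""
    # git hash-object prefixes the content with "blob <size>\0" before hashing.
    git_header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "sha1_git": hashlib.sha1(git_header + data).hexdigest(),
    }

class ContentIndex:
    """Hypothetical in-memory index with one lookup table per checksum."""

    def __init__(self):
        self.tables = {"sha1": {}, "sha256": {}, "sha1_git": {}}

    def add(self, data):
        sums = checksums(data)
        # Cross-check: if one checksum is already known but the full set of
        # checksums differs, two different contents collide on that one.
        for algo, digest in sums.items():
            known = self.tables[algo].get(digest)
            if known is not None and known != sums:
                raise ValueError("collision on %s: %s" % (algo, digest))
        for algo, digest in sums.items():
            self.tables[algo][digest] = sums
        return sums
```

For example, `ContentIndex().add(b"")["sha1_git"]` is `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the same id `git hash-object` reports for an empty file. Re-adding identical content is harmless, because all three checksums agree.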

We would like to add SHA-3 to the mix (possibly dropping SHA-2), but for that we are waiting for a stable SHA-3 implementation to land in Python 3.x (we're currently on 3.4).
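For reference, `hashlib` did later gain SHA-3 constructors (`sha3_256` and friends, in Python 3.6). Below is a minimal sketch of computing SHA-3 as an extra checksum only when the running interpreter provides it. The function name `optional_sha3_256` is a hypothetical illustration.

```python
import hashlib

def optional_sha3_256(data):
    """Hex SHA3-256 of data, or None when this Python has no SHA-3 support."""
    # On interpreters before 3.6 (e.g. 3.4), hashlib has no sha3_256 attribute.
    sha3 = getattr(hashlib, "sha3_256", None)
    if sha3 is None:
        return None
    return sha3(data).hexdigest()
```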

Hope this clarifies.
/me, wearing his Software Heritage hat


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.