
Yet again, a significant root cause of issues is C here

Posted Jun 27, 2025 9:07 UTC (Fri) by chris_se (subscriber, #99706)
In reply to: Yet again, a significant root cause of issues is C here by wahern
Parent article: Libxml2's "no security embargoes" policy

> While WUFFS includes a JSON decoder, it's just a JSON tokenizer, not a parser. (It includes an example JSON parser, but only the tokenizer is using WUFFS.)

Just took a look at the JSON examples in WUFFS after you mentioned this, and yeah...

For JSON, the difference between a pure tokenizer and a SAX-like parser is small enough that it doesn't really matter, which is why that's fine. But I don't think this holds for XML, especially if you include all the features libxml2 supports.
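
To illustrate how thin that layer is for JSON, here is a minimal Python sketch (my own, not taken from WUFFS; the names `tokenize` and `sax_events` are made up for illustration). The tokenizer is a single regex, and the SAX-like layer on top only needs a stack of open containers plus one flag to tell object keys from values. It does not check that the grammar is well-formed, and the number pattern is looser than the JSON spec, so treat it as a sketch rather than a real parser.

```python
import json
import re

# One alternative per JSON token kind; leading whitespace is skipped.
# Assumption: a simplified grammar (e.g. leading zeros in numbers are accepted).
TOKEN_RE = re.compile(r'''
    \s*(?:
        (?P<punct>[{}\[\]:,])
      | (?P<string>"(?:[^"\\]|\\.)*")
      | (?P<number>-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)
      | (?P<literal>true|false|null)
    )''', re.VERBOSE)


def tokenize(text):
    """Pure tokenizer: yields (kind, raw_text) pairs and knows nothing about nesting."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        yield m.lastgroup, m.group(m.lastgroup)
        pos = m.end()


def sax_events(tokens):
    """SAX-like layer: turns tokens into (event, payload) pairs.

    The only state beyond the token stream is the container stack and a
    flag saying whether the next string is an object key.
    """
    stack = []
    expect_key = False
    for kind, raw in tokens:
        if kind == "punct":
            if raw == "{":
                stack.append("object")
                expect_key = True
                yield ("start_object", None)
            elif raw == "[":
                stack.append("array")
                expect_key = False
                yield ("start_array", None)
            elif raw in ("}", "]"):
                stack.pop()
                expect_key = False
                yield ("end_object" if raw == "}" else "end_array", None)
            elif raw == ",":
                # After a comma inside an object, the next string is a key.
                expect_key = bool(stack) and stack[-1] == "object"
            else:  # ":"
                expect_key = False
        elif kind == "string" and expect_key:
            expect_key = False
            yield ("key", json.loads(raw))
        else:
            # Strings, numbers and literals are decoded with the stdlib.
            yield ("value", json.loads(raw))
```

Calling `list(sax_events(tokenize(text)))` on a JSON document gives the events in document order. The whole "parser" part is the roughly 25 lines of `sax_events`, which is the sense in which the distinction barely matters for JSON.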

Plus, the main appeal of libxml2 is its support for DOM and other more advanced features, not just having a SAX parser; those are a dime a dozen. So even a SAX parser would maybe be at most 10% of libxml2...

> I would be surprised if there were any security CVEs in major implementations purely rooted in tokenization. In my recollection, bugs in libxml2 and libxslt (and libexpat and others, for that matter) have been in the higher levels of the implementation stack.

Yes, pure tokenization is probably the easiest part of parsing XML, so I don't expect any mature XML parser to have bugs remaining in its tokenization logic.



