Yet again, a significant root cause of issues is C here
Posted Jun 27, 2025 9:07 UTC (Fri) by chris_se (subscriber, #99706) In reply to: Yet again, a significant root cause of issues is C here by wahern
Parent article: Libxml2's "no security embargoes" policy
Just took a look at the JSON examples in WUFFS after you mentioned this, and yeah...
For JSON, the difference between a pure tokenizer and a SAX-like parser is small enough that it doesn't really matter, which is why that approach is fine there. But I don't think the same holds for XML, especially if you include all the features libxml2 supports.
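To illustrate why the gap is small for JSON, here is a minimal Python sketch. The `tokenize` and `sax_events` functions are hypothetical and are not WUFFS's API. A regex-based tokenizer emits raw tokens, and the SAX-like layer on top of it only needs a small stack to tell object keys apart from string values. The sketch assumes well-formed input and does no grammar validation (a missing colon or comma goes unnoticed).

```python
import json
import re
from typing import Iterator, List, Tuple

# Hypothetical minimal JSON tokenizer: each match is one punctuation mark,
# string, number, or literal, optionally preceded by whitespace.
TOKEN_RE = re.compile(r'''
    \s*(?:
      (?P<punct>[{}\[\]:,]) |
      (?P<string>"(?:[^"\\]|\\.)*") |
      (?P<number>-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?) |
      (?P<literal>true|false|null)
    )''', re.VERBOSE)


def tokenize(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, raw_text) pairs; raise on input no token pattern matches."""
    pos = 0
    end = len(text.rstrip())  # ignore trailing whitespace
    while pos < end:
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        kind = m.lastgroup  # exactly one named group matches per token
        yield kind, m.group(kind)
        pos = m.end()


def sax_events(text: str) -> Iterator[Tuple[str, object]]:
    """SAX-like layer: turn tokens into events using only a container stack."""
    stack: List[str] = []  # "object" or "array" for each open container
    expect_key = False     # True when the next string is an object key
    for kind, tok in tokenize(text):
        if kind == "punct":
            if tok == "{":
                stack.append("object")
                expect_key = True
                yield ("start_object", None)
            elif tok == "}":
                stack.pop()
                expect_key = False
                yield ("end_object", None)
            elif tok == "[":
                stack.append("array")
                expect_key = False
                yield ("start_array", None)
            elif tok == "]":
                stack.pop()
                expect_key = False
                yield ("end_array", None)
            elif tok == ",":
                # After a comma inside an object, the next string is a key.
                expect_key = bool(stack) and stack[-1] == "object"
            # ':' produces no event
        elif kind == "string":
            value = json.loads(tok)  # decode escape sequences
            if expect_key:
                expect_key = False
                yield ("key", value)
            else:
                yield ("string", value)
        else:
            # Numbers and true/false/null are decoded the same way.
            yield (kind, json.loads(tok))


for event in sax_events('{"a": [1, "x"], "b": true}'):
    print(event)
```

Nearly all of the SAX layer is bookkeeping for one bit of context ("is this string a key?"). XML has no equivalent shortcut once entities, namespaces, and DTDs enter the picture.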
Plus, the main appeal of libxml2 is its support for DOM and other more advanced features, not just having a SAX parser; those are a dime a dozen. So even a SAX parser would probably make up at most 10% of libxml2...
> I would be surprised if there were any security CVEs in major implementations purely rooted in tokenization. In my recollection, bugs in libxml2 and libxslt (and libexpat and others, for that matter) have been in the higher levels of the implementation stack.
Yes, pure tokenization is probably the easiest part of parsing XML, so I wouldn't expect any mature XML parser to still have bugs in its tokenization logic.