Debian Linux Security Advisory 3057-1 – Sogeti found a denial of service flaw in libxml2, a library providing support to read, modify and write XML and HTML files. A remote attacker could provide a specially crafted XML file that, when processed by an application using libxml2, would lead to excessive CPU consumption (denial of service) based on excessive entity substitutions, even if entity substitution was disabled.
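The advisory does not reproduce the crafted file, so the following is only a rough sketch of why nested entity definitions can be expensive to process. It is not the actual exploit and does not use libxml2. The entity names (e0 to e9), the "lol" payload, and both helper functions are hypothetical; the script computes the expanded size arithmetically instead of building the string.

```python
import re

# Matches an entity reference such as &e3; and captures the name.
REF = re.compile(r"&(\w+);")


def expanded_length(entities, name, cache=None):
    """Length of entity `name` once every nested reference is substituted.

    The result is computed from the definitions alone, so the (possibly
    huge) expanded string is never materialised.
    """
    if cache is None:
        cache = {}
    if name not in cache:
        value = entities[name]
        literal = len(REF.sub("", value))  # characters that are not references
        nested = sum(expanded_length(entities, ref, cache)
                     for ref in REF.findall(value))
        cache[name] = literal + nested
    return cache[name]


def nested_entities(levels, fanout):
    """Build e0 = 'lol', and each e(i) as `fanout` references to e(i-1)."""
    entities = {"e0": "lol"}
    for i in range(1, levels + 1):
        entities[f"e{i}"] = f"&e{i - 1};" * fanout
    return entities


ents = nested_entities(levels=9, fanout=10)
declared = sum(len(value) for value in ents.values())
expanded = expanded_length(ents, "e9")
print(f"declared: {declared} characters, expanded: {expanded} characters")
```

Under these assumptions a few hundred characters of declarations stand for roughly three billion characters of replacement text, which is why a parser that walks every substitution can be kept busy for a long time.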
Entities in principle are similar to simple C macros. An entity defines an abbreviation for a given string that you can reuse many times throughout the content of your document. Entities are especially useful when a given string may occur frequently within a document, or to confine the change needed to a document to a restricted area in the internal subset of the document (at the beginning). Example:
    1 <?xml version="1.0"?>
    2 <!DOCTYPE EXAMPLE SYSTEM "example.dtd" [
    3 <!ENTITY xml "Extensible Markup Language">
    4 ]>
    5 <EXAMPLE>
    6    &xml;
    7 </EXAMPLE>
Line 3 declares the xml entity. Line 6 uses the xml entity, by prefixing its name with ‘&’ and following it by ‘;’ without any spaces added. There are 5 predefined entities in libxml2 allowing you to escape characters with predefined meaning in some parts of the XML document content: &lt; for the character ‘<’, &gt; for the character ‘>’, &apos; for the character ‘'’, &quot; for the character ‘"’, and &amp; for the character ‘&’.
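As a small illustration of the five predefined entities, the sketch below escapes and unescapes a string with Python's standard xml.sax.saxutils module rather than with libxml2. By default escape() only handles ‘&’, ‘<’ and ‘>’, so the two quote characters are passed in explicitly. The sample string and the two mapping names are made up for the example.

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > on its own; the quote characters are opt-in.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}
QUOTE_CHARS = {"&quot;": '"', "&apos;": "'"}

raw = '<a title="Tom & Jerry\'s">'
escaped = escape(raw, QUOTE_ENTITIES)
restored = unescape(escaped, QUOTE_CHARS)

print(escaped)
print(restored == raw)
```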
One of the problems related to entities is that you may want the parser to substitute an entity’s content so that you can see the replacement text in your application. Or you may prefer to keep entity references as such in the content so that you can save the document back without losing this usually precious information (if the user went through the pain of explicitly defining entities, they may have a rather negative attitude if you blindly substitute them at saving time). The xmlSubstituteEntitiesDefault() function allows you to check and change this behaviour; by default, entities are not substituted.
Here is the DOM tree built by libxml2 for the previous document in the default case:
    /gnome/src/gnome-xml -> ./xmllint --debug test/ent1
    DOCUMENT
    version=1.0
       ELEMENT EXAMPLE
         TEXT
         content=
         ENTITY_REF
           INTERNAL_GENERAL_ENTITY xml
           content=Extensible Markup Language
         TEXT
         content=
And here is the result when substituting entities:
    /gnome/src/gnome-xml -> ./tester --debug --noent test/ent1
    DOCUMENT
    version=1.0
       ELEMENT EXAMPLE
         TEXT
         content=     Extensible Markup Language
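For comparison, here is a minimal sketch of the substituted case using Python's standard xml.etree.ElementTree module instead of libxml2. This parser always replaces internal entities, so it only reproduces the second tree above; it has no switch for keeping the reference as a separate node. The SYSTEM "example.dtd" identifier is left out of the sample as a simplification, so the document depends only on its internal subset.

```python
import xml.etree.ElementTree as ET

# Same document as the example above, minus the external DTD reference.
DOC = """<?xml version="1.0"?>
<!DOCTYPE EXAMPLE [
<!ENTITY xml "Extensible Markup Language">
]>
<EXAMPLE>
   &xml;
</EXAMPLE>"""

root = ET.fromstring(DOC)

# The entity reference is gone: EXAMPLE has a single text value and no children.
print(root.tag, len(root), repr(root.text.strip()))
```

An application that needs to write the document back with its entity references intact would need a parser that can keep them, as libxml2 does in its default mode.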