XML standard allows the use of entities, declared in the DOCTYPE of the document, which can be internal or external.
When parsing the XML file, the content of the external entities is retrieved from an external storage such as the file system or network, which may lead, if no restrictions are put in place, to arbitrary file disclosures or server-side request forgery (SSRF) vulnerabilities.
<?xml version="1.0" encoding="utf-8"?> <!DOCTYPE person [ <!ENTITY file SYSTEM "file:///etc/passwd"> <!ENTITY ssrf SYSTEM "https://internal.network/sensitive_information"> ]> <person> <name>&file;</name> <city>&ssrf;</city> <age>18</age> </person>
It’s recommended to limit resolution of external entities by using one of these solutions:
lxml module:
parser = etree.XMLParser() # Noncompliant: by default resolve_entities is set to true
tree1 = etree.parse('ressources/xxe.xml', parser)
root1 = tree1.getroot()
parser = etree.XMLParser(resolve_entities=True) # Noncompliant
tree1 = etree.parse('ressources/xxe.xml', parser)
root1 = tree1.getroot()
parser = etree.XMLParser(resolve_entities=True) # Noncompliant
treexsd = etree.parse('ressources/xxe.xsd', parser)
rootxsd = treexsd.getroot()
schema = etree.XMLSchema(rootxsd)
ac = etree.XSLTAccessControl(read_network=True, write_network=False) # Noncompliant, read_network is set to true/network access is authorized transform = etree.XSLT(rootxsl, access_control=ac)
xml.sax module:
parser = xml.sax.make_parser()
myHandler = MyHandler()
parser.setContentHandler(myHandler)
parser.setFeature(feature_external_ges, True) # Noncompliant
parser.parse("ressources/xxe.xml")
lxml module:
resolve_entities and network access:
parser = etree.XMLParser(resolve_entities=False, no_network=True) # Compliant
tree1 = etree.parse('ressources/xxe.xml', parser)
root1 = tree1.getroot()
parser = etree.XMLParser(resolve_entities=False) # Compliant: by default no_network is set to true
treexsd = etree.parse('ressources/xxe.xsd', parser)
rootxsd = treexsd.getroot()
schema = etree.XMLSchema(rootxsd) # Compliant
parser = etree.XMLParser(resolve_entities=False) # Compliant
treexsl = etree.parse('ressources/xxe.xsl', parser)
rootxsl = treexsl.getroot()
ac = etree.XSLTAccessControl.DENY_ALL # Compliant
transform = etree.XSLT(rootxsl, access_control=ac) # Compliant
To prevent xxe attacks with xml.sax module (for other security reasons than XXE, xml.sax is not recommended):
parser = xml.sax.make_parser()
myHandler = MyHandler()
parser.setContentHandler(myHandler)
parser.parse("ressources/xxe.xml") # Compliant: in version 3.7.1: The SAX parser no longer processes general external entities by default
parser.setFeature(feature_external_ges, False) # Compliant
parser.parse("ressources/xxe.xml")