WEB_SECURITY

Imperva Reversal: I

Reversing Imperva's reese84 Advanced Bot Protection challenge - from obfuscated VM to semantically faithful IR listing

Introduction

Being greeted by a wall of seemingly never-ending Base64 is usually the start of either a very easy reverse engineering challenge, or a headache-inducing one. Incapsula's reese84 masks itself as the simple kind, but it's far from it.

Dissection

(function () {
  var A = window.atob("b64_string");
  var E = new window.Uint8Array(A.length);
  for (var B = 0; B < A.length; B++) {
    E[B] = A.charCodeAt(B);
  }
  E = new window.Uint16Array(E.buffer);
  var Q = [
    null,
    null,
    [],
    function (A) {
      return A();
    },
    function (A) {
      return function (E) {
        return A()();
      };
    },
    function (A) {
      return function (E) {
        return (function (A) {
          return function () {
            return A(arguments);
          };
        })(A(E));
      };
    },
    function (A) {
      return function (E) {
        return (
          (function (A) {
            return !(function (A) {
              return null;
            })();
          })() | A()
        );
      };
    },
    ...
  ];
  Q[0] = Q;
  var B = 0;
  while (B < E.length) {
    Q[E[B++]] = Q[E[B++]](Q[E[B++]]);
  }
})();

Looks simple, right? Some basic IIFEs, type coercion, b64 decoding and mapping it to u16s, and a triplet execution loop. But look closer at the functions in Q: none of them actually do anything on their own. Each takes one argument and hands back another function, and nothing fires until enough arguments have been supplied - the function is fully saturated. They're curried thunks, and the real behaviour lives in how the triplet loop composes them, not in any single body.

Some of them are a mess to read raw, so let's build a basic unary solver (which evaluates single-argument calls down to a concrete value where it can) and an IIFE folder.

Before we jump in, there's one trick they pull that trips a lot of people up when folding these curried bodies.

function (A) {
  return function (E) {
    return (function (A) {
      return function () {
        return A(arguments);
      };
    })(A(E));
  };
},

A lot of people might miss that their IIFE folder substitutes the body with

return function () {
  return A(E)(arguments)
}

The core issue is the deferred evaluation of A(E). The inner IIFE forces A(E) once, at the moment the closure is built. The naive fold above re-defers it, so A(E) re-runs on every call - with a different arguments each time. The correct fold binds A(E) once and evaluates it at depth 2 (variables renamed for sanity):

function(v7) {
  return function(v8) {
    var v9 = v7(v8);
    return function() {
      return v9(arguments);
    };
  };
},

Once you fold and solve the type coercion, you'll notice something about the VM's skeleton: there's no constant pool. Does that mean the VM just doesn't use constants? No.

function(v66) {
  return function(v67) {
    return "false[object Window]"[v66()];
  };
},

false + window coerces to the string "false[object Window]", and indexing it with v66() extracts a single character - that index is the constant. Plenty of other opcodes do the same thing. So the VM does use constants, they're just synthesized on the fly from coerced primitives.

Which means the opening stretch of instructions is probably just assembling a character lookup map.

We could verify that by mapping the opcodes and emulating the triplet loop. But that wouldn't scale - the b64 string is large, and a curried VM almost certainly means an instruction stream in the MBs. We can do better and stay faithful to the VM's semantics at the same time.

In theory, unraveling the currying and producing a flattened instruction stream is really simple. But first, we need to re-visit opcode mapping.

Mapping Opcodes

Those source transformations give us a clean look at what the opcode bodies actually do - and they're pretty simple. So rather than writing a bespoke matcher for each of the ~120 opcode indices by hand, we lower them generically: walk the AST, translate each expression into our own expression type, one that only knows the handful of things these opcodes actually do.

That sounds like a lot of ground to cover, but the opcode pool collapses into something much smaller. Plenty of opcodes lower practically one-to-one. A BinaryExpression would become our abstracted BinOp, an indexed coercion becomes an Index that gets optimized into a Const - and the rest are just small combinations of the same vocabulary: Call, Force, Index, Cond, etc.

The real work the translator does isn't actually the lowering, it's counting. Every opcode is curried, so the actual opcode index means nothing until it's been handed enough arguments and called enough to fire. As we lower a body we count two things: the arity (how many curried arguments the opcode takes), and whether the fully-applied body still hands back a thunk that needs one more forcing call. Their sum is exactly how many cycles the opcode needs before it's saturated and ready to emit.

Classifying Shapes

Lowering gives us a clean expression tree per opcode, but a tree of BinOp/Index/Call nodes still doesn't tell us what an opcode is. Is let l0 = arg0(arg1) in λ.l0(arguments) a closure builder, an IIFE, or a frame push? Two opcodes can lower to structurally similar trees and mean completely different things to the VM. We need a second, coarser pass that looks at a lowered body and answers "what family does this belong to?".

That pass is the classifier. It runs once over every opcode body and tries to match it against a fixed catalogue of roles - not one matcher per opcode index, but one matcher per kind of thing the VM does. The catalogue is small and finite, because the VM's vocabulary is small and finite:

  • Leaves: literals (LiteralNumber, LiteralBool, LiteralNull, LiteralWindow), slot/argument/window reads and writes (SlotRead, ArgsRead, IndexWrite, …), BinOp/UnaryOp/Ternary, and the call shapes (CallN, NewN, MethodCallN, Apply/Apply2/Apply3, …).
  • Forcing glue: Force, DoubleForce, PassthroughArg0 - the no-op-looking opcodes whose entire job is to pull a value out of a thunk.
  • Combinators: Compose, Pipe, Seq, Iife, Apply, With - the higher-order plumbing that wires other opcodes together.
  • Frame ops: NewFrame, NewFrameWithArg, Link, FramePush, Body - the calling-convention machinery around the VM's slot[2] activation record.
  • Control flow: While, TryCatch - recognised by their thunk-wrapped shape (a thunk whose body is a while/try over forced arguments).

Each match is purely structural. Ternary, for instance, is "a Cond whose test forces argument 2, whose consequent forces argument 0, and whose alternate forces argument 1". Compose is "apply argument 0 to the result of applying argument 1 to argument 2". Nothing here is keyed to an opcode's index; the same matcher fires for every opcode that happens to share the shape, which is exactly why it survives the polymorphism between samples.

Alongside the role, the classifier records one more bit per opcode: whether the body needs a frame. An opcode needs a frame if its body actually invokes one of its arguments with arguments, or touches slot[2][0] - i.e. if firing it can run user code that reads or writes the activation record. That single boolean later decides whether the saturator is allowed to freely inline a body or has to treat it as an opaque, effectful call.

The role table is what everything downstream leans on. The flattener uses it to decide what to force; the saturator uses it to decide what's safe to inline; the pretty-printer uses leaf roles to turn v3[18](a)(b) back into (a + b) and combinator roles to turn a compose chain into a { …; …; } block; the control-flow roles are what let while and try/catch reappear at all. When we said earlier that an IIFE is "an opcode whose role we've tagged as iife" - this is where that tag comes from.

Flattening

We left off knowing exactly how many cycles it takes to saturate an opcode. That number controls when we emit - and it's what lets us transcribe the VM rather than just emulate it.

First, we need to look at the bytecode. The decode loop hands us a Uint16Array, and the triplet loop reads it three slots at a time - Q[E[B++]] = Q[E[B++]](Q[E[B++]]). JavaScript evaluates assignment targets before the right-hand side, so the three reads land in a fixed order: destination, operator, and operand.

To make things cleaner, we parse that flat u16 stream into triplets - dest, operator, operand - giving us: Q[dest] = Q[operator](Q[operand]).

So every triplet is effectively "apply whatever's in the operator slot to whatever's in the operand slot, and write the result to the destination slot". Simple, right?

But instead of actually running that, we do something called shadowing. We build a frame that mirrors Q - slot 0 points back to the frame itself, the leading null/[] slots get seeded, and every remaining slot starts with its designated opcode.

This allows us to linearly walk the triplets in the exact order the VM would, except the slots don't actually hold live functions - instead they hold abstracted values: an opcode, a partially-applied opcode with some arguments collected, or a finished expression. For each triplet, we look at what's sitting in the operator slot:

  • an opcode → start a fresh partial application, with one argument applied
  • an already-partial opcode → apply another argument
  • an already-finished value → it's an actual call, so emit it

And we drop that result into the destination slot. This is the part that actually pays off. A partial just sits there accumulating arguments, triplet after triplet, which ultimately does nothing - precisely as the curried thunks do nothing until they're saturated. Only when its argument count reaches the correct number of cycles needed does it fire: we lower the opcode body into a concrete IR expression and the slot graduates from a partial to a finished value.

This is exactly where those aforementioned megabytes of data shrink. The overwhelming majority of those triplets exist purely to thread one more argument into a partial. None of them survive on their own; they fold into the single instruction that gets emitted the moment the opcode is finally saturated.

When an opcode does fire, inlining the result everywhere would be a mistake. Instead, if it's something cheap (let's say a literal, a bare slot reference, etc), we assign it straight to the slot. Otherwise, we bind it to a fresh temporary and point the slot at that. Either way we record the slot's current definition so that any later triplet reading that slot picks up the latest expression rather than a stale one. Think of it as an SSA-flavoured value numbering over the VM's register file, that gets built up as we go.

But just walking the execution loop isn't enough. Most opcode bodies force the arguments they're supplied (calling them) - the VM passes thunks, not finished values, so the body has to call them to pull the value out. Take the opcode we showed earlier:

function(v66) {
  return function(v67) {
    return "false[object Window]"[v66()];
  };
},

See how v66 is called? That's because they don't supply a number or a character so that the VM can look that up, instead they supply a thunk with the underlying value, which is one away from being saturated. So this means we have to also track when a thunk is forced or called within an opcode body.

Sure enough, as the opening stretch fires, the synthesized-constant bodies fold down to single characters - the bootstrap really is just assembling a character lookup table.

Something else interesting to note: they use type coercion (again) by adding two opcode-18 thunks together, which results in a string for the VM to use to extract more characters.

Saturation and Convergence

The linear walk gets us a long way, but it doesn't get us all the way. Plenty of opcodes never reach their cycle count during the walk - they're handed to other opcodes as values, not called directly. The VM composes opcodes: it passes opcode B as an argument to combinator A, and only the combinator decides when (or whether) B fires. After flattening, those show up in the IR as residual partials - things that still read v3[op](x)(y) because, locally, we never saw the final forcing call. They're correct, but they're opaque.

So we run a second phase that does on the IR what the walk did on the triplet stream: keep firing opcodes until nothing is left to fire. Concretely it's a fixpoint loop of a few passes:

  • saturate - find any residual partial whose collected argument count has reached the opcode's cycle count, and substitute the opcode's lowered body inline, replacing each Arg(n) with the supplied argument. If an argument is used more than once in the body, we hoist it into a temporary first so we don't duplicate work or re-trigger effects.
  • fold - constant-fold and simplify the result (string-method folding, coercion collapse, the unary/const arithmetic).
  • inline - the SSA-style cleanup: a slot or temporary written once and read once (or holding something cheap) gets propagated into its single use; a definition with zero remaining reads is dropped if it's pure, or demoted to a bare effect if it isn't.

Each round feeds the next, and we count residual partials between rounds. The moment a round fails to reduce the count, we've converged and stop (capped, by default, at six rounds). There's also an optional lambda pass for the cases where a partial is genuinely under-applied and can't be saturated - rather than leave a raw v3[op] reference, we eta-expand it into an explicit arrow with fresh parameters, so even the leftover plumbing reads as real JavaScript.

This is the difference between "we emulated the loop" and "we devirtualized it". The flatten walk transcribes the trace; saturation collapses the composition the trace was threading together, so a deeply-curried combinator nest becomes a single readable expression.

Recovering Control Flow

Pure expressions and effects aren't enough on their own - the interrogator loops and it catches exceptions, and a faithful transcription has to show that. The VM has no jump opcodes; control flow is encoded the same way as everything else, as opcodes whose bodies happen to have a particular shape.

A loop opcode is a thunk wrapping a while whose test and body are both forced arguments. A try/catch opcode is a thunk wrapping a try whose handler re-invokes its argument with the caught value. The classifier recognises exactly these shapes as the While and TryCatch roles, and the lowering keeps while, for…in and try/catch as first-class IR nodes (WhileStmt, ForInStmt, TryCatchStmt) instead of flattening them into calls. When such an opcode saturates, we emit a real while (…) { … } or try { … } catch (__err) { … } rather than an opaque combinator call.

Loops force one extra bit of care in the optimizer. Our value propagation normally threads a slot's known value forward to its next read - but inside a loop body, a slot written on one iteration is read on the next, so propagating a single-pass value would be wrong. Before propagating into a while, we scan the loop for every slot it writes and mark those as off-limits, so the loop's own state never gets constant-folded out from under it.

Bootstrapping

Now technically, they could make any ASCII character they wanted if they had a String.fromCharCode opcode, but that would be too telling. Instead, they construct it themselves with those extracted characters they got from the former type coercion.

How? It's simple. They already have all of the characters they need: c, o, n, s, t, r, u, f, m, C, h, a, d, e. Of course, with these you could only spell constructor and fromCharCode - but that's all they need. When you index a string with ["constructor"] in JavaScript, you get the String constructor function, and fromCharCode is a static method hanging off it - so a second index, ["fromCharCode"], reaches it. Which is exactly what the VM does.

You might be asking where they got some of their vocabulary from, like the C. That actually comes from an interesting usage of window.atob - JavaScript's built-in base64 decoder. They'd encode things like "128false" and use the characters from the decoded output in their lookups.

From here, they have all they need to do absolutely anything they want. So, what do they actually do?

Program Synthesis

So what do they actually do with fromCharCode? More vocabulary, predictably.

Every constant we'd seen so far was a single character - coerced out of a primitive, one at a time. That's fine for spelling constructor, fromCharCode and atob, but the interrogator needs a lot more than that. DOM method names, WebGL constants, font names, automation framework identifiers. Building those character-by-character would be brutal. So they don't.

Instead, the entire vocabulary comes from encoded base64 strings that the VM assembles during the bootstrapping phase.

It isn't just base64 though. If it were, you'd see the words in plaintext after a simple atob call and the game would pretty much be over. Instead, each of these blobs - yes, there are multiple - are first base64-decoded and then run through a short chain of byte transformations before they become readable.

The passes of byte transformations these blobs go through are polymorphic. In one of our samples, the first pool is 2444 base64 characters and decodes to 1833 bytes. The passes it uses are:

SwapAdjacent, RotateRight(251, 7), XorChain(28, 113, ...), XorChain(19, 113, ...)

Keep these pass names in mind - they'll come in handy later.

But here's the part that matters for our tool: we don't just guess the pass chain. We read it straight out of our IR - one of the many benefits of using our abstracted expression types. By the time we've flattened and named the fold opcodes, the exact sequence - including which primitive, in what order, and with which immediate and which key length - is sitting right there in the decode routine.

And we recognise each pass by what its loop body does, not by where it sits. Each transform opcode is a while over a byte buffer, and the set of operators and constants it uses is a fingerprint. A loop that combines |, <<, >>, & with the constants 255 and 7 is a bit-rotate. A loop whose only meaningful operator is ^ against a key byte (chaining through the constant 113) is an XorChain. A loop that writes to both index i and index i+1 is SwapAdjacent; one that writes to 2i and 2i+1 is an Interleave; one that writes two distinct slots per iteration is a Reverse. So the classifier walks each fold opcode's body, collects its operators, constants and the shape of the index it writes to, and maps that to a known primitive. The result is a per-slot table of "slot N is a SwapAdjacent, slot M is an XorChain with this key", emitted to JSON next to the synthesized program.

So we re-derive that chain per pool, pull the immediates and keystreams out of the IR, and just run the primitives ourselves - we keep a byte-for-byte Rust reimplementation of each one (rotate, xor-chain, swap-adjacent, interleave, reverse, …) so the chain replays exactly. Out the other end falls a single packed blob, and after reversing it we get:

…getImageDatargb(255,0,255)threshold…createOscillator…
fromCharCode…__SENTRY__…webdriver…__selenium_evaluate…_phantom…
MAX_VERTEX_TEXTURE_IMAGE_UNITS…OfflineAudioContext…userAgent…

So the VM does have a constant pool - it's just encoded. And this is where things get useful. Before, a signal read was an opaque slot index that meant nothing. Once we substitute all the .substrings and .slices, we get window["webdriver"], canvas["getContext"], "__SENTRY__" in window.

Retroactively, it also helps to explain the opening stretch we hand-waved earlier. The bootstrapping phase wasn't just assembling a character lookup map - it was assembling the lookup map and the cipher, along with the precise mechanism required to unpack and decode these blobs.

Side Effects: the constructor install

Up to now every opcode we've folded has been pure - it takes thunks, returns a value and the value is all that matters. But the VM has to actually do something eventually, and the first thing it does is install itself:

function(v24) {
  return function(v25) {
    return function(v26) {
      return window[v24()] = v25();
    };
  };
},

This is the opcode that assigns window.reese84interrogatorconstructor. It's curried just like the others - three nested thunks - so it sits inert accumulating arguments triplet after triplet and fires nothing until it's saturated. There's no special "install" instruction; the side effect is a saturated opcode.

When we transcribe it, the assignment can't be treated like a value. A pure result folds into the slot graph and may vanish if nothing reads it and we have a DCE pass - but an Assign, Call, or New is an observable effect, and dropping or reordering it changes what the program does.

So the rule is: the moment an opcode body produces an Assign/Call/New, we pin it into the instruction stream in execution order as an explicit effect, and let only its result flow onward. Value propagation is allowed to thread constants through pure slots, but it is never allowed to cross an effect it can't prove pure - the instant we hit an opaque call or a window write, we forget what we knew about the affected slot and resume from the effect.

IIFEs follow the same model. An IIFE is just an opcode whose body is a block and whose role we've tagged as iife. Instead of inlining it into an expression, we decompose its body back into a sequence of instructions and re-emit it as (function(){...})() in our IR. The frame-pushing variants get an explicit scope so the VM's slot[2] save/restore stays visible instead of being flattened into the surrounding code.

Cleaning Up

Faithfulness leaves litter. The bootstrap, in particular, writes a constant into a slot[2] cell, reads it back a few instructions later, writes the next one, and so on - hundreds of times. After saturation the values are concrete, so we run a dedicated slot[2] propagation pass: walk the instruction stream tracking the current known value of each frame cell, and substitute reads with that value. The instant we hit something that could invalidate a cell - a full slot[2] replacement, a frame push, an opaque call - we forget what we knew, exactly like the effect rule from the last section. Crucially this is the frame register file, separate from the SSA-style slot/temp inlining that runs in the saturation loop; the two together are what shrink the character-table churn down to the handful of lines that actually matter.

Once values stop flowing into a cell, the writes that produced them become dead. A dead-store pass sweeps the stream: a pure slot[2] write with no subsequent read before the next write (and no intervening invalidation) is removed. Effectful writes are never touched - dropping an observable effect would break the very faithfulness we've been guarding.

Checking Our Work

At this point we've made a lot of claims: that the bootstrap builds a cipher, that pool N decodes to this vocabulary, that these fold opcodes are these primitives. It would be easy to be confidently wrong. So we don't just transcribe the VM, we also run our transcription - against a small abstract interpreter that understands the same value universe the VM lives in: numbers, byte strings, ArrayBuffer/Uint8Array views, objects, arrays, atob, String.fromCharCode, and the frame model.

The interpreter executes the recovered IR directly. It resolves the synthesized constants, materialises the pools, follows the fold chains, and reports back what actually landed in each slot - pool previews, user-function call counts, the works. If our recovered decode chain is right, the interpreter spits out the same readable vocabulary we derived statically; if we'd misread a pass or an immediate, the bytes would come out as garbage and we'd know immediately. It's the difference between "this looks like a SwapAdjacent" and "running it produces webdriver".

Closing

Zooming out: lower each curried body into a tiny expression IR, classify every body into a role, shadow-execute the triplet stream to transcribe the trace, then saturate and clean until the composition collapses into readable JavaScript - all statically, without ever running the original sample. What started as a wall of base64 and a pile of functions that "don't do anything" ends as a program we can read top to bottom: the cipher it builds, the pools it unpacks, the window.reese84interrogatorconstructor it installs, and the list of every automation framework, WebGL constant and DOM probe it goes looking for.

And that readability is the actual prize. Once the signals are no longer opaque slot indices but window["webdriver"] and canvas["getContext"] and "__SENTRY__" in window, the interrogation stops being a black box and becomes something you can reason about - and reproduce.

If you want to read the recovered IR yourself, the dump is up at debug.cat/recapsula.ir.

Before you click it: this isn't the polished output. It's an earlier, experimental build with fewer cleanup passes applied, so you'll see more of the VM's clothing left on - residual v3[N](...) partials that never got saturated, explicit force(...) calls, and a scattering of debugging artifacts (__op(...) references, leftover scope labels, the odd commented-out opcode id). The constant pools are decoded and the structure is all there, but consider it a window into the machine mid-disassembly rather than a finished, runnable program. If something looks half-transcribed, it probably is - that's the point of shipping the raw thing.

There should be another post up shortly that details the process of actually reproducing your own cookie.