The engine assumed one screen. The job needed four.

Seven hours. That’s how long a visual test suite for a Fortune 500 retail group used to take — roughly 300 tests driving point-of-sale terminals, run one after another, against a regulatory compliance window that does not move.

Today it finishes in under two. Four sessions in parallel, on a single 8 GB Azure VM running Windows Server 2019 — a swap file, basic provisioning, nothing exotic. Runtime divided by five. No test farm, no per-minute cloud bill, no new hardware.

The speed is the headline. It’s not the interesting part. The interesting part is that the engine was never built to do this — and neither is almost any visual-automation tool. They assume one screen. I needed four. This is the story of the gap between those two numbers, and what I had to build to close it.

Visual automation was born in a world of one physical screen. One machine, one desktop, one cursor. SikuliX — which OculiX continues — inherited that worldview, and for fifteen years it was the correct one. The engine keeps a notion of the current screen, and very reasonably it keeps it in one place: shared, global state. When there is only ever one screen, a global is not a flaw. It’s the right design.

That assumption is invisible right up until the moment you break it. VNC breaks it.

A VNC session isn’t a screen — it’s a network framebuffer. No monitor, no desktop login, no DISPLAY. You can open a dozen, each on its own port, each pointing at a different remote machine. Suddenly “the current screen” stops being a singleton and becomes a question: which one?

The first time I declared two sessions and ran two test flows at once, they walked all over each other. Not with an error — worse, with quiet nonsense. Clicks landing on the wrong register. A framebuffer read mid-repaint. Results that came back green for entirely the wrong reasons.

I tested it carefully, because I didn’t believe it at first. Two flows, two ports, two remote machines. They polluted each other every single time.

The robot and the client were per-instance — that part was clean. But the port and the registry of live sessions lived in shared, static state. Two threads reaching for the same global. The single-screen assumption, baked into the data model, surfacing at exactly the moment I asked for more than one screen.

What I built — around the engine, not inside it

So I built the missing layer myself, on top of the engine rather than in it. Four pieces, each one fixing a failure I had actually watched happen.

Isolation per flow. Each test flow got its own session, bound to its own thread — its own port, its own remote, its own everything. No flow could see another’s state, because there was no shared “current session” left to see. The engine didn’t hand me thread-scoped sessions, so I made them thread-scoped from the outside.
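A minimal sketch of that outside-in isolation, using Python's `threading.local`. The `Session` class and the `open_session`, `current_session`, and `run_flow` helpers are hypothetical names for illustration, not part of OculiX or SikuliX.

```python
import threading
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical stand-in for one VNC session's connection details."""
    host: str
    port: int


# Each thread sees its own attributes on this object; nothing is shared.
_local = threading.local()


def open_session(host: str, port: int) -> Session:
    """Bind a new session to the calling thread only."""
    _local.session = Session(host, port)
    return _local.session


def current_session() -> Session:
    """Return the calling thread's session, or fail loudly if it has none."""
    session = getattr(_local, "session", None)
    if session is None:
        raise RuntimeError("no session opened on this thread")
    return session


def run_flow(host: str, port: int, seen: dict) -> None:
    """One test flow: open a session, then look it up as a test step would."""
    open_session(host, port)
    # 'seen' is shared on purpose, only to report what each thread observed.
    seen[threading.current_thread().name] = current_session()


if __name__ == "__main__":
    seen: dict = {}
    flows = [
        threading.Thread(target=run_flow, args=("register-a", 5901, seen), name="flow-a"),
        threading.Thread(target=run_flow, args=("register-b", 5902, seen), name="flow-b"),
    ]
    for flow in flows:
        flow.start()
    for flow in flows:
        flow.join()
    print(seen)
```

Each flow only ever gets back the session it opened itself, and a thread that never opened one gets an error instead of someone else's screen.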

A port is a promise. Two flows must never claim the same port, and “probably free” is not good enough at startup when four of them race. So allocation became atomic: a small dedicated range, and a rule — a flow claims a port through a lock no two flows can win simultaneously, or it doesn’t run at all. A port stops being a guess and becomes a contract.
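One way to get that kind of claim is an exclusive lock file per port. The sketch below assumes a four-port range starting at 5901 and a lock directory under the system temp folder; both values and the function names are illustrative, not the harness's actual ones.

```python
import os
import tempfile
from pathlib import Path

# Assumed values for illustration: four local ports, one lock directory.
PORT_RANGE = range(5901, 5905)
LOCK_DIR = Path(tempfile.gettempdir()) / "vnc-port-locks"


def claim_port(lock_dir: Path = LOCK_DIR, ports: range = PORT_RANGE) -> int:
    """Claim the first free port in the range, or raise if none is left."""
    lock_dir.mkdir(parents=True, exist_ok=True)
    for port in ports:
        try:
            # O_CREAT | O_EXCL fails if the file already exists,
            # so exactly one caller can create it.
            fd = os.open(lock_dir / f"{port}.lock",
                         os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue  # another flow already holds this port
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))  # record the owner for debugging
        return port
    raise RuntimeError("no free port in range; this flow must not run")


def release_port(port: int, lock_dir: Path = LOCK_DIR) -> None:
    """Give the port back by deleting its lock file."""
    (lock_dir / f"{port}.lock").unlink(missing_ok=True)
```

A flow that crashes without calling `release_port` leaves its lock file behind, so a real harness would also need a cleanup step for stale locks. The lock only arbitrates between flows; it does not prove that nothing else on the machine is listening on the port.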

One tunnel per session. Each remote register was reachable only through an SSH tunnel exposing its framebuffer on a local port. One tunnel per flow, opened on demand, torn down after — and, the detail that cost me a late night, closed before the next one opened. Orphaned tunnels hold ports hostage long after the process that spawned them is gone. Close-before-open, not close-after. Order matters.
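The ordering rule can be sketched as a small holder that refuses to open a tunnel while an old one is still alive. This version assumes the system `ssh` client with its standard `-N` and `-L` options and a remote framebuffer on port 5900; `ssh_forward` and `TunnelSlot` are hypothetical names, and this is not the tunnel implementation the harness used.

```python
import subprocess


def ssh_forward(local_port: int, remote_host: str, remote_port: int = 5900):
    """Start `ssh -N -L`, forwarding local_port to the remote framebuffer port."""
    return subprocess.Popen(
        ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", remote_host]
    )


class TunnelSlot:
    """Holds at most one live tunnel for a flow; always closes before opening."""

    def __init__(self, opener=ssh_forward):
        self._opener = opener  # injectable so the ordering can be tested without ssh
        self._process = None

    def close(self) -> None:
        """Terminate the current tunnel and wait until the process is gone."""
        if self._process is None:
            return
        self._process.terminate()
        try:
            self._process.wait(timeout=10)
        except subprocess.TimeoutExpired:
            self._process.kill()  # last resort: do not leave an orphan behind
            self._process.wait()
        self._process = None

    def open(self, local_port: int, remote_host: str):
        """Close any previous tunnel first, then open the next one."""
        self.close()
        self._process = self._opener(local_port, remote_host)
        return self._process
```

Because `open` calls `close` first and `close` waits for the process to exit, the old tunnel has released its local port before the new one tries to bind it.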

Readiness, not optimism. The engine waited a fixed few seconds for a session to come up and hoped for the best. Hope does not survive four sessions starting at once. So I waited for proof instead: the RFB banner, the 12-byte version string (it begins with the three characters RFB) that a VNC server sends as soon as it accepts a connection. No banner, no test. A probe, not a sleep.
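A readiness probe along those lines fits in a few lines of socket code. The sketch assumes the server speaks first, as the RFB protocol specifies, with a version line such as `RFB 003.008\n`; the function name, timeouts, and retry interval are illustrative choices.

```python
import socket
import time


def wait_for_rfb_banner(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once the server sends a greeting starting with b"RFB "."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2.0) as conn:
                conn.settimeout(2.0)
                banner = b""
                # The version greeting is 12 bytes; read until we have them all.
                while len(banner) < 12:
                    chunk = conn.recv(12 - len(banner))
                    if not chunk:
                        break  # peer closed before sending a full greeting
                    banner += chunk
                if banner.startswith(b"RFB "):
                    return True
        except OSError:
            pass  # refused, reset, or timed out: the server is not ready yet
        time.sleep(0.5)
    return False
```

A flow would call `wait_for_rfb_banner("127.0.0.1", port)` right after its tunnel comes up and skip the test if it returns `False`.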

Four workers, five times faster, looks like broken arithmetic. It isn’t.

The killer in a sequential suite isn’t the total work — it’s the variance in test durations. My ~300 tests ranged from a few seconds to many minutes. Run in a single line, a long test sits in front of forty short ones and they all just wait. Sequential execution doesn’t only pay for the work; it pays for every test queued behind a slower one. With uneven durations that waiting tax is enormous, and invisible, because the CPU is idle the whole time.

I split the load into batches sized by functional area — thirty tests here, ninety there, a hundred and twenty somewhere else, the way the business actually reasons about its registers. Balanced, not just parallel. The four sessions finished at roughly the same time instead of one straggler holding the line.

So the 5× isn’t four workers doing 4× the work. It’s four balanced workers deleting waste the single queue was silently burning. When the thing you replaced is that wasteful, even modest, well-distributed parallelism overshoots the naive ratio.
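To make "balanced" concrete, here is a small sketch of one common balancing heuristic: sort tests longest-first and always hand the next one to the least-loaded worker. This is a different technique from the functional-area batches described above, and the test names and durations are invented for illustration.

```python
import heapq


def balance(durations: dict, workers: int) -> list:
    """Greedy longest-first assignment: each test goes to the least-loaded worker."""
    heap = [(0.0, index) for index in range(workers)]  # (total seconds, worker index)
    batches = [[] for _ in range(workers)]
    ordered = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    for name, seconds in ordered:
        load, index = heapq.heappop(heap)  # worker with the smallest load so far
        batches[index].append(name)
        heapq.heappush(heap, (load + seconds, index))
    return batches


def makespan(durations: dict, batches: list) -> float:
    """Wall-clock time of the run: the total duration of the slowest batch."""
    return max(sum(durations[name] for name in batch) for batch in batches)


if __name__ == "__main__":
    # Invented durations in seconds: one long test and six shorter ones.
    durations = {"long": 600, "t1": 300, "t2": 300, "t3": 200,
                 "t4": 200, "t5": 100, "t6": 100}
    batches = balance(durations, workers=4)
    print(batches, makespan(durations, batches))
```

With these made-up numbers the 600-second test ends up alone on one worker while the other three carry 400 seconds each, so the run lasts as long as the longest single test instead of waiting on an overloaded straggler.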

I want to be precise here, out of respect for the engine and the people who built it. The core does one thing extremely well: clean, reliable, single-session VNC. It kept a single-screen promise and it kept it honestly. The orchestration — the thread isolation, the atomic ports, the tunnels, the readiness probe — was mine, and it lived in my harness. I made a different promise, one layer up, and kept that one too.

For a long time that was the right division of labour. The engine stayed pure; the parallelism was an application concern. But once you’ve named the gap, you can’t un-see it — and you start to wonder why the engine shouldn’t simply do this itself.

The roadmap: making parallel sessions first-class

Here’s where it gets exciting, because the next step is obvious now that the gap has a name. Everything I built outside the engine is the engine’s natural next form.

Thread-scoped sessions

No more global “current screen.” A session belongs to the flow that opened it, by construction — so two flows can never reach for the same state, because there is no shared state to reach for.

Port allocation, built in

The engine hands you an isolated session and owns the atomic port claim itself. No range to reserve, no lock-files to babysit, no collisions to debug at 11 p.m.

Readiness on the RFB banner

The optimistic startup sleep replaced by a real probe — the session reports ready when the server actually says it’s ready, not when a timer guesses it might be.

A clean SSH tunnel, already shipped

A pure-Java tunnel (no WSL, no external sshpass) already lives in the core. The next step is wiring it into a documented, first-class parallel pattern.

The goal for the next major version is simple to state and worth the work: declare N sessions, get N isolated sessions. No harness, no lock-files, no late nights over ghost tunnels. The single-screen assumption retired — gently, and with full respect for the fifteen years it served exactly right.

OculiX is MIT, open source, built in the open. If you drive screens for a living — registers, kiosks, terminals, application instances by the dozen — this is the corner of the roadmap I’d most like company on.

Visual automation has a reputation for not scaling. It earned that reputation honestly: the tools were built for one screen, and most of them still are. But the limit was never the pixels, and it was never the matching. It was an assumption — one screen — that nobody had yet needed to question.

Question it. Give the engine real sessions, isolate them, balance the load across uneven work — and a bored 8 GB VM does in two hours what used to take seven. No farm. No new hardware. No per-minute bill.

The screens were never the bottleneck. The assumption was.


OculiX is open-source under the MIT license. The parallel VNC engine — and the work to make multi-session parallelism native in the core — is described in the Parallel VNC execution guide.