Storing spatial memory in a PNG: how we made visual UI matching 6.5× faster in OculiX

This is a long technical post about a small change. It explains a measurement we never thought to make, the discovery it produced, the implementation we built around it, and what it means for visual automation suites running in CI.

The headline number is real but not particularly dramatic: visual UI matching in OculiX became roughly 6.5 times faster, measured on 50 cold-start JVM runs on an Intel i3 7th generation laptop. The interesting part is not the speedup itself. It is the path that led to it, and the structural lesson that came with it.

If you maintain or use a visual automation framework, a test suite that watches pixels, or any tool that needs to find an image inside another image repeatedly across separate process invocations, the pattern we describe here will probably apply to you. The implementation took about 200 lines of Java, no external dependencies, and three micro-modifications inside the existing codebase. The standards we relied on have been published since 1996.

What follows is the full story, ordered the way the investigation actually unfolded, not the way a marketing post would tell it.

OculiX is the active continuation of the Sikuli and SikuliX visual automation lineage, MIT-licensed and used in production by close to 100 organizations across banking, defense, healthcare, manufacturing, and retail. Its core operation is a function called find: given a small image (a button, an icon, a region of UI), find it inside the current screen capture and return its coordinates. Every other operation in the public API contains at least one call to find underneath.

Inside the codebase, find is well understood. It uses OpenCV template matching via JNI bindings, with five fallback strategies cascaded inside Finder.java:

Mode 1: Standard match

Exact template matching with the configured similarity threshold. The fast path for stable UI elements.

Mode 2: DPI-aware rescale

If the screen DPI differs from the pattern capture DPI, the template is rescaled before matching.

Mode 3: Tolerant blur

GaussianBlur applied to both source and target. Tolerates antialiasing and subtle color variations.

Mode 4: Grayscale smart

Conversion to grayscale before matching. Tolerates color theme changes.

Mode 5: Multi-scale brute force

Last resort. Tries multiple scales (0.5x to 2x) to catch significantly resized elements.

The code is sixteen years old at this point, refined incrementally by the original Sikuli authors at MIT in 2009, by Raimund Hocke from 2010 to 2025 under SikuliX, and now by the OculiX maintainers.

Performance, in this kind of codebase, is rarely benchmarked from scratch. Everyone assumes it is whatever it has always been. A find call takes “some milliseconds”, or “a few hundred milliseconds” on a slow machine, and life goes on.

So I sat down on a Sunday afternoon to actually measure it. Not because there was a problem. Because the question of how fast it really was had never been answered with a number on the current hardware I had in front of me.

The setup was deliberately simple. A standalone Java class called FindTiming, compiled directly against the OculiX complete-win jar, performing exactly one find per JVM invocation, then exiting. A batch script wrapping that class in a loop of fifty separate executions.

So fifty cold starts. Each one paid the JVM startup cost, the OpenCV library load, the Tesseract OCR engine load, the OculiX framework initialization, then performed exactly one find and reported the elapsed time before exiting.

The result, on this five-year-old i3 laptop, was extremely consistent:

| Metric | Value |
| --- | --- |
| Mean | 502 ms |
| Median | 502 ms |
| Minimum | 480 ms |
| Maximum | 545 ms |
| Range | 65 ms |
| Standard deviation | 18 ms |

Half a second to find a small image on a 1920 by 1080 screen. On a modern Intel i7 or Apple Silicon machine the number would be lower, probably by a factor of two or three. On a GitHub Actions standard runner it would be roughly comparable to the i3. On an older corporate desktop running a Citrix client through a VPN it would be slower again.

Take 500 milliseconds and project it across a test suite:

| Suite scale | Find calls | Pure find time at 500 ms |
| --- | --- | --- |
| Small functional suite | 200 | 100 seconds |
| Medium regression suite | 1,000 | 8 min 20 s |
| Large nightly suite | 5,000 | 41 min 40 s |
| Enterprise full coverage | 20,000 | 2 h 46 min |

Multiply by the number of suites running per day in a CI environment, multiply by the number of CI minutes billed by the runner provider, and the cost becomes very real.

That was the baseline. No optimization had been attempted yet. The number was simply the truth about what happened on the metal.

Before deciding what to optimize, the right move is always to look at what the code already does. OculiX inherits sixteen years of optimization attempts from the Sikuli and SikuliX lineage. A naive optimizer would re-read the OpenCV documentation and propose to switch to a faster matching algorithm. That would be a beginner mistake.

The thing to investigate first is whether there is some optimization the existing code already attempts but cannot fully complete in the current configuration. That is almost always where the gains hide.

A few minutes of grep revealed a field, a setting, and a method that nobody seems to talk about:

Image.lastSeen

Private Rectangle field on every Image object. Stores the position of the most recent successful match. Paired with getLastSeen() and setLastSeen(rect, score) accessors.

Settings.CheckLastSeen

Public static boolean, set to true by default since at least 2018. Enables the optimization at the framework level.

checkLastSeenAndCreateFinder

Private method in Region.java. When called, creates a Finder restricted to the small rectangle around the previous match, falling back to full-screen only if needed.

The intent of these three pieces is clear once you piece them together. After a successful find, the rectangle of the match is stored in the Image object’s lastSeen field. On the next call to find for the same image, if Settings.CheckLastSeen is true and lastSeen is non-null, the code creates a Finder restricted to that small rectangle and tries to match there first. Only if the small-region match fails does it fall back to a full-screen scan.
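The control flow described above can be sketched in a few lines. This is an illustrative stand-in, not the OculiX source: the class `LastSeenSketch`, its `matcher` parameter, and the `fullScans` counter are hypothetical names introduced here, and the matcher is any function that searches one area and returns the match rectangle if it finds one.

```java
import java.awt.Rectangle;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch of a last-seen-first lookup; not the OculiX implementation. */
final class LastSeenSketch {
    /** Stand-in for Image.lastSeen: null until a find has succeeded. */
    private Rectangle lastSeen;
    /** Stand-in for Settings.CheckLastSeen. */
    boolean checkLastSeen = true;
    /** Counts how often the expensive full-screen path ran. */
    int fullScans;

    Optional<Rectangle> find(Rectangle screen,
                             Function<Rectangle, Optional<Rectangle>> matcher) {
        if (checkLastSeen && lastSeen != null) {
            // Cheap path: search only where the image was found last time.
            Optional<Rectangle> hit = matcher.apply(lastSeen);
            if (hit.isPresent()) {
                return hit;
            }
        }
        // Fallback: scan the whole screen and remember the result.
        fullScans++;
        Optional<Rectangle> hit = matcher.apply(screen);
        hit.ifPresent(r -> lastSeen = r);
        return hit;
    }
}
```

Calling `find` twice on the same instance runs one full scan. A fresh instance, like a fresh JVM, starts with `lastSeen` equal to null and pays for the full scan again.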

This is a classic spatial memoization pattern, well-known in computer vision. Sikuli implemented it correctly, a long time ago.

The optimization works beautifully inside a single JVM session. If you run a script that performs screen.find("submit_button.png") ten times in a row, the first call pays the full-screen scan cost, but the next nine calls find the image almost instantly through checkLastSeen. The cache hit rate is essentially 100 percent on stable UIs.

There is, however, a subtle but critical limitation: lastSeen lives only in memory, inside an Image object that disappears when the JVM exits.

This is exactly what our benchmark exposed. Fifty separate JVM invocations, fifty Image objects with lastSeen always equal to null at the moment of the find call, fifty full-screen scans. The checkLastSeen optimization was active and present in the code throughout, but it had no input to work with. The cache was empty because the cache lived inside a process that died right after building it.

This is the core observation. The existing optimization was correct in design. It was simply unable to bridge the gap between JVM invocations. In a typical test environment, where each test is its own process, the optimization never had a chance to engage.

The code that solved the problem was already there, written long before this benchmark was performed, in a careful and well-tested form. The missing piece was not a clever algorithm. It was a way to keep the optimization’s input alive across process boundaries. A storage problem, not a computation problem.

Once the gap was identified, the question became: how do you persist Image.lastSeen between JVM invocations? Several candidate approaches surfaced, each with its own trade-offs.

| Approach | Description | Drawbacks |
| --- | --- | --- |
| Sidecar file | Write foo.png.position next to each PNG | Two files to commit, risk of desynchronization, clutter |
| Central project file | Single .oculix-positions.toml at project root | Linear lookup cost, merge conflicts in parallel CI, full file rewrite per change |
| PNG ancillary chunk | Embed the position metadata inside the PNG itself | Requires writing PNG-aware code, but standard since 1996 |

The PNG ancillary chunk option won, for reasons that became clearer as we explored the constraints of CI environments.

PNG ancillary chunks: the underused W3C standard

The PNG file format, standardized by the W3C in 1996, is a structured container made of a fixed signature followed by a sequence of chunks. Each chunk has a four-byte length, a four-byte type, a variable-length data payload, and a four-byte CRC32 of the type and data.
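To make the chunk layout concrete, here is a minimal sketch that frames one chunk by hand using only JDK classes. The class name `ChunkFraming` and the method `frame` are illustrative names, not part of OculiX.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Illustrative helper: builds the on-disk bytes of a single PNG chunk. */
final class ChunkFraming {
    static byte[] frame(String type, byte[] data) throws IOException {
        byte[] typeBytes = type.getBytes(StandardCharsets.US_ASCII);

        // The CRC covers the type and the data, but not the length field.
        CRC32 crc = new CRC32();
        crc.update(typeBytes);
        crc.update(data);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(data.length);           // 4-byte big-endian length
        out.write(typeBytes);                // 4-byte type code
        out.write(data);                     // variable-length payload
        out.writeInt((int) crc.getValue());  // 4-byte CRC32
        return bytes.toByteArray();
    }
}
```

Framing an empty IEND chunk this way produces the familiar 12-byte trailer found at the end of every PNG file, which is a convenient sanity check for the CRC handling.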

The PNG standard distinguishes between critical chunks (IHDR, IDAT, IEND, and others), which are mandatory for decoding the image, and ancillary chunks, which are optional and can be safely ignored by decoders that do not recognize them.

We chose the type code oPLx for our chunk:

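  • Lowercase o: ancillary, not critical
  • Uppercase P: the second letter carries the private bit. In the PNG naming convention, uppercase in this position marks a public chunk and lowercase marks a private one, so a strictly conforming private type would use a lowercase letter here (for example opLx)
  • Uppercase L: reserved bit, which must be uppercase
  • Lowercase x: safe to copy

Decoded: an optional, safe-to-copy chunk identified by oPLx. Decoders that do not recognize it ignore it on load, and editors that honor the safe-to-copy bit preserve it untouched on save. Tools that strip ancillary chunks, such as aggressive optimizers, will remove it, in which case the pattern simply falls back to a full-screen scan.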
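The PNG specification encodes each of these four properties in bit 5 (0x20) of the corresponding type byte, which is also the bit that distinguishes lowercase from uppercase ASCII letters. A small sketch that decodes the flags of any type code follows; the class `ChunkTypeBits` and its method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative decoder for the four property bits of a PNG chunk type. */
final class ChunkTypeBits {
    /** True when bit 5 of the byte at index i is set, i.e. the letter is lowercase. */
    private static boolean bit5(String type, int i) {
        return (type.getBytes(StandardCharsets.US_ASCII)[i] & 0x20) != 0;
    }

    static boolean isAncillary(String type)  { return bit5(type, 0); }
    static boolean isPrivate(String type)    { return bit5(type, 1); }
    static boolean reservedOk(String type)   { return !bit5(type, 2); }
    static boolean isSafeToCopy(String type) { return bit5(type, 3); }

    static String describe(String type) {
        return type + ": "
            + (isAncillary(type) ? "ancillary" : "critical") + ", "
            + (isPrivate(type) ? "private" : "public") + ", "
            + (reservedOk(type) ? "reserved bit clear" : "reserved bit set") + ", "
            + (isSafeToCopy(type) ? "safe to copy" : "unsafe to copy");
    }
}
```

Running `describe` on a candidate type code before adopting it is a cheap way to confirm that the flags say what is intended.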

The internal layout of the oPLx chunk’s data payload is deliberately small. The goal was a fixed-size, fast-to-parse, easy-to-debug binary structure. The total payload is exactly 34 bytes:

| Offset | Size (bytes) | Type | Field | Notes |
| --- | --- | --- | --- | --- |
| 0 | 4 | ASCII | Magic OPL\0 | Redundant identifier, prevents misinterpretation |
| 4 | 2 | uint16 (BE) | Version | Currently 1. Bump on breaking format change. |
| 6 | 4 | int32 (BE) | X | Pixel coordinate, last successful match |
| 10 | 4 | int32 (BE) | Y | Pixel coordinate, last successful match |
| 14 | 4 | int32 (BE) | Width | Match rectangle width, pixels |
| 18 | 4 | int32 (BE) | Height | Match rectangle height, pixels |
| 22 | 8 | int64 (BE) | Timestamp | UNIX epoch milliseconds, last update |
| 30 | 4 | int32 (BE) | Run count | Total successful matches since file creation |

Total: 34 bytes of payload, plus the standard 12 bytes of PNG chunk framing (4 bytes length, 4 bytes type, 4 bytes CRC32). Each pattern gains 46 bytes of metadata embedded in its PNG file. For a project with 500 patterns, this represents 23 kilobytes of additional space across the entire pattern library. Effectively negligible.
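As a sketch of how this layout maps onto the JDK, the helper below encodes and decodes the full 34-byte payload, including the timestamp and run count. `OplxPayload` is a hypothetical class written for illustration; it relies only on the fact that `ByteBuffer` is big-endian by default.

```java
import java.awt.Rectangle;
import java.nio.ByteBuffer;

/** Hypothetical value class mirroring the 34-byte payload layout described above. */
final class OplxPayload {
    static final int SIZE = 34;

    final Rectangle rect;
    final long timestampMillis;
    final int runCount;

    OplxPayload(Rectangle rect, long timestampMillis, int runCount) {
        this.rect = rect;
        this.timestampMillis = timestampMillis;
        this.runCount = runCount;
    }

    byte[] encode() {
        ByteBuffer buf = ByteBuffer.allocate(SIZE);   // big-endian by default
        buf.put((byte) 'O').put((byte) 'P').put((byte) 'L').put((byte) 0); // offset 0
        buf.putShort((short) 1);                                           // offset 4
        buf.putInt(rect.x).putInt(rect.y);                                 // offsets 6, 10
        buf.putInt(rect.width).putInt(rect.height);                        // offsets 14, 18
        buf.putLong(timestampMillis);                                      // offset 22
        buf.putInt(runCount);                                              // offset 30
        return buf.array();
    }

    /** Returns null when the bytes are too short, the magic differs, or the version is not 1. */
    static OplxPayload decode(byte[] data) {
        if (data == null || data.length < SIZE) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(data);
        if (buf.get() != 'O' || buf.get() != 'P' || buf.get() != 'L' || buf.get() != 0) {
            return null;
        }
        if (buf.getShort() != 1) {
            return null;
        }
        Rectangle r = new Rectangle(buf.getInt(), buf.getInt(), buf.getInt(), buf.getInt());
        return new OplxPayload(r, buf.getLong(), buf.getInt());
    }
}
```

Negative coordinates round-trip unchanged, which is the multi-monitor case the signed 32-bit fields are meant to cover.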

A few decisions deserve commentary.

Big-endian byte order

Non-negotiable. The PNG standard mandates big-endian for all multi-byte integers in chunk fields. Following the same convention inside our payload simplifies parsing and removes confusion with future tooling.

32-bit signed for coordinates

Generous. 16-bit unsigned would have sufficed for single-screen resolutions. We chose 32 bits to leave room for multi-monitor setups where coordinates extend into negative space and tens of thousands of pixels.

64-bit timestamp

Standard UNIX epoch milliseconds. No bytes saved here. Audit trails that span years require room.

Run counter

Allows detecting dead patterns (counter at zero), unstable patterns (high counter on young file), and locked-in patterns (counter grows without timestamp updates).

The chunk is plaintext at this stage. The integrity check is the standard CRC32 that PNG mandates at the end of every chunk; it catches accidental corruption but is not cryptographically strong.

The actual change inside OculiX comes down to three modifications in two existing files, plus one new utility class. Total addition: roughly 250 lines of Java. No external dependencies. No new Maven coordinates. The standard JDK classes DataInputStream, DataOutputStream, ByteBuffer, and java.util.zip.CRC32 cover all the needs.

1. Image.load() reads the chunk

After ImageIO.read() decodes the pixels, a separate streaming read parses the PNG chunks, locates oPLx, and calls setLastSeen(rect, 1.0) on the current Image instance.

2. doCheckLastSeenAndCreateFinder expands the search

Instead of creating a Region exactly the size of the previous match, it now creates a search box 2.5x larger, clamped to screen bounds. Tolerates UI drift between runs.

3. find() writes the chunk after match

After updating in-memory lastSeen, the chunk in the PNG file is streamed through a temp file and atomic-renamed. The position persists to disk.

Modification 1: Image.load() reads the chunk

In Image.java, the existing load method reads the PNG file from disk into a BufferedImage using ImageIO.read. This call returns only the decoded pixels; it does not expose private ancillary chunks such as oPLx. Our addition opens the same file separately, in streaming mode, walks through its chunks until it finds the oPLx chunk if present, parses the position metadata, and calls setLastSeen(rect, 1.0) on the current Image instance.

// In Image.java, after ImageIO.read(fileURL) at line 1018:
try {
    File pngFile = new File(fileURL.toURI());
    byte[] chunk = PngChunk.read(pngFile, "oPLx");
    if (chunk != null && chunk.length >= 34) {
        ByteBuffer buf = ByteBuffer.wrap(chunk);  // big-endian by default
        byte[] magic = new byte[4];
        buf.get(magic);
        // Accept only the expected magic (OPL\0) and format version 1
        if (magic[0] == 'O' && magic[1] == 'P'
                && magic[2] == 'L' && magic[3] == 0
                && buf.getShort() == 1) {
            int cx = buf.getInt();
            int cy = buf.getInt();
            int cw = buf.getInt();
            int ch = buf.getInt();
            setLastSeen(new Rectangle(cx, cy, cw, ch), 1.0);
        }
    }
} catch (Exception ignored) {
    // A missing or unreadable chunk is not an error: lastSeen stays null
}

The streaming parser is deliberately optimized for the common case where the chunk is present and is one of the first non-critical chunks in the file. It skips the PNG signature (8 bytes), then enters a loop: read the chunk length, read the four-byte chunk type, compare it to oPLx, return the payload if matched, return null if IEND is reached, or skip the chunk data and CRC and continue otherwise.
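The loop described above can be sketched as follows. This is not the PngChunk source: the class `ChunkReadSketch` is a hypothetical stand-in that takes an InputStream rather than a File so it can be exercised without touching disk, and it does not verify the CRC.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical streaming reader: returns one chunk's payload, or null if absent. */
final class ChunkReadSketch {
    static byte[] read(InputStream png, String wanted) throws IOException {
        DataInputStream in = new DataInputStream(png);
        in.readFully(new byte[8]);                    // skip the PNG signature
        byte[] type = new byte[4];
        while (true) {
            int length = in.readInt();                // big-endian payload length
            if (length < 0) {
                throw new IOException("Invalid chunk length");
            }
            in.readFully(type);
            String name = new String(type, StandardCharsets.US_ASCII);
            if (name.equals(wanted)) {
                byte[] payload = new byte[length];
                in.readFully(payload);
                return payload;                       // CRC is left unread in this sketch
            }
            if (name.equals("IEND")) {
                return null;                          // end of file, chunk not present
            }
            skipFully(in, length + 4L);               // skip payload and CRC
        }
    }

    private static void skipFully(DataInputStream in, long n) throws IOException {
        while (n > 0) {
            int skipped = in.skipBytes((int) Math.min(n, Integer.MAX_VALUE));
            if (skipped <= 0) {
                throw new EOFException("Truncated PNG");
            }
            n -= skipped;
        }
    }
}
```

Because non-target chunks are skipped rather than read, the cost of a lookup depends mainly on how many chunks precede the target, not on the size of the image data.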

This is the architectural detail that matters most. The chunk reading is a strict addition that fills a gap in the existing system. It does not replace anything. It does not modify the contract of any existing method. If the chunk is absent (a legacy PNG that has never been processed by OculiX, a PNG whose chunk was stripped by an aggressive optimizer, a PNG generated by an external tool), the code falls through to lastSeen being null, which is exactly the situation the existing codebase has handled for sixteen years. The fall-through path is the cold-start path. It still works. It is just slower.

Modification 2: doCheckLastSeenAndCreateFinder expands the search

In Region.java, the existing method creates a small Region exactly the size of the previous match rectangle. This works well for in-session use, where the image has just been matched at the exact same position. It works less well for cross-process use, where the UI may have drifted slightly between runs.

Our modification expands the search rectangle by a factor of 2.5 around the stored center, clamped to the screen bounds.

// Replace at line 2891:
Rectangle ls = img.getLastSeen();
int sw = (int) (ls.width * 2.5);
int sh = (int) (ls.height * 2.5);
if (sw > screen.w) sw = screen.w;
if (sh > screen.h) sh = screen.h;
int cx = ls.x + ls.width / 2;
int cy = ls.y + ls.height / 2;
int sx = cx - sw / 2;
int sy = cy - sh / 2;
// Translate to stay inside screen, do not truncate
if (sx < 0) sx = 0;
if (sy < 0) sy = 0;
if (sx + sw > screen.w) sx = screen.w - sw;
if (sy + sh > screen.h) sy = screen.h - sh;
Region r = Region.create(sx, sy, sw, sh);

The 2.5 multiplier was chosen empirically:

| Multiplier | Effect |
| --- | --- |
| 1.5x | Occasionally misses drifted patterns |
| 2.0x | Marginal improvement, still occasional misses |
| 2.5x | Sweet spot: tolerates drift without ambiguity |
| 3.0x | No safety improvement, starts introducing ambiguity on dense UIs |
| 4.0x | Multiple visually similar elements get reached, confusing the matcher |

The clamping logic preserves the search box size when the pattern is near a screen edge. A pattern at (0, 1022) still gets a full 250 by 145 search box, just positioned at (0, 935) instead of being centered. The match rectangle for the original pattern still fits inside the search box.
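The edge case can be checked with a self-contained version of the same arithmetic. `SearchBoxSketch` is an illustrative class, and the 100 × 58 match size used in the usage note is an assumption chosen here because it yields the 250 by 145 box quoted above on a 1920 × 1080 screen.

```java
import java.awt.Rectangle;

/** Illustrative, standalone version of the expand-and-translate computation. */
final class SearchBoxSketch {
    static Rectangle searchBox(Rectangle lastSeen, int screenW, int screenH, double factor) {
        // Grow the box, but never beyond the screen itself.
        int sw = Math.min((int) (lastSeen.width * factor), screenW);
        int sh = Math.min((int) (lastSeen.height * factor), screenH);

        // Center the box on the previous match.
        int cx = lastSeen.x + lastSeen.width / 2;
        int cy = lastSeen.y + lastSeen.height / 2;

        // Translate (rather than truncate) so the box stays fully on screen.
        int sx = Math.max(0, Math.min(cx - sw / 2, screenW - sw));
        int sy = Math.max(0, Math.min(cy - sh / 2, screenH - sh));
        return new Rectangle(sx, sy, sw, sh);
    }
}
```

With those assumed inputs, `searchBox(new Rectangle(0, 1022, 100, 58), 1920, 1080, 2.5)` returns a rectangle at (0, 935) of size 250 by 145, and the original match rectangle lies entirely inside it.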

Modification 3: find() writes the chunk after a successful match

In Region.java, the existing find method already calls img.setLastSeen(lastMatch.getRect(), lastMatch.getScore()) after a successful match. This updates the in-memory lastSeen field. Our addition extends this call with a write to the PNG file’s oPLx chunk.

// At line 2284 of Region.find(), after setLastSeen:
img.setLastSeen(lastMatch.getRect(), lastMatch.getScore());
// New: persist to PNG chunk
try {
    File pngFile = new File(img.getFileURL().toURI());
    ByteBuffer buf = ByteBuffer.allocate(34);  // big-endian by default
    buf.put((byte) 'O').put((byte) 'P').put((byte) 'L').put((byte) 0);  // magic
    buf.putShort((short) 1);                                            // version
    Rectangle r = lastMatch.getRect();
    buf.putInt(r.x).putInt(r.y).putInt(r.width).putInt(r.height);
    buf.putLong(System.currentTimeMillis());
    buf.putInt(getRunCount(img) + 1);
    PngChunk.write(pngFile, "oPLx", buf.array());
} catch (Exception ignored) {
    // Persistence is best-effort: a failed write must never fail the find
}

The chunk-writing logic streams through the PNG file once, copying every chunk through to a temporary file, replacing the oPLx chunk in place if it already exists, or inserting a fresh oPLx chunk before the IEND marker if not. At the end, the temporary file replaces the original via an atomic file system rename.
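A minimal sketch of this copy-and-rename flow, using java.nio.file, might look like the following. It is not the PngChunk source: `ChunkWriteSketch` is a hypothetical class, it buffers one chunk at a time for brevity rather than using a small constant-size buffer, and it does not clean up the temporary file on failure.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.CRC32;

/** Hypothetical writer: replaces or inserts one ancillary chunk, then renames atomically. */
final class ChunkWriteSketch {
    static void write(Path png, String type, byte[] payload) throws IOException {
        // Same directory as the target, so the final rename stays on one file system.
        Path tmp = Files.createTempFile(png.toAbsolutePath().getParent(), "oplx", ".tmp");
        try (DataInputStream in = new DataInputStream(
                 new BufferedInputStream(Files.newInputStream(png)));
             DataOutputStream out = new DataOutputStream(
                 new BufferedOutputStream(Files.newOutputStream(tmp)))) {
            byte[] signature = new byte[8];
            in.readFully(signature);
            out.write(signature);
            boolean written = false;
            while (true) {
                int length = in.readInt();
                byte[] head = new byte[4];
                in.readFully(head);
                String name = new String(head, StandardCharsets.US_ASCII);
                byte[] body = new byte[length + 4];   // payload plus CRC
                in.readFully(body);
                if (name.equals(type)) {
                    if (!written) {
                        emit(out, type, payload);     // replace in place
                        written = true;
                    }
                    continue;                         // drop the old copy
                }
                if (name.equals("IEND") && !written) {
                    emit(out, type, payload);         // insert just before IEND
                }
                out.writeInt(length);
                out.write(head);
                out.write(body);
                if (name.equals("IEND")) {
                    break;
                }
            }
        }
        Files.move(tmp, png, StandardCopyOption.ATOMIC_MOVE);
    }

    private static void emit(DataOutputStream out, String type, byte[] payload)
            throws IOException {
        byte[] typeBytes = type.getBytes(StandardCharsets.US_ASCII);
        CRC32 crc = new CRC32();
        crc.update(typeBytes);
        crc.update(payload);
        out.writeInt(payload.length);
        out.write(typeBytes);
        out.write(payload);
        out.writeInt((int) crc.getValue());
    }
}
```

One caveat worth knowing: the Javadoc for Files.move states that whether an ATOMIC_MOVE replaces an existing target is implementation-specific, so a production version should be tested on every platform it targets.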

This streaming approach is more complex than reading the whole file into memory, modifying the byte array, and writing the result back, but it is meaningfully more robust:

| Property | In-memory | Streaming (chosen) |
| --- | --- | --- |
| Memory cost | Proportional to PNG size | Constant (~8 KB buffer) |
| Crash safety | Risk of partial write | Atomic rename: old or new, never half |
| Cost on small PNG | < 1 ms | 1-2 ms |
| Cost on large PNG (1 MB+) | 20-50 ms + allocation | 4-8 ms, no allocation spike |

Reading and writing PNG chunks does not require an external library. The format is simple enough that a 200-line utility class handles both operations with zero dependencies. The class exposes two methods:

public static byte[] read(File png, String type) throws IOException

Returns the payload bytes if the chunk is found, null otherwise. The read uses streaming DataInputStream, with skip() to bypass non-target chunks. Exits as soon as the target chunk is located.

The total code in PngChunk.java is 218 lines including comments, blank lines, and the class declaration. The file imports only from the java.io and java.nio packages, plus java.util.Arrays, java.util.zip.CRC32, and java.nio.charset.StandardCharsets, all standard JDK.

A benchmark is only useful if its methodology is described in enough detail that someone else can reproduce it.

| Item | Value |
| --- | --- |
| CPU | Intel Core i3-7100 (2 cores, 4 threads, 3.9 GHz) |
| RAM | 8 GB DDR4-2400 |
| Storage | NVMe SSD |
| OS | Windows 10 build 19045 |
| Java | OpenJDK 25 from Eclipse Temurin |
| OculiX | 3.0.3 release, feature branch with modifications |
| Screen | 1920 × 1080, no DPI scaling |
| Pattern | Windows search bar fragment, 12 × 58 pixels, at (0, 1022) |

Each benchmark run consisted of fifty independent JVM invocations, each:

  1. Starting a fresh process
  2. Loading OculiX and its native dependencies
  3. Performing exactly one find call against the screen
  4. Printing the elapsed time in milliseconds to standard output
  5. Exiting

The elapsed time was measured using System.nanoTime() immediately before and after the screen.find(pattern) call. This excludes JVM startup, library loading, and framework initialization. It includes only the find operation itself, including the screen capture inside the find.
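The measurement itself is small enough to sketch. The class below is a hypothetical stand-in for FindTiming: the workload is a placeholder sleep, where the real harness would call screen.find(pattern).

```java
/** Hypothetical timing harness; the workload here is a stand-in, not a real find. */
final class FindTimingSketch {
    /** Runs the operation once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable operation) {
        long start = System.nanoTime();
        operation.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> {
            try {
                Thread.sleep(20);   // placeholder for the single find call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // One line per JVM invocation, collected by the wrapping batch script.
        System.out.println(elapsed);
    }
}
```

System.nanoTime() is used rather than System.currentTimeMillis() because it is monotonic and intended for measuring elapsed time.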

A separate post-processing class read the fifty timing lines from the captured log and computed: arithmetic mean, median, minimum, maximum, range, and standard deviation.
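A sketch of such a post-processing step is shown below. `TimingStats` is an illustrative class, and it computes the sample standard deviation (dividing by n − 1); the post does not say which variant the original used, so treat that choice as an assumption.

```java
import java.util.Arrays;

/** Illustrative summary statistics over a non-empty list of timings in milliseconds. */
final class TimingStats {
    static double mean(double[] v) {
        return Arrays.stream(v).average().orElse(Double.NaN);
    }

    static double median(double[] v) {
        double[] sorted = v.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    static double min(double[] v) {
        return Arrays.stream(v).min().orElse(Double.NaN);
    }

    static double max(double[] v) {
        return Arrays.stream(v).max().orElse(Double.NaN);
    }

    static double range(double[] v) {
        return max(v) - min(v);
    }

    /** Sample standard deviation (n - 1 in the denominator); an assumption, see above. */
    static double stdDev(double[] v) {
        double m = mean(v);
        double sumSquares = Arrays.stream(v).map(x -> (x - m) * (x - m)).sum();
        return Math.sqrt(sumSquares / (v.length - 1));
    }
}
```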

| Scenario | Initial state of PNG | Expected behavior |
| --- | --- | --- |
| Baseline | No oPLx chunk | All 50 runs in full-screen scan |
| Optimized | oPLx chunk pre-written | All 50 runs in small-region scan |

Both scenarios were measured cold-start (50 separate JVM invocations) to ensure the comparison reflects the CI environment.

The numbers are the headline of this post. Let me state them precisely.

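| Metric | Baseline (FULL) | Optimized (ROI) | Improvement |
| --- | --- | --- | --- |
| n | 50 | 50 | |
| Mean | 502 ms | 77.7 ms | ×6.46 |
| Median | 502 ms | 77 ms | ×6.5 |
| Minimum | 480 ms | 63 ms | ×7.6 |
| Maximum | 545 ms | 113 ms | ×4.8 |
| Range | 65 ms | 50 ms | comparable |
| Standard deviation | 18 ms | 11 ms | tighter |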

Speedup factor on the find call alone: 6.46.

Low variance in both conditions

Standard deviation is 4% of mean in baseline, 14% in optimized. Neither is noisy enough to require repeated measurement.

Optimized scenario has a floor

Roughly 60-70 ms cannot be reduced further by this technique. Dominated by screen capture (20-40 ms) and OpenCV setup (20-40 ms).

Speedup depends on pattern size

Small patterns (under 50×50 px) give the largest speedup. Large patterns (300×300+ px) give a smaller speedup because the 2.5x search region itself becomes substantial.

Faster hardware preserves ratio

On modern CPUs, absolute timings shrink but the relative speedup factor stays at ×6-8. The overhead floor is proportionally less significant.

If we extrapolate to other typical hardware:

| Hardware | Baseline (FULL) | Optimized (ROI) | Speedup |
| --- | --- | --- | --- |
| i3 7th gen (measured) | 502 ms | 77 ms | ×6.5 |
| i7-13700K (projected) | ~150-180 ms | ~22-25 ms | ×8 |
| Apple M2/M3 (projected) | ~80-120 ms | ~10-15 ms | ×8 |
| GitHub Actions standard runner (projected) | ~200-250 ms | ~28-35 ms | ×7 |

The benchmark measures one find operation in isolation. A real-world test suite performs many operations, each containing at least one find. Translating the per-operation speedup into a suite-level wall-clock improvement requires a few assumptions.

Consider a moderately sized regression suite: 100 test cases, each performing 20 visual interactions on average, for 2000 total find calls.

| Scenario | Time per find | Total find time | Savings |
| --- | --- | --- | --- |
| Baseline (i3) | 500 ms | 16 min 40 s | |
| Optimized (i3) | 77 ms | 2 min 34 s | 14 min 6 s |
| Optimized (modern i7) | 22 ms | 44 seconds | 15 min 56 s |

If the suite runs once per pull request on a CI runner billed at $0.008 per minute (the rough GitHub Actions standard runner cost):

| Activity | Cost per run (baseline) | Cost per run (optimized) | Annual saving (50 PR/day, 250 days) |
| --- | --- | --- | --- |
| Suite execution | $0.13 | $0.02 | ~$1,375 per suite |

For organizations running multiple suites in parallel across many projects, the saving compounds quickly.
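The arithmetic behind these figures is simple enough to reproduce. The sketch below uses the numbers from the text (2000 finds, 500 ms and 77 ms per find, $0.008 per minute, 50 runs per day over 250 days); `SuiteCost` and its method names are illustrative.

```java
/** Illustrative cost arithmetic for a suite whose time is dominated by find calls. */
final class SuiteCost {
    static double findSeconds(int finds, double millisPerFind) {
        return finds * millisPerFind / 1000.0;
    }

    static double costPerRun(double seconds, double dollarsPerMinute) {
        return seconds / 60.0 * dollarsPerMinute;
    }

    static double annualSaving(int finds, double baselineMs, double optimizedMs,
                               double dollarsPerMinute, int runsPerDay, int daysPerYear) {
        double before = costPerRun(findSeconds(finds, baselineMs), dollarsPerMinute);
        double after = costPerRun(findSeconds(finds, optimizedMs), dollarsPerMinute);
        return (before - after) * runsPerDay * daysPerYear;
    }

    public static void main(String[] args) {
        System.out.printf("baseline:  %.0f s%n", findSeconds(2000, 500));
        System.out.printf("optimized: %.0f s%n", findSeconds(2000, 77));
        System.out.printf("annual saving: $%.0f%n",
            annualSaving(2000, 500, 77, 0.008, 50, 250));
    }
}
```

Computed without intermediate rounding, the annual saving comes to about $1,410. The ~$1,375 figure in the table corresponds to rounding the per-run costs to whole cents first ($0.13 − $0.02 = $0.11 per run, times 12,500 runs).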

Find is not the only cost

The speedup applies to find time only. The full suite also contains network calls, server processing, browser rendering, mouse and keyboard events, and waits. A suite where finds dominate (visual regression, end-to-end smoke) sees larger wall-clock improvement than a suite waiting on slow back-ends.

Optimization rewards stability

Teams that regenerate patterns frequently see the chunk reset on each regeneration. The first run after regeneration pays the full-screen scan cost. The optimization rewards UI stability and frequent test execution — both characteristics of mature codebases.

Why this survives git clone and CI runners

The structural argument for embedding the position in the PNG itself, rather than in a sidecar file or a central index, is best expressed by considering the CI lifecycle.

A CI build typically starts from a clean state. A fresh container is provisioned, the repository is cloned from origin, dependencies are installed, the build runs, the tests run, the artifacts are collected, and the container is destroyed. There is no persistent state between builds. There is no shared file system. There is no cache that can be relied upon to be hot.

A sidecar file or a central index survives this lifecycle only if someone remembers to commit it alongside every pattern change. The oPLx chunk inside the PNG removes this discipline burden entirely. The chunk is part of the PNG file itself. When a developer runs the test suite locally and the chunk is updated, the modified PNG appears in git status automatically. Git tracks PNG files because they are committed in the project. The developer cannot commit “the update” without committing “the chunk inside the update”, because they are the same file.

This means that when a CI build clones the repository, it gets the PNG file with the chunk already inside. The first test run in CI is already in the fast path. The optimization is active from the first second. No warm-up phase is needed. No cache must be primed. The position metadata is in the file, and the file is in the repository, and the repository is in the clone.

A team running the suite locally, committing the updated PNGs, and pushing to CI is effectively pre-warming the CI cache through normal development activity. There is no separate “cache warm-up” step. The cache warms itself as a side effect of using the tool.

The broader pattern: PNG chunks as runtime context

The technique of embedding metadata in image files is not new. Adobe XMP, EXIF, IPTC, GIMP layer information, and many other systems use this mechanism. What is new, or at least underused, is the embedding of runtime state — not authoring metadata, not provenance information, but actual operational data that evolves as the file is used.

This pattern generalizes beyond visual automation:

Texture caches for graphics applications

A texture file could embed the time it was last loaded, the GPU memory pool it was allocated from, and the average frame time when it was active. A renderer could use this metadata to predict and pre-load textures.

Build artifacts for incremental compilation

A compiled object file could embed the hash of the source it was compiled from, the compiler version, and the optimization level. An incremental build tool could detect when recompilation is actually necessary.

Machine learning datasets

An image used in a vision model training set could embed its last classification result, the confidence score, and the model version. A cleaning tool could identify mislabeled samples without re-running the model.

Audit logs for compliance

A document image stored in a regulated workflow could embed a signed audit trail of the operations performed on it. The Ed25519 signing pattern from the OculiX MCP module would apply directly.

The common thread across all these examples is the same: keep operational state with the artifact, in a format the artifact’s primary tools will preserve, so that the state survives every distribution mechanism the artifact ever encounters.

A 6.5× speedup on the find is significant but not exhaustive. Several costs remain untouched by this optimization:

| Cost component | Status | Approximate weight |
| --- | --- | --- |
| Screen capture | Unchanged | 20-40 ms per find |
| OpenCV setup | Unchanged | 20-40 ms per find |
| Tesseract OCR loading | Unchanged | ~300 ms per JVM cold start |
| OculiX framework init | Unchanged | ~1.5 seconds per JVM cold start |
| JVM startup itself | Unchanged | ~1-2 seconds per cold start |

Reducing these other costs would require deeper structural changes: pre-built native images via GraalVM (the JVM startup), incremental OpenCV initialization (the OpenCV setup), lazy OCR loading (Tesseract), and possibly an alternative screen capture API on each platform. Each of these is a separate optimization project, none of which is in scope for the current change.

Closing: context engineering as a discipline

The lesson of this whole investigation is not “PNG chunks are cool” or “spatial memoization wins”. Both of those are true but uninteresting on their own.

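The lesson is structural: an optimization can be correct in design and still never engage, because the state it depends on does not survive a boundary that the user’s workflow crosses on every run.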

Finding these gaps requires a specific habit: benchmarking systems in the configuration users actually experience, not in the configuration that is convenient for developers. In a tight in-process loop, the OculiX find is fast. In a cold-start CI environment, it is slow. The same algorithm, the same code path, the same operating system. The only difference is whether the context built by the previous match has survived to inform the next match.

The fix is rarely a new algorithm. It is usually a new place to store something that already existed, in a form that can travel through the boundaries the user’s actual workflow imposes.

The PNG ancillary chunk happens to be a particularly elegant place to store visual automation context, because the artifact whose context we want to preserve is itself a PNG file, and the standard that defines PNG includes a mechanism specifically designed for this kind of metadata, and the tooling ecosystem has respected that mechanism for thirty years.

The result is an optimization that did not require inventing a new algorithm, did not require breaking any existing contract, did not require any external dependency, and is fully backward compatible with files that do not yet have the chunk. It just needed someone to measure the actual baseline, identify the gap, and close it.

For anyone maintaining a similar tool, my parting suggestion would be: go measure your cold-start performance against your warm-cache performance, in the same configuration your users actually run. If the gap is large, the optimization opportunity is probably not where you think it is. It is probably in the space between two of your existing components, in the form of a state that briefly exists and then disappears.


Repository: github.com/oculix-org/Oculix

Issue tracking the implementation of the persistent locator chunk: oculix-org/Oculix#353