CodeQL: the boy who cried Error

Ce contenu n’est pas encore disponible dans votre langue.

21 mai 2026

We enabled GitHub CodeQL on OculiX in April 2026. The repository inherits sixteen years of Sikuli and SikuliX history before our continuation began, so the scanner was pointed at roughly 110 000 lines of Java accumulated since 2009.

It returned a hundred or so findings across all severities. Nineteen of them were classified Error, the highest tier. The classification suggests urgency: each Error is, in CodeQL’s taxonomy, the kind of thing a maintainer should drop everything to fix.

We dropped nothing. We triaged each one. After triage, the result is unambiguous: zero of the nineteen findings have caused an observable user incident in fifteen years. Most of them are dormant architectural defects, dead code paths, or false positives produced by structural blindness in the analyzer.

Meanwhile, fifty-three missing catches for NumberFormatException — places where parsing a user-provided number can crash the application — are classified Note. Almost invisible. Below Warning. Far below Error.

This post walks through the nineteen findings, explains why none of them actually warrant the Error tier, and explains why the things that should warrant it are buried two levels down. The conclusion is not that static analysis is useless. The conclusion is that CodeQL’s severity classification is broken in both directions — overstating the trivial, understating the serious — and that the noise this generates is the main thing keeping the tool from being useful at scale.

Methodology

For each finding, we asked three questions:

Does it produce an observable user incident? Has it caused a crash, a wrong result, a regression in fifteen years of use? Has anyone filed a ticket against it?
If not, would it produce one if a specific code path were exercised? Is it dormant but reachable, or dormant and unreachable?
Is the finding even correct? Does CodeQL’s analysis actually match the runtime behavior, or has it missed something about the language, the bridge to another runtime, or the type system?

Three categories emerged from this triage:

Architectural defects

Real problems in the code that have never been triggered. Either the branch is dead, the configuration is never set, or the upstream dependency is never invoked. The finding is correct but the impact is theoretical.

Code smells

Patterns that look suspicious but have legitimate uses (or, more often, have just persisted as harmless cruft for over a decade). The code is ugly but not broken.

False positives

CodeQL’s analysis is wrong. It missed a dynamic access, a language bridge, an inheritance relationship, or a type guarantee. The finding does not correspond to any real risk in the code.

What we did not find in our nineteen Errors: any case where a user reproduced a crash, a wrong result, or a regression caused by the code at the location CodeQL flagged. Not one.

A tour of the nineteen Errors

Below is the breakdown by CodeQL rule, in descending count.

Container contents are never accessed (8 findings)

CodeQL’s claim: a collection or map is allocated and possibly written to, but its contents are never read back. The implication is that the container is dead code that should be removed.

Eight findings of this kind landed on our codebase. The triage:

Three are genuine dead code. Two local variables named classpath inside ExtensionManager.java, and a local map docParams inside Crawler.java. All three are declared, populated in a loop, and never consumed. Cleanup candidates. None of them have produced a single user-visible defect.

Four are false positives caused by dynamic access. Lists like aliases, filenames, mimeTypes in Lexer.java (the syntax highlighter grammar) are populated by parsing grammar metadata and consumed by a registry mechanism that CodeQL cannot trace. A map called libsLoaded in RunTime.java tracks loaded native libraries and is consulted before each native load — but the consumption goes through a debug helper that CodeQL’s call-graph analysis does not follow. From the static point of view, the containers look unread. From the runtime point of view, they are read on every invocation.

One is the most interesting finding in the entire set. SikulixServer.allowedIPs is declared like this:

private static List<String> allowedIPs = new ArrayList<>();

The class is the Sikuli network server. The list is named after a security feature: an IP allow-list to restrict who can connect. There is a method called makeAllowedIPs that populates the list. There is no code that reads the list. No filter checks against it. No connection-handling code consults it. The allow-list is built but never enforced.

If an administrator configures allowedIPs thinking they are restricting access, they are not. The server accepts every connection regardless of the configured allow-list, because the list is allocated, filled, and never read.

This is a security defect. A serious one, by any reasonable definition. CodeQL classifies it as Maintainability — Error. Not Security. Not Critical. Just “your container is unused”, as if the impact were cosmetic.

We have opened an internal issue to investigate whether the network server is reachable in any default OculiX deployment, and to either remove the network server entirely or wire up the missing filter. As of this writing, the defect has not produced an incident report in any of the years it has existed — but that is more an accident of the network server’s low usage than a vindication of CodeQL’s classification.

Self assignment (2 findings)

CodeQL’s claim: a variable is being assigned to itself. The assignment has no effect.

The first finding is in Runner.java:396:

public static void setLastScriptRunReturnCode(int lastScriptRunReturnCode) {
    lastScriptRunReturnCode = lastScriptRunReturnCode;
}

The setter’s parameter shadows the static field. The author intended Runner.lastScriptRunReturnCode = lastScriptRunReturnCode; or this.lastScriptRunReturnCode = lastScriptRunReturnCode;. As written, the setter is inert. It has been inert since whenever it was added — almost certainly in the mid-2010s, possibly earlier.

For fifteen years, any caller invoking Runner.setLastScriptRunReturnCode(5) has stored exactly nothing. The field lastScriptRunReturnCode stays at its initial value of 0 forever. Anyone reading the field gets 0 regardless of what scripts have actually returned.

Has anyone noticed? Apparently not. Either the field is never consulted, or the consumers do not depend on the value being non-zero, or the corner cases where it would matter have not been hit in fifteen years of cumulative production use. The “bug” is real in the source code, but the observable behavior is no different from what it would be if the setter worked correctly. The downstream code path has adapted to the dysfunction.

The second finding is in Deflate.java:1703, inside com.jcraft.jzlib (the JZlib compression library, embedded for SSH purposes):

Deflate dest = (Deflate) super.clone();
dest.pending_buf = dup(dest.pending_buf);
dest.d_buf = dest.d_buf;
dest.l_buf = dup(dest.l_buf);
dest.window = dup(dest.window);

The clone() method deep-duplicates every buffer except d_buf. The assignment dest.d_buf = dest.d_buf; is a typo where dup(dest.d_buf) was intended. The result is that the cloned Deflate instance shares its d_buf with the original. Modifying the buffer through either reference is visible through both.

The defect is in an upstream library that nobody has actively maintained for years. The OculiX project carries it because the library is bundled. The risk is real but only materializes if the Deflate.clone() method is called in a context where both the original and the clone are used concurrently — a rare pattern that JZlib’s tests apparently do not cover and that nobody seems to hit in production.

Both Self assignment findings are genuinely correct. Both have been silently dormant in the codebase for over a decade. Neither has produced a user-visible incident. CodeQL classifies them as Error. The classification suggests they require immediate attention. The empirical evidence suggests they have been irrelevant for a decade.

Reference equality test of boxed types (1 finding)

In RecordedEventsFlow.java:149:

} else if (!modifiers.isEmpty() && characterWithModifiers == character) {
    actions.add(new TypeTextAction(keyText, modifiersTexts));
}

characterWithModifiers and character are boxed Integer (or Character) values. Comparing them with == compares object references, not values. Java caches boxed integers from -128 to 127, so for any ASCII codepoint, two boxed instances of the same value point to the same cached object, and == returns true by accident. For codepoints above 127 — accented Latin, CJK, Cyrillic, Arabic, Hindi, almost any non-ASCII script — the cache does not apply, and == returns false even when the values are equal.

The defect is correct. It has existed for years. The empirical impact is invisible because:

The recorder is used predominantly with ASCII input in the regions where it is most used.
When the comparison fails for non-ASCII input, the recorder falls through to a different branch that produces a slightly different — but still functional — action.

This is a textbook example of a real defect that has been silently absorbed by the surrounding code’s tolerance. No user has filed a ticket. No CI test has caught it. The fix is one line (.equals() instead of ==). The classification as Error is consistent with CodeQL’s rule but inconsistent with the actual user impact, which is zero.

Equals method does not inspect argument type (1 finding)

In PatternPaneScreenshot.java:36:

@Override
public boolean equals(Object o) {
    return false;
}

CodeQL warns that the method does not check the argument’s type, and might therefore lead to a ClassCastException somewhere downstream. The implementation is trivial: it returns false for any input, including null. There is no cast. There is no downstream usage of the result that depends on the argument’s type.

The override exists inside a Comparator declaration. The Comparator interface allows equals to be overridden to indicate that two comparators impose the same ordering. Returning false is a safe, conservative default: “I do not claim to impose the same ordering as you do.” It does no harm.

The finding is a false positive on this specific code. The override is pointless (it could be removed without changing behavior), but it is not a defect. CodeQL classifies it as Reliability — Error, suggesting it could cause a crash. It cannot. The implementation has no crash path.

Non-callable called (2 findings, Python)

Two findings on Sikuli.py, the Jython-based scripting layer that ships with OculiX. Both report that an imported name is being called as if it were callable when it is not.

import org.sikuli.script.ScreenRemote as SR
SCREEN = SR(adr, str(port))

The CodeQL Python analyzer treats this as CPython. It sees import org.sikuli.script.ScreenRemote as importing an attribute, has no idea that ScreenRemote is a Java class, and concludes that calling SR(...) is invalid.

In Jython, this import is valid Java-to-Python bridging. SR(...) invokes the Java constructor ScreenRemote(String, String). The runtime knows this. CodeQL does not.

This is structural blindness across language runtimes. CodeQL Python has no awareness of the Jython platform. Every Java class accessed from a Jython script will be flagged as “non-callable called”. The number of such findings will grow linearly with the size of the scripting layer.

Both findings are false positives. There is no fix on the project side. The recourse is to disable the rule in .codeql.yml or to suppress the findings manually one by one.

Dereferenced variable is always null (1 finding)

In ButtonGenCommand.java:127:

if (pref.getAutoCaptureForCmdButtons()) {
    btnCapture = null; //TODO new ButtonCapture(pane, line);
    pane.insertComponent(btnCapture);
    btnCapture.captureWithAutoDelay();
}

The author explicitly assigned null, left a TODO comment, and then dereferenced the variable twice. Whoever opens the IDE with the AutoCaptureForCmdButtons preference enabled will crash on btnCapture.captureWithAutoDelay().

This is the only finding in the set that is a definite crash trap. CodeQL is correct to flag it. Calling it Error is, for once, defensible.

Has anyone hit the crash? The preference is off by default, and the TODO has lived in the code for years, which suggests nobody enables this preference in practice. The crash is a real risk but a dormant one. Fix: replace the null with new ButtonCapture(pane, line) and finish the TODO.

Contradictory type checks (1 finding)

In Offset.java:152:

public <RMILO> Offset(RMILO whatEver) {
    if (whatEver instanceof Region || whatEver instanceof Match) {
        Region what = (Region) whatEver;

CodeQL claims that when whatEver is Match but not Region, the cast to Region will fail at runtime. This would be true if Match and Region were unrelated types. They are not. Match extends Region throughout the Sikuli class hierarchy. Every Match is also a Region. The cast is safe.

CodeQL does not consult the inheritance graph of the project, so it treats the two types as disjoint. The OR condition is redundant (whatEver instanceof Region already covers the Match case), but it is not unsafe.

The finding is a false positive caused by type system blindness. The check on Match is dead, in the sense that it is always reached on objects that already match instanceof Region. Removing it is a cleanup, not a bug fix.

Type mismatch on container modification (1 finding)

In HotkeyManager.java:263:

boolean res = _instance._removeHotkey(key, mod);
if (res) {
    hotkeys.remove(key);
}

hotkeys is a Map keyed on String. key is Integer (an AWT virtual key code). Map.remove(Object) accepts any Object for backward compatibility with pre-generics Java, so the call compiles cleanly. At runtime, no entry in the map has an Integer key, so the remove silently does nothing. The intended key for the map is a textual representation that must be constructed from key and mod.

The hotkey is reported as removed (because _removeHotkey succeeded at unbinding the OS-level listener), but the entry stays in the map indefinitely. If hotkeys is consulted to dispatch incoming events, the old hotkey’s listener continues to fire.

The defect is real and observable in theory. In practice, no user has filed a ticket about a hotkey persisting after rebind. Either:

The listener-dispatch path does not consult the map.
Nobody rebinds hotkeys.
The fallout is silently absorbed by other code.

Whichever explanation applies, the failure mode of this defect is dormant in the same way as the others. CodeQL’s classification as Error matches the technical risk but overstates the empirical urgency.

Missing format argument (2 findings)

In Commons.java:1525:

RunTime.terminate(999, "Commons.loadLib: %s");

The format string demands an argument, none is supplied. When this line is reached, the String.format call throws IllegalFormatException. The thrown exception is wrapped inside the terminate call, which means the user sees a confusing format-related error instead of a clean shutdown with code 999 and a useful diagnostic message.

The branch is an error path inside the native library loading code. It executes when a library fails to load — itself a rare event. The defect aggravates the user experience precisely at the moment when it is already poor (something is breaking, and now the error reporting is also breaking), but it does not produce its own incident on a healthy system.

The second finding is Crawler.java:441:

docMethod = String.format(pyMethod, clazz, method, docParams, returns);

The format string pyMethod expects five arguments, four are supplied. Every invocation of this method throws IllegalFormatException. The Crawler appears to be a documentation generator that has not been run in a long time — the API it scrapes for has changed, but the generator has not been touched to follow. The defect is invisible because the tool is no longer used.

Both findings are real defects. Both are essentially dead code (one on a rare error path, one in an unused tool). Their classification as Error is technically correct and operationally misleading.

What the classification actually looks like

The total: 19 findings classified as Error. After triage:

Category	Count	Share
Dead-code paths with crash potential	4	21 %
Architectural defects with dormant impact	8	42 %
False positives (structural blindness)	7	37 %
Findings with observed user incidents	0	0 %

Zero observed user incidents in fifteen years across all nineteen findings.

Meanwhile, in the layers below Error, CodeQL reports the following counts on the same codebase:

Rule	Findings	Classified
Missing catch of NumberFormatException	53	Note
Dereferenced variable may be null	38	Warning
Potential input resource leak	24	Warning
Ignored error status of call	23	Note
Implicit conversion from array to string	11	Note
Potential output resource leak	11	Warning
Use of default toString()	3	Note
Non-synchronized override of synchronized method	2	(below visible threshold)

Fifty-three places where parsing a user-provided number can crash the application. Classified Note. Almost invisible.

Twenty-four places where a file or network resource may not be closed, leading to file descriptor exhaustion on a long-running server. Classified Warning. Below Error.

Two places where overriding a synchronized method without re-applying synchronized creates a race condition in multi-threaded code. Buried so deep it does not even appear in the default UI summary.

Where the severity system breaks down

The pattern across the data is unambiguous. CodeQL’s Error tier is dominated by structural patterns that may produce defects in extreme cases but routinely do not. The Warning and Note tiers contain patterns that frequently produce real production incidents on real systems. The mapping from classification to operational urgency is inverted.

The reasons appear to be structural to how the tool reasons:

Definite vs probable

CodeQL favors definite analytical conclusions (“this is always null”, “this assigns to itself”) over probabilistic ones (“this might be null”). Definite conclusions get Error. Probabilistic ones get Warning or Note. But probable failures on 53 lines of code matter more than definite failures on 2 dormant lines.

Pattern-level severity

Every instance of a rule inherits the rule’s severity. There is no per-instance reasoning. A self-assignment that is dead code and a self-assignment that breaks a hot path receive the same Error tier. The reviewer cannot distinguish them from the severity column.

Cosmetic and structural together

Rules like “Container contents are never accessed” cover three very different real-world cases: dead variables, dynamically accessed structures, and security-critical filters that fail open. All three receive Maintainability — Error. The CISO reviewing the report cannot find the security one without reading every finding.

No cross-rule prioritization

A self-assignment with no impact is Error. A NumberFormatException not caught on user input is Note. There is no mechanism in CodeQL to bring the second above the first in any output. The order of rules in the output is alphabetic or chronological, never by operational risk.

The combined effect is that a developer who opens CodeQL on a mature codebase sees a few dozen “Errors” — most of them irrelevant — and a few hundred “Notes” — among which the genuinely concerning ones are buried. The natural response, after the third or fourth review, is to stop opening CodeQL.

What CodeQL actually catches well

This post would be unfair if it described only the failure modes. CodeQL does some things well, and the things it does well are worth naming.

Definite null dereference (when the analyzer is sure, not “may be”). The ButtonGenCommand finding is a real crash trap. The rule has high precision.

Type-incompatible container modification. The HotkeyManager finding is a real bug that compiles cleanly and silently misbehaves at runtime. Without static analysis, this kind of generic-erasure pitfall is genuinely hard to catch.

Format string desync. Both Missing format argument findings are real. The rule has very high precision because the analysis is local.

Self-assignment. Both findings are real. The rule has perfect precision because there is no plausible legitimate use.

What unites these is that they are narrow, local, and high-confidence rules. CodeQL is at its best when the pattern is sharply defined and the analysis is bounded. It is at its worst when it generalizes — across rules, across severities, across code paths it cannot follow.

How we configured CodeQL to be useful

After triage, the actions we took for this scan:

Suppressed the Jython-related rules entirely. Non-callable called and a few other Python rules produce structural false positives on every Jython-bridged file. There is no fix on our side. Suppressing the rule per-file via comments is more cost than benefit. We disable the rule in .codeql.yml.
Filed an internal issue for SikulixServer.allowedIPs to investigate whether the network server is reachable and to either fix the missing filter or remove the server. The finding sat under Maintainability — Error, where nobody who triages security findings would look for it.
Treated the 53 NumberFormatException findings as a focused remediation backlog, separately from the Error-tier findings. They are not the same kind of work and should not be in the same queue.
Documented our dismissals. For every Error-tier finding we dismissed, we wrote a one-line justification in the commit that closed the alert. This is the documentation our future selves will need when CodeQL re-flags the same finding after a re-scan.
Did not aim for zero Errors. Zero CodeQL Errors is not a meaningful goal. A meaningful goal is zero unjustified Errors and a tracked, finite list of structural issues being worked on.

Why we still keep CodeQL on

Despite all of the above, CodeQL stays enabled on OculiX. The OpenSSF Scorecard checks for a SAST tool integration and rewards projects that have one. Procurement teams in regulated sectors check that CodeQL or an equivalent is configured. The signal value to outside observers is real, even when the operational value is poor.

But the cost of running CodeQL is also real: the triage time, the alert fatigue, the temptation to ignore the tool entirely. We pay that cost in exchange for the external signal. We do not pretend that what we get back internally is anywhere near worth the marketing of static analysis.

If you maintain a similar codebase and you are considering whether to enable CodeQL: enable it, run it once, do the triage we just walked through, and then decide rule-by-rule which queries you actually want to run. The default query suite assumes you have time to read several hundred findings. Most teams do not. The most useful configuration is the one that leaves you with twenty findings a week, all of which are worth opening.

The boy who cried Error eventually had a wolf. We will probably see one, eventually, in a CodeQL scan. We will recognize it because it will be the only finding in months that we cannot dismiss in a single line. Everything else will continue to be the noise that the wolf is hidden in.

Repository: github.com/oculix-org/Oculix

The findings discussed are from a CodeQL scan on commit 29a14d9. Detailed dismissal justifications are visible in the Security tab.