Why automation tools cannot type Chinese, and what OculiX does instead

In May 2026, an OculiX user filed issue #232. The summary was three lines long. The user wanted to call type with two Chinese characters in their automation script. Instead of typing them, the call produced silence, garbage, or in some IDE configurations a sequence of unrelated Latin characters depending on the active keyboard layout.

The fix, when it landed, was fifteen lines of Java. The explanation behind it stretches back to how java.awt.Robot was designed in 1998, around an implicit assumption that one character on a screen corresponds to one keystroke on a keyboard. That assumption holds beautifully for ASCII Latin. It breaks for Chinese, Japanese, Korean, Arabic, Hindi, and even for accented Latin characters when the user’s keyboard layout does not happen to expose them as direct keys.

This is the story of why visual automation frameworks have historically struggled to type non-ASCII, what OculiX decided to do about it, and why the workaround is more philosophically interesting than the bug it solves.

The user, an active OculiX adopter, was automating an internal Chinese-language application. Their script needed to fill a search field with a person’s name. The straightforward call returned silently. The search field stayed empty. No exception was thrown. No log line surfaced anything unusual. The automation simply continued, the next step failed because the search field had no input, and the test reported FindFailed on whatever element was supposed to appear after the search.

This is the worst kind of automation bug. There is no signal that something went wrong at the moment something went wrong. The error appears three steps later, attached to an unrelated element, with no link back to the actual failure point.

After some investigation, the user established that:

  • ASCII input worked: type("hello") produced “hello” in the field.
  • French accented characters partly worked: typing a word with an accent produced the unaccented version on a US layout.
  • Chinese input failed silently on every layout tested.
  • Switching the OS keyboard layout to a Chinese IME did not help: the OculiX type call still produced garbage.

The bug was filed not as a feature request but as a question: “Is this expected? How is everyone else handling CJK?”

The honest answer was that everyone else was working around it. Manually. Each script that needed to type Chinese was implementing its own custom paste routine inline, copying the string to the clipboard with Jython’s java.awt.Toolkit calls and then sending Ctrl-V via keyDown/keyUp. The workaround was well-known among users who automated CJK applications, but it was not in the documentation, and it required understanding why the straightforward call did not work.

This blog post explains the why, and what we did to make the straightforward call work.

Why Robot.keyPress fundamentally cannot type Chinese

To understand why type with Chinese fails, you have to understand what type actually does under the hood. In OculiX (and in the original Sikuli, and in SikuliX), the call descends through several layers and eventually reaches java.awt.Robot.keyPress(int keycode) and java.awt.Robot.keyRelease(int keycode), both methods of the AWT Robot class, which has shipped with the Java platform since version 1.3 in May 2000.

Robot.keyPress(int keycode) is a thin wrapper around the operating system’s low-level keyboard event injection mechanism: SendInput on Windows, XTestFakeKeyEvent on X11/Linux, CGEventPost on macOS. As Robot drives them, all three receive a key code, an integer identifier of a physical key on the keyboard. Not a character. Not a Unicode codepoint. A key.

This is the first fault line. When you call Robot.keyPress(KeyEvent.VK_A), you are not asking the system to produce the character “A”. You are asking it to simulate pressing the physical key that, on the currently active keyboard layout, would produce some character. The character depends on the layout. The layout depends on the user’s system configuration. The result depends on both.

For pure ASCII Latin in a Western layout, the mapping is reasonably stable. The key VK_A produces “a” or “A” depending on shift state. The key VK_1 produces “1” or “!” depending on shift state. Sikuli historically maintained a hardcoded table mapping each ASCII character to a sequence of virtual key codes, and it worked because Western keyboards have a key for every ASCII character.
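A miniature version of such a table makes the limitation concrete. The sketch below is hypothetical (the class name AsciiKeyTable, its Stroke type, and the single US-QWERTY symbol entry are illustrative assumptions, not Sikuli's actual table); it only shows that the lookup has an answer for ASCII and none for a Chinese character.

```java
import java.awt.event.KeyEvent;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical miniature of a character-to-key table; not Sikuli's actual table. */
final class AsciiKeyTable {
    /** One entry: the key to press and whether Shift must be held. */
    static final class Stroke {
        final int keyCode;
        final boolean shift;

        Stroke(int keyCode, boolean shift) {
            this.keyCode = keyCode;
            this.shift = shift;
        }
    }

    private static final Map<Character, Stroke> TABLE = new HashMap<>();

    static {
        // Letters: VK_A..VK_Z share their values with ASCII 'A'..'Z'.
        for (char c = 'a'; c <= 'z'; c++) {
            int vk = KeyEvent.VK_A + (c - 'a');
            TABLE.put(c, new Stroke(vk, false));
            TABLE.put(Character.toUpperCase(c), new Stroke(vk, true));
        }
        // Digits: VK_0..VK_9 share their values with ASCII '0'..'9'.
        for (char c = '0'; c <= '9'; c++) {
            TABLE.put(c, new Stroke(KeyEvent.VK_0 + (c - '0'), false));
        }
        // Shifted symbols are layout-specific; this entry assumes US-QWERTY.
        TABLE.put('!', new Stroke(KeyEvent.VK_1, true));
    }

    /** Returns the keystroke for c, or null when the table knows no key for it. */
    static Stroke lookup(char c) {
        return TABLE.get(c);
    }
}
```

Here lookup('A') yields VK_A with Shift held, while lookup('\u4e2d') (the character 中) yields null: there is simply no row to add for it.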

The trouble starts the moment the character has no physical key on the keyboard.

There is no constant for Chinese characters in java.awt.event.KeyEvent. There cannot be one. Chinese characters do not correspond to keys. A standard Chinese keyboard is, physically, a US-QWERTY keyboard. Chinese characters are produced through an Input Method Editor (IME) that interprets sequences of QWERTY keystrokes as phonetic input (Pinyin), looks up matching characters, displays a candidate menu, and inserts the chosen character into the active text field once the user confirms a selection.

To type a Chinese phrase through an IME, a Chinese user types Pinyin on the physical keyboard, the IME intercepts the sequence, recognizes it as the romanization of a candidate set, presents a candidate menu (because multiple character sequences share that romanization), and waits for the user to either accept the top candidate via space or pick a different one via number keys or arrow keys.

This is not what Robot.keyPress does. Robot.keyPress injects events at a layer beneath the IME. The IME never sees them as Pinyin input. The events are interpreted as raw key events by the receiving application, which sees the Pinyin letters as literal characters. The Chinese intent is lost.

Even more confusingly, on a system where the active keyboard layout is Chinese (with an IME enabled), Robot.keyPress(VK_N) might:

  • Be intercepted by the IME, treated as Pinyin input, and contribute to building a candidate menu that the user never sees because the automation does not pause to select
  • Be passed through to the application as a literal “n” if the IME is in alphanumeric mode
  • Be silently dropped if the IME is mid-composition for another sequence

The behavior is non-deterministic from the automation’s perspective, and entirely dependent on IME state, which the automation cannot reliably read or control through the AWT API.

The same logic applies to:

  • Japanese, where IMEs interpret romaji or kana keystrokes into hiragana, katakana, or kanji
  • Korean, where IMEs assemble hangul syllables from jamo keystrokes
  • Arabic, where the keyboard layout itself is non-Latin and Robot virtual keys do not map predictably
  • Hindi and other Indic scripts, where IMEs combine consonants and vowel marks into syllabic clusters
  • Vietnamese, where input methods such as Telex or VNI add tonal diacritics through key sequences

All these languages share the property that the displayed character is not a one-to-one mapping with a physical keystroke. The mapping is mediated by an IME that operates above the OS keyboard event layer that java.awt.Robot injects into.

Even within the Latin script, the same problem appears in a milder form. Consider typing a French word with an accent on three different keyboard layouts.

| Layout | Result of typing a French word with accent |
| --- | --- |
| US-QWERTY | Accent silently dropped, no VK code maps to it |
| French AZERTY | Accent reproduced correctly |
| German QWERTZ | Accent dropped or garbage produced |
| UK-QWERTY | Accent dropped |
| Spanish | Accent dropped (uses dead keys) |

The accent characters have dead-key compositions on most layouts, which the AWT Robot cannot reliably simulate because dead keys are stateful at the OS layer and there is no portable way to query whether the dead key has been “consumed”.

The same applies to many Latin extension characters. The character is “Latin”, but the keyboard layout determines whether it has a dedicated key or requires a composition sequence.

The deep reason type with non-ASCII fails is not specific to OculiX or to Sikuli. It is a structural limitation of the java.awt.Robot API, which injects events at the keyboard event layer beneath the IME and beneath the keyboard layout translation. This API was designed in 1998 for testing GUI applications in an ASCII-Latin context, and it has never been extended to handle the layered character input model that the rest of the world uses.

Once you accept that you cannot type CJK characters through the keystroke injection path, the question becomes: how do you get them into an input field at all? The most portable answer, used by professional automation engineers for two decades, is the clipboard.

The idea is simple. Instead of pressing keys, you:

  1. Copy the target string to the system clipboard
  2. Send the OS-specific paste keystroke (Ctrl-V on Windows/Linux, Cmd-V on macOS)
  3. The receiving application processes the paste event and inserts the clipboard content into the focused field

This route bypasses the keyboard layout layer entirely. The clipboard contains arbitrary Unicode bytes. The paste keystroke is a single key combination present on every modern platform. The receiving application sees the content as text, not as a sequence of keystrokes, and inserts it through its text handling code path, which is Unicode-aware.
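The three steps can be sketched with plain AWT calls. This is a minimal illustration under stated assumptions, not the OculiX paste implementation: the class name ClipboardPaster and the pasteModifier helper are hypothetical, the platform check relies on the os.name system property, and the sketch does nothing about focus or clipboard backup.

```java
import java.awt.AWTException;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.datatransfer.StringSelection;
import java.awt.event.KeyEvent;

/** Hypothetical helper illustrating the clipboard route; not the OculiX paste implementation. */
final class ClipboardPaster {
    /** Picks the modifier for the paste shortcut from an os.name style string. */
    static int pasteModifier(String osName) {
        boolean mac = osName != null && osName.toLowerCase().contains("mac");
        return mac ? KeyEvent.VK_META : KeyEvent.VK_CONTROL;
    }

    /** Copies text to the system clipboard, then sends the paste shortcut. */
    static void paste(String text) throws AWTException {
        // Step 1: place the string on the clipboard as Unicode text.
        Toolkit.getDefaultToolkit().getSystemClipboard()
               .setContents(new StringSelection(text), null);

        // Step 2: press modifier+V. Step 3 happens inside the focused application.
        Robot robot = new Robot();
        int modifier = pasteModifier(System.getProperty("os.name"));
        robot.keyPress(modifier);
        robot.keyPress(KeyEvent.VK_V);
        robot.keyRelease(KeyEvent.VK_V);
        robot.keyRelease(modifier);
    }
}
```

The only keys Robot has to press are a modifier and V. The text itself travels through the clipboard and never touches the layout-dependent key mapping.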

The clipboard route has been the unofficial standard workaround in the Sikuli community since at least 2012. It worked. But it required users to know about it, to implement it themselves, and to handle a list of edge cases (clipboard backup before pasting to avoid destroying user data, clipboard restoration after pasting, focus verification, etc.).

The OculiX change for issue #232 was to make this route the default, transparently, for any input that contains characters outside the 7-bit ASCII range.

The actual change in OculiX is small enough to quote in full.

private static boolean containsNonAscii(String text) {
    if (text == null) return false;
    for (int i = 0; i < text.length(); i++) {
        if (text.charAt(i) > 127) return true;
    }
    return false;
}

Seven lines. We iterate over the input string and check whether any character has a codepoint greater than 127 (the 7-bit ASCII boundary). If yes, the input cannot be reliably typed through the keystroke path. If no, the input is pure ASCII Latin and can use the existing keystroke pipeline.

The choice of 127 as the boundary is deliberate. Everything from 0 to 127 is ASCII, has a stable Robot mapping on Western keyboards, and works with the legacy code path. Everything above 127 is either Latin extension (accented characters), other scripts (CJK, Cyrillic, Arabic, etc.), or symbols. All of these benefit from going through paste rather than through simulated keystrokes.
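To see where the boundary falls in practice, the check can be tried on a few inputs. The wrapper class AsciiBoundaryDemo below is hypothetical and exists only so the method from the text can run in isolation; the non-ASCII strings are written as Unicode escapes so the example does not depend on source file encoding.

```java
/** Restates the ASCII boundary check from the text so it can be tried in isolation. */
final class AsciiBoundaryDemo {
    static boolean containsNonAscii(String text) {
        if (text == null) return false;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 127) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsNonAscii("hello"));        // false: pure ASCII
        System.out.println(containsNonAscii("~"));            // false: 126 is still ASCII
        System.out.println(containsNonAscii("caf\u00e9"));    // true: e with acute accent is 233
        System.out.println(containsNonAscii("\u4f60\u597d")); // true: two Chinese characters
        System.out.println(containsNonAscii("\uD83D\uDE00")); // true: emoji surrogate pair
        System.out.println(containsNonAscii(null));           // false: null is treated as ASCII
    }
}
```

Because charAt returns UTF-16 code units, characters outside the Basic Multilingual Plane are caught as well: both halves of a surrogate pair are far above 127.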

public int type(String text) {
    if (containsNonAscii(text)) {
        return paste(text);
    }
    try {
        return keyin(null, text, 0);
    } catch (FindFailed ex) {
        return 0;
    }
}

public int type(Object target, String text) throws FindFailed {
    if (containsNonAscii(text)) {
        return paste(target, text);
    }
    return keyin(target, text, 0);
}

The routing is two if-statements added before the existing logic. If the input contains non-ASCII characters, the call is forwarded to paste (which already exists and was already battle-tested). Otherwise, the existing keyin path runs unchanged.

The two-argument variant type(target, text) routes non-ASCII text through paste(target, text), which clicks the target image to focus the input and then pastes. The user-facing API is identical; only the internal routing changes.

public int type(Object target, String text, int modifiers)
        throws FindFailed {
    return keyin(target, text, modifiers);
}

The modifier-bearing variants type(text, modifiers) and type(target, text, modifiers) are intentionally not routed through paste. Holding Shift, Ctrl, or Cmd while pasting Unicode characters has no meaningful semantics: these modifiers exist to shift the meaning of physical keystrokes (Shift+A produces “A”, Ctrl+S triggers a save action), not to alter clipboard content.

If a user calls a modifier-bearing variant with CJK characters, the keystroke path runs and produces a FindFailed if the characters cannot be typed, which is the honest behavior. Silently switching to paste would lose the user’s intent.

The result for users of the API is a clean four-way matrix:

| Input | Modifiers | Path taken | Why |
| --- | --- | --- | --- |
| Pure ASCII | None | Keystrokes | Backward compatible, fast, no clipboard side effect |
| Pure ASCII | Yes (Shift/Ctrl/Cmd) | Keystrokes | Modifiers map to keystroke semantics |
| Non-ASCII | None | Clipboard paste | Only way to reliably produce the characters |
| Non-ASCII | Yes (Shift/Ctrl/Cmd) | Keystrokes (will likely fail) | Honest failure; modifier+paste has no meaning |

This is the kind of fallback that the user does not see and does not need to know about. The scripts that worked before still work the same. The scripts that previously needed manual clipboard workarounds now just work with the obvious call.
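The matrix can be modeled as a single pure function, which is convenient for checking all four rows at once. The TypeRouter class and its Path enum are hypothetical names introduced for illustration; the sketch assumes, as in the quoted keyin(target, text, 0) calls, that a modifiers value of 0 means no modifier keys are held.

```java
/** Hypothetical model of the routing decision; names are illustrative only. */
final class TypeRouter {
    enum Path { KEYSTROKES, CLIPBOARD_PASTE }

    static boolean containsNonAscii(String text) {
        if (text == null) return false;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 127) return true;
        }
        return false;
    }

    /** Decides the input path; modifiers == 0 stands for "no modifier keys held". */
    static Path route(String text, int modifiers) {
        // Modifier-bearing calls always stay on the keystroke path.
        if (modifiers != 0) return Path.KEYSTROKES;
        // Without modifiers, anything beyond 7-bit ASCII goes through the clipboard.
        return containsNonAscii(text) ? Path.CLIPBOARD_PASTE : Path.KEYSTROKES;
    }
}
```

Only one of the four combinations, non-ASCII text with no modifiers, ever reaches the clipboard, which is why scripts that worked before the change see no difference.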

The clipboard path is not a free win. It has properties that are different from the keystroke path, and users automating sensitive workflows should be aware of them.

Clipboard pollution

Pasting destroys the previous clipboard content. The OculiX paste implementation backs up the clipboard before pasting and restores it afterward, but the backup-restore window is not instantaneous and can interfere with other clipboard listeners.
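The backup-and-restore idea can be sketched as follows. This is not the OculiX implementation: ClipboardGuard, TextClipboard, SystemTextClipboard, and FakeClipboard are hypothetical names, and the sketch preserves only plain-text clipboard content (images or files on the clipboard would be lost). The clipboard is hidden behind a small interface so the logic can be tried without a display.

```java
import java.awt.Toolkit;
import java.awt.datatransfer.Clipboard;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.StringSelection;

/** Hypothetical sketch of backup, paste, restore; not the OculiX paste implementation. */
final class ClipboardGuard {
    /** Minimal text-only clipboard abstraction. */
    interface TextClipboard {
        String get();
        void set(String text);
    }

    /** Adapter over the AWT system clipboard; only plain text is preserved. */
    static final class SystemTextClipboard implements TextClipboard {
        private final Clipboard clipboard =
                Toolkit.getDefaultToolkit().getSystemClipboard();

        public String get() {
            try {
                if (!clipboard.isDataFlavorAvailable(DataFlavor.stringFlavor)) return null;
                return (String) clipboard.getData(DataFlavor.stringFlavor);
            } catch (Exception e) {
                return null; // unreadable content is treated as "nothing to restore"
            }
        }

        public void set(String text) {
            clipboard.setContents(new StringSelection(text), null);
        }
    }

    /** In-memory stand-in for trying the logic without a display. */
    static final class FakeClipboard implements TextClipboard {
        private String content;
        public String get() { return content; }
        public void set(String text) { content = text; }
    }

    /** Saves the clipboard text, pastes the new text, then restores the saved text. */
    static void pasteWithRestore(TextClipboard clipboard, String text,
                                 Runnable sendPasteKeystroke) {
        String backup = clipboard.get();
        clipboard.set(text);
        try {
            sendPasteKeystroke.run();
        } finally {
            if (backup != null) clipboard.set(backup);
        }
    }
}
```

With the real system clipboard, the target application reads the content some time after the keystroke is injected, so a practical version would likely need a short wait before restoring. That wait is the non-instantaneous window described above.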

Focus dependency

The paste keystroke targets whatever has keyboard focus. A misdirected paste lands as one chunk of misplaced text, which is much more visible than misdirected keystrokes, which usually mask the issue.

Paste timing

Some applications throttle or queue paste events. Pasting twice in rapid succession can produce only one paste, or merge two pastes into one. The keystroke path is less susceptible.

No partial input

A keystroke-based input can be interrupted in the middle. A paste is atomic. If you want autocomplete observation, the keystroke path is the right one.

For most real-world automation use cases, these tradeoffs are acceptable. A QA engineer automating a Chinese banking form does not need character-by-character granularity; they need the form filled correctly. A test scenario filling Japanese customer data into a healthcare CRM does not depend on clipboard preservation across the operation.

The cases where the tradeoffs matter (autocomplete observation, clipboard-watching tooling) are edge cases. There, the user can opt out of the paste route by explicitly calling keyin instead of type, or by providing the input one character at a time.

Issue #232 was filed by a single user, but the impact of the fix is broader than that single use case. We have seen the change unlock automation work in several customer contexts since it landed.

APAC banking back-office

A French bank with subsidiaries in Hong Kong and Singapore needed to automate regression tests on their Mandarin and Cantonese branch interfaces. Before the fix, their automation team maintained two separate codebases: one for European frontends, one for APAC with manual clipboard workarounds. The two codebases merged. Maintenance cost dropped by roughly a third.

Japanese healthcare CRM

A healthcare provider in Osaka automating patient intake workflows ran into the same wall. Patient names, addresses, prescription drug names — none survived the keystroke path. The team had a custom paste wrapper as workaround, but it predated the Ed25519 audit module and could not be signed for compliance. The integrated paste path now passes their audit.

European multilingual e-commerce

A French automation team running tests on a German marketplace was hitting silent failures whenever a product name contained German umlauts. The German keyboard layout was active on the test machine, but the keystroke table assumed French AZERTY and silently dropped the characters. The clipboard route bypasses all of this.

Internal IT helpdesks across regions

Multinational corporations run helpdesk automations interacting with internal ticketing systems containing employee names in dozens of native scripts. Cyrillic, Greek, Arabic, Hindi. Each required a separate workaround. The fix makes all of them automatic.

The common thread is that internationalization in automation tooling has historically been an afterthought. Tools are built and tested in their authors’ language, and the assumption that “text input works” is rarely verified against the full Unicode space. Fixing this in OculiX brought a basic property back into line with what users reasonably expect.

The fix is good but not complete. Several adjacent problems remain open.

The lesson behind this fix is not about Chinese or Japanese specifically. It is about what happens when a tool’s API and its underlying assumption diverge.

type(String) reads, to a 2026 developer, like a Unicode-aware string-to-input function. The signature accepts a String, which in Java is fully Unicode-capable. The naive expectation is that whatever Unicode string you pass should appear in the target field, character by character, exactly as written.

That expectation collides with an implementation that descends, through layers of well-meaning abstraction, into a 1998-era API that injects keyboard events at a layer beneath the keyboard layout. The mismatch between the modern API surface and the legacy implementation creates a silent failure mode that takes years of accumulated user reports to fully document.

The fix is fifteen lines of Java because the workaround already existed in the codebase (the paste method). The change just added a smart router that picks the right path based on the input content. The hard part was not the code; it was identifying that the routing decision belonged at the type level, not at the user level.

This is a recurring pattern in mature codebases. The capability is there. The pieces are in place. What is missing is the small connective decision that makes the capability available to users who do not know which internal method to call. A good API surface hides the choice between keyin and paste. The user calls type and gets the result they expected.

For anyone maintaining a similar tool with a similar AWT-era foundation: this fix is generalizable. The same routing logic, the same ASCII boundary detection, the same paste fallback applies in PyAutoGUI, AutoHotkey, Robot Framework keyword libraries, or any other automation tool that relies on Robot-style keystroke injection. The 1998 API will not be redesigned. But its limitations can be bounded by a smarter layer above it.


Repository: github.com/oculix-org/Oculix

Original issue: oculix-org/Oculix#232