LLM orchestrator selection matrix
OculiX MCP is neutral with respect to the LLM that orchestrates it. This document provides the factual matrix to help select an orchestrator compatible with GDPR, EU AI Act, DORA, NIS2, HDS, SecNumCloud — or simply with a sovereignty strategy.
1. Positioning and scope
1.1 OculiX is neutral with respect to the LLM orchestrator
The OculiX MCP server (oculixmcp) exposes 11 visual automation tools (click, find, type, screenshot, etc.) via the Model Context Protocol (MCP), with Ed25519 audit trail and ActionGate access control.
OculiX does not embed an LLM. The orchestrator — the language model that decides which MCP tools to call, in what order, with which arguments — is provided by the customer, in the customer’s environment, under the customer’s responsibility.
Concretely:
flowchart LR
A["LLM Orchestrator<br/>(customer pick)"] <-->|MCP| B["OculiX MCP<br/>Server"]
B <-->|OculiX| C["Application<br/>under test"]
A -.-> D["Customer data<br/>(prompts, screenshots,<br/>UI context, logs)"]
style D fill:#fff3cd,stroke:#ffc107,color:#856404
All sensitive data (screenshots, prompts, application context, UI traces) transits through the LLM orchestrator before or during the call to OculiX tools. So the LLM choice determines:
- the jurisdiction where data is processed,
- the retention policy,
- eligibility for GDPR / EU AI Act / DORA / NIS2 / HDS / SecNumCloud,
- operational cost,
- the robustness of tool calling and therefore the reliability of scenarios.
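To make the data path concrete, here is a minimal sketch of the message an orchestrator sends when it invokes a tool. MCP is built on JSON-RPC 2.0 and uses the `tools/call` method; the tool name `click` comes from the list above, while the `target` argument and its value are hypothetical, since the real argument schema is whatever the server advertises through `tools/list`.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical argument name and value, for illustration only.
request = build_tool_call(1, "click", {"target": "login_button.png"})
print(request)
```

Every argument in such a message is produced by the LLM from the prompts and screenshots it has seen, which is why the orchestrator's hosting and retention terms matter as much as those of the MCP server.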
OculiX takes no position on the LLM choice. This document provides the factual matrix that lets the customer choose with full knowledge of the facts.
1.2 What this document is not
- Not a “best LLM” ranking — the concept makes no sense out of context
- Not legal advice — this document does not bind OculiX or any associated commercial entity under GDPR or AI Act
- Not a quality benchmark — those exist elsewhere (lmarena.ai, artificialanalysis.ai, scale.com/leaderboard)
- Not a real-time-updated guarantee — vendor terms evolve; re-verify before signature
1.3 Reading assumptions
The reader is assumed to know:
- the basics of MCP (Anthropic, November 2024) and tool calling
- the basics of GDPR (Art. 28, Art. 44–46, Chapter V)
- the basics of the AI Act (Annex III, Art. 6, Art. 26, 2026–2028 calendar)
- the provider / processor / deployer distinction
2. Evaluation criteria
Seven criteria are used. They answer concrete operational questions a DPO, CISO, or Procurement officer asks when choosing an LLM for an OculiX integration.
2.1 Physical hosting of inference servers
Question: where, geographically, is the compute performed?
This is the simplest question, but it is often poorly handled. An “EU” endpoint can in fact be a proxy to the US (historical case of several “EU” offerings up to 2025). Check the DPA and the sub-processor list for the actual GPU inference location, not only the location of the access endpoint.
2.2 Vendor’s legal jurisdiction
Question: which jurisdiction can compel the vendor to disclose data?
- US-incorporated company → subject to the CLOUD Act (2018) and FISA 702, even if datacenters are in Europe
- Chinese company → subject to the National Intelligence Law 2017 (Art. 7), which obliges any Chinese entity to cooperate with intelligence services
- European company (SAS, SA, GmbH, etc.) → subject only to GDPR and national laws
2.3 Retention policy (ZDR — Zero Data Retention)
Question: how long are inputs/outputs stored after inference?
Three typical cases:
- Default retention: 30 days (anti-abuse), with opt-out sometimes available (OpenAI, Mistral, Anthropic)
- Contractual ZDR: no post-inference storage (available on Enterprise / API tier)
- Indefinite storage: default on free consumer offerings (to be avoided for pro use)
Pitfall: ZDR generally does not cover the ongoing inference pipeline, nor subprocessors during processing. It covers the absence of persistence after the response.
2.4 Training-on-customer-data policy
Question: can customer prompts/outputs be used to train the vendor’s future models?
- Major vendors’ enterprise APIs: no by default, under a contractual commitment
- Consumer offerings (ChatGPT Free/Pro, Claude.ai Free/Pro): variable, opt-in or opt-out depending on the period; not suitable for professional B2B use
- Public DeepSeek, some Chinese offerings: yes by default, to be avoided for sensitive data
2.5 GDPR and AI Act compliance
Question: does the vendor have the contractual commitments needed to allow the customer (deployer) to meet their own GDPR and AI Act obligations?
Minimum checklist:
- Signed DPA (GDPR Article 28)
- SCC (Standard Contractual Clauses) if transfer outside the EEA
- Documented subprocessors
- Commitment to non-use for training
- Logs accessible for AI Act Art. 12 obligations (traceability)
- Model documentation sufficient for AI Act Art. 11 + Annex IV
- HIPAA BAA for US healthcare, HDS for French healthcare
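The checklist above can be tracked as structured data during vendor review. The sketch below is one possible encoding, not an OculiX feature: the class name, field names, and example values are all assumptions chosen for illustration, and the sector-specific items (HIPAA BAA, HDS) are left out because they apply only to healthcare.

```python
from dataclasses import dataclass, fields


@dataclass
class VendorCommitments:
    """One boolean per item of the minimum checklist (hypothetical field names)."""
    signed_dpa: bool
    scc_if_non_eea_transfer: bool
    subprocessors_documented: bool
    no_training_commitment: bool
    logs_accessible: bool
    model_documentation: bool


def missing_items(vendor: VendorCommitments) -> list[str]:
    """Return the names of checklist items the vendor does not satisfy."""
    return [f.name for f in fields(vendor) if not getattr(vendor, f.name)]


# Made-up example: contractual items present, AI Act evidence still missing.
example = VendorCommitments(True, True, True, True, False, False)
print(missing_items(example))
```

Keeping the result as a list of named gaps makes it easy to attach to a DPIA or a procurement ticket.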
2.6 Self-hosted / on-premise deployment
Question: can the model be run in your own datacenter, or even fully air-gapped?
- Downloadable open weights (Apache 2.0, MIT, Llama Community License): yes, fully
- Proprietary models in dedicated VPC: possible with Mistral (Le Chat Enterprise), partially with OpenAI (dedicated Azure), Anthropic (via Bedrock PrivateLink), but this is not true on-prem — the weights stay with the vendor
- Pure SaaS API: no self-hosting possible
Self-hosting open weights is the only way to obtain complete sovereign independence. Hardware cost to anticipate: see section 3.6.
2.7 Tool calling and MCP compatibility
Question: can the model call tools reliably, deterministically, in parallel?
Technical criteria:
- Native support for function/tool calling (vs manual prompt engineering)
- Accuracy on BFCL (Berkeley Function Calling Leaderboard) benchmarks
- Native MCP protocol support client-side or via bridge (MCPHost, ollmcp, LiteLLM, llama.cpp)
- Tool call streaming
- Call parallelization (multiple tools in parallel within one response)
State of the art as of May 2026:
- Tier 1 (production-grade): Claude Sonnet/Opus 4.x, GPT-4.1/5, Gemini 3.1 Pro
- Tier 2 (excellent): Mistral Large 3, Qwen 3.5, Llama 4
- Tier 3 (functional but to validate): Gemma 4, DeepSeek V3.x, Phi-4
3. Detailed vendor matrices
The data below reflects the state as of May 9, 2026 and may evolve. Always re-verify in the effective DPA at signature time.
3.1 Anthropic Claude
| Criterion | Status |
|---|---|
| Company | Anthropic PBC, Delaware (US) |
| Jurisdiction | United States (CLOUD Act, FISA 702 applicable) |
| Models available via API | Claude Opus 4.7, Claude Sonnet 4.6, Claude Haiku 4.5 |
| Direct API hosting | Mostly US; inference_geo=eu option in beta, storage US in all cases |
| Hosting via AWS Bedrock | EU possible (eu-central-1 Frankfurt) — under AWS / Delaware jurisdiction |
| Hosting via GCP Vertex AI | EU possible (europe-west1, europe-west4) — under Google / Delaware jurisdiction |
| Hosting via Azure Foundry | EU “Coming 2026” — not yet effective on Foundry |
| ZDR | Available by separate addendum, on Enterprise/API tier — not by default |
| DPA / SCC | Included automatically (DPA v.01/01/2026) on Team, Enterprise and commercial API |
| API training opt-out | Opt-out by default (Anthropic does not train on API data) |
| Certifications | SOC 2 Type II, ISO 27001:2022, ISO 42001:2023, HIPAA BAA, FedRAMP High (Claude for Government) |
| Self-hosted | No. Weights not public. No on-premise option. |
| Tool calling | Excellent. Industry reference. Native streaming and parallel tools. |
| MCP | Creator of the protocol (Nov. 2024). Reference native support. |
| Indicative pricing | Sonnet 4.6: ~$3/M input, ~$15/M output. Opus 4.7: ~$15/M input, ~$75/M output |
Verdict: excellent tool-calling quality, native MCP support (concrete advantage for OculiX), but US-incorporated. To achieve EU data residency and remain legally defensible under GDPR, you must route through Bedrock (eu-central-1) or Vertex AI (europe-west1, europe-west4). The CLOUD Act remains applicable even in this case, and that residual risk should be documented in the DPIA.
3.2 OpenAI (GPT)
| Criterion | Status |
|---|---|
| Company | OpenAI OpCo LLC (Delaware, US) / OpenAI Ireland Ltd (secondary EU entity) |
| Jurisdiction | United States (CLOUD Act, FISA 702 applicable — Ireland entity insufficient alone) |
| Models available | GPT-5, GPT-4.1, o-series (reasoning) |
| Direct API hosting | EU residency option available for eligible projects, with forced ZDR |
| Hosting via Azure OpenAI | EU Data Zone available (West Europe, North Europe, etc.), full control via Azure tenant |
| ZDR | Available: automatic on EU residency projects, on request for US projects (Limited Access program) |
| DPA / SCC | Included in standard commercial terms |
| API training opt-out | Opt-out by default on API (no training on API/business data since March 2023) |
| Certifications | SOC 2 Type II, ISO 27001, HIPAA BAA (via Azure), CSA STAR |
| Self-hosted | No. Weights not public. Open gpt-oss models under evaluation (limited). |
| Tool calling | Excellent. Historical function-calling reference since June 2023. |
| MCP | Native support since 2025 (Responses API, ChatGPT Apps) |
| Indicative pricing | GPT-5: ~$1.25/M input, ~$10/M output. GPT-4.1: ~$2/M input, ~$8/M output |
Verdict: Azure OpenAI in EU Data Zone is the most compliance-defensible path for a customer that wants to stay in the OpenAI ecosystem. The CLOUD Act remains applicable but the operational scope is better controlled (logs, IAM, VNet). For strict sovereign use (government, defense, French HDS healthcare), insufficient as is.
3.3 Mistral AI
| Criterion | Status |
|---|---|
| Company | Mistral AI SAS, Paris (France) |
| Jurisdiction | France / European Union (GDPR, no CLOUD Act applicable) |
| Models available | Mistral Large 3 (MoE, 256k context), Pixtral Large, Devstral 2 (code), Mistral Medium 3, Mistral Small/Nemo (open weights) |
| Direct API hosting (La Plateforme / Mistral AI Studio) | France and EU by default. No routing to the US. |
| Hosting via AWS / Azure / GCP Marketplace | Available. Effective jurisdiction depends on the chosen cloud operator. |
| Self-hosted | Yes, officially supported: self-hosted, private cloud, VPC, on-premise via TensorRT-LLM, vLLM, Ollama, llama.cpp |
| ZDR | Available (toggle parameter on API). Default 30-day rolling retention for anti-abuse monitoring. |
| DPA / SCC | Included. SCC not needed for EU customers (no transfer outside the EEA). |
| Training opt-out | Opt-out by default on Team and Enterprise. No training on customer data under enterprise terms. |
| Certifications | SOC 2 Type II, ISO 27001 (extension in progress). GPAI Code of Practice commitment. |
| Open-weight models | Several models published under Apache 2.0: Mistral 7B, Mixtral 8x7B/8x22B, Nemo, Small 3, Devstral, Codestral Mamba |
| Tool calling | Very good (Mistral Large 3 and Medium 3). Documented native function calling. |
| MCP | Compatible support via official SDK and OpenAI-compatible gateways (LiteLLM, etc.) |
| Indicative pricing | Mistral Medium 3: ~$0.40/M input. Mistral Large 3: ~$2/M input, ~$6/M output |
Verdict: the only frontier-class native EU offering without CLOUD Act. Complete stack: France-hosted SaaS, VPC, on-prem, open-weight models. Adopted by HSBC, SAP, French/German governments for sovereign stacks. Recommended default choice for regulated EU OculiX customers unless specific technical constraints apply. Limitation: tooling ecosystem younger than OpenAI/Anthropic.
3.4 Google Gemini
| Criterion | Status |
|---|---|
| Company | Google LLC, Mountain View (US) / Alphabet Inc. (US) |
| Jurisdiction | United States (CLOUD Act, FISA 702 applicable) |
| Models available | Gemini 3.1 Pro, Gemini 3 Flash, Gemini 2.5 Pro, Gemini 2.0 Flash |
| Hosting via Vertex AI | EU possible: europe-west1 (Belgium), europe-west4 (Netherlands). But Gemini 3.x not yet in EU as of May 9, 2026: only the 2.x generations can be served with EU data residency. |
| ZDR | Available on Vertex AI Enterprise (paid option) |
| DPA / SCC | Included in Google Cloud Terms |
| Training opt-out | Opt-out by default on Vertex AI (no training on customer prompts) |
| Certifications | SOC 1/2/3, ISO 27001, 27017, 27018, 27701, 42001, FedRAMP High, HIPAA BAA |
| Self-hosted | No. No Gemini open-weight models. Gemma 4 variants available open-weight (but far less capable than Gemini 3). |
| Tool calling | Excellent (Gemini 3.1 Pro). Native function calling, structured output, parallel calls. |
| MCP | Official support since 2026 (Vertex AI Agent Builder, Gemini Enterprise) |
| Indicative pricing | Gemini 3.1 Flash: ~$0.30/M input, ~$2.50/M output. Gemini 3.1 Pro: ~$2/M input, ~$10/M output |
Verdict: excellent capability/price ratio, but two major traps:
- the 3.x models are not (yet) available in EU as of May 9, 2026, so EU data residency requires Gemini 2.5 Pro, which degrades tool-calling quality
- US jurisdiction, CLOUD Act applicable
Acceptable for a non-regulated customer; unsuitable for a customer with sovereignty constraints.
3.5 DeepSeek
| Criterion | Status |
|---|---|
| Company | Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd. (China) |
| Jurisdiction | People’s Republic of China — National Intelligence Law 2017 applicable |
| Models available | DeepSeek V4, DeepSeek R1 (reasoning), DeepSeek Coder |
| Official API hosting | Servers in China. No EU residency available. |
| Hosting via third parties (Atlas Cloud, Baseten, Fireworks, Together) | Variable. Verify the sub-processor. |
| ZDR | Not guaranteed on official API. Partial documentation. |
| DPA / SCC | Nearly nonexistent. DeepSeek contested GDPR applicability in 2025. |
| Training opt-out | No, by default, on official API. Data used for training. |
| Certifications | No recognized EU certification. |
| EU regulatory status | Italy (Garante) banned DeepSeek in January 2025. Active investigations in France, Germany, Belgium, Ireland, Netherlands. |
| Open-weight models | Yes, under MIT license — this is the safe usage angle: self-host the weights and ignore the official API |
| Tool calling | Decent (V3.x), to validate in production |
| MCP | Via OpenAI-compatible bridge on self-hosted side |
| Indicative pricing (official API) | Very low: ~$0.07/M input. Hosted outside the EU; not suitable for professional use. |
Verdict: the official DeepSeek API is to be avoided for any EU customer processing personal or industrial-sensitive data. However, the open weights are among the most performant freely available and can be self-hosted without the jurisdictional exposure of the official API. Conclusion: yes to the weights, no to the API.
3.6 Self-hosted open-weight models
Generic case: the customer downloads the weights and runs them on their own infrastructure (datacenter, private cloud, or own on-prem GPUs).
| Model | License | Sizes | Tool calling | Ideal self-host |
|---|---|---|---|---|
| Llama 4 (Meta) | Llama Community License | 8B, 70B, 405B | Good | vLLM |
| Mistral Small 3 / Nemo / Mixtral | Apache 2.0 | 7B to 8x22B | Good to very good | vLLM, llama.cpp |
| Qwen 3.5 (Alibaba) | Apache 2.0 | 0.5B to 72B | Very good | vLLM, llama.cpp |
| Gemma 4 (Google) | Gemma Terms | 2B to 27B | Good (big jump since Gemma 3) | Ollama, vLLM |
| DeepSeek V3 / R1 | MIT | 671B (MoE) | Decent | vLLM |
| Phi-4 (Microsoft) | MIT | 14B | Average | Ollama, LM Studio |
Self-host advantages:
- Jurisdiction = your datacenter’s. Native GDPR. No CLOUD Act, no FISA 702, no National Intelligence Law 2017.
- Air-gap possible (no outbound connection)
- Very low marginal cost once hardware is amortized: ~$0.001 to $0.04/M tokens in electricity vs ~$2.50 to $15/M in cloud API
- Typical hardware ROI: less than 4 months above ~30M tokens/day
- Full auditability: frozen weights, traceable version, no silent “model drift” on the vendor side
Drawbacks:
- Initial capex: a 2x H100 80GB node costs ~€50-80k (rental ~€3-5k/month)
- Internal MLOps competence needed (vLLM tuning, GPU K8s, monitoring)
- Open-weight models trail the proprietary frontier by 6 to 12 months
- Stack maintenance (vLLM updates, OS security, CUDA drivers)
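The ROI figure can be sanity-checked with a simple break-even calculation. The sketch below is an order-of-magnitude estimate under stated assumptions: it uses the capex and per-token figures quoted in this section, treats euros and dollars as interchangeable, and ignores staffing and maintenance costs. The function name and parameters are illustrative.

```python
def breakeven_days(capex: float, tokens_per_day_m: float,
                   api_price_per_m: float, selfhost_price_per_m: float) -> float:
    """Days needed for per-token savings to repay the hardware purchase.

    tokens_per_day_m is the daily volume in millions of tokens;
    both prices are per million tokens.
    """
    daily_saving = tokens_per_day_m * (api_price_per_m - selfhost_price_per_m)
    if daily_saving <= 0:
        raise ValueError("self-hosting never pays back at these prices")
    return capex / daily_saving


# 50k capex, 30M tokens/day, self-host marginal cost of 0.04 per million tokens.
high_price_case = breakeven_days(50_000, 30, 15.0, 0.04)
low_price_case = breakeven_days(50_000, 30, 2.50, 0.04)
print(round(high_price_case), round(low_price_case))
```

With these inputs the break-even is roughly 111 days against a $15/M API price and roughly 678 days against a $2.50/M price, so the payback period depends strongly on which API tier is being replaced.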
Recommended stacks as of May 9, 2026:
- Prototyping / dev: Ollama (simple, OpenAI-compatible, MCP via MCPHost)
- Multi-user production: vLLM 0.17+ (PagedAttention, continuous batching, Anthropic API compat since v0.17)
- CPU-only air-gap: llama.cpp (native MCP support since March 2026)
- Managed EU cluster: Mistral AI Studio “Enterprise-Supported Self-Deployment”, or sovereign operators like Scaleway/OVH with dedicated LLM offerings
3.7 Inference accelerators (Groq, Cerebras, SambaNova)
These vendors are not model publishers. They operate specialized hardware (LPU, WSE, RDU) that serves third-party open-weight models (Llama, Mistral, Qwen, DeepSeek) at very low latency.
| Vendor | Hosting | Jurisdiction | Models served | Main interest |
|---|---|---|---|---|
| Groq | Mostly US, EU coming | US | Llama, Mixtral, Qwen, GPT-OSS | Latency < 100ms, record throughput |
| Cerebras | US | US | Llama, Qwen, DeepSeek | Massive throughput (3000+ tok/s) |
| SambaNova | US | US | Llama, DeepSeek | Throughput |
Verdict: interesting for latency (OculiX MCP benefits from fast responses since there are many round-trips), but US jurisdiction on all major players as of May 9, 2026. For regulated EU customers, not a sovereign path. For non-regulated customers wanting an ultra-fast orchestrator, excellent latency/price ratio.
4. Summary table
Legend:
- OK: criterion fully satisfied in standard configuration
- CONF: satisfied through specific configuration (Bedrock EU, Vertex EU, ZDR addendum, etc.)
- NO: not available or unsatisfactory
| Vendor | Non-US jurisdiction | EU hosting | ZDR | GDPR/DPA | Self-host | Tool calling | Native MCP |
|---|---|---|---|---|---|---|---|
| Anthropic Claude (direct API) | NO | CONF | CONF | OK | NO | OK | OK |
| Anthropic via Bedrock EU | NO (AWS) | OK | OK | OK | NO | OK | OK |
| Anthropic via Vertex EU | NO (Google) | OK | OK | OK | NO | OK | OK |
| OpenAI direct API EU residency | NO | OK | OK | OK | NO | OK | OK |
| OpenAI via Azure EU Data Zone | NO (MS) | OK | OK | OK | NO | OK | OK |
| Mistral La Plateforme | OK | OK | OK | OK | OK | OK | CONF |
| Mistral self-hosted | OK | OK | OK | OK | OK | OK | CONF |
| Google Gemini 3.x via Vertex EU | NO | NO (not available in EU as of May 9, 2026) | CONF | OK | NO | OK | OK |
| Google Gemini 2.5 via Vertex EU | NO | OK | CONF | OK | NO | OK | OK |
| DeepSeek official API | NO (China) | NO | NO | NO | NO | OK | NO |
| DeepSeek self-hosted (MIT weights) | OK (per DC) | OK | OK | OK | OK | OK | CONF |
| Llama 4 self-hosted | OK (per DC) | OK | OK | OK | OK | OK | CONF |
| Mixtral / Qwen self-hosted | OK (per DC) | OK | OK | OK | OK | OK | CONF |
| Groq / Cerebras / SambaNova | NO | NO | CONF | CONF | NO | OK | CONF |
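The summary table lends itself to mechanical filtering once a customer has fixed its hard requirements. The sketch below transcribes a subset of rows and three columns from the table above, with parenthetical notes such as "NO (AWS)" normalized to "NO"; the dictionary keys and the `shortlist` helper are hypothetical names, not part of any OculiX tooling.

```python
# Subset of the summary table; values are the table's OK / CONF / NO codes.
MATRIX = {
    "Anthropic via Bedrock EU": {"non_us_jurisdiction": "NO", "eu_hosting": "OK", "self_host": "NO"},
    "OpenAI via Azure EU Data Zone": {"non_us_jurisdiction": "NO", "eu_hosting": "OK", "self_host": "NO"},
    "Mistral La Plateforme": {"non_us_jurisdiction": "OK", "eu_hosting": "OK", "self_host": "OK"},
    "DeepSeek official API": {"non_us_jurisdiction": "NO", "eu_hosting": "NO", "self_host": "NO"},
    "Llama 4 self-hosted": {"non_us_jurisdiction": "OK", "eu_hosting": "OK", "self_host": "OK"},
}


def shortlist(required: list[str], accept_conf: bool = False) -> list[str]:
    """Return vendors whose value is acceptable for every required criterion."""
    acceptable = {"OK", "CONF"} if accept_conf else {"OK"}
    return [
        vendor
        for vendor, row in MATRIX.items()
        if all(row[criterion] in acceptable for criterion in required)
    ]


print(shortlist(["non_us_jurisdiction", "eu_hosting"]))
print(shortlist(["eu_hosting"]))
```

The `accept_conf` flag separates options that work out of the box from those that need a specific configuration or addendum, which is usually the distinction procurement cares about.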
5. Recommended deployment profiles
Five typical profiles. Profiles A to D run from most to least constrained on sovereignty; profile E covers the extreme case of a fully air-gapped, classified environment.
5.1 Profile A — Government, defense, healthcare (HDS), critical-infrastructure operators
Constraints:
- SecNumCloud, HDS (French healthcare), Diffusion Restreinte (defense)
- No CLOUD Act, no FISA 702
- Air-gap possible or required
- AI Act high-risk (Annex III) likely
Recommendation: open-weight models self-hosted on SecNumCloud infrastructure or air-gapped on-prem.
Stack:
- Model: Mistral Large 3 (if Enterprise license negotiated), Mixtral 8x22B, or Llama 4 70B
- Inference engine: vLLM 0.17+
- Hosting: OVH SecNumCloud, Outscale, Scaleway, or private datacenter
- MCP bridge: native llama.cpp or MCPHost
Acceptable fallback: Mistral AI Studio in self-deployment mode supervised by Mistral.
5.2 Profile B — Regulated mid/large enterprise (banking, insurance, energy)
Constraints:
- DORA (since January 2025), NIS2, strict GDPR
- Active AI ethics committee and DPO
- AI Act high-risk for certain uses (HR, scoring, surveillance)
- Possible audit by regulators (ACPR, BaFin, etc.)
Recommendation: Mistral Le Chat Enterprise in private VPC or sovereign cloud.
Stack:
- Model: Mistral Large 3 via La Plateforme with enhanced DPA and ZDR enabled
- Hosting: Mistral cloud (FR/EU) or self-hosted in VPC on OVH/Scaleway
- Backup: Claude via Bedrock eu-central-1 for non-sensitive tasks (with DPIA documenting CLOUD Act residual risk)
5.3 Profile C — Generic EU B2B SaaS (non-regulated)
Constraints:
- GDPR applicable
- No AI Act high-risk
- Cost-sensitive
- Need good tool-calling quality
Recommendation: OpenAI via Azure EU Data Zone, or Claude via Bedrock eu-central-1.
The CLOUD Act residual risk must be documented in the DPIA. This is acceptable for an EU B2B customer that does not process particularly sensitive data. Mistral La Plateforme remains the best option if you accept a slightly younger ecosystem.
5.4 Profile D — POC, internal R&D, sandbox
Constraints:
- Non-sensitive data (anonymized, synthetic, or outside GDPR scope)
- Minimal cost
- Fast iteration
Recommendation: Anthropic API direct (Claude Sonnet 4.6) or Mistral API.
Native tool calling and MCP, top quality. Explicitly document in an internal policy that this scope does not handle personal or confidential data. Otherwise switch back to profiles A/B/C.
5.5 Profile E — Full air-gap, classified environment
Constraints:
- No outbound connection allowed
- Datacenter under physical customer control
- Long-term bit-for-bit inference reproducibility
Recommendation: open-weight models, llama.cpp or vLLM, on dedicated hardware.
Stack:
- Model: Llama 4 70B FP16, or Mixtral 8x22B, or Mistral Small 3 for more modest targets
- Inference engine: llama.cpp (CPU possible) or vLLM (dedicated GPU, A100/H100)
- Audit: systematic packet capture to verify the absence of any outbound traffic
- Storage: cryptographically signed weights, SHA-256 verification at each load
No SaaS offering is eligible. This is the only profile where sovereignty can be verified by direct technical measurement (packet capture, hash checks) rather than by contract alone.
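The SHA-256 check at load time can be sketched with the Python standard library alone. This is a minimal illustration, assuming the expected digest comes from a trusted manifest; the function names are hypothetical, and verifying the cryptographic signature of that manifest is a separate step not shown here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-gigabyte weights never sit fully in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path: Path, expected_hex: str) -> None:
    """Raise if the file on disk does not match the digest recorded at approval time."""
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise RuntimeError(f"weight file {path} failed SHA-256 verification")
```

Calling `verify_weights` before handing the file to the inference engine turns silent corruption or tampering into a hard failure at startup.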
6. Common pitfalls to avoid
7. Specific recommendations for OculiX MCP integration
7.1 Ed25519 audit trail: to keep customer-side
OculiX’s signed audit trail allows proving, after the fact and in a tamper-evident way, which tools were called with which arguments. It is a major asset for AI Act compliance (Art. 12 — logs and traceability).
Never transmit this audit trail to the LLM orchestrator for re-injection: this would invert the chain of trust. The audit trail is meant for DPOs/auditors, not for the model.
7.2 ActionGate: deterministic authorization policy
OculiX’s ActionGate access control is deterministic and independent of the LLM. Even if an LLM hallucinates or falls victim to prompt injection, ActionGate blocks actions not authorized by the policy.
Consequence: the customer can choose a less “safe” LLM (more creative, less guarded) for orchestration without compromising operational security, provided the ActionGate policy is correctly defined.
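The principle of a deterministic gate can be illustrated with a few lines of code. The sketch below is not the ActionGate implementation or its configuration format, which this document does not describe; the class, the function, and the policy fields are hypothetical and only show why such a check is unaffected by what the model generates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatePolicy:
    """Hypothetical policy: a tool allowlist plus substrings that must never appear."""
    allowed_tools: frozenset
    blocked_substrings: tuple = ()


def authorize(policy: GatePolicy, tool: str, arguments: dict) -> bool:
    """Pure function of (policy, request): same input, same decision, no LLM involved."""
    if tool not in policy.allowed_tools:
        return False
    text = " ".join(str(value) for value in arguments.values()).lower()
    return not any(fragment in text for fragment in policy.blocked_substrings)


policy = GatePolicy(frozenset({"find", "screenshot", "click"}), ("delete",))
print(authorize(policy, "click", {"target": "ok_button.png"}))
print(authorize(policy, "type", {"text": "hello"}))
print(authorize(policy, "click", {"target": "Delete account"}))
```

Because the decision depends only on the policy and the request, a prompt injection can change what the model asks for but not what the gate allows.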
7.3 Latency: direct impact of LLM choice
OculiX MCP typically does 5 to 50 LLM round-trips per test scenario. Per-call latency accumulates:
| Provider | Median TTFT latency |
|---|---|
| Groq (Llama) | 100-200 ms |
| Cerebras (Llama) | 150-300 ms |
| Claude Sonnet (US direct) | 400-800 ms |
| Claude via Bedrock EU | 500-1000 ms |
| GPT-4.1 via Azure EU | 500-1200 ms |
| Mistral La Plateforme | 300-700 ms |
| Self-hosted vLLM (H100, EU DC) | 50-150 ms |
For a 20-call scenario, that’s a 2 to 20-second difference per execution. Over thousands of executions, this is structurally significant for CI/CD.
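The accumulation can be reproduced from the table with a short calculation. The sketch below copies four rows of the table and multiplies by the number of sequential calls; it counts time to first token only, so real scenario durations will be longer once token generation and tool execution are included. The names are illustrative.

```python
# Median TTFT ranges in milliseconds, copied from the table above.
TTFT_MS = {
    "Groq (Llama)": (100, 200),
    "Claude via Bedrock EU": (500, 1000),
    "GPT-4.1 via Azure EU": (500, 1200),
    "Self-hosted vLLM (H100, EU DC)": (50, 150),
}


def scenario_seconds(provider: str, calls: int) -> tuple[float, float]:
    """Cumulative TTFT (low, high) in seconds for a scenario of sequential calls."""
    low_ms, high_ms = TTFT_MS[provider]
    return (calls * low_ms / 1000, calls * high_ms / 1000)


for name in TTFT_MS:
    print(name, scenario_seconds(name, 20))
```

For 20 calls this gives 1 to 3 seconds on the self-hosted row and 10 to 24 seconds on the Azure row, which is the order of magnitude quoted above.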
7.4 Reproducibility
For visual non-regression use cases, you ideally want an LLM whose outputs are reproducible. No stochastic LLM is strictly reproducible: even temperature=0 does not guarantee determinism on GPU, because floating-point addition is not associative and the order of parallel reductions can vary between runs.
OculiX mitigation: the deterministic layer (Sikuli, OpenCV, OCR) absorbs the majority of LLM non-reproducibility. The LLM decides what to look for; the deterministic layer guarantees how it is found.
This is consistent with the project’s philosophy: deterministic code in the critical loop, LLM as a high-level decision layer. Not the other way around.
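The floating-point effect behind this non-reproducibility is easy to observe on a CPU. The sketch below sums the same three numbers in two different orders, which is a simplified stand-in for a GPU reduction whose order changes between runs; the helper name is illustrative.

```python
def reduce_sum(values: list[float]) -> float:
    """Accumulate left to right, the way a sequential reduction would."""
    total = 0.0
    for value in values:
        total += value
    return total


forward = reduce_sum([0.1, 0.2, 0.3])
backward = reduce_sum([0.3, 0.2, 0.1])

print(forward == backward)  # False: same numbers, different rounding
print(forward, backward)
```

The difference is on the order of 1e-16, but inside a model a tiny shift in a logit can change which token ranks first, and every later token then diverges.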
7.5 Specifically discouraged models
As of May 9, 2026, do not use for orchestrating OculiX MCP in production:
- Models < 7B parameters: tool calling too weak
- Non-instruct (base) models: not tuned to follow instructions or emit tool calls
- Phi-3: unstable tool calling on multi-hop scenarios
- “Uncensored” models without alignment: risk of aberrant behavior on ambiguous prompts
- LLMs without native function-calling support (manual prompt engineering): too fragile
8. Appendices
8.1 Glossary
- CLOUD Act (Clarifying Lawful Overseas Use of Data Act, 2018): US law allowing US authorities to require a US-incorporated company to disclose data, wherever stored worldwide.
- FISA 702 (Foreign Intelligence Surveillance Act, Section 702): US provision authorizing the collection of electronic communications of non-US persons located outside the United States, without an individual warrant.
- National Intelligence Law 2017 (Art. 7): Chinese law requiring any Chinese organization and citizen to cooperate with intelligence services.
- ZDR (Zero Data Retention): contractual commitment not to store inputs and outputs beyond the inference cycle.
- DPA (Data Processing Agreement): GDPR Art. 28 processor agreement.
- SCC (Standard Contractual Clauses): standard contractual clauses for transfers outside the EEA.
- HRAIS (High-Risk AI System): AI system classified high-risk under the AI Act (Annex III).
- Deployer (AI Act): natural or legal person using an AI system under their own authority.
- MCP (Model Context Protocol): open Anthropic protocol (Nov. 2024) for connecting LLMs and tools.
- BFCL (Berkeley Function Calling Leaderboard): reference benchmark for tool-calling quality.
8.2 Sources and useful links
- Anthropic — Data residency
- Anthropic — ZDR
- OpenAI — EU data residency
- OpenAI — Data controls
- Mistral — Privacy Policy
- Mistral — Le Chat Enterprise
- Google Vertex AI — Data residency
- EU AI Act — Regulatory framework
- Berkeley Function Calling Leaderboard
8.3 Disclaimer
This document is provided for informational purposes. It does not constitute legal advice, a compliance audit, or a contractual commitment from OculiX or any associated commercial entity. Responsibility for LLM orchestrator compliance rests with the customer (deployer under the AI Act and processor/controller under GDPR as applicable).
LLM vendor terms evolve rapidly. Always re-verify DPAs, SCCs, certifications, and retention policies at contract signature time.
8.4 Document history
| Version | Date | Changes |
|---|---|---|
| 1.0 | May 9, 2026 | Initial version |
| 1.0 (publication) | May 19, 2026 | Published on oculix.org |