Ecosystem

European LLM infrastructure is a stack now. What Aleph Alpha, deepset, and Jina AI represent.

8 June 2026 · 4 min read

The three-layer problem

Building a production AI application requires decisions at three distinct infrastructure layers. For three years, the dominant narrative about European AI has obscured the fact that the European market now has production-capable options at all three.

At the model layer, the question is which language model the application calls for inference, and where that inference happens. At the orchestration layer, the question is what framework connects the model to the application's data, logic, and workflows. At the retrieval layer, the question is how the application finds and surfaces relevant information from its data corpus — the component that has made retrieval-augmented generation the dominant production architecture for enterprise AI.

The default answers to all three questions, from 2022 through most of 2024, pointed to US-based infrastructure. GPT-4 or Claude at the model layer. LangChain or custom code at orchestration. Pinecone or Chroma at retrieval. European enterprise AI applications were, with few exceptions, European applications running on American infrastructure.

The model layer: Aleph Alpha and PhariaAI

Aleph Alpha, founded in Heidelberg in 2019, has built its market position on a specific thesis: large language models for regulated industries must be sovereign, explainable, and auditable — properties that are prerequisites for the use cases it targets, not optional features. Its clients are government agencies, financial institutions, healthcare providers, and defence contractors, for whom American model infrastructure creates sovereignty risk that is material and operational, not theoretical or reputational.

PhariaAI, Aleph Alpha's enterprise platform, runs on private or sovereign cloud infrastructure entirely within EU jurisdiction. Its models are designed with the explainability requirements of the EU AI Act's high-risk provisions in mind — the ability to trace an output to specific training data and document the reasoning path is part of the architecture, not a post-hoc compliance addition. In the regulated enterprise segment the platform targets, there is no close European competitor at comparable scale, and the US-based models are structurally disadvantaged by their data residency regardless of their benchmark performance.

The orchestration layer: deepset and Haystack

deepset, founded in Berlin in 2018, built the orchestration layer before the category had a widely used name. Haystack, the company's open-source framework, is among the most widely adopted tools in production for building RAG pipelines and LLM applications. The framework separates pipeline logic from the components it runs on: engineering teams write that logic once and run it against different models and different vector databases, without rewriting the core application code when the underlying components change.
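The idea of swapping components without touching pipeline logic can be sketched in plain Python. This is an illustration of the pattern only, not Haystack's actual API: `Retriever`, `Generator`, `RagPipeline`, `KeywordRetriever`, and `StubGenerator` are hypothetical names, and the stub stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Protocol


class Retriever(Protocol):
    """Hypothetical interface: anything that returns documents for a query."""

    def retrieve(self, query: str, top_k: int) -> list[str]: ...


class Generator(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def generate(self, prompt: str) -> str: ...


@dataclass
class RagPipeline:
    """Pipeline logic that depends only on the two interfaces above."""

    retriever: Retriever
    generator: Generator

    def run(self, question: str, top_k: int = 2) -> str:
        docs = self.retriever.retrieve(question, top_k)
        context = "\n".join(docs)
        prompt = f"Context:\n{context}\n\nQuestion: {question}"
        return self.generator.generate(prompt)


class KeywordRetriever:
    """Toy retriever: ranks documents by the number of shared words."""

    def __init__(self, documents: list[str]) -> None:
        self.documents = documents

    def retrieve(self, query: str, top_k: int) -> list[str]:
        terms = set(query.lower().split())
        ranked = sorted(
            self.documents,
            key=lambda doc: len(terms & set(doc.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]


class StubGenerator:
    """Stand-in for a model client; a real one would call an inference API."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] answered using {len(prompt)} prompt characters"


documents = [
    "Haystack is an open-source framework",
    "Heidelberg is a city in Germany",
    "Berlin has many startups",
]
retriever = KeywordRetriever(documents)

# Same pipeline class, same retriever, two different "models".
pipeline_a = RagPipeline(retriever, StubGenerator("model-a"))
pipeline_b = RagPipeline(retriever, StubGenerator("model-b"))
print(pipeline_a.run("open-source framework"))
print(pipeline_b.run("open-source framework"))
```

Changing the model here means constructing the pipeline with a different generator; `RagPipeline.run` is untouched. A real orchestration framework adds far more (component graphs, serialisation, evaluation), so this shows only the decoupling itself.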

deepset Cloud, the managed version, provides production infrastructure for teams that want Haystack's capabilities without operating the deployment stack themselves. It runs on EU infrastructure, integrates with the major EU-native model and vector database providers, and includes the observability and evaluation tooling that production AI teams require to manage application quality as usage scales. The company's position is infrastructure-agnostic at the model layer — Haystack can connect to any model — while providing EU-based managed infrastructure for teams that need it.

The retrieval layer: Jina AI

Jina AI, founded in Berlin in 2020, built its initial reputation on neural search and has extended that work into multimodal embeddings — representations that place text, images, and documents in the same vector space, enabling cross-modal retrieval that single-modality tools cannot perform. Its embedding APIs and neural search framework are used by engineering teams building semantic search, document understanding, and retrieval-augmented applications on EU infrastructure.

The company operates at a layer that is present in almost every production RAG architecture but rarely surfaces in discussions about AI infrastructure: the component that converts organisational data into the vector representations the retrieval system queries. For an enterprise AI application processing contracts, emails, technical documentation, and images from a single knowledge base, the multimodal retrieval layer is where the quality of the application is largely determined.
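Cross-modal retrieval over a shared vector space can be illustrated with a toy index. The sketch below assumes the vectors already exist: the three-dimensional values are invented for illustration, the file names are hypothetical, and no Jina AI API is called. In a real system each vector would come from a multimodal embedding model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical index: text and images share one vector space, so a single
# similarity function can compare them. All vectors are made up.
INDEX = [
    ("text", "contract_clause.txt", [0.9, 0.1, 0.0]),
    ("image", "signed_contract_scan.png", [0.8, 0.2, 0.1]),
    ("image", "office_floor_plan.png", [0.0, 0.2, 0.9]),
]


def search(query_vec: list[float], top_k: int = 2) -> list[tuple[str, str]]:
    """Return the (modality, name) of the top_k items closest to the query."""
    ranked = sorted(
        INDEX,
        key=lambda item: cosine(query_vec, item[2]),
        reverse=True,
    )
    return [(modality, name) for modality, name, _ in ranked[:top_k]]


# Pretend this is the embedding of a text query about contracts.
contract_query = [1.0, 0.0, 0.0]
print(search(contract_query))
```

With these invented vectors, the text query ranks both the contract text and the scanned contract image above the floor plan, which is the behaviour a single-modality index cannot offer. The quality of real results depends on the embedding model, not on the similarity function.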

What the stack represents in practice

A European enterprise team building a production AI application in 2026 can route the full data flow — model inference, pipeline orchestration, embedding generation — through infrastructure that stays within EU jurisdiction throughout. That was a realistic option for very few organisations before 2024.

The combination is a production-capable stack for regulated enterprise use cases where sovereignty and explainability requirements are structural constraints. Teams without a structural reason to avoid US infrastructure will still find US-based models more accessible, more extensively documented, and more broadly integrated with the tooling already in use. For the segment that does not face those constraints, the European stack is not yet the default choice. What has changed is that the segment facing those constraints is larger than it was two years ago, the European infrastructure to serve it now exists at all three layers, and the cost of building on it has fallen to the point where it no longer carries a significant penalty relative to the US-based alternative.