/04

JOURNAL ENTRY · UPDATED MAY 2026 · 12 MIN READ

AI products & autonomous agents.

Autonomous conversational agents, custom LLM integrations, voice and multimodal interfaces — engineered for production, not demos. Evals, observability, guardrails, escalation paths.

Practice lead

Defne Arslan

Team size

11 engineers · 3 researchers

Active projects

14 in production

Models in stock

GPT · Claude · Mistral · OSS

CHAPTER 01

Why this practice exists.

Most "AI products" you've seen are demos in a wig. They work in the keynote and fall over in the wild. We built this practice for the other thing — the production system, sitting in front of paying customers, handling tens of thousands of conversations a day without a human at the wheel.

The cost of a wrong answer in production is asymmetric. A retail-customer assistant that hallucinates a promo costs a refund. A medical triage agent that hallucinates a dosage costs a lawsuit. Eval-driven development isn't a luxury — it's the only honest way to ship.

We work with a small set of clients where the agent is the product, or where it sits on the critical path. We don't take chatbot-bolted-onto-existing-app briefs — there are agencies for that, and we'll happily refer.

If the model can fail, it will fail. Our job is to make that failure boring, observable, and recoverable.

— DEFNE ARSLAN, PRACTICE LEAD

CHAPTER 02

The architecture we ship.

Every agent we put in production follows the same skeleton. The pieces vary; the shape does not.

The orchestrator is where the opinions live. It picks the model per task — GPT-4 for nuanced reasoning, Claude for tool-heavy chains, a self-hosted Mistral for cheap classification. It enforces evals before any side-effect commits. It logs every decision to a queryable trace, so when a customer asks "why did the agent do that?" — three weeks later — the answer is in your dashboard, not your memory.
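
As a rough illustration of that skeleton, the sketch below routes a task to a model, runs eval checks on the draft, and only commits the side effect if every check passes, logging each decision along the way. It is a minimal sketch, not our production orchestrator: the routing table, the `Trace` class, `run_step`, and the `call_model` and `commit` callables are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical routing table: task type -> model label (illustrative names only).
ROUTES = {
    "reasoning": "gpt-4",
    "tool_chain": "claude",
    "classification": "mistral-self-hosted",
}


@dataclass
class Trace:
    """Append-only log of orchestrator decisions (stand-in for a real trace store)."""
    events: list[dict] = field(default_factory=list)

    def log(self, **event) -> None:
        self.events.append(event)


def run_step(
    task_type: str,
    prompt: str,
    call_model: Callable[[str, str], str],  # assumed signature: (model, prompt) -> text
    evals: list[Callable[[str], bool]],     # each check returns True when the draft passes
    commit: Callable[[str], None],          # the side effect, e.g. sending a reply
    trace: Trace,
) -> bool:
    """Route, draft, evaluate, then commit only if every eval passes."""
    model = ROUTES[task_type]
    trace.log(step="route", task_type=task_type, model=model)

    draft = call_model(model, prompt)
    trace.log(step="draft", model=model, output=draft)

    failed = [check.__name__ for check in evals if not check(draft)]
    trace.log(step="eval", failed=failed)
    if failed:
        # Nothing is committed; the caller can retry or escalate to a human.
        return False

    commit(draft)
    trace.log(step="commit")
    return True
```

Because the eval gate sits between the draft and the commit, a failing check leaves a trace entry but no side effect, which is what makes the "why did the agent do that?" question answerable later.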

CHAPTER 03

What we deliver.

Every engagement ships with the same six artifacts. Hand-wave any of them and the agent is a demo, not a product.

Eval suite

200+ test cases per agent. Run on every commit, every model change.

Observability

Every prompt, response, tool call, and latency in a queryable trace.

Guardrails

PII redaction, topic boundaries, jailbreak detection — model-agnostic.

Human escalation

Routed handoff to your support team with full conversation context.

Model abstraction

Swap providers in one config line. No vendor lock-in, ever.

Cost dashboard

Tokens per user, per query, per quarter. Forecast before you scale.
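
To make two of these artifacts concrete, here is a minimal sketch of a config-driven provider swap and a per-user cost rollup. Everything in it is an assumption for illustration: the `CONFIG` shape, the stub provider functions (stand-ins for real SDK calls), the `Usage` record, and the prices, which are placeholders rather than real rates.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

# Hypothetical config: changing this one value swaps the provider.
CONFIG = {"provider": "anthropic"}

# Placeholder prices in EUR per 1,000 tokens; not real rates.
PRICE_PER_1K = {"openai": 0.03, "anthropic": 0.02, "mistral": 0.002}


# Stub adapters standing in for real SDK calls; each has the same signature.
def _openai(prompt: str) -> str:
    return f"[openai] {prompt}"


def _anthropic(prompt: str) -> str:
    return f"[anthropic] {prompt}"


def _mistral(prompt: str) -> str:
    return f"[mistral] {prompt}"


PROVIDERS: dict[str, Callable[[str], str]] = {
    "openai": _openai,
    "anthropic": _anthropic,
    "mistral": _mistral,
}


def complete(prompt: str) -> str:
    """Dispatch to whichever provider the config names."""
    return PROVIDERS[CONFIG["provider"]](prompt)


@dataclass(frozen=True)
class Usage:
    user_id: str
    provider: str
    tokens: int


def cost_per_user(usages: list[Usage]) -> dict[str, float]:
    """Sum estimated spend per user from token counts and the price table."""
    totals: dict[str, float] = defaultdict(float)
    for u in usages:
        totals[u.user_id] += u.tokens / 1000 * PRICE_PER_1K[u.provider]
    return dict(totals)
```

Calling code only ever touches `complete`, so the provider choice stays in config, and the same usage records that feed the trace can feed the cost forecast.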

CHAPTER 04

Engagement shape.

Three ways to start. Most engagements begin with a two-week Sprint to de-risk the model choice and the user research before any production code is written.

/01 · SPRINT

2 weeks

Working prototype on real data. €24k fixed.

/02 · BUILD

3–9 months

Full production engagement + all deliverables.

/03 · OPERATE

Ongoing

Continuous pod. Pause or scale every 90 days.

CHAPTER 05

The stack we trust.

Opinionated but not religious. We choose per-task; we'd rather use the right tool than the cool one.

FRONTIER OpenAI

FRONTIER Anthropic

OPEN Mistral

EMBED Cohere

STT/TTS Whisper · ElevenLabs

VECTOR Qdrant

VECTOR pgvector

OBSERVE Langfuse

EVAL Promptfoo

SERVE Modal · Fly

CHAPTER 06

Selected work.

Three production agents from the practice. Each shipped in under nine months.

CASE · SALPRE

Salpre AI

Hardware + AI · world's first phone-to-agent adapter · iOS.

CASE · ADREP

Adrep AI

GPT-powered ad analytics · shareable reports · iOS + web.

CASE · METO

Meto CRM

Omnichannel CRM with in-product AI assist · WhatsApp + IG + email.

Got a brief? AI or otherwise.

A senior partner replies personally within 24 hours.

Start a project ↗