Selected work

Models we shipped — and the numbers behind them.

40+ Domains
99.6% Prod uptime
1 Open model
Live in production · Deep learning · Risk & fraud

A detection model that runs at 98.2% accuracy

A fast-growing platform was drowning in false alarms — a brittle rules engine flagged so much that the ops team had stopped trusting it, and real risk slipped through the noise.

Detection model

The problem

The rules engine fired on everything that looked vaguely wrong. False positives buried the genuine cases, analysts burned hours on ghosts, and the metric everyone watched — catch rate — was quietly falling.

What we built

A deep-learning classifier trained on their own labelled history, tuned explicitly for the asymmetric cost of a wrong “yes” — wrapped in an MLOps pipeline with drift detection and a retraining loop.
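
As a rough illustration of tuning for the asymmetric cost of a wrong "yes", the sketch below picks a decision threshold that minimizes total cost when a false positive is weighted more heavily than a miss. The function name, the cost weights, and the toy scores are hypothetical and chosen only for the example. This is not the production model or its actual tuning procedure.

```python
def pick_threshold(scores, labels, fp_cost=5.0, fn_cost=1.0):
    """Return (threshold, cost) with the lowest total cost on labelled data.

    scores: model probabilities in [0, 1]; labels: 1 = genuine case, 0 = benign.
    fp_cost and fn_cost are illustrative weights, not values from the project.
    """
    best_threshold, best_cost = 1.0, float("inf")
    # Candidates: every distinct score, plus 1.0 as an upper bound.
    for t in sorted(set(scores) | {1.0}):
        # An event is flagged when its score is at or above the threshold.
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = fp_cost * fp + fn_cost * fn
        if cost < best_cost:  # strict '<' keeps the lowest tied threshold
            best_threshold, best_cost = t, cost
    return best_threshold, best_cost


# Toy labelled history (made up for the example).
scores = [0.10, 0.35, 0.40, 0.62, 0.80, 0.95]
labels = [0, 0, 1, 0, 1, 1]
threshold, cost = pick_threshold(scores, labels)
```

On this toy data the heavier false-positive weight pushes the threshold up to 0.80, accepting one missed case in exchange for zero false alarms. With equal weights the same search settles on 0.40.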

The result

Accuracy climbed past 98% while the false-positive rate fell under 1.5%. Analysts stopped chasing noise, real cases surfaced faster, and the model now scores every event in real time.

98.2%
model accuracy
1.4%
false-positive rate
63%
less analyst review time
24/7
real-time scoring
“The accuracy they quoted is still the accuracy we see in production. That sentence sounds obvious until you've been burned by everyone who couldn't deliver it.” — Maya Ellison, VP Operations
Open research · Frontier AI · Peer-reviewed

Mixture-of-Recursions, open-sourced

We don't just integrate frontier models — we contribute to them. Our team designed and released a recursive transformer architecture, with research published in applied-AI venues.

Mixture-of-Recursions architecture

The idea

Reuse model depth recursively instead of stacking parameters — adapting compute per token, so the model spends effort only where reasoning actually requires it.
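
The idea can be sketched in a few lines of Python: one shared block is applied repeatedly, and a router decides how many passes each token gets. Everything here is a toy under stated assumptions. The `shared_block` function, the magnitude-based `route_depth` heuristic, and the sample vectors are hypothetical stand-ins, not the released architecture, whose routing would be learned rather than hand-written.

```python
import math


def shared_block(vec, weight=0.5):
    """One shared layer: the same parameters are reused at every recursion step."""
    return [math.tanh(weight * x) for x in vec]


def route_depth(vec, max_depth=3):
    """Hypothetical router: tokens with larger mean magnitude get more passes."""
    score = sum(abs(x) for x in vec) / len(vec)
    return max(1, min(max_depth, math.ceil(score * max_depth)))


def recursive_forward(tokens, max_depth=3):
    """Apply the shared block a per-token number of times."""
    outputs, depths = [], []
    for vec in tokens:
        depth = route_depth(vec, max_depth)
        for _ in range(depth):  # same block, applied `depth` times
            vec = shared_block(vec)
        outputs.append(vec)
        depths.append(depth)
    return outputs, depths


# Three toy token vectors (made up for the example).
tokens = [[0.1, -0.2], [0.9, 1.1], [0.4, 0.5]]
outputs, depths = recursive_forward(tokens)
```

Here the three tokens receive 1, 3, and 2 passes, so the model runs 6 block applications instead of the 9 a fixed depth of 3 would require, while holding only one block's worth of parameters.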

Why it matters

It's the kind of original research most consultancies cite but never produce. The same rigor goes into every fine-tune and retrieval pipeline we ship, which is why our clients' production systems run leaner.

In the open

Released publicly, with architecture and findings documented for the community — not locked behind a sales motion.

2x
inference efficiency target
Open
source & weights
Peer-reviewed
applied AI research
reusable across client builds

More of what we've put into the world.

Breadth · 40+ domains

A cross-section of builds spanning RAG, fine-tuning, vision, and MLOps.

RAG pipeline
RAG · Legal

Grounded retrieval over 2M documents

A production RAG system answering from a firm's own corpus — with citations, guardrails, and zero hallucinated precedent.

2M docs indexed
94% answer precision
Fine-tuned LLM
Fine-tuning · Support

A 7B model that beat the giant

We fine-tuned and distilled an open model on a client's support history, reaching 96% of a frontier API's quality on their task at a fraction of the cost per call.

11x cheaper per call
96% of frontier quality
Computer vision damage detection
Vision · Logistics

Damage detection at the dock

A vision model flagging shipment damage from a phone photo — turning a manual inspection queue into an instant decision.

11x faster inspection
97% detection recall
MLOps pipeline
MLOps · SaaS

From notebooks to a real pipeline

We replaced a sprawl of one-off scripts with a versioned, monitored pipeline — the foundation every model the team ships now runs on.

9x faster to deploy
0 silent model failures
“Most agencies show you a slide. Kaylo showed up obsessing over our false-positive rate. Six weeks later it was running in production.”
Daniel Okafor, Founder, document-automation startup
“They treated our messy data like a feature, not a problem. The fine-tuned model they shipped does the work of a team, and we own the weights.”
Priya Nair, Head of Data, B2B SaaS
Your case study next

The next number on this page could be yours.

Start a conversation