Selected work

Models we shipped — and the numbers behind them.

40+ Domains

99.6% Prod uptime

1 Open model

Live in production · Deep learning · Risk & fraud

A detection model that runs at 98.2% accuracy

A fast-growing platform was drowning in false alarms — a brittle rules engine flagged so much that the ops team had stopped trusting it, and real risk slipped through the noise.

The problem

The rules engine fired on everything that looked vaguely wrong. False positives buried the genuine cases, analysts burned hours on ghosts, and the metric everyone watched — catch rate — was quietly falling.

What we built

A deep-learning classifier trained on their own labelled history, tuned explicitly for the asymmetric cost of a wrong “yes” — wrapped in an MLOps pipeline with drift detection and a retraining loop.
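To make "tuned for the asymmetric cost of a wrong yes" concrete, here is a minimal sketch, not the production system. It picks the score cutoff that minimises total misclassification cost when a false positive is weighted more heavily than a miss. The function name `pick_threshold`, the cost weights, and the toy scores are all assumptions for illustration.

```python
from typing import List, Tuple


def pick_threshold(
    scores: List[float],
    labels: List[int],
    cost_fp: float = 5.0,  # assumed cost of a wrong "yes" (false alarm)
    cost_fn: float = 1.0,  # assumed cost of a missed case
) -> Tuple[float, float]:
    """Return (threshold, total_cost) minimising weighted errors.

    An event is flagged when its score is >= threshold.
    """
    # Try every observed score, plus "flag nothing" (infinite threshold).
    candidates = sorted(set(scores)) + [float("inf")]
    best_t, best_cost = candidates[0], float("inf")
    for t in candidates:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:  # strict: keep the lowest threshold on ties
            best_t, best_cost = t, cost
    return best_t, best_cost


# Toy data (hypothetical): model scores and ground-truth labels.
scores = [0.1, 0.4, 0.6, 0.9]
labels = [0, 1, 0, 1]

print(pick_threshold(scores, labels, cost_fp=5.0, cost_fn=1.0))  # (0.9, 1.0)
print(pick_threshold(scores, labels, cost_fp=1.0, cost_fn=1.0))  # (0.4, 1.0)
```

With equal costs the cutoff sits at 0.4. Weighting false alarms five times heavier moves it to 0.9, which is the trade this kind of tuning makes: fewer flags, each more trustworthy.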

The result

Accuracy climbed past 98% while the false-positive rate fell under 1.5%. Analysts stopped chasing noise, real cases surfaced faster, and the model now scores every event in real time.

98.2%

model accuracy

1.4%

false-positive rate

63%

less analyst review time

24/7

real-time scoring

“The accuracy they quoted is still the accuracy we see in production. That sentence sounds obvious until you've been burned by everyone who couldn't deliver it.” — Maya Ellison, VP Operations

Open research · Frontier AI · Peer-reviewed

Mixture-of-Recursions, open-sourced

We don't just integrate frontier models — we contribute to them. Our team designed and released a recursive transformer architecture, with research published in applied-AI venues.

The idea

Reuse model depth recursively instead of stacking parameters — adapting compute per token, so the model spends effort only where reasoning actually requires it.
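A toy sketch of that idea, under heavy assumptions: one shared block is applied repeatedly, and each token gets its own number of passes. This is not the released architecture. The scalar "hidden states", the hand-supplied `depths` (a real model would learn its routing), and the name `recursive_forward` are all hypothetical.

```python
from typing import Callable, List, Tuple


def recursive_forward(
    tokens: List[float],
    block: Callable[[float], float],
    depths: List[int],
) -> Tuple[List[float], int]:
    """Apply ONE shared block recursively; token i receives depths[i] passes.

    Returns the updated states and the number of block calls spent.
    """
    if len(tokens) != len(depths):
        raise ValueError("tokens and depths must have the same length")
    out = list(tokens)
    calls = 0
    for step in range(max(depths, default=0)):
        for i, d in enumerate(depths):
            if step < d:  # token i is still active at this recursion step
                out[i] = block(out[i])
                calls += 1
    return out, calls


# The same parameters are reused at every step: a stand-in for a shared layer.
shared_block = lambda h: 0.5 * h + 1.0

states, calls = recursive_forward([0.0, 0.0, 0.0], shared_block, depths=[1, 2, 3])
print(states)  # [1.0, 1.5, 1.75]
print(calls)   # 6 block calls, versus 9 if every token ran all 3 steps
```

The saving comes from the tokens that exit early: parameters are shared across depth, and compute is spent only on the tokens assigned more recursion steps.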

Why it matters

It's the depth most consultancies cite but never produce. The same rigor goes into every fine-tune and retrieval pipeline we ship — which is why our clients' production systems run leaner.

In the open

Released publicly, with architecture and findings documented for the community — not locked behind a sales motion.

inference efficiency target

Open

source & weights

Peer-reviewed

applied AI research

∞

reusable across client builds

More of what we've put into the world.

Breadth · 40+ domains

A cross-section of builds spanning agentic automation, conversational AI, RAG, fine-tuning, vision, and MLOps.

Support automation · SaaS / E-commerce

Tier-1 and tier-2 tickets that close themselves

A multi-channel agent network reads each ticket, checks live systems, and resolves the routine majority autonomously — escalating only the genuinely hard cases and learning from every conversation it closes.

−68% tickets reaching an agent

4 min avg. time to resolution

Back-office automation · Logistics / Ops

The back office that runs while the team sleeps

Agents that match invoices to POs, reconcile vendor statements, and assemble the weekly report — operating inside the ERP and inbox the team already lives in, with no rip-and-replace.

12× faster month-end close

~0 manual data entry

Conversational analytics · Finance / RevOps

Ask the data a question, get the chart back

A natural-language layer over the warehouse: type the question, watch it resolve into the right chart, drill in, and export — so revenue and finance stop queuing behind the data team.

0 lines of SQL written

−90% ad-hoc data requests

Recruiting automation · Recruiting / HR

From 800 résumés to a shortlist by morning

Screening, scoring, scheduling, and interview-feedback synthesis stitched into one pipeline — every shortlist gated by a recruiter sign-off, so speed never costs you the human call.

4× faster time-to-shortlist

100% recruiter-approved

Predictive maintenance · Manufacturing

The machine warns you it will fail — two days early

Streaming sensor pipelines feed an anomaly-detection model that catches the drift before the breakdown — turning 3 a.m. line-down emergencies into a part quietly ordered on a Tuesday.

93% failures caught early

48h advance warning
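For a sense of how a streaming anomaly check like the predictive-maintenance build above can work, here is a minimal sketch using a rolling z-score. It is a simple statistical stand-in, not the deployed model. The function name `rolling_anomalies`, the window size, the threshold, and the sensor readings are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, List


def rolling_anomalies(
    readings: Iterable[float],
    window: int = 5,           # assumed number of recent readings to compare against
    z_threshold: float = 3.0,  # assumed cutoff, in standard deviations
) -> List[int]:
    """Return indices of readings that deviate sharply from the recent window."""
    history: deque = deque(maxlen=window)
    flagged: List[int] = []
    for i, x in enumerate(readings):
        if len(history) == window:  # only score once the window is full
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(x - mu) / sigma > z_threshold:
                flagged.append(i)
        history.append(x)  # the reading joins the window after being scored
    return flagged


# Hypothetical vibration readings with one spike at index 6.
readings = [10.0, 10.1, 9.9, 10.0, 10.2, 10.1, 14.0, 10.0]
print(rolling_anomalies(readings))  # [6]
```

Each reading is scored against the window before it is added, so a spike cannot mask itself. A production pipeline would typically also exclude flagged readings from the baseline.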

RAG · Legal

Grounded retrieval over 2M documents

A production RAG system answering from a firm's own corpus — with citations, guardrails, and zero hallucinated precedent.

2M docs indexed

94% answer precision

Fine-tuning · Support

A 7B model that beat the giant on cost

We fine-tuned and distilled an open model on a client's support history — matching a frontier API on their task at a fraction of the cost per call.

11× cheaper per call

96% of frontier quality

Vision · Logistics

Damage detection at the dock

A vision model flagging shipment damage from a phone photo — turning a manual inspection queue into an instant decision.

11× faster inspection

97% detection recall

MLOps · SaaS

From notebooks to a real pipeline

We replaced a sprawl of one-off scripts with a versioned, monitored pipeline — the foundation every model the team ships now runs on.

9× faster to deploy

0 silent model failures

“Most agencies show you a slide. Kaylo showed up obsessing over our false-positive rate. Six weeks later it was running in production.”

Daniel Okafor, Founder, document-automation startup

“They treated our messy data like a feature, not a problem. The fine-tuned model they shipped does the work of a team — and we own the weights.”

Priya Nair, Head of Data, B2B SaaS

Your case study next

The next number on this page could be yours.

Start a conversation ↗