Beyond 95% Accuracy
The engineering lessons from building voice agents for healthcare: where benchmarks fail, context hints unlock accuracy, and your production failures become your most valuable training data
Field notes
Reflections on AI, product decisions, and the small stories that shape how we build.
Every sentence in a PRD should be either a Fact or a Hypothesis. Avoid making Statements at all costs.
Why effective LLM evaluation goes beyond simple metrics—addressing trustworthiness, safety, reliability, and continuous improvement for truly valuable AI applications
The Model Context Protocol revolution requires a shift from engineering to product design thinking—building experiences that transform how work gets done, not just tools that LLMs can use
Exploring how original ideas emerge not from isolation, but from the rich interplay of diverse knowledge and experiences