Routing under distribution shift
Policies degrade quietly when production traffic moves away from their evaluation set. We are working on detection and graceful reroute, in deployment with two partners.
Blossom Labs is the research arm of Blossom AI. We study scalable knowledge discovery — how systems should learn from simulation, from operators, and from each other — and turn what we find into the operating principles behind our product.
A research question is only worth pursuing if we can name where it bites us in production. We work backwards from the operator's problem.
Reinforcement learning in carefully built environments lets agents make a thousand mistakes before they make their first real one. The fidelity of the sim is the bar we hold ourselves to.
Domain expertise should travel into our models faster than it travels out. We build pipelines where specialists can correct, audit, and steer — and where the model knows when to ask.
A model that is wrong with confidence costs more than a model that is right with humility. We grade refusal, deferral, and uncertainty as first-class outcomes.
The lab publishes — papers, notes, benchmarks. The product benefits, but the field has to be able to check our work, push back, and improve on it.
Most of our agents have already done a job a thousand times before they meet the real one. We build small, carefully shaped environments — operations problems stripped of cosmetic complexity but not of structure — and let agents learn there first.
Each episode the agent gets a little less random and a little more deliberate. The point is not the gridworld. The point is the ritual: explore, fail, update, try again. By the time the policy reaches a real customer, it has already seen the shape of the work.
Three featured directions, plus the broader index of questions the lab is paying attention to.
Policies degrade quietly when production traffic moves away from their evaluation set. We are working on detection and graceful reroute, in deployment with two partners.
When should an agent stop and ask? When should it abort? We are studying the cost of overconfidence in multi-step work where every step compounds.
How quickly can specialist knowledge enter a model — and how do we keep that knowledge from going stale as the practice it came from moves?
Questions the lab is actively scoping or watching. Stage indicates how far along the work is.