研究室Blossom Labs

Studying how AI
should learn, reason,
and collaborate.

Blossom Labs is the research arm of Blossom AI. We study scalable knowledge discovery — how systems should learn from simulation, from operators, and from each other — and turn what we find into the operating principles behind our product.

知識は道具ではなく、
練習である。Knowledge is not a tool. It is a practice.
— Lab principle, 2025
原則Principles

How we work,
and why.

  1. 01
    問いQuestion

    Start from the production question.

    A research question is only worth pursuing if we can name where it bites us in production. We work backwards from the operator's problem.

  2. 02
    模擬Simulation

    Practice in simulation before the world.

    Reinforcement learning in carefully built environments lets agents make a thousand mistakes before they make their first real one. The fidelity of the sim is the bar we hold ourselves to.

  3. 03
    協働Collaboration

    Experts teach. Models defer.

    Domain expertise should travel into our models faster than it travels out. We build pipelines where specialists can correct, audit, and steer — and where the model knows when to ask.

  4. 04
    校正Calibration

    Calibration before performance.

    A model that is wrong with confidence costs more than a model that is wrong with humility. We grade refusal, deferral, and uncertainty as first-class outcomes.

  5. 05
    発表Publication

    Publish what we find.

    The lab publishes — papers, notes, benchmarks. The product benefits, but the field has to be able to check our work, push back, and improve on it.
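One way to make "grade uncertainty as a first-class outcome" (principle 04 above) concrete is an expected-calibration-error check that also counts deferrals. A minimal sketch, assuming confidences in [0, 1]; the equal-width binning and the 0.5 deferral threshold are illustrative choices, not the lab's actual grader:

```python
# Minimal sketch: expected calibration error (ECE) over equal-width
# confidence bins. Predictions with confidence below a threshold are
# counted as deferrals rather than graded right or wrong.
def expected_calibration_error(confidences, correct, n_bins=10, defer_below=0.5):
    graded = [(c, ok) for c, ok in zip(confidences, correct) if c >= defer_below]
    deferred = len(confidences) - len(graded)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        bucket = [(c, ok) for c, ok in graded
                  if lo <= c < hi or (b == n_bins - 1 and c == 1.0)]
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        # Weight each bin's confidence/accuracy gap by its share of graded items.
        ece += (len(bucket) / len(graded)) * abs(avg_conf - accuracy)
    return ece, deferred
```

A perfectly calibrated model scores zero; a model that answers at 0.9 confidence but is right half the time scores 0.4. The deferral count is returned alongside so abstaining is visible, not free.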

模擬Reinforcement learning · in simulation

Practice before the real world.

Most of our agents have already done a job a thousand times before they meet the real one. We build small, carefully shaped environments — operations problems stripped of cosmetic complexity but not of structure — and let agents learn there first.

Each episode the agent gets a little less random and a little more deliberate. The point is not the gridworld. The point is the ritual: explore, fail, update, try again. By the time the policy reaches a real customer, it has already seen the shape of the work.

Environment: Operations gridworld · 8×8
Algorithm: ε-greedy · decaying explore
Goal: Reach target · avoid obstacles
Reward: +1 per goal · −0.01 per step

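The loop described above might look like the following sketch: Q-learning with an ε-greedy policy and a decaying explore rate on an 8×8 gridworld, +1 for reaching the goal, −0.01 per step. The obstacle layout and hyperparameters are illustrative assumptions, not the lab's actual environment:

```python
import random

SIZE = 8
GOAL = (7, 7)
OBSTACLES = {(3, 3), (4, 5), (6, 2)}          # illustrative layout
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    nr = max(0, min(SIZE - 1, state[0] + action[0]))
    nc = max(0, min(SIZE - 1, state[1] + action[1]))
    nxt = (nr, nc)
    if nxt == GOAL:
        return nxt, 1.0 - 0.01, True   # goal bonus minus the step cost
    if nxt in OBSTACLES:
        return nxt, -0.01, True        # hitting an obstacle ends the episode
    return nxt, -0.01, False

def train(episodes=5000, alpha=0.5, gamma=0.95,
          eps=1.0, eps_decay=0.999, eps_min=0.05, seed=0):
    rng = random.Random(seed)
    q = {}  # (state, action_index) -> estimated return
    for _ in range(episodes):
        state, done, steps = (0, 0), False, 0
        while not done and steps < 200:
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))                # explore
            else:
                a = max(range(len(ACTIONS)),
                        key=lambda i: q.get((state, i), 0.0))  # exploit
            nxt, reward, done = step(state, ACTIONS[a])
            best_next = max(q.get((nxt, i), 0.0) for i in range(len(ACTIONS)))
            target = reward + (0.0 if done else gamma * best_next)
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (target - old)
            state, steps = nxt, steps + 1
        eps = max(eps_min, eps * eps_decay)  # decay exploration per episode
    return q

def greedy_rollout(q, max_steps=200):
    """Follow the learned policy greedily from the start cell."""
    state, done, path = (0, 0), False, [(0, 0)]
    while not done and len(path) <= max_steps:
        a = max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
    return path
```

After training, `greedy_rollout(train())` traces the learned path from (0, 0) toward (7, 7); each episode of training is the ritual in miniature: explore, fail, update, try again.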
未解決Open problems

What we’re working on now.

Three featured directions, plus the broader index of questions the lab is paying attention to.

01

Routing under distribution shift

Policies degrade quietly when production traffic moves away from their evaluation set. We are working on detection and graceful reroute, in deployment with two partners.

in progress · Q2 2026
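One detection signal that needs no labels is a shift statistic over the model's own confidence scores. A hypothetical sketch using the population stability index (PSI) with the conventional 0.25 alert threshold; the bin count and the `primary`/`fallback` route names are placeholders, not our deployed system:

```python
import math

def psi(reference, recent, n_bins=10):
    """Population stability index between two samples of scores in [0, 1]."""
    def histogram(xs):
        counts = [0] * n_bins
        for x in xs:
            counts[min(int(x * n_bins), n_bins - 1)] += 1
        # Floor each share to avoid log(0) on empty bins.
        return [max(c / len(xs), 1e-6) for c in counts]
    ref, cur = histogram(reference), histogram(recent)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

def route(reference, recent, threshold=0.25):
    """Reroute to a conservative fallback when the recent window has drifted."""
    return "fallback" if psi(reference, recent) > threshold else "primary"
```

Matching distributions give a PSI near zero and keep traffic on the primary route; a window whose scores have bunched up somewhere new crosses the threshold and triggers the graceful reroute.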
02

Calibrated refusal in long-horizon agents

When should an agent stop and ask? When should it abort? We are studying the cost of overconfidence in multi-step work where every step compounds.

in progress · Q2 2026
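The compounding cost is easy to put in numbers: if each step succeeds independently with probability p, an n-step task succeeds with probability p^n. A toy illustration, with hypothetical numbers rather than measurements:

```python
# Why long-horizon overconfidence compounds: per-step reliability p gives
# an n-step task a success probability of p ** n under independence.
def task_success(p_step, n_steps):
    return p_step ** n_steps

# A step that is 95% reliable looks safe in isolation, but over a
# 40-step task the whole job completes roughly 13% of the time.
```

This is the arithmetic behind grading refusal: a single confidently wrong step is priced against the value of everything downstream of it.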
03

Expert-in-loop discovery

How quickly can specialist knowledge enter a model — and how do we keep that knowledge from going stale as the practice it came from moves?

in progress · Q3 2026
索引Directory

The broader index.

Questions the lab is actively scoping or watching. Stage indicates how far along the work is.

経路Routing
  • Per-tenant policy learning: Routing policies that adapt to a single customer's distribution without overfitting. · In progress
  • Routing under distribution shift: Featured above. · In progress
  • Drift detection without labels: Catching silent quality regressions when ground truth is delayed by weeks. · Scoping
  • Speculative decoding as a routing primitive: Treating fast/slow model pairs as a single dispatch decision. · Open
推論Reasoning
  • Calibrated refusal in long-horizon agents: Featured above. · In progress
  • Outcome-supervised progress rewards: Learning which reasoning steps, tool calls, and routing decisions actually move work forward from final success signals alone. · In progress
  • Verifier-guided self-consistency for operations: Lightweight verifiers that survive prompt drift. · Scoping
  • Confidence as a learned signal, not a heuristic: Training models that say "I don't know" because they don't. · Scoping
協働Collaboration
  • Expert-in-loop discovery: Featured above. · In progress
  • Generative interfaces for agent operations: Agent businesses will not be operated through chat alone. We are studying how agents can generate the right control surface — dashboards, approval flows, exception views, and audit trails — around each task. · Open
  • Where to ask, where to defer, where to act: Decision policies for when human attention is itself a scarce resource. · Scoping
  • Auditable disagreement between operator and model: When the model is right but the human disagrees — what should the system do? · Open
模擬Simulation
  • High-fidelity simulators for logistics dispatch: Building environments rich enough for sim-to-real to mean something. · In progress
  • RL pretraining as an operations curriculum: Curriculum design for agents that will work alongside specialists. · Scoping
  • Synthetic operator behavior at scale: Modeling the human side of the loop in our training environments. · Open