研究室Blossom Labs

Studying how AI
should learn, reason,
and collaborate.

Blossom Labs is the research arm of Blossom AI. We study scalable knowledge discovery — how systems should learn from simulation, from operators, and from each other — and turn what we find into the operating principles behind our product.

知識は道具ではなく、
練習である。Knowledge is not a tool. It is a practice.
— Lab principle, 2025
原則Principles

How we work,
and why.

  1. 01
    問いQuestion

    Start from the production question.

    A research question is only worth pursuing if we can name where it bites us in production. We work backwards from the operator's problem.

  2. 02
    模擬Simulation

    Practice in simulation before the world.

    Reinforcement learning in carefully built environments lets agents make a thousand mistakes before they make their first real one. The fidelity of the sim is the bar we hold ourselves to.

  3. 03
    協働Collaboration

    Experts teach. Models defer.

    Domain expertise should travel into our models faster than it travels out. We build pipelines where specialists can correct, audit, and steer — and where the model knows when to ask.

  4. 04
    校正Calibration

    Calibration before performance.

    A model that is wrong with confidence costs more than a model that is wrong with humility. We grade refusal, deferral, and uncertainty as first-class outcomes.

  5. 05
    発表Publication

    Publish what we find.

    The lab publishes — papers, notes, benchmarks. The product benefits, but the field has to be able to check our work, push back, and improve on it.
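One way to make "grade uncertainty as a first-class outcome" (principle 04 above) concrete is an expected-calibration-error check that also counts deferrals. A minimal sketch, assuming confidences in [0, 1]; the equal-width binning and the 0.5 deferral threshold are illustrative choices, not the lab's actual grader:

```python
# Minimal sketch: expected calibration error (ECE) over equal-width
# confidence bins. Predictions with confidence below a threshold are
# counted as deferrals rather than graded right or wrong.
def expected_calibration_error(confidences, correct, n_bins=10, defer_below=0.5):
    graded = [(c, ok) for c, ok in zip(confidences, correct) if c >= defer_below]
    deferred = len(confidences) - len(graded)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        bucket = [(c, ok) for c, ok in graded
                  if lo <= c < hi or (b == n_bins - 1 and c == 1.0)]
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        # Weight each bin's confidence/accuracy gap by its share of graded items.
        ece += (len(bucket) / len(graded)) * abs(avg_conf - accuracy)
    return ece, deferred
```

A perfectly calibrated model scores zero; a model that answers at 0.9 confidence but is right half the time scores 0.4. The deferral count is returned alongside so abstaining is visible, not free.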

模擬Reinforcement learning · in simulation

Practice before the real world.

Most of our agents have already done a job a thousand times before they meet the real one. We build small, carefully shaped environments — operations problems stripped of cosmetic complexity but not of structure — and let agents learn there first.

Each episode the agent gets a little less random and a little more deliberate. The point is not the gridworld. The point is the ritual: explore, fail, update, try again. By the time the policy reaches a real customer, it has already seen the shape of the work.

Environment: Operations gridworld · 8×8
Algorithm: ε-greedy · decaying explore
Goal: Reach target · avoid obstacles
Reward: +1 per goal · −0.01 per step

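The loop described above might look like the following sketch: Q-learning with an ε-greedy policy and a decaying explore rate on an 8×8 gridworld, +1 for reaching the goal, −0.01 per step. The obstacle layout and hyperparameters are illustrative assumptions, not the lab's actual environment:

```python
import random

SIZE = 8
GOAL = (7, 7)
OBSTACLES = {(3, 3), (4, 5), (6, 2)}          # illustrative layout
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    nr = max(0, min(SIZE - 1, state[0] + action[0]))
    nc = max(0, min(SIZE - 1, state[1] + action[1]))
    nxt = (nr, nc)
    if nxt == GOAL:
        return nxt, 1.0 - 0.01, True   # goal bonus minus the step cost
    if nxt in OBSTACLES:
        return nxt, -0.01, True        # hitting an obstacle ends the episode
    return nxt, -0.01, False

def train(episodes=5000, alpha=0.5, gamma=0.95,
          eps=1.0, eps_decay=0.999, eps_min=0.05, seed=0):
    rng = random.Random(seed)
    q = {}  # (state, action_index) -> estimated return
    for _ in range(episodes):
        state, done, steps = (0, 0), False, 0
        while not done and steps < 200:
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))                # explore
            else:
                a = max(range(len(ACTIONS)),
                        key=lambda i: q.get((state, i), 0.0))  # exploit
            nxt, reward, done = step(state, ACTIONS[a])
            best_next = max(q.get((nxt, i), 0.0) for i in range(len(ACTIONS)))
            target = reward + (0.0 if done else gamma * best_next)
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (target - old)
            state, steps = nxt, steps + 1
        eps = max(eps_min, eps * eps_decay)  # decay exploration per episode
    return q

def greedy_rollout(q, max_steps=200):
    """Follow the learned policy greedily from the start cell."""
    state, done, path = (0, 0), False, [(0, 0)]
    while not done and len(path) <= max_steps:
        a = max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
    return path
```

After training, `greedy_rollout(train())` traces the learned path from (0, 0) toward (7, 7); each episode of training is the ritual in miniature: explore, fail, update, try again.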
未解決Open problems

What we’re working on now.

Three featured directions, plus the broader index of questions the lab is paying attention to.

01

Routing under distribution shift

Policies degrade quietly when production traffic moves away from their evaluation set. We are working on detection and graceful reroute, in deployment with two partners.

in progress · Q2 2026
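One detection signal that needs no labels is a shift statistic over the model's own confidence scores. A hypothetical sketch using the population stability index (PSI) with the conventional 0.25 alert threshold; the bin count and the `primary`/`fallback` route names are placeholders, not our deployed system:

```python
import math

def psi(reference, recent, n_bins=10):
    """Population stability index between two samples of scores in [0, 1]."""
    def histogram(xs):
        counts = [0] * n_bins
        for x in xs:
            counts[min(int(x * n_bins), n_bins - 1)] += 1
        # Floor each share to avoid log(0) on empty bins.
        return [max(c / len(xs), 1e-6) for c in counts]
    ref, cur = histogram(reference), histogram(recent)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

def route(reference, recent, threshold=0.25):
    """Reroute to a conservative fallback when the recent window has drifted."""
    return "fallback" if psi(reference, recent) > threshold else "primary"
```

Matching distributions give a PSI near zero and keep traffic on the primary route; a window whose scores have bunched up somewhere new crosses the threshold and triggers the graceful reroute.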
02

Calibrated refusal in long-horizon agents

When should an agent stop and ask? When should it abort? We are studying the cost of overconfidence in multi-step work where every step compounds.

in progress · Q2 2026
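The compounding cost is easy to put in numbers: if each step succeeds independently with probability p, an n-step task succeeds with probability p^n. A toy illustration, with hypothetical numbers rather than measurements:

```python
# Why long-horizon overconfidence compounds: per-step reliability p gives
# an n-step task a success probability of p ** n under independence.
def task_success(p_step, n_steps):
    return p_step ** n_steps

# A step that is 95% reliable looks safe in isolation, but over a
# 40-step task the whole job completes roughly 13% of the time.
```

This is the arithmetic behind grading refusal: a single confidently wrong step is priced against the value of everything downstream of it.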
03

Expert-in-loop discovery

How quickly can specialist knowledge enter a model — and how do we keep that knowledge from going stale as the practice it came from moves?

in progress · Q3 2026
索引Directory

The broader index.

Questions the lab is actively scoping or watching. Stage indicates how far along the work is.

経路Routing
  • Per-tenant policy learning: Routing policies that adapt to a single customer's distribution without overfitting. · In progress
  • Routing under distribution shift: Featured above. · In progress
  • Drift detection without labels: Catching silent quality regressions when ground truth is delayed by weeks. · Scoping
  • Speculative decoding as a routing primitive: Treating fast/slow model pairs as a single dispatch decision. · Open
推論Reasoning
  • Calibrated refusal in long-horizon agents: Featured above. · In progress
  • Outcome-supervised progress rewards: Learning which reasoning steps, tool calls, and routing decisions actually move work forward from final success signals alone. · In progress
  • Verifier-guided self-consistency for operations: Lightweight verifiers that survive prompt drift. · Scoping
  • Confidence as a learned signal, not a heuristic: Training models that say "I don't know" because they don't. · Scoping
協働Collaboration
  • Expert-in-loop discovery: Featured above. · In progress
  • Generative interfaces for agent operations: Agent businesses will not be operated through chat alone. We are studying how agents can generate the right control surface — dashboards, approval flows, exception views, and audit trails — around each task. · Open
  • Where to ask, where to defer, where to act: Decision policies for when human attention is itself a scarce resource. · Scoping
  • Auditable disagreement between operator and model: When the model is right but the human disagrees — what should the system do? · Open
模擬Simulation
  • High-fidelity simulators for logistics dispatch: Building environments rich enough for sim-to-real to mean something. · In progress
  • RL pretraining as an operations curriculum: Curriculum design for agents that will work alongside specialists. · Scoping
  • Synthetic operator behavior at scale: Modeling the human side of the loop in our training environments. · Open