Mycelium
Query-driven. Agent-native. Growing.
Open-source framework for agent-native data networks: scoped ecosystems where a supervisor routes lookups to specialist agents. MCP and CLI; CRM and Lahman baseball reference examples; custom networks via network create.
What ships today
Open-source toolkit for data your AI agents can query and enrich. Define a domain, connect an MCP client, ask for attributes — specialists research on miss and cache what they learn. Two full reference networks ship today: a person CRM and historic baseball stats.
814 automated tests on the framework repo. Early prototype — expect rough edges.
Ask, don’t ingest
Look up a contact, player, or team with simple key–value fields. No SQL and no public write API — you query, the network assembles an answer from what it already knows (and what it can find).
Answers get smarter over time
Repeat questions hit cache and come back fast. The first time you ask for something missing, background agents search the web, store the result, and serve it on the next ask.
Two-step lookups
Step one finds the right record (or tells you what’s ambiguous). Step two returns the fields you wanted. Built for real agent flows — clarify, retry, or confirm before you commit.
Plug in your LLM client
MCP server with describe_network, query_entity, and health_check. Connect Claude Desktop or any MCP client; one long-lived process per network.
CRM examples
Seeded contact list, a blank network that grows from the first lookup, and a demo that quotes a price before running paid research.
Baseball example
Historic MLB data (Lahman): look up players and teams, pull career stats, compute batting and pitching numbers on demand, trace where a value came from.
Build your own network
Scaffold a new domain with network create — categories, specialists, optional seed file. Copy-paste walkthroughs in the repo; local admin UI to watch records fill in.
Try it
Clone the framework, bootstrap a reference network, and run the two-step query protocol.
git clone https://github.com/myceliumdata/mycelium.git
cd mycelium && uv sync --all-extras
cp .env.example .env # OPENAI_API_KEY required
./bin/refresh-example-network crm-seeded --yes
uv run mycelium query --network crm-seeded \
--lookup-json '{"name":"Nichanan Kesonpat","employer":"1k(x)"}'
# copy delivery_id from JSON, then:
uv run mycelium query --network crm-seeded --delivery-id d_…
New contributors: onboarding → example walkthroughs → architecture
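The two query commands above can also be chained from a script. The following is a minimal Python sketch, not part of the framework: the helper names (`find_delivery_id`, `step1_command`, `step2_command`, `run_two_step`) are hypothetical, and it assumes the CLI prints plain JSON to stdout with a `delivery_id` key somewhere in the step 1 body. Because the exact position of that key is not stated here, the helper searches for it recursively. Only the CLI flags shown above are used.

```python
import json
import subprocess
from typing import Any, Dict, List, Optional


def find_delivery_id(payload: Any) -> Optional[str]:
    """Return the first string stored under a "delivery_id" key, at any depth."""
    if isinstance(payload, dict):
        value = payload.get("delivery_id")
        if isinstance(value, str):
            return value
        children = list(payload.values())
    elif isinstance(payload, list):
        children = payload
    else:
        return None
    for child in children:
        found = find_delivery_id(child)
        if found is not None:
            return found
    return None


def step1_command(network: str, lookup: Dict[str, str]) -> List[str]:
    """Build the step 1 (resolve) command line."""
    return ["uv", "run", "mycelium", "query", "--network", network,
            "--lookup-json", json.dumps(lookup)]


def step2_command(network: str, delivery_id: str) -> List[str]:
    """Build the step 2 (deliver) command line."""
    return ["uv", "run", "mycelium", "query", "--network", network,
            "--delivery-id", delivery_id]


def run_two_step(network: str, lookup: Dict[str, str]) -> Any:
    """Run both steps; assumes stdout of each command is a single JSON document."""
    first = subprocess.run(step1_command(network, lookup),
                           capture_output=True, text=True, check=True)
    delivery_id = find_delivery_id(json.loads(first.stdout))
    if delivery_id is None:
        raise RuntimeError("step 1 returned no delivery_id; inspect its outcome")
    second = subprocess.run(step2_command(network, delivery_id),
                            capture_output=True, text=True, check=True)
    return json.loads(second.stdout)
```

Calling `run_two_step("crm-seeded", {"name": "Nichanan Kesonpat", "employer": "1k(x)"})` from the repository root would mirror the manual commands, provided the network has been bootstrapped first.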
| Example | Command | Story |
|---|---|---|
| crm-seeded | ./bin/refresh-example-network crm-seeded --yes | 15-person bootstrap → entities.json; fuzzy lookup + contact research |
| crm-empty | ./bin/refresh-example-network crm-empty --yes | No seed — empty registry until first query bind creates the row |
| crm-metering | ./bin/refresh-example-network crm-metering --yes | Priced research negotiation (quote_required → pay_quote → deliver) |
| baseball | ./bin/refresh-example-network baseball --yes | Lahman warehouse (~3–4 min); player + team; derive + provenance; gate 34/34 |
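
The crm-metering row describes an ordered negotiation (quote_required → pay_quote → deliver). A client that wants to guard against skipping a stage could track it with a small state object. This is only a sketch: the `Negotiation` class is hypothetical, and treating the three names as strictly sequential states is an assumption drawn from the arrow notation in the table.

```python
# Allowed transitions, read off the "quote_required -> pay_quote -> deliver" story.
TRANSITIONS = {
    "quote_required": "pay_quote",
    "pay_quote": "deliver",
}


class Negotiation:
    """Hypothetical client-side tracker for one priced research request."""

    def __init__(self) -> None:
        self.state = "quote_required"
        self.history = ["quote_required"]

    def advance(self, next_state: str) -> None:
        """Move to next_state, or raise if it is not the expected next stage."""
        expected = TRANSITIONS.get(self.state)
        if next_state != expected:
            raise ValueError(f"cannot go from {self.state!r} to {next_state!r}")
        self.state = next_state
        self.history.append(next_state)
```

The point of the guard is that research is never delivered before the quote has been acknowledged.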
Custom network: network create <name> --root <path> [--seed <file>]. The --seed flag is optional; an empty registry plus first-query bind is valid.
How a query works
From network bootstrap to cached specialist results — same protocol across CRM and baseball.
Refresh or create a network
Copy an example (./bin/refresh-example-network) or run network create with optional --seed. crm-empty needs no fixture — first bind writes entities.json. Baseball bootstraps a full Lahman warehouse (~3–4 min).
Connect an agent (MCP)
One server entry per network. Call describe_network at connect time — response includes guide.md, ontology categories, record types, and framework usage policy.
Two-step query
Step 1: lookup JSON (or registry id) → delivery_id. Step 2: delivery_id → assembled results. Supervisor classifies fields, routes to specialists, runs research or derive on cache miss.
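As a rough illustration of the step 2 flow just described, the sketch below classifies each requested field to a specialist category, serves cached values, and calls a specialist only on a cache miss. Everything in it is hypothetical: the `FIELD_CATEGORIES` mapping, the `Supervisor` class, and the specialist callables are stand-ins for illustration, not Mycelium's actual classes.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical field -> category mapping; a real network would take this from its ontology.
FIELD_CATEGORIES: Dict[str, str] = {
    "email": "contact",
    "phone": "contact",
    "career_hr": "batting",
}

# A specialist takes (entity_id, field) and returns a value.
Specialist = Callable[[str, str], str]


class Supervisor:
    def __init__(self, specialists: Dict[str, Specialist]) -> None:
        self.specialists = specialists
        self.cache: Dict[Tuple[str, str], str] = {}
        self.misses = 0

    def deliver(self, entity_id: str, fields: List[str]) -> Dict[str, str]:
        """Assemble the requested fields, researching only what is not cached."""
        results: Dict[str, str] = {}
        for field in fields:
            key = (entity_id, field)
            if key not in self.cache:
                category = FIELD_CATEGORIES.get(field)
                specialist = self.specialists.get(category) if category else None
                if specialist is None:
                    continue  # no specialist covers this field; leave it out
                self.misses += 1
                self.cache[key] = specialist(entity_id, field)
            results[field] = self.cache[key]
        return results
```

A second `deliver` call for the same entity and field returns the cached value without invoking the specialist again, which is the behaviour the "cached on repeat" examples rely on.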
crm-seeded
Lookup Andrea Kalmans → step 2 requests email → contact specialist researches once → cached on repeat.
crm-empty
First bind for Paul Murphy @ Acme Corp creates the registry row. The network grows from zero without bootstrap people.
baseball
Lookup Hank Aaron → career_hr from warehouse manifest or derive-on-miss → provenance shows computation lineage.
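Derive-on-miss with provenance can be pictured as computing a value from stored counting stats and recording how it was computed. The sketch below is an assumption-laden illustration: the `Derived` record and `derive_batting_average` function are hypothetical, the provenance shape is not Mycelium's format, and the input keys `H` (hits) and `AB` (at-bats) follow the usual Lahman batting column names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Derived:
    """A computed value plus the lineage needed to reproduce it."""
    value: Optional[float]
    formula: str
    inputs: Dict[str, int]


def derive_batting_average(stats: Dict[str, int]) -> Derived:
    """Batting average = hits / at-bats, rounded to three places."""
    hits, at_bats = stats["H"], stats["AB"]
    # With zero at-bats the average is undefined, so no value is reported.
    value = round(hits / at_bats, 3) if at_bats else None
    return Derived(value=value, formula="H / AB", inputs={"H": hits, "AB": at_bats})
```

Keeping the formula and inputs next to the value lets a caller trace a number back to the rows it came from without re-running the computation.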
For agents (MCP)
Built for LLM client integrators. One long-lived MCP server per network; call describe_network at connect time. query_entity uses the two-step protocol — lookup-only, no caller ingest payload.
query_entity — step 1 (resolve)
{
  "lookup": {"employer": "645 Ventures"},
  "requested_attributes": ["email"],
  "thread_id": "optional-session-id"
}
query_entity — step 2 (deliver)
{
  "delivery_id": "d_abc123"
}
Step 2 response snippet
{
  "outcome": "assembled",
  "suggestions": [],
  "results": [
    {
      "id": "3fe6db14-a41d-50fe-9959-c5263dc5f53b",
      "name": "Andrea Kalmans",
      "employer": "Lontra Ventures",
      "email": "[email protected]"
    }
  ],
  "message": "Found record for Andrea Kalmans; assembled from registry and specialist contributions."
}
Branch on outcome after each step. Step 1: lookup_resolved, lookup_incomplete, lookup_suggested, quote_required. Step 2: found, assembled, not_found.
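One way a client might branch on these outcomes is a small dispatch table. This is a sketch rather than framework code: the outcome strings are the ones listed above, while the action names and the `next_action` function are hypothetical, and the comment on each outcome is an interpretation of its name.

```python
from typing import Any, Dict

STEP1_ACTIONS = {
    "lookup_resolved": "deliver",              # a record matched; go on to step 2
    "lookup_incomplete": "add_lookup_fields",  # the lookup needs more fields
    "lookup_suggested": "confirm_suggestion",  # candidates offered; pick one or retry
    "quote_required": "review_quote",          # priced research; decide before paying
}

STEP2_ACTIONS = {
    "found": "use_results",
    "assembled": "use_results",
    "not_found": "report_missing",
}


def next_action(step: int, response: Dict[str, Any]) -> str:
    """Map a response's outcome to the client's next move for the given step."""
    if step == 1:
        table = STEP1_ACTIONS
    elif step == 2:
        table = STEP2_ACTIONS
    else:
        raise ValueError(f"unknown step: {step}")
    outcome = response.get("outcome")
    if outcome not in table:
        raise ValueError(f"unexpected outcome {outcome!r} for step {step}")
    return table[outcome]
```

Raising on an unknown outcome, instead of silently continuing, makes protocol changes visible to the calling agent early.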
MCP setup + Claude Desktop snippet: docs/examples/getting-started
Under the hood
Supervisor routing, specialist research and derive, warehouse-backed stats, query-only public surface.
Registry + routing
Canonical entities.json (UUID + bind keys) — not SQLite. Supervisor routes by record type (person, player, team) and MVR lookup fields.
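The sample id in the step 2 response (3fe6db14-a41d-50fe-9959-c5263dc5f53b) carries version digit 5, which is consistent with a name-based UUID. Whether Mycelium derives ids from bind keys in this way is not stated here, so the sketch below only illustrates how a deterministic id could be computed from normalized bind keys using Python's standard `uuid.uuid5`. The namespace URL, the normalization rules, and the `entity_id` function are all assumptions.

```python
import uuid
from typing import Dict

# Hypothetical namespace; a real registry would define its own.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/registry")


def entity_id(record_type: str, bind_keys: Dict[str, str]) -> str:
    """Derive a stable id from the record type and its normalized bind keys."""
    parts = [record_type]
    for key, value in sorted(bind_keys.items()):
        # Collapse whitespace and ignore case so trivial variants bind to one row.
        normalized = " ".join(value.split()).casefold()
        parts.append(f"{key}={normalized}")
    return str(uuid.uuid5(NAMESPACE, "|".join(parts)))
```

Under this scheme, a repeat bind with the same keys resolves to the same row instead of creating a duplicate.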
Specialist collectives
Generated agents per category. Research on miss (LLM + search); warehouse manifest stats; derive-on-miss with computation-centric provenance.
Structured agent API
MCP and CLI return QueryResponse with outcome, delivery, suggestions, results, and message. Metering negotiation on crm-metering demo network.
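For client code, the response can be mirrored in a small typed container. The sketch below is an assumption about shape, not the framework's own `QueryResponse` class: the five field names come from the list above, while their types, the defaults, and the `from_json` helper are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class QueryResponse:
    """Client-side mirror of the response fields named in the text."""
    outcome: str
    delivery: Optional[Dict[str, Any]] = None
    suggestions: List[Any] = field(default_factory=list)
    results: List[Dict[str, Any]] = field(default_factory=list)
    message: str = ""

    @classmethod
    def from_json(cls, text: str) -> "QueryResponse":
        """Parse a JSON document; only "outcome" is treated as required."""
        data = json.loads(text)
        return cls(
            outcome=data["outcome"],
            delivery=data.get("delivery"),
            suggestions=data.get("suggestions", []),
            results=data.get("results", []),
            message=data.get("message", ""),
        )
```

Treating every field except `outcome` as optional keeps the parser tolerant of responses that omit a field, as the step 2 snippet does with `delivery`.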
Query graph (simplified)
Vision
Today's data sources are rigid, manually maintained, and lag behind the speed of agentic AI systems.
Mycelium aims to change that.
Agents bind and enrich entities; operators steer via network guides and validation rules. CRM and baseball examples prove the pattern across person networks and warehouse-backed stats. The long-term goal is data infrastructure where AI agents take primary ownership of organization, quality, and evolution.
"Mycelium lets AIs discover, structure, update, and serve data in real time, with operators setting policy instead of hand-curating every field."
Future directions
- More reference networks beyond CRM and baseball (e.g. fleet, agronomic via network create)
- Schema evolution guided by operator ontology and validation rules
- Inter-network handoff between scoped ecosystems
What we don't claim yet
- Turnkey networks for every domain without operator setup
- Public data-ingest API (returns as internal agent coordination later)
- Inter-network routing
- Fully autonomous schema evolution without operator ontology