Evals
How drover's eval suite is structured and how to add scenarios.
drover ships an eval suite at the repo root (evals/) plus a Vite-based
viewer (apps/eval-viewer/). The suite is intentionally simple — a flat
list of Scenario records plus a runner — so you can read and copy it.
Run the suite
```sh
cd evals && bun run.ts
```

Runs every scenario against OpenRouter using the model aliases in
@drover/model. Results land in evals/eval-results/<timestamp>/.
To run only specific scenarios, pass them as arguments:

```sh
bun run.ts write-article fix-code-bug
```

Scenario shape
```typescript
import type { Scenario } from "./scenarios/types.ts";
import { defineAgent } from "@drover/core";
import { Type } from "@sinclair/typebox";

const spec = defineAgent({
  id: "summariser",
  systemPrompt: "...",
  inputSchema: Type.Object({ file: Type.String() }),
  outputSchema: Type.Object({ summary: Type.String() }),
  model: "cheap", // model alias (see @drover/model)
  tools: ["read"],
  quota: { maxTurns: 4 }, // cap on agent turns
});

export const scenario: Scenario<typeof spec> = {
  id: "summarize-doc",
  name: "Summarise an incident report",
  inspiredBy: "generic",
  description: "Read a doc and produce a 3-bullet summary.",
  fixtureDir: "summarize-doc", // optional: evals/fixtures/<name>/
  spec,
  input: { file: "incident.md" }, // shape matches inputSchema
};
```

Export the scenario from scenarios/index.ts and the runner picks it up.
Fixtures
If fixtureDir is set, the runner snapshots evals/fixtures/<name>/
into evals/eval-results/<timestamp>/<scenario>/workdir/ before the run.
The agent’s cwd is that workdir copy, so runs never mutate the canonical
fixture.
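The snapshot step can be sketched as a recursive copy followed by handing the copy to the agent as its cwd. This is a minimal sketch assuming Node-compatible fs APIs (which Bun provides). snapshotFixture and its parameters are hypothetical names, not the runner's actual code.

```typescript
import { cpSync, existsSync, mkdirSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper, not the runner's actual code: copies
// <fixturesRoot>/<fixtureDir>/ into <runDir>/workdir/ and returns that path,
// which the caller would then use as the agent's cwd.
export function snapshotFixture(
  fixturesRoot: string, // e.g. "evals/fixtures"
  fixtureDir: string, // the scenario's fixtureDir value
  runDir: string, // e.g. "evals/eval-results/<timestamp>/<scenario>"
): string {
  const source = join(fixturesRoot, fixtureDir);
  if (!existsSync(source)) {
    throw new Error(`fixture not found: ${source}`);
  }
  const workdir = join(runDir, "workdir");
  // Create the destination (and any parents), then copy the fixture tree into it.
  mkdirSync(workdir, { recursive: true });
  cpSync(source, workdir, { recursive: true });
  return workdir;
}
```

Because the agent only ever sees the copy, a scenario that edits or deletes files leaves evals/fixtures/ untouched for the next run.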
Plugin observability per scenario
The runner attaches stepTracerPlugin() to every scenario via
options.plugins. After the run, the recorded steps are written to the
trace field of result.json.
If your scenario needs additional plugins (e.g. phaseRecorderPlugin),
attach them via spec.plugins. The runner adds its tracer alongside your
plugins rather than replacing them.
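"Additive" here means the scenario's own plugins and the runner's tracer end up in one list. The sketch below shows that merge with a hypothetical Plugin shape and a stand-in tracer (makeStepTracer). Neither is the real @drover/core interface, and appending the tracer after the spec's plugins is an assumption about ordering.

```typescript
// Hypothetical plugin shape; the real @drover/core plugin interface may differ.
interface Plugin {
  name: string;
  onStep?: (step: unknown) => void;
}

// Stand-in for the runner's tracer: records every step it is shown.
function makeStepTracer(): Plugin & { steps: unknown[] } {
  const steps: unknown[] = [];
  return {
    name: "step-tracer",
    steps,
    onStep: (step: unknown) => {
      steps.push(step);
    },
  };
}

// Additive merge: keep every plugin the spec declares and append the tracer.
// Appending (rather than prepending) is an assumption about ordering.
function resolvePlugins(specPlugins: readonly Plugin[] | undefined, tracer: Plugin): Plugin[] {
  return [...(specPlugins ?? []), tracer];
}
```

With this shape, a spec that declares no plugins still gets a trace, and one that declares a phase recorder gets both.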
Skills
If the spec declares skills and the fixture has a skills/ dir, the
runner scans that directory and builds a fresh registry for each run. See
Skills for the layout.
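The gating condition and per-run registry can be sketched as follows. This assumes spec.skills is a list of names and treats each subdirectory of skills/ as one skill. Both are assumptions, and buildSkillRegistry is a hypothetical name. The Skills page describes the real layout.

```typescript
import { existsSync, readdirSync } from "node:fs";
import { join } from "node:path";

// Hypothetical registry: skill name -> directory path. Treating each
// subdirectory of skills/ as one skill is an assumption; see the Skills page
// for the real layout.
type SkillRegistry = Map<string, string>;

function buildSkillRegistry(
  declaredSkills: readonly string[] | undefined, // assumed shape of spec.skills
  workdir: string, // the per-run fixture copy
): SkillRegistry | undefined {
  const skillsDir = join(workdir, "skills");
  // Only scan when the spec declares skills AND the fixture ships a skills/ dir.
  if (!declaredSkills || declaredSkills.length === 0 || !existsSync(skillsDir)) {
    return undefined;
  }
  // A new Map per call, so each run gets its own registry.
  const registry: SkillRegistry = new Map();
  for (const entry of readdirSync(skillsDir, { withFileTypes: true })) {
    if (entry.isDirectory()) {
      registry.set(entry.name, join(skillsDir, entry.name));
    }
  }
  return registry;
}
```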
MCP
When any scenario in the run set declares mcpServers, the runner lazily
boots an MCP runtime with the configured fixtures (currently the in-repo
stdio server at evals/fixtures/mcp-stdio/server.ts). See MCP.
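The "lazy" part can be illustrated with a small memoised boot. The runner checks once whether any scenario needs MCP, boots nothing up front, and shares a single boot among all callers. This is a sketch only. ScenarioLike, McpRuntime, and lazyMcp are hypothetical, and treating mcpServers as an array is an assumption.

```typescript
// Hypothetical shapes: only the mcpServers field name comes from the docs;
// treating it as an array is an assumption.
interface ScenarioLike {
  id: string;
  spec: { mcpServers?: readonly unknown[] };
}

interface McpRuntime {
  close(): Promise<void>;
}

// Decides up front whether the run set needs MCP, but defers booting until
// the first caller asks for the runtime. Later callers share the same boot.
function lazyMcp(scenarios: readonly ScenarioLike[], boot: () => Promise<McpRuntime>) {
  const needed = scenarios.some((s) => (s.spec.mcpServers?.length ?? 0) > 0);
  let pending: Promise<McpRuntime> | undefined;
  return {
    needed,
    get(): Promise<McpRuntime> | undefined {
      if (!needed) return undefined;
      pending ??= boot(); // first call triggers the boot; later calls reuse it
      return pending;
    },
  };
}
```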
Runtime-queue scenario
runtime-queue exercises @drover/runtime end-to-end. It enqueues 5 echo
jobs with concurrency: 3, waits for each job to reach a terminal status,
and asserts that every job ended as done. It uses an in-memory queue and
in-memory storage, so it is hermetic.
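To make the scenario's checks concrete, here is a self-contained stand-in for a concurrency-capped in-memory queue. It does not use @drover/runtime. InMemoryQueue, enqueue, drain, and the status names are hypothetical. The scenario itself goes through the real runtime.

```typescript
type Status = "queued" | "running" | "done" | "failed";

interface Job<T> {
  id: number;
  payload: T;
  status: Status;
  result?: T;
}

// Stand-in for an in-memory queue with a concurrency cap. NOT the
// @drover/runtime API; all names here are hypothetical.
class InMemoryQueue<T> {
  private readonly jobs: Job<T>[] = [];
  private active = 0;
  maxActiveSeen = 0; // highest number of jobs observed running at once
  private readonly concurrency: number;
  private readonly handler: (payload: T) => Promise<T>;

  constructor(concurrency: number, handler: (payload: T) => Promise<T>) {
    this.concurrency = concurrency;
    this.handler = handler;
  }

  enqueue(payload: T): Job<T> {
    const job: Job<T> = { id: this.jobs.length, payload, status: "queued" };
    this.jobs.push(job);
    this.pump();
    return job;
  }

  // Start queued jobs until the concurrency cap is reached.
  private pump(): void {
    while (this.active < this.concurrency) {
      const next = this.jobs.find((j) => j.status === "queued");
      if (!next) return;
      next.status = "running";
      this.active++;
      this.maxActiveSeen = Math.max(this.maxActiveSeen, this.active);
      this.handler(next.payload)
        .then(
          (result) => {
            next.result = result;
            next.status = "done";
          },
          () => {
            next.status = "failed";
          },
        )
        .finally(() => {
          this.active--;
          this.pump(); // a finished job frees a slot for the next queued one
        });
    }
  }

  // Resolve once every job has reached a terminal status (done or failed).
  async drain(): Promise<Job<T>[]> {
    while (this.jobs.some((j) => j.status === "queued" || j.status === "running")) {
      await new Promise((resolve) => setTimeout(resolve, 1));
    }
    return this.jobs;
  }
}
```

With 5 echo jobs and a cap of 3, the first three start immediately and the last two wait for free slots. The scenario's assertion amounts to "every job is done once drained".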
Viewer
```sh
cd apps/eval-viewer && bun run dev
```

Hash-routed pages:

- /: every runset, with a chip grid linking to its scenarios
- #/r/<runset>: runset table
- #/r/<runset>/<scenario>: full scenario detail with timeline
- #/storage (if DROVER_STORAGE_URL is set): runs from libsql storage
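The routes above can be modelled as a small discriminated union with a parser. This is a hypothetical sketch, not the viewer's actual router. Route and parseHash are invented names, and storageEnabled stands in for "DROVER_STORAGE_URL is set". How the viewer actually reads that variable is not shown.

```typescript
// Hypothetical route model for the viewer's hash routes.
type Route =
  | { page: "index" }
  | { page: "runset"; runset: string }
  | { page: "scenario"; runset: string; scenario: string }
  | { page: "storage" }
  | { page: "not-found" };

// storageEnabled stands in for "DROVER_STORAGE_URL is set".
function parseHash(hash: string, storageEnabled: boolean): Route {
  const parts = hash
    .replace(/^#/, "") // "#/r/x" -> "/r/x"
    .split("/")
    .filter((p) => p.length > 0)
    .map((p) => decodeURIComponent(p));
  if (parts.length === 0) return { page: "index" };
  if (parts[0] === "storage" && parts.length === 1) {
    return storageEnabled ? { page: "storage" } : { page: "not-found" };
  }
  if (parts[0] === "r" && parts.length === 2) {
    return { page: "runset", runset: parts[1] };
  }
  if (parts[0] === "r" && parts.length === 3) {
    return { page: "scenario", runset: parts[1], scenario: parts[2] };
  }
  return { page: "not-found" };
}
```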
The timeline component renders the full event stream: assistant text as markdown, thinking blocks collapsed by default, and tool calls as cards that expand to show their input and result.