Harnesses
A harness is the structure around your model: the wiring that turns a raw LLM into a reliable agent. In Invoked, a harness is a visual graph. You connect inputs, agents, guardrails, and outputs on a canvas, then run the graph, evaluate it, and export portable OpenTelemetry traces.
The thesis is simple: the harness matters more than the model. A good harness makes a modest model dependable; a bad one wastes a great model. Invoked makes that harness explicit, observable, and reusable.
Harnesses is a top-level section in the sidebar with three tabs: Build, Evals, and Export.
Build
The Build tab is the design canvas. Drag nodes onto the graph and connect them:
- Input — where a run's data enters (source only).
- Agent — a model with a role, tools, and instructions.
- Guardrail — a check placed between agents to validate or gate output.
- Output — where results leave the graph (sink only).
The canvas guides you toward valid connections based on the graph grammar (for example, an Input can only be a source and an Output can only be a sink), so the structure stays sound as you build.
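To make the graph grammar concrete, here is a minimal Python sketch of the kind of check a canvas could run. It is not Invoked's implementation. The `Edge` class, the `validate` function, and the rule that a guardrail needs both an incoming and an outgoing edge are assumptions for illustration; only the four node types and the source-only / sink-only rules come from the list above.

```python
from dataclasses import dataclass

# Node kinds named in the Build tab.
KINDS = {"input", "agent", "guardrail", "output"}


@dataclass(frozen=True)
class Edge:
    source: str  # name of the node the edge leaves
    target: str  # name of the node the edge enters


def validate(nodes: dict[str, str], edges: list[Edge]) -> list[str]:
    """Return grammar violations; an empty list means the graph is valid.

    `nodes` maps a node name to its kind. Hypothetical helper, not an Invoked API.
    """
    errors = []
    for name, kind in nodes.items():
        if kind not in KINDS:
            errors.append(f"{name}: unknown kind {kind!r}")

    has_incoming, has_outgoing = set(), set()
    for e in edges:
        if e.source not in nodes or e.target not in nodes:
            errors.append(f"{e.source} -> {e.target}: unknown node")
            continue
        if nodes[e.target] == "input":
            errors.append(f"{e.source} -> {e.target}: input is source only")
        if nodes[e.source] == "output":
            errors.append(f"{e.source} -> {e.target}: output is sink only")
        has_outgoing.add(e.source)
        has_incoming.add(e.target)

    # Assumption: a guardrail sits between two nodes, so it needs both sides.
    for name, kind in nodes.items():
        if kind == "guardrail" and not (name in has_incoming and name in has_outgoing):
            errors.append(f"{name}: guardrail must sit between two nodes")
    return errors


nodes = {"in": "input", "writer": "agent", "check": "guardrail", "out": "output"}
good = [Edge("in", "writer"), Edge("writer", "check"), Edge("check", "out")]
bad = [Edge("out", "writer")]
```

Calling `validate(nodes, good)` returns an empty list, while `validate(nodes, bad)` reports the edge leaving the output node and the now-disconnected guardrail.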
Evals
The Evals tab compares harness runs side by side, so you can measure whether a change actually improved quality rather than guessing. Evals turn iteration into a feedback loop: run, measure, adjust.
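A side-by-side comparison can be pictured as a small Python sketch. It assumes each run produces one numeric score per test case and that both runs cover the same cases in the same order. The `compare` function and its output fields are hypothetical, not the Evals tab's actual metrics.

```python
from statistics import mean


def compare(baseline: list[float], candidate: list[float]) -> dict[str, float]:
    """Summarize per-case scores for two harness versions run on the same cases."""
    if not baseline or len(baseline) != len(candidate):
        raise ValueError("both runs need the same non-empty set of cases")
    return {
        "baseline_mean": mean(baseline),
        "candidate_mean": mean(candidate),
        "delta": mean(candidate) - mean(baseline),
        # Count cases where the candidate scored strictly higher or lower.
        "wins": sum(c > b for b, c in zip(baseline, candidate)),
        "losses": sum(c < b for b, c in zip(baseline, candidate)),
    }


# Illustrative scores only; not real results.
report = compare([0.5, 0.5, 1.0], [1.0, 0.5, 1.0])
```

Looking at wins and losses alongside the mean helps separate a broad improvement from a change that helps one case and hurts another.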
Export
The Export tab emits a portable trace of a run in an OpenTelemetry / OpenInference-shaped format (steps, tool calls, latency). It drops straight into observability and eval tools like Langfuse, Phoenix, or Braintrust — your traces aren't locked inside Invoked.
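As a rough picture of what such a trace carries, the Python sketch below models two spans as plain dictionaries and derives latency and tool calls from them. The field names (`span_id`, `parent_id`, `start_ms`, `end_ms`) and the helper functions are assumptions for illustration, not Invoked's export schema. The `openinference.span.kind` attribute follows the OpenInference semantic conventions as commonly documented, so check the spec before relying on it.

```python
import json
from typing import Optional


def span(span_id: str, name: str, kind: str, start_ms: int, end_ms: int,
         parent_id: Optional[str] = None) -> dict:
    """Build one span record (hypothetical shape, not the real export format)."""
    return {
        "span_id": span_id,
        "parent_id": parent_id,
        "name": name,
        "start_ms": start_ms,
        "end_ms": end_ms,
        "attributes": {"openinference.span.kind": kind},
    }


def latency_ms(s: dict) -> int:
    """Latency is the difference between a span's end and start timestamps."""
    return s["end_ms"] - s["start_ms"]


def tool_calls(trace: list[dict]) -> list[dict]:
    """Select the spans that represent tool calls."""
    return [s for s in trace if s["attributes"]["openinference.span.kind"] == "TOOL"]


# One agent step with a nested tool call; timings are made up.
trace = [
    span("1", "writer", "AGENT", 0, 900),
    span("2", "search", "TOOL", 100, 400, parent_id="1"),
]

exported = json.dumps(trace, indent=2)  # a portable, tool-agnostic payload
```

Because the payload is plain JSON with parent links and timestamps, a downstream tool can rebuild the step tree and timings without access to the original workspace.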
Harness recipes can also be exported as YAML for PR-style review and sharing, while the living run history and performance stats stay in your workspace.