Evaluations score traces so you can quantify improvements and catch regressions as models, prompts, and code change.

Introduction

What evals are and when to use them.

Quickstart

Run your first evaluation.

Using a Dataset

Evaluate against curated data.

Online Evaluators

Score production traffic continuously.

Next: turn failures into reusable test data with Datasets and Queues.