Evaluations - Hipocap documentation

Evaluations score traces so you can quantify improvements and catch regressions as models, prompts, and code change.
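To make the idea concrete, here is a minimal sketch in plain Python of what "scoring traces to catch regressions" can look like. It does not use the Hipocap SDK. The `Trace` record, the `exact_match` scorer, the `evaluate` helper, and the `tolerance` threshold are all hypothetical names chosen for illustration, and a real evaluation would usually use richer scorers than exact string matching.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class Trace:
    """Hypothetical record of one model call (not a Hipocap type)."""
    prompt: str
    output: str
    expected: str


def exact_match(trace: Trace) -> float:
    """Score 1.0 when the output matches the reference, ignoring case and outer whitespace."""
    same = trace.output.strip().lower() == trace.expected.strip().lower()
    return 1.0 if same else 0.0


def evaluate(traces: list[Trace], scorer: Callable[[Trace], float]) -> float:
    """Return the mean score over a non-empty list of traces."""
    if not traces:
        raise ValueError("need at least one trace to evaluate")
    return mean(scorer(t) for t in traces)


def is_regression(baseline: float, candidate: float, tolerance: float = 0.05) -> bool:
    """Flag a regression when the candidate falls below the baseline by more than `tolerance`."""
    return candidate < baseline - tolerance


# Illustrative data: the same prompts run before and after a prompt change.
baseline_traces = [
    Trace("What is 2 + 2?", "4", "4"),
    Trace("Capital of France?", "Paris", "Paris"),
]
candidate_traces = [
    Trace("What is 2 + 2?", "4", "4"),
    Trace("Capital of France?", "Lyon", "Paris"),
]

baseline_score = evaluate(baseline_traces, exact_match)
candidate_score = evaluate(candidate_traces, exact_match)
regressed = is_regression(baseline_score, candidate_score)
```

Comparing an aggregate score against a baseline with a tolerance, rather than inspecting individual outputs, is what lets the same check run unchanged each time a model, prompt, or code path changes.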

Introduction

What evaluations are and when to use them.

Run your first evaluation.

Evaluate against curated data.

Score production traffic continuously.

Next: turn failures into reusable test data with Datasets and Queues.