> ## Documentation Index
> Fetch the complete documentation index at: https://docs.hipocap.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Laminar evaluations cookbook

## Basic correctness evaluation

In this example our executor function calls an LLM to get the capital of a country.
We then evaluate the correctness of the prediction by checking for exact match with the target capital.

<Steps>
  <Step title="1. Define an executor function">
    The executor function calls OpenAI to get the capital of a country.
    The prompt also asks to only name the city and nothing else. In a real scenario,
    you will likely want to use structured output to get the city name only.

    <Tabs>
      <Tab title="JavaScript/TypeScript">
        ```javascript theme={null}
        import OpenAI from 'openai';

        const openai = new OpenAI({apiKey: process.env.OPENAI_API_KEY});

        const getCapital = async (
            {country}: {country: string}
        ): Promise<string> => {
            const response = await openai.chat.completions.create({
                model: 'gpt-4o-mini',
                messages: [
                    {
                        role: 'system',
                        content: 'You are a helpful assistant.'
                    }, {
                        role: 'user',
                        content: `What is the capital of ${country}?` +
                        ' Just name the city and nothing else'
                    }
                ],
            });
            return response.choices[0].message.content ?? ''
        }

        ```
      </Tab>

      <Tab title="Python">
        ```python theme={null}
        from openai import AsyncOpenAI

        openai_client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

        async def get_capital(data):
            country = data["country"]
            response = await openai_client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[
                    {"role": "system", "content": "You are a helpful assistant."},
                    {
                        "role": "user",
                        "content": f"What is the capital of {country}? "
                        "Just name the city and nothing else",
                    },
                ],
            )
            return response.choices[0].message.content.strip()
        ```
      </Tab>
    </Tabs>
  </Step>

  <Step title="2. Define an evaluator function">
    The evaluator function checks for exact match and returns 1 if the executor output
    matches the target, and 0 otherwise.

    <Tabs>
      <Tab title="JavaScript/TypeScript">
        ```javascript theme={null}
        const evaluator = async (output: Promise<string>, target?: {capital: string}) =>
            (await output) === target?.capital ? 1 : 0
        ```
      </Tab>

      <Tab title="Python">
        ```python theme={null}
        def evaluator(output, target):
            return 1 if output == target["capital"] else 0
        ```
      </Tab>
    </Tabs>
  </Step>

  <Step title="3. Define data and run the evaluation">
    <Tabs>
      <Tab title="JavaScript/TypeScript">
        ```javascript my-eval.ts theme={null}
        import { evaluate } from '@lmnr-ai/lmnr';

        const evaluationData = [
            { data: { country: 'Canada' }, target: { capital: 'Ottawa' } },
            { data: { country: 'Germany' }, target: { capital: 'Berlin' } },
            { data: { country: 'Tanzania' }, target: { capital: 'Dodoma' } },
        ]

        evaluate({
            data: evaluationData,
            executor: async (data) => await getCapital(data),
            evaluators: { checkCapitalCorrectness: evaluator },
            config: {
                projectApiKey: process.env.LMNR_PROJECT_API_KEY
            }
        })
        ```

        And then run either `ts-node my-eval.ts` or `npx lmnr eval my-eval.ts`.
      </Tab>

      <Tab title="Python">
        ```python my-eval.py theme={null}
        from lmnr import evaluate
        import os

        data = [
            {"data": {"country": "Germany"}, "target": {"capital": "Berlin"}},
            {"data": {"country": "Canada"}, "target": {"capital": "Ottawa"}},
            {"data": {"country": "Tanzania"}, "target": {"capital": "Dodoma"}},
        ]

        evaluate(
            data=data,
            executor=get_capital,
            evaluators={'check_capital_correctness': evaluator},
            project_api_key=os.environ["LMNR_PROJECT_API_KEY"],
        )

        ```

        And then run either `python my-eval.py` or `lmnr eval my-eval.py`.
      </Tab>
    </Tabs>
  </Step>
</Steps>

## LLM as a judge offline evaluation

In this example, our executor will write short summaries of news articles,
and the evaluator will check if the summary is correct, and grade them from 1 to 5.

<Steps>
  <Step title="1. Prepare your data">
    The trick here is that the evaluator function needs to see the original article to evaluate the summary.
    That is why, we will have to duplicate the article from `data` into `target` prior to running the evaluation.

    The data may look something like the following:

    ```json theme={null}
    [
        {
            "data": {
                "article": "Laminar has released a new feature. ...",
            },
            "target": {
                "article": "Laminar has released a new feature. ...",
            }
        }
    ]
    ```
  </Step>

  <Step title="2. Define an executor function">
    An executor function calls OpenAI to summarize a news article. It returns a single string, the summary.

    <Tabs>
      <Tab title="JavaScript/TypeScript">
        ```javascript theme={null}
        import OpenAI from 'openai';

        const openai = new OpenAI({apiKey: process.env.OPENAI_API_KEY});

        const getSummary = async (data: {article: string}): Promise<string> => {
            const response = await openai.chat.completions.create({
                model: 'gpt-4o-mini',
                messages: [
                    {
                        role: "system",
                        content: "Summarize the articles that the user sends you"
                    }, {
                        role: "user",
                        content: data.article,
                    },
                ],
            });
            return response.choices[0].message.content ?? ''
        }

        ```
      </Tab>

      <Tab title="Python">
        ```python theme={null}
        from openai import AsyncOpenAI

        openai_client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

        async def get_summary(data: dict[str, str]) -> str:
            response = await openai_client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[
                    {
                        "role": "system",
                        "content": "Summarize the articles that the user sends you"
                    }, {
                        "role": "user",
                        "content": data["article"],
                    },
                ],
            )
            return response.choices[0].message.content.strip()

        ```
      </Tab>
    </Tabs>
  </Step>

  <Step title="3. Define an evaluator function">
    An evaluator function grades the summary from 1 to 5. It returns an integer.
    We've simply asked OpenAI to respond in JSON, but you may want to use
    structured output or BAML instead.

    We also ask the LLM to give a comment on the summary. Even though we don't use it in the evaluation,
    it may be useful for debugging or further analysis. In addition, LLMs are known to perform better
    when given a chance to explain their reasoning.

    <Tabs>
      <Tab title="JavaScript/TypeScript">
        ```javascript theme={null}
        import OpenAI from 'openai';

        const openai = new OpenAI({apiKey: process.env.OPENAI_API_KEY});

        const gradeSummary = async (
            summary: string,
            target: {article: string}
        ): Promise<number> => {
            const response = await openai.chat.completions.create({
                model: 'gpt-4o-mini',
                messages: [{
                    role: "user",
                    content: "Given an article and its summary, grade the " +
                        "summary from 1 to 5. Answer in json. For example: " +
                        '{"grade": 3, "comment": "Summary is missing key points"}' +
                        `Article: ${target['article']}. Summary: ${summary}`
                }],
            });
            return JSON.parse(response.choices[0].message.content ?? '')["grade"]
        }
        ```
      </Tab>

      <Tab title="Python">
        ```python theme={null}
        import json
        async def grade_summary(summary: str, target: dict[str, str]) -> int:
            response = await openai_client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{
                    "role": "user",
                    "content": "Given an article and its summary, grade the " +
                      "summary from 1 to 5. Answer in json. For example: " +
                      '{"grade": 3, "comment": "Summary is missing key points"}' +
                      f"Article: {target['article']}. Summary: {summary}"
                    },
                ],
            )
            return int(
                json.loads(response.choices[0].message.content.strip())["grade"]
            )

        ```
      </Tab>
    </Tabs>
  </Step>

  <Step title="4. Run the evaluation">
    <Tabs>
      <Tab title="JavaScript/TypeScript">
        ```javascript my-eval.ts theme={null}
        import { evaluate } from '@lmnr-ai/lmnr';

        const evaluationData = [
            { data: { article: '...' }, target: { article: '...' } },
            { data: { article: '...' }, target: { article: '...' } },
            { data: { article: '...' }, target: { article: '...' } },
        ]

        evaluate({
            data: evaluationData,
            executor: async (data) => await getSummary(data),
            evaluators: { gradeSummary: gradeSummary },
            config: {
                projectApiKey: process.env.LMNR_PROJECT_API_KEY
            }
        })
        ```

        And then run either `ts-node my-eval.ts` or `npx lmnr eval my-eval.ts`.
      </Tab>

      <Tab title="Python">
        ```python my-eval.py theme={null}
        from lmnr import evaluate
        import os

        data = [
            {"data": {"article": '...'}, "target": {"article": '...'}},
            {"data": {"article": '...'}, "target": {"article": '...'}},
            {"data": {"article": '...'}, "target": {"article": '...'}},
        ]

        evaluate(
            data=data,
            executor=get_summary,
            evaluators={'grade_summary': grade_summary},
            project_api_key=os.environ["LMNR_PROJECT_API_KEY"],
        )

        ```

        And then run either `python my-eval.py` or `lmnr eval my-eval.py`.
      </Tab>
    </Tabs>
  </Step>
</Steps>

## Evaluation with no target

Sometimes you may want to run evaluations on the output of the executor without a target.
This can be useful, for example, to check if the output of the executor is in the correct format
or if you want to use an LLM as a judge evaluator that generally evaluates the output.

<Tabs>
  <Tab title="JavaScript/TypeScript">
    This is as simple as not passing `target` to your evaluator functions.

    ```javascript theme={null}
    function isOutputLongEnough(output) {
        return output.length > 100 ? 1 : 0
    }
    ```
  </Tab>

  <Tab title="Python">
    In Python, you will have to add `*args, **kwargs` to your evaluator function.

    ```python theme={null}
    def is_output_long_enough(output, *args, **kwargs):
        return int(len(output) > 100)
    ```
  </Tab>
</Tabs>

And for your dataset, you can just remove the `target` field. For example:

```json theme={null}
[
    { "data": { "article": "..." } },
    { "data": { "article": "..." } },
    { "data": { "article": "..." } },
]
```
