Skip to content

Gracefully handle errors in evals #2295

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

dmontagu
Copy link
Contributor

@dmontagu dmontagu commented Jul 24, 2025

Also adds retry functionality. Builds on top of the work in #2282.

Still need to:

  • update the report printing to show errors in the terminal
  • make sure this change doesn't break the logfire display of pydantic-evals spans
  • probably add tests and docs

Note: I think this is technically a breaking change because it adds some fields to the ReportCase and EvaluationReport classes, so deserialization might not work on existing data. Given we are still 0.X I guess it's worth a bump to minor version(?) but I expect it won't be very disruptive in practice..

Copy link

github-actions bot commented Jul 24, 2025

Docs Preview

commit: f07f13d
Preview URL: https://25bbb084-pydantic-ai-previews.pydantic.workers.dev

@dmontagu dmontagu force-pushed the dmontagu/graceful-evals-error-handling branch from 5fe2ebf to 6bb2a3f Compare July 24, 2025 04:20
Base automatically changed from dmontagu/retry-handling to main July 25, 2025 17:34
@@ -19,7 +26,7 @@

async def run_evaluator(
evaluator: Evaluator[InputsT, OutputT, MetadataT], ctx: EvaluatorContext[InputsT, OutputT, MetadataT]
) -> list[EvaluationResult]:
) -> list[EvaluationResult] | list[EvaluatorFailure]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to update the Returns: docstring

except Exception as e:
return [
EvaluatorFailure(
name=evaluator.get_default_evaluation_name(), error_msg=f'{type(e).__name__}: {e}', source=evaluator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we attach the exception itself so it can be read programmatically to get the stack trace if so desired? It wouldn't be serialized of course.

"""Represents a failure raised during the execution of an evaluator."""

name: str
error_msg: str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love unnecessary abbreviations in field names, why not just error_message?

expected_output: OutputT | None
"""The expected output of the task, from [`Case.expected_output`][pydantic_evals.Case.expected_output]."""

error_msg: str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, I'd prefer error_message

]
),
cases=[x for x in cases_and_failures if isinstance(x, ReportCase)],
failures=[x for x in cases_and_failures if isinstance(x, ReportCaseFailure)],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor thing, but I'd prefer to iterate just once and append into one of 2 lists

inputs=case.inputs,
metadata=case.metadata,
expected_output=case.expected_output,
error_msg=f'{type(exc).__name__}: {exc}',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with EvaluatorFailure, I'd like to include the exception itself

@DouweM
Copy link
Contributor

DouweM commented Jul 25, 2025

Note: I think this is technically a breaking change because it adds some fields to the ReportCase and EvaluationReport classes, so deserialization might not work on existing data. Given we are still 0.X I guess it's worth a bump to minor version(?) but I expect it won't be very disruptive in practice..

@dmontagu Can't we handle missing fields by just using the default value of []?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants