Gracefully handle errors in evals #2295

dmontagu · 2025-07-24T04:01:02Z

Also adds retry functionality. Builds on top of the work in #2282.

Still need to:

- update the report printing to show errors in the terminal
- make sure this change doesn't break the logfire display of pydantic-evals spans
- probably add tests and docs

Note: I think this is technically a breaking change because it adds some fields to the ReportCase and EvaluationReport classes, so deserialization might not work on existing data. Given we are still on 0.x, I guess it's worth a minor version bump(?), but I expect it won't be very disruptive in practice.

github-actions · 2025-07-24T04:04:09Z

Docs Preview

commit:	`f07f13d`
Preview URL:	https://25bbb084-pydantic-ai-previews.pydantic.workers.dev

DouweM · 2025-07-25T17:28:50Z

pydantic_evals/pydantic_evals/evaluators/_run_evaluator.py

@@ -19,7 +26,7 @@

 async def run_evaluator(
    evaluator: Evaluator[InputsT, OutputT, MetadataT], ctx: EvaluatorContext[InputsT, OutputT, MetadataT]
-) -> list[EvaluationResult]:
+) -> list[EvaluationResult] | list[EvaluatorFailure]:


Don't forget to update the Returns: docstring

DouweM · 2025-07-25T17:31:07Z

pydantic_evals/pydantic_evals/evaluators/_run_evaluator.py

+    except Exception as e:
+        return [
+            EvaluatorFailure(
+                name=evaluator.get_default_evaluation_name(), error_msg=f'{type(e).__name__}: {e}', source=evaluator


Can we attach the exception itself so it can be read programmatically to get the stack trace if so desired? It wouldn't be serialized of course.
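
A minimal sketch of what that could look like. The dataclass below is a simplified stand-in for `EvaluatorFailure`, not the real class; the `error` field, `to_serializable`, `stack_trace` and `run_safely` are hypothetical names used only for illustration. The idea is to keep the live exception in memory for programmatic access while leaving it out of the serialized form.

```python
from __future__ import annotations

import traceback
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class EvaluatorFailureSketch:
    """Simplified stand-in for EvaluatorFailure (assumption, not the real class)."""

    name: str
    error_msg: str
    # Hypothetical field: the original exception, kept in memory only.
    error: BaseException | None = field(default=None, repr=False, compare=False)

    def to_serializable(self) -> dict[str, Any]:
        """Return only the fields that are safe to serialize (no exception object)."""
        return {'name': self.name, 'error_msg': self.error_msg}

    def stack_trace(self) -> str | None:
        """Format the stack trace of the attached exception, if there is one."""
        if self.error is None:
            return None
        return ''.join(
            traceback.format_exception(type(self.error), self.error, self.error.__traceback__)
        )


def run_safely(name: str, fn: Callable[[], Any]) -> Any | EvaluatorFailureSketch:
    """Call fn and turn any raised exception into a failure record."""
    try:
        return fn()
    except Exception as e:
        return EvaluatorFailureSketch(name=name, error_msg=f'{type(e).__name__}: {e}', error=e)
```

A caller that only needs the message reads `error_msg`; a caller that wants the traceback can use `error` directly or call `stack_trace()`.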

DouweM · 2025-07-25T17:32:22Z

pydantic_evals/pydantic_evals/evaluators/evaluator.py

+    """Represents a failure raised during the execution of an evaluator."""
+
+    name: str
+    error_msg: str


I don't love unnecessary abbreviations in field names. Why not just `error_message`?

DouweM · 2025-07-25T17:32:59Z

pydantic_evals/pydantic_evals/reporting/__init__.py

+    expected_output: OutputT | None
+    """The expected output of the task, from [`Case.expected_output`][pydantic_evals.Case.expected_output]."""
+
+    error_msg: str


Same as above, I'd prefer error_message

DouweM · 2025-07-25T17:39:05Z

pydantic_evals/pydantic_evals/dataset.py

-                    ]
-                ),
+                cases=[x for x in cases_and_failures if isinstance(x, ReportCase)],
+                failures=[x for x in cases_and_failures if isinstance(x, ReportCaseFailure)],


Minor thing, but I'd prefer to iterate just once and append into one of 2 lists
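
Roughly, the single-pass version could look like the sketch below. `ReportCaseSketch` and `ReportCaseFailureSketch` are simplified stand-ins for the real `ReportCase` and `ReportCaseFailure`, and `partition` is a hypothetical helper name.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class ReportCaseSketch:
    """Stand-in for ReportCase (assumption: reduced to a single field)."""

    name: str


@dataclass
class ReportCaseFailureSketch:
    """Stand-in for ReportCaseFailure (assumption: reduced to two fields)."""

    name: str
    error_msg: str


CaseOrFailure = Union[ReportCaseSketch, ReportCaseFailureSketch]


def partition(
    cases_and_failures: list[CaseOrFailure],
) -> tuple[list[ReportCaseSketch], list[ReportCaseFailureSketch]]:
    """Split the mixed list in one pass, preserving order within each output list."""
    cases: list[ReportCaseSketch] = []
    failures: list[ReportCaseFailureSketch] = []
    for item in cases_and_failures:
        if isinstance(item, ReportCaseSketch):
            cases.append(item)
        else:
            failures.append(item)
    return cases, failures
```

This walks `cases_and_failures` once instead of twice and does one `isinstance` check per item.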

DouweM · 2025-07-25T17:42:19Z

pydantic_evals/pydantic_evals/dataset.py

+            inputs=case.inputs,
+            metadata=case.metadata,
+            expected_output=case.expected_output,
+            error_msg=f'{type(exc).__name__}: {exc}',


As with EvaluatorFailure, I'd like to include the exception itself

DouweM · 2025-07-25T17:43:32Z

> Note: I think this is technically a breaking change because it adds some fields to the ReportCase and EvaluationReport classes, so deserialization might not work on existing data. Given we are still on 0.x, I guess it's worth a minor version bump(?), but I expect it won't be very disruptive in practice.

@dmontagu Can't we handle missing fields by just using the default value of []?
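
To illustrate the idea, here is a sketch using a plain dataclass as a stand-in for `EvaluationReport`. The field layout and the `load_report` helper are assumptions for illustration; the real class and its deserialization path may differ. A new list field with an empty-list default lets data written before the field existed still load.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class EvaluationReportSketch:
    """Stand-in for EvaluationReport (assumption: only three fields shown)."""

    name: str
    cases: list[dict[str, Any]]
    # Newly added field: falls back to a fresh empty list when absent from old data.
    failures: list[dict[str, Any]] = field(default_factory=list)


def load_report(data: dict[str, Any]) -> EvaluationReportSketch:
    """Build a report from a dict, ignoring unknown keys and defaulting missing ones."""
    known = {f.name for f in fields(EvaluationReportSketch)}
    return EvaluationReportSketch(**{k: v for k, v in data.items() if k in known})
```

With this shape, a payload serialized before the change (no `failures` key) deserializes with `failures == []`, and `default_factory=list` avoids sharing one list between instances.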

dmontagu added 2 commits July 23, 2025 03:05

Add tenacity utilities/integration

0be044e

Fix 3.9 tests

a5523db

dmontagu added 2 commits July 23, 2025 21:19

Address feedback

08140ba

Add retry support and graceful error handling for evals

6bb2a3f

dmontagu force-pushed the dmontagu/graceful-evals-error-handling branch from 5fe2ebf to 6bb2a3f Compare July 24, 2025 04:20

dmontagu added 2 commits July 23, 2025 21:29

Fix various issues

7271c3b

Fix some failing tests

f07f13d

Base automatically changed from dmontagu/retry-handling to main July 25, 2025 17:34

DouweM requested changes Jul 25, 2025

DouweM self-assigned this Jul 25, 2025

DouweM added the awaiting author revision label Jul 25, 2025
