-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Gracefully handle errors in evals #2295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Docs Preview
|
5fe2ebf
to
6bb2a3f
Compare
@@ -19,7 +26,7 @@ | |||
|
|||
async def run_evaluator( | |||
evaluator: Evaluator[InputsT, OutputT, MetadataT], ctx: EvaluatorContext[InputsT, OutputT, MetadataT] | |||
) -> list[EvaluationResult]: | |||
) -> list[EvaluationResult] | list[EvaluatorFailure]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to update the Returns:
docstring
except Exception as e: | ||
return [ | ||
EvaluatorFailure( | ||
name=evaluator.get_default_evaluation_name(), error_msg=f'{type(e).__name__}: {e}', source=evaluator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we attach the exception itself so it can be read programmatically to get the stack trace if so desired? It wouldn't be serialized of course.
"""Represents a failure raised during the execution of an evaluator.""" | ||
|
||
name: str | ||
error_msg: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love unnecessary abbreviations in field names, why not just error_message
?
expected_output: OutputT | None | ||
"""The expected output of the task, from [`Case.expected_output`][pydantic_evals.Case.expected_output].""" | ||
|
||
error_msg: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, I'd prefer error_message
] | ||
), | ||
cases=[x for x in cases_and_failures if isinstance(x, ReportCase)], | ||
failures=[x for x in cases_and_failures if isinstance(x, ReportCaseFailure)], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor thing, but I'd prefer to iterate just once and append into one of 2 lists
inputs=case.inputs, | ||
metadata=case.metadata, | ||
expected_output=case.expected_output, | ||
error_msg=f'{type(exc).__name__}: {exc}', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with EvaluatorFailure
, I'd like to include the exception itself
@dmontagu Can't we handle missing fields by just using the default value of |
Also adds retry functionality. Builds on top of the work in #2282.
Still need to:
Note: I think this is technically a breaking change because it adds some fields to the ReportCase and EvaluationReport classes, so deserialization might not work on existing data. Given we are still 0.X I guess it's worth a bump to minor version(?) but I expect it won't be very disruptive in practice..