Skip to content

More tests, docs, and minor things #41

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 9, 2022
Merged

Conversation

cretz
Copy link
Member

@cretz cretz commented Jun 7, 2022

What was changed

  • More README docs including build docs (opened Add more robust docs to docstrings #40 for more docstring docs)
  • Update core submodule origin URL to be http instead of SSH
  • Add retry policy to workflow info
  • Support disabling deadlock timeout with env var
  • Track tasks ourselves to since Python only uses weak references for them (see https://bugs.python.org/issue21163 and others)
  • Give descriptive task names in Python versions that support it
  • Many more tests

Checklist

  1. Closes More workflow tests #28
  2. Closes Add build docs to README #39

@cretz cretz force-pushed the more-tests branch 2 times, most recently from 69dc4cf to 0355f47 Compare June 7, 2022 14:41
@cretz cretz marked this pull request as ready for review June 7, 2022 16:58
@cretz cretz requested review from bergundy, Sushisource and a team June 7, 2022 16:58
@cretz cretz mentioned this pull request Jun 7, 2022

#### Testing

Tests currently require [Go](https://go.dev/) to be installed since they use an embedded Temporal server as a library.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you going to get rid of this soon?

Copy link
Member Author

@cretz cretz Jun 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two things in Go right now: Temporalite wrapper and "kitchen-sink" workflow/worker. I am waiting on TLS support and distributed Temporalite binaries for the former. For the latter, probably at the same time I get rid of the former, I will rewrite the workflow in Python and get rid of the Go worker.

I could move to docker compose I suppose, though Temporalite is very fast for our use case.

@@ -35,6 +37,21 @@ class RetryPolicy:
non_retryable_error_types: Optional[Iterable[str]] = None
"""List of error types that are not retryable."""

@staticmethod
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could use a classmethod instead and instaniate cls() below instead of referencing the class by name

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like being explicit on class instantiation here (and I have chosen to use @staticmethod throughout, though Google style guide wants @classmethod; I have documented the deviation in the README too)

Comment on lines +923 to +925
# Mark it as started each loop because backoff could cause it to
# be marked as unstarted
handle._started = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this referring to local activities?

Copy link
Member Author

@cretz cretz Jun 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but we share a good bit of code w/ local/non-local activity handling so it's in this shared space

Comment on lines +1068 to +1069
if hasattr(task, "_log_destroy_pending"):
setattr(task, "_log_destroy_pending", False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we only do this for workflows that we know we've evicted or after shutdown?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this only applies in those two cases since this is only used on GC and I hold a strong reference until eviction/shutdown or the task is done. I just choose to do this at registration time instead of loop over all tasks at the end.

@workflow.run
async def run(self) -> None:
# Wait forever
await asyncio.Future()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if it's worth having a workflow method to "block" like this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure a "block forever" is a primitive we need in any lang. I suppose it's just await new Promise(() => {}) in TS which I figure we don't have a utility for. I guess you could await workflow.wait_condition(lambda: false) too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, in TS you have to use await new Trigger() or await CancellationScope.current().cancelled or await condition(() => false).

There's no exported function and I have gotten questions on how to achieve this.
I recommend waiting on the cancellation scope directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think sleeping until cancel is a bit of an anti-pattern due to history growth, though some people trust they'll stay low until they cancel. Python doesn't have an explicit wait forever so not sure we need one. They could just as easily do await asyncio.Event().wait() or anything like that.


@workflow.query
def bad_query(self) -> NoReturn:
raise ApplicationError("query fail", 456)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: don't need an ApplicationError here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do if I want to test how details are handled :-) Though I note "(no details on query failure)" in the test, so it became useless.

assert isinstance(err.value.cause, ApplicationError)
assert list(err.value.cause.details) == [123]
# Fail query (no details on query failure)
with pytest.raises(RPCError) as rpc_err:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We wrap query failures in other SDKs with QueryRejectedError

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, at least in TS that's only on successful query response with a query_rejected field which is different than this failure.

But yes, in other SDKs we do sometimes make the RPC errors more palatable in some cases. But some SDKs get it wrong (e.g. assuming not found means workflow not found). This was discussed at #32 (comment) and temporalio/features#59 was opened as a result.

await assert_eq_eventually(True, child_started)
# Send cancel signal and wait on the handle
await handle.signal(CancelChildWorkflow.cancel_child)
await handle.result()
assert isinstance(err.value.cause, ChildWorkflowError)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pardon my laziness, remind me what err.value is here? Is it the Failure object?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the Python Exception. Pytest wraps it, so err here is technically https://docs.pytest.org/en/7.1.x/reference/reference.html#exceptioninfo

Comment on lines +1007 to +1008
event.timer_started_event_attributes.start_to_fire_timeout.ToMilliseconds()
== 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. looking at this now I'm wondering if it's worth not sending this command if it's cancelled in the same activation. (Would be a backwards incompatible Core change).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I saw in history, it seems core properly handles cancelling out a command sent in the same activation completion. It's for that reason I didn't add extra code to remove commands of things I cancel in the same completion before sending the completion - it seems like core does it.

Comment on lines +1747 to +1748
assert 1 == sum(
1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very pythonic :)

Copy link
Member

@bergundy bergundy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nothing blocking here.

I think there were some other missing tests that I mentioned in the last PR, also not blocking this PR.

@cretz
Copy link
Member Author

cretz commented Jun 8, 2022

I think there were some other missing tests that I mentioned in the last PR, also not blocking this PR.

@bergundy - There are probably some I missed, but can you point me at them again just so I have them tracked?

@cretz cretz mentioned this pull request Jun 8, 2022
@bergundy
Copy link
Member

bergundy commented Jun 8, 2022

Did you check all interceptors e2e?
Did you check that if workflow receives signal while WFT is in flight the workflow is rewinded and completes after signal is processed?

@cretz
Copy link
Member Author

cretz commented Jun 8, 2022

I will add tests for those two

EDIT: Only adding test for the first one for now and will do the next in another PR (after off-PR discussions about how best to solve).

@cretz cretz merged commit 81e17c0 into temporalio:main Jun 9, 2022
@cretz cretz deleted the more-tests branch June 9, 2022 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add build docs to README More workflow tests
3 participants