Allow reading multiple files simultaneously from Tensorboard #2346
Allows using multiple eval workers at once. Each worker writes its events independently, so this change makes TensorBoard read from multiple files simultaneously. Note that under the assumptions of the old class the behavior remains the same, so existing jobs with one eval worker will not be affected.
Hi @bcoopers, thank you for the PR! We’re definitely interested in this ... I believe that in the past we’ve run into performance issues when ... The best person to review this is @nfelt; he’s OOO right now, but will ...
For Googlers, cf. http://cl/253027277.
If you want, I can add a parameter for the maximum number of files (say, a thousand) and start expiring the least-recently-updated files once we reach that many.
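A rough sketch of what such a cap could look like, for illustration only. The class `RecentFiles`, its `max_files` parameter, and the `touch` method are hypothetical names, not part of this PR or of TensorBoard, and the sketch assumes updates are reported in chronological order.

```python
from collections import OrderedDict


class RecentFiles:
    """Hypothetical tracker that keeps at most `max_files` file paths."""

    def __init__(self, max_files=1000):
        self._max_files = max_files
        # Maps path -> last update time, ordered from least to most
        # recently updated (assumes touch() is called in time order).
        self._last_update = OrderedDict()

    def touch(self, path, update_time):
        """Record an update; return the paths expired to stay within the cap."""
        self._last_update[path] = update_time
        self._last_update.move_to_end(path)
        expired = []
        while len(self._last_update) > self._max_files:
            old_path, _ = self._last_update.popitem(last=False)
            expired.append(old_path)
        return expired

    def paths(self):
        """Tracked paths, least recently updated first."""
        return list(self._last_update)
```

A reader built on this would stop polling any path returned by `touch`, so memory and open file handles stay bounded however many workers have written events.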
Friendly ping
Hi, and thanks for your interest in contributing. Apologies for taking so long to get to this PR. I have a few reservations about this approach, relative to the approach in #1867.
I should be able to finish off #1867 in mid-July. I realize it's been a long time coming, and I was hoping I could get to it this week, but unfortunately I didn't and I'll be OOO next week. If you'll bear with us until then, I think that will be the best path to fixing the underlying issue #1063.
I've submitted #1867; please take a look and feel free to provide feedback. I believe it should satisfy the use case for this PR, so I will go ahead and close it, but let me know if you'd like it re-opened. Note that our sync into Google is currently blocked on some unrelated issues, but once those are resolved this should land internally soon.
Motivation for features / changes
Enables running multiple evaluation workers with a TF Estimator. Each worker writes to its own file, and TensorBoard now reads all of them, allowing greater statistical significance and/or a higher rate of points written.
Technical description of changes
Reads from all files, publishing the event with the most recent step number. When step numbers are identical, publishes the event with the earlier time.
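The selection rule can be sketched roughly as follows. This is not the code from the PR: `Event` is a hypothetical stand-in for an event record, `pick_event` is an illustrative name, and the sketch assumes that "most recent step number" means the highest step among the candidate events, one per file.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for an event read from one file."""
    step: int
    wall_time: float
    source: str  # name of the file the event came from


def pick_event(candidates: Iterable[Event]) -> Optional[Event]:
    """Pick the event with the highest step; on ties, the earlier wall_time."""
    best = None
    for ev in candidates:
        # Negating wall_time makes an earlier time compare as "greater",
        # so a single tuple comparison covers both rules.
        if best is None or (ev.step, -ev.wall_time) > (best.step, -best.wall_time):
            best = ev
    return best


# One pending event per file, as three eval workers might produce.
heads = [
    Event(step=10, wall_time=100.0, source="worker0"),
    Event(step=12, wall_time=105.0, source="worker1"),
    Event(step=12, wall_time=103.0, source="worker2"),
]
chosen = pick_event(heads)
```

Here `worker1` and `worker2` share the highest step, so the earlier wall time decides and `chosen` is the event from `worker2`.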
Screenshots of UI changes
Please reach out to me for these. I don't think I can share them externally.
Detailed steps to verify changes work correctly (as executed by you)
Ran training and observed the results in TensorBoard. Ran the unit test; it passes.
Alternate designs / implementations considered
None