-
Notifications
You must be signed in to change notification settings - Fork 191
ENH: remove memory addresses from param repr() #771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can maybe also add a check that there are no duplicate reprs present (maybe simplest to raise informative exception in benchmarks.py when constructing the benchmarks).
Initially there was an idea that the repr() would not be used, only the ordinal, and that's followed in most places but not everywhere. This could be used to solve the issue of identical reprs, probably needs some changes in results.py:_compatible_results
and maybe in compare.py
.
asv/benchmark.py
Outdated
result = repr(obj) | ||
address_regex = re.compile('^<.* at 0x[\da-fA-F]*>$') | ||
if address_regex.match(result): | ||
result = '<{} {}>'.format(str(obj.__class__.__name__), str(obj.__name__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs at least to guard against missing attributes, as obj
may be any object whatsoever, and the repr may contain anything. I'd actually prefer to guard this with if isinstance(obj, ...):
, so that the way the transformation is applied is actually controlled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point, I'll see if I can accomplish the same result solely by modifying the repr()
string, which should resolve your concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit worried about applying modifications to the repr string blindly --- the problem with this is that it's quite obscure behavior, and if something goes wrong it will be difficult to understand...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, at the extreme end of options is to just use IPython.lib.pretty.pprint()
, which handles this on a type-by-type basis (and produces identical output for functions as here). However, that still leaks memory addresses in a bunch of cases and adds a dependency to a sensitive file.
Would the following be good enough in your view?
- If the regex triggers, extract the suspected memory address.
- Then compute
object.__repr__(obj)
and extract a known memory address. - Iff they match, delete the memory address substring.
9f67bd3
to
2b88339
Compare
(removed as no longer relevant) |
Forgot to push? I don't see the valueerror raised? |
@pv Sorry, forgot to update that comment - I ended up switching approaches due to finding existing cases where it didn't make sense to force a
And here's the new output:
|
Can you update the documentation, adding explanation what it now does?
…On December 2, 2018 1:42:34 AM UTC, Christopher Whelan ***@***.***> wrote:
@pv Sorry, forgot to update that comment - I ended up switching
approaches due to finding existing cases where it didn't make sense to
force a `repr()` change. Here's current pandas output:>
```>
$ asv dev -b ctors.SeriesConstructors>
· Discovering benchmarks>
· Running 1 total benchmarks (1 commits * 1 environments * 1
benchmarks)>
[ 0.00%] ·· Benchmarking existing-py_home_chris_anaconda3_bin_python>
[ 50.00%] ··· ctors.SeriesConstructors.time_series_constructor
ok>
[ 50.00%] ···
========================================================== ==========
==========>
--
with_index>
----------------------------------------------------------
--------------------->
data_fmt False True>
========================================================== ==========
==========>
<function SeriesConstructors.<lambda> at 0x7fa7f0beba60> 1.49±0ms
1.81±0ms>
list 3.12±0ms 3.33±0ms>
<function SeriesConstructors.<lambda> at 0x7fa7f0bebae8> 2.72±0ms
2.99±0ms>
<function SeriesConstructors.<lambda> at 0x7fa7f0bebb70> 6.91±0ms
7.37±0ms>
<function SeriesConstructors.<lambda> at 0x7fa7f0bebbf8> 2.95±0ms
3.03±0ms>
<function SeriesConstructors.<lambda> at 0x7fa7f0bebc80> 2.86±0ms
3.11±0ms>
<function SeriesConstructors.<lambda> at 0x7fa7f0bebd08> 2.74±0ms
3.08±0ms>
<function SeriesConstructors.<lambda> at 0x7fa7f0bebd90> 2.86±0ms
3.07±0ms>
========================================================== ==========
==========>
```>
And here's the new output:>
```>
$ asv dev -b ctors.SeriesConstructors>
· Discovering benchmarks>
· Running 1 total benchmarks (1 commits * 1 environments * 1
benchmarks)>
[ 0.00%] ·· Benchmarking existing-py_home_chris_anaconda3_bin_python>
[ 50.00%] ··· ctors.SeriesConstructors.time_series_constructor
ok>
[ 50.00%] ··· ============================================ ==========
==========>
-- with_index>
-------------------------------------------- --------------------->
data_fmt False True>
============================================ ========== ==========>
<function SeriesConstructors.<lambda>> (0) 1.41±0ms 1.81±0ms>
list 3.16±0ms 3.40±0ms>
<function SeriesConstructors.<lambda>> (1) 2.77±0ms 2.96±0ms>
<function SeriesConstructors.<lambda>> (2) 6.90±0ms 7.48±0ms>
<function SeriesConstructors.<lambda>> (3) 2.88±0ms 2.94±0ms>
<function SeriesConstructors.<lambda>> (4) 2.94±0ms 3.13±0ms>
<function SeriesConstructors.<lambda>> (5) 2.88±0ms 3.04±0ms>
<function SeriesConstructors.<lambda>> (6) 4.02±0ms 2.98±0ms>
============================================ ========== ==========>
```>
>
-- >
You are receiving this because you were mentioned.>
Reply to this email directly or view it on GitHub:>
#771 (comment)
|
@pv Added, still getting what appears to be an unrelated conda error. Apologies for the delay! |
@pv Rebased and all green aside from a travis build that timed out (no reported failures) |
But now it's a different approach as in the previous push? Was this intended? |
@pv It's the same approach as the previous push, but distinct from the initial one - I ran into cases where distinct parameter cases where |
My local branch matches 75e836c, so not sure where the difference came from. Just force-pushed again and confirming this is the intended diff. |
Thanks! And sorry for the long delay... |
Thanks for merging! |
The
pandas
benchmark suite sometimes uses functions asparams
, which largely works fine. The one issue is that therepr()
of a function includes its memory address and can change over subsequentasv
invocations (for normal functions) or even within a singleasv
run (forlambda
functions). This change inrepr()
impacts the ability to track and compare a benchmark over time, leading to brokenasv compare
functionality like below:This PR modifies
repr()
iff it appears to include a memory address, allowing proper comparison over time. The new output looks like this after a re-run:One unaddressed issue is that any suite with multiple
lambda
functions will now have name collisions inparams
. I intend to address this inpandas
by creating normal named functions in the relevant suites. Alternative solutions would be to append distinct numbers or hashes to therepr()
s, or to warn or raise if a user passes multiplelambda
functions.