Skip to content

ENH: remove memory addresses from param repr() #771

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 31, 2019

Conversation

qwhelan
Copy link
Contributor

@qwhelan qwhelan commented Nov 6, 2018

The pandas benchmark suite sometimes uses functions as params, which largely works fine. The one issue is that the repr() of a function includes its memory address and can change over subsequent asv invocations (for normal functions) or even within a single asv run (for lambda functions). This change in repr() impacts the ability to track and compare a benchmark over time, leading to broken asv compare functionality like below:

Benchmarks that have stayed the same:

       before           after         ratio
     [24ab22f7]       [d7cef344]
              n/a          997±3μs      n/a  ctors.SeriesConstructors.time_series_constructor(<function SeriesConstructors.<lambda> at 0x7f0deb0800d0>, False)
              n/a         1.05±0ms      n/a  ctors.SeriesConstructors.time_series_constructor(<function SeriesConstructors.<lambda> at 0x7f0deb0800d0>, True)
      1.04±0.01ms              n/a      n/a  ctors.SeriesConstructors.time_series_constructor(<function SeriesConstructors.<lambda> at 0x7fd9d0bbc598>, False)
      1.06±0.02ms              n/a      n/a  ctors.SeriesConstructors.time_series_constructor(<function SeriesConstructors.<lambda> at 0x7fd9d0bbc598>, True)

This PR modifies repr() iff it appears to include a memory address, allowing proper comparison over time. The new output looks like this after a re-run:

       before           after         ratio
     [24ab22f7]       [d7cef344]
          995±7μs      1.03±0.02ms     1.04  ctors.SeriesConstructors.time_series_constructor(<function <lambda>>, False)
      1.05±0.01ms      1.09±0.05ms     1.03  ctors.SeriesConstructors.time_series_constructor(<function <lambda>>, True)

One unaddressed issue is that any suite with multiple lambda functions will now have name collisions in params. I intend to address this in pandas by creating normal named functions in the relevant suites. Alternative solutions would be to append distinct numbers or hashes to the repr()s, or to warn or raise if a user passes multiple lambda functions.

Copy link
Collaborator

@pv pv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can maybe also add a check that there are no duplicate reprs present (maybe simplest to raise informative exception in benchmarks.py when constructing the benchmarks).

Initially there was an idea that the repr() would not be used, only the ordinal, and that's followed in most places but not everywhere. This could be used to solve the issue of identical reprs, probably needs some changes in results.py:_compatible_results and maybe in compare.py.

asv/benchmark.py Outdated
result = repr(obj)
address_regex = re.compile('^<.* at 0x[\da-fA-F]*>$')
if address_regex.match(result):
result = '<{} {}>'.format(str(obj.__class__.__name__), str(obj.__name__))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs at least to guard against missing attributes, as obj may be any object whatsoever, and the repr may contain anything. I'd actually prefer to guard this with if isinstance(obj, ...):, so that the way the transformation is applied is actually controlled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, I'll see if I can accomplish the same result solely by modifying the repr() string, which should resolve your concern.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried about applying modifications to the repr string blindly --- the problem with this is that it's quite obscure behavior, and if something goes wrong it will be difficult to understand...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, at the extreme end of options is to just use IPython.lib.pretty.pprint(), which handles this on a type-by-type basis (and produces identical output for functions as here). However, that still leaks memory addresses in a bunch of cases and adds a dependency to a sensitive file.

Would the following be good enough in your view?

  • If the regex triggers, extract the suspected memory address.
  • Then compute object.__repr__(obj) and extract a known memory address.
  • Iff they match, delete the memory address substring.

@qwhelan qwhelan force-pushed the repr branch 2 times, most recently from 9f67bd3 to 2b88339 Compare November 28, 2018 23:14
@qwhelan
Copy link
Contributor Author

qwhelan commented Nov 28, 2018

(removed as no longer relevant)

@pv
Copy link
Collaborator

pv commented Dec 1, 2018

Forgot to push? I don't see the valueerror raised?

@qwhelan
Copy link
Contributor Author

qwhelan commented Dec 2, 2018

@pv Sorry, forgot to update that comment - I ended up switching approaches due to finding existing cases where it didn't make sense to force a repr() change. Here's current pandas output:

$ asv dev -b ctors.SeriesConstructors
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[  0.00%] ·· Benchmarking existing-py_home_chris_anaconda3_bin_python
[ 50.00%] ··· ctors.SeriesConstructors.time_series_constructor                                                                                                          ok
[ 50.00%] ··· ========================================================== ========== ==========
              --                                                               with_index
              ---------------------------------------------------------- ---------------------
                                       data_fmt                            False       True
              ========================================================== ========== ==========
               <function SeriesConstructors.<lambda> at 0x7fa7f0beba60>   1.49±0ms   1.81±0ms
                                         list                             3.12±0ms   3.33±0ms
               <function SeriesConstructors.<lambda> at 0x7fa7f0bebae8>   2.72±0ms   2.99±0ms
               <function SeriesConstructors.<lambda> at 0x7fa7f0bebb70>   6.91±0ms   7.37±0ms
               <function SeriesConstructors.<lambda> at 0x7fa7f0bebbf8>   2.95±0ms   3.03±0ms
               <function SeriesConstructors.<lambda> at 0x7fa7f0bebc80>   2.86±0ms   3.11±0ms
               <function SeriesConstructors.<lambda> at 0x7fa7f0bebd08>   2.74±0ms   3.08±0ms
               <function SeriesConstructors.<lambda> at 0x7fa7f0bebd90>   2.86±0ms   3.07±0ms
              ========================================================== ========== ==========

And here's the new output:

$ asv dev -b ctors.SeriesConstructors
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[  0.00%] ·· Benchmarking existing-py_home_chris_anaconda3_bin_python
[ 50.00%] ··· ctors.SeriesConstructors.time_series_constructor                                                                                                          ok
[ 50.00%] ··· ============================================ ========== ==========
              --                                                 with_index
              -------------------------------------------- ---------------------
                                data_fmt                     False       True
              ============================================ ========== ==========
               <function SeriesConstructors.<lambda>> (0)   1.41±0ms   1.81±0ms
                                  list                      3.16±0ms   3.40±0ms
               <function SeriesConstructors.<lambda>> (1)   2.77±0ms   2.96±0ms
               <function SeriesConstructors.<lambda>> (2)   6.90±0ms   7.48±0ms
               <function SeriesConstructors.<lambda>> (3)   2.88±0ms   2.94±0ms
               <function SeriesConstructors.<lambda>> (4)   2.94±0ms   3.13±0ms
               <function SeriesConstructors.<lambda>> (5)   2.88±0ms   3.04±0ms
               <function SeriesConstructors.<lambda>> (6)   4.02±0ms   2.98±0ms
              ============================================ ========== ==========

@pv
Copy link
Collaborator

pv commented Dec 3, 2018 via email

@qwhelan
Copy link
Contributor Author

qwhelan commented Dec 17, 2018

@pv Added, still getting what appears to be an unrelated conda error.

Apologies for the delay!

@qwhelan
Copy link
Contributor Author

qwhelan commented Jan 4, 2019

@pv Rebased and all green aside from a travis build that timed out (no reported failures)

@pv
Copy link
Collaborator

pv commented Jan 6, 2019

But now it's a different approach as in the previous push? Was this intended?

@pv pv added the needs-work The PR cannot be merged as is, further work required (explained in comments) label Jan 6, 2019
@qwhelan
Copy link
Contributor Author

qwhelan commented Jan 6, 2019

@pv It's the same approach as the previous push, but distinct from the initial one - I ran into cases where distinct parameter cases where repr()s collided and could not be easily changed, so I went with the enumerating approach.

@pv
Copy link
Collaborator

pv commented Jan 6, 2019

75e836c and 33072d1 are not the same, please look at the PR diff.
Let me know when this is converged (please verify that the diff on this page is what you intend to submit...)

@qwhelan
Copy link
Contributor Author

qwhelan commented Jan 6, 2019

My local branch matches 75e836c, so not sure where the difference came from. Just force-pushed again and confirming this is the intended diff.

@pv pv removed the needs-work The PR cannot be merged as is, further work required (explained in comments) label May 30, 2019
@pv pv merged commit f282ba3 into airspeed-velocity:master May 31, 2019
@pv
Copy link
Collaborator

pv commented May 31, 2019

Thanks! And sorry for the long delay...

@qwhelan qwhelan deleted the repr branch June 5, 2019 17:55
@qwhelan
Copy link
Contributor Author

qwhelan commented Jun 5, 2019

Thanks for merging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants