Description
Feature or enhancement
The `dataclasses` library provides an easy way to create classes. It automatically generates the relevant methods for users.
Creating dataclasses with the argument `frozen=True` automatically generates the methods `__setattr__` and `__delattr__` in `_frozen_get_del_attr`.
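For readers less familiar with the behaviour, here is a minimal sketch of what the generated methods do from the user's side. The class `Point` is a hypothetical example and not part of the proposal.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:  # hypothetical example class
    x: int
    y: int


p = Point(1, 2)

# The generated __setattr__ rejects assignment to a field.
try:
    p.x = 10
except FrozenInstanceError as exc:
    set_error = str(exc)

# The generated __delattr__ rejects deletion of a field.
try:
    del p.y
except FrozenInstanceError as exc:
    del_error = str(exc)

print(set_error)
print(del_error)
```

Both operations leave the instance unchanged, so `p` still holds `x=1` and `y=2` afterwards.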
This issue proposes changing the `tuple`-based lookup to a `set`-based lookup. This reduces the time complexity of the field-name check from O(n) to O(1), where n is the number of fields.
In [1]: # tuple-based
In [2]: %timeit 'a' in ('a', 'b', 'c', 'd', 'e', 'f', 'g')
9.91 ns ± 0.0982 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
In [3]: %timeit 'd' in ('a', 'b', 'c', 'd', 'e', 'f', 'g')
33.2 ns ± 0.701 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [4]: %timeit 'g' in ('a', 'b', 'c', 'd', 'e', 'f', 'g')
56.4 ns ± 0.818 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [5]: # set-based
In [6]: %timeit 'a' in {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
11.3 ns ± 0.0723 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
In [7]: %timeit 'd' in {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
11 ns ± 0.106 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
In [8]: %timeit 'g' in {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
11.1 ns ± 0.126 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
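The timings above can be approximated without IPython by using the standard `timeit` module. This is only a rough sketch: the helper `time_lookup` and the loop count are assumptions made for illustration, and the absolute numbers will vary by machine and Python version.

```python
import timeit

FIELD_NAMES = ('a', 'b', 'c', 'd', 'e', 'f', 'g')


def time_lookup(container, key, number=200_000):
    """Return the average time of `key in container`, in nanoseconds."""
    total = timeit.timeit(
        "key in container",
        globals={"key": key, "container": container},
        number=number,
    )
    return total / number * 1e9


results = {}
for label, container in (("tuple", FIELD_NAMES), ("set", set(FIELD_NAMES))):
    # First, middle, and last element, mirroring the IPython session.
    for key in ("a", "d", "g"):
        results[(label, key)] = time_lookup(container, key)

for (label, key), ns in results.items():
    print(f"{label:5s} {key!r}: {ns:.1f} ns per lookup")
```

The tuple timings are expected to grow with the position of the key, while the set timings should stay roughly flat.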
A tiny benchmark script:
from contextlib import suppress
from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class Foo2:
    a: int
    b: int

foo2 = Foo2(1, 2)

def bench2(inst):
    # Every assignment to a field raises FrozenInstanceError.
    with suppress(FrozenInstanceError):
        inst.a = 0
    with suppress(FrozenInstanceError):
        inst.b = 0

@dataclass(frozen=True)
class Foo7:
    a: int
    b: int
    c: int
    d: int
    e: int
    f: int
    g: int

foo7 = Foo7(1, 2, 3, 4, 5, 6, 7)

def bench7(inst):
    with suppress(FrozenInstanceError):
        inst.a = 0
    with suppress(FrozenInstanceError):
        inst.b = 0
    with suppress(FrozenInstanceError):
        inst.c = 0
    with suppress(FrozenInstanceError):
        inst.d = 0
    with suppress(FrozenInstanceError):
        inst.e = 0
    with suppress(FrozenInstanceError):
        inst.f = 0
    with suppress(FrozenInstanceError):
        inst.g = 0

class Bar(Foo7):
    def __init__(self, a, b, c, d, e, f, g):
        super().__init__(a, b, c, d, e, f, g)
        self.baz = 0

# Instance used by `%timeit bench(bar)` in the results.
bar = Bar(1, 2, 3, 4, 5, 6, 7)

def bench(inst):
    # Non-field attribute on a subclass: passes the sanity check.
    inst.baz = 1
Result:
`set`-based lookup:
In [2]: %timeit bench2(foo2)
1.08 µs ± 28.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [3]: %timeit bench7(foo7)
3.81 µs ± 20.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [4]: %timeit bench(bar)
249 ns ± 6.31 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
`tuple`-based lookup (original):
In [2]: %timeit bench2(foo2)
1.15 µs ± 10.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [3]: %timeit bench7(foo7)
3.97 µs ± 15.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [4]: %timeit bench(bar)
269 ns ± 4.09 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
The `set`-based lookup is consistently faster than the old approach, and its theoretical time complexity is also lower (O(1) vs. O(n)).
Ref: #102573
Pitch
The autogenerated `__setattr__` and `__delattr__` methods perform a sanity check at the beginning of the method. For example:
def __setattr__(self, name, value):
    if type(self) is {{UserType}} or name in ({{a tuple of field names}}):
        raise FrozenInstanceError(f"cannot assign to field {name!r}")
    super(cls, self).__setattr__(name, value)
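To make the template concrete, here is a hand-written sketch of the same check using a set of field names. `FrozenPoint`, `Child`, and `_FIELD_NAMES` are hypothetical names chosen for illustration; this is not the code that `dataclasses` generates, only a stand-in with the same shape.

```python
from dataclasses import FrozenInstanceError

# Assumption: the field names are kept in a frozenset so that the
# membership test is a hash lookup instead of a linear scan.
_FIELD_NAMES = frozenset({"x", "y"})


class FrozenPoint:
    """Hand-written stand-in for a frozen dataclass."""

    def __init__(self, x, y):
        # Bypass our own __setattr__, as a frozen dataclass __init__ must.
        object.__setattr__(self, "x", x)
        object.__setattr__(self, "y", y)

    def __setattr__(self, name, value):
        if type(self) is FrozenPoint or name in _FIELD_NAMES:
            raise FrozenInstanceError(f"cannot assign to field {name!r}")
        super(FrozenPoint, self).__setattr__(name, value)


class Child(FrozenPoint):
    def __init__(self, x, y, extra):
        super().__init__(x, y)
        self.extra = extra  # not a field, so the check lets it through


child = Child(1, 2, 3)
child.extra = 4  # allowed: subclass instance and non-field name

try:
    child.x = 0  # rejected: "x" is a field name
except FrozenInstanceError as exc:
    field_error = str(exc)
```

Only the second half of the `or` condition depends on the number of fields, which is why the container type matters for subclasses.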
If a class inherits from the frozen dataclass, every attribute assignment first runs the sanity check, which takes O(N_FIELDS) time in `tuple.__contains__(...)`, and only then calls `super().__setattr__(...)`. For example:
@dataclass(frozen=True)
class FrozenBase:
    x: int
    y: int
    ...  # N_FIELDS

class Foo(FrozenBase):
    def __init__(self, x, y, somevalue, someothervalue):
        super().__init__(x, y)
        self.somevalue = somevalue            # takes O(N_FIELDS) time
        self.someothervalue = someothervalue  # takes O(N_FIELDS) time again

foo = Foo(1, 2, 3, 4)
foo.extravalue = 5                            # takes O(N_FIELDS) time again
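One way to observe this cost is to build frozen dataclasses of different widths and time a non-field assignment on a subclass instance. The sketch below is an assumption-laden illustration: `make_subclass_instance`, the class names, and the chosen field counts are hypothetical. On interpreters that already use a `set`-based check, the timings are expected to stay roughly flat.

```python
import timeit
from dataclasses import field, make_dataclass


def make_subclass_instance(n_fields):
    """Return an instance of a plain subclass of a frozen dataclass
    that has n_fields integer fields, each defaulting to 0."""
    specs = [(f"f{i}", int, field(default=0)) for i in range(n_fields)]
    base = make_dataclass("WideFrozen", specs, frozen=True)

    class Sub(base):
        pass

    return Sub()


NUMBER = 100_000
timings = {}
for n_fields in (2, 50, 500):
    inst = make_subclass_instance(n_fields)
    # "extra" is not a field, so the check must reject every field name
    # before the assignment is forwarded to super().__setattr__.
    total = timeit.timeit("inst.extra = 1", globals={"inst": inst}, number=NUMBER)
    timings[n_fields] = total / NUMBER * 1e9  # nanoseconds per assignment

for n_fields, ns in timings.items():
    print(f"{n_fields:4d} fields: {ns:.1f} ns per assignment")
```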
Previous discussion
N/A.