Commit 5148e90

Merge pull request #4657 from jreback/bool
API: GH4633, bool(obj) behavior, raise on __nonzero__ always
2 parents 03aa067 + e06d7a8 commit 5148e90

13 files changed

+200
-31
lines changed

doc/source/10min.rst

Lines changed: 17 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -269,7 +269,6 @@ A ``where`` operation for getting.
269269
270270
df[df > 0]
271271
272-
273272
Setting
274273
~~~~~~~
275274

@@ -708,3 +707,20 @@ Reading from an excel file
708707
:suppress:
709708
710709
os.remove('foo.xlsx')
710+
711+
Gotchas
712+
-------
713+
714+
If you are trying an operation and you see an exception like:
715+
716+
.. code-block:: python
717+
718+
>>> if pd.Series([False, True, False]):
719+
print("I was true")
720+
Traceback
721+
...
722+
ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().
723+
724+
See :ref:`Comparisons<basics.compare>` for an explanation and what to do.
725+
726+
See :ref:`Gotchas<gotchas>` as well.

doc/source/basics.rst

Lines changed: 48 additions & 2 deletions
@@ -8,7 +8,7 @@
88
from pandas import *
99
randn = np.random.randn
1010
np.set_printoptions(precision=4, suppress=True)
11-
from pandas.compat import lrange
11+
from pandas.compat import lrange
1212
1313
==============================
1414
Essential Basic Functionality
@@ -198,16 +198,62 @@ replace NaN with some other value using ``fillna`` if you wish).
198198
199199
Flexible Comparisons
200200
~~~~~~~~~~~~~~~~~~~~
201+
202+
.. _basics.compare:
203+
201204
Starting in v0.8, pandas introduced binary comparison methods eq, ne, lt, gt,
202205
le, and ge to Series and DataFrame whose behavior is analogous to the binary
203206
arithmetic operations described above:
204207

205208
.. ipython:: python
206209
207210
df.gt(df2)
208-
209211
df2.ne(df)
210212
213+
These operations produce a pandas object the same type as the left-hand-side input
214+
that is of dtype ``bool``. These ``boolean`` objects can be used in indexing operations;
215+
see :ref:`here<indexing.boolean>`.
216+
217+
Furthermore, you can apply the reduction functions: ``any()`` and ``all()`` to provide a
218+
way to summarize these results.
219+
220+
.. ipython:: python
221+
222+
(df>0).all()
223+
(df>0).any()
224+
225+
Finally, you can test whether a pandas object is empty via the ``empty`` property.
226+
227+
.. ipython:: python
228+
229+
df.empty
230+
DataFrame(columns=list('ABC')).empty
231+
232+
.. warning::
233+
234+
You might be tempted to do the following:
235+
236+
.. code-block:: python
237+
238+
>>> if df:
239+
...
240+
241+
Or
242+
243+
.. code-block:: python
244+
245+
>>> df and df2
246+
247+
Both of these will raise, because each asks for a single truth value from an object that holds multiple values.
248+
249+
.. code-block:: python
250+
251+
ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().
252+
253+
254+
See :ref:`gotchas<gotchas.truth>` for a more detailed discussion.
255+
256+
211257
Combining overlapping data sets
212258
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
213259
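As a rough illustration of the reductions described in the basics.rst hunk above, the sketch below compares a small frame against zero and then summarizes the result. The frame contents and variable names are made up for the example; only `any()`, `all()` and `empty` come from the documented text.

```python
import pandas as pd

# Illustrative data, not taken from the commit.
df = pd.DataFrame({"A": [1, 2, 3], "B": [-1, 2, 3]})

positive = df > 0                # boolean DataFrame with the same shape as df
all_by_column = positive.all()   # Series: is every value in each column positive?
any_by_column = positive.any()   # Series: is at least one value per column positive?

# Reduce twice to collapse the whole frame to one Python bool.
every_value = bool(positive.all().all())

empty_frame = pd.DataFrame(columns=list("ABC"))

if every_value:
    print("all values are positive")
else:
    print("some values are not positive")
print(df.empty, empty_frame.empty)
```

Reducing explicitly makes the intended question visible at the call site, which is the point of the warning in the hunk.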

doc/source/gotchas.rst

Lines changed: 53 additions & 1 deletion
@@ -15,6 +15,58 @@
1515
Caveats and Gotchas
1616
*******************
1717

18+
Using If/Truth Statements with Pandas
19+
-------------------------------------
20+
21+
.. _gotchas.truth:
22+
23+
Pandas follows the numpy convention of raising an error when you try to convert an array-like object to a ``bool``.
24+
This happens in an ``if`` statement or when using the boolean operations ``and``, ``or``, or ``not``. It is not clear
25+
what the result of
26+
27+
.. code-block:: python
28+
29+
>>> if Series([False, True, False]):
30+
...
31+
32+
should be. Should it be ``True`` because it's not zero-length? ``False`` because there are ``False`` values?
33+
It is unclear, so instead, pandas raises a ``ValueError``:
34+
35+
.. code-block:: python
36+
37+
>>> if pd.Series([False, True, False]):
38+
print("I was true")
39+
Traceback
40+
...
41+
ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().
42+
43+
44+
If you see that, you need to explicitly choose what you want to do with it (e.g., use ``any()``, ``all()`` or ``empty``).
45+
Or, you might want to check whether the pandas object is ``None``:
46+
47+
.. code-block:: python
48+
49+
>>> if pd.Series([False, True, False]) is not None:
50+
print("I was not None")
51+
I was not None
52+
53+
Bitwise boolean
54+
~~~~~~~~~~~~~~~
55+
56+
Element-wise comparison operators like ``==`` and ``!=`` return a boolean ``Series``,
57+
which is almost always what you want anyway.
58+
59+
.. code-block:: python
60+
61+
>>> s = pd.Series(range(5))
62+
>>> s == 4
63+
0 False
64+
1 False
65+
2 False
66+
3 False
67+
4 True
68+
dtype: bool
69+
1870
``NaN``, Integer ``NA`` values and ``NA`` type promotions
1971
---------------------------------------------------------
2072
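The truth-value section added in this hunk can be exercised with a short sketch like the one below. It only checks that a `ValueError` is raised, since the exact message text may differ between pandas versions; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([False, True, False])

# Implicit truth testing is ambiguous, so pandas raises instead of guessing.
try:
    if s:
        pass
    raised = False
except ValueError:
    raised = True

# Spell out the intended question instead.
has_any_true = bool(s.any())   # at least one element is True
all_true = bool(s.all())       # every element is True
is_empty = s.empty             # zero-length check
is_present = s is not None     # identity check; never calls bool(s)

print(raised, has_any_true, all_true, is_empty, is_present)
```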

@@ -428,7 +480,7 @@ parse HTML tables in the top-level pandas io function ``read_html``.
428480
lxml will work correctly:
429481

430482
.. code-block:: sh
431-
483+
432484
# remove the included version
433485
conda remove lxml
434486

doc/source/release.rst

Lines changed: 3 additions & 0 deletions
@@ -134,6 +134,9 @@ pandas 0.13
134134
now returns a ``MultiIndex`` rather than an ``Index``. (:issue:`4039`)
135135

136136
- Infer and downcast dtype if ``downcast='infer'`` is passed to ``fillna/ffill/bfill`` (:issue:`4604`)
137+
- Factored out excel_value_to_python_value from ExcelFile::_parse_excel (:issue:`4589`)
138+
- ``__nonzero__`` for all NDFrame objects will now raise a ``ValueError``; this reverts to the
139+
(:issue:`1073`, :issue:`4633`) behavior.
137140

138141
**Internal Refactoring**
139142

doc/source/v0.13.0.txt

Lines changed: 12 additions & 0 deletions
@@ -121,6 +121,18 @@ API changes
121121
index.set_names(["bob", "cranberry"], inplace=True)
122122

123123
- Infer and downcast dtype if ``downcast='infer'`` is passed to ``fillna/ffill/bfill`` (:issue:`4604`)
124+
- ``__nonzero__`` for all NDFrame objects will now raise a ``ValueError``; this reverts to the
125+
(:issue:`1073`, :issue:`4633`) behavior.
126+
127+
This prevents behaviors like the following, all of which will now raise ``ValueError``:
128+
129+
.. code-block:: python
130+
131+
if df:
132+
....
133+
134+
df1 and df2
135+
s1 and s2
124136

125137
Enhancements
126138
~~~~~~~~~~~~
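For code that previously wrote `s1 and s2` or `df1 and df2`, one possible replacement is the element-wise operators `&`, `|` and `~`. The sketch below assumes boolean inputs and uses made-up values; it is not part of the commit.

```python
import pandas as pd

s1 = pd.Series([True, False, True])
s2 = pd.Series([True, True, False])

both = s1 & s2     # element-wise AND
either = s1 | s2   # element-wise OR
negated = ~s1      # element-wise NOT

# Comparisons need parentheses because & binds more tightly than > or <.
values = pd.Series([1, 5, 10])
in_range = (values > 2) & (values < 8)

print(both.tolist(), either.tolist(), negated.tolist(), in_range.tolist())
```

If a single yes/no answer is wanted instead of an element-wise result, reduce the combined object with `any()` or `all()`.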

pandas/core/generic.py

Lines changed: 2 additions & 1 deletion
@@ -531,7 +531,8 @@ def empty(self):
531531
return not all(len(self._get_axis(a)) > 0 for a in self._AXIS_ORDERS)
532532

533533
def __nonzero__(self):
534-
return not self.empty
534+
raise ValueError("The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().")
535+
535536
__bool__ = __nonzero__
536537

537538
#----------------------------------------------------------------------
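The pattern in this hunk, raising from `__nonzero__` and aliasing it to `__bool__`, can be sketched outside pandas with a small stand-in class. `AmbiguousContainer` is a hypothetical name used only for illustration; the error message is copied from the diff.

```python
class AmbiguousContainer:
    """Hypothetical container that refuses implicit truth testing."""

    def __init__(self, values):
        self.values = list(values)

    @property
    def empty(self):
        # Explicit emptiness check that callers are expected to use instead.
        return len(self.values) == 0

    def __nonzero__(self):  # truth-testing hook on Python 2
        raise ValueError(
            "The truth value of an array is ambiguous. "
            "Use a.empty, a.any() or a.all()."
        )

    __bool__ = __nonzero__  # Python 3 looks up __bool__ instead


box = AmbiguousContainer([1, 2, 3])
try:
    bool(box)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
print(box.empty)
```

Defining the alias keeps one implementation for both Python 2 and Python 3, which appears to be the intent of the `__bool__ = __nonzero__` line in the diff.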

pandas/core/groupby.py

Lines changed: 14 additions & 1 deletion
@@ -2101,9 +2101,22 @@ def filter(self, func, dropna=True, *args, **kwargs):
21012101
else:
21022102
res = path(group)
21032103

2104-
if res:
2104+
def add_indexer():
21052105
indexers.append(self.obj.index.get_indexer(group.index))
21062106

2107+
# interpret the result of the filter
2108+
if isinstance(res,(bool,np.bool_)):
2109+
if res:
2110+
add_indexer()
2111+
else:
2112+
if getattr(res,'ndim',None) == 1:
2113+
if res.ravel()[0]:
2114+
add_indexer()
2115+
else:
2116+
2117+
# in theory you could do .all() on the boolean result ?
2118+
raise TypeError("the filter must return a boolean result")
2119+
21072120
if len(indexers) == 0:
21082121
filtered = self.obj.take([]) # because np.concatenate would fail
21092122
else:
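Because the page has lost the indentation of this hunk, the branch structure is easier to follow as a standalone function. The sketch below is an assumed reading of the diff, not the pandas implementation: `should_keep_group` is a hypothetical helper, and it uses `np.asarray` so that it works on plain arrays as well as Series.

```python
import numpy as np


def should_keep_group(res):
    """Hypothetical helper mirroring the branches added to ``filter``."""
    if isinstance(res, (bool, np.bool_)):
        # Scalar boolean: use it directly.
        return bool(res)
    if getattr(res, "ndim", None) == 1:
        # As in the diff, only the first element is consulted.
        return bool(np.asarray(res).ravel()[0])
    # Anything else is rejected rather than silently truth-tested.
    raise TypeError("the filter must return a boolean result")


print(should_keep_group(True))
print(should_keep_group(np.array([False, True])))
```

Checking the type first avoids calling `bool()` on a pandas object, which would now raise `ValueError` after this commit.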

pandas/core/series.py

Lines changed: 0 additions & 7 deletions
@@ -798,13 +798,6 @@ def __contains__(self, key):
798798
__long__ = _coerce_method(int)
799799
__int__ = _coerce_method(int)
800800

801-
def __nonzero__(self):
802-
# special case of a single element bool series degenerating to a scalar
803-
if self.dtype == np.bool_ and len(self) == 1:
804-
return bool(self.iloc[0])
805-
return not self.empty
806-
__bool__ = __nonzero__
807-
808801
# we are preserving name here
809802
def __getstate__(self):
810803
return dict(_data=self._data, name=self.name)

pandas/io/tests/test_pytables.py

Lines changed: 3 additions & 3 deletions
@@ -1593,19 +1593,19 @@ def test_table_values_dtypes_roundtrip(self):
15931593
with ensure_clean(self.path) as store:
15941594
df1 = DataFrame({'a': [1, 2, 3]}, dtype='f8')
15951595
store.append('df_f8', df1)
1596-
assert df1.dtypes == store['df_f8'].dtypes
1596+
assert_series_equal(df1.dtypes,store['df_f8'].dtypes)
15971597

15981598
df2 = DataFrame({'a': [1, 2, 3]}, dtype='i8')
15991599
store.append('df_i8', df2)
1600-
assert df2.dtypes == store['df_i8'].dtypes
1600+
assert_series_equal(df2.dtypes,store['df_i8'].dtypes)
16011601

16021602
# incompatible dtype
16031603
self.assertRaises(ValueError, store.append, 'df_i8', df1)
16041604

16051605
# check creation/storage/retrieval of float32 (a bit hacky to actually create them though)
16061606
df1 = DataFrame(np.array([[1],[2],[3]],dtype='f4'),columns = ['A'])
16071607
store.append('df_f4', df1)
1608-
assert df1.dtypes == store['df_f4'].dtypes
1608+
assert_series_equal(df1.dtypes,store['df_f4'].dtypes)
16091609
assert df1.dtypes[0] == 'float32'
16101610

16111611
# check with mixed dtypes
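The test changes in this hunk follow from the new behavior: `df1.dtypes == other.dtypes` yields a boolean Series, so a bare `assert` on it now raises. A minimal sketch, assuming a recent pandas where the helper is importable from `pandas.testing` (the diff itself uses the test module's own import), with two in-memory frames standing in for the HDF store round trip:

```python
import pandas as pd
from pandas.testing import assert_series_equal

df1 = pd.DataFrame({"a": [1, 2, 3]}, dtype="f8")
df2 = pd.DataFrame({"a": [1.0, 2.0, 3.0]})   # stand-in for the stored frame

# Element-wise comparison: one boolean per column, not a single bool.
elementwise = df1.dtypes == df2.dtypes

# Truth-testing that Series is what the old `assert` did implicitly.
try:
    bool(elementwise)
    bare_assert_raised = False
except ValueError:
    bare_assert_raised = True

# The replacement used in the diff: raises AssertionError on mismatch.
assert_series_equal(df1.dtypes, df2.dtypes)

print(elementwise.tolist(), bare_assert_raised)
```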

pandas/tests/test_frame.py

Lines changed: 1 addition & 4 deletions
@@ -10607,13 +10607,10 @@ def test_index_namedtuple(self):
1060710607
df = DataFrame([(1, 2), (3, 4)], index=index, columns=["A", "B"])
1060810608
self.assertEqual(df.ix[IndexType("foo", "bar")]["A"], 1)
1060910609

10610-
def test_bool_empty_nonzero(self):
10610+
def test_empty_nonzero(self):
1061110611
df = DataFrame([1, 2, 3])
10612-
self.assertTrue(bool(df))
1061310612
self.assertFalse(df.empty)
1061410613
df = DataFrame(index=['a', 'b'], columns=['c', 'd']).dropna()
10615-
self.assertFalse(bool(df))
10616-
self.assertFalse(bool(df.T))
1061710614
self.assertTrue(df.empty)
1061810615
self.assertTrue(df.T.empty)
1061910616
