Skip to content

gh-124529: Fix _strptime to make %c/%x accept a year with fewer digits #124778

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 25 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 11 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions Doc/library/datetime.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2634,6 +2634,17 @@ Notes:
example, "month/day/year" versus "day/month/year"), and the output may
contain non-ASCII characters.

.. versionchanged:: 3.14
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as far as i know we use next construction

Suggested change
.. versionchanged:: 3.14
.. versionchanged:: next

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and I think you could mark many comments and suggestions as resolved, otherwise they can get in the way of reading and seem ignored :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and I think you could mark many comments and suggestions as resolved, otherwise they can get in the way of reading and seem ignored :)

Shouldn't it be done by the reviewer who asked a question?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can mark as solved.
at least I can't see this button as the author of suggestion,
so I think only you can close them, especially if there are no plans to continue this topic

The :meth:`~.datetime.strptime` method, if used with the ``%c``
or ``%x`` format code, no longer requires the *year* part of the
input to be zero-padded to the usual width (which is either 4 or
2 digits, depending on the format code and current locale). In
previous versions, a :exc:`ValueError` was raised if a shorter
*year*, not zero-padded to the 2- or 4-digit width as appropriate,
was part of the input (and it is worth noting that, depending
on the platform/locale, such inputs may be produced by using
:meth:`~.datetime.strftime` with ``%c`` or ``%x``).

(2)
The :meth:`~.datetime.strptime` method can parse years in the full [1, 9999] range, but
years < 1000 must be zero-filled to 4-digit width.
Expand Down
30 changes: 26 additions & 4 deletions Lib/_strptime.py
Original file line number Diff line number Diff line change
Expand Up @@ -199,8 +199,6 @@ def __init__(self, locale_time=None):
'V': r"(?P<V>5[0-3]|0[1-9]|[1-4]\d|\d)",
# W is set below by using 'U'
'y': r"(?P<y>\d\d)",
#XXX: Does 'Y' need to worry about having less or more than
# 4 digits?
'Y': r"(?P<Y>\d\d\d\d)",
'z': r"(?P<z>[+-]\d\d:?[0-5]\d(:?[0-5]\d(\.\d{1,6})?)?|(?-i:Z))",
'A': self.__seqToRE(self.locale_time.f_weekday, 'A'),
Expand All @@ -213,8 +211,10 @@ def __init__(self, locale_time=None):
'Z'),
'%': '%'})
base.__setitem__('W', base.__getitem__('U').replace('U', 'W'))
base.__setitem__('c', self.pattern(self.locale_time.LC_date_time))
base.__setitem__('x', self.pattern(self.locale_time.LC_date))
base.__setitem__(
'c', self.__pattern_with_lax_year(self.locale_time.LC_date_time))
base.__setitem__(
'x', self.__pattern_with_lax_year(self.locale_time.LC_date))
base.__setitem__('X', self.pattern(self.locale_time.LC_time))

def __seqToRE(self, to_convert, directive):
Expand All @@ -236,6 +236,26 @@ def __seqToRE(self, to_convert, directive):
regex = '(?P<%s>%s' % (directive, regex)
return '%s)' % regex

def __pattern_with_lax_year(self, format):
"""Like pattern(), but making %Y and %y accept also fewer digits.

Necessary to ensure that strptime() is able to parse strftime()'s
output when %c or %x is used -- considering that for some locales
and platforms (e.g., 'C.UTF-8' on Linux), formatting with either
%c or %x may produce a year number representation that is shorter
than the usual four or two digits, if the number is small enough
(e.g., '999' rather than `0999', or '9' rather than of '09').

Note that this helper is intended to be used to prepare regex
patterns only for the `%c` and `%x` format codes (and *not* for
`%y`, `%Y` or `%G`).

"""
pattern = self.pattern(format)
pattern = pattern.replace(self['Y'], r"(?P<Y>\d{1,4})")
pattern = pattern.replace(self['y'], r"(?P<y>\d{1,2})")
return pattern

def pattern(self, format):
"""Return regex pattern for the format string.

Expand Down Expand Up @@ -374,6 +394,7 @@ def _strptime(data_string, format="%a %b %d %H:%M:%S %Y"):
# U, W
# worthless without day of the week
if group_key == 'y':
# 1 or 2 digits (1 only for directive c or x; see TimeRE.__init__)
year = int(found_dict['y'])
# Open Group specification for strptime() states that a %y
#value in the range of [00, 68] is in the century 2000, while
Expand All @@ -383,6 +404,7 @@ def _strptime(data_string, format="%a %b %d %H:%M:%S %Y"):
else:
year += 1900
elif group_key == 'Y':
# 1-4 digits (1-3 only for directive c or x; see TimeRE.__init__)
year = int(found_dict['Y'])
elif group_key == 'G':
iso_year = int(found_dict['G'])
Expand Down
77 changes: 77 additions & 0 deletions Lib/test/datetimetester.py
Original file line number Diff line number Diff line change
Expand Up @@ -2124,6 +2124,83 @@ def test_fromisocalendar_type_errors(self):
with self.assertRaises(TypeError):
self.theclass.fromisocalendar(*isocal)

def test_strftime_strptime_roundtrip_concerning_locale_specific_year(self):
concerned_formats = '%c', '%x'

def run_subtest():
reason = (f"test strftime/strptime roundtrip concerning "
f"locale-specific year representation "
f"- for {fmt=} and {year=}")
fail_msg = f"{reason} - failed"
initial = expected = self.theclass.strptime(f'{year:04}', '%Y')
with self.subTest(reason=reason):
formatted = initial.strftime(fmt)
try:
parsed = self.theclass.strptime(formatted, fmt)
except ValueError as exc:
# gh-124529
self.fail(f"{fail_msg}; parsing error: {exc!r}")
self.assertEqual(parsed, expected, fail_msg)

sample = self.theclass.strptime(f'1999-03-17', '%Y-%m-%d')
for fmt in concerned_formats:
with self.subTest(fmt=fmt):
sample_str = sample.strftime(fmt)
if '1999' in sample_str:
for year in [
1, 9, 10, 99, 100, 999, # <- gh-124529
1000, 1410, 1989, 2024, 2095, 9999,
]:
run_subtest()
elif '99' in sample_str:
for year in [
1969, 1999,
2000, 2001, 2009, # <- gh-124529
2068,
]:
run_subtest()
else:
self.fail(f"it seems that {sample.strftime(fmt)=} "
f"does not include year={sample.year!r} in "
f"any expected format (is there something "
f"severely wrong with current locale?)")

def test_strptime_accepting_locale_specific_year_with_fewer_digits(self):
concerned_formats = '%c', '%x'

def run_subtest():
input_str = sample_str.replace(sample_digits, year_digits)
reason = (f"test strptime accepting locale-specific "
f"year representation with fewer digits "
f"- for {fmt=} and {input_str=} ({year=})")
fail_msg = f"{reason} - failed"
expected = sample.replace(year=year)
with self.subTest(reason=reason):
try:
parsed = self.theclass.strptime(input_str, fmt)
except ValueError as exc:
# gh-124529
self.fail(f"{fail_msg}; parsing error: {exc!r}")
self.assertEqual(parsed, expected, fail_msg)

sample = self.theclass.strptime(f'1999-03-17', '%Y-%m-%d')
for fmt in concerned_formats:
with self.subTest(fmt=fmt):
sample_str = sample.strftime(fmt)
if (sample_digits := '1999') in sample_str:
for year in [1, 9, 10, 99, 100, 999]:
year_digits = str(year)
run_subtest()
elif (sample_digits := '99') in sample_str:
for year in [2000, 2001, 2009]:
year_digits = str(year - 2000)
run_subtest()
else:
self.fail(f"it seems that {sample.strftime(fmt)=} "
f"does not include year={sample.year!r} in "
f"any expected format (is there something "
f"severely wrong with current locale?)")


#############################################################################
# datetime tests
Expand Down
135 changes: 131 additions & 4 deletions Lib/test/test_strptime.py
Original file line number Diff line number Diff line change
Expand Up @@ -161,11 +161,46 @@ def test_compile(self):
for directive in ('a','A','b','B','c','d','G','H','I','j','m','M','p',
'S','u','U','V','w','W','x','X','y','Y','Z','%'):
fmt = "%d %Y" if directive == 'd' else "%" + directive
input_string = time.strftime(fmt)
compiled = self.time_re.compile(fmt)
found = compiled.match(time.strftime(fmt))
self.assertTrue(found, "Matching failed on '%s' using '%s' regex" %
(time.strftime(fmt),
compiled.pattern))
found = compiled.match(input_string)
self.assertTrue(found,
(f"Matching failed on '{input_string}' "
f"using '{compiled.pattern}' regex"))
for directive in ('c', 'x'):
fmt = "%" + directive
with self.subTest(f"{fmt!r} should match input containing "
f"year with fewer digits than usual"):
# gh-124529
params = _input_str_and_expected_year_for_few_digits_year(fmt)
if params is None:
self.fail(f"it seems that using {fmt=} results in value "
f"which does not include year representation "
f"in any expected format (is there something "
f"severely wrong with current locale?)")
input_string, _ = params
compiled = self.time_re.compile(fmt)
found = compiled.match(input_string)
self.assertTrue(found,
(f"Matching failed on '{input_string}' "
f"using '{compiled.pattern}' regex"))
for directive in ('y', 'Y', 'G'):
fmt = "%" + directive
with self.subTest(f"{fmt!r} should not match input containing "
f"year with fewer digits than usual"):
params = _input_str_and_expected_year_for_few_digits_year(fmt)
if params is None:
self.fail(f"it seems that using {fmt=} results in value "
f"which does not include year representation "
f"in any expected format (is there something "
f"severely wrong with current locale?)")
input_string, _ = params
compiled = self.time_re.compile(fmt)
found = compiled.match(input_string)
self.assertFalse(found,
(f"Matching unexpectedly succeeded "
f"on '{input_string}' using "
f"'{compiled.pattern}' regex"))

def test_blankpattern(self):
# Make sure when tuple or something has no values no regex is generated.
Expand Down Expand Up @@ -299,6 +334,27 @@ def helper(self, directive, position):
(directive, strf_output, strp_output[position],
self.time_tuple[position]))

def helper_for_directives_accepting_few_digits_year(self, directive):
fmt = "%" + directive
params = _input_str_and_expected_year_for_few_digits_year(fmt)
if params is None:
self.fail(f"it seems that using {fmt=} results in value "
f"which does not include year representation "
f"in any expected format (is there something "
f"severely wrong with current locale?)")
input_string, expected_year = params
try:
output_year = _strptime._strptime(input_string, fmt)[0][0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is't possible for _strptime._strptime to return a one-dim list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It always returns a 3-tuple, the 0th item of which is a 9-tuple, the 0th item of which is a int representing the year number.

except ValueError as exc:
# See: gh-124529
self.fail(f"testing of {directive!r} directive failed; "
f"{input_string!r} -> exception: {exc!r}")
else:
self.assertEqual(output_year, expected_year,
(f"testing of {directive!r} directive failed; "
f"{input_string!r} -> output including year "
f"{output_year!r} != {expected_year!r}"))

def test_year(self):
# Test that the year is handled properly
for directive in ('y', 'Y'):
Expand All @@ -312,6 +368,17 @@ def test_year(self):
"'y' test failed; passed in '%s' "
"and returned '%s'" % (bound, strp_output[0]))

def test_bad_year(self):
for directive, bad_inputs in (
('y', ('9', '100', 'ni')),
('Y', ('7', '42', '999', '10000', 'SPAM')),
):
fmt = "%" + directive
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can use f-string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both ways are valid; the one I chose is more consistent with the existing code in this module.

for input_val in bad_inputs:
with self.subTest(directive=directive, input_val=input_val):
with self.assertRaises(ValueError):
_strptime._strptime_time(input_val, fmt)

def test_month(self):
# Test for month directives
for directive in ('B', 'b', 'm'):
Expand Down Expand Up @@ -454,11 +521,21 @@ def test_date_time(self):
for position in range(6):
self.helper('c', position)

def test_date_time_accepting_few_digits_year(self): # gh-124529
# Test %c directive with input containing year
# number consisting of fewer digits than usual
self.helper_for_directives_accepting_few_digits_year('c')

def test_date(self):
# Test %x directive
for position in range(0,3):
self.helper('x', position)

def test_date_accepting_few_digits_year(self): # gh-124529
# Test %x directive with input containing year
# number consisting of fewer digits than usual
self.helper_for_directives_accepting_few_digits_year('x')

def test_time(self):
# Test %X directive
for position in range(3,6):
Expand Down Expand Up @@ -769,5 +846,55 @@ def test_TimeRE_recreation_timezone(self):
_strptime._strptime_time(oldtzname[1], '%Z')


def _input_str_and_expected_year_for_few_digits_year(fmt):
# This helper, for the given format string (fmt), returns a 2-tuple:
# (<strptime input string>, <expected year>)
# where:
# * <strptime input string> -- is a `strftime(fmt)`-result-like str
# containing a year number which is *shorter* than the usual four
# or two digits (namely: here the contained year number consist of
# one digit: 7; that's an arbitrary choice);
# * <expected year> -- is an int representing the year number that
# is expected to be part of the result of a `strptime(<strptime
# input string>, fmt)` call (namely: either 7 or 2007, depending
# on the given format string and current locale...); however, it
# is None if <strptime input string> does *not* contain the year
# part (for the given format string and current locale).

# 1. Prepare auxiliary *magic* time data (note that the magic values
# we use here are guaranteed to be compatible with `time.strftime()`,
# and are also intended to be well distinguishable within a formatted
# string, thanks to the fact that the amount of overloaded numbers is
# minimized, as in `_strptime.LocaleTime.__calc_date_time()`):
magic_year = 1999
magic_tt = (magic_year, 3, 17, 22, 44, 55, 2, 76, 0)

# 2. Pick an arbitrary year number representation that
# is always *shorter* than the usual four or two digits:
input_year_str = '7'

# 3. Obtain the resultant 2-tuple:

input_string = time.strftime(fmt, magic_tt)
expected_year = None

magic_4digits = str(magic_year)
if found_4digits := (magic_4digits in input_string):
# `input_string` contains up-to-4-digit year representation
input_string = input_string.replace(magic_4digits, input_year_str)
expected_year = int(input_year_str)

magic_2digits = str(magic_year)[-2:]
if magic_2digits in input_string:
# `input_string` contains up-to-2-digit year representation
if found_4digits:
raise RuntimeError(f'case not supported by this helper: {fmt=} '
f'(includes both 2-digit and 4-digit year)')
input_string = input_string.replace(magic_2digits, input_year_str)
expected_year = 2000 + int(input_year_str)

return input_string, expected_year


if __name__ == '__main__':
unittest.main()
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
Fix :meth:`datetime.datetime.strptime`, :meth:`datetime.date.strptime`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should write this

Fixed ``%c``/``%x`` in :meth:`datetime.datetime.strptime`, :meth:`datetime.date.strptime` and :func:`time.strptime` to accept years with  fewer digits than the usual 4 or 2 (not zero-padded).

This will be more concise and intuitive.

Copy link
Contributor Author

@zuo zuo Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've shortened it; yet I'd prefer to keep the information that the fix prevents the error for certain cases of a strftime/strptime round trip (this is the key element of the gh-124529 issue's description).

and :func:`time.strptime` (by altering the underlying mechanism) to make
``%c``/``%x`` accept year numbers with fewer digits than the usual 4 or
2 (not zero-padded), so that certain ``strftime/strptime`` round trips
(e.g., ``date.strptime(d.strftime('%c'), '%c'))`` for ``d.year < 1000``,
with C-like locales on Linux) no longer raise :exc:`!ValueError`. Patch
by Jan Kaliszewski.
Loading