Skip to content

gh-121459: Deferred LOAD_GLOBAL #123128

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Sep 13, 2024
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Include/internal/pycore_ceval.h
Original file line number Diff line number Diff line change
Expand Up @@ -271,7 +271,7 @@ PyAPI_FUNC(PyObject **) _PyObjectArray_FromStackRefArray(_PyStackRef *input, Py_
PyAPI_FUNC(void) _PyObjectArray_Free(PyObject **array, PyObject **scratch);

PyAPI_FUNC(PyObject *) _PyEval_GetANext(PyObject *aiter);
PyAPI_FUNC(PyObject *) _PyEval_LoadGlobal(PyObject *globals, PyObject *builtins, PyObject *name);
PyAPI_FUNC(void) _PyEval_LoadGlobalStackRef(PyObject *globals, PyObject *builtins, PyObject *name, _PyStackRef *writeto);
PyAPI_FUNC(PyObject *) _PyEval_GetAwaitable(PyObject *iterable, int oparg);
PyAPI_FUNC(PyObject *) _PyEval_LoadName(PyThreadState *tstate, _PyInterpreterFrame *frame, PyObject *name);

Expand Down
4 changes: 3 additions & 1 deletion Include/internal/pycore_dict.h
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ extern "C" {

#include "pycore_object.h" // PyManagedDictPointer
#include "pycore_pyatomic_ft_wrappers.h" // FT_ATOMIC_LOAD_SSIZE_ACQUIRE
#include "pycore_stackref.h" // _PyStackRef

// Unsafe flavor of PyDict_GetItemWithError(): no error checking
extern PyObject* _PyDict_GetItemWithError(PyObject *dp, PyObject *key);
Expand Down Expand Up @@ -100,10 +101,11 @@ extern void _PyDictKeys_DecRef(PyDictKeysObject *keys);
*/
extern Py_ssize_t _Py_dict_lookup(PyDictObject *mp, PyObject *key, Py_hash_t hash, PyObject **value_addr);
extern Py_ssize_t _Py_dict_lookup_threadsafe(PyDictObject *mp, PyObject *key, Py_hash_t hash, PyObject **value_addr);
extern Py_ssize_t _Py_dict_lookup_threadsafe_stackref(PyDictObject *mp, PyObject *key, Py_hash_t hash, _PyStackRef *value_addr);

extern Py_ssize_t _PyDict_LookupIndex(PyDictObject *, PyObject *);
extern Py_ssize_t _PyDictKeys_StringLookup(PyDictKeysObject* dictkeys, PyObject *key);
PyAPI_FUNC(PyObject *)_PyDict_LoadGlobal(PyDictObject *, PyDictObject *, PyObject *);
PyAPI_FUNC(void)_PyDict_LoadGlobalStackRef(PyDictObject *, PyDictObject *, PyObject *, _PyStackRef *);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PyAPI_FUNC(void)_PyDict_LoadGlobalStackRef(PyDictObject *, PyDictObject *, PyObject *, _PyStackRef *);
PyAPI_FUNC(_PyStackRef )_PyDict_LoadGlobalStackRef(PyDictObject *, PyDictObject *, PyObject *);

Likewise


/* Consumes references to key and value */
PyAPI_FUNC(int) _PyDict_SetItem_Take2(PyDictObject *op, PyObject *key, PyObject *value);
Expand Down
126 changes: 113 additions & 13 deletions Objects/dictobject.c
Original file line number Diff line number Diff line change
Expand Up @@ -1496,6 +1496,97 @@
return ix;
}

Py_ssize_t
_Py_dict_lookup_threadsafe_stackref(PyDictObject *mp, PyObject *key, Py_hash_t hash, _PyStackRef *value_addr)
{
PyDictKeysObject *dk;
DictKeysKind kind;
Py_ssize_t ix;

ensure_shared_on_read(mp);

dk = _Py_atomic_load_ptr(&mp->ma_keys);
kind = dk->dk_kind;

if (kind != DICT_KEYS_GENERAL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is complicated and duplicates logic elsewhere. I think it might be worth only supporting a a single case or two. For example, unicode-only and non-split dict. Fallback to _Py_dict_lookup_threadsafe + PyStackRef_FromPyObjectSteal for other cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is worth changing anything. LOAD_GLOBAL is specialized incredibly well, so this only applies to a rare slow path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not specializing at all in the free-threaded build so LOAD_GLOBAL is a scaling bottleneck now that fewer types are immortalized.

I don't think we want to wait for specialization to be made thread-safe, but we can treat this as a temporary measure and structure the code to reflect that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this case to fall back to _Py_dict_lookup_threadsafe

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks backwards to me. The common case should be DICT_KEYS_UNICODE for globals dictionaries, but this only handles DICT_KEYS_GENERAL efficiently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still too complicated. We only need to handle the common case of unicode keys. Globals are almost never split dictionaries.

I think this should look something like the following. You can refactor it further by extracting a _Py_TryXGetStackRef() that's like _Py_TryXGetRef but returns a _PyStackRef.

    PyDictKeysObject *dk = _Py_atomic_load_ptr(&mp->ma_keys);
    if (dk->dk_kind == DICT_KEYS_UNICODE && PyUnicode_CheckExact(key)) {
        Py_ssize_t ix = unicodekeys_lookup_unicode_threadsafe(dk, key, hash);
        if (ix == DKIX_EMPTY) {
            *value_addr = PyStackRef_NULL;
            return ix;
        }
        else if (ix >= 0) {
            PyObject **addr_of_value = &DK_UNICODE_ENTRIES(dk)[ix].me_value;
            PyObject *value = _Py_atomic_load_ptr(addr_of_value);
            if (value == NULL) {
                *value_addr = PyStackRef_NULL;
                return DKIX_EMPTY;
            }
            if (_Py_IsImmortal(value) || _PyObject_HasDeferredRefcount(value)) {
                *value_addr =  (_PyStackRef){ .bits = (uintptr_t)value | Py_TAG_DEFERRED };
                return ix;
            }
            if (_Py_TryIncrefCompare(addr_of_value, value)) {
                *value_addr = PyStackRef_FromPyObjectSteal(value);
                return ix;
            }
        }
    }

    PyObject *obj;
    Py_ssize_t ix = _Py_dict_lookup_threadsafe(mp, key, hash, &obj);
    if (ix >= 0 && obj != NULL) {
        *value_addr = PyStackRef_FromPyObjectSteal(obj);
    }
    else {
        *value_addr = PyStackRef_NULL;
    }
    return ix;

if (PyUnicode_CheckExact(key)) {
ix = unicodekeys_lookup_unicode_threadsafe(dk, key, hash);
}
else {
ix = unicodekeys_lookup_generic_threadsafe(mp, dk, key, hash);
}
if (ix == DKIX_KEY_CHANGED) {
goto read_failed;
}

if (ix >= 0) {
Copy link
Member

@picnixz picnixz Aug 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe put the case if (ix < 0) first to reflect the if (rc < 0) constructions that we usually do?

if (kind == DICT_KEYS_SPLIT) {
PyDictValues *values = _Py_atomic_load_ptr(&mp->ma_values);
if (values == NULL) {
goto read_failed;
}

uint8_t capacity = _Py_atomic_load_uint8_relaxed(&values->capacity);
if (ix >= (Py_ssize_t)capacity) {
goto read_failed;
}

*value_addr = PyStackRef_FromPyObjectNew(values->values[ix]);
if (PyStackRef_IsNull(*value_addr)) {
goto read_failed;
}
if (values != _Py_atomic_load_ptr(&mp->ma_values)) {
goto read_failed;
}
}
else {
*value_addr = PyStackRef_FromPyObjectNew(DK_UNICODE_ENTRIES(dk)[ix].me_value);
if (PyStackRef_IsNull(*value_addr)) {
goto read_failed;
}
if (dk != _Py_atomic_load_ptr(&mp->ma_keys)) {
goto read_failed;
}
}
}
else {
*value_addr = PyStackRef_NULL;
}
}
else {
ix = dictkeys_generic_lookup_threadsafe(mp, dk, key, hash);
if (ix == DKIX_KEY_CHANGED) {
goto read_failed;
}
if (ix >= 0) {
*value_addr = PyStackRef_FromPyObjectNew(DK_ENTRIES(dk)[ix].me_value);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure I understand the TSAN failures on dictionaries here. @colesbury do you have a clue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not thread-safe it the value is not deferred or immortal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait but doesn't PyStackRef_FromPyObjectNew incref the value if it's not deferred or immortal, which make it thread safe? Or maybe I'm missing something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling Py_INCREF() on a value that might be concurrently freed is not thread-safe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh okay that makes sense now. So one possible solution would be to introduce another variant that tries to incref and is allowed to fail? Like in https://github.com/python/cpython/blob/main/Objects/dictobject.c#L1435

Or we can check if it's deferred/immortal, and only if it's not we do the TryXGetRef thing, what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, do the check that it's deferred/immortal and if not do the TryXGetRef thing.

I think it's best to write it out explicitly first. If there are other uses of it, we can refactor it into a function later.

if (PyStackRef_IsNull(*value_addr)) {
goto read_failed;
}
if (dk != _Py_atomic_load_ptr(&mp->ma_keys)) {
goto read_failed;
}
}
else {
*value_addr = PyStackRef_NULL;
}
}

return ix;

PyObject *value;
read_failed:
// In addition to the normal races of the dict being modified the _Py_TryXGetRef
// can all fail if they don't yet have a shared ref count. That can happen here
// or in the *_lookup_* helper. In that case we need to take the lock to avoid
// mutation and do a normal incref which will make them shared.
Py_BEGIN_CRITICAL_SECTION(mp);
ix = _Py_dict_lookup(mp, key, hash, &value);
*value_addr = value == NULL ? PyStackRef_NULL : PyStackRef_FromPyObjectNew(value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you use the _Py_dict_lookup_threadsafe_stackref helper here since you acquired the lock? This would save the PyObject *value declaration (and you could consider inlining the helper actually)

Py_END_CRITICAL_SECTION();
return ix;
}

#else // Py_GIL_DISABLED

Py_ssize_t
Expand All @@ -1506,6 +1597,15 @@
return ix;
}

Py_ssize_t
_Py_dict_lookup_threadsafe_stackref(PyDictObject *mp, PyObject *key, Py_hash_t hash, _PyStackRef *value_addr)
{
PyObject *val;
Py_ssize_t ix = _Py_dict_lookup(mp, key, hash, &val);
*value_addr = value == NULL ? PyStackRef_NULL : PyStackRef_FromPyObjectNew(value);

Check failure on line 1605 in Objects/dictobject.c

View workflow job for this annotation

GitHub Actions / Address sanitizer

‘value’ undeclared (first use in this function); did you mean ‘val’?

Check failure on line 1605 in Objects/dictobject.c

View workflow job for this annotation

GitHub Actions / Hypothesis tests on Ubuntu

‘value’ undeclared (first use in this function); did you mean ‘val’?

Check failure on line 1605 in Objects/dictobject.c

View workflow job for this annotation

GitHub Actions / Windows (free-threading) / build (arm64)

'value': undeclared identifier [D:\a\cpython\cpython\PCbuild\_freeze_module.vcxproj]

Check warning on line 1605 in Objects/dictobject.c

View workflow job for this annotation

GitHub Actions / Windows (free-threading) / build (arm64)

'==': 'int' differs in levels of indirection from 'void *' [D:\a\cpython\cpython\PCbuild\_freeze_module.vcxproj]

Check failure on line 1605 in Objects/dictobject.c

View workflow job for this annotation

GitHub Actions / Windows (free-threading) / build (arm64)

'value': undeclared identifier [D:\a\cpython\cpython\PCbuild\_freeze_module.vcxproj]
return ix;
}

#endif

int
Expand Down Expand Up @@ -2391,31 +2491,31 @@
* key hash failed, key comparison failed, ...). Return NULL if the key doesn't
* exist. Return the value if the key exists.
*
* Returns a new reference.
* Stores a new stack reference.
*/
PyObject *
_PyDict_LoadGlobal(PyDictObject *globals, PyDictObject *builtins, PyObject *key)
void
_PyDict_LoadGlobalStackRef(PyDictObject *globals, PyDictObject *builtins, PyObject *key, _PyStackRef *res)
{
Py_ssize_t ix;
Py_hash_t hash;
PyObject *value;

hash = _PyObject_HashFast(key);
if (hash == -1) {
return NULL;
*res = PyStackRef_NULL;
}

/* namespace 1: globals */
ix = _Py_dict_lookup_threadsafe(globals, key, hash, &value);
if (ix == DKIX_ERROR)
return NULL;
if (ix != DKIX_EMPTY && value != NULL)
return value;
ix = _Py_dict_lookup_threadsafe_stackref(globals, key, hash, res);
if (ix == DKIX_ERROR) {
*res = PyStackRef_NULL;
}
if (ix != DKIX_EMPTY && !PyStackRef_IsNull(*res)) {
return;
}

/* namespace 2: builtins */
ix = _Py_dict_lookup_threadsafe(builtins, key, hash, &value);
assert(ix >= 0 || value == NULL);
return value;
ix = _Py_dict_lookup_threadsafe_stackref(builtins, key, hash, res);
assert(ix >= 0 || PyStackRef_IsNull(*res));
}

/* Consumes references to key and value */
Expand Down
25 changes: 13 additions & 12 deletions Python/bytecodes.c
Original file line number Diff line number Diff line change
Expand Up @@ -1438,21 +1438,21 @@ dummy_func(

inst(LOAD_FROM_DICT_OR_GLOBALS, (mod_or_class_dict -- v)) {
PyObject *name = GETITEM(FRAME_CO_NAMES, oparg);
PyObject *v_o;
int err = PyMapping_GetOptionalItem(PyStackRef_AsPyObjectBorrow(mod_or_class_dict), name, &v_o);
if (err < 0) {
PyObject *v_o = NULL;
if (PyMapping_GetOptionalItem(PyStackRef_AsPyObjectBorrow(mod_or_class_dict), name, &v_o) < 0) {
ERROR_NO_POP();
}
if (v_o == NULL) {
if (PyDict_CheckExact(GLOBALS())
&& PyDict_CheckExact(BUILTINS()))
{
v_o = _PyDict_LoadGlobal((PyDictObject *)GLOBALS(),
(PyDictObject *)BUILTINS(),
name);
if (v_o == NULL) {
_PyDict_LoadGlobalStackRef((PyDictObject *)GLOBALS(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inconsistent with PyMapping_GetOptionalItem above.
I think it would be better to change the whole instruction to use stack refs, or leave it as it is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed stackrefs from the instruction to leave it as is.

(PyDictObject *)BUILTINS(),
name,
STACK_ENTRY(v));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does STACK_ENTRY do? Does it have any side effects, or is it just equivalent to &v?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Practically equivalent to &stack_pointer[whatever_v_index_is]. It has no side effects.

if (PyStackRef_IsNull(v)) {
if (!_PyErr_Occurred(tstate)) {
/* _PyDict_LoadGlobal() returns NULL without raising
/* _PyDict_LoadGlobalStackRef() sets NULL without raising
* an exception if the key doesn't exist */
_PyEval_FormatExcCheckArg(tstate, PyExc_NameError,
NAME_ERROR_MSG, name);
Expand All @@ -1476,8 +1476,10 @@ dummy_func(
}
}
}
if (v_o != NULL) {
v = PyStackRef_FromPyObjectSteal(v_o);
}
DECREF_INPUTS();
v = PyStackRef_FromPyObjectSteal(v_o);
}

inst(LOAD_NAME, (-- v)) {
Expand Down Expand Up @@ -1507,10 +1509,9 @@ dummy_func(

op(_LOAD_GLOBAL, ( -- res, null if (oparg & 1))) {
PyObject *name = GETITEM(FRAME_CO_NAMES, oparg>>1);
PyObject *res_o = _PyEval_LoadGlobal(GLOBALS(), BUILTINS(), name);
ERROR_IF(res_o == NULL, error);
_PyEval_LoadGlobalStackRef(GLOBALS(), BUILTINS(), name, STACK_ENTRY(res));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two suggestions:

  1. Make _PyEval_LoadGlobalStackRef and _PyDict_LoadGlobalStackRef return int error codes
  2. Use &STACK_ENTRY(res) because it makes it more clear that the value is a pointer and because it means the STACK_ENTRY() transformation preserves the type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will apply 2. but not 1. The reasoning is to keep consistency with _PyEval_LoadGlobal (non-stackref version).

ERROR_IF(PyStackRef_IsNull(res), error);
null = PyStackRef_NULL;
res = PyStackRef_FromPyObjectSteal(res_o);
}

macro(LOAD_GLOBAL) =
Expand Down
20 changes: 11 additions & 9 deletions Python/ceval.c
Original file line number Diff line number Diff line change
Expand Up @@ -3072,15 +3072,14 @@ _PyEval_GetANext(PyObject *aiter)
return awaitable;
}

PyObject *
_PyEval_LoadGlobal(PyObject *globals, PyObject *builtins, PyObject *name)
void
_PyEval_LoadGlobalStackRef(PyObject *globals, PyObject *builtins, PyObject *name, _PyStackRef *writeto)
{
PyObject *res;
if (PyDict_CheckExact(globals) && PyDict_CheckExact(builtins)) {
res = _PyDict_LoadGlobal((PyDictObject *)globals,
_PyDict_LoadGlobalStackRef((PyDictObject *)globals,
(PyDictObject *)builtins,
name);
if (res == NULL && !PyErr_Occurred()) {
name, writeto);
if (PyStackRef_IsNull(*writeto) && !PyErr_Occurred()) {
/* _PyDict_LoadGlobal() returns NULL without raising
* an exception if the key doesn't exist */
_PyEval_FormatExcCheckArg(PyThreadState_GET(), PyExc_NameError,
Expand All @@ -3090,22 +3089,25 @@ _PyEval_LoadGlobal(PyObject *globals, PyObject *builtins, PyObject *name)
else {
/* Slow-path if globals or builtins is not a dict */
/* namespace 1: globals */
PyObject *res;
if (PyMapping_GetOptionalItem(globals, name, &res) < 0) {
return NULL;
*writeto = PyStackRef_NULL;
return;
}
if (res == NULL) {
/* namespace 2: builtins */
if (PyMapping_GetOptionalItem(builtins, name, &res) < 0) {
return NULL;
*writeto = PyStackRef_NULL;
return;
}
if (res == NULL) {
_PyEval_FormatExcCheckArg(
PyThreadState_GET(), PyExc_NameError,
NAME_ERROR_MSG, name);
}
}
*writeto = PyStackRef_FromPyObjectSteal(res);
}
return res;
}

PyObject *
Expand Down
6 changes: 3 additions & 3 deletions Python/executor_cases.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

24 changes: 13 additions & 11 deletions Python/generated_cases.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading
Loading