Syntax for typing multi-dimensional arrays #516

@shoyer

As part of the larger project for multi-dimensional arrays (#513), one of the first questions I would like to settle is what the syntax for typing dtypes and shapes should look like.

Both dtype and shape should be optional, and it should be possible to define multi-dimensional arrays for which either or both of these are generic:

  • dtype: indicates the data type for array elements, e.g., np.float64
  • shape: indicates the shape of the multi-dimensional array, a tuple of zero or more integers. We would like to support fixed-size (integer) and variable-sized dimensions, as well as a variable number of dimensions. These are most naturally represented by indexing with a variadic number of integer, variable, colon : and/or ellipsis ... arguments, e.g., NDArray[1, N, :, ...] for an array whose dimensions are of size 1, size N, and arbitrary size, followed by zero or more dimensions of arbitrary size.
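To make the indexing syntax concrete, here is a minimal runtime sketch of what Python hands to a class when it is subscripted as NDArray[1, N, :, ...]. The NDArray class and the type variable N below are hypothetical stand-ins, not NumPy's actual types, and the sketch only shows that the expression parses and evaluates at runtime; whether a static type checker would accept it is a separate question.

```python
from typing import TypeVar

# Hypothetical type variable standing for a variable-sized dimension.
N = TypeVar("N")


class NDArray:
    """Hypothetical stand-in for an array type; not numpy.ndarray."""

    def __class_getitem__(cls, item):
        # A single subscript argument arrives bare; several arrive as a tuple.
        # Normalise both cases to a tuple of dimension specifiers.
        return item if isinstance(item, tuple) else (item,)


# Integer, variable, colon and ellipsis arguments in one subscript.
dims = NDArray[1, N, :, ...]

# The colon becomes a slice object and ... becomes the Ellipsis singleton.
print(dims)
```

Under these assumptions, each kind of argument is distinguishable at runtime: an int, a TypeVar, a slice and Ellipsis.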

For NumPy, ideally we would like to add basic typing support for dtype (using Generic) even before typing for shape is possible. But we'd like to know what the ultimate syntax should look like, so we don't paint ourselves into a corner.
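As a rough illustration of dtype-only typing with Generic, the sketch below uses a hypothetical Array class and a hypothetical Float64 marker class in place of np.ndarray and np.float64. It is not how NumPy defines its types; it only shows the shape of the annotation.

```python
from typing import Generic, TypeVar

# Hypothetical type variable for the element data type.
DType = TypeVar("DType")


class Float64:
    """Hypothetical marker class standing in for np.float64."""


class Array(Generic[DType]):
    """Hypothetical array type, generic over dtype only (no shape)."""

    def __init__(self, data):
        self.data = list(data)


def total(a: Array[Float64]) -> float:
    # The annotation constrains the dtype but says nothing about the shape.
    return float(sum(a.data))


result = total(Array([1.0, 2.5]))
```

With a single type parameter like this, Array[Float64] carries no shape information at all, which is the reading the question below asks about.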

One key question: can we safely rely on a single generic argument for the dtype (e.g., np.ndarray[np.float64]) to indicate an array without any shape constraints?

My doc (same as in the master issue) considers a number of options under the "Possible syntax" section.

So far, I think the best option is some variation of "two generic arguments", one for dtype and one for shape. But this could quickly get annoyingly verbose when sprinkled all over a codebase, e.g., np.ndarray[np.float32, Shaped[..., N, M]]:

  • It would be nice to support syntax like np.ndarray[np.float32] (the multi-dimensional equivalent of List[float]) as an alias for np.ndarray[np.float32, Any], but we don't yet have optional arguments for generics (variadic arguments are a somewhat awkward fit for a single argument).
  • It would also be nice to allow omitting Shaped[], e.g., by writing dimensions as variadic generics to the array type like np.ndarray[np.float32, ..., N, M]. One possible ambiguity is how to specify scalar arrays: np.ndarray[np.float32,] looks very similar to np.ndarray[np.float32]. But scalar arrays are rare enough that this could potentially be resolved by disallowing np.ndarray[np.float32,] and requiring np.ndarray[np.float32, Shaped[()]] instead.
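The scalar-array ambiguity can be seen directly in how Python evaluates subscripts. In this sketch, NDArray, Shaped and float32 are hypothetical placeholder classes, not NumPy's; the point is only that a trailing comma changes what the class receives, even though the two spellings look nearly identical on the page.

```python
class float32:
    """Hypothetical marker class standing in for np.float32."""


class NDArray:
    """Hypothetical stand-in that returns its subscript argument unchanged."""

    def __class_getitem__(cls, item):
        return item


class Shaped:
    """Hypothetical shape wrapper that returns its subscript argument unchanged."""

    def __class_getitem__(cls, item):
        return item


# Without a trailing comma the class receives the bare argument.
bare = NDArray[float32]

# With a trailing comma it receives a one-element tuple.
trailing = NDArray[float32,]

# An explicit empty shape is unambiguous: the wrapper receives an empty tuple.
scalar_shape = Shaped[()]
```

So the two forms are distinguishable to the runtime, but easy for a reader to confuse, which is the motivation for requiring the explicit empty-shape spelling.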

Labels: topic: feature (Discussions about new features for Python's type annotations)
