Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Data API] Array Object Implementation #261

Merged
merged 29 commits into from
Apr 25, 2023

Conversation

roaffix
Copy link
Contributor

@roaffix roaffix commented Feb 5, 2023

Purpose

Data API documentation - https://data-apis.org/array-api/latest/purpose_and_scope.html

Change List

  • Array Object - class Array

    • Parameters
      • src -> x. Motivation: src is commonly used to name the source code folders and not as a parameter in the class. Other options like arr (is used in the class), a (not intuitive), or object (likely meant by ANY object, but it is not true in this case) seem like not a good fit, so I chose x is as it is an intuitive parameter to any function. Also, it occurs in the specification, e.g., here.
      • dims -> shape. Motivation: dims (or ndim) is more commonly used as int, and shape as tuple, so the change is understandable and should not cause any confusion.
      • is_device -> pointer_source. Motivation: add more intuition, already discussed with @syurkevi, but most likely is subject to change in further PRs with Device implementation.
      • dtype, offset, and strides are left unchanged due to their pretty clear purpose.
    • Remove BaseClass because it has only one purpose to set a default arr value as ctypes.c_void_p, so it was moved to the Array class.
    • Added mocks for required new operators that will be implemented in follow-up PRs
    • Minor logical changes behind C functions processing in the Python class operators wrappers
    • Minor changes in __repr__ to make Array more descriptive during the prototyping. Will be merged with display() method as numpy was selected as a state-of-the-art example
    • Renamed the element method to size as it is more common in data APIs.
    • Renamed the to_ctype method to to_ctype_array as it actually returns the C Array
    • Removed dims as has the same functionality as the shape method and the last one is required by the specification
    • Renamed numdims to ndim as it is shorter and no more mess with other dim-like synonyms included. And yes. it is mentioned in the specification
    • Removed the type method as it has no applied sense. The value can now be retrieved by accessing the specific Dtype value, e.g., Array().dtype.c_api_value (more about data types changes below)
    • Major code refactoring to provide a code style more intuitive and fits the PEP recommendations.
    • Other methods that were implemented in the original wrapper partially will be moved to the array_api. The whole list and key changes will be provided in Change List.
  • Data Types - class Dtype

    • Changed to dataclass (also, subject to change) because its main purpose was to be a "bridge" between python types and C types, so the batch of different dictionaries was reduced to classes with 4 parameters and a couple of helper methods
    • Parameters
      • typecode - previously to_dtype()
      • c_type - previously to_c_type()
      • typename - left unchanged
      • c_api_value - previously was an _Enum that was a match to C++ library dtype Enum. (P.S. naming was discussed with @syurkevi and I really liked that idea)
  • Other (subject to change in follow-up PRs)

    • Changed the implementation of dim4() which is now converted to class CShape. The main purpose was to convert Python shapes to C Shapes, so the logic remains the same, but the current solution helps to avoid extra code duplication and bugs during the type conversions.
    • Separated c_dim_t from the arch investigation
    • Added mocks for planned follow-up changes with Device, Dtype functions, utils (all() and any() methods), and ArrayFire custom backend solution
  • Testing

    • Added mypy typings that harshly burst code intuition and readability
    • Changed custom namings for types from ctypes. This adds more intuition and simplifies debugging process
    • Removed import * to simplify debugging process and avoid recursive re-importing as it meets in the original library
    • Added primitive unit tests
  • Dev Flow

    • Added CI
    • Added flake8 and autopep8 code style checkers
    • Added isort imports checker
    • Added flake8-quotes for quotes usage unification to avoid the mess with docstrings and comments
    • Added mypy for typing as dev requirements to the new dev flow sample
    • Added test suite with pytest
    • Added test coverage report (currently code coverage is about 83% - planned 100%)

Notes

This is the first PR among the series of Data API implementations. A lot of changes are subject to change in follow-ups and a bunch of bags may occur till the whole array_api implementation. Any suggestions, bug reports, and feature requests are highly welcome (but may be implemented sooner or later due to the marked plan of development).



class Array:
def __init__(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todo: move to asarray() function
follow constructor like done in numpy/etc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a custom array object, numpy.array_api.Array, which is returned by
all functions in this module. All functions in the array API namespace
implicitly assume that they will only receive this object as input. The only
way to create instances of this object is to use one of the array creation
functions. It does not have a public constructor on the object itself. The
object is a small wrapper class around numpy.ndarray. The main purpose of it
is to restrict the namespace of the array object to only those dtypes and
only those methods that are required by the spec, as well as to limit/change
certain behavior that differs in the spec. In particular:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR ought to describe a new structure with fewer changes to ArrayFire API. First of many, tbh :)
Changing init now will affect a major part of this PR and follow-up #262. Can we postpone these changes (that are crucial I agree) to the third one, so as to avoid huge code merge issues? Proposed changes in #262 will simplify the implementation of asarray method, thus I wanted to keep creation functions (https://data-apis.org/array-api/latest/API_specification/creation_functions.html) till the finished backend.

arrayfire/array_api/array_object.py Show resolved Hide resolved
# TODO
return NotImplemented

def __len__(self) -> int:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove, not part of the spec. Would be good to verify all of these and remove extra functions or make them private

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move/group all non-standard functions into separate region, at bottom? delineate with comments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now grouped by another principle - magic methods, attributes, and aux functions. Mixing it all under another group will lead us to a messy search across the source code. I'm proposing using built-in tags like #NOTE to highlight such methods with comments if it's necessary.

return out

@property
def size(self) -> int:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And still, we won't have a case where size equals None, but making the return Optional will lead us to major logical code changes if we want to keep typing work.
Mypy doesn't implement it in their source code as well https://github.com/numpy/numpy/blob/6f3e1f458e04d13bdd56cff5669f9fd96a25fb66/numpy/array_api/_array_object.py#L1088
My point is to keep int as an expected return and not try to imagine cases that likely never happen in the future with None return.
Or, let's switch to TDD and add valid test case for None return and then fix typing in the source code 🤷‍♂️

@@ -0,0 +1,5 @@
# Build requirements
wheel~=0.38.4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loosen version requirements

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already loosened :)
We stick to the 0.38.X version because it's not the latest stable release and we want to be sure that other minor version bumps won't affect our dev setup. ~= means we are OK with bug fixes in fixes versions, thus X variable changes won't affect us at all.
FYI https://peps.python.org/pep-0440/#compatible-release

setup.cfg Outdated Show resolved Hide resolved
@syurkevi syurkevi merged commit 94ebafa into arrayfire:master Apr 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants