Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC and WIP: Add MypyTypeInferenceProvider #831

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

rominf
Copy link

@rominf rominf commented Dec 7, 2022

Summary

This change is RFC and WIP (please read whole change message).
Commit message is basically the same ATM, no need to read it for now.

Add MypyTypeInferenceProvider as an alternative for
TypeInferenceProvider. The provider infers types using mypy as
library. The only requirement for the usage is to have the latest mypy
installed. Types inferred are mypy types, since mypy type system is well
designed, to avoid the conversion, and also to keep it simple. For
compatibility and extensibility reasons, these types are stored in
separate field MypyType.mypy_type.

Let's assume we have the following code in the file x.py which we want
to inspect:

x = [42]

s = set()

from enum import Enum

class E(Enum):
    f = "f"

e = E.f

Then to play with mypy types one should use the code like:

import libcst as cst

from libcst.metadata import MypyTypeInferenceProvider

filename = "x.py"
module = cst.parse_module(open(filename).read())
cache = MypyTypeInferenceProvider.gen_cache(".", [filename])[filename]
wrapper = cst.MetadataWrapper(
    module,
    cache={MypyTypeInferenceProvider: cache},
)

mypy_type = wrapper.resolve(MypyTypeInferenceProvider)
x_name_node = wrapper.module.body[0].body[0].targets[0].target
set_call_node = wrapper.module.body[1].body[0].value
e_name_node = wrapper.module.body[-1].body[0].targets[0].target

print(mypy_type[x_name_node])
 # prints: builtins.list[builtins.int]

print(mypy_type[x_name_node].fullname)
 # prints: builtins.list[builtins.int]

print(mypy_type[x_name_node].mypy_type.type.fullname)
 # prints: builtins.list

print(mypy_type[x_name_node].mypy_type.args)
 # prints: (builtins.int,)

print(mypy_type[x_name_node].mypy_type.type.bases[0].type.fullname)
 # prints: typing.MutableSequence

print(mypy_type[set_call_node])
 # prints: builtins.set

print("issuperset" in mypy_type[set_call_node].mypy_type.names)
 # prints: True

print(mypy_type[set_call_node.func])
 # prints: typing.Type[builtins.set]

print(mypy_type[e_name_node].mypy_type.type.is_enum)
 # prints: True

Why?

  1. TypeInferenceProvider requires pyre (pyre-check on PyPI) to be
    installed. mypy is more popular than pyre. If the organization uses
    mypy already (which is almost always the case), it may be difficult
    to assure colleagues (including security team) that "we need yet
    another type checker".
  2. Even though it is possible to run pyre without watchman installation,
    this is not advertised. watchman installation is not always possible
    because of system requirements, or because of the security
    requirements like "we install only our favorite GNU/Linux
    distribution packages".
  3. TypeInferenceProvider usage requires pyre start command to be run
    before the execution, and pyre stop - after the execution. This may
    be inconvenient, especially for the cases when pyre was not used
    before.
  4. Types produced by pyre in TypeInferenceProvider are just strings.
    For example, it's not easily possible to infer that some variable is
    enum instance. MypyTypeInferenceProvider makes it easy, see the
    code above.

Drawbacks:

  1. Speed. mypy is slower than pyre, so is MypyTypeInferenceProvider
    comparing to TypeInferenceProvider.
    How to partially solve this:
    1. Implement AST tree caching in mypy. It may be difficult, however
      this will lead to speed improvements for all the projects that use
      this functionality.
    2. Implement inferred types caching inside LibCST. As far as I know,
      no caching at all is implemented inside LibCST, which is the
      prerequisite for inferred types caching, so the task is big.
    3. Implement LibCST CST to mypy AST. I am not sure if this possible
      at all. Even if it is possible, the task is huge.
  2. Two providers are doing similar things in LibCST will be present,
    this can potentially lead to the situation when there is a need
    install two type checkers to get all codemods from the library
    running.
    Alternatives considered:
    1. Put MypyTypeInferenceProvider inside separate library (say,
      LibCST-mypy or libcst-mypy on PyPI). This will explicitly
      separate MypyTypeInferenceProvider from the rest of LibCST.
      Drawbacks:
      1. The need to maintain separate library.
      2. Limited fame (people need to know that the library exists).
      3. Since some codemods cannot be implemented easily without the
        library, for example, if-elif-else to match converter
        (it needs powerful type inference), they are doomed to not be
        shipped with LibCST.
    2. Implement base class for inferred type, which inherits from str
      (to keep the compatibility with the existing codebase) and
      the mechanism for dynamically selecting TypeInferenceProvider
      type checker (mypy or pyre; user can do this via environmental
      variable). If the code inside LibCST requires just shallow type
      information (so, just str is enough), then the code can run with
      any type checker. The remaining code (such as if-elif-else to
      match converter) will still require mypy.

Misc:

Code does not lint in my env, by some reason pyre check cannot find
mypy library.

Related to:

Test Plan

Check the message above and unittests.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 7, 2022
@rominf rominf force-pushed the rominf/mypy-provider branch 4 times, most recently from 3e16e20 to da2d4b7 Compare December 8, 2022 04:40
This change is RFC and WIP (please read whole change message).

Add `MypyTypeInferenceProvider` as an alternative for
`TypeInferenceProvider`. The provider infers types using mypy as
library. The only requirement for the usage is to have the latest mypy
installed. Types inferred are mypy types, since mypy type system is well
designed, to avoid the conversion, and also to keep it simple. For
compatibility and extensibility reasons, these types are stored in
separate field `MypyType.mypy_type`.

Let's assume we have the following code in the file `x.py` which we want
to inspect:
```python
x = [42]

s = set()

from enum import Enum

class E(Enum):
    f = "f"

e = E.f
```

Then to get play with mypy types one should use the code like:
```python
import libcst as cst

from libcst.metadata import MypyTypeInferenceProvider

filename = "x.py"
module = cst.parse_module(open(filename).read())
cache = MypyTypeInferenceProvider.gen_cache(".", [filename])[filename]
wrapper = cst.MetadataWrapper(
    module,
    cache={MypyTypeInferenceProvider: cache},
)

mypy_type = wrapper.resolve(MypyTypeInferenceProvider)
x_name_node = wrapper.module.body[0].body[0].targets[0].target
set_call_node = wrapper.module.body[1].body[0].value
e_name_node = wrapper.module.body[-1].body[0].targets[0].target

print(mypy_type[x_name_node])
 # prints: builtins.list[builtins.int]

print(mypy_type[x_name_node].fullname)
 # prints: builtins.list[builtins.int]

print(mypy_type[x_name_node].mypy_type.type.fullname)
 # prints: builtins.list

print(mypy_type[x_name_node].mypy_type.args)
 # prints: (builtins.int,)

print(mypy_type[x_name_node].mypy_type.type.bases[0].type.fullname)
 # prints: typing.MutableSequence

print(mypy_type[set_call_node])
 # prints: builtins.set

print("issuperset" in mypy_type[set_call_node].mypy_type.names)
 # prints: True

print(mypy_type[set_call_node.func])
 # prints: typing.Type[builtins.set]

print(mypy_type[e_name_node].mypy_type.type.is_enum)
 # prints: True
```

Why?

1. `TypeInferenceProvider` requires pyre (`pyre-check` on PyPI) to be
   installed. mypy is more popular than pyre. If the organization uses
   mypy already (which is almost always the case), it may be difficult
   to assure colleagues (including security team) that "we need yet
   another type checker".
2. Even though it is possible to run pyre without watchman installation,
   this is not advertised. watchman installation is not always possible
   because of system requirements, or because of the security
   requirements like "we install only our favorite GNU/Linux
   distribution packages".
3. `TypeInferenceProvider` usage requires `pyre start` command to be run
   before the execution, and `pyre stop` - after the execution. This may
   be inconvenient, especially for the cases when pyre was not used
   before.
4. Types produced by pyre in `TypeInferenceProvider` are just strings.
   For example, it's not easily possible to infer that some variable is
   enum instance. `MypyTypeInferenceProvider` makes it easy, see the
   code above.

Drawbacks:

1. Speed. mypy is slower than pyre, so is `MypyTypeInferenceProvider`
   comparing to `TypeInferenceProvider`.
   How to partially solve this:
   1. Implement AST tree caching in mypy. It may be difficult, however
      this will lead to speed improvements for all the projects that use
      this functionality.
   2. Implement inferred types caching inside LibCST. As far as I know,
      no caching at all is implemented inside LibCST, which is the
      prerequisite for inferred types caching, so the task is big.
   3. Implement LibCST CST to mypy AST. I am not sure if this possible
      at all. Even if it is possible, the task is huge.
2. Two providers are doing similar things in LibCST will be present,
   this can potentially lead to the situation when there is a need
   install two type checkers to get all codemods from the library
   running.
   Alternatives considered:
   1. Put `MypyTypeInferenceProvider` inside separate library (say,
       LibCST-mypy or `libcst-mypy` on PyPI). This will explicitly
       separate `MypyTypeInferenceProvider` from the rest of LibCST.
      Drawbacks:
      1. The need to maintain separate library.
      2. Limited fame (people need to know that the library exists).
      3. Since some codemods cannot be implemented easily without the
         library, for example, `if-elif-else` to `match` converter
	 (it needs powerful type inference), they are doomed to not be
	 shipped with LibCST.
   2. Implement base class for inferred type, which inherits from `str`
      (to keep the compatibility with the existing codebase) and
      the mechanism for dynamically selecting `TypeInferenceProvider`
      type checker (mypy or pyre; user can do this via environmental
      variable). If the code inside LibCST requires just shallow type
      information (so, just `str` is enough), then the code can run with
      any type checker. The remaining code (such as `if-elif-else` to
      `match` converter) will still require mypy.

Misc:

Code does not lint in my env, by some reason `pyre check` cannot find
`mypy` library.
@rominf rominf force-pushed the rominf/mypy-provider branch from da2d4b7 to 78f4b1d Compare December 8, 2022 04:45
@rominf rominf changed the title RFC: Add MypyTypeInferenceProvider RFC and WIP: Add MypyTypeInferenceProvider Dec 8, 2022
@rominf
Copy link
Author

rominf commented Dec 9, 2022

What do you think? Will you consider to merge this addition if I improve it and address the feedback?

@zsol
Copy link
Member

zsol commented Jan 24, 2023

Hi @rominf, thanks so much for this contribution. I'm in favor of supporting a TypeInferenceProvider-like provider that can rely on mypy. I'm chiefly concerned about two providers providing the same functionality but having different APIs: codemods that want to rely on type information will have to choose a provider implementation.

I think the right way forward here is to come up with a representation of Python types that both mypy and pyre results can be mapped onto, and have a TypeInferenceProvider that returns this common representation. It sounds similar to what you propose in 2/ii, except I think we should find a representation that's richer than a bare str.

@Kludex
Copy link
Contributor

Kludex commented May 1, 2023

I need this. I've released libcst-mypy, and gave credits to @rominf.

If you are against, or you want the PyPI namespace, let me know.

@Kludex
Copy link
Contributor

Kludex commented Dec 27, 2023

Just an update... The package I've created, it's just too slow to use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants