RFC: Add MypyTypeInferenceProvider
This change is an RFC (please read the whole commit message).

Add `MypyTypeInferenceProvider` as an alternative to
`TypeInferenceProvider`. The provider infers types using mypy as a
library. The only requirement is to have the latest mypy installed.
The inferred types are mypy types: the mypy type system is well
designed, and using it directly avoids a conversion step and keeps the
implementation simple. For compatibility and extensibility reasons,
these types are stored in a separate field, `MypyType.mypy_type`.

Let's assume we have the following code in the file `x.py` which we want
to inspect:
```python
x = [42]

s = set()

from enum import Enum

class E(Enum):
    f = "f"

e = E.f
```

Then, to play with the mypy types, one can use code like:
```python
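from pathlib import Path

import libcst as cst
from libcst.metadata import MypyTypeInferenceProvider

filename = "x.py"
# read_text() closes the file handle, unlike a bare open(...).read()
module = cst.parse_module(Path(filename).read_text())
# gen_cache declares root_path as a Path
cache = MypyTypeInferenceProvider.gen_cache(Path("."), [filename])[filename]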
wrapper = cst.MetadataWrapper(
    module,
    cache={MypyTypeInferenceProvider: cache},
)

mypy_type = wrapper.resolve(MypyTypeInferenceProvider)
x_name_node = wrapper.module.body[0].body[0].targets[0].target
set_call_node = wrapper.module.body[1].body[0].value
e_name_node = wrapper.module.body[-1].body[0].targets[0].target

print(mypy_type[x_name_node])
 # prints: builtins.list[builtins.int]

print(mypy_type[x_name_node].fullname)
 # prints: builtins.list[builtins.int]

print(mypy_type[x_name_node].mypy_type.type.fullname)
 # prints: builtins.list

print(mypy_type[x_name_node].mypy_type.args)
 # prints: (builtins.int,)

print(mypy_type[x_name_node].mypy_type.type.bases[0].type.fullname)
 # prints: typing.MutableSequence

print(mypy_type[set_call_node])
 # prints: builtins.set

print("issuperset" in mypy_type[set_call_node].mypy_type.names)
 # prints: True

print(mypy_type[set_call_node.func])
 # prints: typing.Type[builtins.set]

print(mypy_type[e_name_node].mypy_type.type.is_enum)
 # prints: True
```

Why?

1. `TypeInferenceProvider` requires pyre (`pyre-check` on PyPI) to be
   installed. mypy is more popular than pyre. If an organization already
   uses mypy (which is almost always the case), it may be difficult to
   convince colleagues (including the security team) that "we need yet
   another type checker". `MypyTypeInferenceProvider` requires only the
   latest mypy.
2. Even though it is possible to run pyre without installing watchman,
   this is not advertised. Installing watchman is not always possible
   because of system requirements or security requirements such as
   "we install only packages from our favorite GNU/Linux distribution".
3. Using `TypeInferenceProvider` requires running `pyre start` before
   the execution and `pyre stop` after it. This may be inconvenient,
   especially when pyre has not been used before.
4. Types produced by pyre in `TypeInferenceProvider` are just strings.
   For example, it is not easy to infer that a variable is an enum
   instance. `MypyTypeInferenceProvider` makes this easy; see the code
   above.

Drawbacks:

1. Speed. mypy is slower than pyre, and so `MypyTypeInferenceProvider`
   is slower than `TypeInferenceProvider`.
   How to partially solve this:
   1. Implement AST caching in mypy. It may be difficult, but it would
      speed up all projects that use this functionality.
   2. Implement caching of inferred types inside LibCST. As far as I
      know, LibCST implements no caching at all, which is a
      prerequisite for caching inferred types, so the task is big.
   3. Implement conversion from the LibCST CST to the mypy AST. I am not
      sure whether this is possible at all. Even if it is, the task is
      huge.
2. LibCST will contain two providers that do similar things. This can
   lead to a situation where two type checkers must be installed to
   run all the codemods from the library.
   Alternatives considered:
   1. Put `MypyTypeInferenceProvider` in a separate library (say,
      LibCST-mypy, or `libcst-mypy` on PyPI). This would explicitly
      separate `MypyTypeInferenceProvider` from the rest of LibCST.
      Drawbacks:
      1. The need to maintain a separate library.
      2. Limited visibility (people need to know that the library
         exists).
      3. Some codemods cannot be implemented easily without the
         library, for example an `if-elif-else` to `match` converter
         (it needs powerful type inference). They could not be shipped
         with LibCST, which makes the latter less attractive for end
         users.
   2. Implement a base class for inferred types that inherits from
      `str` (to keep compatibility with the existing codebase), plus a
      mechanism for dynamically selecting the `TypeInferenceProvider`
      type checker (mypy or pyre; the user can choose via an
      environment variable). If code inside LibCST requires only
      shallow type information (so a plain `str` is enough), it can
      run with either type checker. The remaining code (such as the
      `if-elif-else` to `match` converter) will still require mypy.
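
Alternative 2 could look roughly like the sketch below. This is only an
illustration under assumptions, not code from this change: the class
name `InferredType`, its `rich` attribute, the helper
`select_typechecker`, and the environment variable
`LIBCST_TYPECHECKER` are all hypothetical.

```python
import os
from typing import Any, Optional


class InferredType(str):
    """Hypothetical type name that acts like `str` but may carry extra data."""

    rich: Optional[Any]

    def __new__(cls, fullname: str, rich: Optional[Any] = None) -> "InferredType":
        obj = super().__new__(cls, fullname)
        # Checker-specific payload (for mypy this could be a mypy type object).
        obj.rich = rich
        return obj


def select_typechecker(default: str = "pyre") -> str:
    """Pick the backend from the hypothetical LIBCST_TYPECHECKER variable."""
    choice = os.environ.get("LIBCST_TYPECHECKER", default).lower()
    if choice not in ("mypy", "pyre"):
        raise ValueError(f"unknown typechecker: {choice!r}")
    return choice


# Shallow consumers keep treating the value as a plain string.
shallow = InferredType("builtins.int")
# Deep consumers can look at the payload when the backend provides one.
deep = InferredType("builtins.list[builtins.int]", rich={"args": ["builtins.int"]})
```

Existing code that compares inferred types with string literals keeps
working, while code that needs structure checks whether `rich` is set
and fails clearly when it is not.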

Misc:

The code does not lint in my environment: for some reason `pyre check`
cannot find the `mypy` library.

Related to:

* #451
* pyastrx/pyastrx#40
* python/mypy#12513
* python/mypy#4868
Roman Inflianskas committed Dec 7, 2022
1 parent f668e88 commit 3e16e20
Showing 5 changed files with 309 additions and 0 deletions.
2 changes: 2 additions & 0 deletions libcst/metadata/__init__.py
@@ -17,6 +17,7 @@
    ExpressionContextProvider,
)
from libcst.metadata.full_repo_manager import FullRepoManager
from libcst.metadata.mypy_type_inference_provider import MypyTypeInferenceProvider
from libcst.metadata.name_provider import (
    FullyQualifiedNameProvider,
    QualifiedNameProvider,
@@ -74,6 +75,7 @@
    "ClassScope",
    "ComprehensionScope",
    "ScopeProvider",
    "MypyTypeInferenceProvider",
    "ParentNodeProvider",
    "QualifiedName",
    "QualifiedNameSource",
96 changes: 96 additions & 0 deletions libcst/metadata/mypy_type_inference_provider.py
@@ -0,0 +1,96 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.

from pathlib import Path
from typing import Dict, List, Mapping, Optional, TYPE_CHECKING

import libcst as cst
from libcst._position import CodeRange
from libcst.helpers import calculate_module_and_package
from libcst.metadata.base_provider import BatchableMetadataProvider
from libcst.metadata.position_provider import PositionProvider

try:
    import mypy

    MYPY_INSTALLED = True
except ImportError:
    MYPY_INSTALLED = False


if TYPE_CHECKING:
    import mypy.nodes

    import libcst.metadata.mypy_utils


def raise_on_mypy_non_installed() -> None:
    if not MYPY_INSTALLED:
        raise RuntimeError("mypy is not installed, please install it")


class MypyTypeInferenceProvider(
    BatchableMetadataProvider["libcst.metadata.mypy_utils.MypyType"]
):
    """
    Access inferred type annotation through `mypy <http://mypy-lang.org/>`_.
    """

    METADATA_DEPENDENCIES = (PositionProvider,)

    @classmethod
    def gen_cache(
        cls, root_path: Path, paths: List[str], timeout: Optional[int] = None
    ) -> Mapping[
        str, Optional["libcst.metadata.mypy_utils.MypyTypeInferenceProviderCache"]
    ]:
        raise_on_mypy_non_installed()

        import mypy.build
        import mypy.main

        from libcst.metadata.mypy_utils import MypyTypeInferenceProviderCache

        targets, options = mypy.main.process_options(paths)
        options.preserve_asts = True
        options.fine_grained_incremental = True
        options.use_fine_grained_cache = True
        mypy_result = mypy.build.build(targets, options=options)
        cache = {}
        for path in paths:
            module = calculate_module_and_package(str(root_path), path).name
            cache[path] = MypyTypeInferenceProviderCache(
                module_name=module,
                mypy_file=mypy_result.graph[module].tree,
            )
        return cache

    def __init__(
        self,
        cache: Optional["libcst.metadata.mypy_utils.MypyTypeInferenceProviderCache"],
    ) -> None:
        from libcst.metadata.mypy_utils import CodeRangeToMypyNodesBinder

        super().__init__(cache)
        self._mypy_node_locations: Dict[CodeRange, "mypy.nodes.Node"] = {}
        if cache is None:
            return
        code_range_to_mypy_nodes_binder = CodeRangeToMypyNodesBinder(cache.module_name)
        code_range_to_mypy_nodes_binder.visit_mypy_file(cache.mypy_file)
        self._mypy_node_locations = code_range_to_mypy_nodes_binder.locations

    def _parse_metadata(self, node: cst.CSTNode) -> None:
        range = self.get_metadata(PositionProvider, node)
        if range in self._mypy_node_locations:
            self.set_metadata(node, self._mypy_node_locations.get(range))

    def visit_Name(self, node: cst.Name) -> Optional[bool]:
        self._parse_metadata(node)

    def visit_Attribute(self, node: cst.Attribute) -> Optional[bool]:
        self._parse_metadata(node)

    def visit_Call(self, node: cst.Call) -> Optional[bool]:
        self._parse_metadata(node)
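
The file above guards the optional mypy dependency with a module-level
flag and defers the real imports until `gen_cache` runs. The same idea
can be sketched with only the standard library. The helper name
`require_module` is hypothetical and is not part of LibCST or of this
change.

```python
import importlib
from types import ModuleType


def require_module(name: str) -> ModuleType:
    """Import `name` on first use and raise a readable error if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # Same message style as raise_on_mypy_non_installed() above.
        raise RuntimeError(f"{name} is not installed, please install it") from exc
```

A caller would write `build = require_module("mypy.build")` inside the
function that needs it, so importing the package itself never fails
when mypy is absent.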
145 changes: 145 additions & 0 deletions libcst/metadata/mypy_utils.py
@@ -0,0 +1,145 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

import mypy.build
import mypy.main
import mypy.modulefinder
import mypy.nodes
import mypy.options
import mypy.patterns
import mypy.traverser
import mypy.types
import mypy.typetraverser

from libcst._add_slots import add_slots
from libcst._position import CodePosition, CodeRange


@add_slots
@dataclass(frozen=True)
class MypyTypeInferenceProviderCache:
    module_name: str
    mypy_file: mypy.nodes.MypyFile


@add_slots
@dataclass(frozen=True)
class MypyType:
    is_type_constructor: bool
    mypy_type: Optional[Union[mypy.types.Type, mypy.nodes.TypeInfo]] = None
    fullname: str = field(init=False)

    def __post_init__(self) -> None:
        if isinstance(self.mypy_type, mypy.types.Type):
            fullname = str(self.mypy_type)
        else:
            fullname = self.mypy_type.fullname
        if self.is_type_constructor:
            fullname = f"typing.Type[{fullname}]"
        object.__setattr__(self, "fullname", fullname)

    def __str__(self) -> str:
        return self.fullname


class CodeRangeToMypyNodesBinder(
    mypy.traverser.TraverserVisitor, mypy.typetraverser.TypeTraverserVisitor
):
    def __init__(self, module_name: str) -> None:
        super().__init__()
        self.locations: Dict[CodeRange, MypyType] = {}
        self.in_type_alias_expr = False
        self.module_name = module_name

    # Helpers

    @staticmethod
    def get_code_range(o: mypy.nodes.Context) -> CodeRange:
        return CodeRange(
            start=CodePosition(o.line, o.column),
            end=CodePosition(o.end_line, o.end_column),
        )

    @staticmethod
    def check_bounds(o: mypy.nodes.Context) -> bool:
        return (
            (o.line is not None)
            and (o.line >= 1)
            and (o.column is not None)
            and (o.column >= 0)
            and (o.end_line is not None)
            and (o.end_line >= 1)
            and (o.end_column is not None)
            and (o.end_column >= 0)
        )

    def record_type_location_using_code_range(
        self,
        code_range: CodeRange,
        t: Optional[Union[mypy.types.Type, mypy.nodes.TypeInfo]],
        is_type_constructor: bool,
    ) -> None:
        if t is not None:
            self.locations[code_range] = MypyType(
                is_type_constructor=is_type_constructor, mypy_type=t
            )

    def record_type_location(
        self,
        o: mypy.nodes.Context,
        t: Optional[Union[mypy.types.Type, mypy.nodes.TypeInfo]],
        is_type_constructor: bool,
    ) -> None:
        if self.check_bounds(o):
            self.record_type_location_using_code_range(
                code_range=self.get_code_range(o),
                t=t,
                is_type_constructor=is_type_constructor,
            )

    def record_location_by_name_expr(
        self, code_range: CodeRange, o: mypy.nodes.NameExpr, is_type_constructor: bool
    ) -> None:
        if isinstance(o.node, mypy.nodes.Var):
            self.record_type_location_using_code_range(
                code_range=code_range, t=o.node.type, is_type_constructor=False
            )
        elif isinstance(o.node, mypy.nodes.TypeInfo):
            self.record_type_location_using_code_range(
                code_range=code_range, t=o.node, is_type_constructor=is_type_constructor
            )

    # Actual visitors

    def visit_var(self, o: mypy.nodes.Var) -> None:
        super().visit_var(o)
        self.record_type_location(o=o, t=o.type, is_type_constructor=False)

    def visit_name_expr(self, o: mypy.nodes.NameExpr) -> None:
        super().visit_name_expr(o)
        # Implementation in base classes is omitted, record it if it is variable or class
        self.record_location_by_name_expr(
            self.get_code_range(o), o, is_type_constructor=True
        )

    def visit_member_expr(self, o: mypy.nodes.MemberExpr) -> None:
        super().visit_member_expr(o)
        # Implementation in base classes is omitted, record it
        # o.def_var should not be None after mypy run, checking here just to be sure
        if o.def_var is not None:
            self.record_type_location(o=o, t=o.def_var.type, is_type_constructor=False)

    def visit_call_expr(self, o: mypy.nodes.CallExpr) -> None:
        super().visit_call_expr(o)
        if isinstance(o.callee, mypy.nodes.NameExpr):
            self.record_location_by_name_expr(
                code_range=self.get_code_range(o), o=o.callee, is_type_constructor=False
            )

    def visit_instance(self, o: mypy.types.Instance) -> None:
        super().visit_instance(o)
        self.record_type_location(o=o, t=o, is_type_constructor=False)
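
`CodeRangeToMypyNodesBinder` joins two independently built trees by
source position: it walks the mypy tree, stores results keyed by code
range, and the provider later looks up each LibCST node's range in that
dictionary. The sketch below shows the same technique using only the
standard-library `ast` module as a stand-in for the mypy tree. The
names `Span` and `SpanBinder` and the string labels are illustrative
assumptions, not part of this change.

```python
import ast
from typing import Dict, Tuple

# ((start_line, start_col), (end_line, end_col)); lines 1-based, columns 0-based
Span = Tuple[Tuple[int, int], Tuple[int, int]]


class SpanBinder(ast.NodeVisitor):
    """Record a label for every Name and Call node, keyed by its source span."""

    def __init__(self) -> None:
        self.locations: Dict[Span, str] = {}

    def _record(self, node: ast.expr, label: str) -> None:
        # Skip nodes without complete position info, in the spirit of check_bounds().
        if node.end_lineno is None or node.end_col_offset is None:
            return
        span = (
            (node.lineno, node.col_offset),
            (node.end_lineno, node.end_col_offset),
        )
        self.locations[span] = label

    def visit_Name(self, node: ast.Name) -> None:
        self._record(node, f"name:{node.id}")

    def visit_Call(self, node: ast.Call) -> None:
        self.generic_visit(node)  # visit the callee and the arguments first
        self._record(node, "call")


binder = SpanBinder()
binder.visit(ast.parse("x = [42]\ns = set()\n"))

# A consumer that knows a node's span (as LibCST's PositionProvider does)
# can now look the label up without sharing any node objects.
label_for_x = binder.locations[((1, 0), (1, 1))]
```

The call `set()` and its callee `set` start at the same position and
differ only in where they end, which is why the key has to be the whole
range rather than the start position alone.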
65 changes: 65 additions & 0 deletions libcst/metadata/tests/test_mypy_type_inference_provider.py
@@ -0,0 +1,65 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.

import sys
from pathlib import Path
from unittest import skipIf

import libcst as cst
from libcst import MetadataWrapper
from libcst.metadata.mypy_type_inference_provider import MypyTypeInferenceProvider
from libcst.testing.utils import data_provider, UnitTest
from libcst.tests.test_pyre_integration import TEST_SUITE_PATH


def _test_simple_class_helper(test: UnitTest, wrapper: MetadataWrapper) -> None:
    mypy_nodes = wrapper.resolve(MypyTypeInferenceProvider)
    m = wrapper.module
    assign = cst.ensure_type(
        cst.ensure_type(
            cst.ensure_type(
                cst.ensure_type(m.body[1].body, cst.IndentedBlock).body[0],
                cst.FunctionDef,
            ).body.body[0],
            cst.SimpleStatementLine,
        ).body[0],
        cst.AnnAssign,
    )
    self_number_attr = cst.ensure_type(assign.target, cst.Attribute)
    test.assertEqual(str(mypy_nodes[self_number_attr]), "builtins.int")

    # self
    test.assertEqual(
        str(mypy_nodes[self_number_attr.value]), "libcst.tests.pyre.simple_class.Item"
    )
    collector_assign = cst.ensure_type(
        cst.ensure_type(m.body[3], cst.SimpleStatementLine).body[0], cst.Assign
    )
    collector = collector_assign.targets[0].target
    test.assertEqual(
        str(mypy_nodes[collector]), "libcst.tests.pyre.simple_class.ItemCollector"
    )
    items_assign = cst.ensure_type(
        cst.ensure_type(m.body[4], cst.SimpleStatementLine).body[0], cst.AnnAssign
    )
    items = items_assign.target
    test.assertEqual(
        str(mypy_nodes[items]), "typing.Sequence[libcst.tests.pyre.simple_class.Item]"
    )


class MypyTypeInferenceProviderTest(UnitTest):
    @data_provider(
        ((TEST_SUITE_PATH / "simple_class.py", TEST_SUITE_PATH / "simple_class.json"),)
    )
    def test_simple_class_types(self, source_path: Path, data_path: Path) -> None:
        file = str(source_path)
        repo_root = Path(__file__).parents[3]
        cache = MypyTypeInferenceProvider.gen_cache(repo_root, [file])
        wrapper = MetadataWrapper(
            cst.parse_module(source_path.read_text()),
            cache={MypyTypeInferenceProvider: cache[file]},
        )
        _test_simple_class_helper(self, wrapper)
1 change: 1 addition & 0 deletions requirements-dev.txt
@@ -7,6 +7,7 @@ hypothesis>=4.36.0
hypothesmith>=0.0.4
jupyter>=1.0.0
maturin>=0.8.3,<0.14
mypy>=0.991
nbsphinx>=0.4.2
prompt-toolkit>=2.0.9
pyre-check==0.9.9; platform_system != "Windows"