Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QualifiedNameProvider matcher unusable with Windows process pools #435

Closed
Zac-HD opened this issue Dec 21, 2020 · 7 comments
Closed

QualifiedNameProvider matcher unusable with Windows process pools #435

Zac-HD opened this issue Dec 21, 2020 · 7 comments
Labels
bug Something isn't working codemod Bundled codemods, visitors, metadata providers

Comments

@Zac-HD
Copy link
Contributor

Zac-HD commented Dec 21, 2020

I have a MVP for HypothesisWorks/hypothesis#2705, but discovered that on Windows it can only format a single file at a time. The underlying problem is that multiprocessing on Windows works via spawn (vs fork on Linux), and lambdas cannot be pickled but are practically required to use matchers. Related to #294 in that running tests on Windows would presumably detect this too?

import libcst as cst
import libcst.matchers as m
from libcst.codemod import VisitorBasedCodemodCommand

def match_qualname(name):
    return m.MatchMetadataIfTrue(
        cst.metadata.QualifiedNameProvider,
        lambda qualnames: all(n.name == name for n in qualnames),  # <-- here's the problem
    )

class HypothesisFixComplexMinMagnitude(VisitorBasedCodemodCommand):
    DESCRIPTION: str = "Upgrade from deprecated or inefficient uses of Hypothesis."
    METADATA_DEPENDENCIES = (cst.metadata.QualifiedNameProvider,)

    @m.call_if_inside(
        m.Call(metadata=match_qualname("hypothesis.strategies.complex_numbers"),
               args=[m.Arg(value=m.Name("None"), keyword=m.Name("min_magnitude"))])
    )
    def leave_Arg(self, original_node: cst.Arg, updated_node: cst.Arg) -> cst.Arg:
        if m.matches(updated_node.keyword, m.Name("min_magnitude")):
            return updated_node.with_changes(value=cst.Integer("0"))
        return updated_node

# From https://github.com/Zac-HD/hypothesis/commit/5002c516332f5692be18d3f551157e694400c517
# aka  https://github.com/HypothesisWorks/hypothesis/compare/master...Zac-HD:codemods
# Save two files with this code, try to codemod them, and you'll get the error.
from hypothesis.strategies import complex_numbers

complex_numbers(min_magnitude=None)

It's impractical to write a top-level function for each of the fifty-ish qualnames we want to handle, and unfortunately functools.partial has the same problem as lambdas.

The obvious workaround is to give up some performance and run in a single process on Windows. Unfortunately that doesn't work either because --jobs=1 still opens a process pool...

@thatch
Copy link
Contributor

thatch commented Dec 21, 2020

I don't quite follow what the "all" is doing, but a callable object may be picklable enough to find the next windows bug...

class QualnameMatcher:
  def __init__(self, name: str):
    self.name = name
  def __call__(self, qualnames) -> bool:
    return all(n.name == self.name for n in qualnames)

...

return m.MatchMetadataIfTrue(
  cst.metadata.QualifiedNameProvider,
  QualnameMatcher(name),  # <-- callable object now
)

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Dec 22, 2020

(the all is because there might be several possibilities, and in this case too few is better than too many auto-fixes)

The class approach does indeed fix a pickle problem, but:

KeyError: Call(
    func=DoNotCare(), 
    args=[
        ZeroOrMore(DoNotCare()), 
        Arg(
            value=Name(value='None', lpar=DoNotCare(), rpar=DoNotCare(), metadata=DoNotCare()), 
            keyword=Name(value='min_magnitude', lpar=DoNotCare(), rpar=DoNotCare(), metadata=DoNotCare()), 
            equal=DoNotCare(), comma=DoNotCare(), star=DoNotCare(), whitespace_after_star=DoNotCare(), whitespace_after_arg=DoNotCare(), metadata=DoNotCare()
        ), 
        ZeroOrMore(DoNotCare())
    ], 
    lpar=DoNotCare(), rpar=DoNotCare(), whitespace_after_func=DoNotCare(), whitespace_before_args=DoNotCare(), 
    metadata=MatchMetadataIfTrue(
        key=<class 'libcst.metadata.scope_provider.QualifiedNameProvider'>, 
        func=<hypothesis.extra.codemods.QualnameMatcher object at 0x000001B9FB3AD780>
    )
)

Traceback (most recent call last):
  File "libcst/codemod/_cli.py", line 294, in _execute_transform
    output_tree = transformer.transform_module(input_tree)
  File "libcst/codemod/_command.py", line 72, in transform_module
    tree = super().transform_module(tree)
  File "libcst/codemod/_codemod.py", line 108, in transform_module
    return self.transform_module_impl(tree_with_metadata)
  File "libcst/codemod/_visitor.py", line 32, in transform_module_impl
    return tree.visit(self)
  File "libcst/_nodes/module.py", line 91, in visit
    result = super(Module, self).visit(visitor)
  File "libcst/_nodes/base.py", line 225, in visit
    _CSTNodeSelfT, self._visit_and_replace_children(visitor)
  File "libcst/_nodes/module.py", line 75, in _visit_and_replace_children
    body=visit_body_sequence(self, "body", self.body, visitor),
  File "libcst/_nodes/internal.py", line 200, in visit_body_sequence
    return tuple(visit_body_iterable(parent, fieldname, children, visitor))
  File "libcst/_nodes/internal.py", line 172, in visit_body_iterable
    new_child = child.visit(visitor)
  File "libcst/_nodes/base.py", line 225, in visit
    _CSTNodeSelfT, self._visit_and_replace_children(visitor)
  File "libcst/_nodes/statement.py", line 424, in _visit_and_replace_children
    body=visit_sequence(self, "body", self.body, visitor),
  File "libcst/_nodes/internal.py", line 156, in visit_sequence
    return tuple(visit_iterable(parent, fieldname, children, visitor))
  File "libcst/_nodes/internal.py", line 140, in visit_iterable
    new_child = child.visit(visitor)
  File "libcst/_nodes/base.py", line 225, in visit
    _CSTNodeSelfT, self._visit_and_replace_children(visitor)
  File "libcst/_nodes/statement.py", line 331, in _visit_and_replace_children
    value=visit_required(self, "value", self.value, visitor),
  File "libcst/_nodes/internal.py", line 81, in visit_required
    result = node.visit(visitor)
  File "libcst/_nodes/base.py", line 225, in visit
    _CSTNodeSelfT, self._visit_and_replace_children(visitor)
  File "libcst/_nodes/expression.py", line 2343, in _visit_and_replace_children
    args=visit_sequence(self, "args", self.args, visitor),
  File "libcst/_nodes/internal.py", line 156, in visit_sequence
    return tuple(visit_iterable(parent, fieldname, children, visitor))
  File "libcst/_nodes/internal.py", line 140, in visit_iterable
    new_child = child.visit(visitor)
  File "libcst/_nodes/base.py", line 234, in visit
    leave_result = visitor.on_leave(self, with_updated_children)
  File "libcst/matchers/_visitors.py", line 510, in on_leave
    self._matchers, getattr(self, f"leave_{type(original_node).__name__}", None)
  File "libcst/matchers/_visitors.py", line 421, in _should_allow_visit
    all_matchers, obj
  File "libcst/matchers/_visitors.py", line 399, in _all_positive_matchers_true
    if all_matchers[matcher] is None:

Oddly this also only happens if I'm transforming multiple files.

@thatch
Copy link
Contributor

thatch commented Dec 23, 2020

Can you point me at a more complete repro? I assume this is using your dummy change, but anything else? Against what target?

I'm wondering if this is a latent issue with spawn, but if so we should be able to repro on linux too.

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Dec 23, 2020

Full script - just replacing the lambda with QualnameMatcher above
import libcst as cst
import libcst.matchers as m
from libcst.codemod import VisitorBasedCodemodCommand

class QualnameMatcher:
    def __init__(self, names):
        self.names = frozenset(names)
    def __call__(self, qualnames):
        return all(n.name in self.names for n in qualnames)

def match_qualname(*names):
    return m.MatchMetadataIfTrue(
        cst.metadata.QualifiedNameProvider, QualnameMatcher(names)
    )

class HypothesisFixComplexMinMagnitude(VisitorBasedCodemodCommand):
    DESCRIPTION = "Upgrade from deprecated or inefficient uses of Hypothesis."
    METADATA_DEPENDENCIES = (cst.metadata.QualifiedNameProvider,)

    @m.call_if_inside(
        m.Call(metadata=match_qualname("hypothesis.strategies.complex_numbers"),
               args=[m.Arg(value=m.Name("None"), keyword=m.Name("min_magnitude"))])
    )
    def leave_Arg(self, original_node, updated_node):
        if m.matches(updated_node.keyword, m.Name("min_magnitude")):
            return updated_node.with_changes(value=cst.Integer("0"))
        return updated_node
echo "from hypothesis.strategies import complex_numbers; complex_numbers(min_magnitude=None)" > t.py
cp t.py tt.py
python -m libcst.tool codemod codemods.HypothesisFixComplexMinMagnitude t.py tt.py

though you'll also need to copy that to a module that libcst.tool knows to look in, of course.

@thatch
Copy link
Contributor

thatch commented Dec 26, 2020

Alright, spent another half-hour on this while waiting for an OS update to finish.

The good news is I can repro on Linux. https://gist.github.com/thatch/a7b45eb23309a14cc78b8e42a58aff66

The middling news is, I think it's probably just a missing __hash__ method to not use id. The dict giving the KeyError has a likely candidate, but it is failing the hash lookup.

The bad news is I don't yet understand which one. I see in the repr your QualnameMatcher is a different id at least.
It looks like libcst.matchers.Call was never intended to be used this way -- it has unsafe_hash=False which per https://docs.python.org/3/library/dataclasses.html which might be using the id. (I didn't verify.)

I did try to set unsafe_hash=True but one of the items is a list. I think this would be a bigger change.

The way I "fixed" this in bowler was avoiding processes entirely -- search for "in_process" in https://github.com/facebookincubator/Bowler/blob/master/bowler/tool.py -- suboptimal performance, but works until I sort out a better way. What are your thoughts about doing something similar here?

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Dec 26, 2020

For HypothesisWorks/hypothesis#2712 I've already gone to a different approach - because I want to run multiple transformations on each file, I have a refactor(code: str) -> str function, and that works just fine in a process pool.

Of course I'd still like the codemod to be usable via libcst.tool on Windows, but with hypothesis codemod working I don't consider that a blocker.

@zsol zsol added the bug Something isn't working label May 19, 2021
@zsol zsol added the codemod Bundled codemods, visitors, metadata providers label Jun 16, 2022
@zsol
Copy link
Member

zsol commented Sep 25, 2024

I think this will be fixed by #1204.

@zsol zsol closed this as completed Sep 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working codemod Bundled codemods, visitors, metadata providers
Projects
None yet
Development

No branches or pull requests

3 participants