Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vendor resolvelib 1.1.0 #13001

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

notatallshaw
Copy link
Member

@notatallshaw notatallshaw commented Oct 11, 2024

Fixes #12768
Fixes #12317
Fixes #12305
Fixes #12497
Fixes #12754 (comment)

Draft PR of resolvelib beta to see if it passes tests. Will mark as ready to review and ask for review/merge when final is ready.

@notatallshaw
Copy link
Member Author

notatallshaw commented Oct 18, 2024

So I checked how resolution performed using https://github.com/notatallshaw/Pip-Resolution-Scenarios-and-Benchmarks, and I have good news and bad news.

Firstly, as expected, due to correctness fixes #12768 and #12317 resolve instead of showing ResolutionImpossible.

Some unexpected good news is this resolves #12305.

Some slightly bad news is this turns #12990 from a Build Failure to a ResolutionTooDeep, but this can be fixed if #13017 lands.

Some worse news is that the requirements in astral-sh/uv#1398 go from resolving (after some time), to ResolutionTooDeep, and I don't have an immediate optimization to fix this. It will require deeper investigation, but I don't think that should stop pip from vendoring resolvelib as it fixes correctness errors and other cases that do produce ResolutionTooDeep.

@notatallshaw
Copy link
Member Author

Ready for review / merge once 24.3 is closed.

@notatallshaw
Copy link
Member Author

It turns out I accidentally broke depth by changing some resolvelib behavior: pdm-project/pdm#3235 (comment)

I am going to investigate further on this to see if it's causing the performance regression I see in astral-sh/uv#1398, if so I'm going to make a PR on resolvelib, or if removing depth doesn't make any difference on performance then it might be worth dropping as a preference like PDM has.

@notatallshaw
Copy link
Member Author

notatallshaw commented Nov 10, 2024

Okay, I've tried fixing depths and removing depths and I ran the scenarios in https://github.com/notatallshaw/Pip-Resolution-Scenarios-and-Benchmarks to see what is the difference (the left hand side is fixing, the right hand side is removing):

uv run compare.py --github-repo-1 "notatallshaw/pip" --git-commit-1 3334e33066c30bef609d54b4d888030ad5dbef61 --github-repo-2 "notatallshaw/pip" --git-commit-2  03d9fd86ab28e9b61066d0e5b1395ddc842b29f6
Reading inline script metadata from `compare.py`
Installed 8 packages in 10ms
Difference for scenario scenarios/problematic.toml - autogluon:
    	Number of requirements processed: 197 -> 177
    	Number of packages processed: 261 -> 241

Difference for scenario scenarios/problematic.toml - TTS:
    	Success: False -> True.
    	Failure Reason: Resolution Too Deep -> None.
    	Number of requirements processed: 143 -> 156
    	Number of packages processed: 357 -> 222

Difference for scenario scenarios/problematic.toml - boto3-urllib3-transient:
    	Number of requirements processed: 429 -> 141
    	Number of packages processed: 811 -> 2675  (Both ended up with ResolutionTooDeep)

Difference for scenario scenarios/problematic.toml - apache-airflow-beam:
    	Number of packages processed: 312 -> 161

Difference for scenario scenarios/problematic.toml - kedro-test:
    	Number of requirements processed: 335 -> 337
    	Number of packages processed: 522 -> 565

Difference for scenario scenarios/problematic.toml - sphinx:
    	Number of requirements processed: 71 -> 79
    	Number of packages processed: 166 -> 199

Difference for scenario scenarios/problematic.toml - backtracks-to-old-scipy:
    	Number of requirements processed: 74 -> 86
    	Number of packages processed: 111 -> 129 (Both end in build failure)

Difference for scenario scenarios/problematic.toml - django-stubs:
    	Number of packages processed: 36 -> 25

Difference for scenario scenarios/big-packages.toml - apache-airflow-all:
    	Not the same install files.
    	Number of requirements processed: 601 -> 597
    	Number of packages processed: 733 -> 696

Difference for scenario scenarios/big-packages.toml - acryl-datahub-all:
    	Number of requirements processed: 347 -> 334
    	Number of packages processed: 711 -> 504

There are some interesting takeaways:

  1. Using depth as a preference causes the TTS Scenario to result in ResolutionTooDeep, vendoring resolvelib 1.1.0 causes this scenario to resolve because it breaks the depth preference
  2. Some scenarios resolve faster with depth removed as a preference, some are faster with depth, except for ResolutionTooDeep cases it seems removing is in general better
  3. Cases where both hit ResolutionTooDeep (i.e. boto3-urllib3-transient) the number of packages processed (and hence the wall clock time) can end up being much bigger with removing depths because it goes deep on one package (in this case botocore because it has thousands of releases)
  4. Of the resolutions which succeeded only the apache-airflow-all scenario produced a different installation result, and without going into too much details it's because apache airflow doesn't put lower bounds on their providers packages: Put reasonable lower bounds on providers in Airflow's extras apache/airflow#39100, making their resolution search space massive

My conclusions from this are:

  1. On the whole, removing depth seems to be a net positive, probably because resolvelib now backjumps and skips of unnecessary backtracking state, and it’s worth removing like PDM already has
  2. It may be worth reducing the maximum number of backtracking steps before hitting ResolutionTooDeep

@notatallshaw notatallshaw force-pushed the vendor-resolvelib-1.1.0 branch from d309e2c to 3a157bb Compare November 10, 2024 20:14
@notatallshaw notatallshaw marked this pull request as ready for review November 10, 2024 20:35
@notatallshaw
Copy link
Member Author

notatallshaw commented Nov 10, 2024

This is ready for review now.

I have removed depth as a preference, which unfortunately means removing the only unit tests that were unit testing the PipProvider (though it is tested greatly as part of functional tests), but I will add additional unit tests to get_preference, and friends, in a follow up PR (that will replace #13017) once this lands.

@ichard26 ichard26 requested a review from uranusjr December 9, 2024 04:02
@notatallshaw
Copy link
Member Author

notatallshaw commented Dec 14, 2024

I'm marking this back to draft while we discuss performance issues with resolvelib 1.1, see discussion here: sarugaku/resolvelib#180

At some point we might have to decide whether to vendor this version of resolvelib, which includes an important correctness fix, and some performance improvements, but some big performance regressions, or decide to wait for a resolvelib 1.2 which might be able to regain some of the performance back in very deep backtracking situations (e.g. boto3 / urllib3).

I would very much like input from other maintainers on that, both here and over on the resolvelib performance issue.

@ichard26 ichard26 mentioned this pull request Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment