Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Several standard library models #17454

Merged
merged 28 commits into from
Oct 3, 2024
Merged

Conversation

yoff
Copy link
Contributor

@yoff yoff commented Sep 13, 2024

This PR is merging the long-lived branch containing the modelling work to stop extracting the standard library by default.

Pull Request checklist

All query authors

Internal query authors only

  • Autofixes generated based on these changes are valid, only needed if this PR makes significant changes to .ql, .qll, or .qhelp files. See the documentation (internal access required).
  • Changes are validated at scale (internal access required).

yoff and others added 21 commits June 25, 2024 14:13
- empty models for now
- `summaryModel` of `codeql/python-all` will be added to shortly.
- `quote` together with `re.compile` recover regex injection alerts on haiwen/seahub
- `quote_plus` recovers the URL redirection alert on DemocracyClub/EveryElection
- `unquote` recovers path injection alerts on `cloudera/hue`
- it was tedious finding justifications for the rest..
There is already a model there so we add to that one.

We did observe that this existing model was blocked by the external MaD model.
This is concerning and needs to be cleared up.
Two of the generated summaries have been excluded:
 - ["re", "Member[split]", "Argument[0,pattern:]", "ReturnValue", "taint"]
   From the documentation, it is not clear why pattern should figure in the return value, as that is the part denoting split point and thus all those instances are filtered out.
   From the implementation
     Spit function: https://github.com/python/cpython/blob/3.12/Lib/re/__init__.py#L199
     _compile function being called by split: https://github.com/python/cpython/blob/3.12/Lib/re/__init__.py#L280
   We see that in case the pattern is already a compiled `Pattern`, it is returned directly from _compile and could thus be part of the return value from split. This is probably not possible to arrange for an attacker, and so an FP in practice.

 - ["urllib2", "Member[unquote]", "Argument[0,string:]", "ReturnValue", "taint"]
   urllib2 seems to be only in Python2 (e.g. https://docs.python.org/2.7/library/urllib2.html) and I cannot locate the function unquote.
Co-authored-by: Rasmus Wriedt Larsen <[email protected]>
It is not clear from the code how this could happen and
I do not remember the path I saw, perhaps it was unreasonable.
It would be nice to simplify to a single sequence content type..
MaD row numbers in provenance column
@yoff yoff changed the title Python: Stop extracting the standard library by default Python: Several standard library models Sep 24, 2024
@yoff yoff marked this pull request as ready for review September 24, 2024 18:16
@yoff yoff requested a review from a team as a code owner September 24, 2024 18:16
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I already approved all these changes as part of other chains of PRs, right?

In that case, I only have one nitpick: With this PR we have two files for modeling stdlib with MaD (https://github.com/github/codeql/blob/main/python/ql/lib/semmle/python/frameworks/Stdlib.model.yml) so we should consolidate to only using one location. Personally I feel the obvious place to put modeling of a new library would be ql/lib/semmle/python/frameworks/<package>.model.yml (such as the one for Asyncpg). So unless you strongly object, would you mind moving the new models to python/ql/lib/semmle/python/frameworks/Stdlib.model.yml? 🙏

python/ql/lib/change-notes/2024-09-24-std-lib-models.md Outdated Show resolved Hide resolved
@yoff
Copy link
Contributor Author

yoff commented Oct 1, 2024

With this PR we have two files for modeling stdlib with MaD

Nice. I had totally missed that we had started using MaD for StdLib. In that case I will put the new models there too. I will also adjust #17565.

tausbn
tausbn previously approved these changes Oct 1, 2024
@yoff yoff requested a review from RasmusWL October 1, 2024 13:51
@yoff yoff requested a review from RasmusWL October 3, 2024 08:19
@yoff yoff merged commit 91f1cf1 into github:main Oct 3, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants