Python/DSVW repro #17635

Draft
wants to merge 50 commits into
base: main

Commits on Jun 25, 2024

  1. python: Start modelling using MaD

    - empty models for now
    - `summaryModel` of `codeql/python-all` will be added to shortly.
    yoff committed Jun 25, 2024
    df406b4
  2. python: add modelling for urllib.parse

    - `quote` together with `re.compile` recover regex injection alerts on haiwen/seahub
    - `quote_plus` recovers the URL redirection alert on DemocracyClub/EveryElection
    - `unquote` recovers path injection alerts on `cloudera/hue`
    - it was tedious finding justifications for the rest..
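A minimal sketch of why these functions are worth modelling as taint-preserving: the encoded or decoded result is still derived from the input. The `user_input` value is a hypothetical attacker-controlled string, not taken from the projects named above.

```python
from urllib.parse import quote, quote_plus, unquote

# Hypothetical attacker-controlled value (an assumption for illustration).
user_input = "../etc/passwd?x=a b"

# Percent-encoding keeps the attacker's data, only in escaped form.
encoded = quote(user_input)
encoded_plus = quote_plus(user_input)

# Decoding recovers the original value, so taint should flow through unquote too.
decoded = unquote(encoded)

print(encoded)
print(encoded_plus)
print(decoded == user_input)
```

If any of these calls were treated as opaque, a path or redirect built from `decoded` would no longer be connected to its source.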
    yoff committed Jun 25, 2024
    281ac05
  3. python: move model to Stdlib.yml

    There is already a model there so we add to that one.
    
    We did observe that this existing model was blocked by the external MaD model.
    This is concerning and needs to be cleared up.
    yoff committed Jun 25, 2024
    c004ffa
  4. python: compress models

    yoff committed Jun 25, 2024
    d410136
  5. Python: move models

    yoff committed Jun 25, 2024
    1e97600
  6. python: undo changes to qlpack

    yoff committed Jun 25, 2024
    b80a711
  7. 2118f23
  8. Python: model fnmatch.filter

    yoff committed Jun 25, 2024
    501cda4
  9. bc55117
  10. Python: MaD summary models

    Two of the generated summaries have been excluded:
     - ["re", "Member[split]", "Argument[0,pattern:]", "ReturnValue", "taint"]
       From the documentation, it is not clear why `pattern` should figure in the return value: it denotes the split points, and all those occurrences are filtered out of the result.
       From the implementation
         Split function: https://github.com/python/cpython/blob/3.12/Lib/re/__init__.py#L199
         _compile function being called by split: https://github.com/python/cpython/blob/3.12/Lib/re/__init__.py#L280
       We see that in case the pattern is already a compiled `Pattern`, it is returned directly from _compile and could thus be part of the return value from split. An attacker can probably not arrange for this, so the summary would be an FP in practice.
    
     - ["urllib2", "Member[unquote]", "Argument[0,string:]", "ReturnValue", "taint"]
       urllib2 seems to be only in Python2 (e.g. https://docs.python.org/2.7/library/urllib2.html) and I cannot locate the function unquote.
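A small sketch of the `re.split` behaviour discussed above, under the assumption that `tainted` stands in for attacker-controlled data: every element of the result is a piece of the string being split, not of the pattern.

```python
import re

# Hypothetical values: `tainted` stands in for attacker-controlled data.
tainted = "alice,bob;carol"
pattern = re.compile(r"[,;]")  # an already compiled Pattern is accepted as-is

# The result consists of substrings of `tainted`; the separators are dropped.
parts = re.split(pattern, tainted)

# With a capturing group the separators are kept, but they are still
# text matched in `tainted`, not text taken from the pattern.
parts_with_seps = re.split(r"([,;])", tainted)

print(parts)
print(parts_with_seps)
```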
    yoff committed Jun 25, 2024
    bdc4808
  11. Python: codecs.open

    yoff committed Jun 25, 2024
    eb32cbe
  12. Python: model more loggers

    yoff committed Jun 25, 2024
    571be8b

Commits on Jun 26, 2024

  1. Python: fix compilation

    yoff committed Jun 26, 2024
    b261145
  2. a3076f4

Commits on Jun 28, 2024

  1. Apply suggestions from code review

    Co-authored-by: Rasmus Wriedt Larsen
    yoff and RasmusWL authored Jun 28, 2024
    bbc3ff2
  2. Python: remove strange sink

    It is not clear from the code how this could happen and
    I do not remember the path I saw, perhaps it was unreasonable.
    yoff committed Jun 28, 2024
    59f9532
  3. Python: Add value steps for sequence elements

    It would be nice to simplify to a single sequence content type..
    yoff committed Jun 28, 2024
    5ddfe75
  4. Python: add tests for loggers

    yoff committed Jun 28, 2024
    77a0087
  5. Python: adjust test expectations

    MaD row numbers in provenance column
    yoff committed Jun 28, 2024
    e40ae2e

Commits on Jul 22, 2024

  1. e30f725
  2. Python: update test expectations

    This is MaD...
    yoff committed Jul 22, 2024
    3434c38

Commits on Sep 24, 2024

  1. Python: add change note

    yoff committed Sep 24, 2024
    f95926e
  2. e7f9b5b

Commits on Sep 25, 2024

  1. python: capture flow through comprehensions

    - add comprehension functions as `DataFlowCallable`s
    - add comprehension call as `DataFlowCall`
    - create capture argument node for comprehension calls
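For illustration, a hedged sketch of the kind of code this is about: a comprehension body that reads a variable from the enclosing scope. The function and variable names are hypothetical.

```python
def add_suffix(items, suffix):
    # `items` is the iterable of the comprehension.
    # `suffix` is not iterated over; it is a variable from the enclosing
    # function that is read (captured) inside the comprehension body.
    return [item + suffix for item in items]


print(add_suffix(["a", "b"], "!"))
```

If `suffix` is tainted, every element of the result is tainted, which an analysis only sees when it tracks the captured variable into the comprehension.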
    yoff committed Sep 25, 2024
    fc2dc28

Commits on Sep 27, 2024

  1. Python: use comprehension function argument

    For a comprehension `[x for x in l]`:
    - `l` is now a legal argument (in DataFlowPublic)
    - `l` is the argument of the comprehension function (in DataFlowDispatch)
    - the parameter of the comprehension function is being read rather than `l` (in IterableUnpacking)
    Thus the read that used to cross callable boundaries is now split into an arg-param edge and a read from that param.
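The split can be pictured with a hand-written equivalent. This is only a sketch of the modelling idea, not the library's implementation; `comprehension_function` and `iterable_param` are hypothetical names.

```python
def comprehension_function(iterable_param):
    # Elements are read from the parameter, inside the callable,
    # rather than from `l` across the callable boundary.
    result = []
    for x in iterable_param:
        result.append(x)
    return result


l = ["a", "b"]

# The comprehension as written by the user.
direct = [x for x in l]

# The same computation with `l` passed as an explicit argument,
# giving an argument-to-parameter edge followed by a read.
via_function = comprehension_function(l)

print(direct == via_function)
```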
    yoff committed Sep 27, 2024
    294092b
  2. Python: use synthetic node for comprehension capture argument

    We used to use the CfgNode for the comprehension itself.
    In cases where that is also an argument, say
    ```python
    ",".join([x for x in l])
    ```
    that would be an argument to two different calls causing a dataflow consistency violation.
    yoff committed Sep 27, 2024
    72530a8

Commits on Sep 30, 2024

  1. Python: flow through yield

    - add yield as a dataflow return
    - replace comprehension store step
       with a store step to the yield
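One way to read this change is to think of the comprehension as a generator whose yielded values become the elements of the result. A minimal sketch under that assumption (the function name is hypothetical):

```python
def comprehension_as_generator(iterable_param):
    for x in iterable_param:
        # Each yielded value ends up as an element of the resulting
        # collection, which is what a store step at the yield expresses.
        yield x.upper()


l = ["a", "b", "c"]

from_generator = list(comprehension_as_generator(l))
from_comprehension = [x.upper() for x in l]

print(from_generator == from_comprehension)
```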
    yoff committed Sep 30, 2024
    d4ea62e
  2. Python: fix dataflow inconsistencies

    - adjust scope of argument, the argument is outside the called function
    - add missing post-update nodes for the new arguments
    yoff committed Sep 30, 2024
    310819d
  3. Python: add location to node

    yoff committed Sep 30, 2024
    3ef05a6
  4. Python: update test expectations

    We now have a new callable, yielding new enclosing callables
    yoff committed Sep 30, 2024
    f9f46f0
  5. ded3974
  6. Python: adjust test expectations

    yoff committed Sep 30, 2024
    fb07a56
  7. Python: use yield step also for taint

    Using the comprehension store step meant that all comprehensions would receive taint.
    This is because comprehension flow now goes via a callable, meaning they share the return node.
    yoff committed Sep 30, 2024
    7392d18
  8. Python: use known sanitiser

    - also adjust test expectations in experimental
    yoff committed Sep 30, 2024
    a22ea6c
  9. Python: add missing qldoc

    More doc is needed, but this should turn the tests green
    yoff committed Sep 30, 2024
    438e664
  10. dacc0ab

Commits on Oct 1, 2024

  1. Python: add change note

    yoff committed Oct 1, 2024
    e0a3c8a
  2. Update python/ql/lib/change-notes/2024-09-24-std-lib-models.md

    Co-authored-by: Rasmus Wriedt Larsen
    yoff and RasmusWL authored Oct 1, 2024
    2eac11e
  3. 2b6aab1
  4. Python: valid change note

    yoff committed Oct 1, 2024
    64890a1
  5. 7816f34
  6. cef8744
  7. Python: MaD expectations

    yoff committed Oct 1, 2024
    05910de
  8. Python: use imprecise content in cp

    We had accidentally used precise content, leading to blowup
    yoff committed Oct 1, 2024
    f39dc41
  9. 38b1eb7
  10. 02d4da2
  11. 4040195
  12. Merge branch 'stdlib-optparse' of https://github.com/yoff/codeql into python/DSVW-repro

    yoff committed Oct 1, 2024
    495bf71
  13. Python: DSVW repro test

    yoff committed Oct 1, 2024
    dff02cf

Commits on Oct 2, 2024

  1. Python: missing steps for repro

    - API graph subscript operator to understand comprehensions
    - captureJumpStep to not require defining value to exist
    - stdlib modelling: finditer returns list of match objects
      - adjust taint output of finditer
      - adjust `ReMatchMethodsSummary.getACall`
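A hedged sketch of the code shape these steps are meant to cover: match objects from `re.finditer` consumed in a comprehension, then accessed via a subscript. The input string and regex are hypothetical and not taken from the repro. At runtime `re.finditer` yields match objects lazily; the sketch only shows where taint would need to flow.

```python
import re

# Hypothetical attacker-controlled query string.
tainted = "id=1&id=22&name=x"

# Each match object is derived from `tainted`, and so is each captured group.
ids = [m.group(1) for m in re.finditer(r"id=(\d+)", tainted)]

# Subscripting the comprehension result reads one tainted element.
first = ids[0]

print(ids)
print(first)
```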
    yoff committed Oct 2, 2024
    6df1f5a