Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CodeQL version 2.18.2 doubles the amount of time spent compiling CodeQL databases #17489

Closed
flowerhack opened this issue Sep 16, 2024 · 13 comments
Labels
awaiting-response The CodeQL team is awaiting further input or clarification from the original reporter of this issue. question Further information is requested Stale

Comments

@flowerhack
Copy link

Hi there,

I'm a committer for the Chromium project & we've been experimenting with building CodeQL databases of Chromium.

Recently, we upgraded the version of CodeQL we were using to compile our databases from 2.18.1 to 2.18.2, so we could include some fixes y'all have recently merged for bugs Chromium ran into (thank you!).

However, we noticed a huge spike in the time it takes to build a CodeQL database after making this switch—it increased the build time by ~2x (and in particular some of our builders started timing out).

We can increase builder timeouts if need be, but we'd like to understand if there's some more fundamental problem here (and if so, is it specific to Chromium or has it been seen on other projects, etc), or if that sharp of an increase in build time is considered a bug.

If it's useful for your analysis, you can view our builder here: https://ci.chromium.org/ui/p/chromium/builders/ci/linux-codeql-generator. The purple failing builds started occuring around 2331, which is when we made the version switch, and started working again at 2337 when we switched the version back.

You'll notice, if you expand out the steps, that the time on the index_target.py step is what is particularly taking longer. That step is primarily a wrapper around the commands to compile the CodeQL database; you can view the script here in case that's useful.

I see that 2.18.3 is now available; I may try to update with that version number, in case this was a known bug that was fixed in the very next version, but I'm reporting this now on the assumption that that's not the case.

Thanks for any help you can offer!

@flowerhack flowerhack added the question Further information is requested label Sep 16, 2024
@jketema
Copy link
Contributor

jketema commented Sep 16, 2024

Hi,

Thanks for the above. It's not immediately clear to me what is taking more time, as it's not obvious for me where I can find any of the console output. Is it the tracing step or the finalization step that is taking more time, or both?

Note that with both 2.18.2 and 2.18.3 you're likely running into this issue: #17332. We fixed one possible cause of this (the fix for which will land in 2.19.0; to be released either this week or next), but we have not yet been able to confirm that the fix works for Chromium.

@flowerhack
Copy link
Author

There isn't any output available for the purple builds, unfortunately, since they timed out before they had a chance to upload their logs.

However, we did have one build run to completion while running CodeQL version 2.18.2, which I think will be helpful for debugging. You can view those logs by going to https://ci.chromium.org/ui/p/chromium/builders/ci/linux-codeql-generator/2332/overview, unchecking "Hide Succeeded Steps", and click "CodeQL databases with logs" next to step 19, "gsutil upload (2)". (You will need some kind of Google login to download, but any Google account should do, e.g. a gmail or similar. Let me know if you have any issues downloading the logs and we can come up with something else.)

At a cursory glance, the logs do indicate that the "TRAP import" step (which I believe is the lion's share of the finalization step?) takes ~30min longer on 2.18.2 than 2.18.1, which is significant, but I don't think explains most of the difference, since an extra 30min wouldn't lead to a doubling of the total time.

@jketema
Copy link
Contributor

jketema commented Sep 17, 2024

Thanks for the link.

At a cursory glance, the logs do indicate that the "TRAP import" step (which I believe is the lion's share of the finalization step?)

That's correct.

takes ~30min longer on 2.18.2 than 2.18.1, which is significant, but I don't think explains most of the difference, since an extra 30min wouldn't lead to a doubling of the total time.

Agreed.

In general we expect a traced build to be 4x~5x slower than a regular build. From what you write it seems to be more than that, is that correct? One of the main reasons this might happen is memory pressure. The extractor can consume quite a lot of memory in certain cases (which is not something we can easily fix), which can lead to swapping. I'm not sure how much insight you have in memory usage on your build servers, but this is probably the first thing we should check. If it is a memory pressure issue, then the general advice is to either (a) reduce the parallelism of the build, or (b) switch to a machine with more memory.

@smowton
Copy link
Contributor

smowton commented Sep 17, 2024

Had a prod at this out of curiosity -- it appears the actual databases have roughly doubled in size:

Build 2324 has

Running command in /b/s/w/ir/x/w/src: [/b/s/w/ir/x/w/src/third_party/ninja/ninja, -C, /b/s/w/ir/x/w/src/out/release, d8] ...
Finalizing codeql db.
Running TRAP import for CodeQL database at /b/s/w/ir/x/w/rc/codeql_dbsw08p5wmb/d8...
Grouping TRAP files by link target
Scanning TRAP files
Assembling TRAP files
Importing TRAP files
Merging relations
Finished writing database (relations: 2.03 GiB; string pool: 304.03 MiB).

Build 2332 has

Running command in /b/s/w/ir/x/w/src: [/b/s/w/ir/x/w/src/third_party/ninja/ninja, -C, /b/s/w/ir/x/w/src/out/release, d8] ...
Finalizing codeql db.
Running pre-finalize script /b/s/w/ir/x/w/codeql/cpp/tools/pre-finalize.sh in /b/s/w/ir/x/w/src.
Running command in /b/s/w/ir/x/w/src: [/b/s/w/ir/x/w/codeql/cpp/tools/pre-finalize.sh]
Running TRAP import for CodeQL database at /b/s/w/ir/x/w/rc/codeql_dbsho8v9azo/d8...
Grouping TRAP files by link target
Grouping unlinked TRAP files together
Scanning TRAP files
Importing TRAP files
Merging relations
Finished writing database (relations: 5.77 GiB; string pool: 304.03 MiB).

I note the surprising new appearance of the pre-finalize step, though this doesn't seem to have been changed recently and for C/C++ only exists for TRAP-caching purposes, which I believe is still beta and off by default? But crucially the db relations have ballooned from 2.03GiB to 5.77 GiB.

@jketema
Copy link
Contributor

jketema commented Sep 17, 2024

But crucially the db relations have ballooned from 2.03GiB to 5.77 GiB.

This is within the range of what I expect given the extractor issues that were fixed, which had quite some impact (especially since some were outright extractor crashes). However, it is potentially larger than it should be due to the issue underlying #17332.

@flowerhack
Copy link
Author

In general we expect a traced build to be 4x~5x slower than a regular build. From what you write it seems to be more than that, is that correct?

We have about a 2x increase in build time between CodeQL 2.18.1 vs 2.18.2. However, that was just the version change. I do not believe we switched from a "regular" to a traced build with the version change. (To make sure I understand correctly: by "traced build", you mean using codeql database trace-command, correct? That command is what we have always used.)

the general advice is to either (a) reduce the parallelism of the build, or (b) switch to a machine with more memory.

The machine this job was run on is 32 cores with 256GB of memory. I can look into whether machines with more resources are available, but I'm curious how this compares to (1) machines used by projects of a similar size that you've seen (if any), and (2) if the build time is still surprising given those specs (or if that feels about right)?

@jketema
Copy link
Contributor

jketema commented Sep 17, 2024

In general we expect a traced build to be 4x~5x slower than a regular build. From what you write it seems to be more than that, is that correct?

We have about a 2x increase in build time between CodeQL 2.18.1 vs 2.18.2. However, that was just the version change. I do not believe we switched from a "regular" to a traced build with the version change. (To make sure I understand correctly: by "traced build", you mean using codeql database trace-command, correct? That command is what we have always used.)

That's indeed what I mean by a traced build. However, apologies for not being clear enough. What I'm interested in knowing is how the traced build time compares with the build time of your normal builds (using comparable machines). I would like to establish if the time of the traced build even with the newer version of CodeQL still within the bounds of what we think is acceptable.

the general advice is to either (a) reduce the parallelism of the build, or (b) switch to a machine with more memory.

The machine this job was run on is 32 cores with 256GB of memory. I can look into whether machines with more resources are available, but I'm curious how this compares to (1) machines used by projects of a similar size that you've seen (if any), and (2) if the build time is still surprising given those specs (or if that feels about right)?

Before switching machines, it would really make sense to try to understand whether this is a memory pressure issue or whether there's something else going on. If the latter, switching machines is likely not going to help. To answer your questions:

  1. There are really no comparable projects except for ones that embed Chromium in one way or another, and I do not have number for those. Generally what is done on codebases of comparable size is to create a separate database per executable, but that's not something that really seems to apply here.
  2. This is what I'm trying to establish above, but for that I need to know how long a normal build takes on a machine with similar specs.

@flowerhack
Copy link
Author

(Sorry for the delayed reply here; was AFK for a bit.)

Ah, I understand your quesiton now.

re: build time comparison: An untraced build of Chromium takes ~5-6hours on one of our machines. A traced build using CodeQL version 2.18.1 takes 7-8hrs. The traced build using CodeQL version 2.18.2 take 14-15hrs.

It sounds like, based on what you're saying, anything up to ~25hrs for a traced Chromium build would be in the acceptable bounds for CodeQL; is that correct? (Is there more information I can provide re: pinning down whether this is memory pressure or something else?)

@jketema
Copy link
Contributor

jketema commented Sep 26, 2024

re: build time comparison: An untraced build of Chromium takes ~5-6hours on one of our machines. A traced build using CodeQL version 2.18.1 takes 7-8hrs. The traced build using CodeQL version 2.18.2 take 14-15hrs.

It sounds like, based on what you're saying, anything up to ~25hrs for a traced Chromium build would be in the acceptable bounds for CodeQL; is that correct?

Yes, that is correct, and since we did fix a number of fatal errors that affected a lot of files, the jump in time the traced build takes makes sense. The extracted database should be much more complete now.

(Is there more information I can provide re: pinning down whether this is memory pressure or something else?)

I think we're ok for now, since the traced build time is not excessive (although admittedly it takes a long time on a project this size). It would be good to keep an eye on this and let us know if you see another large jump the traced build time.

@jketema
Copy link
Contributor

jketema commented Sep 26, 2024

But crucially the db relations have ballooned from 2.03GiB to 5.77 GiB.

This is within the range of what I expect given the extractor issues that were fixed, which had quite some impact (especially since some were outright extractor crashes). However, it is potentially larger than it should be due to the issue underlying #17332.

We expect the size will have gone down again with CodeQL 2.19.0.

@rvermeulen rvermeulen added the awaiting-response The CodeQL team is awaiting further input or clarification from the original reporter of this issue. label Oct 16, 2024
@rvermeulen
Copy link
Contributor

Hi @flowerhack,

Thanks again for your report.
Were you able to test the latest 2.19 version as suggested by @jketema?

Thanks!

Copy link
Contributor

This issue is stale because it has been open 14 days with no activity. Comment or remove the Stale label in order to avoid having this issue closed in 7 days.

@github-actions github-actions bot added the Stale label Oct 30, 2024
Copy link
Contributor

github-actions bot commented Nov 7, 2024

This issue was closed because it has been inactive for 7 days.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-response The CodeQL team is awaiting further input or clarification from the original reporter of this issue. question Further information is requested Stale
Projects
None yet
Development

No branches or pull requests

4 participants