Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

17.11 VS hang in EndBuild #10709

Closed
JanKrivanek opened this issue Sep 26, 2024 · 3 comments
Closed

17.11 VS hang in EndBuild #10709

JanKrivanek opened this issue Sep 26, 2024 · 3 comments
Assignees
Labels
Cost:M Work that requires one engineer up to 2 weeks performance Priority:1 Work that is critical for the release, but we could probably ship without triaged

Comments

@JanKrivanek
Copy link
Member

JanKrivanek commented Sep 26, 2024

Context

VS hangs during solution close - turins out to be caused by hang in BuildManager.EndBuild
There are almost 3k cases hit in wild on 17.11

Analysis

Cab: https://watsonportal.microsoft.com/CabAnalysis?CabIdentifier=https://eaus2watcab01.blob.core.windows.net/global-202409/f1d6f9d9-57c1-47af-91c9-1973b6d19519.zip
Part of: https://devdiv.visualstudio.com/DevDiv/_workitems/edit/2254960

VS hanging due to BuildManager.EndBuild stuck waiting on the _noActiveSubmissionsEvent or _noNodesActiveEvent (not possible to distinguish due to release build optimizations).

image

Based on stacks the CancelAllSubmissionsAsync was called.
At the same time the InProcNode is still running:

image

Though based on the code it seems like we do not want to kill InProcNode in VS:

/// <summary>
/// Shutdown the inprocess node when the build finishes. By default this is false
/// since visual studio needs to keep the inprocess node around after the build finishes.
/// </summary>
public bool ShutdownInProcNodeOnBuildFinish
{
get => _shutdownInProcNodeOnBuildFinish;
set => _shutdownInProcNodeOnBuildFinish = value;
}

However it would mean that the node count (_activeNodes.Count) wouldn't reach zero and we wouldn't be able to signal the _noNodesActiveEvent:

private void CheckForActiveNodesAndCleanUpSubmissions()
{
Debug.Assert(Monitor.IsEntered(_syncLock));
if (_activeNodes.Count == 0)
{
var submissions = new List<BuildSubmissionBase>(_buildSubmissions.Values);
foreach (BuildSubmissionBase submission in submissions)
{
// The submission has not started do not add it to the results cache
if (!submission.IsStarted)
{
continue;
}
if (!CompleteSubmissionFromCache(submission))
{
submission.CompleteResultsWithException(new BuildAbortedException());
}
// If we never received a project started event, consider logging complete anyhow, since the nodes have
// shut down.
submission.CompleteLogging();
CheckSubmissionCompletenessAndRemove(submission);
}
_noNodesActiveEvent?.Set();
}
}

This feels unlikely - as it would hang always when we'd have InProcNode in VS.

Next steps in investigation

Debug through solution close in VS and find out if CheckForActiveNodesAndCleanUpSubmissions ever signals

@JanKrivanek JanKrivanek self-assigned this Sep 26, 2024
@JanKrivanek JanKrivanek added the Cost:M Work that requires one engineer up to 2 weeks label Sep 29, 2024
@JanKrivanek
Copy link
Member Author

On internal user hitting this reported the issue stopped occuring after enabling long paths support on his system

@AR-May AR-May added Priority:2 Work that is important, but not critical for the release triaged Priority:1 Work that is critical for the release, but we could probably ship without and removed Priority:2 Work that is important, but not critical for the release labels Oct 1, 2024
@JanKrivanek
Copy link
Member Author

I got an internal customer dump - the blocking event is for sure _noActiveSubmissionsEvent
deeper analysis pending...

@JanKrivanek
Copy link
Member Author

JanKrivanek commented Oct 2, 2024

It turns out there is a race between CancelAllSubmissions and EndBuild - that are seemingly synchronous, but CancelAllSubmissions enqueues bit of logic for async processing.

When both methods called after each other - which is expected during solution close - they can race and deadlock

Image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Cost:M Work that requires one engineer up to 2 weeks performance Priority:1 Work that is critical for the release, but we could probably ship without triaged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants