[Bug]: MSBuild crashes on parallel build using docker #43750

Open
pchalamet opened this issue Sep 27, 2024 · 6 comments
Labels
Area-NetSDK Bug untriaged Request triage from a team member

Comments

@pchalamet

pchalamet commented Sep 27, 2024

Issue Description

Random crashes occur in MSBuild when trying to parallelize .NET builds in Docker:

  • System.IO.IOException: The system cannot open the device or file specified. : 'NuGet-Migrations'
  • System.ApplicationException: Object synchronization method was called from an unsynchronized block of code.

Steps to Reproduce

docker run --rm --net=host --name 8C91EC60706A76686DEE83F23CE80DD78D48E014DCCB1F7389F9F5EF9D9BFF09 -v /var/run/docker.sock:/var/run/docker.sock -v /Users/pierre/.terrabuild/home/containers:/root -v /Users/pierre/.terrabuild/home/tmp:/tmp -v /Users/pierre/src/MagnusOpera/terrabuild/terrabuild/src:/terrabuild -w /terrabuild/Terrabuild.Common --entrypoint dotnet -e DOTNET_CLI_TELEMETRY_OPTOUT -e DOTNET_NOLOGO -e DOTNET_SKIP_FIRST_TIME_EXPERIENCE mcr.microsoft.com/dotnet/sdk:8.0.302 build --no-dependencies --configuration Debug

9/27/2024 11:51:49 AM ERR System.IO.IOException: The system cannot open the device or file specified. : 'NuGet-Migrations'
9/27/2024 11:51:49 AM ERR    at System.Threading.Mutex.CreateMutexCore(Boolean initiallyOwned, String name, Boolean& createdNew)
9/27/2024 11:51:49 AM ERR    at System.Threading.Mutex..ctor(Boolean initiallyOwned, String name)
9/27/2024 11:51:49 AM ERR    at NuGet.Common.Migrations.MigrationRunner.Run(String migrationsDirectory)
9/27/2024 11:51:49 AM ERR    at Microsoft.DotNet.Configurer.DotnetFirstTimeUseConfigurer.Configure()
9/27/2024 11:51:49 AM ERR    at Microsoft.DotNet.Cli.Program.ConfigureDotNetForFirstTimeUse(IFirstTimeUseNoticeSentinel firstTimeUseNoticeSentinel, IAspNetCertificateSentinel aspNetCertificateSentinel, IFileSentinel toolPathSentinel, Boolean isDotnetBeingInvokedFromNativeInstaller, DotnetFirstRunConfiguration dotnetFirstRunConfiguration, IEnvironmentProvider environmentProvider, Dictionary`2 performanceMeasurements)
9/27/2024 11:51:49 AM ERR    at Microsoft.DotNet.Cli.Program.ProcessArgs(String[] args, TimeSpan startupTime, ITelemetry telemetryClient)
9/27/2024 11:51:49 AM ERR    at Microsoft.DotNet.Cli.Program.Main(String[] args)
9/27/2024 11:51:49 AM ERR
9/27/2024 11:51:49 AM OUT
docker run --rm --net=host --name DAB5E60C96ACE37A01B06B64DFD9CD55E4ED14F2C614AA512BA51291FD95266E -v /var/run/docker.sock:/var/run/docker.sock -v /Users/pierre/.terrabuild/home/containers:/root -v /Users/pierre/.terrabuild/home/tmp:/tmp -v /Users/pierre/src/MagnusOpera/terrabuild/terrabuild/src:/terrabuild -w /terrabuild/Terrabuild.PubSub --entrypoint dotnet -e DOTNET_CLI_TELEMETRY_OPTOUT -e DOTNET_NOLOGO -e DOTNET_SKIP_FIRST_TIME_EXPERIENCE mcr.microsoft.com/dotnet/sdk:8.0.302 build --no-dependencies --configuration Debug

9/27/2024 11:51:49 AM ERR System.ApplicationException: Object synchronization method was called from an unsynchronized block of code.
9/27/2024 11:51:49 AM ERR    at System.Threading.Mutex.ReleaseMutex()
9/27/2024 11:51:49 AM ERR    at NuGet.Common.Migrations.MigrationRunner.Run(String migrationsDirectory)
9/27/2024 11:51:49 AM ERR    at Microsoft.DotNet.Configurer.DotnetFirstTimeUseConfigurer.Configure()
9/27/2024 11:51:49 AM ERR    at Microsoft.DotNet.Cli.Program.ConfigureDotNetForFirstTimeUse(IFirstTimeUseNoticeSentinel firstTimeUseNoticeSentinel, IAspNetCertificateSentinel aspNetCertificateSentinel, IFileSentinel toolPathSentinel, Boolean isDotnetBeingInvokedFromNativeInstaller, DotnetFirstRunConfiguration dotnetFirstRunConfiguration, IEnvironmentProvider environmentProvider, Dictionary`2 performanceMeasurements)
9/27/2024 11:51:49 AM ERR    at Microsoft.DotNet.Cli.Program.ProcessArgs(String[] args, TimeSpan startupTime, ITelemetry telemetryClient)
9/27/2024 11:51:49 AM ERR    at Microsoft.DotNet.Cli.Program.Main(String[] args)
9/27/2024 11:51:49 AM ERR
9/27/2024 11:51:49 AM OUT
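The two commands above are launched concurrently by Terrabuild. The sketch below triggers the same race without Terrabuild. It assumes the host paths from the commands above, so adjust them for your machine. The function name `repro_parallel` and the `DOCKER` override are hypothetical additions for illustration, not part of the original setup:

```shell
# Hypothetical helper: start one `dotnet build` container per project at the
# same time, all sharing the same /root and /tmp host directories.
repro_parallel() {
  # Host paths copied from the docker commands in this issue.
  local home_dir=/Users/pierre/.terrabuild/home
  local src_dir=/Users/pierre/src/MagnusOpera/terrabuild/terrabuild/src
  # Set DOCKER=echo to print the commands instead of running them.
  local docker_cmd=${DOCKER:-docker}
  local pids=()
  local project pid
  local status=0

  for project in "$@"; do
    # --name is omitted; Docker then picks a random container name.
    "$docker_cmd" run --rm --net=host \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v "$home_dir/containers:/root" \
      -v "$home_dir/tmp:/tmp" \
      -v "$src_dir:/terrabuild" \
      -w "/terrabuild/$project" \
      --entrypoint dotnet \
      -e DOTNET_CLI_TELEMETRY_OPTOUT -e DOTNET_NOLOGO -e DOTNET_SKIP_FIRST_TIME_EXPERIENCE \
      mcr.microsoft.com/dotnet/sdk:8.0.302 \
      build --no-dependencies --configuration Debug &
    pids+=("$!")
  done

  # Wait for every container; any failed build is treated as a reproduction.
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

Usage: `repro_parallel Terrabuild.Common Terrabuild.PubSub`. You can append more project folders from the repository.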

Expected Behavior

The crashes are random; I expect the build to succeed every time.

Actual Behavior

An exception is thrown; see the stack traces above.

Analysis

Call stacks are provided for analysis.

Versions & Configurations

I have these in my .bashrc (they are passed to Docker):

export DOTNET_SKIP_FIRST_TIME_EXPERIENCE=true
export DOTNET_NOLOGO=true
export DOTNET_CLI_TELEMETRY_OPTOUT=true
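The docker commands above forward these variables with a bare `-e NAME`. That makes Docker copy each value from the environment of the process that invokes `docker`. If that process never sourced `~/.bashrc`, the variable is not set in the container at all. Examples are a non-interactive shell, or zsh, which is the default shell on recent macOS. The sketch below passes explicit values instead, assuming these three settings are the ones wanted. The array name `dotnet_env_args` is hypothetical:

```shell
# Explicit KEY=VALUE pairs: Docker sets these in the container regardless of
# whether the calling shell exported them.
dotnet_env_args=(
  -e DOTNET_SKIP_FIRST_TIME_EXPERIENCE=true
  -e DOTNET_NOLOGO=true
  -e DOTNET_CLI_TELEMETRY_OPTOUT=true
)
# Usage: docker run "${dotnet_env_args[@]}" ... mcr.microsoft.com/dotnet/sdk:8.0.302 build ...
```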

This also happens on both an Intel Mac and an Arm Mac (both on macOS Sequoia 15, though it was crashing with previous versions too). Both machines crash at the same rate.

The .NET SDK version (8.0.302) is specified on the docker command line.

For sources (to reproduce the build), use this: https://github.com/MagnusOpera/Terrabuild/tree/44ce393db4e8ad891cf072389c7a2023096bc44f

@pchalamet pchalamet added the Bug label Sep 27, 2024
@baronfel
Member

This is happening in the .NET SDK itself before MSBuild is ever invoked, so I will move it to the SDK repo.

@baronfel baronfel transferred this issue from dotnet/msbuild Sep 27, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-NetSDK untriaged Request triage from a team member labels Sep 27, 2024
@baronfel
Member

cc @zivkan / @nkolev92 - the NuGet migrations that were added a while back are erroring inconsistently for this user. What are the runtime requirements of the migrations in terms of file permissions, etc?

@nkolev92
Contributor

@jeffkl is our hotseat, but @kartheekp-ms was involved in the original migration code.

@jeffkl
Contributor

jeffkl commented Sep 27, 2024

@pchalamet
Author

pchalamet commented Sep 30, 2024

I've tried chmod'ing the tmp folder as advised (777, and also 1777 with the sticky bit), but this does not change anything.

It feels like a CreateMutex misbehavior in .NET when it is used across several Docker instances. When I start all those Docker instances, I mount /tmp to a global host directory (chmod'ed 1777), as well as the home folder (~). The goal is to amortize the initialization cost and let all instances hit the global NuGet cache.

But if synchronization is broken, at least for initialization, I guess it's also broken for concurrent cache access. After thinking about how CreateMutex works, I ended up passing --pid=host and --ipc=host to docker, which definitely makes sense for such a primitive.

This led to a drastic reduction in errors on x64. On Arm I can still observe the error System.IO.IOException: The system cannot open the device or file specified. : 'NuGet-Migrations'

Do you have internal guidance at Microsoft on how to allow shared memory between multiple .NET runtime Docker instances? That looks like the crux of the problem.

So far, I have identified these measures:

  • --pid=host
  • --ipc=host
  • mounting /tmp and ~ to dedicated shared volumes + chmod 1777 on shared volumes
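The third item can be scripted once on the host, before any container starts. Per the comment above, chmod alone did not change the outcome. This is only a setup sketch, assuming the host directory layout used in the docker commands (`containers` mounted as /root, `tmp` mounted as /tmp). `prepare_shared_dirs` is a hypothetical helper name. Mode 1777 is world-writable plus the sticky bit, the same permissions a standard /tmp has:

```shell
# Hypothetical helper: create the shared host directories that are mounted as
# /root and /tmp in every container, and give them /tmp-style permissions.
prepare_shared_dirs() {
  local base=${1:?usage: prepare_shared_dirs BASE_DIR}
  local dir
  for dir in "$base/containers" "$base/tmp"; do
    mkdir -p "$dir"
    # 1777 = rwx for everyone plus the sticky bit (only a file's owner may delete it).
    chmod 1777 "$dir"
  done
}
```

Usage: `prepare_shared_dirs /Users/pct/.terrabuild/home`.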

Is there something else needed to make it work reliably?


Docker parameters are now:

docker run --rm --net=host --name DAB5E60C96ACE37A01B06B64DFD9CD55E4ED14F2C614AA512BA51291FD95266E --pid=host --ipc=host -v /var/run/docker.sock:/var/run/docker.sock -v /Users/pct/.terrabuild/home/containers:/root -v /Users/pct/.terrabuild/home/tmp:/tmp -v /Users/pct/src/MagnusOpera/terrabuild/terrabuild/src:/terrabuild -w /terrabuild/Terrabuild.PubSub --entrypoint dotnet mcr.microsoft.com/dotnet/sdk:8.0.302 build --no-dependencies --configuration Debug
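One further mitigation is worth trying, offered as an untested assumption rather than a confirmed fix. The failures occur in the first-time-use step (`ConfigureDotNetForFirstTimeUse` in the stack traces). The idea is to run one build to completion on its own before launching the rest in parallel. This assumes that step's results persist in the shared /root volume, so the later containers would not race on it. `warm_then_fanout` and the `build_one` callback are hypothetical names; `build_one` would wrap the docker command above for a single project:

```shell
# Hypothetical helper: run the first build alone, then the remaining builds
# concurrently. $1 is the name of a command that builds one project.
warm_then_fanout() {
  local build_one=$1
  shift
  [ "$#" -gt 0 ] || return 0

  # Warm-up: the first project builds on its own, so any first-run setup in
  # the shared volumes happens without other containers running.
  "$build_one" "$1" || return 1
  shift

  local pids=()
  local project pid
  local status=0
  for project in "$@"; do
    "$build_one" "$project" &
    pids+=("$!")
  done
  # Report failure if any of the parallel builds failed.
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

For example, define `build_one` as a function that runs the docker command above with `-w "/terrabuild/$1"`. Then call `warm_then_fanout build_one Terrabuild.Common Terrabuild.PubSub`.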
