syncProgress hangs when grpc context closes #3406

Open
NickYadance opened this issue Aug 2, 2024 · 0 comments

Bug report:

Download requests occasionally hang with full CPU usage because syncProgress runs into a busy for-loop when the gRPC context closes: the <-f.ctx.Done() case of the outer select has no body and no return, so once the context is closed that case is always ready and the loop spins.

pprof:

File: agent
Type: cpu
Time: Aug 1, 2024 at 11:53pm (CST)
Duration: 30.19s, Total samples = 59.73s (197.82%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 57010ms, 95.45% of 59730ms total
Dropped 122 nodes (cum <= 298.65ms)
Showing top 10 nodes out of 21
      flat  flat%   sum%        cum   cum%
   20670ms 34.61% 34.61%    20670ms 34.61%  runtime.procyield
   12200ms 20.43% 55.03%    38720ms 64.83%  runtime.lock2
    6130ms 10.26% 65.29%    56600ms 94.76%  runtime.selectgo
    4170ms  6.98% 72.28%     6550ms 10.97%  runtime.unlock2
    3580ms  5.99% 78.27%     3580ms  5.99%  runtime.osyield
    3460ms  5.79% 84.06%     3460ms  5.79%  runtime.futex
    2590ms  4.34% 88.40%    41330ms 69.19%  runtime.sellock
    2460ms  4.12% 92.52%    59590ms 99.77%  d7y.io/dragonfly/v2/client/daemon/peer.(*fileTask).syncProgress
    1100ms  1.84% 94.36%     7670ms 12.84%  runtime.selunlock
     650ms  1.09% 95.45%      650ms  1.09%  runtime.cheaprand (inline)
(pprof) list syncProgress
Total: 59.73s
ROUTINE ======================== d7y.io/dragonfly/v2/client/daemon/peer.(*fileTask).syncProgress in /Users/root/go/pkg/mod/git.garena.com/shopee/search_recommend/engine/data-deliver/third-party/dragonfly2/[email protected]/client/daemon/peer/peertask_file.go
     2.46s     59.59s (flat, cum) 99.77% of Total
         .          .    123:func (f *fileTask) syncProgress() {
         .          .    124:   defer f.span.End()
         .          .    125:   for {
     170ms     56.78s    126:           select {
     120ms      120ms    127:           case <-f.peerTaskConductor.successCh:
         .          .    128:                   f.storeToOutput()
         .          .    129:                   return
      40ms       40ms    130:           case <-f.peerTaskConductor.failCh:
         .          .    131:                   f.span.RecordError(fmt.Errorf(f.peerTaskConductor.failedReason))
         .          .    132:                   f.sendFailProgress(f.peerTaskConductor.failedCode, f.peerTaskConductor.failedReason)
         .          .    133:                   return
     1.98s      2.48s    134:           case <-f.ctx.Done():
     150ms      170ms    135:           case piece := <-f.pieceCh:
         .          .    136:                   if piece.Finished {
         .          .    137:                           continue
         .          .    138:                   }
         .          .    139:                   pg := &FileTaskProgress{
         .          .    140:                           State: &ProgressState{

func (f *fileTask) syncProgress() {
	defer f.span.End()
	for {
		select {
		case <-f.peerTaskConductor.successCh:
			f.storeToOutput()
			return
		case <-f.peerTaskConductor.failCh:
			f.span.RecordError(fmt.Errorf(f.peerTaskConductor.failedReason))
			f.sendFailProgress(f.peerTaskConductor.failedCode, f.peerTaskConductor.failedReason)
			return
		case <-f.ctx.Done():
			// no body and no return: once f.ctx is closed this case is always
			// ready, so the outer select fires again immediately on every
			// iteration and the loop spins at full CPU
		case piece := <-f.pieceCh:
			if piece.Finished {
				continue
			}
			pg := &FileTaskProgress{
				State: &ProgressState{
					Success: true,
					Code:    commonv1.Code_Success,
					Msg:     "downloading",
				},
				TaskID:          f.peerTaskConductor.GetTaskID(),
				PeerID:          f.peerTaskConductor.GetPeerID(),
				ContentLength:   f.peerTaskConductor.GetContentLength(),
				CompletedLength: f.peerTaskConductor.completedLength.Load(),
				PeerTaskDone:    false,
			}
			select {
			case <-f.progressStopCh:
			case f.progressCh <- pg:
				f.Debugf("progress sent, %d/%d", pg.CompletedLength, pg.ContentLength)
			case <-f.ctx.Done():
				f.Warnf("send progress failed, file task context done due to %s", f.ctx.Err())
				return
			}
		}
	}
}
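
The busy loop can also be reproduced in isolation with a minimal select loop that mirrors this structure; the names below are only illustrative, not Dragonfly code:

```go
package main

import (
	"context"
	"time"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	pieceCh := make(chan struct{}) // stands in for f.pieceCh; never receives here

	go func() {
		for {
			select {
			case <-ctx.Done():
				// empty case with no return: once ctx is cancelled this case
				// is always ready, so the loop spins without ever blocking
			case <-pieceCh:
			}
		}
	}()

	cancel()                    // the gRPC layer closing the request context
	time.Sleep(5 * time.Second) // one core stays at ~100% for the whole sleep
}
```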

Expected behavior:

syncProgress should return when the gRPC context closes, instead of spinning in the select loop.
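
A minimal sketch of one possible fix, assuming the intended behavior is simply to stop the loop once f.ctx is cancelled: give the outer ctx.Done() case a return (and optionally a warning log). This is not an official patch and the log wording is illustrative:

```go
		case <-f.ctx.Done():
			// returning here terminates syncProgress when the gRPC context
			// closes, so the outer select cannot spin on an always-ready case
			f.Warnf("file task context done due to %s, stop syncing progress", f.ctx.Err())
			return
```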

How to reproduce it:

Environment:

  • Dragonfly version: v2.1.0-4349e27
  • OS: ubuntu
  • Kernel (e.g. uname -a):
  • Others:
NickYadance added the bug label on Aug 2, 2024