[xla:cpu] Optimize ThunkExecutor::Execute part #1
name                                     old cpu/op   new cpu/op   delta
BM_SelectAndScatterF32/128/process_time   889µs ± 1%   740µs ± 3%  -16.70%
BM_SelectAndScatterF32/256/process_time  3.64ms ± 2%  3.00ms ± 1%  -17.64%
BM_SelectAndScatterF32/512/process_time  15.3ms ± 1%  13.1ms ± 3%  -14.61%

PiperOrigin-RevId: 658063846
ezhulenev authored and copybara-github committed Jul 31, 2024
1 parent 2556f9f commit bc35df3
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions xla/service/cpu/runtime/thunk_executor.cc
@@ -162,6 +162,12 @@ tsl::AsyncValueRef<ThunkExecutor::ExecuteEvent> ThunkExecutor::Execute(
   Execute(state.get(), params, ReadyQueue(source_.begin(), source_.end()),
           /*lock=*/params.session.Join());
 
+  // If execution already completed (all kernels executed in the caller thread),
+  // immediately return the result to avoid wasteful reference counting below.
+  if (ABSL_PREDICT_TRUE(state->execute_event.IsAvailable())) {
+    return std::move(state->execute_event);
+  }
+
   // Move execute state to the execute event callback to ensure that it is kept
   // alive while thunk executor has pending tasks.
   auto execute_event = state->execute_event;
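
The fast path in this hunk can be illustrated outside XLA with a minimal sketch. It assumes `std::shared_ptr` as a stand-in for `tsl::AsyncValueRef`; the names `Event`, `ExecuteState`, and `Finish` are hypothetical and are not part of the XLA or TSL APIs. The sketch only shows the shape of the idea: when the event is already available, the handle is moved out of the state, so no extra reference is taken and no callback is registered.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an async completion event. This is NOT
// tsl::AsyncValue; it is single-threaded and only models the control flow.
class Event {
 public:
  bool IsAvailable() const { return available_; }

  // Runs `callback` immediately if the event is available, otherwise stores it.
  void AndThen(std::function<void()> callback) {
    if (available_) {
      callback();
      return;
    }
    callbacks_.push_back(std::move(callback));
  }

  // Marks the event as available and runs (then destroys) pending callbacks.
  void SetAvailable() {
    available_ = true;
    std::vector<std::function<void()>> callbacks = std::move(callbacks_);
    callbacks_.clear();
    for (auto& callback : callbacks) callback();
  }

 private:
  bool available_ = false;
  std::vector<std::function<void()>> callbacks_;
};

// Hypothetical per-execution state that owns a handle to its completion event.
struct ExecuteState {
  std::shared_ptr<Event> execute_event = std::make_shared<Event>();
};

// Returns the completion event for an execution that has already been started.
std::shared_ptr<Event> Finish(std::unique_ptr<ExecuteState> state) {
  assert(state != nullptr);

  // Fast path: everything already ran in the caller thread. Moving the handle
  // out of the state avoids a reference count increment and a callback.
  if (state->execute_event->IsAvailable()) {
    return std::move(state->execute_event);
  }

  // Slow path: copy the handle (one extra reference) and move the state into
  // a callback so that it stays alive until the event becomes available.
  std::shared_ptr<Event> execute_event = state->execute_event;
  std::shared_ptr<ExecuteState> keep_alive = std::move(state);
  execute_event->AndThen([keep_alive] {});
  return execute_event;
}
```

In this sketch the slow path leaves two references to the event (the returned handle and the one held by the kept-alive state) until `SetAvailable()` runs the callback and releases the state, while the fast path leaves exactly one. The real code additionally wraps the check in `ABSL_PREDICT_TRUE` as a branch prediction hint, which the sketch omits.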