-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[L0 v2] make device allocation resident and support multi-device buffers #2166
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this stack of commits has become unreviewable.
// TODO: change to use | ||
// https://spec.oneapi.io/level-zero/latest/core/api.html#ze-device-ip-version-ext-t | ||
// when that is stable. | ||
bool isPVC(ur_device_handle_t hDevice) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this to some commons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -51,55 +51,81 @@ void *ur_host_mem_handle_t::getPtr(ur_device_handle_t hDevice) { | |||
return ptr; | |||
} | |||
|
|||
ur_result_t ur_device_mem_handle_t::migrateBufferTo(ur_device_handle_t hDevice, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be doing the same thing as the current adapter. The memory copy should be added to the command list as a dependency for the related enqueue operation.
This whole synchronous migration when setting the parameter seems like a temporary hack that became permanent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be the case that just leaving the memory on the device, without migrating between devices is a better option - we should run some benchmarks first. If we decide that we want to migrate the memory, then yes, we should make this asynchronous (and also modify getPtr() function which is used by enqueueKernel, etc.). For now, this function in only used for initialization for copying the data from host to the device - perhaps I should rename the function to avoid confusion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, ok, I see. No, the name is fine, I was just fixated on that scenario.
I think the best approach here would be to create the memory handle in a 'pending' state, to be populated with data asynchronously on first use. My only concern would be concurrency (i.e., what happens when two threads operate on such 'pending' operation at the same time. we'd have to come up with some way to synchronize...).
For now leave it as-is, maybe create a task so that we don't forget.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For initialization, we need to do it synchronously I think because the host pointer might not be valid after urMemBufferCreate. We could use some buffers in DRAM (as I did initially) but again, we would probably need to benchmark and see if it makes sense.
For any migration that happens after that, if we just pass all the required dependencies to enqueueMemcpy and either return a signal event or (for in-inorder) just execute that enqueueMemcpy on the commandList that's used for the next operation (kernelLaunch, etc.) we don't need any extra synchronization I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I guess we could make the logic to decide whether to migrate or not a bit more intelligent that just yes or no - perhaps we could keep some statistics to see how often the memory is accessed from which device and then migrate (or not) based on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, makes sense. let's just do the simple thing and do the copy synchronously. trying to do a copy in dram could make sense for small buffers, but will break down for large things.
The device to device migration path I think may be a bit trickier, because it may involve multiple command lists. We need to start with some tests I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we definitely need more tests - I created a task for that.
6c997fc
to
3d7b66f
Compare
I'll rebase once #2016 is merged |
f61b966
to
ee1bac4
Compare
In case of UMF_RESULT_ERROR_MEMORY_PROVIDER_SPECIFIC the underlying native error was not being translated to UR correctly (it was returned as is). Fix this by adding native error handlers and implement one for LEVEL_ZERO provider
on all devices in the context to match the default behavior of legacy adapter.
ee1bac4
to
9d44d0e
Compare
9d44d0e
to
0678778
Compare
Done. |
283d65f
to
1263037
Compare
and make failures for urMemImage optional. There is no image support on PVC, and those tests won't be run.
1263037
to
f369220
Compare
No description provided.