Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[L0 v2] make device allocation resident and support multi-device buffers #2166

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

igchor
Copy link
Member

@igchor igchor commented Oct 2, 2024

No description provided.

@github-actions github-actions bot added ci/cd Continuous integration/devliery common Changes or additions to common utilities conformance Conformance test suite issues. level-zero L0 adapter specific issues labels Oct 2, 2024
Copy link
Contributor

@pbalcer pbalcer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this stack of commits has become unreviewable.

// TODO: change to use
// https://spec.oneapi.io/level-zero/latest/core/api.html#ze-device-ip-version-ext-t
// when that is stable.
bool isPVC(ur_device_handle_t hDevice) {
Copy link
Contributor

@pbalcer pbalcer Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to some commons.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -51,55 +51,81 @@ void *ur_host_mem_handle_t::getPtr(ur_device_handle_t hDevice) {
return ptr;
}

ur_result_t ur_device_mem_handle_t::migrateBufferTo(ur_device_handle_t hDevice,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be doing the same thing as the current adapter. The memory copy should be added to the command list as a dependency for the related enqueue operation.

This whole synchronous migration when setting the parameter seems like a temporary hack that became permanent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be the case that just leaving the memory on the device, without migrating between devices is a better option - we should run some benchmarks first. If we decide that we want to migrate the memory, then yes, we should make this asynchronous (and also modify getPtr() function which is used by enqueueKernel, etc.). For now, this function in only used for initialization for copying the data from host to the device - perhaps I should rename the function to avoid confusion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, ok, I see. No, the name is fine, I was just fixated on that scenario.

I think the best approach here would be to create the memory handle in a 'pending' state, to be populated with data asynchronously on first use. My only concern would be concurrency (i.e., what happens when two threads operate on such 'pending' operation at the same time. we'd have to come up with some way to synchronize...).
For now leave it as-is, maybe create a task so that we don't forget.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For initialization, we need to do it synchronously I think because the host pointer might not be valid after urMemBufferCreate. We could use some buffers in DRAM (as I did initially) but again, we would probably need to benchmark and see if it makes sense.

For any migration that happens after that, if we just pass all the required dependencies to enqueueMemcpy and either return a signal event or (for in-inorder) just execute that enqueueMemcpy on the commandList that's used for the next operation (kernelLaunch, etc.) we don't need any extra synchronization I think.

Copy link
Member Author

@igchor igchor Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I guess we could make the logic to decide whether to migrate or not a bit more intelligent that just yes or no - perhaps we could keep some statistics to see how often the memory is accessed from which device and then migrate (or not) based on that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, makes sense. let's just do the simple thing and do the copy synchronously. trying to do a copy in dram could make sense for small buffers, but will break down for large things.

The device to device migration path I think may be a bit trickier, because it may involve multiple command lists. We need to start with some tests I think.

Copy link
Member Author

@igchor igchor Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we definitely need more tests - I created a task for that.

@igchor
Copy link
Member Author

igchor commented Oct 3, 2024

this stack of commits has become unreviewable.

I'll rebase once #2016 is merged

@igchor igchor force-pushed the memory_migration branch 2 times, most recently from f61b966 to ee1bac4 Compare October 4, 2024 00:00
In case of UMF_RESULT_ERROR_MEMORY_PROVIDER_SPECIFIC
the underlying native error was not being translated
to UR correctly (it was returned as is).

Fix this by adding native error handlers and implement one
for LEVEL_ZERO provider
on all devices in the context to match the default
behavior of legacy adapter.
@igchor igchor marked this pull request as ready for review October 4, 2024 19:38
@igchor igchor requested review from a team as code owners October 4, 2024 19:38
@igchor
Copy link
Member Author

igchor commented Oct 4, 2024

this stack of commits has become unreviewable.

I'll rebase once #2016 is merged

Done.

@igchor igchor force-pushed the memory_migration branch 2 times, most recently from 283d65f to 1263037 Compare October 7, 2024 21:57
and make failures for urMemImage optional.

There is no image support on PVC, and those tests
won't be run.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci/cd Continuous integration/devliery common Changes or additions to common utilities conformance Conformance test suite issues. level-zero L0 adapter specific issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants