[FEA] Improve GpuJsonToStructs performance #11560

Open
ttnghia opened this issue Oct 4, 2024 · 0 comments
Labels: cudf_dependency, epic, feature request, improve performance

ttnghia commented Oct 4, 2024

The performance of our current GpuJsonToStructs is poor. A profile of a run looks like this:

Image

In the test case profiled above, the only useful work is what runs up to the end of the read_json range (just above 300ms), which is less than 50% of the entire GpuJsonToStructs projection (>800ms). The rest is overhead, consisting mostly of hundreds of small kernel calls and stream syncs that do nothing but copy data from the intermediate result to the final output.
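To make the proportions concrete, here is a small sketch of the arithmetic. The 300ms and 800ms values are the approximate figures quoted above, not exact measurements; the per-call overhead, call count, and copy time in the second half are purely hypothetical numbers chosen to illustrate why many small kernel calls and syncs cost more than one batched copy.

```python
# Approximate figures quoted from the profile (not exact measurements).
READ_JSON_MS = 300.0    # useful work: up to the end of the read_json range
PROJECTION_MS = 800.0   # whole GpuJsonToStructs projection (reported as ">800ms")


def useful_fraction(useful_ms: float, total_ms: float) -> float:
    """Share of the total runtime spent on useful work."""
    if total_ms <= 0:
        raise ValueError("total_ms must be positive")
    return useful_ms / total_ms


def copy_cost_ms(num_calls: int, fixed_overhead_ms: float, copy_work_ms: float) -> float:
    """Toy cost model: each call pays a fixed launch/sync overhead,
    and the actual copy work is the same no matter how it is split."""
    return num_calls * fixed_overhead_ms + copy_work_ms


useful = useful_fraction(READ_JSON_MS, PROJECTION_MS)
overhead_ms = PROJECTION_MS - READ_JSON_MS

# HYPOTHETICAL values, for illustration only (not taken from the profile).
HYPOTHETICAL_CALLS = 500
HYPOTHETICAL_FIXED_MS = 1.0
HYPOTHETICAL_COPY_WORK_MS = 20.0

scattered_ms = copy_cost_ms(HYPOTHETICAL_CALLS, HYPOTHETICAL_FIXED_MS, HYPOTHETICAL_COPY_WORK_MS)
batched_ms = copy_cost_ms(1, HYPOTHETICAL_FIXED_MS, HYPOTHETICAL_COPY_WORK_MS)

print(f"useful work: {useful:.1%} of the projection")
print(f"overhead: {overhead_ms:.0f} ms")
print(f"toy model, many small copies: {scattered_ms:.0f} ms vs one batched copy: {batched_ms:.0f} ms")
```

Since the projection is reported as taking more than 800ms, the useful share computed here is an upper bound. In the toy model the copy work itself is unchanged; only the fixed per-call cost shrinks when the copies are batched.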

We can do much better by eliminating the unnecessary overhead, or by reworking the remaining steps so that they run in much less time. If we divide the runtime of GpuJsonToStructs into sections:

Image

The improvements can be made through the following tasks:

ttnghia added the cudf_dependency, epic, feature request, and improve performance labels Oct 4, 2024
ttnghia self-assigned this Oct 4, 2024