Remote IO: http support #464
base: branch-24.12
C++, not python yet
701f603 to b886419
Co-authored-by: Lawrence Mitchell <[email protected]>
Minor comments; overall looking good to me.
```cpp
const std::size_t nbytes = size * nmemb;
if (ctx->size < ctx->offset + nbytes) {
  ctx->overflow_error = true;
  return CURL_WRITEFUNC_ERROR;
}
```
note: Nothing can be done about it, because it's in the curl API, but if `nbytes == CURL_WRITEFUNC_ERROR` and `ctx->size < ctx->offset + nbytes`, then curl won't notice that we returned an error here (our return value would equal `nbytes`, which curl treats as success), I think.
`nbytes == CURL_WRITEFUNC_ERROR` cannot happen: `CURL_WRITEFUNC_ERROR` is defined as `0xFFFFFFFF`, which is greater than `CURL_MAX_WRITE_SIZE`.
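The argument above can be checked at compile time. Below is a minimal sketch of a bounded write callback in the spirit of the diff excerpt, assuming a fixed-size destination buffer. `CallbackContext` and `write_callback` are hypothetical names, and the two constants are copied from `curl/curl.h` (`CURL_WRITEFUNC_ERROR` was added in libcurl 7.87.0, matching the `libcurl>=7.87.0` requirement) so the sketch builds without libcurl; real code should include the header instead. libcurl treats any return value other than `size * nmemb` as an error, and body chunks are at most `CURL_MAX_WRITE_SIZE` bytes, so a legitimate `nbytes` can never collide with the sentinel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Values mirrored from curl/curl.h (assumption: copied so this builds
// without libcurl). In real code, #include <curl/curl.h> instead.
constexpr std::size_t kWriteFuncError = 0xFFFFFFFF;  // CURL_WRITEFUNC_ERROR
constexpr std::size_t kMaxWriteSize = 16384;         // CURL_MAX_WRITE_SIZE

// A body chunk never exceeds CURL_MAX_WRITE_SIZE bytes, so a valid nbytes
// can never equal the error sentinel.
static_assert(kWriteFuncError > kMaxWriteSize, "sentinel must exceed max chunk size");

// Hypothetical context: a fixed-size destination buffer being filled.
struct CallbackContext {
  char* buf;
  std::size_t size;
  std::size_t offset;
  bool overflow_error;
};

// Same signature as a CURLOPT_WRITEFUNCTION callback.
std::size_t write_callback(char* data, std::size_t size, std::size_t nmemb, void* context) {
  auto* ctx = static_cast<CallbackContext*>(context);
  assert(ctx != nullptr);
  const std::size_t nbytes = size * nmemb;
  if (ctx->size < ctx->offset + nbytes) {
    ctx->overflow_error = true;
    return kWriteFuncError;  // differs from nbytes, so curl aborts the transfer
  }
  std::memcpy(ctx->buf + ctx->offset, data, nbytes);
  ctx->offset += nbytes;
  return nbytes;  // returning exactly nbytes signals success
}
```

The `overflow_error` flag lets the caller distinguish "the server sent more than we asked for" from other transfer failures after curl reports the aborted write.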
```cython
cdef extern from "<kvikio/remote_handle.hpp>" nogil:
    cdef cppclass cpp_RemoteEndpoint "kvikio::RemoteEndpoint":
        pass

    cdef cppclass cpp_HttpEndpoint "kvikio::HttpEndpoint":
        cpp_HttpEndpoint(string url) except +

    cdef cppclass cpp_RemoteHandle "kvikio::RemoteHandle":
```
```diff
-cdef extern from "<kvikio/remote_handle.hpp>" nogil:
-    cdef cppclass cpp_RemoteEndpoint "kvikio::RemoteEndpoint":
-        pass
-    cdef cppclass cpp_HttpEndpoint "kvikio::HttpEndpoint":
-        cpp_HttpEndpoint(string url) except +
-    cdef cppclass cpp_RemoteHandle "kvikio::RemoteHandle":
+cdef extern from "<kvikio/remote_handle.hpp>" nogil namespace "kvikio":
+    cdef cppclass RemoteEndpoint:
+        pass
+    cdef cppclass HttpEndpoint:
+        HttpEndpoint(string url) except +
+    cdef cppclass RemoteHandle:
```
Perhaps this instead? `RemoteEndpoint`, `HttpEndpoint`, and `RemoteHandle` are not used as names of any Python `cdef class`, so the `cpp_` prefix isn't needed to avoid a clash.
I keep mixing up C++ and Python objects when reading Cython; I think always using the `cpp_` prefix makes it a bit easier?
```python
self.process = multiprocessing.Process(
    target=LocalHttpServer._server,
    args=(queue, str(self.root_path), self.range_support, self.max_lifetime),
)
```
question: Since we're starting a new process here to run the server, why do we also run the server in its own thread in `_server`?
To handle `max_lifetime`.
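A sketch of why a thread helps here, assuming `_server` runs a blocking serve loop (its body isn't shown in this thread): if the loop runs on its own thread, the calling thread stays free to sleep for the lifetime budget and then ask the loop to stop. `serve_until_stopped` and `run_with_max_lifetime` are hypothetical names, and the loop body is a stand-in for request handling, not kvikio's or the test server's actual code.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for a blocking serve loop: it keeps "handling
// requests" until another thread sets `stop`. The do/while guarantees at
// least one pass even if `stop` is set before the thread is scheduled.
void serve_until_stopped(const std::atomic<bool>& stop, std::atomic<int>& handled) {
  do {
    handled.fetch_add(1);
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  } while (!stop.load());
}

// Runs the blocking loop on its own thread so this thread can enforce a
// maximum lifetime, then requests shutdown and waits for the loop to exit.
int run_with_max_lifetime(std::chrono::milliseconds max_lifetime) {
  assert(max_lifetime.count() >= 0);
  std::atomic<bool> stop{false};
  std::atomic<int> handled{0};
  std::thread server(serve_until_stopped, std::cref(stop), std::ref(handled));
  std::this_thread::sleep_for(max_lifetime);  // the lifetime budget
  stop.store(true);                           // ask the loop to exit
  server.join();
  return handled.load();
}
```

Without the extra thread, the blocking loop would occupy the only thread in the process, and nothing inside it could enforce the deadline short of killing the process from outside.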
conda/recipes/libkvikio/meta.yaml
```diff
@@ -83,6 +84,7 @@ outputs:
   {% else %}
   - libcufile-dev # [linux]
   {% endif %}
+  - libcurl>=7.87.0
```
Use an exact pinning in `host:` (next to `cuda-version` on line 77). Then it will use a run-export to get the compatible run pinning here.
done
conda/recipes/libkvikio/meta.yaml
```diff
@@ -52,6 +52,7 @@ requirements:
   {% else %}
   - libcufile-dev # [linux]
   {% endif %}
+  - libcurl>=7.87.0
```
Use an exact pinning here. Building against a specific (possibly older) version allows users to have any equal or newer version at runtime. See my other comment below about run pinnings.
The problem you want to avoid is that this host pinning could pull `libcurl 7.999` at build time (anything `>=7.87`), while your run pinning below, `>=7.87.0`, would still allow older versions at runtime that might be incompatible with the one you built against. Pinning a specific version at build time and then relying on run-exports for a compatible runtime pinning is the cleanest solution here, and it's how we handle most dependencies like this.
Thanks for the clarification; I understand `meta.yaml` a little bit better now :)
Co-authored-by: Bradley Dice <[email protected]>
Co-authored-by: Lawrence Mitchell <[email protected]>
Co-authored-by: Bradley Dice <[email protected]>
Support reading directly from an HTTP server, like:
This PR is the first step to support S3 using libcurl instead of aws-s3-sdk, which has some pros and cons:
- ... `S3Context` in libcudf and cudf to handle shutdown correctly. This is not a problem in libcurl; see https://curl.se/libcurl/c/libcurl.html under "Global constants".
- ... `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.