Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support streaming bodies in the client #15

Open
evert opened this issue Sep 15, 2014 · 24 comments
Open

Support streaming bodies in the client #15

evert opened this issue Sep 15, 2014 · 24 comments

Comments

@evert
Copy link
Member

evert commented Sep 15, 2014

Moved from here:

https://github.com/fruux/sabre-dav/issues/321

@Hywan
Copy link
Member

Hywan commented Jan 12, 2015

Hello :-),

The Sabre\HTTP\Client::parseCurlResult method computes an array containing the response index. This index contains a Sabre\HTTP\Response object. This object extends Sabre\HTTP\Message. On this object, we have the getBody, getBodyAsStream and getBodyAsString methods.

So, because this response from Sabre\HTTP\Client::parseCurlResult is returned by the doRequest method, basically, this feature is already supported.

QED ■.

@evert
Copy link
Member Author

evert commented Jan 12, 2015

The point of this ticket is, that someone may be using the library to do downloads of very large files.

In those cases, we want to ensure that the entire file is accessed as a stream, and never placed into memory.

So while it's possible right now to convert the string into a stream, the goal is to change the client a bit so it uses a stream under the hood as well.

@Hywan
Copy link
Member

Hywan commented Jan 12, 2015

@evert But the result is already in memory because cURL gives you all the “stuff”: https://github.com/fruux/sabre-http/blob/c55cbc1daa91293cda92ea4b90de79c743c4a149/lib/Client.php#L483. I will check if cURL can gives only few informations in order to create a stream.

@Hywan
Copy link
Member

Hywan commented Jan 12, 2015

An interesting link from a friend of me @pmartin: http://stackoverflow.com/questions/1342583/manipulate-a-string-that-is-30-million-characters-long/1342760#1342760. However, I don't know how it works if we don't want to load the response yet we receive it but later: When reading the stream only.

@Hywan
Copy link
Member

Hywan commented Jan 12, 2015

In case where the user has a stream ready and with the write permission, she can give this stream to the HTTP client and the response will be copied into this stream. This answer to one use-case.

@evert
Copy link
Member Author

evert commented Jan 12, 2015

streams in request bodies already 100% work, this is about turning a HTTP response into a stream.

@Hywan
Copy link
Member

Hywan commented Jan 12, 2015

@evert How does it work. I missed it in the source code?

@evert
Copy link
Member Author

evert commented Jan 12, 2015

@Hywan
Copy link
Member

Hywan commented Jan 12, 2015

@evert Yup, it's for sending a request. What you would like to do is for receiving a response, right?

@evert
Copy link
Member Author

evert commented Jan 12, 2015

Indeed, yes!

@h44z
Copy link

h44z commented Apr 12, 2015

Any news on this? I am very interrested in using streams as im going to down/upload large files >2GB with the webdav client.

@Hywan
Copy link
Member

Hywan commented Apr 12, 2015

@h44z Not from me yet.

@evert
Copy link
Member Author

evert commented Apr 13, 2015

This is still something that interesting for us, but we haven't had time implementing it yet.

@evert
Copy link
Member Author

evert commented May 19, 2015

So the problem here is that curl actually doesn't have an easy way for us to just access the stream resource, as far as I can tell.

The only way we can progressively get access to the stream, is by using the CURLOPT_WRITEFUNCTION option, but that only gives us 'bits of the string' as opposed to a full-on stream.

With that function, we could send everything to a temporary stream (php://temp/) which would cache the result in memory, but that only solves part of the problem.

Ideally we'd want the response to return as soon as it starts coming in and not after all the bytes have arrived, and ideally we would want to not have to cache/buffer it anywhere.

I don't see an easy way to do that.

@staabm
Copy link
Member

staabm commented May 19, 2015

CURLOPT_WRITEFUNCTION seems to be invoked when a certain sized chunk was downloaded by CURL. So it looks more or less like a "real" stream.

Which parts of the problem doesnt this solve for you, could you elaborate?

@evert
Copy link
Member Author

evert commented May 19, 2015

Well, it would be really nice if we can do something like :

$request = "...";
$response = $client->send($request);
stream_copy_to_stream('php://output', $response->getBodyAsStream());

This should:

  1. Use real stream resources internally
  2. Never buffer anything, anywhere.

@staabm
Copy link
Member

staabm commented May 19, 2015

Absolutely. So we need to change from using a string and use e.g. a php tmp stream in
https://github.com/fruux/sabre-http/blob/master/lib/Client.php#L496

this will be not "real" streaming but it will be way better as what we have ATM.
whats the actual shop stopper? Did I miss something?

@evert
Copy link
Member Author

evert commented May 19, 2015

The problem with using the temporary stream is that it only partially solve the goals. It does not:

  1. Avoid a cache/buffer. The entire response will be stored in memory or on disk.
  2. We can't start reading until the entire response is in, because we can't write and read the string at the same time.

@evert
Copy link
Member Author

evert commented May 19, 2015

It's better than nothing though.

@staabm
Copy link
Member

staabm commented May 19, 2015

Avoid a cache/buffer. The entire response will be stored in memory or on disk.

right, but thats how streaming works nevertheless... php://memory seems like a good fit for that.

We can't start reading until the entire response is in, because we can't write and read the string at the same time.

hm IIRC we could do this using a non-blocking read/write stream, couldn't we?

@staabm
Copy link
Member

staabm commented May 19, 2015

maybe it would also be a good occasion to use a different http client, e.g. https://github.com/amphp/artax

(so we dont need to workaround curl limitations)

@evert
Copy link
Member Author

evert commented May 19, 2015

right, but thats how streaming works nevertheless... php://memory seems like a good fit for that.

That's not really true... If I didn't use curl and used PHP's built-in HTTP stream wrappers, there would be no buffer.

Here's an example (and note that I did stream_copy_to_stream() wrong in my previous code snippet, sorry about that):

stream_copy_to_stream(
   fopen('http://example.org/','r'),
   fopen('php://output','w')
);

If I do the same with a temporary stream, it would look more like this:

$tmp = fopen('php://temp','r+');

stream_copy_to_stream(
   fopen('http://example.org/','r'),
   fopen($tmp,'w')
);
rewind($tmp);

stream_copy_to_stream(
   $tmp,
   fopen('php://output','w')
);

This last example has two passes, and requires a buffer (disk or memory depending on the size) and this is exactly how the curl example with WRITEFUNCTIONwould work as well. This is far from ideal. The use-case I want to solve is indeed the '2GB download' use-case, and if I force people to store the entire thing on disk first that would be sub-optimal.

We can't start reading until the entire response is in, because we can't write and read the string at the same time.

hm IIRC we could do this using a non-blocking read/write stream, couldn't we?

Not at the same time, and not without a buffer. Perhaps with steam_select() and mkfifo() :)

maybe it would also be a good occasion to use a different http client, e.g. https://github.com/amphp/artax

Not a bad idea =)

@staabm
Copy link
Member

staabm commented May 19, 2015

I think the only really good solution will be a different http client.
With artax you can even use single threaded concurrency in case some requests can be made in parallel which could be another perf. win.

@evert
Copy link
Member Author

evert commented May 19, 2015

I'll definitely look into it. My preference would go for something lightweight, so maybe artax is that =)

In the future I want to kick of sabre/davclient again, so that will be good timing to dig into that.

Gasol pushed a commit to Gasol/http that referenced this issue Dec 12, 2018
In order to resolve sabre-io#82, Remove the substr function in
`parseCurlResult()` and register receiveCurlHeader as callback function by
using CURLOPT_HEADERFUNCTION instead, To avoid additional memory copy
of response body (from $response to $responseBody).

This changes is not intended to resolve issue sabre-io#15
Gasol pushed a commit to Gasol/http that referenced this issue Dec 12, 2018
In order to resolve sabre-io#82, Remove the substr function in
`parseCurlResult()` and register receiveCurlHeader as callback function by
using CURLOPT_HEADERFUNCTION instead, To avoid additional memory copy
of response body (from $response to $responseBody).

This changes is not intended to resolve issue sabre-io#15
Gasol pushed a commit to Gasol/http that referenced this issue Dec 13, 2018
In order to resolve sabre-io#82, Remove the substr function in
`parseCurlResult()` and register receiveCurlHeader as callback function by
using CURLOPT_HEADERFUNCTION instead, To avoid additional memory copy
of response body (from $response to $responseBody).

This changes is not intended to resolve issue sabre-io#15
Gasol pushed a commit to Gasol/http that referenced this issue Dec 13, 2018
In order to resolve sabre-io#82, Remove the substr function in
`parseCurlResult()` and register receiveCurlHeader as callback function by
using CURLOPT_HEADERFUNCTION instead, To avoid additional memory copy
of response body (from $response to $responseBody).

This changes is not intended to resolve issue sabre-io#15
Gasol pushed a commit to Gasol/http that referenced this issue Dec 13, 2018
In order to resolve sabre-io#82, Remove the substr function in
`parseCurlResult()` and register receiveCurlHeader as callback function by
using CURLOPT_HEADERFUNCTION instead, To avoid additional memory copy
of response body (from $response to $responseBody).

This changes is not intended to resolve issue sabre-io#15
Gasol pushed a commit to Gasol/http that referenced this issue Dec 13, 2018
In order to resolve sabre-io#82, Remove the substr function in
`parseCurlResult()` and register receiveCurlHeader as callback function by
using CURLOPT_HEADERFUNCTION instead, To avoid additional memory copy
of response body (from $response to $responseBody).

This changes is not intended to resolve issue sabre-io#15
Gasol pushed a commit to Gasol/http that referenced this issue Dec 13, 2018
In order to resolve sabre-io#82, Remove the substr function in
`parseCurlResult()` and register receiveCurlHeader as callback function by
using CURLOPT_HEADERFUNCTION instead, To avoid additional memory copy
of response body (from $response to $responseBody).

This changes is not intended to resolve issue sabre-io#15
Gasol pushed a commit to Gasol/http that referenced this issue Dec 13, 2018
In order to resolve sabre-io#82, Remove the substr function in
`parseCurlResult()` and register receiveCurlHeader as callback function by
using CURLOPT_HEADERFUNCTION instead, To avoid additional memory copy
of response body (from $response to $responseBody).

This changes is not intended to resolve issue sabre-io#15
Gasol pushed a commit to Gasol/http that referenced this issue Dec 13, 2018
In order to resolve sabre-io#82, Remove the substr function in
`parseCurlResult()` and register receiveCurlHeader as callback function by
using CURLOPT_HEADERFUNCTION instead, To avoid additional memory copy
of response body (from $response to $responseBody).

This changes is not intended to resolve issue sabre-io#15
Gasol pushed a commit to Gasol/http that referenced this issue Dec 13, 2018
In order to resolve sabre-io#82, Remove the substr function in
`parseCurlResult()` and register receiveCurlHeader as callback function by
using CURLOPT_HEADERFUNCTION instead, To avoid additional memory copy
of response body (from $response to $responseBody).

This changes is not intended to resolve issue sabre-io#15
Gasol pushed a commit to Gasol/http that referenced this issue Dec 14, 2018
In order to resolve sabre-io#82, Remove the substr function in
`parseCurlResult()` and register receiveCurlHeader as callback function by
using CURLOPT_HEADERFUNCTION instead, To avoid additional memory copy
of response body (from $response to $responseBody).

This changes is not intended to resolve issue sabre-io#15
Gasol pushed a commit to Gasol/http that referenced this issue Dec 14, 2018
In order to resolve sabre-io#82 to avoid additional memory copy of response body
(from $response to $responseBody), We deperecated `parseCurlResult` method
which use `substr` function to separate HTTP headers and body and create new
method `parseCurlResponse` instead. We register `receiveCurlHeader` method
as callback function by curl_setopt using `CURLOPT_HEADERFUNCTION` option.

This changes is not intended to resolve issue sabre-io#15

See sabre-io#115 (comment)
Gasol pushed a commit to Gasol/http that referenced this issue Dec 14, 2018
In order to resolve sabre-io#82 to avoid additional memory copy of response body
(from $response to $responseBody), We deperecated `parseCurlResult` method
which use `substr` function to separate HTTP headers and body and create new
method `parseCurlResponse` instead. We register `receiveCurlHeader` method
as callback function by curl_setopt using `CURLOPT_HEADERFUNCTION` option.

This changes is not intended to resolve issue sabre-io#15

See sabre-io#115 (comment)
Gasol pushed a commit to Gasol/http that referenced this issue Dec 14, 2018
In order to resolve sabre-io#82 to avoid additional memory copy of response body
(from $response to $responseBody), We deperecated `parseCurlResult` method
which use `substr` function to separate HTTP headers and body and create new
method `parseCurlResponse` instead. We register `receiveCurlHeader` method
as callback function by curl_setopt using `CURLOPT_HEADERFUNCTION` option.

This changes is not intended to resolve issue sabre-io#15

See sabre-io#115 (comment)
staabm pushed a commit that referenced this issue Mar 18, 2019
In order to resolve #82 to avoid additional memory copy of response body
(from $response to $responseBody), We deperecated `parseCurlResult` method
which use `substr` function to separate HTTP headers and body and create new
method `parseCurlResponse` instead. We register `receiveCurlHeader` method
as callback function by curl_setopt using `CURLOPT_HEADERFUNCTION` option.

This changes is not intended to resolve issue #15

See #115 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants