fix erroneous retries on a failed request to a newly opened socket #150

ahl · 2024-09-21T07:28:35Z

In migrating to hyper v1, we encountered an issue with reqwest that we tracked down to hyper-util. I've created a reproduction here.

We see what appears to be aberrant behavior when a client (reqwest::Client or hyper_util::client::legacy::Client) makes a request to a server that may deliberately close the connection. In particular, the client opens a new connection and may try opening connections many, many times! If the client is unable to start writing the request to the newly opened connection, it opens another connection and tries again, repeating until it manages to write some or all of the request to the socket before the server closes it.
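To make the failure mode concrete, here is a minimal, self-contained model of it. It is only a sketch: `FlakyServer`, `Attempt`, and `send_with_unconditional_retry` are hypothetical names invented for illustration and are not part of hyper-util or reqwest. It assumes a server that closes the first few sockets before any request bytes are written.

```rust
/// Outcome of one attempt on a brand-new connection (hypothetical model).
#[derive(Debug, PartialEq)]
enum Attempt {
    /// The server closed the socket before any request bytes were written.
    ClosedBeforeWrite,
    /// Some or all of the request reached the socket.
    Written,
}

/// Hypothetical server that immediately closes the first `closes` connections.
struct FlakyServer {
    closes: usize,
    connections_opened: usize,
}

impl FlakyServer {
    /// Opens one new connection and reports whether the request could be written.
    fn connect_and_send(&mut self) -> Attempt {
        self.connections_opened += 1;
        if self.connections_opened <= self.closes {
            Attempt::ClosedBeforeWrite
        } else {
            Attempt::Written
        }
    }
}

/// Retry loop that never asks whether the failed connection was fresh or reused.
/// Returns the total number of connections opened.
fn send_with_unconditional_retry(server: &mut FlakyServer) -> usize {
    loop {
        if server.connect_and_send() == Attempt::Written {
            return server.connections_opened;
        }
        // The request never started, so it is treated as retryable
        // and another socket is opened.
    }
}

fn main() {
    let mut server = FlakyServer { closes: 5, connections_opened: 0 };
    let opened = send_with_unconditional_retry(&mut server);
    println!("connections opened: {}", opened); // prints "connections opened: 6"
}
```

In this model the number of sockets opened is bounded only by how often the server closes early, which matches the repeated reconnects described above.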

This appears to be a result of #133 which reintroduced retry logic. It is quite similar to hyperium/hyper@ee61ea9 but diverges in some important ways. In particular:

#133

        loop {
            req = match self.try_send_request(req, pool_key.clone()).await {
                Ok(resp) => return Ok(resp),
                Err(TrySendError::Nope(err)) => return Err(err),
                Err(TrySendError::Retryable { mut req, error }) => {
                    if !self.config.retry_canceled_requests {
                        // if client disabled, don't retry
                        // a fresh connection means we definitely can't retry
                        return Err(error);
                    }

                    trace!(
                        "unstarted request canceled, trying again (reason={:?})",
                        error
                    );
                    *req.uri_mut() = uri.clone();
                    req
                }
            }
        }

hyperium/hyper@ee61ea9

        loop {
            match self.future.poll() {
                Ok(Async::Ready(resp)) => return Ok(Async::Ready(resp)),
                Ok(Async::NotReady) => return Ok(Async::NotReady),
                Err(ClientError::Normal(err)) => return Err(err),
                Err(ClientError::Canceled {
                    connection_reused,
                    req,
                    reason,
                }) => {
                    if !self.client.retry_canceled_requests || !connection_reused {
                        // if client disabled, don't retry
                        // a fresh connection means we definitely can't retry
                        return Err(reason);
                    }
                    trace!("unstarted request canceled, trying again (reason={:?})", reason);
                    let mut req = request::join(req);
                    req.set_proxy(self.is_proxy);
                    req.set_uri(self.uri.clone());
                    self.future = self.client.send_request(req, &self.domain);
                }
            }
        }

Note that the comment has been preserved across the years, but the critical check for connection_reused is absent from the new revision.
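The condition from the older code can be isolated as a small predicate. This is a sketch only: `Config` and `should_retry` are hypothetical stand-ins, and the logic is simply the negation of the `!retry_canceled_requests || !connection_reused` early return quoted above.

```rust
/// Hypothetical stand-in for the relevant client setting.
#[derive(Clone, Copy)]
struct Config {
    retry_canceled_requests: bool,
}

/// Returns true only when a retry is allowed under the older rule:
/// retries must be enabled AND the failed connection must have been a reused one.
fn should_retry(config: Config, connection_reused: bool) -> bool {
    config.retry_canceled_requests && connection_reused
}

fn main() {
    // Print the full truth table for the two inputs.
    for retry in [false, true] {
        for reused in [false, true] {
            let config = Config { retry_canceled_requests: retry };
            println!(
                "retry_canceled_requests={} connection_reused={} -> retry={}",
                retry,
                reused,
                should_retry(config, reused)
            );
        }
    }
}
```

Only one of the four combinations permits a retry. Without the second operand, a failure on a fresh connection is retried whenever the setting is enabled.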

Here are the call-specific error types each commit introduced:

#133

enum TrySendError<B> {
    Retryable { error: Error, req: Request<B> },
    Nope(Error),
}

hyperium/hyper@ee61ea9

pub(crate) enum ClientError<B> {
    Normal(::Error),
    Canceled {
        connection_reused: bool,
        req: (::proto::RequestHead, Option<B>),
        reason: ::Error,
    }
}

It seems as though the newer code may have accidentally omitted connection_reused while otherwise following the older code, rather than dropping it intentionally, but I may be wrong.

In this case, we are establishing a new connection. The documentation for retry_canceled_requests suggests that the setting should only be applicable for pooled connections that have been reused:

Set whether to retry requests that get disrupted before ever starting to write.

This means a request that is queued, and gets given an idle, reused connection, and then encounters an error immediately as the idle connection was found to be unusable.

When this is set to false, the related ResponseFuture would instead resolve to an Error::Cancel.

This fix borrows from the older code. With it applied, the reproducer above issues a single connection request (which fails, as expected).
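As a rough illustration of the idea (not the actual diff), the retry loop can carry a `connection_reused` flag on the retryable error and bail out when it is false. The enum below is a simplified, hypothetical mirror of `TrySendError`: the request body is omitted, errors are plain strings, and `send_request` takes a scripted sequence of attempt results instead of doing real I/O.

```rust
/// Simplified, hypothetical mirror of the error type; not hyper-util's real definition.
#[allow(dead_code)]
#[derive(Debug)]
enum TrySendError {
    Nope(String),
    Retryable { error: String, connection_reused: bool },
}

/// Drives a scripted sequence of attempt results.
/// Returns how many attempts were made and the final outcome.
fn send_request(
    retry_canceled_requests: bool,
    mut attempts: impl Iterator<Item = Result<&'static str, TrySendError>>,
) -> (usize, Result<&'static str, String>) {
    let mut tries = 0;
    loop {
        let Some(result) = attempts.next() else {
            return (tries, Err("script exhausted".to_string()));
        };
        tries += 1;
        match result {
            Ok(resp) => return (tries, Ok(resp)),
            Err(TrySendError::Nope(err)) => return (tries, Err(err)),
            Err(TrySendError::Retryable { error, connection_reused }) => {
                if !retry_canceled_requests || !connection_reused {
                    // if client disabled, don't retry
                    // a fresh connection means we definitely can't retry
                    return (tries, Err(error));
                }
                // Reused connection that failed before the request started: try again.
            }
        }
    }
}

fn main() {
    // Fresh connection closed by the server: one attempt, then the error is returned.
    let fresh = vec![
        Err(TrySendError::Retryable {
            error: "connection closed".to_string(),
            connection_reused: false,
        }),
        Ok("never reached"),
    ];
    let (tries, result) = send_request(true, fresh.into_iter());
    println!("fresh: tries={} result={:?}", tries, result);

    // Stale pooled connection: retried once, and the second attempt succeeds.
    let reused = vec![
        Err(TrySendError::Retryable {
            error: "idle connection closed".to_string(),
            connection_reused: true,
        }),
        Ok("200 OK"),
    ];
    let (tries, result) = send_request(true, reused.into_iter());
    println!("reused: tries={} result={:?}", tries, result);
}
```

Under these assumptions the fresh-connection case stops after a single attempt, while the reused-connection case still gets the retry that the setting's documentation describes.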

I've deleted commented-out code that appears to be no longer relevant: it applies to functionality that is either implemented by #133 (and this fix) or may no longer be applicable. If these deletions were overly cavalier or simply unwanted, I'm happy to revert them.

seanmonstar

Thank you!

I see what happened: when the code was transferred over, I consolidated the "can't retry" concept into just Nope, assuming that the only case was establishing a connection (the connection_for() call). But this rightly fixes the case where a connection was created but then errors immediately afterwards.

ahl · 2024-09-23T21:17:47Z

@seanmonstar thanks for this; do you know when we can expect a new release of hyper-util?

seanmonstar · 2024-09-23T21:29:37Z

Landing some dependency updates, and then release likely tomorrow.

fix erroneous retries on a failed request to a newly opened socket

7c44eeb

seanmonstar approved these changes Sep 23, 2024

seanmonstar merged commit d3e9699 into hyperium:master Sep 23, 2024
16 checks passed
