Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP Server Write Timeout #26

Open
DaneSlattery opened this issue Sep 26, 2024 · 19 comments
Open

HTTP Server Write Timeout #26

DaneSlattery opened this issue Sep 26, 2024 · 19 comments

Comments

@DaneSlattery
Copy link

Is it possible for a handler to be stuck during the write_all phase of a response? Take the code below, if conn.write_all was writing a large buffer or over a slow connection, and the client closed the connection.

Would the handler be stuck trying to write_all forever?

impl<'b, T, const N: usize> Handler<'b, T, N> for HttpHandler
where
    T: Read + Write,
    T::Error: Send + Sync + std::error::Error + 'static,
{
    type Error = anyhow::Error;

    async fn handle(&self, conn: &mut Connection<'b, T, N>) -> Result<(), Self::Error> {
        let headers = conn.headers()?;

        if !matches!(headers.method, Some(Method::Get)) {
            conn.initiate_response(405, Some("Method Not Allowed"), &[])
                .await?;
        } else if !matches!(headers.path, Some("/")) {
            conn.initiate_response(404, Some("Not Found"), &[]).await?;
        } else {
            conn.initiate_response(200, Some("OK"), &[("Content-Type", "text/plain")])
                .await?;

            conn.write_all(b"Hello world!").await?;
        }

        Ok(())
    }
}
@ivmarkov
Copy link
Owner

I think if the client closed the write pipe (read pipe, from the client POV) you should get an error...

Do you experience this?

@DaneSlattery
Copy link
Author

I am trying to drill down into it. I seem to be experiencing something like this where the http server begins to refuse connections.

The remaining services seem to continue working (mqtt, modbus, uart , WiFi etc) , but http goes dark. It’s kind of hard to replicate, and I’ve only noticed it after coming back to an esp32 that’s been running for a few days

@AnthonyGrondin
Copy link
Contributor

Can you check if it's happening for a specific connection type.
I'm currently hitting issues when using edge-http + esp-mbedtls where the handler would hang on read(), if the connection type is Close.

See esp-rs/esp-mbedtls#38 (comment)

@AnthonyGrondin
Copy link
Contributor

AnthonyGrondin commented Oct 7, 2024

Actually I just realized I'm hitting this exact same issue without esp-mbedtls in the middle. Just using esp-wifi + edge-http, I'm trying to spawn a captive portal on my device, but the request hangs after handling because the client sent the header "Connection: Close", making edge-http hang on this line:

while request.io.read(&mut buf).await? > 0 {}

Perhaps yours is the same issue I'm hitting.

@ivmarkov
Copy link
Owner

ivmarkov commented Oct 7, 2024

@AnthonyGrondin

I think there is a bug indeed, and I might understand from where it is coming: a request with Connection: Close is something different from a response with Connection: Close:

  • A request with Connection: Close which also is having a missing Content-Length header and a missing Transfer-Encoding: chunked header should be assumed to have Content-Length: 0 (this is the main part of the bug - we do not assume that)
  • A response with Connection: Close which also is having a missing Content-Length header and a missing Transfer-Encoding header can be as long as it gets because the server will close the connection in the end

Finally, if the server gets Connection: Close in the request, it should produce Connection: Close in the response too, and should close the connection, no matter what. This is the second part of the bug, as I don't think we are doing that.

Let me see how to fix that. In any case, if you have a wireshark or Chrome header of the HTTP client request that cause the server to hang, that would be very useful!

@AnthonyGrondin
Copy link
Contributor

Here's a trace of the request that's giving me issues.

As you can see, the server (192.168.11.1) never sends a response

image

@ivmarkov
Copy link
Owner

ivmarkov commented Oct 7, 2024

I think I fixed it.

Can you try this branch?

Unfortunately, I managed to mess up my fix together with my other commit which adds inline documentation, so I could imagine, the fix would be very difficult to review. :(

@ivmarkov
Copy link
Owner

ivmarkov commented Oct 7, 2024

PR here: #33

@ivmarkov ivmarkov mentioned this issue Oct 7, 2024
@AnthonyGrondin
Copy link
Contributor

I tested on my side, thinking initially that it worked, but it turns out it still hangs when I send a request with Connection: close

$ curl -v http://192.168.11.1/hotspot-detect.html
*   Trying 192.168.11.1:80...
* Connected to 192.168.11.1 (192.168.11.1) port 80
> GET /hotspot-detect.html HTTP/1.1
> Host: 192.168.11.1
> User-Agent: curl/8.9.1
> Accept: */*
> 
* Request completely sent off
< HTTP/1.1 302 Found
< Location: /
< Connection: Keep-Alive
< Transfer-Encoding: Chunked
< 
* Connection #0 to host 192.168.11.1 left intact
$ curl -v http://192.168.11.1/hotspot-detect.html -H "connection: close"
*   Trying 192.168.11.1:80...
* Connected to 192.168.11.1 (192.168.11.1) port 80
> GET /hotspot-detect.html HTTP/1.1
> Host: 192.168.11.1
> User-Agent: curl/8.9.1
> Accept: */*
> connection: close
> 
* Request completely sent off
< HTTP/1.1 302 Found
< Location: /
< Connection: Close
< 

@ivmarkov
Copy link
Owner

ivmarkov commented Oct 7, 2024

Huh...

Here's the same test that you did except I did it with the http_server example:

ivan@ivan-laptop:~/Downloads$ curl -v http://m800:8881
*   Trying 192.168.10.189:8881...
* Connected to m800 (192.168.10.189) port 8881 (#0)
> GET / HTTP/1.1
> Host: m800:8881
> User-Agent: curl/7.81.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Connection: Keep-Alive
< Transfer-Encoding: Chunked
< 
* Connection #0 to host m800 left intact
Hello world!ivan@ivan-laptop:~/Downloads$ curl -v http://m800:8881 -H "connection: close"
*   Trying 192.168.10.189:8881...
* Connected to m800 (192.168.10.189) port 8881 (#0)
> GET / HTTP/1.1
> Host: m800:8881
> User-Agent: curl/7.81.0
> Accept: */*
> connection: close
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Connection: Close
< 
* Closing connection 0
Hello world!ivan@ivan-laptop:~/Downloads$ 

For me, it works in both cases (keep-alive and close) and - for close - the connection is closed immediately.
You sure you are testing the correct branch?

@AnthonyGrondin
Copy link
Contributor

I'm testing with esp-wifi, perhaps the driver acts differently than STD. I've ensured that I use the latest commit from the PR. I do notice a different behaviour; The server no longer refuse connections / hang when I send requests with Connection: close. I can spawn 2 curl requests using the Connection: close header, and I'm still able to fetch the page. I wasn't able to do so initially. It would freeze before logging "Request done".

struct CaptivePortalHandler;

impl<'b, T, const N: usize> Handler<'b, T, N> for CaptivePortalHandler
where
    T: Read + Write,
    T::Error: Send + Sync,
{
    type Error = HandlerError<T>;

    async fn handle(
        &self,
        connection: &mut edge_http::io::server::Connection<'b, T, N>,
    ) -> Result<(), Self::Error> {
        let headers = connection.headers().unwrap();

        log::warn!("got request: {:?}", headers);

        let r = match (headers.method, headers.path.unwrap_or("")) {
            // Captive portal configuration page
            (Some(Method::Get), "/") | (Some(Method::Get), "") => {
                connection
                    .initiate_response(200, Some("OK"), &[("Content-Type", "text/html")])
                    .await?;
                connection
                    .write_all(DEFAULT_CAPTIVE.as_bytes())
                    .await
            }
            // Routes to redirect to trigger a captive portal detection
            (Some(Method::Get), "/generate_204")    // Android captive portal redirect
            | (Some(Method::Get), "/redirect")      // Microsoft redirect
            | (Some(Method::Get), "/hotspot-detect.html") // Apple iOS
            | (Some(Method::Get), "/canonical.html") // Firefox captive portal
            => {
                log::error!("Captive portal URL. Redirecting...");
                connection
                    .initiate_response(302, Some("Found"), &[("Location", "/")])
                    .await
            }
            (_, path) => {
                log::warn!("404 Not found: {:?}", path);
                connection
                    .initiate_response(404, Some("Not Found"), &[("Content-Type", "text/plain")])
                    .await?;
                connection.write_all(path.as_bytes()).await
            }
        };

        log::error!("Request done");

        r
    }
}

My handler logs the following:

WARN - got request: RequestHeaders { http11: Some(true), method: Some(Get), path: Some("/hotspot-detect.html"), headers: Headers([Header { name: "Host", v
alue: "192.168.11.1" }, Header { name: "User-Agent", value: "curl/8.9.1" }, Header { name: "Accept", value: "*/*" }, Header { name: "connection", value: "
close" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "
", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { n
ame: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Head
er { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }
, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value
: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }]) }
ERROR - Captive portal URL. Redirecting...
ERROR - Request done

Interestingly wireshark sees the response as TCP raw bytes, and not as an HTTP stream

Here's the wireshark trace for the request above:
image

@ivmarkov
Copy link
Owner

ivmarkov commented Oct 7, 2024

It would freeze before logging "Request done".

But in your own logs, I can see the "Request done" being printed?

My handler logs the following:

WARN - got request: RequestHeaders { http11: Some(true), method: Some(Get), path: Some("/hotspot-detect.html"), headers: Headers([Header { name: "Host", v
alue: "192.168.11.1" }, Header { name: "User-Agent", value: "curl/8.9.1" }, Header { name: "Accept", value: "*/*" }, Header { name: "connection", value: "
close" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "
", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { n
ame: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Head
er { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }
, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value
: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }, Header { name: "", value: "" }]) }
ERROR - Captive portal URL. Redirecting...
ERROR - Request done

As in here ^^^ ?

@ivmarkov
Copy link
Owner

ivmarkov commented Oct 7, 2024

Are you saying that - before the PR - it used to freeze before reaching "Request done"? Now - with the PR - it does NOT freeze and logs "Request done".

If so, can you describe, like what happens now? This is what I still do not understand? The connection is not physically closed (the socket itself), or...?

@AnthonyGrondin
Copy link
Contributor

AnthonyGrondin commented Oct 7, 2024

Yes. Before the PR, making a request would freeze before reaching "Request done"

Now, with this PR, it does successfully send a response but Wireshark sees the response as TCP bytes (underlined in red), not HTTP.
You can also see in the wireshark trace that my client (192.168.11.51) ACK the response from the server.

Now, with this PR, it hangs after receiving the response, as if cURL was expecting the socket to be closed.
You can see it sends [TCP Keep-Alive] packets, until the server close the connection with a RST

image

EDIT: For context, this trace contains two requests, the first one ending with a [FIN, ACK], that where I manually closed cURL with CTRL + C

The second request, I let it hang until it got closed by the server.

@DaneSlattery
Copy link
Author

DaneSlattery commented Oct 8, 2024

Just tried this out and getting the expected behavior on connection close:

Still doesn't cover my refused connections, which I came back to yesterday after the esp ran for the weekend. It seems like after a while, the http server does recover and begin accepting requests again, but I'm struggling to figure out what the issue is.

esp@b887d62fc147:~/http_test$ /http_test$ curl -v http://192.16
*   Trying 192.168.1.100:8881...
* Connected to 192.168.1.100 (192.168.1.100) port 8881 (#0)
> GET / HTTP/1.1
> Host: 192.168.1.100:8881
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Connection: Keep-Alive
< Transfer-Encoding: Chunked
< 
* Connection #0 to host 192.168.1.100 left intact
Hello world!esp@b887d62fc147:~/http_test$ curl -v http://192.168.1.100:8881 -H "Connection:close"
*   Trying 192.168.1.100:8881...
* Connected to 192.168.1.100 (192.168.1.100) port 8881 (#0)
> GET / HTTP/1.1
> Host: 192.168.1.100:8881
> User-Agent: curl/7.88.1
> Accept: */*
> Connection:close
> 
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Connection: Close
< 
* Closing connection 0

My code is based on esp-idf though with my dependencies as

[dependencies]
log = { version = "0.4", default-features = false }
esp-idf-svc = { version = "0.49", default-features = false }
edge-http = { git = "https://github.com/ivmarkov/edge-net.git", branch = "connection-type", features = [
    "std",
] }
edge-nal = { git = "https://github.com/ivmarkov/edge-net.git", branch = "connection-type" }
edge-nal-std = { git = "https://github.com/ivmarkov/edge-net.git", branch = "connection-type" }
embedded-io-async = { version = "0.6", default-features = false }
futures-lite = "2.3.0"
anyhow = "1"
embassy-time = { version = "0.3.0", features = ["generic-queue"] }
futures = "*"

main.rs

use std::thread::Scope;

use edge_http::io::server::{Connection, DefaultServer, Handler};
use edge_http::Method;
use edge_nal::TcpBind;

use embedded_io_async::{Read, Write};

use esp_idf_svc::eventloop::EspSystemEventLoop;
use esp_idf_svc::hal::prelude::Peripherals;
use esp_idf_svc::hal::task::block_on;
use esp_idf_svc::log::EspLogger;
use esp_idf_svc::nvs::EspDefaultNvsPartition;
use esp_idf_svc::sys::EspError;
use esp_idf_svc::timer::EspTaskTimerService;
use esp_idf_svc::wifi::{AsyncWifi, AuthMethod, ClientConfiguration, Configuration, EspWifi};
use futures::executor::LocalPool;
use futures::task::LocalSpawnExt;
use futures::FutureExt;
use log::info;
const SSID: &str = "redacted";
const PASSWORD: &str = "redacted";
fn main() -> Result<(), anyhow::Error> {
    // env_logger::init_from_env(
    //     env_logger::Env::default().filter_or(env_logger::DEFAULT_FILTER_ENV, "info"),
    // );

    esp_idf_svc::sys::link_patches();
    esp_idf_svc::timer::embassy_time_driver::link();
    esp_idf_svc::io::vfs::initialize_eventfd(5)?;
    EspLogger::initialize_default();

    std::thread::Builder::new()
        .stack_size(60000)
        .spawn(run_main)
        .unwrap()
        .join()
        .unwrap()
        .unwrap();

    Ok(())
}

fn run_main() -> Result<(), anyhow::Error> {
    let mut local_executor = LocalPool::new();
    // let spawner = local_executor.spawner();

    local_executor.spawner().spawn_local(
        async move {
            let _wifi = wifi_create().await?;
            let mut server = DefaultServer::new();
            run(&mut server).await?;

            Result::<_, anyhow::Error>::Ok(())
        }
        .map(Result::unwrap),
    )?;

    local_executor.run();

    log::info!("Thread execution finised ");

    Ok(())
}

pub async fn run(server: &mut DefaultServer) -> Result<(), anyhow::Error> {
    let addr = "0.0.0.0:8881";

    info!("Running HTTP server on {addr}");

    let acceptor = edge_nal_std::Stack::new()
        .bind(addr.parse().unwrap())
        .await?;
    info!("bind");
    server.run(acceptor, HttpHandler, None).await?;
    info!("run");
    Ok(())
}

struct HttpHandler;

impl<'b, T, const N: usize> Handler<'b, T, N> for HttpHandler
where
    T: Read + Write,
    T::Error: Send + Sync + std::error::Error + 'static,
{
    type Error = anyhow::Error;

    async fn handle(&self, conn: &mut Connection<'b, T, N>) -> Result<(), Self::Error> {
        let headers = conn.headers()?;

        if !matches!(headers.method, Some(Method::Get)) {
            conn.initiate_response(405, Some("Method Not Allowed"), &[])
                .await?;
        } else if !matches!(headers.path, Some("/")) {
            conn.initiate_response(404, Some("Not Found"), &[]).await?;
        } else {
            conn.initiate_response(200, Some("OK"), &[("Content-Type", "text/plain")])
                .await?;

            conn.write_all(b"Hello world!").await?;
        }

        Ok(())
    }
}
async fn wifi_create() -> Result<esp_idf_svc::wifi::EspWifi<'static>, EspError> {
    use esp_idf_svc::eventloop::*;
    use esp_idf_svc::hal::prelude::Peripherals;
    use esp_idf_svc::nvs::*;
    use esp_idf_svc::wifi::*;

    let sys_loop = EspSystemEventLoop::take()?;
    let timer_service = EspTaskTimerService::new()?;
    let nvs = EspDefaultNvsPartition::take()?;

    let peripherals = Peripherals::take()?;

    let mut esp_wifi = EspWifi::new(peripherals.modem, sys_loop.clone(), Some(nvs.clone()))?;
    let mut wifi = AsyncWifi::wrap(&mut esp_wifi, sys_loop.clone(), timer_service)?;

    wifi.set_configuration(&Configuration::Client(ClientConfiguration {
        ssid: SSID.try_into().unwrap(),
        password: PASSWORD.try_into().unwrap(),
        ..Default::default()
    }))?;

    wifi.start().await?;
    info!("Wifi started");

    wifi.connect().await?;
    info!("Wifi connected");

    wifi.wait_netif_up().await?;
    info!("Wifi netif up");

    Ok(esp_wifi)
}

@ivmarkov
Copy link
Owner

ivmarkov commented Oct 8, 2024

Yes, the problem is only with embassy-net, where we have to implement a graceful TCP socket shutdown.

Though for STD we also don't call socket::shutdown and we probably should. I'll fix that too.

@DaneSlattery
Copy link
Author

DaneSlattery commented Oct 8, 2024

I have confirmed with the following handler that if I close the browser halfway through the for i in 0..10 loop, (my original guess for this issue is that the write would be trying to write forever to a closed socket, using up all the available handler tasks (this one has 4)), but that seems to work just fine:

 async fn handle(&self, conn: &mut Connection<'b, T, N>) -> Result<(), Self::Error> {
        let headers = conn.headers()?;

        if !matches!(headers.method, Some(Method::Get)) {
            conn.initiate_response(405, Some("Method Not Allowed"), &[])
                .await?;
        } else if !matches!(headers.path, Some("/")) {
            conn.initiate_response(404, Some("Not Found"), &[]).await?;
        } else {
            conn.initiate_response(200, Some("OK"), &[("Content-Type", "text/plain")])
                .await?;
            for i in 0..10 {
                conn.write_all(b"Hello world!").await?;
                info!("i = {i}");
                esp_idf_svc::hal::delay::FreeRtos::delay_ms(1000);
            }

            // conn.
        }

        Ok(())
    }
.[0;33mW (61655) edge_http::io::server: Handler task 0: Error when handling request: Handler(Connection reset by peer (os error 104)).[0m
.[0;32mI (61685) http_test: i = 0.[0m
.[0;32mI (62685) http_test: i = 1.[0m
.[0;32mI (63685) http_test: i = 2.[0m
.[0;33mW (64685) edge_http::io::server: Handler task 0: Error when handling request: Handler(Connection reset by peer (os error 104)).[0m
.[0;32mI (64715) http_test: i = 0.[0m
.[0;32mI (65715) http_test: i = 1.[0m
.[0;32mI (66715) http_test: i = 2.[0m
.[0;32mI (67715) http_test: i = 3.[0m
.[0;32mI (68715) http_test: i = 4.[0m
.[0;32mI (69715) http_test: i = 5.[0m
.[0;32mI (70715) http_test: i = 6.[0m
.[0;33mW (71715) edge_http::io::server: Handler task 0: Error when handling request: Handler(Connection reset by peer (os error 104)).[0m

So I really am struggling to re-create my issue in a reproducable way, but perhaps I'll point my code to this branch and see if it also helps things with the server write.

I suspect that futures in rust are cancelled when they are dropped, do you have any literature on that?

@ivmarkov
Copy link
Owner

ivmarkov commented Oct 8, 2024

but perhaps I'll point my code to this branch and see if it also helps things with the server write.

This branch is now in master as it definitely fixed at least two issues related to how we handle "old fashioned" HTTP connections which have Connection: Close header.

I suspect that futures in rust are cancelled when they are dropped, do you have any literature on that?

TL;DR: Yes, they are. Future drop in Rust means cancellation. So futures routinely "close" stuff in their drop method.

Elaboration:
However - if you have to do something - like "closing stuff" - that needs async you currently cannot do it in the future drop fn, because the drop is always blocking (and ideally should not wait). This is known as the "rust async drop" problem, where having an "async drop" is just one out of many possibilities (having un-droppable types that you always need to manually destruct is another).

Anyway, I digress.

I'm planning to put some "global" timeouts in the HTTP server handling code too, so whatever happens, at some point the HTTP handlers will always get un-stuck.

@ivmarkov
Copy link
Owner

ivmarkov commented Oct 8, 2024

@DaneSlattery @AnthonyGrondin There's this branch now and the corresponding PR

@DaneSlattery

  • If you can look at the PR description, this PR now implements proper timeouts for the HTTP server, so with it - whatever happens - the server would never get stuck. You can use the default ones (30min for a request-response and 50sec for each IO operation) or you can tailor these to your liking
  • There is a proper socket shutdown now for STD, even though STD was working even without it

@AnthonyGrondin

  • Besides the above, I've also implemented proper socket shutdown for edge-nal-embassy, which should take care of the hanging CURL connection with Connection: Close.
  • However I would be very grateful if you could test that and return feedback, as setting up an embassy-net testing env will take me quite some time (still deep in STD land with Matter, etc.)

Also, and based on the yesterday's discussion with Dario, I hope the socket shutdown logic is as simple as it gets, and as correct as it gets (modulo trivial bugs), so once we confirm it works fine, you might want to replicate it in the esp-mbedtls + esp-wifi combo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants