E2E integration tests are flaky #25423

Open
hiltontj opened this issue Oct 3, 2024 · 4 comments
Contributor

hiltontj commented Oct 3, 2024

From time to time, some of the integration tests fail for strange reasons. It may be due to how a port is being selected for the running influxdb3 serve binary that is spun up in the test harness.

There is a function used to select a random available port:

/// Get an available bind address on localhost
///
/// This binds a [`TcpListener`] to 127.0.0.1:0, which will randomly
/// select an available port, and produces the resulting local address.
/// The [`TcpListener`] is dropped at the end of the function, thus
/// freeing the port for use by the caller.
fn get_local_bind_addr() -> SocketAddr {
    let ip = std::net::Ipv4Addr::new(127, 0, 0, 1);
    let port = 0;
    let addr = SocketAddrV4::new(ip, port);
    TcpListener::bind(addr)
        .expect("bind to a socket address")
        .local_addr()
        .expect("get local address")
}

However, the TcpListener is dropped before the address is passed in to spawn the server (it has to be, otherwise the server would not be able to bind that address and would fail to start). That leaves a window in which another process or integration test could take over the port before the binary is started here:

let server_process = command.spawn().expect("spawn the influxdb3 server process");
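
To make the window concrete, here is a minimal sketch that forces the race within a single process. `simulate_race` is a hypothetical helper, not part of the test harness: it stands in for a second test binding the freed port just before the server does, and reports the error the server would then hit.

```rust
use std::io::ErrorKind;
use std::net::{Ipv4Addr, SocketAddr, SocketAddrV4, TcpListener};

/// Same approach as the harness helper: bind to port 0, read the address,
/// and drop the listener when the function returns.
fn get_local_bind_addr() -> SocketAddr {
    let addr = SocketAddrV4::new(Ipv4Addr::LOCALHOST, 0);
    TcpListener::bind(addr)
        .expect("bind to a socket address")
        .local_addr()
        .expect("get local address")
}

/// Hypothetical helper: returns the error kind the "server" sees when
/// something else binds the address between selection and startup,
/// or `None` if the server bind succeeds anyway.
fn simulate_race(addr: SocketAddr) -> Option<ErrorKind> {
    // Stand-in for another test or process grabbing the freed port.
    let _intruder = TcpListener::bind(addr).expect("intruder binds the freed port");
    // Stand-in for the spawned server binding the address it was given.
    TcpListener::bind(addr).err().map(|e| e.kind())
}

fn main() {
    let addr = get_local_bind_addr();
    match simulate_race(addr) {
        Some(kind) => println!("server failed to bind {addr}: {kind:?}"),
        None => println!("server bound {addr}"),
    }
}
```

In the real harness the intruder would be a different test or process, so the failure only shows up when the timing lines up, which would be consistent with the tests being flaky rather than always failing.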

Here are some examples of failures that seem rather odd:

@hiltontj hiltontj added the v3 label Oct 3, 2024
Contributor Author

hiltontj commented Oct 3, 2024

One option would be to forgo spawning the influxdb3 serve binary altogether, and instead call the code that runs the service directly, as is done in this function:

pub async fn command(config: Config) -> Result<()> {

This would require some refactoring to make sure that the test harness is starting things exactly as is done for the actual running binary, but would allow us to pass in a bound TcpListener/SocketAddr directly, and not have the issue described above.
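
As a rough sketch of that idea, the harness could bind the listener itself and hand it to the server code, so the port is never released. The real entry point is async; this sketch uses only the standard library and threads to stay self-contained, and `serve_one`, `start_in_process`, and `request` are hypothetical names, not functions from the influxdb3 codebase.

```rust
use std::io::{self, Read, Write};
use std::net::{Ipv4Addr, SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Hypothetical stand-in for the server entry point: it takes an
/// already-bound listener instead of an address to bind.
fn serve_one(listener: TcpListener) -> io::Result<()> {
    let (mut stream, _peer) = listener.accept()?;
    stream.write_all(b"ok")?;
    Ok(()) // the stream is dropped here, which closes the connection
}

/// Bind to port 0 and move the live listener into the server thread,
/// so no other process can claim the port in between.
fn start_in_process() -> io::Result<(SocketAddr, thread::JoinHandle<io::Result<()>>)> {
    let listener = TcpListener::bind((Ipv4Addr::LOCALHOST, 0))?;
    let addr = listener.local_addr()?;
    let handle = thread::spawn(move || serve_one(listener));
    Ok((addr, handle))
}

/// Test-side client: connect and read until the server closes the stream.
fn request(addr: SocketAddr) -> io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    let mut body = String::new();
    stream.read_to_string(&mut body)?;
    Ok(body)
}

fn main() -> io::Result<()> {
    let (addr, handle) = start_in_process()?;
    let body = request(addr)?;
    handle.join().expect("server thread panicked")?;
    println!("server at {addr} replied: {body}");
    Ok(())
}
```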

One problem I see with this is how we generate IDs for, e.g., databases and tables: they come from static atomics, so if we were to have multiple test harnesses running in a single test, they could clash over IDs.
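
One way that concern could surface, sketched with made-up names: `NEXT_DB_ID` and `Harness` are hypothetical and only assume that IDs come from a process-wide atomic counter. Two in-process harnesses then draw from the same sequence, so neither sees the IDs it would get when running as its own binary.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical process-wide counter, standing in for the static atomics
/// used for ID generation.
static NEXT_DB_ID: AtomicU32 = AtomicU32::new(0);

/// Hypothetical in-process server handle that records the IDs it was given.
struct Harness {
    db_ids: Vec<u32>,
}

impl Harness {
    fn new() -> Self {
        Self { db_ids: Vec::new() }
    }

    /// Every harness in the process pulls from the same static counter.
    fn create_database(&mut self) -> u32 {
        let id = NEXT_DB_ID.fetch_add(1, Ordering::SeqCst);
        self.db_ids.push(id);
        id
    }
}

fn main() {
    let mut a = Harness::new();
    let mut b = Harness::new();
    a.create_database();
    b.create_database();
    a.create_database();
    // The two harnesses interleave on one sequence instead of each
    // owning an independent one.
    println!("a: {:?}, b: {:?}", a.db_ids, b.db_ids);
}
```

A test that assumes its first database gets a particular ID would then depend on what other harnesses in the same process have already created.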

Contributor Author

hiltontj commented Oct 3, 2024

Another option would be to have the binary log the port it is listening on, and scrape it from STDOUT in the test harness code.
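
A minimal sketch of the scraping side, assuming the server binds port 0 itself and then prints its bound address somewhere in a log line. The log lines below are made up, since the thread does not specify the output format, and `parse_addr_from_line` and `scrape_addr` are hypothetical helpers.

```rust
use std::io::{BufRead, Cursor};
use std::net::SocketAddr;

/// Return the first token on the line that parses as a socket address.
/// Tokens are split on whitespace and '=' so both "on 127.0.0.1:80" and
/// "address=127.0.0.1:80" styles are handled.
fn parse_addr_from_line(line: &str) -> Option<SocketAddr> {
    line.split(|c: char| c.is_whitespace() || c == '=')
        .find_map(|tok| tok.parse::<SocketAddr>().ok())
}

/// Read lines until one contains an address, or the stream ends.
fn scrape_addr(output: impl BufRead) -> Option<SocketAddr> {
    output
        .lines()
        .map_while(Result::ok)
        .find_map(|line| parse_addr_from_line(&line))
}

fn main() {
    // Made-up output standing in for the child process's STDOUT.
    let fake_stdout = Cursor::new("starting server\nlistening address=127.0.0.1:54321\n");
    match scrape_addr(fake_stdout) {
        Some(addr) => println!("server is listening on {addr}"),
        None => println!("no address found in output"),
    }
}
```

In the harness, the reader would instead wrap the child's piped STDOUT, e.g. spawning with `Stdio::piped()` and passing `BufReader::new(child.stdout.take().unwrap())` to `scrape_addr`.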

Contributor Author

hiltontj commented Oct 3, 2024

Another approach would be to add an option to the influxdb3 serve command to write its port to a file, or to notify some other service of the port it is listening on. The test harness code would then gather that info after it has spawned the command.
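
A sketch of the file-based variant, under the assumption that the server writes its bound address to a path the harness chose. No such option exists in the thread, and `write_addr_file` and `wait_for_addr_file` are hypothetical names.

```rust
use std::fs;
use std::io;
use std::net::{Ipv4Addr, SocketAddr, TcpListener};
use std::path::Path;
use std::thread;
use std::time::{Duration, Instant};

/// Server side (hypothetical): publish the bound address once listening.
/// Writes to a temporary name and renames it, so a reader does not see a
/// partially written file.
fn write_addr_file(path: &Path, addr: SocketAddr) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, addr.to_string())?;
    fs::rename(&tmp, path)
}

/// Harness side: poll until the file holds a parseable address or the
/// timeout elapses.
fn wait_for_addr_file(path: &Path, timeout: Duration) -> Option<SocketAddr> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Ok(contents) = fs::read_to_string(path) {
            if let Ok(addr) = contents.trim().parse() {
                return Some(addr);
            }
        }
        if Instant::now() >= deadline {
            return None;
        }
        thread::sleep(Duration::from_millis(10));
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("influxdb3-addr-{}", std::process::id()));
    // Stand-in for the server: bind port 0, keep the listener, publish the address.
    let listener = TcpListener::bind((Ipv4Addr::LOCALHOST, 0))?;
    write_addr_file(&path, listener.local_addr()?)?;
    let found = wait_for_addr_file(&path, Duration::from_secs(1));
    println!("harness read {found:?}");
    fs::remove_file(&path)
}
```

Compared with scraping STDOUT, this keeps the harness independent of the log format, at the cost of an extra command-line option and a file to clean up.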

Member

pauldix commented Oct 3, 2024

I think I like the log option
