Load Testing REST Server #134

Open

vtnerd opened this issue Sep 22, 2024 · 4 comments

vtnerd commented Sep 22, 2024

These are some basic numbers for the REST server, gathered to determine what needs to change to remove bottlenecks. I used the wrk2 utility, which seems to place the server under decent load. monero-lws-daemon and monerod were both running on a Ryzen 3900X box with 32 GiB RAM, while wrk2 ran on a laptop. A wired connection (to the same switch) was used to ensure that latencies were low and consistent.

Raw Performance Numbers

login

Running 10s test @ [internal_ip]:8080/login
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 476.29ms 840.19ms 4.60s 87.19%
Req/Sec -nan -nan 0.00 0.00%
Latency Distribution (HdrHistogram - Recorded Latency)
50.000% 28.38ms
75.000% 594.43ms
90.000% 1.65s
99.000% 3.51s
99.900% 4.03s
99.990% 4.53s
99.999% 4.60s
100.000% 4.60s

[Mean = 476.288, StdDeviation = 840.193]
[Max = 4599.808, Total count = 174799]
[Buckets = 27, SubBuckets = 2048]

174807 requests in 10.00s, 34.34MB read
Requests/sec: 17483.81
Transfer/sec: 3.43MB

get_address_info

Running 10s test @ [internal_ip]:8080/get_address_info
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 526.71ms 816.44ms 4.26s 87.16%
Req/Sec -nan -nan 0.00 0.00%
Latency Distribution (HdrHistogram - Recorded Latency)
50.000% 152.06ms
75.000% 519.17ms
90.000% 1.65s
99.000% 3.72s
99.900% 4.11s
99.990% 4.22s
99.999% 4.26s
100.000% 4.27s

[Mean = 526.713, StdDeviation = 816.440]
[Max = 4263.936, Total count = 174725]
[Buckets = 27, SubBuckets = 2048]

174733 requests in 10.00s, 58.99MB read
Requests/sec: 17473.53
Transfer/sec: 5.90MB

get_unspent_outs

Running 10s test @ [internal_ip]:8080/get_unspent_outs
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 3.08s 1.86s 7.82s 60.48%
Req/Sec -nan -nan 0.00 0.00%
Latency Distribution (HdrHistogram - Recorded Latency)
50.000% 3.03s
75.000% 4.58s
90.000% 5.53s
99.000% 7.15s
99.900% 7.47s
99.990% 7.75s
99.999% 7.81s
100.000% 7.82s

[Mean = 3079.689, StdDeviation = 1861.498]
[Max = 7819.264, Total count = 71066]
[Buckets = 27, SubBuckets = 2048]

71074 requests in 10.00s, 14.30MB read
Requests/sec: 7106.51
Transfer/sec: 1.43MB

get_random_outs

Running 10s test @ [internal_ip]:8080/get_random_outs
8 threads and 100 connections
Thread calibration: mean lat.: 4970.170ms, rate sampling interval: 19087ms
Thread calibration: mean lat.: 5377.621ms, rate sampling interval: 16736ms
Thread calibration: mean lat.: 4950.556ms, rate sampling interval: 15835ms
Thread calibration: mean lat.: 5076.512ms, rate sampling interval: 17317ms
Thread calibration: mean lat.: 4642.232ms, rate sampling interval: 16113ms
Thread calibration: mean lat.: 6349.897ms, rate sampling interval: 19365ms
Thread calibration: mean lat.: 4918.584ms, rate sampling interval: 15532ms
Thread calibration: mean lat.: 4544.176ms, rate sampling interval: 15261ms
Thread Stats Avg Stdev Max +/- Stdev
Latency -nanus -nanus 0.00us 0.00%
Req/Sec -nan -nan 0.00 0.00%
Latency Distribution (HdrHistogram - Recorded Latency)
50.000% 0.00us
75.000% 0.00us
90.000% 0.00us
99.000% 0.00us
99.900% 0.00us
99.990% 0.00us
99.999% 0.00us
100.000% 0.00us

[Mean = -nan, StdDeviation = -nan]
[Max = 0.000, Total count = 0]
[Buckets = 27, SubBuckets = 2048]

68 requests in 10.07s, 219.44KB read
Socket errors: connect 0, read 0, write 0, timeout 421
Requests/sec: 6.75
Transfer/sec: 21.80KB

Analysis

login and get_address_info both max out around ~17,400 requests per second. I requested 20,000 requests/sec from wrk2; it's not clear why this target couldn't be reached (link limit or laptop limit).

get_unspent_outs maxed out around ~7,100 requests per second. This is expected, and is almost certainly due to the ZMQ call within the handler code. The values returned over ZMQ could be cached safely, but when the cache timeout hits, the throughput will drop by 50%.
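
For reference, here is a minimal sketch of the caching idea, not the actual monero-lws code: keep the last ZMQ reply and refresh it only when it is older than some TTL. The class and names are illustrative.

```cpp
// Hypothetical sketch of a TTL cache in front of the ZMQ call; not monero-lws code.
#include <chrono>
#include <functional>
#include <mutex>
#include <string>

class zmq_reply_cache
{
  std::mutex sync_;
  std::string cached_;
  std::chrono::steady_clock::time_point fetched_{};

public:
  // `fetch` performs the real ZMQ round-trip; it only runs on a stale or empty cache.
  std::string get(std::chrono::seconds ttl, const std::function<std::string()>& fetch)
  {
    const auto now = std::chrono::steady_clock::now();
    const std::lock_guard<std::mutex> lock{sync_};
    if (cached_.empty() || now - fetched_ > ttl)
    {
      cached_ = fetch(); // requests arriving during this refresh wait on the lock
      fetched_ = now;
    }
    return cached_;
  }
};
```

The comment on the refresh line is exactly the throughput dip described above: while one request repopulates the cache, everything else queues behind the lock (or, in an async design, behind the pending ZMQ reply).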

get_random_outs had very low throughput, which is to be expected. This handler also makes an expensive ZMQ call. Caching here is a little trickier because the random output selection would lag behind real time. Even with caching, when the cache timeout hits, the throughput of the REST threads will drop dramatically.

Steps from Here

In both cases, the requests/sec drop came from blocking ZMQ calls within the HTTP handler. The "correct" engineering fix is to pause/resume the REST handlers so that the ZMQ calls never block any of the handler threads. This cannot be achieved with the epee HTTP server, because responses must be produced synchronously in that framework.
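
To illustrate what "pause/resume" means here, below is a rough sketch assuming boost::asio C++20 coroutines. The steady_timer is only a stand-in for an async ZMQ round-trip (e.g. via AZMQ); the point is that co_await suspends just that handler, so the single REST thread keeps serving other requests while the daemon reply is outstanding.

```cpp
// Rough sketch only: the timer stands in for an async ZMQ request to monerod.
#include <boost/asio.hpp>
#include <chrono>
#include <iostream>
#include <string>

namespace asio = boost::asio;

asio::awaitable<std::string> fetch_from_daemon()
{
  // In the real design this would be an AZMQ async send/receive pair.
  asio::steady_timer delay{co_await asio::this_coro::executor, std::chrono::milliseconds{50}};
  co_await delay.async_wait(asio::use_awaitable);
  co_return std::string{"{}"}; // placeholder reply
}

asio::awaitable<void> handle_get_unspent_outs(int id)
{
  // The coroutine suspends here; the REST thread is free to run other handlers.
  const std::string reply = co_await fetch_from_daemon();
  std::cout << "request " << id << " -> " << reply << '\n'; // real code writes the HTTP response
}

int main()
{
  asio::io_context io{1}; // one thread, matching the load test
  for (int i = 0; i < 4; ++i) // four overlapping "requests" on that one thread
    asio::co_spawn(io, handle_get_unspent_outs(i), asio::detached);
  io.run();
}
```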

The steps (in order) to achieve better throughput with the REST server:

  • Get a proof-of-concept working with boost::beast (see the sketch after this list).
  • Test throughput with boost::beast and make sure the requests/sec are similar to the current HTTP server.
  • If boost::beast passes those tests (on login and get_address_info), get the code into a "shippable" state.
  • Incorporate AZMQ into the new boost::beast framework, so that get_unspent_outs and get_random_outs never block on ZMQ calls.
  • Add caching to get_unspent_outs so that throughput on that endpoint improves.
  • Do not cache get_random_outs, as it's too risky to serve stale data on that call.
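
As a point of reference for the first two items, a minimal boost::beast HTTP server on asio coroutines looks roughly like the following. This is not the monero-lws proof-of-concept, just a sketch of the shape such a server takes; the fixed "{}" body and the port are placeholders.

```cpp
// Minimal boost::beast HTTP server sketch (C++20 coroutines); placeholder logic only.
#include <boost/asio.hpp>
#include <boost/beast.hpp>
#include <utility>

namespace asio = boost::asio;
namespace beast = boost::beast;
namespace http = beast::http;
using tcp = asio::ip::tcp;

asio::awaitable<void> session(tcp::socket socket)
{
  beast::tcp_stream stream{std::move(socket)};
  beast::flat_buffer buffer;
  try
  {
    for (;;)
    {
      http::request<http::string_body> req;
      co_await http::async_read(stream, buffer, req, asio::use_awaitable);

      // Placeholder handler: a real server would dispatch on req.target() (/login, etc.).
      http::response<http::string_body> res{http::status::ok, req.version()};
      res.set(http::field::content_type, "application/json");
      res.keep_alive(req.keep_alive());
      res.body() = "{}";
      res.prepare_payload();
      co_await http::async_write(stream, res, asio::use_awaitable);

      if (!req.keep_alive())
        break;
    }
  }
  catch (const std::exception&)
  { /* client disconnected or parse error; just drop the connection */ }

  beast::error_code ec;
  stream.socket().shutdown(tcp::socket::shutdown_send, ec);
}

asio::awaitable<void> listener(tcp::endpoint endpoint)
{
  auto executor = co_await asio::this_coro::executor;
  tcp::acceptor acceptor{executor, endpoint};
  for (;;)
  {
    tcp::socket socket = co_await acceptor.async_accept(asio::use_awaitable);
    asio::co_spawn(executor, session(std::move(socket)), asio::detached);
  }
}

int main()
{
  asio::io_context io{1}; // single REST thread, as in the tests above
  asio::co_spawn(io, listener({tcp::v4(), 8080}), asio::detached);
  io.run();
}
```

The eventual AZMQ integration would slot in where the placeholder handler is: the handler coroutine would co_await the ZMQ reply (as in the earlier sketch) instead of blocking the thread.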

vtnerd commented Sep 22, 2024

One point to clarify: the account used had 0 received and 0 spent outputs. There are allocations in the code paths when the account has received or spent funds, which likely would have slowed the responses a little. Testing with an empty account was done intentionally, to measure the maximum possible throughput and to compare the slowdowns from the ZMQ calls. Subsequent load tests with accounts that have received and spent funds will likely follow, to see whether it's worth attempting a "streaming" design from LMDB directly into JSON.


vtnerd commented Sep 22, 2024

Another clarification: this was using one REST thread. Despite the 32 threads on the server, I don't see a reason to increase the REST thread count, because I wanted to test the throughput of a single response thread.


vtnerd commented Sep 23, 2024

I've gotten a quick proof-of-concept working with boost::beast, and the wrk2 load tester shows that it handles roughly an additional ~600 requests/sec (about 18,000 requests/sec total). The latency average and latency stddev are also somewhat lower.

Given that there was no drop in performance when switching to boost::beast, I will move forward with the attempt to switch to AZMQ so that individual REST handlers can be suspended/resumed. This is likely a bigger overhaul, so expect a delay before the next update on this change.

Raw Numbers

Running 10s test @ [internal_ip]:8080/get_address_info
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 460.01ms 269.25ms 952.83ms 57.54%
Req/Sec -nan -nan 0.00 0.00%
Latency Distribution (HdrHistogram - Recorded Latency)

[Mean = 460.012, StdDeviation = 269.253]
[Max = 952.832, Total count = 180905]
[Buckets = 27, SubBuckets = 2048]

180913 requests in 10.00s, 50.21MB read
Requests/sec: 18093.27
Transfer/sec: 5.02MB


vtnerd commented Sep 23, 2024

I should also mention that some additional constraints should be imposed on the boost::beast server somehow, but I have to dig into the library further to figure out how.
