Load Testing REST Server #134
One point to clarify - the account used had 0 received and 0 spent. There are allocations in the code paths if the account has received or spent funds, which likely would've slowed the responses a little. Testing with the empty account was intentional, to test the max throughput possible and compare the slowdowns of the ZMQ calls. Subsequent load tests with accounts that have received and spent funds will likely be done, to see if it's worth attempting a "streaming" design from LMDB directly into JSON.
Another clarification: this was using one REST thread. Despite the 32 threads on the server, I don't see a reason to increase the REST thread count, because I wanted to test the throughput of a single response thread.
I've gotten a quick proof-of-concept working, given that there was no drop in performance when switching.

Raw Numbers

```
Running 10s test @ [internal_ip]:8080/get_address_info
  [Mean = 460.012, StdDeviation = 269.253]
  180913 requests in 10.00s, 50.21MB read
```
I should also mention that some additional constraints on
These are some basic numbers for the REST server, to determine what needs to be changed for bottlenecks. I used the `wrk2` utility, which seems to place the server under decent load. `monero-lws-daemon` and `monerod` were both running on a Ryzen 3900x box with 32 GiB RAM, whereas `wrk2` was run on a laptop. A wired connection (to the same switch) was used to ensure that latencies were low and consistent.

Raw Performance Numbers
`login`

```
Running 10s test @ [internal_ip]:8080/login
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   476.29ms  840.19ms    4.60s   87.19%
    Req/Sec     -nan      -nan      0.00     0.00%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%   28.38ms
 75.000%  594.43ms
 90.000%    1.65s
 99.000%    3.51s
 99.900%    4.03s
 99.990%    4.53s
 99.999%    4.60s
100.000%    4.60s
  [Mean = 476.288, StdDeviation = 840.193]
  [Max = 4599.808, Total count = 174799]
  [Buckets = 27, SubBuckets = 2048]
  174807 requests in 10.00s, 34.34MB read
Requests/sec:  17483.81
Transfer/sec:      3.43MB
```
`get_address_info`

```
Running 10s test @ [internal_ip]:8080/get_address_info
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   526.71ms  816.44ms    4.26s   87.16%
    Req/Sec     -nan      -nan      0.00     0.00%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%  152.06ms
 75.000%  519.17ms
 90.000%    1.65s
 99.000%    3.72s
 99.900%    4.11s
 99.990%    4.22s
 99.999%    4.26s
100.000%    4.27s
  [Mean = 526.713, StdDeviation = 816.440]
  [Max = 4263.936, Total count = 174725]
  [Buckets = 27, SubBuckets = 2048]
  174733 requests in 10.00s, 58.99MB read
Requests/sec:  17473.53
Transfer/sec:      5.90MB
```
`get_unspent_outs`

```
Running 10s test @ [internal_ip]:8080/get_unspent_outs
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.08s     1.86s     7.82s   60.48%
    Req/Sec     -nan      -nan      0.00     0.00%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    3.03s
 75.000%    4.58s
 90.000%    5.53s
 99.000%    7.15s
 99.900%    7.47s
 99.990%    7.75s
 99.999%    7.81s
100.000%    7.82s
  [Mean = 3079.689, StdDeviation = 1861.498]
  [Max = 7819.264, Total count = 71066]
  [Buckets = 27, SubBuckets = 2048]
  71074 requests in 10.00s, 14.30MB read
Requests/sec:   7106.51
Transfer/sec:      1.43MB
```
`get_random_outs`

```
Running 10s test @ [internal_ip]:8080/get_random_outs
  8 threads and 100 connections
  Thread calibration: mean lat.: 4970.170ms, rate sampling interval: 19087ms
  Thread calibration: mean lat.: 5377.621ms, rate sampling interval: 16736ms
  Thread calibration: mean lat.: 4950.556ms, rate sampling interval: 15835ms
  Thread calibration: mean lat.: 5076.512ms, rate sampling interval: 17317ms
  Thread calibration: mean lat.: 4642.232ms, rate sampling interval: 16113ms
  Thread calibration: mean lat.: 6349.897ms, rate sampling interval: 19365ms
  Thread calibration: mean lat.: 4918.584ms, rate sampling interval: 15532ms
  Thread calibration: mean lat.: 4544.176ms, rate sampling interval: 15261ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    -nanus    -nanus   0.00us    0.00%
    Req/Sec     -nan      -nan      0.00     0.00%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    0.00us
 75.000%    0.00us
 90.000%    0.00us
 99.000%    0.00us
 99.900%    0.00us
 99.990%    0.00us
 99.999%    0.00us
100.000%    0.00us
  [Mean = -nan, StdDeviation = -nan]
  [Max = 0.000, Total count = 0]
  [Buckets = 27, SubBuckets = 2048]
  68 requests in 10.07s, 219.44KB read
  Socket errors: connect 0, read 0, write 0, timeout 421
Requests/sec:      6.75
Transfer/sec:     21.80KB
```
Analysis

`login` and `get_address_info` both max out around ~17,400 requests a second. I requested 20,000 from `wrk2` - it's not clear why this target couldn't be achieved (link limit or laptop limit).

`get_unspent_outs` maxed out around ~7,100 requests a second. This is expected, and almost certainly due to the ZMQ call within the handler code. The values returned over ZMQ could be cached safely, but when the cache timeout hits, the throughput will drop by 50%.

`get_random_outs` had really low throughput, which is to be expected. This also does an expensive ZMQ call in the handler. Caching is a little trickier here, because the random output selection would be delayed from real time. Even with caching, when the cache timeout hits, the throughput of the REST threads will drop dramatically.

Steps from Here
In both cases, the requests/sec drop came from blocking ZMQ calls within the HTTP handler. The "correct" engineering fix is to pause/resume the REST handlers so that the ZMQ calls never block any of the handler threads. This cannot be achieved with the epee HTTP server, because the response must be synchronous in that framework.
The steps (in-order) to achieve better throughput with the REST server:

1. Write a proof-of-concept using `boost::beast`.
2. Load test the `boost::beast` proof-of-concept - make sure the requests/sec are similar to the current HTTP server.
3. If `boost::beast` passes tests (on `login` and `get_address_info`), then get the code in a "shippable" state.
4. Implement pause/resume within the `boost::beast` framework, such that `get_unspent_outs` and `random_outs` never block on ZMQ calls.
5. Cache the ZMQ response for `get_unspent_outs` so that throughput on that endpoint improves.
6. Do not cache `get_random_outs`, as it's too risky to give stale data on that call.