Load Testing REST Server #134

Open

vtnerd opened this issue Sep 22, 2024 · 4 comments

vtnerd commented Sep 22, 2024

These are some basic numbers for the REST server, gathered to determine what needs to change to remove bottlenecks. I used the wrk2 utility, which seems to place the server under decent load. monero-lws-daemon and monerod were both running on a Ryzen 3900X box with 32 GiB RAM, while wrk2 ran on a laptop. A wired connection (to the same switch) was used to ensure that latencies were low and consistent.

Raw Performance Numbers

login

Running 10s test @ [internal_ip]:8080/login
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 476.29ms 840.19ms 4.60s 87.19%
Req/Sec -nan -nan 0.00 0.00%
Latency Distribution (HdrHistogram - Recorded Latency)
50.000% 28.38ms
75.000% 594.43ms
90.000% 1.65s
99.000% 3.51s
99.900% 4.03s
99.990% 4.53s
99.999% 4.60s
100.000% 4.60s

[Mean = 476.288, StdDeviation = 840.193]
[Max = 4599.808, Total count = 174799]
[Buckets = 27, SubBuckets = 2048]

174807 requests in 10.00s, 34.34MB read
Requests/sec: 17483.81
Transfer/sec: 3.43MB

get_address_info

Running 10s test @ [internal_ip]:8080/get_address_info
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 526.71ms 816.44ms 4.26s 87.16%
Req/Sec -nan -nan 0.00 0.00%
Latency Distribution (HdrHistogram - Recorded Latency)
50.000% 152.06ms
75.000% 519.17ms
90.000% 1.65s
99.000% 3.72s
99.900% 4.11s
99.990% 4.22s
99.999% 4.26s
100.000% 4.27s

[Mean = 526.713, StdDeviation = 816.440]
[Max = 4263.936, Total count = 174725]
[Buckets = 27, SubBuckets = 2048]

174733 requests in 10.00s, 58.99MB read
Requests/sec: 17473.53
Transfer/sec: 5.90MB

get_unspent_outs

Running 10s test @ [internal_ip]:8080/get_unspent_outs
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 3.08s 1.86s 7.82s 60.48%
Req/Sec -nan -nan 0.00 0.00%
Latency Distribution (HdrHistogram - Recorded Latency)
50.000% 3.03s
75.000% 4.58s
90.000% 5.53s
99.000% 7.15s
99.900% 7.47s
99.990% 7.75s
99.999% 7.81s
100.000% 7.82s

[Mean = 3079.689, StdDeviation = 1861.498]
[Max = 7819.264, Total count = 71066]
[Buckets = 27, SubBuckets = 2048]

71074 requests in 10.00s, 14.30MB read
Requests/sec: 7106.51
Transfer/sec: 1.43MB

get_random_outs

Running 10s test @ [internal_ip]:8080/get_random_outs
8 threads and 100 connections
Thread calibration: mean lat.: 4970.170ms, rate sampling interval: 19087ms
Thread calibration: mean lat.: 5377.621ms, rate sampling interval: 16736ms
Thread calibration: mean lat.: 4950.556ms, rate sampling interval: 15835ms
Thread calibration: mean lat.: 5076.512ms, rate sampling interval: 17317ms
Thread calibration: mean lat.: 4642.232ms, rate sampling interval: 16113ms
Thread calibration: mean lat.: 6349.897ms, rate sampling interval: 19365ms
Thread calibration: mean lat.: 4918.584ms, rate sampling interval: 15532ms
Thread calibration: mean lat.: 4544.176ms, rate sampling interval: 15261ms
Thread Stats Avg Stdev Max +/- Stdev
Latency -nanus -nanus 0.00us 0.00%
Req/Sec -nan -nan 0.00 0.00%
Latency Distribution (HdrHistogram - Recorded Latency)
50.000% 0.00us
75.000% 0.00us
90.000% 0.00us
99.000% 0.00us
99.900% 0.00us
99.990% 0.00us
99.999% 0.00us
100.000% 0.00us

[Mean = -nan, StdDeviation = -nan]
[Max = 0.000, Total count = 0]
[Buckets = 27, SubBuckets = 2048]

68 requests in 10.07s, 219.44KB read
Socket errors: connect 0, read 0, write 0, timeout 421
Requests/sec: 6.75
Transfer/sec: 21.80KB

Analysis

login and get_address_info both max out around ~17,400 requests per second. I requested 20,000 requests/sec from wrk2; it's not clear why this target couldn't be reached (link limit or laptop limit).

get_unspent_outs maxed out around ~7,100 requests per second. This is expected, and is almost certainly due to the ZMQ call within the handler code. The values returned over ZMQ could be cached safely, but when the cache timeout hits, the throughput will drop by 50%.
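
For reference, here is a minimal sketch of the caching idea, not the actual monero-lws code: keep the last ZMQ reply and refresh it only when it is older than some TTL. The class and names are illustrative.

```cpp
// Hypothetical sketch of a TTL cache in front of the ZMQ call; not monero-lws code.
#include <chrono>
#include <functional>
#include <mutex>
#include <string>

class zmq_reply_cache
{
  std::mutex sync_;
  std::string cached_;
  std::chrono::steady_clock::time_point fetched_{};

public:
  // `fetch` performs the real ZMQ round-trip; it only runs on a stale or empty cache.
  std::string get(std::chrono::seconds ttl, const std::function<std::string()>& fetch)
  {
    const auto now = std::chrono::steady_clock::now();
    const std::lock_guard<std::mutex> lock{sync_};
    if (cached_.empty() || now - fetched_ > ttl)
    {
      cached_ = fetch(); // requests arriving during this refresh wait on the lock
      fetched_ = now;
    }
    return cached_;
  }
};
```

The comment on the refresh line is exactly the throughput dip described above: while one request repopulates the cache, everything else queues behind the lock (or, in an async design, behind the pending ZMQ reply).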

get_random_outs had very low throughput, which is to be expected. This handler also makes an expensive ZMQ call. Caching here is a little trickier because the random output selection would lag behind real time. Even with caching, when the cache timeout hits, the throughput of the REST threads will drop dramatically.

Steps from Here

In both cases, the requests/sec drop came from blocking ZMQ calls within the HTTP handler. The "correct" engineering fix is to pause/resume the REST handlers so that the ZMQ calls never block any of the handler threads. This cannot be achieved with the epee HTTP server, because responses must be produced synchronously in that framework.
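
To illustrate what "pause/resume" means here, below is a rough sketch assuming boost::asio C++20 coroutines. The steady_timer is only a stand-in for an async ZMQ round-trip (e.g. via AZMQ); the point is that co_await suspends just that handler, so the single REST thread keeps serving other requests while the daemon reply is outstanding.

```cpp
// Rough sketch only: the timer stands in for an async ZMQ request to monerod.
#include <boost/asio.hpp>
#include <chrono>
#include <iostream>
#include <string>

namespace asio = boost::asio;

asio::awaitable<std::string> fetch_from_daemon()
{
  // In the real design this would be an AZMQ async send/receive pair.
  asio::steady_timer delay{co_await asio::this_coro::executor, std::chrono::milliseconds{50}};
  co_await delay.async_wait(asio::use_awaitable);
  co_return std::string{"{}"}; // placeholder reply
}

asio::awaitable<void> handle_get_unspent_outs(int id)
{
  // The coroutine suspends here; the REST thread is free to run other handlers.
  const std::string reply = co_await fetch_from_daemon();
  std::cout << "request " << id << " -> " << reply << '\n'; // real code writes the HTTP response
}

int main()
{
  asio::io_context io{1}; // one thread, matching the load test
  for (int i = 0; i < 4; ++i) // four overlapping "requests" on that one thread
    asio::co_spawn(io, handle_get_unspent_outs(i), asio::detached);
  io.run();
}
```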

The steps (in order) to achieve better throughput with the REST server:

  • Get a proof-of-concept working with boost::beast (see the sketch after this list).
  • Test throughput with boost::beast and make sure the requests/sec are similar to the current HTTP server.
  • If boost::beast passes those tests (on login and get_address_info), get the code into a "shippable" state.
  • Incorporate AZMQ into the new boost::beast framework, so that get_unspent_outs and get_random_outs never block on ZMQ calls.
  • Add caching to get_unspent_outs so that throughput on that endpoint improves.
  • Do not cache get_random_outs, as it's too risky to serve stale data on that call.
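
As a point of reference for the first two items, a minimal boost::beast HTTP server on asio coroutines looks roughly like the following. This is not the monero-lws proof-of-concept, just a sketch of the shape such a server takes; the fixed "{}" body and the port are placeholders.

```cpp
// Minimal boost::beast HTTP server sketch (C++20 coroutines); placeholder logic only.
#include <boost/asio.hpp>
#include <boost/beast.hpp>
#include <utility>

namespace asio = boost::asio;
namespace beast = boost::beast;
namespace http = beast::http;
using tcp = asio::ip::tcp;

asio::awaitable<void> session(tcp::socket socket)
{
  beast::tcp_stream stream{std::move(socket)};
  beast::flat_buffer buffer;
  try
  {
    for (;;)
    {
      http::request<http::string_body> req;
      co_await http::async_read(stream, buffer, req, asio::use_awaitable);

      // Placeholder handler: a real server would dispatch on req.target() (/login, etc.).
      http::response<http::string_body> res{http::status::ok, req.version()};
      res.set(http::field::content_type, "application/json");
      res.keep_alive(req.keep_alive());
      res.body() = "{}";
      res.prepare_payload();
      co_await http::async_write(stream, res, asio::use_awaitable);

      if (!req.keep_alive())
        break;
    }
  }
  catch (const std::exception&)
  { /* client disconnected or parse error; just drop the connection */ }

  beast::error_code ec;
  stream.socket().shutdown(tcp::socket::shutdown_send, ec);
}

asio::awaitable<void> listener(tcp::endpoint endpoint)
{
  auto executor = co_await asio::this_coro::executor;
  tcp::acceptor acceptor{executor, endpoint};
  for (;;)
  {
    tcp::socket socket = co_await acceptor.async_accept(asio::use_awaitable);
    asio::co_spawn(executor, session(std::move(socket)), asio::detached);
  }
}

int main()
{
  asio::io_context io{1}; // single REST thread, as in the tests above
  asio::co_spawn(io, listener({tcp::v4(), 8080}), asio::detached);
  io.run();
}
```

The eventual AZMQ integration would slot in where the placeholder handler is: the handler coroutine would co_await the ZMQ reply (as in the earlier sketch) instead of blocking the thread.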

vtnerd commented Sep 22, 2024

One point to clarify: the account used had 0 received and 0 spent outputs. There are allocations in the code paths when the account has received or spent funds, which likely would have slowed the responses a little. Testing with an empty account was done intentionally, to measure the maximum possible throughput and to compare the slowdowns from the ZMQ calls. Subsequent load tests with accounts that have received and spent funds will likely follow, to see whether it's worth attempting a "streaming" design from LMDB directly into JSON.


vtnerd commented Sep 22, 2024

Another clarification: this was using one REST thread. Despite the 32 threads on the server, I don't see a reason to increase the REST thread count, because I wanted to test the throughput of a single response thread.


vtnerd commented Sep 23, 2024

I've gotten a quick proof-of-concept working with boost::beast, and the wrk2 load tester shows that it handles roughly an additional ~600 requests/sec (about 18,000 requests/sec total). The latency average and latency stddev are also somewhat lower.

Given that there was no drop in performance when switching to boost::beast, I will move forward with the attempt to switch to AZMQ so that individual REST handlers can be suspended/resumed. This is likely a bigger overhaul, so expect a delay before the next update on this change.

Raw Numbers

Running 10s test @ [internal_ip]:8080/get_address_info
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 460.01ms 269.25ms 952.83ms 57.54%
Req/Sec -nan -nan 0.00 0.00%
Latency Distribution (HdrHistogram - Recorded Latency)

[Mean = 460.012, StdDeviation = 269.253]
[Max = 952.832, Total count = 180905]
[Buckets = 27, SubBuckets = 2048]

180913 requests in 10.00s, 50.21MB read
Requests/sec: 18093.27
Transfer/sec: 5.02MB


vtnerd commented Sep 23, 2024

I should also mention that some additional constraints should be imposed on the boost::beast server somehow, but I have to dig into the library further to figure out how.
