Slow throughput when reading many small files #953
Comments
I think Mountpoint may not be correctly configuring the default network throughput on p4de instances. Could you try the argument Also, since you're going from EC2 to S3,
@pkasravi I noticed that you configured Mountpoint to cache the file's content locally. In my tests, I observed an improvement when caching was disabled, which seems more in line with how Goofys operates. It might be good to compare the performance by disabling caching, to ensure we're comparing apples to apples.
Hi @arsh, I tried your suggestion but I'm not seeing much of a difference. Here are the results using the same code I shared above. I've also included the results from goofys (again, same code) for comparison. Goofys command:
I'm going to read 16,000 8MB files to observe the performance I get and will report back. Previously, I tested with 5,000 files and noticed some improvement by disabling caching. Could you run your test with caching disabled and logging enabled in MP? Please share the log file afterward. You can do this by running Mountpoint as follows:
More details on logging are here: https://github.com/awslabs/mountpoint-s3/blob/main/doc/LOGGING.md#logging-to-a-file
@arsh were you able to reproduce similar or different results? I ran my test with logging enabled, and the log file was 1 GB. I've attached as much as GitHub will allow me. Let me know if it's useful to add the rest.
Mountpoint for Amazon S3 version
mount-s3 1.7.2
AWS Region
us-west-2
Describe the running environment
Running in p4de EC2 instance on Amazon Linux 2 using instance profile credentials against an S3 Bucket in the same account.
Mountpoint options
What happened?
I am trying to read 16k small (8 MB) files in parallel from S3. I have been comparing Mountpoint-S3 and goofys, and I am seeing a large difference in performance when using Mountpoint. With goofys I am able to read all the files in 44s; with Mountpoint it takes 478s. These timings are averaged over 5 test runs. Both goofys and Mountpoint are mounted to tmpfs file systems.
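For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a parallel read timer. It is not the code used for the timings above. The function names, the thread pool approach, and the worker count of 64 are all assumptions for illustration, and 8 MB is taken to mean 8 × 10^6 bytes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Read one file fully and return the number of bytes read."""
    with open(path, "rb") as f:
        return len(f.read())


def timed_parallel_read(paths, workers=64):
    """Read all paths concurrently; return (total_bytes, elapsed_seconds).

    The worker count is an illustrative default, not a tuned value.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total_bytes = sum(pool.map(read_file, paths))
    elapsed = time.perf_counter() - start
    return total_bytes, elapsed


def throughput_mb_per_s(total_bytes, seconds):
    """Aggregate throughput in decimal megabytes per second."""
    return total_bytes / 1e6 / seconds
```

Under the same assumption, 16,000 files of 8 MB come to 128 GB in total, so 44s corresponds to roughly 2.9 GB/s aggregate and 478s to roughly 0.27 GB/s, a gap of about 11x.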
Relevant log output
No response