
[FLINK-36429] [runtime-web] Enhancing Flink History Server File Storage and Retrieval with RocksDB #25838

Open
wants to merge 8 commits into base: master
Conversation

Shawnsuun

What is the purpose of the change

Currently, when a Flink job finishes, it writes its archive as a single file that maps REST paths to JSON responses. The Flink History Server (FHS) then pulls these job archives to the machine it runs on and expands them into a local directory structure, which scales inefficiently as the number of jobs increases.

Key Problems

  • High inode usage in the file system due to nested directories for job archives.
  • Slower data retrieval and bottlenecks in job archive navigation at scale.
  • Limited overall scalability of the underlying file system.

Proposed Solution

This change integrates RocksDB, a high-performance embedded key-value database, as an alternative storage backend for job archives. RocksDB provides:

  • Faster job data retrieval.
  • Reduced inode consumption.
  • Enhanced scalability, especially in containerized environments.

The integration of RocksDB is implemented as a pluggable backend. The current file system storage remains intact, while RocksDB serves as an optional alternative for efficient storage and retrieval of job archives.


Brief Change Log

1. KVStore Interface

  • Introduced KVStore as an abstraction for key-value storage systems to enable flexible storage backends.
  • Added basic CRUD operations and higher-level operations for managing job archives (a minimal interface sketch follows this list).
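
A minimal Java sketch of what such a KVStore abstraction might look like is shown below. The generic shape, method names, and the prefix-lookup operation are assumptions for illustration; the actual interface in this PR may differ.

    import java.io.IOException;
    import java.util.Collection;

    // Hedged sketch of a KVStore abstraction; method names are illustrative,
    // not necessarily those of the interface added in this PR.
    public interface KVStore<K, V> extends AutoCloseable {

        // Stores the value under the given key, overwriting any existing entry.
        void put(K key, V value) throws IOException;

        // Returns the value for the key, or null if no entry exists.
        V get(K key) throws IOException;

        // Removes the entry for the key, if present.
        void delete(K key) throws IOException;

        // Returns all keys starting with the given prefix, e.g. every
        // archived JSON path belonging to a single job.
        Collection<K> getKeysByPrefix(K prefix) throws IOException;

        // Releases underlying resources, e.g. native handles.
        @Override
        void close() throws IOException;
    }

Keeping the abstraction generic lets the file-based and RocksDB backends be swapped without touching the fetcher or handler code.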

2. RocksDB Integration

  • Implemented HistoryServerRocksDBKVStore as the RocksDB-based implementation of the KVStore interface.
  • Mapped the hierarchical file-based job archive structure into key-value pairs for efficient storage and retrieval (illustrated in the sketch below).
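
As an illustration of the path-to-key mapping described above, the following hedged sketch uses the RocksDB Java API (org.rocksdb) to store each JSON file of an archive under its REST path. Class and method names are assumptions, not the PR's actual code.

    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    import java.nio.charset.StandardCharsets;

    // Hedged sketch: each entry of a job archive (REST path -> JSON body)
    // becomes one RocksDB key-value pair instead of one file on disk.
    public class RocksDBKVStoreSketch implements AutoCloseable {

        static {
            RocksDB.loadLibrary(); // load the native library once
        }

        private final RocksDB db;

        public RocksDBKVStoreSketch(String dbPath) throws RocksDBException {
            Options options = new Options().setCreateIfMissing(true);
            this.db = RocksDB.open(options, dbPath);
        }

        // Stores one JSON response under its REST path, e.g. "jobs/<jobId>/vertices".
        public void put(String archivePath, String json) throws RocksDBException {
            db.put(archivePath.getBytes(StandardCharsets.UTF_8),
                   json.getBytes(StandardCharsets.UTF_8));
        }

        // Returns the JSON stored under the path, or null if absent.
        public String get(String archivePath) throws RocksDBException {
            byte[] value = db.get(archivePath.getBytes(StandardCharsets.UTF_8));
            return value == null ? null : new String(value, StandardCharsets.UTF_8);
        }

        // Removes the entry for the path, if present.
        public void delete(String archivePath) throws RocksDBException {
            db.delete(archivePath.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public void close() {
            db.close();
        }
    }

Because all entries live in a handful of RocksDB files rather than one file per JSON response, inode usage stays nearly flat as the number of archived jobs grows, which matches the inode reduction reported below.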

3. ArchiveFetcher Abstraction and Improvements

  • Introduced ArchiveFetcher as an abstract class to support multiple backends for job archive fetching.
  • Updated HistoryServerArchiveFetcher for file-based systems.
  • Created HistoryServerKVStoreArchiveFetcher to fetch job archives using RocksDB (see the sketch after this list).
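
The fetcher abstraction could take roughly the following shape; this is a hedged sketch of the split described above, not the PR's actual class.

    // Hedged sketch: common fetch loop in the base class, backend-specific
    // archive processing in the subclasses.
    public abstract class ArchiveFetcherSketch {

        // Periodically scans the configured archive directories for new
        // or deleted job archives and delegates to processArchive(...).
        public abstract void fetchArchives();

        // Backend-specific handling of a single archive: the file-based
        // fetcher unpacks it into a local directory tree, while the
        // KVStore fetcher writes its entries into RocksDB.
        protected abstract void processArchive(String jobId, String archiveLocation)
                throws java.io.IOException;
    }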

4. ServerHandler Abstraction and Improvements

  • Designed HistoryServerServerHandler as an abstract base class for handling HTTP requests, supporting pluggable backends.
  • Updated HistoryServerStaticFileServerHandler for file-based job archive serving.
  • Implemented HistoryServerKVStoreServerHandler to serve job data from RocksDB via REST APIs (a handler sketch follows this list).
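
Below is a hedged sketch of how a KVStore-backed handler might answer REST requests, assuming a plain Netty pipeline and the KVStore sketch from above; the PR's actual handler (which sits in Flink's shaded Netty stack) will differ in detail.

    import io.netty.buffer.Unpooled;
    import io.netty.channel.ChannelFutureListener;
    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.SimpleChannelInboundHandler;
    import io.netty.handler.codec.http.DefaultFullHttpResponse;
    import io.netty.handler.codec.http.FullHttpRequest;
    import io.netty.handler.codec.http.FullHttpResponse;
    import io.netty.handler.codec.http.HttpHeaderNames;
    import io.netty.handler.codec.http.HttpResponseStatus;
    import io.netty.handler.codec.http.HttpVersion;

    import java.nio.charset.StandardCharsets;

    // Hedged sketch: instead of resolving the request path to a local file,
    // the handler looks the path up in the key-value store.
    public class KVStoreServerHandlerSketch extends SimpleChannelInboundHandler<FullHttpRequest> {

        private final KVStore<String, String> store; // see the KVStore sketch above

        public KVStoreServerHandlerSketch(KVStore<String, String> store) {
            this.store = store;
        }

        @Override
        protected void channelRead0(ChannelHandlerContext ctx, FullHttpRequest request) throws Exception {
            String json = store.get(request.uri()); // e.g. "/jobs/<jobId>/vertices"
            FullHttpResponse response;
            if (json == null) {
                response = new DefaultFullHttpResponse(
                        HttpVersion.HTTP_1_1, HttpResponseStatus.NOT_FOUND);
            } else {
                response = new DefaultFullHttpResponse(
                        HttpVersion.HTTP_1_1,
                        HttpResponseStatus.OK,
                        Unpooled.copiedBuffer(json, StandardCharsets.UTF_8));
                response.headers().set(HttpHeaderNames.CONTENT_TYPE, "application/json");
                response.headers().set(HttpHeaderNames.CONTENT_LENGTH, response.content().readableBytes());
            }
            ctx.writeAndFlush(response).addListener(ChannelFutureListener.CLOSE);
        }
    }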

5. HistoryServer Updates

  • Modified HistoryServer to integrate the KVStore interface and support RocksDB as a pluggable backend.
  • Added a configuration option in HistoryServerOptions to toggle between file-based and RocksDB storage. To enable RocksDB as the storage backend for the History Server, add the following to flink-conf.yaml:
    historyserver.storage.backend: kvstore
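
For context, a complete History Server section of flink-conf.yaml might then look like the following. All options except historyserver.storage.backend are pre-existing Flink settings; the values shown are placeholders.

    # Where completed jobs upload their archives (existing Flink option).
    historyserver.archive.fs.dir: hdfs:///completed-jobs/
    # How often the History Server polls for new archives, in milliseconds (existing option).
    historyserver.archive.fs.refresh-interval: 10000
    # Address and port of the History Server web UI (existing options).
    historyserver.web.address: 0.0.0.0
    historyserver.web.port: 8082

    # New in this PR: select the KVStore (RocksDB) backend instead of the
    # default file-based storage.
    historyserver.storage.backend: kvstore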

Verifying this change

This change added tests and can be verified as follows:

1. Testing

  • Unit Tests:

    • Added FhsRocksDBKVStoreTest to validate CRUD operations and resource cleanup for RocksDB (an illustrative test sketch appears after this list).
    • Added HistoryServerKVStoreArchiveFetcherTest to ensure correct fetching and processing of job archives from RocksDB.
  • Integration Tests:

    • Built a Flink binary and configured flink-conf.yaml to test both file-based and RocksDB backends.
    • Verified archive retrieval via the History Server web UI and ensured backward compatibility with the file-based backend.
  • End-to-End Tests:

    • Conducted tests in a Kubernetes cluster with both RocksDB and file-based storage backends.
    • Verified correct behavior of the History Server in processing and displaying job archives for both storage backends in a real-world setup.
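
As an illustration of the CRUD coverage mentioned under Unit Tests above, a round-trip test against the RocksDB sketch could look like this (hypothetical, assuming JUnit 5; the PR's FhsRocksDBKVStoreTest may be organized differently):

    import org.junit.jupiter.api.Test;
    import org.junit.jupiter.api.io.TempDir;

    import java.nio.file.Path;

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.junit.jupiter.api.Assertions.assertNull;

    // Hedged sketch of a CRUD round-trip test; names are illustrative.
    class RocksDBKVStoreSketchTest {

        @Test
        void putGetDeleteRoundTrip(@TempDir Path tempDir) throws Exception {
            try (RocksDBKVStoreSketch store = new RocksDBKVStoreSketch(tempDir.toString())) {
                store.put("jobs/abc/vertices", "{\"vertices\":[]}");
                assertEquals("{\"vertices\":[]}", store.get("jobs/abc/vertices"));

                store.delete("jobs/abc/vertices");
                assertNull(store.get("jobs/abc/vertices"));
            }
        }
    }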

2. Performance Enhancements

  • Faster Archive Retrieval: Achieved a 4.25x improvement in fetching and processing archives with RocksDB compared to the traditional file system (tested in a production environment).
    • File system: 17 minutes for 100 archives.
    • RocksDB: 4 minutes for 100 archives.
  • Reduced Inode Usage: Reduced inode consumption by over 99.99%.
    • File system: Over 20 million inodes.
    • RocksDB: Only 79 inodes.
  • Lower Storage Usage: Achieved a 95.6% reduction in storage usage.
    • File system: 48 GB for 100 archives.
    • RocksDB: 2.1 GB for 100 archives.

These enhancements significantly improve scalability, reduce resource overhead, and make the History Server more responsive for large-scale deployments.


Does this pull request potentially affect one of the following parts:

  • Dependencies: No (using existing RocksDB dependency).
  • Public API: No.
  • Serializers: No.
  • Performance-sensitive code paths: Yes (job archive storage and retrieval).
  • Deployment or recovery: Yes (affects FHS deployment with the RocksDB backend option).
  • File system connectors: No.

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (not documented)

@Shawnsuun Shawnsuun marked this pull request as ready for review December 22, 2024 10:30
@flinkbot
Copy link
Collaborator

flinkbot commented Dec 22, 2024

CI report:

Bot commands: the @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@venkata91
Contributor

Thanks @Shawnsuun for adding a pluggable storage backend and new RocksDB storage backend support. This is a great addition.

At scale, we also see the current set of issues with the file/directory-based storage backend for the HistoryServer, especially with respect to the quick exhaustion of inodes.

Flink OSS has a formal process called FLIP (Flink Improvement Proposal), where new designs are vetted by the community and go through a voting process. See the FLIP wiki to learn more.

I am happy to help with reviewing this code. Thanks again!

@venkata91
Contributor

@Shawnsuun There are test failures; can you please take a look?
