[FLINK-36429] [runtime-web] Enhancing Flink History Server File Storage and Retrieval with RocksDB #25838
+2,233
−192
What is the purpose of the change
Currently, when a Flink job finishes, it writes an archive as a single file that maps paths to JSON files. The Flink History Server (FHS) pulls these job archives to the machine where it is running. This process creates a local directory structure that scales poorly as the number of jobs increases.
Key Problems
Proposed Solution
This change integrates RocksDB, a high-performance embedded database, as an alternative storage backend for job archives. RocksDB provides:
The integration of RocksDB is implemented as a pluggable backend. The current file system storage remains intact, while RocksDB serves as an optional alternative for efficient storage and retrieval of job archives.
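The pluggable-backend idea can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `ArchiveStore` interface, its method names, the `jobId:path` key layout, and the in-memory `TreeMap` stand-in are all assumptions made for the example. The sorted map only mimics the ordered-key behavior that an embedded store such as RocksDB offers, so that all entries for one job can be found or dropped by key prefix instead of walking a directory tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ArchiveStoreSketch {

    /** Hypothetical key-value abstraction; the actual KVStore signature in the PR may differ. */
    interface ArchiveStore extends AutoCloseable {
        void put(String key, String json);

        /** Returns the stored JSON, or null when the key is absent. */
        String get(String key);

        List<String> keysWithPrefix(String prefix);

        void deletePrefix(String prefix);

        @Override
        void close();
    }

    /** In-memory stand-in with sorted keys; a real backend would wrap an embedded database. */
    static final class InMemoryArchiveStore implements ArchiveStore {
        private final TreeMap<String, String> entries = new TreeMap<>();

        @Override
        public void put(String key, String json) {
            entries.put(key, json);
        }

        @Override
        public String get(String key) {
            return entries.get(key);
        }

        @Override
        public List<String> keysWithPrefix(String prefix) {
            List<String> keys = new ArrayList<>();
            // Keys are sorted, so start at the prefix and stop at the first non-match.
            for (String key : entries.tailMap(prefix, true).keySet()) {
                if (!key.startsWith(prefix)) {
                    break;
                }
                keys.add(key);
            }
            return keys;
        }

        @Override
        public void deletePrefix(String prefix) {
            entries.keySet().removeIf(key -> key.startsWith(prefix));
        }

        @Override
        public void close() {
            entries.clear();
        }
    }

    /** Assumed key layout: job id, a separator, then the REST path from the archive. */
    static String archiveKey(String jobId, String restPath) {
        return jobId + ":" + restPath;
    }

    public static void main(String[] args) {
        ArchiveStore store = new InMemoryArchiveStore();
        store.put(archiveKey("jobA", "/jobs/jobA/config"), "{\"jid\":\"jobA\"}");
        store.put(archiveKey("jobA", "/jobs/jobA/exceptions"), "{\"all-exceptions\":[]}");
        store.put(archiveKey("jobB", "/jobs/jobB/config"), "{\"jid\":\"jobB\"}");

        System.out.println(store.keysWithPrefix("jobA:"));
        store.deletePrefix("jobA:"); // expire one job without touching the others
        System.out.println(store.get(archiveKey("jobB", "/jobs/jobB/config")));
        store.close();
    }
}
```

Because callers depend only on the interface, a file-based implementation and a RocksDB-based one could be swapped behind a configuration switch without changing the fetcher or the request handlers.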
Brief Change Log
1. KVStore Interface
- `KVStore` as an abstraction for key-value storage systems to enable flexible storage backends.

2. RocksDB Integration
- `HistoryServerRocksDBKVStore` as the RocksDB-based implementation of the `KVStore` interface.

3. ArchiveFetcher Abstraction and Improvements
- `ArchiveFetcher` as an abstract class to support multiple backends for job archive fetching.
- `HistoryServerArchiveFetcher` for file-based systems.
- `HistoryServerKVStoreArchiveFetcher` to fetch job archives using RocksDB.

4. ServerHandler Abstraction and Improvements
- `HistoryServerServerHandler` as an abstract base class for handling HTTP requests, supporting pluggable backends.
- `HistoryServerStaticFileServerHandler` for file-based job archive serving.
- `HistoryServerKVStoreServerHandler` to serve job data from RocksDB via REST APIs.

5. HistoryServer Updates
- `HistoryServer` to integrate the `KVStore` interface and support RocksDB as a pluggable backend.
- `HistoryServerOptions` to toggle between file-based and RocksDB storage.

Verifying this change
This change added tests and can be verified as follows:
1. Testing
Unit Tests:
- `FhsRocksDBKVStoreTest` to validate CRUD operations and resource cleanup for RocksDB.
- `HistoryServerKVStoreArchiveFetcherTest` to ensure correct fetching and processing of job archives from RocksDB.

Integration Tests:
- `flink-conf.yaml` to test both file-based and RocksDB backends.

End-to-End Tests:
2. Performance Enhancements
These enhancements significantly improve scalability, reduce resource overhead, and make the History Server more responsive for large-scale deployments.
Does this pull request potentially affect one of the following parts:
Documentation