-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Implement approximate mode #440
Draft
scotts
wants to merge
13
commits into
pytorch:main
Choose a base branch
from
scotts:approx
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
+520
−259
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
facebook-github-bot
added
the
CLA Signed
This label is managed by the Meta Open Source bot.
label
Dec 20, 2024
Initial performance numbers are encouraging. I generated a video with the following command:
That produced a 141 MB video file. I then ran our standard benchmark with:
And that yields:
Some explanations of what the options are:
For the full matrix, we could change the seek modes for the core options. Approximate mode is basically meeting our performance expectations here. |
scotts
commented
Dec 20, 2024
scotts
commented
Dec 20, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
[WIP]
Comprehensive implementation of approximate mode. Modifies the C++ and Python sides to match. Adds parameters to existing tests, but does not add any new tests.
I know that doing all of the changes in one go - C++, core ops, Python public interface - is a bear to review. But I wanted to quickly get to a state where we could compare performance. Also, because the approach I took makes the decoder have two different modes of operation, it requires changes everywhere.
Starting with the C++
VideoDecoder
layer:enum
calledSeekMode
:exact
implies doing a full file scan, and we can use all of the prior algorithms for getting ranges of frames.approximate
implies not doing a full file scan, and we have to rely on the average FPS to calculate indices. We also assume that the minimum pts in seconds is 0, and the maximum is the duration. If there is no duration or average FPS in the metadata, that's an error. (TODO: make sure the C++ throws on all of those conditions.)VideoDecoder
, and it applies to all streams. I considered trying to make it apply only when we add a stream, but because we want it to control whether or not we do a full file scan, I made it a property of the entire decoder. We could potentially make it apply per stream, and do a lazy scan if anexact
mode stream is added.exact
. I modified a bunch of our core ops tests to reflect this fact, as it is now redundant to requestexact
seek mode and manually do a full file scan.[x, y, z]
, we will turn that into indices[a, b, c]
based on the average fps. If the average fps is wrong, or if the video has a variable frame rate, then that mapping may be wrong. In principle, we could have returned the correct frames.In the Python public API, we add:
To the Python
VideoDecoder
constructor. Not much logic actually changes here.We do have a lot of changes in the Python metadata to support the
VideoDecoder
:begin_stream_seconds
->begin_stream_seconds_from_content
: This was always only from the full scan. We need to be explicit now.begin_stream_seconds
is now a property that ifbegin_stream_seconds_from_content
is none, is just 0.end_stream_seconds
->end_stream_seconds_from_content
: Same reasoning as above.end_stream_seconds
is now a property that ifend_stream_seconds_from_content
is none returns theduration_seconds
.By changing the metadata in that manner, the Python
VideoDecoder
class remains mostly unchanged as the real logic is in resolving what metadata to use. Note that the metadata class has no knowledge of the seek mode; it just knows if some values are missing.On defaults, the current implementation has
exact
mode as the default everywhere. We could, however, decide to apply different defaults at each level. The levels we can apply defaults are:VideoDecoder
class. In some ways, the old behavior wasapproximate
as default, since scanning the file did not automatically happen.create_decoder_from
family of functions. Similar to the C++, the old behavior was closer toapproximate
as default.VideoDecoder
constructor. The old behavior was definitelyexact
as we always scanned.