
Changing video frame properties on the fly is not supported by all filters #3317

Closed
whoyao opened this issue May 6, 2023 · 7 comments

@whoyao
Contributor

whoyao commented May 6, 2023

🐛 Describe the bug

I used NVIDIA hardware encoders (h264_nvenc) in my project. Compared with v2.0.1, the latest code performs the following validation on the input tensor:

InitFunc init_func = [](const torch::Tensor& t, AVFrame* f) {
    validate_video_input(t, f, 4);
    return init_interlaced(t);
};

Compared with the old version, the accepted number of channels has changed from 3 to 4, which means developers can no longer pass rgb24 data directly and must pad it to 4 channels externally.
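For illustration, that external padding can be sketched as follows (a minimal example, not the project's actual code; an (N, C, H, W) uint8 tensor layout and placeholder sizes are assumed):

```python
import torch
import torch.nn.functional as F

# A batch of 25 rgb24 frames in (N, C, H, W) layout.
frames = torch.zeros(25, 3, 256, 256, dtype=torch.uint8)

# Append a zero-filled 4th channel so the data matches the 4-channel (rgb0)
# layout. The pad spec is (W_left, W_right, H_top, H_bottom, C_front, C_back).
padded = F.pad(frames, (0, 0, 0, 0, 0, 1), "constant", 0)

print(padded.shape)  # torch.Size([25, 4, 256, 256])
```

The extra channel carries no image data; it only exists to satisfy the 4-channel validation.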
However, after that padding our input has effectively become rgb0, and rgb0 is not a legal input (get_src_pix_fmt only accepts AV_PIX_FMT_GRAY8, AV_PIX_FMT_RGB24, AV_PIX_FMT_BGR24, and AV_PIX_FMT_YUV444P).
To pass the validation, I pretended that the input video format was rgb24, and the filter seemed to work. When I dumped the structure of the filter graph, I saw that the filter was initialized with rgb24 while the actual input data was rgb0.
The dump of the filter graph is as follows:

+----------+
|    in    |default--[1024x1024 1:1 rgb24]--Parsed_null_0:default
| (buffer) |
+----------+

                                                     +--------------+
Parsed_null_0:default--[1024x1024 1:1 rgb24]--default|     out      |
                                                     | (buffersink) |
                                                     +--------------+

                                          +---------------+
in:default--[1024x1024 1:1 rgb24]--default| Parsed_null_0 |default--[1024x1024 1:1 rgb24]--out:default
                                          |    (null)     |
                                          +---------------+

The output video seems fine, but if you enable the ffmpeg log, you can see the following warnings:

[in @ 0x755f1600] filter context - w: 1024 h: 1024 fmt: 2, incoming frame - w: 1024 h: 1024 fmt: 119 pts_time: NOPTS
[in @ 0x755f1600] Changing video frame properties on the fly is not supported by all filters.

fmt: 2 represents AV_PIX_FMT_RGB24, while fmt: 119 represents AV_PIX_FMT_CUDA. The true pix_fmt is AV_PIX_FMT_CUDA and was set in configure_hw_accel.

Therefore, I believe this usage still has hidden problems, so I have submitted a PR to add AV_PIX_FMT_CUDA as a valid format.

A snippet to reproduce the error is provided below.

import torch.nn.functional as F
from torchaudio.io import StreamWriter, StreamReader

VIDEO_FPATH = 'rtmp://xxx.xxx.xxx.xxx/live/test'
VIDEO_URL = 'test.mp4'

video_config = dict(
    frame_rate=25,
    width=256,
    height=256,
    hw_accel='cuda:0',
    encoder='h264_nvenc',
    encoder_format='rgb0',
    encoder_option={'gpu': '0'},
)
stream = StreamWriter(VIDEO_FPATH, format='flv')
stream.add_video_stream(**video_config)
stream.add_audio_stream(sample_rate=44100, num_channels=2, encoder="aac")
stream_handler = stream.open()
streamer = StreamReader(VIDEO_URL)
streamer.add_basic_audio_stream(frames_per_chunk=44100, sample_rate=44100)
streamer.add_basic_video_stream(frames_per_chunk=25, frame_rate=25, width=256, height=256, format="rgb24")

for i, (audio_chunk, frame) in enumerate(streamer.stream()):
    if audio_chunk is not None:
        stream_handler.write_audio_chunk(1, audio_chunk)
    if len(frame.shape) == 3:
        frame = frame.unsqueeze(0)  # add batch dimension
    # pad the channel dimension from 3 (rgb24) to 4 (rgb0)
    frame = F.pad(frame, (0, 0, 0, 0, 0, 1), 'constant', 0)
    stream_handler.write_video_chunk(0, frame.to('cuda:0'))

Versions

05/04/23 nightly (1e48af0)

Collecting environment information...
PyTorch version: 2.1.0a0+git979c5b4
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.26.3
Libc version: glibc-2.31

Python version: 3.9.13 (main, Aug 25 2022, 23:26:10) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-126-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
GPU 4: Tesla V100-SXM2-32GB
GPU 5: Tesla V100-SXM2-32GB
GPU 6: Tesla V100-SXM2-32GB
GPU 7: Tesla V100-SXM2-32GB

Nvidia driver version: 510.85.02
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.7.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 80
On-line CPU(s) list: 0-79
Thread(s) per core: 1
Core(s) per socket: 40
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz
Stepping: 5
CPU MHz: 2500.000
BogoMIPS: 5000.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 2.5 MiB
L1i cache: 2.5 MiB
L2 cache: 320 MiB
L3 cache: 71.5 MiB
NUMA node0 CPU(s): 0-39
NUMA node1 CPU(s): 40-79
Vulnerability Itlb multihit: KVM: Vulnerable
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512_vnni

Versions of relevant libraries:
[pip3] flake8==5.0.4
[pip3] flake8-bugbear==22.9.11
[pip3] flake8-comprehensions==3.10.0
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.1
[pip3] pytorch-lightning==1.2.4
[pip3] pytorch-msssim==0.2.1
[pip3] pytorch3d==0.7.1
[pip3] torch==2.1.0a0+git979c5b4
[pip3] torchaudio==2.1.0a0+1e48af0
[pip3] torchfile==0.1.0
[pip3] torchvision==0.16.0a0+fc377d0
[pip3] triton==2.0.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.6.0 hecad31d_10 conda-forge
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py39h7f8727e_0
[conda] mkl_fft 1.3.1 py39hd3c417c_0
[conda] mkl_random 1.2.2 py39h51133e4_0
[conda] numpy 1.23.1 py39h6c91a56_0
[conda] numpy-base 1.23.1 py39ha15fc14_0
[conda] pytorch-lightning 1.2.4 pypi_0 pypi
[conda] pytorch-msssim 0.2.1 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] pytorch3d 0.7.1 pypi_0 pypi
[conda] torch 2.1.0a0+git979c5b4 dev_0
[conda] torchaudio 2.1.0a0+1e48af0 dev_0
[conda] torchfile 0.1.0 pypi_0 pypi
[conda] torchvision 0.16.0a0+fc377d0 dev_0
[conda] triton 2.0.0 pypi_0 pypi

@whoyao
Contributor Author

whoyao commented May 6, 2023

#3318 may resolve this; please review.

@mthrok
Collaborator

mthrok commented May 8, 2023

Hi @whoyao

Thanks for the report and detailed description. I agree that something is not quite right about the GPU encoder.
The lack of automated tests is preventing us from catching edge cases.

Questions:

The changes made to StreamWriter after 2.0.1 are mainly to allow users to pass custom filter graphs, which enables on-the-fly transformation.
However, CUDA is not yet supported in this custom filter graph, so the main filtering is always null, while the input/output of the filter graph is configured the same way as for CPU. Thus FFmpeg warns that the data passed to the filter graph [in] does not match the actual AVFrame being processed. Fortunately or unfortunately, the filter is null, so FFmpeg does not raise an error and lets the frame pass, hence the warning.

In terms of the fix, I think the proper fix is to make the filter graph support CUDA, but just for the sake of resolving the warning, I think it is more appropriate to override the pixel format of the [in] filter context of AVFilterGraph to CUDA.
The reasons for this are:

  1. In get_video_encode_process, src_fmt and enc_fmt are supposed to hold software pixel formats, and whether CUDA should be used should be communicated through the hw_accel variable. This keeps the distinction between the CPU and CUDA code paths clear and consistent.
  2. This way, the special override for CUDA acceleration (due to the lack of CUDA support in the filter graph) is localized in the get_video_filter_graph function, so I think it will be easier to maintain later.

FilterGraph f;
f.add_video_src(
    src_fmt, av_inv_q(src_rate), src_rate, src_width, src_height, {1, 1});

So I think just changing the src_fmt at L662 to (is_cuda ? AV_PIX_FMT_CUDA : src_fmt) would do. What do you think?
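The intent of the proposed one-line change can be sketched in Python (string names stand in for the FFmpeg enum values; this illustrates the selection logic only, not the actual C++ code):

```python
AV_PIX_FMT_CUDA = "cuda"  # stands in for FFmpeg's AV_PIX_FMT_CUDA enum value

def filter_src_fmt(src_fmt: str, is_cuda: bool) -> str:
    """Pick the pixel format advertised to the [in] buffer source:
    when hardware acceleration is active, the frames reaching the
    filter graph are CUDA frames, so advertise AV_PIX_FMT_CUDA;
    otherwise keep the software pixel format."""
    return AV_PIX_FMT_CUDA if is_cuda else src_fmt

print(filter_src_fmt("rgb0", is_cuda=True))   # cuda
print(filter_src_fmt("rgb0", is_cuda=False))  # rgb0
```

This keeps src_fmt itself a software format everywhere else, and localizes the CUDA special case to the point where the filter graph is built.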

@whoyao
Contributor Author

whoyao commented May 9, 2023

Hi, @mthrok. Thank you for your thoughtful response.
I will address your questions one by one.

Q1

Yes, my old code no longer works.

In version 2.0.1, the following code actually performed the padding:

const auto num_channels_buffer = num_channels + (pad_extra ? 1 : 0);
using namespace torch::indexing;
torch::Tensor buffer =
    torch::empty({height, width, num_channels_buffer}, frames.options());
size_t spitch = width * num_channels_buffer;
for (int i = 0; i < num_frames; ++i) {
  // Slice frame as HWC
  auto chunk = frames.index({i}).permute({1, 2, 0});
  buffer.index_put_({"...", Slice(0, num_channels)}, chunk);
  // ...
}
However, this code is gone in the latest version.
Therefore, I have to do the padding myself. This part is indeed incompatible: previously only 3 channels were required as input, now 4 channels must be input.

Q2

Yes, I am trying to fix the issue.

I agree with your approach. It is true that src_fmt should not have been set to AV_PIX_FMT_CUDA.

In my case, the actual src_fmt is AV_PIX_FMT_RGB0 rather than AV_PIX_FMT_RGB24.
Because I did the padding manually, it was natural for me to set src_fmt to AV_PIX_FMT_RGB0 when calling add_video_stream. However, that is not allowed, which is still confusing.

nateanl added the triaged label May 9, 2023
@mthrok
Collaborator

mthrok commented May 9, 2023

@whoyao

Thanks for the reply. By my understanding, then, allowing AV_PIX_FMT_RGB0 in get_src_pix_fmt is another missing piece in #3318. Can you update the PR that way? Then I think I can merge it.

Regarding passing an RGB24 tensor (as in the previous version): the reason it was changed is so that the behavior of StreamWriter is consistent across formats (though the family of YUV formats falls completely outside of this). I guess we can add that logic back. It is a difficult line to draw when designing a library, but asking client code to pad manually wastes memory, which is not desirable.
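As a rough illustration of that memory cost (assuming 8-bit channels and the 1024x1024 frames from the filter-graph dump above; the numbers are illustrative only):

```python
width, height = 1024, 1024

rgb24_bytes = width * height * 3  # what the client actually has
rgb0_bytes = width * height * 4   # what manual padding forces it to allocate

overhead = rgb0_bytes - rgb24_bytes
print(overhead)                # 1048576 -> 1 MiB of padding per frame
print(overhead / rgb24_bytes)  # 0.3333333333333333 -> one third more memory
```

At 25 fps that is roughly 25 MiB/s of throwaway allocations, which is why padding inside the library (where the staging buffer already exists) is preferable.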

@mthrok
Collaborator

mthrok commented Jun 8, 2023

@whoyao Do you plan to update #3318 according to my comments? I am thinking of working on the regression mentioned here and basing it on your fix if you plan to update #3318; otherwise, I will just include the suggestions in my fix.

@whoyao
Contributor Author

whoyao commented Jun 8, 2023

> @whoyao Do you plan to update #3318 according to my comments? I am thinking of working on the regression mentioned here and basing it on your fix if you plan to update #3318; otherwise, I will just include the suggestions in my fix.

I think your fix is better than mine, I will close my pull request 😊

@mthrok
Collaborator

mthrok commented Jun 8, 2023

#3428 will fix the regression. One will be able to pass a regular RGB tensor when the encoder format is RGB0.

facebook-github-bot pushed a commit that referenced this issue Jun 9, 2023
Summary:
StreamWriter's encoding pipeline looks like the following

1. convert tensor to AVFrame
2. pass AVFrame to AVFilter
3. pass the resulting AVFrame to AVCodecContext (encoder) and AVFormatContext (muxer)

When dealing with CUDA tensor, the AVFilter becomes no-op, as we have not added support for CUDA-compatible filters.

When a CUDA frame is passed, the existing solution passes the software pixel format to AVFilter, which later issues a warning because what AVFilter actually sees is AV_PIX_FMT_CUDA.

Since the filter itself is a no-op, it functions as expected, but this commit fixes the mismatch.

See #3317

Pull Request resolved: #3426

Differential Revision: D46562370

Pulled By: mthrok

fbshipit-source-id: ce0131f1e50bcc826ee036fc0f35db2a5162b660