Which OpenCL platforms does the OpenCL VP9 encoder work with? #2
Comments
Hi, This project supports only Mali-T6xx GPUs (OpenCL). All performance optimization, validation, etc. was done only for Mali GPUs. It should work functionally on any OpenCL platform with an integrated GPU, such as Intel, though performance is not guaranteed on those platforms. It will not work on OpenCL platforms based on discrete cards, such as Nvidia or AMD graphics cards. |
Hi, @ittiamvpx. You have issues at Could you please tell the current state of Thanks! |
Hi Kagami, The GPU acceleration of the VP9 encoder in the libvpx-1 repository targets real-time encoding presets only, and in particular specific cpu speeds. The workspace is under development, but the package as is was tested on integrated GPUs (Mali and Intel HD Graphics) for quality and performance and is stable. We did not test on discrete graphics cards, but we believe we did nothing in particular that limits its usage to integrated GPUs. As of now we do not have a roadmap for supporting discrete cards. Thanks |
Hi, @ram-mohan. Thanks for the answer.
I built the most recent commit of
Trace:
I have Nvidia GTX 970 with proprietary drivers. I also built version without multithreading and it segfaults inside
Ok, I understand. I may provide additional debug info of my configuration/built if needed though. Regards. |
Looking at the failure, it seems that the application you are running is unable to open the kernel files for compilation. In the file "vp9_eopencl.c" there is a macro called PREFIX_PATH. This path is used to locate the OpenCL kernel files. Try modifying this relative path so that the *.cl files can be opened, and check whether the build-kernel calls made in the function vp9_eopencl_init() are successful. We recommend the following configuration for encoding: "./vpxenc --target-bitrate=1000 --ivf --rt --cpu-used=-6 --end-usage=cbr --undershoot-pct=50 --overshoot-pct=50 --buf-sz=1000 --buf-initial-sz=500 --buf-optimal-sz=600 --max-intra-rate=300 --limit=1000 --profile=0 --lag-in-frames=0 --min-q=2 --max-q=52 --passes=1 --kf-max-dist=99999 --kf-min-dist=0 --drop-frame=0 --static-thresh=0 --sharpness=0 --error-resilient=1 --codec=vp9 --gf-cbr-boost=200 --frame-parallel=0 --aq-mode=3 /home/testclips/gipsrestat720p.y4m --threads=1 -o out.ivf" |
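One way to check a candidate PREFIX_PATH before rebuilding is to test whether a kernel file can be opened relative to the directory the encoder is run from. The sketch below is a minimal standalone helper, not code from the repository: the function name `kernel_file_readable` is hypothetical, and the kernel file name you pass in is whatever *.cl file exists in your checkout.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of libvpx-1): returns 1 if the file
 * formed by concatenating prefix and name can be opened for reading
 * from the current working directory, and 0 otherwise. */
static int kernel_file_readable(const char *prefix, const char *name) {
  char path[1024];
  FILE *f;
  int n;

  assert(prefix != NULL && name != NULL);

  /* Build "prefix + name" the same way a PREFIX_PATH macro would be
   * prepended to a kernel file name. */
  n = snprintf(path, sizeof(path), "%s%s", prefix, name);
  if (n < 0 || (size_t)n >= sizeof(path)) return 0; /* path too long */

  f = fopen(path, "rb");
  if (f == NULL) return 0; /* wrong prefix or wrong working directory */
  fclose(f);
  return 1;
}
```

Calling it with each candidate prefix (for example "./" and "../../vp9/encoder/opencl/") from the directory where vpxenc is started shows which relative path actually resolves. A relative prefix depends on the working directory at run time, not on where the binary was built.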
Thanks for your help! With that change:
diff --git a/vp9/encoder/opencl/vp9_eopencl.c b/vp9/encoder/opencl/vp9_eopencl.c
index 8e3fabf..f560155 100644
--- a/vp9/encoder/opencl/vp9_eopencl.c
+++ b/vp9/encoder/opencl/vp9_eopencl.c
@@ -17,7 +17,7 @@
#if ARCH_ARM
#define PREFIX_PATH "./"
#else
-#define PREFIX_PATH "../../vp9/encoder/opencl/"
+#define PREFIX_PATH "./vp9/encoder/opencl/"
#endif
static const int pixel_rows_per_workitem_log2_pro_me = 4;
I was able to successfully encode 1 frame of video with
Nice, thanks. It doesn't seem to include |
Yeah, I noticed the --gpu flag is missing. Sorry about that. There seems to be an assertion failure in the function "vp9_enc_sync_gpu" (file: vp9_egpu.c, line 395). Can you share the Lvalue and Rvalue of the comparison being made? |
|
For 720p content, an Rvalue of 80 is as expected, but I am unable to make much of the Lvalue. Can you please share sizeof(GPU_OUTPUT_PRO_ME) on your platform and the actual difference 'gpu_output_buffer - cpi->gpu_output_pro_me_base' you are seeing? The memory needed for the GPU interface buffers is allocated in vp9_eopencl_alloc_buffers(). Lines 431-465 allocate the part of the GPU output buffers that is under consideration here. Looking at the buffer/sub-buffer creation and the corresponding CPU-side map pointers is the key to solving this issue. As of now I do not have a setup similar to yours to reproduce this issue. Once I get hold of one, I will look into it. Thanks, |
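For reading the two values requested above: in C, subtracting two pointers of the same type gives a count of elements, not bytes, so the asserted difference is measured in units of sizeof(GPU_OUTPUT_PRO_ME). The sketch below illustrates this with a stand-in struct; `Entry` and both helper names are hypothetical and are not taken from the repository.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GPU_OUTPUT_PRO_ME; the real layout differs. */
typedef struct {
  int a;
  int b;
  int c;
} Entry;

/* Typed pointer subtraction: the result is a number of Entry elements. */
static ptrdiff_t diff_in_elements(const Entry *p, const Entry *base) {
  return p - base;
}

/* The same distance measured in bytes, via char pointers. */
static ptrdiff_t diff_in_bytes(const Entry *p, const Entry *base) {
  return (const char *)p - (const char *)base;
}
```

Both forms are only defined when the two pointers refer to the same array or allocation. When printing such a difference, `%td` is the printf conversion that matches `ptrdiff_t`.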
I added debug prints near this line:
diff --git a/vp9/encoder/vp9_egpu.c b/vp9/encoder/vp9_egpu.c
index cb0e945..4610c75 100644
--- a/vp9/encoder/vp9_egpu.c
+++ b/vp9/encoder/vp9_egpu.c
@@ -390,8 +390,20 @@ void vp9_enc_sync_gpu(VP9_COMP *cpi, ThreadData *td, int mi_row, int mi_row_step
const int size = cm->sb_cols * sb_row;
(void) size;
+ printf("BEFORE p1=%p p2=%p diff=%ld size=%d sizeof=%zu\n",
+ gpu_output_buffer,
+ cpi->gpu_output_pro_me_base,
+ (gpu_output_buffer - cpi->gpu_output_pro_me_base),
+ size,
+ sizeof(GPU_OUTPUT_PRO_ME));
egpu->acquire_output_pro_me_buffer(cpi, (void **) &gpu_output_buffer,
subframe_idx);
+ printf("AFTER p1=%p p2=%p diff=%ld size=%d sizeof=%zu\n",
+ gpu_output_buffer,
+ cpi->gpu_output_pro_me_base,
+ (gpu_output_buffer - cpi->gpu_output_pro_me_base),
+ size,
+ sizeof(GPU_OUTPUT_PRO_ME));
assert(gpu_output_buffer - cpi->gpu_output_pro_me_base == size);
}
if (mi_row - mi_row_step == subframe.mi_row_start &&
Output:
Seems like pointers are correct before
I'll try to look into it, thanks. |
Hi @ram-mohan, I got the same assertion error in vp9_enc_sync_gpu(). My understanding is that vp9_opencl_map_buffer() doesn't produce contiguous host addresses for the pointers mapped from different sub-frames. The reason might be clEnqueueMapBuffer() itself, or other host memory allocations happening between the two map calls. Is there any particular reason to treat those pointers as contiguous? Thanks, |
Hi mingtotti, Yes, we were able to reproduce this issue. As you pointed out, the host pointers for the different sub-buffers were not contiguous. We made that assumption out of general intuition, and it seems it is not valid according to the OpenCL specification. We have made the necessary changes on our side and will push them soon. Thanks |
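A common way to avoid relying on contiguous mapped pointers is to keep one host pointer per sub-buffer and resolve a global entry index through that table instead of computing base + offset. The sketch below shows that idea in plain C; it is not the fix that was made in the repository. `OutputEntry` stands in for GPU_OUTPUT_PRO_ME, and `OutputMap`, `MAX_SUBFRAMES`, and `output_entry_at` are hypothetical names. In real code each `mapped[i]` would hold the pointer returned by the map call for sub-buffer i.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SUBFRAMES 8 /* hypothetical upper bound */

/* Hypothetical stand-in for GPU_OUTPUT_PRO_ME. */
typedef struct {
  int value;
} OutputEntry;

/* One independently mapped host pointer per sub-buffer. */
typedef struct {
  OutputEntry *mapped[MAX_SUBFRAMES]; /* host pointer of each sub-buffer */
  size_t first_index[MAX_SUBFRAMES];  /* global index of its first entry */
  size_t count[MAX_SUBFRAMES];        /* number of entries it holds */
  int num_subframes;
} OutputMap;

/* Resolve a global entry index through the sub-buffer that owns it.
 * No relationship between the mapped[] pointers is assumed.
 * Returns NULL if the index falls outside every sub-buffer. */
static OutputEntry *output_entry_at(const OutputMap *m, size_t global_index) {
  int i;
  for (i = 0; i < m->num_subframes; ++i) {
    size_t start = m->first_index[i];
    if (global_index >= start && global_index < start + m->count[i]) {
      return m->mapped[i] + (global_index - start);
    }
  }
  return NULL;
}
```

With this layout, the per-subframe check becomes "does the pointer belong to sub-buffer i at the expected local offset" rather than a comparison against a single base pointer, which stays valid wherever the runtime places each mapping.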
Hi Ram, That would be great! Thanks, |
Hello, I know it's been over a year, but I am trying to encode VP9 with GPU acceleration, so I tried libvpx-1. I could compile it and start encoding with this: It ran through the first process, but then got stuck, as you see here: By the way, sometimes it stops later (but I added --best): When I use VP8 it's different, but slow: Does anyone have any idea how to help me? Thanks for your help. |
For me the latest version also fails with sigsegv:
I use Intel HD Graphics and latest revision of Beignet as OpenCL library. |
I also tried to compile with nvidia-opencl provided in Ubuntu 18.04:
Unfortunately I get some errors on linking phase:
Am I doing something wrong or is it an issue of the ittiamvpx/libvpx-1? |
Based on the errors, it seems that the OpenCL library is not being linked properly. That said, this library is not expected to work on devices with external graphics cards. We noticed a few issues when we tried it on them, and those were never fixed. This library was tested on Mali platforms. |
Thanks!
So it is likely that Ubuntu 18.04 provides a bogus opencl library for nvidia?
You say you tried them. Were you able to successfully encode anything at all? |
"So it is likely that Ubuntu 18.04 provides a bogus opencl library for nvidia?" It was tested on Mali platforms only. On Nvidia, even if the build is successful, I assume you will see some crashes. Further, the GPU acceleration done here targets very low-end CPUs/GPUs, like those in mobile phones. As you are running Ubuntu 18.04, you presumably have a high-end CPU, so the GPU work here may not be of any use to you. |
I'm interested in parallelization of VP9 encoding. I have a high end CPU which does a good job, but I want to encode more video streams on the same machine. So the idea is to use a GPU in addition to CPU. CPU will encode VP9 streams and GPU will encode some other VP9 streams at the same time. If GPU is able to handle at least one 1080p30 VP9 encoding then it fits me. |
This workspace doesn't fit your requirements. |
What do you mean by workspace? |
I mean this project. |
Why? Are you sure that a GPU won't be able to handle 1080p30 VP9 encoding? I understand there are issues with using it on discrete GPUs, but in theory my colleagues or I can fix these. If what I want to try is possible at all, then this project is a good place to start, isn't it? |
If I understand you correctly, you want the GPU to encode an entire VP9 bitstream on its own. GPUs are not designed to do that. A GPU is not a parallel CPU. GPUs are co-processors that do some tasks better than a CPU and some other tasks worse. In VP9 encoding, we identify algorithms that perform well on a GPU compared with a CPU, and these are offloaded to the GPU. The CPU does the actual encoding, but a few portions of the encoding process are moved to the GPU because it handles them better. This way we get some performance gain. In this project, we moved one or two modules of the real-time encoding preset to the GPU and saw some gains on Mali platforms. These gains are not universal. They may not be seen on other GPUs. I am not certain this is a good place to start. All I can say is that GPU acceleration is really tricky. Good luck. |
Oh, I see now. Thanks for the explanation! So in theory a discrete GPU can take over some part of the encoding process from an Intel CPU, but it will not necessarily bring an improvement, because the actual bottleneck can be different from the one in your Mali case. So first of all I have to find out whether there is a bottleneck on a recent Intel CPU that can be resolved by offloading to the GPU. If there is not, it will not work. If there is, then I need to do work similar to what you did for libvpx, but for my case.
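As a rough way to reason about whether offloading is worth it, Amdahl's law bounds the overall gain when only a fraction of the encode time moves to the GPU. The sketch below is an illustration only: the function name `offload_speedup` and the numbers in the note that follows are hypothetical, not measurements from this project, and the formula ignores host-device transfer and synchronization cost, which reduce the real gain.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction of the run time
 * (offload_fraction, in [0, 1]) becomes kernel_speedup times faster
 * and the rest stays on the CPU unchanged.
 * Transfer and synchronization overhead are deliberately ignored. */
static double offload_speedup(double offload_fraction, double kernel_speedup) {
  assert(offload_fraction >= 0.0 && offload_fraction <= 1.0);
  assert(kernel_speedup > 0.0);
  return 1.0 / ((1.0 - offload_fraction) + offload_fraction / kernel_speedup);
}
```

For example, if 30% of the encode time were offloaded and ran 5 times faster on the GPU, the overall speedup would be 1 / (0.7 + 0.06), about 1.32x. Even with an infinitely fast GPU, offloading half of the work could never exceed 2x, which is why profiling the CPU-side bottleneck comes first.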
Hi, is it possible to use your OpenCL-based VP9 encoder in ffmpeg on a regular x86-64 Linux install?
I tried Fedora with Nvidia OpenCL on a 960 (Maxwell 2) GPU. I was able to install and test OpenCL, but encoding VP9 with your libvpx version compiled into ffmpeg gave errors and crashed.
Sorry for asking, I am not enough of a coder to debug this on my own! :)