Running Tensorflow model via TVM on OpenCL #143

Open
marcin-sielski opened this issue Apr 20, 2020 · 6 comments

Comments

@marcin-sielski

I am experiencing the following error on a Raspberry Pi while trying to run a TensorFlow model via TVM on OpenCL:
terminate called after throwing an instance of 'vc4c::CompilationError'
what(): Normalizer: Invalid local type for memory area: (p) i32* %tmp.3262

@doe300
Owner

doe300 commented Apr 21, 2020

Can you give me more details about what you were trying to do?

  • What tensorflow sources did you build?
  • What command did you run?
  • If possible, please post any log output preceding that error.
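
One way to collect such a log, as a minimal sketch: the failure shows up as an uncaught C++ exception that terminates the whole process, so running the failing script in a child process keeps everything it printed before the abort. The helper name `run_and_capture` and the file names in the usage comment are hypothetical; nothing here is part of TVM or VC4CL.

```python
import subprocess
import sys


def run_and_capture(cmd, log_path):
    """Run cmd, save its combined stdout/stderr to log_path, return (returncode, output)."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so both land in one log
        text=True,
    )
    with open(log_path, "w") as log_file:
        log_file.write(result.stdout)
    return result.returncode, result.stdout


# Usage (hypothetical script and log names):
# code, output = run_and_capture([sys.executable, "run_model.py"], "vc4cl_run.log")
```

A non-zero or negative return code together with the saved file gives the full output leading up to the `vc4c::CompilationError`.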

@marcin-sielski
Author

marcin-sielski commented Apr 21, 2020

Hi @doe300,

My apologies, but I am new to OpenCL and the whole TVM stack. I will try to explain in more detail what I did.
I have compiled VC4C, VC4CL, VC4CLStdLib and TVM (full package), installed them on the Raspberry Pi together with Tensorflow, and tried to follow the tutorial at:
https://opensourceforu.com/2019/06/the-capabilities-of-tensor-virtual-machine-an-open-deep-learning-compiler-stack/
The tutorial has some drawbacks but I was able to walk through most of the example:

  1. I have imported mobilenet and compiled it into the relay representation for the "opencl" target (the example uses llvm) - Step 3
  2. I have saved the compiler output - Step 4
  3. I have successfully executed most of Step 5 (using tvm.cl() instead of tvm.cpu(0))
    While trying to execute the model:
# Execute the Model
module.run()
# Get the first output
out = module.get_output(0)

I get

terminate called after throwing an instance of 'vc4c::CompilationError'
  what():  Normalizer: Invalid local type for memory area: (p) i32* %tmp.326

Thank you for help.

Best Regards

Marcin Sielski

PS. I managed to execute entire example on Intel/beignet successfully.

@marcin-sielski
Author

Hi @doe300,

Do you have any ideas how to troubleshoot the issue?

I wonder what the message 'Invalid local type for memory area: (p) i32* %tmp.326' means.

Thank you for help in advance.

Best Regards

Marcin Sielski

@doe300
Owner

doe300 commented May 23, 2020

The error means that the OpenCL kernel accesses memory in a way that is not yet supported by the VC4C compiler.

Without knowing the actual kernel code, there is not much that can be done here. If you compile the VC4CL host-side stack in debug mode (with the CMake variable BUILD_DEBUG set), the kernel code that fails to compile will be dumped into a file (look for lines in the standard output like "Dumping program sources to (tmp/vc4c-source-xxx.cl"), which could then be used to analyze the problem.
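
To locate the dumped file in a long log, a small filter along these lines might help. This is only a sketch: the regular expression is based on the example line quoted above and is an assumption about the message format, which may differ between VC4CL versions. `find_dumped_sources` is a hypothetical helper name.

```python
import re

# Assumed log format, modeled on the example message quoted above;
# the real wording may vary between VC4CL versions.
DUMP_LINE = re.compile(r"Dumping program sources to\s+(\S+\.cl)")


def find_dumped_sources(log_text):
    """Return the dumped kernel source paths in order of appearance."""
    return [match.group(1) for match in DUMP_LINE.finditer(log_text)]
```

Passing the captured standard output of the failing run to `find_dumped_sources` should return the paths of the `.cl` files to attach to the issue.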

@marcin-sielski
Author

@doe300
I finally managed to generate the kernel code. I am not sure where to look:

vc4cl-source-1804289383.cl.tar.gz

Thank you for help in advance.

Best Regards

Marcin Sielski

@doe300
Owner

doe300 commented Jun 28, 2020

Thanks for the source code.

I do not get the error reported above, but I do get some register-association errors. I will have a look into them.

doe300 added a commit that referenced this issue Jun 29, 2020
This fixup-step will try to group all scalar (including pointer) locals
into group vectors to reduce the register pressure.

Also removes the obsolete register-resolver round maximum.

Effects:
* fixes register association error for #143
* drastically increases number of instructions where applied
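
As a rough illustration of the grouping idea described in this commit message (a toy model, not VC4C's actual implementation): each scalar local is assigned one lane of a shared vector, so many scalars occupy a single register. The width of 16 reflects the 16-way SIMD registers of the VideoCore IV QPU; the function name and the simple first-come lane assignment are assumptions made for the example.

```python
VECTOR_WIDTH = 16  # lanes per vector register (assumption, see text above)


def group_scalars(local_names, width=VECTOR_WIDTH):
    """Map each scalar local to a (group index, lane index) slot."""
    return {
        name: divmod(position, width)
        for position, name in enumerate(local_names)
    }


# 40 hypothetical scalar locals, named in the style of the error message
scalar_locals = [f"%tmp.{i}" for i in range(40)]
slots = group_scalars(scalar_locals)
registers_needed = max(group for group, _ in slots.values()) + 1
print(registers_needed)  # 3 grouped vectors instead of 40 separate registers
```

In such a scheme every read or write of a grouped scalar has to extract or insert a single lane, which would be consistent with the increased instruction count noted in the commit message.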