Remove redundant instructions for checking for-loop-exit-condition #22

nomaddo · 2017-12-21T08:21:45Z

The for-loop creates redundant instructions in the instructions of checking exit condition.

ex)

kernel void loop1 (global float a[], global float b[]) {
  for (int i = 0; i < 1000; i++) {
    a[i] = -i;
    b[i] = -i;
  }
}

// Module with 1 kernels, global data with 0 words (64-bit each), starting at offset 1 words and 0 words of stack-frame
// Kernel 'loop1' with 67 instructions, offset 2, with following parameters: __global out float* a (4 B, 1 items), __global out float* b (4 B, 1 items)
or -, unif, unif
or -, unif, unif
or -, unif, unif
or -, unif, unif
or -, unif, unif
or -, unif, unif
or -, unif, unif
or -, unif, unif
or -, unif, unif
or -, unif, unif
or -, unif, unif
or -, unif, unif
or -, unif, unif
or ra0, unif, unif
or r2, unif, unif
or r1, 0 (0), 0 (0)
nop.never 
or r0, r1, r1
sub r0, 0 (0), r0
itof r3, r0; v8min r0, r1, r1
shl r0, r0, 2 (2)
add r0, ra0, r0
or -, mutex_acq, mutex_acq
ldi vpw_setup, 6656
or vpm, r3, r3
ldi vpw_setup, 2155954176
ldi vpw_setup, 3221291008
or vpw_addr, r0, r0
or r0, r1, r1
or r1, r1, r1
shl r0, r0, 2 (2)
add r1, r1, 1 (1)
add r0, r2, r0
ldi ra1, 1000
or -, vpw_wait, vpw_wait
or mutex_rel, 1 (1), 1 (1)
or -, mutex_acq, mutex_acq
ldi vpw_setup, 6656
or vpm, r3, r3
ldi vpw_setup, 2155954176
ldi vpw_setup, 3221291008
or vpw_addr, r0, r0
or -, vpw_wait, vpw_wait
or mutex_rel, 1 (1), 1 (1)
xor.setf -, r1, ra1                                             /// redundant ?
xor.ifzc r0, 1 (1), 1 (1); v8min.ifz r0, 1 (1), 1 (1)           /// 
or.setf -, r0, r0                                               ///
or.ifz r1, r1, r1                                               ///
or.setf -, elem_num, r0                                         ///
brr.ifallzc (pc+4) + 4
nop.never 
nop.never 
nop.never 
brr.ifanyz (pc+4) + -40
nop.never 
nop.never 
nop.never 
or r0, unif, unif
or.setf -, elem_num, r0
brr.ifallzc (pc+4) + -63
nop.never 
nop.never 
nop.never 
not irq, qpu_num
nop.thrend.never 
nop.never 
nop.never

To me, above instructions (specifying ///) can be expressed as isub.setf -, r1, ra1.
Then, if the condition is negative, jump to the head of loop. Otherwise go through.

The text was updated successfully, but these errors were encountered:

doe300 · 2017-12-21T11:30:40Z

xor.setf -, r1, ra1 /// redundant ?
xor.ifzc r0, 1 (1), 1 (1); v8min.ifz r0, 1 (1), 1 (1) ///

These two lines calculate the condition (bool r0 = i == 1000). In this particular case, we don't really need the value of the condition stored in any register, the correct flags set would be enough.

or.setf -, r0, r0 ///
or.ifz r1, r1, r1 ///

These two lines are generated by the phi-node for i (something like setting the i of the next iteration to the i of the previous one). or.ifz r1, r1, r1 is of course redundant, with or without using flags, which also makes the previous instruction or.setf -, r0, r0 unnecessary.

or.setf -, elem_num, r0 ///

This instruction is required by the subsequent conditional branches to set the flags correctly. This could be removed, if the last instruction before setting flags already sets the flags for the same source value (and for all SIMD elements!!), which is the case here.

To sum it up:
You are right, with additional optimizations, this could be reduced to something like:

xor.setf -, r1, ra1
xor.ifzc.setf -, 1 (1), 1 (1); v8min.ifz.setf -, 1 (1), 1 (1)
brr.ifallzc (pc+4) + 4
...

nomaddo · 2017-12-21T11:48:21Z

The output of clang is as follows:

; Function Attrs: nounwind
define spir_kernel void @loop1(float addrspace(1)* nocapture %a, float addrspace(1)* nocapture %b) #1 {
entry:
  br label %for.body

for.body:                                         ; preds = %for.body, %entry
  %i.09 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
  %sub = sub nsw i32 0, %i.09
  %conv = sitofp i32 %sub to float
  %arrayidx = getelementptr inbounds float addrspace(1)* %a, i32 %i.09
  store float %conv, float addrspace(1)* %arrayidx, align 4, !tbaa !10
  %arrayidx3 = getelementptr inbounds float addrspace(1)* %b, i32 %i.09
  store float %conv, float addrspace(1)* %arrayidx3, align 4, !tbaa !10
  %inc = add nuw nsw i32 %i.09, 1
  %exitcond = icmp eq i32 %inc, 1000
  br i1 %exitcond, label %for.end, label %for.body

for.end:                                          ; preds = %for.body
  ret void
}

One to one translation is a little bit hard to do such optimization.
But if we can handle icmp eq ... and br ... in the same time, better optimization can be implemented in translation phase to vc4.

We have two choice:

Optimize it as peephole
Optimize it through translation

Peephole optimization might be general (it can be used for other bit-operation optimization), but latter is easier.

nomaddo · 2017-12-21T12:19:15Z

One more thing: to jump neightbor basic_block, this cod output jump instruction.
brr.ifallzc (pc+4) + 4 can be deleted in this case.

doe300 · 2017-12-21T12:24:30Z

You are right, with additional optimizations, this could be reduced to something like:

xor.setf -, r1, ra1
xor.ifzc.setf -, 1 (1), 1 (1); v8min.ifz.setf -, 1 (1), 1 (1)
brr.ifallzc (pc+4) + 4
...

Scratch that. According to here, we cannot set flags on both ALUs depending on the condition of their execution, thus xor.ifzc.setf -, 1 (1), 1 (1); v8min.ifz.setf -, 1 (1), 1 (1) won't work. Also the or.ifz r1, r1, r1 is original a i32 %18 = i32 %17 and therefore the same issue as discussed in #14.

nomaddo · 2018-01-18T08:50:18Z

@doe300 you already done it? If so, let's close the issue.

doe300 · 2018-01-18T08:57:38Z

No, I haven't found a fitting optimization yet.

nomaddo added the enhancement label Dec 21, 2017

nomaddo changed the title ~~Remov redundant instructions for checking for-loop-exit-condition~~ Remove redundant instructions for checking for-loop-exit-condition Dec 21, 2017

doe300 self-assigned this Dec 21, 2017

doe300 added a commit that referenced this issue Dec 21, 2017

Now also eliminates useless parts of either-or-branches (see #22)

0d9ef10

doe300 removed their assignment Jan 5, 2018

doe300 added the optimization related to an optimization step label Apr 7, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove redundant instructions for checking for-loop-exit-condition #22

Remove redundant instructions for checking for-loop-exit-condition #22

nomaddo commented Dec 21, 2017 •

edited

Loading

doe300 commented Dec 21, 2017 •

edited

Loading

nomaddo commented Dec 21, 2017

nomaddo commented Dec 21, 2017 •

edited

Loading

doe300 commented Dec 21, 2017

nomaddo commented Jan 18, 2018

doe300 commented Jan 18, 2018 •

edited

Loading

Remove redundant instructions for checking for-loop-exit-condition #22

Remove redundant instructions for checking for-loop-exit-condition #22

Comments

nomaddo commented Dec 21, 2017 • edited Loading

doe300 commented Dec 21, 2017 • edited Loading

nomaddo commented Dec 21, 2017

nomaddo commented Dec 21, 2017 • edited Loading

doe300 commented Dec 21, 2017

nomaddo commented Jan 18, 2018

doe300 commented Jan 18, 2018 • edited Loading

nomaddo commented Dec 21, 2017 •

edited

Loading

doe300 commented Dec 21, 2017 •

edited

Loading

nomaddo commented Dec 21, 2017 •

edited

Loading

doe300 commented Jan 18, 2018 •

edited

Loading