Julia fractal generator for Rust.

The project started while adding huge image support to Blender's image engine. During development it wasn't easy to generate or download huge images that were also interesting to look at.

A second goal of the project is to practice writing vectorized code (AVX/AVX2) in assembly. Rust is mostly used for IO and the outer loops. There are also some native Rust kernels for timing comparison.
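For orientation, a native scalar kernel could look roughly like the sketch below. It is not taken from this repository: the function name `julia_iterations`, its signature, and the escape radius of 2 are assumptions chosen for illustration. It iterates z = z^2 + c for one point and returns the number of iterations before the point escapes.

```rust
/// Hypothetical scalar kernel (illustrative name and signature).
/// Iterates z = z^2 + c and returns how many steps were taken before
/// |z| exceeded 2, capped at `max_iter`.
pub fn julia_iterations(mut zr: f64, mut zi: f64, cr: f64, ci: f64, max_iter: u32) -> u32 {
    let mut i = 0;
    // Compare |z|^2 against 4 to avoid a square root.
    while i < max_iter && zr * zr + zi * zi <= 4.0 {
        let next_r = zr * zr - zi * zi + cr; // real part of z^2 + c
        zi = 2.0 * zr * zi + ci;             // imaginary part of z^2 + c
        zr = next_r;
        i += 1;
    }
    i
}
```

A packed kernel would run the same loop for several points at once, one point per register lane.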

Benchmark results on an Intel(R) Core(TM) i7-8550U CPU (slowest on top):

test benchmark::bench_native_f64         ... bench:  26,907,733 ns/iter (+/- 1,039,052)
test benchmark::bench_cpu_f64            ... bench:  26,821,107 ns/iter (+/- 971,015)
test benchmark::bench_cpu_f32            ... bench:  22,525,174 ns/iter (+/- 961,333)
test benchmark::bench_native_f32         ... bench:  22,246,048 ns/iter (+/- 1,016,533)
test benchmark::bench_asm_xmm_f64_scalar ... bench:  21,793,094 ns/iter (+/- 929,025)
test benchmark::bench_asm_xmm_f32_scalar ... bench:  21,760,044 ns/iter (+/- 1,005,370)
test benchmark::bench_asm_xmm_f64_packed ... bench:  12,938,314 ns/iter (+/- 685,412)
test benchmark::bench_asm_ymm_f64_packed ... bench:   7,932,974 ns/iter (+/- 401,188)
test benchmark::bench_asm_xmm_f32_packed ... bench:   7,837,280 ns/iter (+/- 429,013)
test benchmark::bench_asm_ymm_f32_packed ... bench:   4,821,649 ns/iter (+/- 338,150)

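Remarkably, double precision is slower than single precision in the Rust kernels. This could be due to bad vectorization; the generated assembly still needs to be inspected. Note that SSE/AVX units compute natively in both f32 and f64. A register of a given width simply holds twice as many f32 lanes as f64 lanes, which matches the packed f32 kernels being faster than the packed f64 ones.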

The best kernel (performance versus precision) is julia_ymm_f64_packed. Each packed kernel only supports item counts that are a multiple of its packing factor, i.e. the number of scalars that fit in one register:

xmm_f32 = 4
xmm_f64 = 2
ymm_f32 = 8
ymm_f64 = 4
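The packing factors above follow from the register width (128 bits for xmm, 256 bits for ymm) divided by the scalar width. A caller could round its item count up to the next multiple before invoking a packed kernel. The sketch below shows that arithmetic; `packing` and `padded_len` are hypothetical helper names, not functions from this repository.

```rust
/// Number of scalars per register, e.g. 256-bit ymm / 64-bit f64 = 4.
pub fn packing(register_bits: usize, scalar_bits: usize) -> usize {
    register_bits / scalar_bits
}

/// Rounds `items` up to the next multiple of `packing` so that a packed
/// kernel always processes full registers. `packing` must be non-zero.
pub fn padded_len(items: usize, packing: usize) -> usize {
    (items + packing - 1) / packing * packing
}
```

For example, `padded_len(10, packing(256, 64))` gives 12, so a buffer for 10 pixels would need room for 2 extra items when using a ymm_f64 kernel.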

The kernels are optimized for readability; their performance can still be improved.

Note: This has been developed on Linux and hasn't been tested on other operating systems. Other operating systems and linkers require different stack management.

Note: This will only compile and run on x86 processors with AVX2. There is no check whether your CPU is supported.
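Since the project performs no CPU check, a caller may want to add one before dispatching to the AVX2 kernels. A minimal sketch using the standard library's `is_x86_feature_detected!` macro follows; the function names `avx2_available` and `require_avx2` are assumptions for illustration and do not exist in this repository.

```rust
/// Reports whether the running CPU supports AVX2 (x86/x86_64 builds).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn avx2_available() -> bool {
    std::is_x86_feature_detected!("avx2")
}

/// On other architectures the AVX2 kernels can never run.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn avx2_available() -> bool {
    false
}

/// Hypothetical guard to call once at startup, before any assembly kernel.
pub fn require_avx2() -> Result<(), String> {
    if avx2_available() {
        Ok(())
    } else {
        Err("this CPU does not report AVX2 support".to_string())
    }
}
```

This only guards execution at run time; the assembly sources would still need an x86 toolchain to build.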