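1. SIMD and RUSTFLAGS
SIMD (Single Instruction, Multiple Data) support is closely tied to the hardware. In the compiler source tree, rust-lang/rust/src/librustc_target provides the corresponding support for the different hardware platforms and operating systems.
To make the Rust compiler use a specific instruction set on a given platform, set the RUSTFLAGS environment variable so that the compiler generates code using that platform's instructions.
Set RUSTFLAGS="-C target-cpu=xxx" to select a CPU, or RUSTFLAGS="-C target-feature=+xxx" to select an instruction set.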
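One way to see what a given RUSTFLAGS setting changes is to inspect the compile-time target_feature configuration from inside a program. The following is a minimal sketch: the function name compile_time_features and the four features it checks are illustrative choices, not part of the original text.

```rust
/// Returns the names of a few x86 SIMD features that were enabled
/// at compile time (for example through RUSTFLAGS).
/// Hypothetical helper for illustration only.
fn compile_time_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    // cfg!(target_feature = "...") is evaluated by the compiler,
    // so it reflects the flags used for this build, not the host CPU.
    if cfg!(target_feature = "sse2") {
        enabled.push("sse2");
    }
    if cfg!(target_feature = "avx") {
        enabled.push("avx");
    }
    if cfg!(target_feature = "avx2") {
        enabled.push("avx2");
    }
    if cfg!(target_feature = "fma") {
        enabled.push("fma");
    }
    enabled
}

fn main() {
    println!("compile-time features: {:?}", compile_time_features());
}
```

Building with, for example, RUSTFLAGS="-C target-feature=+avx2" on an x86_64 target should make avx2 appear in the printed list, while a default build typically does not include it.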
2. Viewing the SIMD support currently available in Rust
2.1 Viewing the platforms (targets) supported by Rust
rustc --print target-list
The entries correspond to the *.rs files in rust-lang/rust/src/librustc_target/spec.
aarch64-fuchsia aarch64-linux-android aarch64-pc-windows-msvc aarch64-unknown-cloudabi aarch64-unknown-freebsd aarch64-unknown-hermit aarch64-unknown-linux-gnu aarch64-unknown-linux-musl aarch64-unknown-netbsd aarch64-unknown-none aarch64-unknown-openbsd arm-linux-androideabi arm-unknown-linux-gnueabi arm-unknown-linux-gnueabihf arm-unknown-linux-musleabi arm-unknown-linux-musleabihf armebv7r-none-eabi armebv7r-none-eabihf armv4t-unknown-linux-gnueabi armv5te-unknown-linux-gnueabi armv5te-unknown-linux-musleabi armv6-unknown-freebsd armv6-unknown-netbsd-eabihf armv7-linux-androideabi armv7-unknown-cloudabi-eabihf armv7-unknown-freebsd armv7-unknown-linux-gnueabihf armv7-unknown-linux-musleabihf armv7-unknown-netbsd-eabihf armv7r-none-eabi armv7r-none-eabihf asmjs-unknown-emscripten i586-pc-windows-msvc i586-unknown-linux-gnu i586-unknown-linux-musl i686-apple-darwin i686-linux-android i686-pc-windows-gnu i686-pc-windows-msvc i686-unknown-cloudabi i686-unknown-dragonfly i686-unknown-freebsd i686-unknown-haiku i686-unknown-linux-gnu i686-unknown-linux-musl i686-unknown-netbsd i686-unknown-openbsd mips-unknown-linux-gnu mips-unknown-linux-musl mips-unknown-linux-uclibc mips64-unknown-linux-gnuabi64 mips64el-unknown-linux-gnuabi64 mipsel-unknown-linux-gnu mipsel-unknown-linux-musl mipsel-unknown-linux-uclibc mipsisa32r6-unknown-linux-gnu mipsisa32r6el-unknown-linux-gnu mipsisa64r6-unknown-linux-gnuabi64 mipsisa64r6el-unknown-linux-gnuabi64 msp430-none-elf nvptx64-nvidia-cuda powerpc-unknown-linux-gnu powerpc-unknown-linux-gnuspe powerpc-unknown-linux-musl powerpc-unknown-netbsd powerpc64-unknown-freebsd powerpc64-unknown-linux-gnu powerpc64-unknown-linux-musl powerpc64le-unknown-linux-gnu powerpc64le-unknown-linux-musl riscv32imac-unknown-none-elf riscv32imc-unknown-none-elf riscv64gc-unknown-none-elf riscv64imac-unknown-none-elf s390x-unknown-linux-gnu sparc-unknown-linux-gnu sparc64-unknown-linux-gnu sparc64-unknown-netbsd sparcv9-sun-solaris thumbv6m-none-eabi 
thumbv7a-pc-windows-msvc thumbv7em-none-eabi thumbv7em-none-eabihf thumbv7m-none-eabi thumbv7neon-linux-androideabi thumbv7neon-unknown-linux-gnueabihf thumbv8m.base-none-eabi thumbv8m.main-none-eabi thumbv8m.main-none-eabihf wasm32-experimental-emscripten wasm32-unknown-emscripten wasm32-unknown-unknown wasm32-unknown-wasi x86_64-apple-darwin x86_64-fortanix-unknown-sgx x86_64-fuchsia x86_64-linux-android x86_64-pc-windows-gnu x86_64-pc-windows-msvc x86_64-rumprun-netbsd x86_64-sun-solaris x86_64-unknown-bitrig x86_64-unknown-cloudabi x86_64-unknown-dragonfly x86_64-unknown-freebsd x86_64-unknown-haiku x86_64-unknown-hermit x86_64-unknown-l4re-uclibc x86_64-unknown-linux-gnu x86_64-unknown-linux-gnux32 x86_64-unknown-linux-musl x86_64-unknown-netbsd x86_64-unknown-openbsd x86_64-unknown-redox x86_64-unknown-uefi
2.2 Viewing the features (instruction sets) supported for a target
# uname -a    // check the current system platform
Linux zyd-VirtualBox 4.15.0-58-generic #64~16.04.1-Ubuntu SMP Wed Aug 7 14:10:35 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
# rustc --target=x86_64-unknown-linux-gnu --print target-features
Available features for this target:
16bit-mode - 16-bit mode (i8086).
32bit-mode - 32-bit mode (80386).
3dnow - Enable 3DNow! instructions.
3dnowa - Enable 3DNow! Athlon instructions.
64bit - Support 64-bit instructions.
64bit-mode - 64-bit mode (x86_64).
adx - Support ADX instructions.
aes - Enable AES instructions.
atom - Intel Atom processors.
avx - Enable AVX instructions.
avx2 - Enable AVX2 instructions.
avx512bitalg - Enable AVX-512 Bit Algorithms.
avx512bw - Enable AVX-512 Byte and Word Instructions.
avx512cd - Enable AVX-512 Conflict Detection Instructions.
avx512dq - Enable AVX-512 Doubleword and Quadword Instructions.
avx512er - Enable AVX-512 Exponential and Reciprocal Instructions.
avx512f - Enable AVX-512 instructions.
avx512ifma - Enable AVX-512 Integer Fused Multiple-Add.
avx512pf - Enable AVX-512 PreFetch Instructions.
avx512vbmi - Enable AVX-512 Vector Byte Manipulation Instructions.
avx512vbmi2 - Enable AVX-512 further Vector Byte Manipulation Instructions.
avx512vl - Enable AVX-512 Vector Length eXtensions.
avx512vnni - Enable AVX-512 Vector Neural Network Instructions.
avx512vpopcntdq - Enable AVX-512 Population Count Instructions.
bmi - Support BMI instructions.
bmi2 - Support BMI2 instructions.
cldemote - Enable Cache Demote.
clflushopt - Flush A Cache Line Optimized.
clwb - Cache Line Write Back.
clzero - Enable Cache Line Zero.
cmov - Enable conditional move instructions.
cx16 - 64-bit with cmpxchg16b.
ermsb - REP MOVS/STOS are fast.
f16c - Support 16-bit floating point conversion instructions.
false-deps-lzcnt-tzcnt - LZCNT/TZCNT have a false dependency on dest register.
false-deps-popcnt - POPCNT has a false dependency on dest register.
fast-11bytenop - Target can quickly decode up to 11 byte NOPs.
fast-15bytenop - Target can quickly decode up to 15 byte NOPs.
fast-bextr - Indicates that the BEXTR instruction is implemented as a single uop with good throughput..
fast-gather - Indicates if gather is reasonably fast..
fast-hops - Prefer horizontal vector math instructions (haddp, phsub, etc.) over normal vector instructions with shuffles.
fast-lzcnt - LZCNT instructions are as fast as most simple integer ops.
fast-partial-ymm-or-zmm-write - Partial writes to YMM/ZMM registers are fast.
fast-scalar-fsqrt - Scalar SQRT is fast (disable Newton-Raphson).
fast-shld-rotate - SHLD can be used as a faster rotate.
fast-variable-shuffle - Shuffles with variable masks are fast.
fast-vector-fsqrt - Vector SQRT is fast (disable Newton-Raphson).
fma - Enable three-operand fused multiple-add.
fma4 - Enable four-operand fused multiple-add.
fsgsbase - Support FS/GS Base instructions.
fxsr - Support fxsave/fxrestore instructions.
gfni - Enable Galois Field Arithmetic Instructions.
glm - Intel Goldmont processors.
glp - Intel Goldmont Plus processors.
idivl-to-divb - Use 8-bit divide for positive values less than 256.
idivq-to-divl - Use 32-bit divide for positive values less than 2^32.
invpcid - Invalidate Process-Context Identifier.
lea-sp - Use LEA for adjusting the stack pointer.
lea-uses-ag - LEA instruction needs inputs at AG stage.
lwp - Enable LWP instructions.
lzcnt - Support LZCNT instruction.
macrofusion - Various instructions can be fused with conditional branches.
merge-to-threeway-branch - Merge branches to a three-way conditional branch.
mmx - Enable MMX instructions.
movbe - Support MOVBE instruction.
movdir64b - Support movdir64b instruction.
movdiri - Support movdiri instruction.
mpx - Support MPX instructions.
mwaitx - Enable MONITORX/MWAITX timer functionality.
nopl - Enable NOPL instruction.
pad-short-functions - Pad short functions.
pclmul - Enable packed carry-less multiplication instructions.
pconfig - platform configuration instruction.
pku - Enable protection keys.
popcnt - Support POPCNT instruction.
prefer-256-bit - Prefer 256-bit AVX instructions.
prefetchwt1 - Prefetch with Intent to Write and T1 Hint.
prfchw - Support PRFCHW instructions.
ptwrite - Support ptwrite instruction.
rdpid - Support RDPID instructions.
rdrnd - Support RDRAND instruction.
rdseed - Support RDSEED instruction.
retpoline - Remove speculation of indirect branches from the generated code, either by avoiding them entirely or lowering them with a speculation blocking construct..
retpoline-external-thunk - When lowering an indirect call or branch using a `retpoline`, rely on the specified user provided thunk rather than emitting one ourselves. Only has effect when combined with some other retpoline feature..
retpoline-indirect-branches - Remove speculation of indirect branches from the generated code..
retpoline-indirect-calls - Remove speculation of indirect calls from the generated code..
rtm - Support RTM instructions.
sahf - Support LAHF and SAHF instructions.
sgx - Enable Software Guard Extensions.
sha - Enable SHA instructions.
shstk - Support CET Shadow-Stack instructions.
slm - Intel Silvermont processors.
slow-3ops-lea - LEA instruction with 3 ops or certain registers is slow.
slow-incdec - INC and DEC instructions are slower than ADD and SUB.
slow-lea - LEA instruction with certain arguments is slow.
slow-pmaddwd - PMADDWD is slower than PMULLD.
slow-pmulld - PMULLD instruction is slow.
slow-shld - SHLD instruction is slow.
slow-two-mem-ops - Two memory operand instructions are slow.
slow-unaligned-mem-16 - Slow unaligned 16-byte memory access.
slow-unaligned-mem-32 - Slow unaligned 32-byte memory access.
soft-float - Use software floating point features..
sse - Enable SSE instructions.
sse-unaligned-mem - Allow unaligned memory operands with SSE instructions.
sse2 - Enable SSE2 instructions.
sse3 - Enable SSE3 instructions.
sse4.1 - Enable SSE 4.1 instructions.
sse4.2 - Enable SSE 4.2 instructions.
sse4a - Support SSE 4a instructions.
ssse3 - Enable SSSE3 instructions.
tbm - Enable TBM instructions.
tremont - Intel Tremont processors.
vaes - Promote selected AES instructions to AVX512/AVX registers.
vpclmulqdq - Enable vpclmulqdq instructions.
waitpkg - Wait and pause enhancements.
wbnoinvd - Write Back No Invalidate.
x87 - Enable X87 float instructions.
xop - Enable XOP instructions.
xsave - Support xsave instructions.
xsavec - Support xsavec instructions.
xsaveopt - Support xsaveopt instructions.
xsaves - Support xsaves instructions.
Use +feature to enable a feature, or -feature to disable it.
For example, rustc -C -target-cpu=mycpu -C target-feature=+feature1,-feature2
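Different CPU platforms support different instruction sets (see the CPU instruction set reference [3]). In Rust, an instruction set is selected with a flag such as -C target-feature=+avx2, which enables AVX2. Note that although CPUs that support AVX2 generally also support FMA, using AVX2 and FMA together requires enabling both explicitly, for example -C target-feature=+avx2,+fma. Likewise, if the instruction sets you want to enable depend on other instruction sets, those dependencies should be enabled as well.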
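The same idea of enabling AVX2 and FMA together can also be expressed per function instead of through RUSTFLAGS. The sketch below uses a different mechanism from the one described above: the #[target_feature] attribute combined with runtime detection via is_x86_feature_detected!. The function names are hypothetical, and the code falls back to plain scalar arithmetic on CPUs or architectures without these features.

```rust
/// Portable fallback: computes a[i] * b[i] + c[i] with ordinary scalar code.
fn mul_add_scalar(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    a.iter().zip(b).zip(c).map(|((x, y), z)| x * y + z).collect()
}

/// Same computation, compiled with AVX2 and FMA enabled for this function only.
/// Both features are listed explicitly, mirroring "+avx2,+fma".
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2,fma")]
unsafe fn mul_add_avx2_fma(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    let n = a.len().min(b.len()).min(c.len());
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        // With the fma feature enabled, the compiler may lower this
        // to a fused multiply-add instruction.
        out.push(a[i].mul_add(b[i], c[i]));
    }
    out
}

/// Chooses the AVX2+FMA version only if the running CPU reports both features.
fn mul_add(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // Safe to call: both required features were just detected.
            return unsafe { mul_add_avx2_fma(a, b, c) };
        }
    }
    mul_add_scalar(a, b, c)
}

fn main() {
    let r = mul_add(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0], &[0.5, 0.5, 0.5]);
    println!("{:?}", r);
}
```

Unlike a global RUSTFLAGS setting, this approach keeps the binary runnable on CPUs that lack AVX2 or FMA, at the cost of a runtime check on each call.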
2.3 Viewing the CPUs supported for a target
rustc --target=x86_64-unknown-linux-gnu --print target-cpus
Available CPUs for this target:
native - Select the CPU of the current host (currently skylake).
amdfam10 - Select the amdfam10 processor.
athlon - Select the athlon processor.
athlon-4 - Select the athlon-4 processor.
athlon-fx - Select the athlon-fx processor.
athlon-mp - Select the athlon-mp processor.
athlon-tbird - Select the athlon-tbird processor.
athlon-xp - Select the athlon-xp processor.
athlon64 - Select the athlon64 processor.
athlon64-sse3 - Select the athlon64-sse3 processor.
atom - Select the atom processor.
barcelona - Select the barcelona processor.
bdver1 - Select the bdver1 processor.
bdver2 - Select the bdver2 processor.
bdver3 - Select the bdver3 processor.
bdver4 - Select the bdver4 processor.
bonnell - Select the bonnell processor.
broadwell - Select the broadwell processor.
btver1 - Select the btver1 processor.
btver2 - Select the btver2 processor.
c3 - Select the c3 processor.
c3-2 - Select the c3-2 processor.
cannonlake - Select the cannonlake processor.
cascadelake - Select the cascadelake processor.
core-avx-i - Select the core-avx-i processor.
core-avx2 - Select the core-avx2 processor.
core2 - Select the core2 processor.
corei7 - Select the corei7 processor.
corei7-avx - Select the corei7-avx processor.
generic - Select the generic processor.
geode - Select the geode processor.
goldmont - Select the goldmont processor.
goldmont-plus - Select the goldmont-plus processor.
haswell - Select the haswell processor.
i386 - Select the i386 processor.
i486 - Select the i486 processor.
i586 - Select the i586 processor.
i686 - Select the i686 processor.
icelake-client - Select the icelake-client processor.
icelake-server - Select the icelake-server processor.
ivybridge - Select the ivybridge processor.
k6 - Select the k6 processor.
k6-2 - Select the k6-2 processor.
k6-3 - Select the k6-3 processor.
k8 - Select the k8 processor.
k8-sse3 - Select the k8-sse3 processor.
knl - Select the knl processor.
knm - Select the knm processor.
lakemont - Select the lakemont processor.
nehalem - Select the nehalem processor.
nocona - Select the nocona processor.
opteron - Select the opteron processor.
opteron-sse3 - Select the opteron-sse3 processor.
penryn - Select the penryn processor.
pentium - Select the pentium processor.
pentium-m - Select the pentium-m processor.
pentium-mmx - Select the pentium-mmx processor.
pentium2 - Select the pentium2 processor.
pentium3 - Select the pentium3 processor.
pentium3m - Select the pentium3m processor.
pentium4 - Select the pentium4 processor.
pentium4m - Select the pentium4m processor.
pentiumpro - Select the pentiumpro processor.
prescott - Select the prescott processor.
sandybridge - Select the sandybridge processor.
silvermont - Select the silvermont processor.
skx - Select the skx processor.
skylake - Select the skylake processor.
skylake-avx512 - Select the skylake-avx512 processor.
slm - Select the slm processor.
tremont - Select the tremont processor.
westmere - Select the westmere processor.
winchip-c6 - Select the winchip-c6 processor.
winchip2 - Select the winchip2 processor.
x86-64 - Select the x86-64 processor.
yonah - Select the yonah processor.
znver1 - Select the znver1 processor.
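When cross-compiling, specify the CPU of the target machine. If the program will only run on the local machine, you can simply use export RUSTFLAGS="-C target-cpu=native".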
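The flag syntax used throughout this note (a CPU name for target-cpu, and a comma-separated list of +feature / -feature entries for target-feature) can be summarized in a small sketch. The helper rustflags below is hypothetical and only assembles the string; it does not validate CPU or feature names against what rustc actually supports.

```rust
/// Hypothetical helper: builds a RUSTFLAGS value from an optional CPU name
/// and lists of features to enable (+) or disable (-).
fn rustflags(cpu: Option<&str>, enable: &[&str], disable: &[&str]) -> String {
    let mut parts: Vec<String> = Vec::new();
    if let Some(cpu) = cpu {
        parts.push(format!("-C target-cpu={}", cpu));
    }
    // "+name" enables a feature, "-name" disables it; entries are comma-separated.
    let features: Vec<String> = enable
        .iter()
        .map(|f| format!("+{}", f))
        .chain(disable.iter().map(|f| format!("-{}", f)))
        .collect();
    if !features.is_empty() {
        parts.push(format!("-C target-feature={}", features.join(",")));
    }
    parts.join(" ")
}

fn main() {
    // Local builds: let rustc pick the host CPU.
    println!("{}", rustflags(Some("native"), &[], &[]));
    // Cross builds: name the target CPU and any extra features explicitly.
    println!("{}", rustflags(Some("haswell"), &["avx2", "fma"], &[]));
}
```

The CPU name haswell is taken from the list above purely as an example; the right choice depends on the machine the binary is meant to run on.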
References:
[1] https://rust-lang-nursery.github.io/packed_simd/perf-guide/target-feature/features.html
[2] https://github.com/rust-lang/rust/tree/master/src/librustc_target
[3] CPU instruction sets
