8282875: AArch64: [vectorapi] Optimize Vector.reduceLane for SVE 64/128 vector size
This patch speeds up add/mul/min/max reductions on SVE for 64-bit and 128-bit vector sizes. According to the Neoverse N2/V1 software optimization guides [1][2], we prefer NEON instructions over SVE instructions for reduction operations on 128-bit vectors. This patch adds rules that distinguish the 64-bit and 128-bit vector sizes from the others, so that these two special cases generate the same code as NEON.

For example, for ByteVector.SPECIES_128, "ByteVector.reduceLanes(VectorOperators.ADD)" generates the following code:

```
Before:
    uaddv   d17, p0, z16.b
    smov    x15, v17.b[0]
    add     w15, w14, w15, sxtb

After:
    addv    b17, v16.16b
    smov    x12, v17.b[0]
    add     w12, w12, w16, sxtb
```

SVE has no multiply reduction instruction, so for the 128-bit vector size this patch generates code for MulReductionVL using scalar instructions.

With this patch, all of these reductions show performance gains on the corresponding vector microbenchmarks on my SVE test system.

[1] https://developer.arm.com/documentation/pjdoc466751330-9685/latest/
[2] https://developer.arm.com/documentation/PJDOC-466751330-18256/0001

Change-Id: I4bef0b3eb6ad1bac582e4236aef19787ccbd9b1c
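As a rough aid when checking the generated code, the Java sketch below is a scalar reference model of the values the two 128-bit cases must produce. The class `ReductionModel` and its methods are hypothetical illustrations, not part of this patch or of the Vector API. The byte ADD reduction wraps modulo 2^8 (the 8-bit lane sum that `addv b17, v16.16b` leaves in lane 0), and `smov` then sign-extends that lane into a general-purpose register. The long MUL reduction over the two 64-bit lanes of a 128-bit vector is simply a lane-by-lane scalar multiply with 64-bit wrap-around.

```java
/**
 * Hypothetical scalar reference model (not part of the patch) of the results
 * the 128-bit reduction code is expected to compute.
 */
final class ReductionModel {
    private ReductionModel() {}

    // Byte ADD reduction: sum all lanes in 8 bits, wrapping on overflow,
    // like the single-byte result of "addv b17, v16.16b".
    static byte addReduceBytes(byte[] lanes) {
        byte sum = 0; // identity value for ADD
        for (byte lane : lanes) {
            sum += lane; // compound assignment narrows back to byte: wraps mod 2^8
        }
        return sum;
    }

    // Models "smov xN, vM.b[0]": move a signed byte lane into a 64-bit
    // register; Java's byte -> long widening is likewise a sign extension.
    static long widenLikeSmov(byte lane0) {
        return lane0;
    }

    // Long MUL reduction done with scalar instructions, since SVE has no
    // multiply reduction instruction. A 128-bit vector holds two long lanes.
    static long mulReduceLongs(long[] lanes) {
        long product = 1L; // identity value for MUL
        for (long lane : lanes) {
            product *= lane; // long multiplication wraps mod 2^64
        }
        return product;
    }
}
```

For instance, sixteen lanes of `(byte) 100` sum to 1600, which wraps to 64 in eight bits, while sixteen lanes of `(byte) 10` sum to 160, which wraps to -96 and stays -96 after sign extension.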