8280510: AArch64: Vectorize operations with loop induction variable

AArch64 SVE has an instruction that populates a vector register with
incrementing indices. With it, we can vectorize some loop operations
that take the induction variable as an operand, such as the one below.

  for (int i = 0; i < count; i++) {
    b[i] = a[i] * i;
  }

This patch enables the vectorization of operations on the loop induction
variable by extending the current scope of C2 superword vectorizable
packs. Before this patch, any scalar input node in a vectorizable pack
had to be an out-of-loop invariant. This patch also takes induction
variable inputs into consideration. It allows the input to be the iv phi
node, or the phi plus its index offset, and creates a PopulateIndexNode
to generate a vector filled with incrementing indices. On AArch64 SVE,
the final generated code for the loop above looks like this.

  add     x12, x16, x10
  add     x12, x12, #0x10
  ld1w    {z16.s}, p7/z, [x12]
  index   z17.s, w1, #1
  mul     z17.s, p7/m, z17.s, z16.s
  add     x10, x17, x10
  add     x10, x10, #0x10
  st1w    {z17.s}, p7, [x10]

As there is no populate-index instruction on AArch64 NEON or on other
platforms such as x86, a function named is_populate_index_supported() is
added to the VectorNode class for the backend support check.

Jtreg hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1
were run and no issues were found. Hotspot jtreg already has tests in
compiler/c2/cr7192963/Test*Vect.java covering this kind of use case, so
no new jtreg test is added in this patch.

A new JMH benchmark is added in this patch and was run on a 512-bit SVE
machine. The results below show that performance can improve
significantly in some cases.

  Benchmark                      Performance
  IndexVector.exprWithIndex1            ~7.7x
  IndexVector.exprWithIndex2           ~13.3x
  IndexVector.indexArrayFill            ~5.7x
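
For illustration only, the loop shapes described above (the induction
variable used directly, used with a constant offset, and stored as-is)
can be sketched in plain Java as follows. The class and method names are
hypothetical and are not the code of the IndexVector benchmark. The
sketch shows only the scalar semantics; whether C2 actually vectorizes
such loops depends on the platform and JVM flags.

```java
import java.util.Arrays;

// Hypothetical examples of loops whose packs take the induction variable
// as an input. Names are illustrative, not taken from the patch.
class IndexLoops {

    // Induction variable used directly as an operand: b[i] = a[i] * i.
    static void mulByIndex(int[] a, int[] b, int count) {
        for (int i = 0; i < count; i++) {
            b[i] = a[i] * i;
        }
    }

    // Induction variable plus a constant offset: b[i] = a[i] * (i + offset).
    static void mulByIndexPlusOffset(int[] a, int[] b, int count, int offset) {
        for (int i = 0; i < count; i++) {
            b[i] = a[i] * (i + offset);
        }
    }

    // Induction variable stored as-is: a[i] = i.
    static void fillWithIndex(int[] a, int count) {
        for (int i = 0; i < count; i++) {
            a[i] = i;
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = new int[a.length];

        mulByIndex(a, b, a.length);
        System.out.println(Arrays.toString(b));

        mulByIndexPlusOffset(a, b, a.length, 3);
        System.out.println(Arrays.toString(b));

        fillWithIndex(b, b.length);
        System.out.println(Arrays.toString(b));
    }
}
```

In each method the vector of lane values i, i+1, i+2, ... is what a
populate-index operation would supply in a vectorized version of the
loop.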