8280510: AArch64: Vectorize operations with loop induction variable (e85e8ef4)

Commit e85e8ef4 authored Feb 16, 2022 by Pengfei Li

8280510: AArch64: Vectorize operations with loop induction variable

AArch64 SVE has an instruction (INDEX) that populates an SVE vector
register with incrementing indices. With it we can vectorize some loop
operations that take the induction variable as an operand, such as the
one below.

  for (int i = 0; i < count; i++) {
    b[i] = a[i] * i;
  }

This patch enables vectorization of operations on the loop induction
variable by extending the current scope of C2 superword vectorizable
packs. Before this patch, any scalar input node of a vectorizable pack
had to be a loop invariant defined outside the loop. This patch also
accepts the induction variable as an input: the input may be the iv phi
node, or the phi plus its index offset. For such an input, a
PopulateIndexNode is created to generate a vector filled with
incrementing indices. On AArch64 SVE, the final generated code for the
loop above looks like this.

  add     x12, x16, x10
  add     x12, x12, #0x10
  ld1w    {z16.s}, p7/z, [x12]
  index   z17.s, w1, #1
  mul     z17.s, p7/m, z17.s, z16.s
  add     x10, x17, x10
  add     x10, x10, #0x10
  st1w    {z17.s}, p7, [x10]
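
To make the role of the populated index vector concrete, here is a
minimal plain-Java sketch that emulates the same computation lane by
lane. It is not HotSpot code: the class name, the populateIndex()
helper, the lane count and the scalar tail loop are all assumptions
made for illustration, and the sketch does not model SVE predication.

```java
import java.util.Arrays;

public class PopulateIndexSketch {
    // Scalar reference: the loop from the commit message.
    static int[] scalar(int[] a) {
        int[] b = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            b[i] = a[i] * i;
        }
        return b;
    }

    // Hypothetical helper: models a vector register filled with
    // start, start + 1, ..., start + lanes - 1.
    static int[] populateIndex(int start, int lanes) {
        int[] v = new int[lanes];
        for (int lane = 0; lane < lanes; lane++) {
            v[lane] = start + lane;
        }
        return v;
    }

    // Processes `lanes` elements per iteration, multiplying each
    // loaded element by the matching lane of the index vector.
    static int[] vectorStyle(int[] a, int lanes) {
        int[] b = new int[a.length];
        int i = 0;
        for (; i + lanes <= a.length; i += lanes) {
            int[] index = populateIndex(i, lanes);
            for (int lane = 0; lane < lanes; lane++) {
                b[i + lane] = a[i + lane] * index[lane];
            }
        }
        // Scalar tail for leftover elements (a simplification).
        for (; i < a.length; i++) {
            b[i] = a[i] * i;
        }
        return b;
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3};
        // Both versions should produce the same array.
        System.out.println(Arrays.equals(scalar(a), vectorStyle(a, 4)));
    }
}
```

In this sketch the index vector is rebuilt from the current value of i
on every iteration, mirroring how the index instruction above takes a
scalar start value and a step of 1.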

Because AArch64 NEON and other platforms such as x86 have no
populate-index instruction, a function named
is_populate_index_supported() is added to the VectorNode class to check
for backend support.

Jtreg hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1
were run and no issues were found. Hotspot jtreg already has tests in
compiler/c2/cr7192963/Test*Vect.java that cover this kind of use case,
so this patch adds no new jtreg test. It does add a new JMH benchmark,
which was run on a 512-bit SVE machine. The results below show that
performance improves significantly in some cases.

  Benchmark                           Speedup
  IndexVector.exprWithIndex1            ~7.7x
  IndexVector.exprWithIndex2           ~13.3x
  IndexVector.indexArrayFill            ~5.7x

parent a86cab8d
