Image Resizing Using Arm Neon
Solution 1:
On second thought, the vertical downsizing is very well SIMDable, because the same arithmetic can be applied to horizontally adjacent pixels.
So here is what I suggest:
- Resize vertically with NEON using q15 unsigned fixed-point arithmetic. The temporary result is stored in 32 bits per element.
- Resize horizontally with ARM using q15 unsigned fixed-point arithmetic, then divide by the area, typecast, pack, and store the final result as RGBA.
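The vertical pass in the first bullet could be sketched roughly as follows. This is only an illustration, not the fully optimized version: the function name accumulate_row_q15 and its parameters are made up for this example, and it assumes the caller has already computed a q15 weight (32768 == 1.0) for how much of each source row falls inside the destination row. The NEON path is only compiled when the compiler defines __ARM_NEON; otherwise the scalar loop does all the work.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: adds one source row, scaled by a q15 weight
 * (32768 == 1.0), into a row of 32-bit accumulators.
 * n counts bytes, i.e. 4 * width for RGBA. */
void accumulate_row_q15(uint32_t *acc, const uint8_t *src, size_t n,
                        uint16_t weight_q15)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Eight bytes per iteration: widen u8 -> u16, then multiply by the
     * weight and accumulate into two vectors of four u32 lanes. */
    for (; i + 8 <= n; i += 8) {
        uint16x8_t p = vmovl_u8(vld1_u8(src + i));
        uint32x4_t lo = vld1q_u32(acc + i);
        uint32x4_t hi = vld1q_u32(acc + i + 4);
        lo = vmlal_n_u16(lo, vget_low_u16(p), weight_q15);
        hi = vmlal_n_u16(hi, vget_high_u16(p), weight_q15);
        vst1q_u32(acc + i, lo);
        vst1q_u32(acc + i + 4, hi);
    }
#endif
    /* Scalar tail, and the whole row on targets without NEON. */
    for (; i < n; i++)
        acc[i] += (uint32_t)src[i] * weight_q15;
}
```

Call it once per contributing source row with acc zeroed beforehand. Each call adds at most 255 * 32768 (just under 2^23) per element, so a 32-bit accumulator has room for roughly 500 full-weight rows before it can overflow.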
Please note that the division by area should be performed as a LONG multiplication (32 bits x 32 bits, giving 64 bits) with (1/area) in q17.
Why q17? If you do q15*q17, the result is in q32, held in two 32-bit registers. You don't need any 'typecasting by bit operations', because the upper register already holds the targeted 8-bit integer value. That's the beauty of fixed-point arithmetic.
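To make the q15*q17 trick concrete, here is a minimal sketch in C. The function names inv_area_q17 and divide_by_area are invented for this example, and rounding the reciprocal to nearest is my own assumption. On 32-bit ARM a compiler will typically turn the 64-bit product into a single long multiply, with the result's upper half landing in its own register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reciprocal of the area in q17 (131072 == 1.0),
 * rounded to nearest. Downsizing only, so area >= 1 and the result
 * fits in 18 bits. */
uint32_t inv_area_q17(float area)
{
    assert(area >= 1.0f);
    return (uint32_t)(131072.0f / area + 0.5f);
}

/* q15 accumulator times q17 reciprocal gives a q32 product. Its upper
 * 32 bits are the integer pixel value, so no masking is needed. */
uint8_t divide_by_area(uint32_t acc_q15, uint32_t inv_q17)
{
    uint64_t product_q32 = (uint64_t)acc_q15 * inv_q17;
    return (uint8_t)(product_q32 >> 32);
}
```

For example, pixels 100 and 200 at full weight plus 50 at half weight accumulate to 10649600 in q15 over an area of 2.5, and divide_by_area(10649600, inv_area_q17(2.5f)) gives 130, the same as 325 / 2.5. The sketch truncates and does not clamp. For very large areas the q17 reciprocal has few significant bits left, so more fractional bits would be needed there.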
Maybe I'll write a fully optimized version of this in the near future, completely in assembly.
Solution 2:
Unfortunately, NEON isn't very well suited for this kind of job. If it were image resizing with fixed source and destination resolutions, it would be possible to NEONize it with dynamic vectors, but summing a variable number of adjacent pixels isn't easily SIMDable.
I suggest replacing the float arithmetic with fixed-point arithmetic. That alone will help a lot.
Besides, division takes terribly long. It really harms performance, especially when done inside a loop. You should replace it with a multiplication by the reciprocal, like this:
uint8_t *dst = malloc(w_dst);
float area_ret = 1.0f / area;  /* reciprocal computed once, outside the loop */
for (int x_dst = 0; x_dst < w_dst; x_dst++)
    dst[x_dst] = (uint8_t)roundf(acc[x_dst] * area_ret);