Hi, I have noticed a bug in the ippiLUTPalette_8u_C1R function while testing images with odd dimensions. I am using IPP v2018.3.210 on 64-bit Windows.
I have (w, h) = (1575, 1049), srcStep = 1664, dstStep = 1576, and s_lut[256] is a uint8_t array defined at global scope.
Now calling
ippiLUTPalette_8u_C1R(srcData, srcStep, dstData, dstStep, IppiSize{w, h}, s_lut, 8);
results in wrong values everywhere except the first row of the image. It seems that srcStep is larger than what this function can handle. On a side note, ippiCopy_8u_C1R works without any problem on the same data.
After I noticed that only the first row of the destination image is correct, I applied a simple workaround. Calling
for (int j = 0; j < h; j++)
    ippiLUTPalette_8u_C1R(srcData + j*srcStep, w, dstData + j*dstStep, w, IppiSize{w, 1}, s_lut, 8);
works without any problems, and the resulting values are as expected. Here I process the image row by row, passing the width as the step. I think the step value is the problem.
On another note, I have checked the correctness of this function against a simple C implementation, shown below:
uint8_t *srcRow = srcData;
uint8_t *dstRow = dstData;
for (int j = 0; j < h; j++)
{
    for (int i = 0; i < w; i++)
    {
        dstRow[i] = s_lut[ srcRow[i] ];
    }
    srcRow += srcStep;
    dstRow += dstStep;
}
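For anyone trying to reproduce this, the comparison can be packaged as a small standalone check. The following is only a sketch: the names lutReference, Mismatch and firstMismatch are hypothetical helpers (not part of IPP), and steps are assumed to be in bytes, as for srcStep and dstStep above. Reporting the first differing pixel should make it easy to confirm whether the output really diverges starting at row 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference LUT over a strided 8-bit single-channel image.
// Steps are byte distances between the starts of consecutive rows.
void lutReference(const std::uint8_t* src, int srcStep,
                  std::uint8_t* dst, int dstStep,
                  int w, int h, const std::uint8_t* lut)
{
    assert(srcStep >= w && dstStep >= w);  // rows must not overlap
    for (int j = 0; j < h; ++j) {
        const std::uint8_t* s = src + static_cast<std::size_t>(j) * srcStep;
        std::uint8_t* d = dst + static_cast<std::size_t>(j) * dstStep;
        for (int i = 0; i < w; ++i)
            d[i] = lut[s[i]];              // padding bytes past w are left untouched
    }
}

// Position of the first differing pixel; row == -1 means the images match.
struct Mismatch { int row; int col; };

Mismatch firstMismatch(const std::uint8_t* a, int aStep,
                       const std::uint8_t* b, int bStep,
                       int w, int h)
{
    for (int j = 0; j < h; ++j) {
        const std::uint8_t* ra = a + static_cast<std::size_t>(j) * aStep;
        const std::uint8_t* rb = b + static_cast<std::size_t>(j) * bStep;
        for (int i = 0; i < w; ++i)
            if (ra[i] != rb[i])
                return Mismatch{j, i};
    }
    return Mismatch{-1, -1};
}
```

One way to use it: fill a second buffer with lutReference using the same steps, run ippiLUTPalette_8u_C1R into dstData, and call firstMismatch on the two buffers with w and h as above.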
Runtimes on Intel i7-8700K:
ippiLUTPalette_8u_C1R -> 980 us
Simple C loop byte-by-byte lookup -> 1070 us
Is this function really not optimized? Or, unlikely as it seems, are there still some invalid operations even when it is applied row by row that make it this slow? Taking 1 ms for a single pass over the image is really too much. ippiCopy_8u_C1R on the exact same data takes 88 us (>11x faster).
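The post does not say how the numbers above were measured, so the following is only a sketch of one way to collect comparable timings. It assumes a best-of-N measurement with std::chrono::steady_clock; bestMicroseconds is a hypothetical helper name. Taking the minimum over several runs reduces the influence of cold caches and scheduler noise on a sub-millisecond measurement.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Runs `run` `repeats` times and returns the fastest wall-clock time in microseconds.
// Returns +infinity if repeats <= 0.
template <typename F>
double bestMicroseconds(F&& run, int repeats)
{
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        run();
        const auto t1 = std::chrono::steady_clock::now();
        const std::chrono::duration<double, std::micro> dt = t1 - t0;
        best = std::min(best, dt.count());
    }
    return best;
}
```

Each candidate (the row-by-row ippiLUTPalette_8u_C1R loop, the plain C loop, and ippiCopy_8u_C1R) can be wrapped in a lambda and passed to the helper with the same repeat count, so that all three are measured under the same conditions.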