Hi,
I have compared the run time of cross-correlation (with normalized coefficients) in IPP 7.0 vs IPP 2018. In my test, the old version is 2x - 3x faster. I have disabled hyper-threading in the BIOS, and I have tried the library versions for different processors (y8, e9, I9).
VTune (trial) shows that the IPP 2018 version of cross-correlation is not multi-threaded, whereas the IPP 7.0 version is.
Is there something I'm missing? The only difference between the two tests is the cross-correlation call.
My processor is an Intel Core i7 4790.
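For anyone who wants to reproduce the comparison, here is a minimal sketch of how such a timing could be taken. It assumes std::chrono::steady_clock; medianMs is a hypothetical helper name (not part of IPP), and the callable stands in for the cross-correlation call of each test.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper (not part of IPP): calls fn once as a warm-up, then
// times `runs` further calls and returns the median wall-clock time in ms.
double medianMs(const std::function<void()>& fn, int runs)
{
    if (runs < 1) runs = 1; // guard so the median below is always defined

    fn(); // warm-up: the first call may include one-time setup and page faults

    std::vector<double> times;
    times.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        times.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    std::sort(times.begin(), times.end());
    return times[times.size() / 2]; // median is less sensitive to outliers than the mean
}
```

Each test would wrap only the correlation call in the callable (for example a lambda around ippiCrossCorrNorm_32f_C1R in Test #1 and around ippiCrossCorrSame_NormLevel_32f_C1R in Test #2), so that image loading, conversion and buffer allocation stay outside the measured region.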
---------------------------------------
Test #1: Ipp 2018
Lib info:
targetCpu: I9
Name: "ippCV AVX2 (I9)"
Version: "2018.0.3 (r58644)"
BuildDate: "Apr 7 2018"
Code:
CGenericImage imInput; // 2048 x 2048, 8-bit image loaded from hdd
CGenericImage imInput_32f; // input image converted to Ipp32f
CGenericImage imTemplate_32f; // generated gaussian template, 9x9
CGenericImage imOutput_32f; // score image
IppiSize szImage = { imInput.m_nWidth, imInput.m_nHeight };
IppiSize szTemplate = { imTemplate_32f.m_nWidth, imTemplate_32f.m_nHeight };
st |= ippiConvert_8u32f_C1R((Ipp8u*)imInput.m_pData, imInput.m_nStep, (Ipp32f*)imInput_32f.m_pData, imInput_32f.m_nStep, szImage);
IppEnum algType = (IppEnum)(ippAlgAuto | ippiROISame | ippiNormCoefficient); // declared before the buffer-size query that uses it
Ipp8u* pBuffer = NULL;
int nBufferSize = 0;
st |= ippiCrossCorrNormGetBufferSize(szImage, szTemplate, algType, &nBufferSize);
pBuffer = ippsMalloc_8u(nBufferSize);
st |= ippiCrossCorrNorm_32f_C1R((Ipp32f*)imInput_32f.m_pData, imInput_32f.m_nStep, szImage
, (Ipp32f*)imTemplate_32f.m_pData, imTemplate_32f.m_nStep, szTemplate
, (Ipp32f*)imOutput_32f.m_pData, imOutput_32f.m_nStep, algType, pBuffer);
---------------------------------------
Test #2: Ipp 7.0
Lib info:
targetCpu: e9
Name: "ippcve9-7.0.dll"
Version: "7.0 build 250.85"
BuildDate: "Nov 27 2011"
Code:
CGenericImage imInput; // 2048 x 2048, 8-bit image loaded from hdd
CGenericImage imInput_32f; // input image converted to Ipp32f
CGenericImage imTemplate_32f; // generated gaussian template, 9x9
CGenericImage imOutput_32f; // score image
IppiSize szImage = { imInput.m_nWidth, imInput.m_nHeight };
IppiSize szTemplate = { imTemplate_32f.m_nWidth, imTemplate_32f.m_nHeight };
st = ippiConvert_8u32f_C1R((Ipp8u*)imInput.m_pData, imInput.m_nStep, (Ipp32f*)imInput_32f.m_pData, imInput_32f.m_nStep, szImage);
st = ippiCrossCorrSame_NormLevel_32f_C1R((Ipp32f*)imInput_32f.m_pData, imInput_32f.m_nStep, szImage
, (Ipp32f*)imTemplate_32f.m_pData, imTemplate_32f.m_nStep, szTemplate
, (Ipp32f*)imOutput_32f.m_pData, imOutput_32f.m_nStep);
---------------------------------------