Hi,
I'm evaluating IPP's multirate FIR filter for my application where performance is the major concern.
It is very intuitive and easy to use.
However, I found that ippFirMR performs poorly compared to my hand-written AVX2 code (it is 40% slower).
I ran the VTune profiler on the code that calls IPP FIRMR and found that the cause of the poor performance is low vector capacity usage: VTune's Microarchitecture Exploration analysis shows only 50% vector capacity usage.
I tried setting the CPU features to dispatch the L9 implementation, but it made no difference.
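A minimal sketch of how the loaded code path could be confirmed, in case it helps reproduce this. The helper `ipp_code_to_isa` and the `HAVE_IPP` build macro are hypothetical names made up for illustration, and the code-to-ISA table reflects my understanding of the Intel 64 dispatch codes, so treat it as an assumption rather than an authoritative list. The IPP part uses only `ippInit` and `ippsGetLibVersion` and is compiled only when `HAVE_IPP` is defined.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: translate a two-letter IPP dispatch code (as found in
 * IppLibraryVersion.targetCpu) into an instruction-set name.
 * The table is an assumption based on the Intel 64 codes I know of. */
static const char *ipp_code_to_isa(const char *code)
{
    if (code == NULL) return "unknown";
    if (strncmp(code, "m7", 2) == 0) return "SSE3";
    if (strncmp(code, "y8", 2) == 0) return "SSE4.2";
    if (strncmp(code, "e9", 2) == 0) return "AVX";
    if (strncmp(code, "l9", 2) == 0) return "AVX2";
    if (strncmp(code, "k0", 2) == 0) return "AVX-512";
    return "unknown";
}

#ifdef HAVE_IPP /* hypothetical macro: define it when building against IPP */
#include <ipp.h>

int main(void)
{
    /* Let IPP pick the best code path for this CPU. */
    ippInit();

    /* Ask the signal-processing domain which optimization it loaded. */
    const IppLibraryVersion *v = ippsGetLibVersion();

    /* targetCpu is a 4-char field; copy it so it is NUL-terminated. */
    char code[5] = {0};
    memcpy(code, v->targetCpu, 4);

    printf("%s %s, dispatch code %s (%s)\n",
           v->Name, v->Version, code, ipp_code_to_isa(code));
    return 0;
}
#endif
```

If this reports `l9` while the profile still looks like SSE code, the dispatcher would appear to be working and the question would come down to the FIRMR kernel itself.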
Does ippFirMR have implementations for L9 (AVX2) and K0 (AVX-512), or does it only have SSE versions?
Am I missing something?
Implementations for K0 and L9 would be of great help as I do not want to maintain separate versions of handwritten code for every architecture.
I'm using IPP version 2020.1.216. I'm running on an Intel Core i7-7600U processor (with Windows 10), which supports AVX2.