Is a SharedMatrix implicitly synchronized by hppWait, i.e., is hppiGetMatrixData conceptually unnecessary in this case? I wonder whether this holds in particular when CPU/GPU copies cannot be omitted (because of the device type or extension used).
The reference manual (ipp_async_manual.pdf) states that "All Intel® IPP Asynchronous C/C++ library functions, except for setup and release functions, are asynchronous." However, this does not settle the question above for me. For example, the "ipp_async_sobel" source code example calls hppWait but never calls hppiGetMatrixData (as of the December 2013 Preview of Intel IPP). Is it conceptually more correct to call hppiGetMatrixData even when it performs no actual work because of "zero copy"?