Channel: Intel® Software - Intel® Integrated Performance Primitives

IPP 7.1 vs IPP 6.1


Hi all!

I have been using IPP since version 6.1 was released. A week ago we installed IPP 7.1 and rebuilt our source code against the 7.1 libraries.

The results are very strange: overall performance decreased by more than 4x.

Does 7.1 require some special settings?
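
Before digging deeper, it may be worth checking which optimized code path the 7.1 libraries actually dispatch to on your CPU; a generic (px) branch alone can explain a large slowdown. A minimal sketch, assuming ipp.h and the 7.1 libraries are linked (ippInit() behaviour should be verified against the 7.1 documentation):

#include <cstdio>
#include "ipp.h"

int main()
{
    // ippInit() dispatches to the best optimized layer for the host CPU
    // (in 7.x it supersedes the older ippStaticInit for static linking).
    IppStatus st = ippInit();
    if (st != ippStsNoErr)
        printf("ippInit: %s\n", ippGetStatusString(st));

    // Report which implementation (px, SSSE3, SSE4.2, AVX, ...) is in use.
    const IppLibraryVersion* v = ippsGetLibVersion();
    printf("%s %s\n", v->Name, v->Version);
    return 0;
}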


ippiWTInv Maximum Image Size?


I am trying to use ippiWTFwd and ippiWTInv to compute the Haar transform of an image. The image dimensions are already powers of 2, so I don't have to worry about replicating or extending the image border. When I call my function on images from 256x256 up to 1024x1024, it works without problems. If I try an image that is 2048x2048 or larger, the function crashes in ippiWTInv with a read access violation at 0x0.

Here is the function I am using; I pass it the image data and the image width/height.

Ipp32f* haarIppi(Ipp32f* inputBuffer, int width, int height)
{
    IppiWTFwdSpec_32f_C1R* pSpec;
    IppiWTInvSpec_32f_C1R* pSpecInv;

    // Haar analysis/synthesis filter taps
    Ipp32f pTapsLow[2]  = { 0.7071067811865475f,  0.7071067811865475f };
    Ipp32f pTapsHigh[2] = { 0.7071067811865475f, -0.7071067811865475f };
    int lenLow = 2, anchorLow = 1;
    int lenHigh = 2, anchorHigh = 1;

    int srcStep = width * sizeof(Ipp32f);

    // One decomposition level: four sub-bands of size (width/2) x (height/2)
    Ipp32f* pDetailXDst  = new Ipp32f[width * height / 4];
    Ipp32f* pDetailYDst  = new Ipp32f[width * height / 4];
    Ipp32f* pDetailXYDst = new Ipp32f[width * height / 4];
    Ipp32f* pApproxDst   = new Ipp32f[width * height / 4];
    IppiSize dstRoiSize = { width / 2, height / 2 };

    int bufSize, bufSizeInv;
    Ipp8u* pBuffer;
    Ipp8u* pBufferInv;

    Ipp32f* pDstInv = new Ipp32f[width * height];
    IppiSize roiInvSize = { width / 2, height / 2 };
    int stepDstInv = width * sizeof(Ipp32f);

    int approxStep, detailXStep, detailYStep, detailXYStep;
    approxStep = detailXStep = detailYStep = detailXYStep = width / 2 * sizeof(Ipp32f);

    // Forward wavelet transform
    ippiWTFwdInitAlloc_32f_C1R(&pSpec, pTapsLow, lenLow, anchorLow, pTapsHigh, lenHigh, anchorHigh);
    ippiWTFwdGetBufSize_C1R(pSpec, &bufSize);
    pBuffer = ippsMalloc_8u(bufSize);
    IppStatus forward = ippiWTFwd_32f_C1R(inputBuffer, srcStep,
                                          pApproxDst, approxStep,
                                          pDetailXDst, detailXStep,
                                          pDetailYDst, detailYStep,
                                          pDetailXYDst, detailXYStep,
                                          dstRoiSize, pSpec, pBuffer);
    if (forward != 0)
        qDebug() << "something failed in forward Xform";

    // Initialize inverse specs
    ippiWTInvInitAlloc_32f_C1R(&pSpecInv, pTapsLow, lenLow, anchorLow, pTapsHigh, lenHigh, anchorHigh);
    ippiWTInvGetBufSize_C1R(pSpecInv, &bufSizeInv);
    pBufferInv = ippsMalloc_8u(bufSizeInv);

    // Inverse wavelet transform
    ippiWTInv_32f_C1R(pApproxDst, approxStep,
                      pDetailXDst, detailXStep,
                      pDetailYDst, detailYStep,
                      pDetailXYDst, detailXYStep,
                      roiInvSize, pDstInv, stepDstInv, pSpecInv, pBufferInv);

    ippiWTInvFree_32f_C1R(pSpecInv);
    ippiWTFwdFree_32f_C1R(pSpec);
    return pDstInv;
}
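
One thing that stands out in the function above is that the return codes of the InitAlloc and GetBufSize calls are never checked, so a failure there would only surface later as the 0x0 read inside ippiWTInv. A small checking pattern along these lines may help narrow it down (the ippCheck helper is made up for illustration):

#include <QDebug>
#include "ipp.h"

// Hypothetical helper (not part of IPP): report the first failing IPP call.
static void ippCheck(IppStatus st, const char* where)
{
    if (st != ippStsNoErr)
        qDebug() << where << "failed:" << ippGetStatusString(st);
}

// Intended usage inside haarIppi(), before trusting pSpecInv and bufSizeInv:
//   ippCheck(ippiWTInvInitAlloc_32f_C1R(&pSpecInv, pTapsLow, lenLow, anchorLow,
//                                       pTapsHigh, lenHigh, anchorHigh), "WTInvInitAlloc");
//   ippCheck(ippiWTInvGetBufSize_C1R(pSpecInv, &bufSizeInv), "WTInvGetBufSize");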

NewBie q: What happened to BigNumber?


I am a newbie, so chances are I am overlooking something.

I installed the IPP library because I wanted to do some big-number arithmetic. The "Cryptography for Intel IPP Reference Manual" that came with it lists the BigNumber class.

However, I don't see any header file declaring or even mentioning it anywhere in the install folder. I double-checked the installer to see whether I had missed selecting an option that would have installed it, but no. Nor do I see any samples for it...

Where can I find it? What am I missing?

thx,

Jayant
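
For what it's worth, the BigNumber class in the reference manual appears to be a thin C++ example class shipped with the cryptography samples; the functions underneath it are the IppsBigNumState API declared in ippcp.h. A minimal sketch of that underlying C API, assuming the cryptography add-on (ippcp) is installed and linked:

#include "ipp.h"
#include "ippcp.h"

int main()
{
    // IppsBigNumState is the C-level big-number object; storage is
    // allocated by the caller and sized via ippsBigNumGetSize().
    const int len = 4;                  // operand length, in 32-bit words
    int size = 0, sizeR = 0;
    ippsBigNumGetSize(len, &size);
    ippsBigNumGetSize(len + 1, &sizeR); // result may carry into one extra word

    IppsBigNumState* a = (IppsBigNumState*)ippsMalloc_8u(size);
    IppsBigNumState* b = (IppsBigNumState*)ippsMalloc_8u(size);
    IppsBigNumState* r = (IppsBigNumState*)ippsMalloc_8u(sizeR);
    ippsBigNumInit(len, a);
    ippsBigNumInit(len, b);
    ippsBigNumInit(len + 1, r);

    Ipp32u aData[4] = { 0x89ABCDEF, 0x01234567, 0, 0 };
    Ipp32u bData[4] = { 1, 0, 0, 0 };
    ippsSet_BN(IppsBigNumPOS, len, aData, a);
    ippsSet_BN(IppsBigNumPOS, len, bData, b);

    ippsAdd_BN(a, b, r);                // r = a + b

    ippsFree(r); ippsFree(b); ippsFree(a);
    return 0;
}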

Can't build UIC samples


Hi all!

I downloaded the IPP samples for IPP 7.1 from the Intel web page. I also installed CMake 2.8.10.2 and ActivePerl 5.16, and I tried to build the samples according to the guide from Intel. I also added the paths to the IPP *.h and *.lib files to the PATH variable. But when I started the build (perl build.pl --cmake=uic,ia32,vc2010,d,mt,release), I got the error message: "Intel(R) IPP was not found".

Maybe the build procedure requires some additional settings?

IPP and TBB


Hi,

I am wondering whether IPP (7.1) in general, and ippiWarpAffine* in particular, takes advantage of TBB's parallel_for, and if so, how to enable it. When I enabled TBB in OpenCV I got a significant speed boost from warpAffine().

My test images (CT medical images) are 512x512 (8u) and I am using CUBIC interpolation with a destination size of 1590x820. OpenCV (with TBB) is more than 3 times faster than IPP for exactly the same affine transform. It is worth mentioning that in both cases (IPP and OpenCV) I am using Java wrappers under Linux (RH6) 64-bit. For IPP I compiled the Java language support (from IPP 7.0.7) against 7.1 and I am using jipp.ip.ippiWarpAffine_8u_C1R(). From OpenCV I am using Imgproc.warpAffine().

Any ideas? Please note that I am new to IPP and TBB and I am evaluating different products in order to find a good basis for a rendering library (64-bit: Win7, Linux, Mac). From Intel I downloaded Intel C++ Composer XE 2013, which bundles IPP and TBB along with MKL and Intel's compiler, and it seems a nice fit for us so far.

Thank You,

Dacian
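
As a rough sketch only (not code from this thread): assuming the legacy ippiWarpAffine_8u_C1R signature from IPP 7.x and that tiling the destination is acceptable, one way to involve TBB is to split the destination ROI into horizontal bands and warp each band inside a tbb::parallel_for. Each task writes a disjoint band, so the calls can run concurrently; the band height is a tuning knob.

#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
#include "ipp.h"

// Warp an 8u single-channel image, splitting the destination into
// horizontal bands and processing the bands in parallel with TBB.
void warpAffineTiled(const Ipp8u* pSrc, IppiSize srcSize, int srcStep,
                     Ipp8u* pDst, IppiSize dstSize, int dstStep,
                     const double coeffs[2][3], int interpolation)
{
    IppiRect srcRoi = { 0, 0, srcSize.width, srcSize.height };
    const int band = 64;  // rows per task; tune for the target machine

    tbb::parallel_for(tbb::blocked_range<int>(0, dstSize.height, band),
        [&](const tbb::blocked_range<int>& r) {
            // Each task warps only its own band of the destination ROI;
            // pDst still points at the destination image origin.
            IppiRect dstRoi = { 0, r.begin(), dstSize.width, r.end() - r.begin() };
            ippiWarpAffine_8u_C1R(pSrc, srcSize, srcStep, srcRoi,
                                  pDst, dstStep, dstRoi,
                                  coeffs, interpolation);
        });
}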

Help about YUV420 to RGBA


Hi all,

I have a decoded frame in YUV420 format. I need to convert it to BGRA or RGBA format. I found the ippiYCbCr420ToBGR_8u_P3C4R API and gave it a try but failed. Could you please help me find the mistake?

int YUV420_TO_RGB32_IPP(unsigned char *apImage, unsigned char *pY, unsigned char *pU,
                        unsigned char *pV, unsigned int nWidth, unsigned int nHeight)
{
    const Ipp8u *pYUV[3];
    int srcStep[3];
    IppiSize roiSize;

    /* Planar 4:2:0 source: full-resolution Y plane, half-resolution U and V planes */
    pYUV[0] = pY;
    pYUV[1] = pU;
    pYUV[2] = pV;

    srcStep[0] = nWidth;
    srcStep[1] = nWidth / 2;
    srcStep[2] = nWidth / 2;

    roiSize.width  = nWidth;
    roiSize.height = nHeight;

    /* Output is interleaved B,G,R,A; aval fills the alpha channel */
    Ipp8u aval = 255;
    ippiYCbCr420ToBGR_8u_P3C4R(pYUV, srcStep, apImage, nWidth * 4, roiSize, aval);
    return 0;
}

Here is the image I am supposed to get: https://www.dropbox.com/s/5bgrm4j1cy3v6wq/1.png
Here is the image I am actually getting: https://www.dropbox.com/s/86crcvyijz7vd2y/2.png

Thanks.
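
If the output mainly looks like red and blue are swapped, keep in mind that ippiYCbCr420ToBGR_8u_P3C4R writes B,G,R,A order. A possible follow-up step, sketched under the assumption that the consumer really needs R,G,B,A and that the C4R flavour of ippiSwapChannels is available in this IPP version (please verify against ippi.h), is to reorder the channels afterwards:

#include "ipp.h"

/* Hypothetical helper: reorder a BGRA buffer produced by the conversion
 * above into RGBA. dstOrder maps destination channel i to source channel
 * dstOrder[i]: R<-2, G<-1, B<-0, A<-3. */
int bgraToRgba(const Ipp8u* pBgra, Ipp8u* pRgba, int width, int height)
{
    IppiSize roi = { width, height };
    const int dstOrder[4] = { 2, 1, 0, 3 };
    IppStatus st = ippiSwapChannels_8u_C4R(pBgra, width * 4, pRgba, width * 4,
                                           roi, dstOrder);
    return (st == ippStsNoErr) ? 0 : -1;
}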

newb where are filter functions?

how to compile DLL for Java application using Visual C++ 2012


Based on the IPP Java sample, I generated a C header. I want to compile the C file and the C header in Visual C++.

But it gives a lot of errors, most of them type mismatches; even the jclass type is not recognized.

Can anyone help me compile DLLs for a Java application?

Attachment: cipp.zip (2.86 MB)
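
Type-mismatch errors around jclass usually point at a missing jni.h include path or missing extern "C" linkage when the generated stubs are compiled as C++. A minimal, self-contained sketch of a JNI wrapper around an IPP call; the Java method name used here (getIppVersion on the jipp.core class) is made up for illustration:

#include <jni.h>
#include "ipp.h"

/* Hypothetical native method; the Java-side declaration would be
 *   public static native String getIppVersion();
 * and the exported name must match the package/class/method exactly. */
extern "C" JNIEXPORT jstring JNICALL
Java_jipp_core_getIppVersion(JNIEnv* env, jclass cls)
{
    (void)cls;  /* unused: static native method */
    const IppLibraryVersion* v = ippsGetLibVersion();
    return env->NewStringUTF(v->Version);
}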

Why is IPP DMIP slower than Concurrency::parallel_for?


I downloaded the Intel IPP DMIP sample (ipp-samples.7.1.1.013) and built the application\dmip_bench\ utility against IPP v7.1.1. It showed a significant performance boost of the DMIP flavor over the plain IPP flavor.

I then refactored the ModifyBrightness::DoIPP method to simply process the image by rows and parallelized that processing with Concurrency::parallel_for. Then I rebuilt the solution with both the _IPP_SEQUENTIAL_STATIC and _IPP_PARALLEL_DYNAMIC macros. The results were unexpected.

With _IPP_SEQUENTIAL_STATIC:

DMIP 1.5 Jul 12 2012
ippIP SSSE3 (v8) 7.1.1 (r37466) Sep 24 2012
ippCV SSSE3 (v8) 7.1.1 (r37466) Sep 24 2012
ippCC SSSE3 (v8) 7.1.1 (r37466) Sep 25 2012
Number of threads: 2
DMIP Modify Brightness example time 3.16375 msec slice 34
IPP Modify Brightness example time 1.85974 msec slice 467
Close the session

With _IPP_PARALLEL_DYNAMIC:

DMIP 1.5 Jul 12 2012
ippIP SSSE3 (v8) 7.1.1 (r37466) Sep 27 2012
ippCV SSSE3 (v8) 7.1.1 (r37466) Sep 27 2012
ippCC SSSE3 (v8) 7.1.1 (r37466) Sep 28 2012
Number of threads: 2
DMIP Modify Brightness example time 2.34378 msec slice 34
IPP Modify Brightness example time 6.75662 msec slice 467
Close the session

As you can see, the manually parallelized version works better than DMIP. Why?

I used Visual Studio 2010 for compilation, under Windows 7 x64, with the solution configured for x86. I have an Intel E6550 processor and used an RGB 1200x467 image.

I attached the modified sample, with compiled executables and output logs.

Attachment: dmip-bench.zip (751.65 KB)
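
For reference, a rough sketch of the kind of row-parallel brightness change described above, assuming a PPL Concurrency::parallel_for loop over horizontal bands and the ippiAddC_8u_C3RSfs primitive; this is not the code from the attached sample:

#include <ppl.h>
#include "ipp.h"

// Add a constant brightness offset to an 8u C3 image, one horizontal band
// per PPL task.
void brightenParallel(const Ipp8u* pSrc, int srcStep,
                      Ipp8u* pDst, int dstStep,
                      IppiSize size, Ipp8u offset)
{
    const Ipp8u value[3] = { offset, offset, offset };
    const int band = 32;                       // rows per task
    const int tasks = (size.height + band - 1) / band;

    Concurrency::parallel_for(0, tasks, [&](int t) {
        int y0   = t * band;
        int rows = (y0 + band <= size.height) ? band : size.height - y0;
        IppiSize roi = { size.width, rows };
        ippiAddC_8u_C3RSfs(pSrc + y0 * srcStep, srcStep, value,
                           pDst + y0 * dstStep, dstStep, roi, 0);
    });
}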

FIR filter with non-contiguous (strided) input?


Are the FIR filters limited to unit-stride (contiguous) inputs?
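
The ippsFIR* functions expect contiguous input, so one workaround, sketched here under the assumption that the strided data is one channel of an interleaved 32f buffer and that a FIR state has already been set up (e.g. with ippsFIRInitAlloc_32f), is to gather the channel into a contiguous scratch vector with ippsSampleDown_32f and filter that:

#include "ipp.h"

// Filter one channel out of an interleaved (strided) 32f buffer.
// pState is an already-initialized single-rate FIR state;
// scratch must hold at least srcLen/stride samples. Sketch only.
IppStatus firStrided(const Ipp32f* pSrcInterleaved, int srcLen, int stride,
                     Ipp32f* scratch, Ipp32f* pDst, IppsFIRState_32f* pState)
{
    // Gather every 'stride'-th sample into a contiguous buffer.
    int gathered = 0;
    int phase = 0;                       // start at the first sample of the channel
    IppStatus st = ippsSampleDown_32f(pSrcInterleaved, srcLen,
                                      scratch, &gathered, stride, &phase);
    if (st != ippStsNoErr)
        return st;

    // The contiguous copy can now go through the ordinary FIR path.
    return ippsFIR_32f(scratch, pDst, gathered, pState);
}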

Multirate Stream FIR


1. What's the difference between a multirate FIR filter and a multirate stream FIR filter?

2. I need a multirate filter that can process an input of any length at a time and pick up where it left off when called again. For a decimating filter, that means it must maintain the decimation phase between calls. It also means the number of samples produced by each call is not fixed (it is only fixed if the input length is divisible by the decimation ratio).

Does IPP have anything I can use?
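
As a starting point only: the classic state-based multirate API appears to keep its delay line inside the state, so successive calls continue from where the previous one stopped. A sketch assuming the IPP 7.x ippsFIRMRInitAlloc_32f / ippsFIR_32f interface; note that each call still consumes a whole number of decimation blocks, so whether this satisfies the any-length requirement above needs to be checked against the manual:

#include "ipp.h"

// Decimate-by-4 streaming filter: each call consumes blocks*4 input samples
// and produces blocks output samples; the state carries the filter history
// (and therefore the phase) across calls. Sketch under the assumed 7.x API.
void decimateStream(const Ipp32f* taps, int tapsLen,
                    const Ipp32f* chunk1, const Ipp32f* chunk2,
                    Ipp32f* out1, Ipp32f* out2, int blocksPerChunk)
{
    const int down = 4;
    IppsFIRState_32f* pState = 0;
    ippsFIRMRInitAlloc_32f(&pState, taps, tapsLen,
                           1 /*up*/, 0 /*upPhase*/,
                           down, 0 /*downPhase*/, 0 /*no initial delay line*/);

    // First chunk ...
    ippsFIR_32f(chunk1, out1, blocksPerChunk, pState);
    // ... second chunk continues seamlessly from the first.
    ippsFIR_32f(chunk2, out2, blocksPerChunk, pState);

    // Release pState with the matching FIR free routine when done.
}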

ippsDFTInv_CToC_64f No Longer Supported?


I have the following working code:

IppsDFTSpec_C_64f *spec;
ippsDFTInitAlloc_C_64f(&spec, N, IPP_FFT_NODIV_BY_ANY, ippAlgHintAccurate);
ippsDFTInv_CToC_64f(pRealData, pImagData, pRealData, pImagData, spec, NULL);

Yet the newer compilers (version 13.0.1 in my case) complain that the InitAlloc is deprecated. I tried to switch to the newly recommended way, but there seems to be no ippsDFTInit_C_64f function. There is an ippsDFTInit_C_64fc version (note the fc rather than f at the end), but to use it I have to introduce two data copies to make it work:

int specSize, initSize, workSize;
IppStatus status = ippsDFTGetSize_C_64fc(N, IPP_FFT_NODIV_BY_ANY, ippAlgHintAccurate, &specSize, &initSize, &workSize);
IppsDFTSpec_C_64fc *fftSpec = (IppsDFTSpec_C_64fc*)ippsMalloc_8u(specSize);
Ipp8u* pInitBuf = ippsMalloc_8u(initSize);
status = ippsDFTInit_C_64fc(N, IPP_FFT_NODIV_BY_ANY, ippAlgHintAccurate, fftSpec, pInitBuf);
if (pInitBuf) ippsFree(pInitBuf);
Ipp8u* pWorkBuf = ippsMalloc_8u(workSize);
Ipp64fc* pComplexData = ippsMalloc_64fc(N);
for (int i=0; i<N; i++) {
    pComplexData[i].re = pRealData[i];
    pComplexData[i].im = pImagData[i];
}
ippsDFTInv_CToC_64fc(pComplexData, pComplexData, fftSpec, pWorkBuf);
for (EcUint4 i=0; i<N; i++) {
    pRealData[i] = pComplexData[i].re;
    pImagData[i] = pComplexData[i].im;
}
ippsFree(pComplexData);

Is there any way to use ippsDFTInv_CToC_64f?  If not, it seems to me the better option is to stick with the deprecated code to avoid the data copying (and the code bloat).
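
If the interleaved 64fc path turns out to be the only non-deprecated option, the two hand-written pack/unpack loops above can at least be replaced with the corresponding primitives, which keeps the copies on the optimized path; a sketch assuming ippsRealToCplx_64f and ippsCplxToReal_64fc and reusing the variables from the snippet above:

// Pack the split re/im arrays into the interleaved buffer before the DFT ...
ippsRealToCplx_64f(pRealData, pImagData, pComplexData, N);

ippsDFTInv_CToC_64fc(pComplexData, pComplexData, fftSpec, pWorkBuf);

// ... and unpack the result back into the split arrays afterwards.
ippsCplxToReal_64fc(pComplexData, pRealData, pImagData, N);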

Log Sum functions are deprecated - what is the replacement?

IPP_OpenCV2.1_Sample.zip missing?


I want to take a look at it, but it is no longer there. Any update?

Java JNI Can't find dependent libraries


I want to call IPP functions via JNI, so I compiled DLLs using C++ and put them into my Java project's lib folder.

But when I call an IPP function, jipp.core.ippGetCpuType(), I get an error like:

Exception in thread "main" java.lang.UnsatisfiedLinkError: D:\DevelopJava\MyIPP\lib\jippcore.dll: Can't find dependent libraries
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary1(Unknown Source)
at java.lang.ClassLoader.loadLibrary0(Unknown Source)
at java.lang.ClassLoader.loadLibrary(Unknown Source)
at java.lang.Runtime.loadLibrary0(Unknown Source)
at java.lang.System.loadLibrary(Unknown Source)
at jipp.core.<clinit>(core.java:42)
at client.zero.main(zero.java:16)

But I already set java.library.path to include the Intel paths:

\Program Files (x86)\Intel\Composer XE 2013\redist\intel64\ipp

D:\Program Files (x86)\Intel\Composer XE 2013\redist\intel64\compiler

Can anyone help me?


Announcing v0.5.0 of the Math eXtension Library (MXLib) - A C++ Wrapper for IPP


I'm pleased to announce a Beta version of MXLib has been posted to Sourceforge at:

https://sourceforge.net/projects/mxlib/

MXLib is a C++ wrapper around the Intel® Integrated Performance Primitives (IPP) library. The idea is to provide scientists, engineers, researchers, and other non-full-time programmers with an easy-to-use, high-performance library of functions for scientific programming. MXLib provides the following:

  • Most functions can be accessed with either a C++-style call or a MatLab®-like call. The MatLab®-like function calls make it easy to port research code from MatLab® to C++. In fact, for many projects, you can copy your MatLab® code, paste it into your C++ editor, make a few changes (adding object instantiations, working around certain MatLab® syntax structures that aren't supported by C++, etc.) and you're done!

  • Automatically handles memory allocation/de-allocation and ensures that the IPP functions are called properly. Memory errors and calling IPP functions on data sets with mismatched sizes are common sources of problems when using IPP directly.

  • Greatly extends the functionality of IPP. Adds support for 64-bit data types (Ipp64s, Ipp64sc, Ipp64f, Ipp64fc) for matrices/images and also adds functionality found in MatLab® but not found in IPP.

  • Nearly seamless integration of complex and real data types into the same template objects, greatly decreasing the complexity of the code.

  • Two basic object types: Vectors and Matrices. For some projects most of your calculations are done with vectors (signal processing); others require more work with matrices. Functions for moving data between the two object types are included.

  • Functions for moving data between your C++ project and MatLab® are included.

  • Formatted console Print() functions to make it easier to examine your data.

  • Makes it easy to access IPP functions that have not yet been integrated into MXLib. Let MXLib handle the memory allocation/de-allocation for you then access the data inside the MXLib objects when you need to make your own call to an IPP function.

  • Makes it much easier to work with complex data types in IPP by adding operators, abs(), angle(), set to zero, etc. functions.

  • Uses exception handling that automatically prints to the console where the exception or IPP error occurred and the chain of functions that led to the error, similar to how errors are displayed in MatLab®.

New features in this release:

  • Integration of FreeImage into MXLib. New functions ImRead() and ImWrite() use the FreeImage library to load and save image files such as .bmp, .jpg, .tiff, etc. They integrate seamlessly with the MXMatrix class and will by default flip the images and reorder the color channels to match how the data would be structured after loading the image file in MatLab®. They handle just about anything that FreeImage does except for greyscale images with alpha channels. A prebuilt 64-bit library and DLL are included for Windows, where building FreeImage can sometimes be troublesome; Linux developers will have to use or build their own. To learn more about FreeImage, go to: http://freeimage.sourceforge.com

  • New simple TIC-TOC style timers for Linux and Windows.

  • Some bug fixes.

What MXLib does not do (or doesn’t do yet):

  • Although MatLab® is not necessary to use MXLib, MXLib does not completely replace MatLab®! MatLab® is a wonderful tool. Use it. Buy it. The MatLab® Compiler is also a good tool for certain projects, but many projects can't use it or don't want to use it because of performance issues, installed library requirements, code obfuscation issues, etc.

  • It does not automatically generate C++ code like the MatLab® Coder. MatLab® Coder is an interesting product that has its uses, but anyone who has used it knows the kind of code it spits out and how hard that is to modify or integrate with other code.

  • Does not release you from the need to obtain a legitimate copy of IPP. IPP can be used free of charge under Linux for non-commercial use, and you can download a demo version for Windows. Please visit the Intel IPP website at: http://software.intel.com/en-us/articles/intel-ipp/#support and be sure to read any User Agreements and Copyright restrictions there.

  • Does not include precompiled libraries (yet!). For the time being, MXLib comes as template class code, compatible with Linux (GCC and Intel® compilers) and Windows (tested with Visual Studio 2010 Pro), that is included in your project and compiled each time. It compiles rather quickly with VS 2010, but can be a bit slow to build under Linux, especially when using the Intel compiler. This difference is due to how the different compilers handle template class code and is out of my control.

What is Coming Next

  • Support for NVidia's NPP CUDA library!  The code for this will work similarly to the current IPP-based code in MXLib and will make switching between computations on the CPU and on the GPU easy.
  • Things that people suggest.

If you have any questions or suggestions, please contact me here or in the discussions in the Sourceforge project.

Enjoy!

Working FFT convolution example?


Would someone happen to have a working example of an FFT convolution using IPP? I'm trying to set up a basic image comparison; so far the results aren't great.
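
Not a full FFT-based pipeline, but as a known-good reference point for checking results, a direct "valid" 2-D convolution of two 32f images can be done with ippiConvValid_32f_C1R (assumed to be available in this IPP version); comparing its output against the FFT path is one way to see where the latter goes wrong:

#include "ipp.h"

// Direct "valid" 2-D convolution of a 32f image with a 32f kernel image.
// Destination size is (imgW - kerW + 1) x (imgH - kerH + 1). Sketch only.
IppStatus convValid(const Ipp32f* pImg, int imgStep, IppiSize imgSize,
                    const Ipp32f* pKer, int kerStep, IppiSize kerSize,
                    Ipp32f* pDst, int dstStep)
{
    return ippiConvValid_32f_C1R(pImg, imgStep, imgSize,
                                 pKer, kerStep, kerSize,
                                 pDst, dstStep);
}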

Can't download the Intel Cryptography lib


Hi, I can't download the crypto lib for IPP from the web page; the download doesn't start! In my Product Subscription Information page I get this info:

-- Cryptography for Intel® Integrated Performance Primitives
-- Cryptography for Intel® Integrated Performance Primitives for Linux* - Version 7.1 (0.079) - 16 Aug 2012
-- Cryptography for Intel® Integrated Performance Primitives for Linux* - Version 7.1.1 (163) - 25 Mar 2013
-- Intel® C++ Composer XE for Linux*
-- Intel® C++ Composer XE for Linux* - Version 2013 (Update 3 Eng/Jpn) - 22 Mar 2013
-- Intel® Integrated Performance Primitives for Linux* - Version 7.1 (update 1) - 18 Oct 2012
-- Intel® Math Kernel Library for Linux* - Version 11.0 (Update 3) - 22 Mar 2013
-- Intel® MPSS for Linux* - Version 2013 (update 2 hotfix) - 21 Mar 2013
-- Intel® Threading Building Blocks for Linux* - Version 4.1 (Update 3) - 27 Mar 2013
-- Intel® Integrated Performance Primitives for Linux* - Version 7.1 (update 1) - 18 Oct 2012

If I select Cryptography for Intel® Integrated Performance Primitives for Linux* - Version 7.1.1 (163), this takes me to the download page, but when I click the download link, it takes me back to the My Product Subscription Information page. Does anyone know how to solve this issue?

IPPISet is slower than memset & IPPSSET


Hi,

  We tried the ippiSet ROI-based method to set portions of an image to 0.

  But this is slower than memset [on the entire image, set to 0].

  We also tried ippsSet and found it is faster than both memset and ippiSet.

  Is the ROI-based method usually slower than a non-ROI-based method like ippsSet?

  Actually, we expected that resetting only selected regions of the image to zero with ippiSet should be faster than memset or ippsSet performed on the entire image.

  But when we profiled, we found that ippiSet [called four times on four different small regions of the image] takes more time than ippsSet and memset.

  Can you please explain why/how this can happen?

  We are using IPP version 7.0.

 Thanks & Regards,

Murali
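
For what it's worth, a minimal sketch of the two approaches being compared (the ROI coordinates are made up here; ippiSet_8u_C1R per small region versus one ippsZero_8u over the whole buffer), which may make it easier to see why four tiny ROI calls can be dominated by per-call overhead while the whole-image set is a single long, easily vectorized run:

#include "ipp.h"

// Whole image: one contiguous run, essentially a memset.
void clearWholeImage(Ipp8u* pImg, int step, IppiSize size)
{
    ippsZero_8u(pImg, step * size.height);
}

// Selected regions: one ippiSet call per ROI (coordinates are hypothetical).
void clearRegions(Ipp8u* pImg, int step)
{
    const IppiRect rois[4] = { { 10, 10, 32, 32 }, { 100, 40, 32, 32 },
                               { 200, 80, 16, 16 }, { 300, 120, 16, 16 } };
    for (int i = 0; i < 4; ++i) {
        IppiSize roiSize = { rois[i].width, rois[i].height };
        ippiSet_8u_C1R(0, pImg + rois[i].y * step + rois[i].x, step, roiSize);
    }
}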

