ippsConvolve_32f takes about 5 times longer on AVX2 compared to AVX.

September 10, 2013, 1:42 am

Latest and popular articles on Intel Technologies

Hello All,

We measured the time ippsConvolve_32f takes on an i5-4402E processor and found that it takes about 5 times longer when using AVX2 than when using AVX.

We tried ippsConv_32f instead of ippsConvolve_32f and got the same results. We also tried the available convolution algorithms (ippAlgAuto, ippAlgDirect and ippAlgFFT) and found that ippAlgAuto and ippAlgDirect give the same timings as each other (under both AVX and AVX2).

With ippAlgFFT, AVX shows a small performance decrease, and AVX2 shows a performance increase compared to ippAlgAuto on AVX2, but it still takes more time than ippAlgAuto on AVX.

The times we get, in microseconds:

Algorithm                   AVX   AVX2
ippAlgAuto, ippAlgDirect      4     27
ippAlgFFT                     5      5

So there seems to be a bug in ippsConvolve_32f for ippAlgFFT when using AVX2.

AVX2 should be faster than AVX for each algorithm, but we see that for ippAlgFFT there is no improvement and for ippAlgDirect the performance decreases drastically.

Notes:

We are using static linkage (#include <ipp_h9.h> for AVX2 and <ipp_g9.h> for AVX, placed before #include <ipp.h>).
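For cross-checking measurements like the ones above independently of IPP, a plain C++ baseline can help separate a library regression from a problem in the timing setup. The following is only a sketch: it does not call IPP at all, and `convolve_direct` and `time_microseconds` are hypothetical helper names introduced here for illustration. It implements a straightforward direct (time-domain) linear convolution and averages the wall-clock time over several calls.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Reference direct linear convolution: y[n] = sum over k of a[k] * b[n - k].
// The output has a.size() + b.size() - 1 samples.
std::vector<float> convolve_direct(const std::vector<float>& a,
                                   const std::vector<float>& b) {
    assert(!a.empty() && !b.empty());
    std::vector<float> y(a.size() + b.size() - 1, 0.0f);
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = 0; j < b.size(); ++j) {
            y[i + j] += a[i] * b[j];
        }
    }
    return y;
}

// Average wall-clock time of one call to f, in microseconds,
// measured over `repeats` consecutive calls.
template <typename F>
double time_microseconds(F&& f, int repeats) {
    assert(repeats > 0);
    const auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        f();
    }
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count() / repeats;
}
```

Averaging over many repeats matters at this scale: a single call that takes a few microseconds is close to the resolution and overhead of the timer itself, so one-shot measurements can be misleading.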

Thank you,

Itzhak

↧

Texture Compression Functions IPP 8.0

September 11, 2013, 1:59 am

Hi everyone,

I downloaded the trial edition of IPP 8.0, but I cannot find the Texture Compression Functions in ippj.h.

I checked the 8.0 reference manual in case they had been renamed, but they are listed there as in version 7. I also checked the "what's new" notes, and they do not mention any change to these functions. What is the problem?

↧

ipp installation under linux

September 11, 2013, 2:28 am

After running the IPP setup, the software is installed under /opt/intel.

To add the library to my programming IDE (Qt), I added this path to the project file:

/opt/intel/composer_xe_2013_sp1.0.061/ipp/lib/ia32

Unfortunately, my program still cannot find the ipp.h include file.

Is there anything wrong with the path? How can I resolve this?

↧

IPP 8.0 problem linking threaded libraries

September 12, 2013, 2:40 am

When compiling with the IPP threaded libraries I receive the linker error message:
"LINK : fatal error LNK1104: cannot open file 'threaded/ippcoremt.lib'"
I have installed 'Intel C++ Studio XE 2013 SP1' together with Visual Studio 2010. The installation of the IPP threaded libraries was activated and they are completely present in the specific 'threaded' directory.
On my system the installation path is: C:\Program Files (x86)\Intel\Composer XE 2013 SP1\ipp\lib\ia32\threaded and C:\Program Files (x86)\Intel\Composer XE 2013 SP1\ipp\lib\intel64\threaded.
The Visual Studio Library Path setting is: $(ICLibDir);$(LibraryPath);$(IPPLibDir).
$(IPPLibDir) points to C:\Program Files (x86)\Intel\Composer XE 2013 SP1\ipp\lib\ia32
So all the paths are set correctly.
The header file 'ippcore.h' contains '#pragma comment( lib, "threaded/ippcoremt" )', which normally binds the correct threaded library 'ippcoremt.lib'.

This suggests that the compiler and/or linker does not handle a path with a subdirectory correctly inside a '#pragma comment(lib, ...)'.
Does anybody have a solution?

↧

I cannot decode an MP4A-LATM stream using UMC::AACDecode.

September 13, 2013, 4:26 am

I use the Live555 library for receiving RTSP streams. Many RTSP streams contain an audio stream described by SDP lines such as:
m=audio 0 RTP/AVP 96
a=rtpmap:96 MP4A-LATM/48000/2
a=fmtp:96 profile-level-id=15;object=2;cpresent=0;config=400023203FC0
a=control:trackID=1

From Live555 I receive the full packet including the LATM header (2 bytes). I cannot configure UMC::AACDecode to decode these packets; the decoder always returns UMC_ERR_UNSUPPORTED.

On the Live555 mailing list I found http://lists.live555.com/pipermail/live-devel/2006-May/004356.html
I tried passing the decoder packets without the LATM header, but that does not work either.
What must I do to use UMC::AACDecode with this type of stream?
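As a hedged illustration of what the leading "header" bytes may be: with cpresent=0 the stream configuration travels out of band in the config= string, and, assuming the RFC 3016 layout with frameLengthType 0 and a single subframe, each RTP payload starts with a length prefix made of zero or more 0xFF bytes followed by one byte below 0xFF, whose sum is the AAC frame length. The sketch below parses only that prefix. `LatmPayloadInfo` and `parse_latm_length` are hypothetical names, and nothing here is a UMC API; whether UMC::AACDecode additionally needs the AudioSpecificConfig carried in config= is not established by this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Result of parsing the length prefix of one LATM payload (hypothetical type).
struct LatmPayloadInfo {
    std::size_t header_bytes;   // bytes consumed by the length prefix
    std::size_t payload_bytes;  // declared size of the AAC frame that follows
    bool valid;                 // true if the buffer holds the whole frame
};

// Assumption: frameLengthType 0, one subframe, one program, one layer.
// The prefix is a run of 0xFF bytes terminated by a byte < 0xFF;
// the frame length is the sum of all prefix bytes.
LatmPayloadInfo parse_latm_length(const std::uint8_t* buf, std::size_t size) {
    assert(buf != nullptr || size == 0);
    LatmPayloadInfo info = {0, 0, false};
    std::size_t pos = 0;
    while (pos < size) {
        const std::uint8_t b = buf[pos++];
        info.payload_bytes += b;
        if (b != 0xFF) {
            info.header_bytes = pos;
            info.valid = (size - pos) >= info.payload_bytes;
            return info;
        }
    }
    return info;  // buffer ended before the prefix was terminated
}
```

Under these assumptions a 2-byte header simply means the frame is between 255 and 509 bytes long, so the number of bytes to skip is not constant and should be derived per packet rather than hard-coded.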

↧

Integral image for Ipp32f image

September 15, 2013, 5:10 am

Is there any way to compute an integral image for an image with pixels of type Ipp32f (i.e. float) using Intel IPP?
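For reference, and without claiming anything about which IPP functions cover this case, an integral image (summed-area table) over float pixels can be written directly in C++. The sketch below is a plain implementation, not an IPP call; `integral_image` and `rect_sum` are hypothetical names, the row stride is given in floats rather than bytes, and sums are accumulated in double to limit rounding error on large images.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds a summed-area table with an extra leading row and column of zeros:
// table[y * (width + 1) + x] = sum of src over rows [0, y) and columns [0, x).
// `stride` is the distance between source rows in floats (hypothetical
// parameter; IPP image steps are expressed in bytes instead).
std::vector<double> integral_image(const float* src, int width, int height,
                                   std::size_t stride) {
    assert(src != nullptr && width > 0 && height > 0);
    assert(stride >= static_cast<std::size_t>(width));
    const std::size_t tw = static_cast<std::size_t>(width) + 1;
    std::vector<double> table(tw * (static_cast<std::size_t>(height) + 1), 0.0);
    for (int y = 0; y < height; ++y) {
        double row_sum = 0.0;  // running sum of the current source row
        for (int x = 0; x < width; ++x) {
            row_sum += src[y * stride + x];
            table[(y + 1) * tw + (x + 1)] = table[y * tw + (x + 1)] + row_sum;
        }
    }
    return table;
}

// Sum of the pixels in the rectangle [x0, x1) x [y0, y1), in four lookups.
double rect_sum(const std::vector<double>& table, int width,
                int x0, int y0, int x1, int y1) {
    const std::size_t tw = static_cast<std::size_t>(width) + 1;
    return table[y1 * tw + x1] - table[y0 * tw + x1]
         - table[y1 * tw + x0] + table[y0 * tw + x0];
}
```

The extra zero row and column avoid special cases at the image border, at the cost of a table that is one element larger in each dimension than the source image.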

↧

IPP 7.1 update

September 16, 2013, 3:37 pm

I'm trying to find out whether there is more than one update for IPP 7.1. If so, can you tell me where to find them? Thanks.

↧

VS2012 support for IPP 7.0 or 7.1

September 17, 2013, 6:47 am

Hello,

My software project references the IPP libraries, and I would now like to move it to Visual Studio 2012. This means that the IPP libraries have to be built with VS2012, too. Could you please provide such libraries and include files? Thanks.

Best regards,

Vitali

↧

MulScale very slow

September 18, 2013, 12:57 pm

Hi all,

I found that some functions are very slow. For example, ippiMulScale_8u_C3IR takes up to 4 ms on a 1920x1080 image on an Intel Core i5 CPU with 8 GB RAM (I'm using IPP 7.0, statically linked).
Any suggestions?

↧

ippsFIR_Direct

September 20, 2013, 12:08 am

I'm looking at using IPP to optimize a 1024-point FIR filter. For my application I need to update the coefficients every sample, so that the coefficients are smoothly interpolated between different coefficient sets. The function ippsFIR_Direct_32f seems to do what I need, but it's deprecated in IPP v8, so I'd rather use something else if there is a better option.

The other FIR functions, such as ippsFIRSR_32f, look like they require a separate initialization of the coefficient buffers (using ippsFIRSRInit), which I don't think will work for me: I need to update the coefficients within the audio processing interrupt, so I can't have any memory allocation or costly copying of data.
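To make the requirement concrete, here is a minimal non-IPP sketch of a direct-form FIR whose effective coefficients are a per-sample linear blend of two coefficient sets. `CrossfadeFir` is a hypothetical class introduced for illustration. Because filtering is linear, filtering with (1 - t) * a + t * b gives the same result as blending the outputs of the two filters, so no interpolated coefficient array ever has to be built or copied. The only allocation happens in the constructor, outside the audio callback.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct-form FIR with per-sample crossfade between two coefficient sets.
class CrossfadeFir {
public:
    explicit CrossfadeFir(std::size_t taps) : history_(taps, 0.0f), pos_(0) {
        assert(taps > 0);
    }

    // Processes one input sample. `a` and `b` each point to `taps`
    // coefficients; `t` in [0, 1] is the blend weight for this sample.
    // No memory is allocated or copied here.
    float process(float x, const float* a, const float* b, float t) {
        const std::size_t n = history_.size();
        history_[pos_] = x;  // newest sample goes into the circular buffer
        float ya = 0.0f;
        float yb = 0.0f;
        std::size_t idx = pos_;
        for (std::size_t k = 0; k < n; ++k) {
            // history_[idx] holds x[n - k]
            ya += a[k] * history_[idx];
            yb += b[k] * history_[idx];
            idx = (idx == 0) ? n - 1 : idx - 1;
        }
        pos_ = (pos_ + 1) % n;
        return (1.0f - t) * ya + t * yb;
    }

private:
    std::vector<float> history_;  // circular delay line, sized once
    std::size_t pos_;             // write position of the next sample
};
```

The trade-off is two multiply-accumulate passes per sample instead of one, which doubles the arithmetic for a 1024-tap filter; an optimized library routine would be expected to do better, so this is a functional reference rather than a performance target.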

Any suggestions?

Thanks.

↧

Asynchronous C/C++ GPU optimization

September 21, 2013, 6:57 am

Hi!

If I need to handle a fairly large image (for example, 4K), would it be efficient to split the input image into tiles (with the tile size based on the number of GMA cores, to get full GPU utilization) and execute the whole sequence of image processing operations tile by tile, e.g. compute Sobel on tile 0, then on tile 1, and so on?
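The tiling arithmetic itself can be sketched as follows. This only shows how an image might be partitioned. It says nothing about whether tiling is efficient on a given GPU, and `Tile` and `split_into_tiles` are hypothetical names. Tiles on the right and bottom edges are clipped to the image size.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One rectangular region of the image, in pixels.
struct Tile {
    int x;
    int y;
    int width;
    int height;
};

// Splits an image_w x image_h image into tiles of at most tile_w x tile_h,
// in row-major order. Edge tiles are clipped so they stay inside the image.
std::vector<Tile> split_into_tiles(int image_w, int image_h,
                                   int tile_w, int tile_h) {
    assert(image_w > 0 && image_h > 0 && tile_w > 0 && tile_h > 0);
    std::vector<Tile> tiles;
    for (int y = 0; y < image_h; y += tile_h) {
        for (int x = 0; x < image_w; x += tile_w) {
            tiles.push_back(Tile{x, y,
                                 std::min(tile_w, image_w - x),
                                 std::min(tile_h, image_h - y)});
        }
    }
    return tiles;
}
```

For neighborhood operations such as a 3x3 Sobel, each tile also needs a one-pixel border of source pixels from its neighbors to produce correct results at tile edges, so the regions that are read would overlap even though the regions that are written do not.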

↧

Using Intel IPP with pjsip (PJMEDIA)

September 22, 2013, 8:24 am

I want to use Intel IPP with pjsip to provide support for the G.729 audio codec. I'm working on OS X 10.8.5. I found a tutorial here, but I think it is outdated because the link in step 2 is broken, and the IPP sample has no readme.htm file whatsoever. The steps are quite complex, so I have no idea whether they would still work for the current version of Intel IPP (8.0). Do you have a new tutorial posted somewhere?

↧

Porting Ipp6 to Ipp8

September 23, 2013, 7:48 pm

Hi all,

I am new to IPP and have little experience with image processing. I need to port old code that uses IPP 6 to our new project, and I need some help with the ippiResize_8u_C1R function (IPPI_INTER_CUBIC interpolation). What is the most correct way to port it if I want to get exactly the same pixel values after the resize? In the new IPP resize API I see completely new terminology such as "spec" and "external buffer", and in the function ippiResizeCubicInit_8u I need to define A, B, C, and I don't have a clue what their values were in the original function.

Thanks, Pavel

↧

[H.264] lossless mode ?

September 26, 2013, 12:42 am

Dear all,

Please tell me how to encode in H.264 lossless mode.

According to the UMC manual, the following parameters are related.

profile_idc = HIGH444 (244?)
rate_controls.method = H264_RCM_QUANT? (Constant quantization parameters)
rate_controls.quantI = quantP = quantB = - 6 * (bit_depth_luma - 8)
qpprime_y_zero_transform_bypass_flag = true

Anything else?

I tried the above parameters, but lossless mode did not work (near-lossless mode?).

I've tried with IPP 7.0.

Thanks.

↧

Copying cv::Mat to Ipp

September 26, 2013, 11:21 am

I was wondering if anyone has quick example code for this; I couldn't find any examples in this forum or on OpenCV's website. If not, here are the points I am most confused about.

OpenCV images are interleaved, in BGR format. IPP expects planar images, though (as per the malloc definition). For now I think I'll avoid this and only convert grayscale images, but this is useful for future work and questions.
The OpenCV widthstep is no longer 32-bit aligned, so my plan is to copy row by row from a cv::Mat into IPP-allocated memory. I'm assuming that the OpenCV widthstep < IPP widthstep.
Is there anything else I should worry about?
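The row-by-row transfer described above can be sketched without depending on either library. `copy_rows` is a hypothetical helper; for a cv::Mat the source pointer, step and row size would come from `mat.data`, `mat.step` and `mat.cols * mat.elemSize()`, and the destination step would be the value reported by the IPP image allocator. The sketch assumes both steps are in bytes and that each is at least as large as the row payload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies `rows` rows of `row_bytes` bytes each between two buffers whose
// rows are spaced src_step and dst_step bytes apart. Padding bytes at the
// end of each destination row are left untouched.
void copy_rows(const std::uint8_t* src, std::size_t src_step,
               std::uint8_t* dst, std::size_t dst_step,
               std::size_t row_bytes, int rows) {
    assert(src != nullptr && dst != nullptr && rows >= 0);
    assert(row_bytes <= src_step && row_bytes <= dst_step);
    for (int r = 0; r < rows; ++r) {
        std::memcpy(dst + static_cast<std::size_t>(r) * dst_step,
                    src + static_cast<std::size_t>(r) * src_step,
                    row_bytes);
    }
}
```

Copying only `row_bytes` per row means the relation between the two steps does not matter, so the assumption that one step is smaller than the other is not needed for correctness.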

Thank you in advance!

Constantin

↧

how to decode g729b stream?

September 27, 2013, 12:37 am

Hi everybody!

I need to decode a G.729B stream from RTP, but I do not know how to do this. I tried USC_G729A_Fxns and USC_G729I_Fxns, and even USC_G7291_Fxns, but none of them works correctly. Can anybody help me? Thanks.

↧

UIC Multithreading

October 31, 2013, 4:05 pm

Hi All,

I've been looking into JPEG decoding performance via UIC. I've tried:

1) ippSetNumThreads() called with 1, 4

2) SetThreadingMode() called with JT_OLD, JT_RSTI

3) SetNOfThreads() called with 1,4,16

4) 8-bit jpegs with and without restart intervals

None of the combinations of these settings produces a measurable difference in the execution time for decoding a single JPEG image. The UIC we are using has been built with MS Visual Studio. We are running IPP version 7.0.6.278.

Any guidance on how to achieve the speedups as described in http://software.intel.com/en-us/articles/jpeg-new-threading-model-in-ipp would be much appreciated.

Thanks,

Michal

↧

Convert FFT to IPP

November 1, 2013, 1:05 am

I want to convert a C/C++ routine to IPP, but the output is not right.

[cpp]
#include <math.h>

/* Assumed definition; TWOPI is not shown in the original post. */
#define TWOPI 6.283185307179586

/* data[] is 1-based: data[1..2*nn] holds nn complex values as (re, im) pairs. */
void four1(double data[], int nn, int isign)
{
    int n, mmax, m, j, istep, i;
    double wtemp, wr, wpr, wpi, wi, theta;
    double tempr, tempi;

    /* Bit-reversal reordering */
    n = nn << 1;
    j = 1;
    for (i = 1; i < n; i += 2) {
        if (j > i) {
            tempr = data[j]; data[j] = data[i]; data[i] = tempr;
            tempr = data[j+1]; data[j+1] = data[i+1]; data[i+1] = tempr;
        }
        m = n >> 1;
        while (m >= 2 && j > m) {
            j -= m;
            m >>= 1;
        }
        j += m;
    }

    /* Butterfly stages */
    mmax = 2;
    while (n > mmax) {
        istep = 2*mmax;
        theta = TWOPI/(isign*mmax);
        wtemp = sin(0.5*theta);
        wpr = -2.0*wtemp*wtemp;
        wpi = sin(theta);
        wr = 1.0;
        wi = 0.0;
        for (m = 1; m < mmax; m += 2) {
            for (i = m; i <= n; i += istep) {
                j = i + mmax;
                tempr = wr*data[j] - wi*data[j+1];
                tempi = wr*data[j+1] + wi*data[j];
                data[j] = data[i] - tempr;
                data[j+1] = data[i+1] - tempi;
                data[i] += tempr;
                data[i+1] += tempi;
            }
            wr = (wtemp = wr)*wpr - wi*wpi + wr;
            wi = wi*wpr + wtemp*wpi + wi;
        }
        mmax = istep;
    }
}
[/cpp]

I converted it to the source below, but the output data is not right.

[cpp]
#include <ipp.h>

// Assumes nn is a power of two and that data[] keeps the 1-based layout
// used by four1 above (data[1..2*nn]).
void fft_ipp( double data[], int nn, int isign )
{
    // Integer log2 of nn. The expression (int)log(((double)(nn)) / log(2.0))
    // has misplaced parentheses and computes log(nn / log(2)) instead.
    int order = 0;
    while ((1 << order) < nn) ++order;

    // Query to get buffer sizes
    int sizeFFTSpec = 0, sizeFFTInitBuf = 0, sizeFFTWorkBuf = 0;
    ippsFFTGetSize_C_64fc(order, IPP_FFT_NODIV_BY_ANY,
        ippAlgHintAccurate, &sizeFFTSpec, &sizeFFTInitBuf, &sizeFFTWorkBuf);

    // Alloc FFT buffers
    Ipp8u *pFFTSpecBuf = ippsMalloc_8u(sizeFFTSpec);
    Ipp8u *pFFTInitBuf = ippsMalloc_8u(sizeFFTInitBuf);
    Ipp8u *pFFTWorkBuf = ippsMalloc_8u(sizeFFTWorkBuf);

    // Initialize FFT; pFFTSpec ends up pointing into pFFTSpecBuf
    IppsFFTSpec_C_64fc *pFFTSpec = 0;
    ippsFFTInit_C_64fc(&pFFTSpec, order, IPP_FFT_NODIV_BY_ANY,
        ippAlgHintAccurate, pFFTSpecBuf, pFFTInitBuf);
    if (pFFTInitBuf) ippsFree(pFFTInitBuf);

    // four1 stores the first complex value at data[1], so skip data[0]
    Ipp64fc *pData = (Ipp64fc*)(data + 1);

    // Do FFT in place. four1 uses exp(+i*theta) for isign > 0, while the IPP
    // forward transform uses exp(-i*theta), so the directions are swapped.
    if(isign > 0)
    {
        ippsFFTInv_CToC_64fc_I(pData, pFFTSpec, pFFTWorkBuf);
    }
    else
    {
        ippsFFTFwd_CToC_64fc_I(pData, pFFTSpec, pFFTWorkBuf);
    }

    // The spec lives inside pFFTSpecBuf, so freeing that buffer releases it;
    // ippsFFTFree_C_64fc is only for specs created with ippsFFTInitAlloc.
    if (pFFTWorkBuf) ippsFree(pFFTWorkBuf);
    if (pFFTSpecBuf) ippsFree(pFFTSpecBuf);
}

[/cpp]

Please help me!

↧