Channel: Intel® Software - Intel® Integrated Performance Primitives

Super-resolution image (huge image) support in Intel® IPP 2017 Beta


Intel® IPP 2017 Beta introduced new APIs (Intel® IPP 64x functions) that support 64-bit data lengths in the image and signal processing domains. The Intel® IPP 64x functions are implemented as wrappers over the Intel® IPP functions operating on 32-bit sizes, using tiling and multithreading. The 64x APIs support external threading of Intel® IPP functions, and are provided in the form of source and pre-built binaries.
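
To illustrate the tiling idea these wrappers are built on, here is a minimal sketch that processes a huge image in horizontal strips with an ordinary 32-bit-size function (the strip loop is hand-written here; the actual 64x wrappers additionally add threading, and the function and strip size below are only an example):

#include <ippi.h>

// Process a very tall image in horizontal strips so that each IPP call only
// ever sees an ROI whose dimensions fit in 32-bit ints.
void addHugeImage(const Ipp8u* src1, const Ipp8u* src2, Ipp8u* dst,
                  int stepBytes, int width, long long height)
{
    const long long stripRows = 4096;                      // tile height per call
    for (long long y = 0; y < height; y += stripRows)
    {
        const long long rows = (height - y < stripRows) ? (height - y) : stripRows;
        IppiSize roi = { width, (int)rows };
        const long long offset = y * stepBytes;            // byte offset of the strip
        ippiAdd_8u_C1RSfs(src1 + offset, stepBytes, src2 + offset, stepBytes,
                          dst + offset, stepBytes, roi, 0 /*scaleFactor*/);
    }
}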

The attached file provides a quick summary of these functions. Your feedback is welcome if you have a chance to evaluate these new APIs.

The Intel IPP 2017 Beta release is available as part of the Intel Parallel Studio XE 2017 Beta or the Intel System Studio 2017 Beta. Please register for one of the two releases to get the Intel IPP 2017 Beta.

To sign up for the Intel Parallel Studio XE 2017 Beta:  
Please visit the Intel Parallel Studio Beta Registration Page

To sign up for the Intel System Studio 2017 Beta:  
Please visit the Intel System Studio Beta Page

For more information about the Intel IPP Beta, please check the forum post:
https://software.intel.com/en-us/forums/intel-integrated-performance-primitives/topic/623345


IPP dynamic linking with dispatching disabled?


Hi staff,

I'm using the old Intel Parallel Studio XE 2011. I have a question about dynamic linking and dispatching.

I know that dynamic linking has dispatching enabled by default. Is there any option or function call to disable it?

I want this because I want to compare results between static linking with dispatching disabled and dynamic linking with dispatching disabled.
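
Something along these lines is what I am hoping exists (just a sketch; I am not sure ippInitCpu is the intended mechanism, or whether it pins the code path for the dynamic libraries at all):

#include <ippcore.h>
#include <cstdio>

int main()
{
    // Try to pin the dispatcher to one fixed code path (SSE2 here) instead of
    // the automatically detected best one.
    IppStatus st = ippInitCpu(ippCpuSSE2);
    std::printf("ippInitCpu: %s\n", ippGetStatusString(st));
    return 0;
}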

I hope someone can help me.

Best regards,

Tam.

Adventures with ippiFilterWiener (2)


The documentation for ippiFilterWiener says the following about the noise parameter:

If this parameter is not defined (noise = 0), then the function estimates the noise level by averaging
through the image of all local variances σi,j, and stores the corresponding values in the noise for further use.
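
The call I am making looks roughly like this (a simplified repro sketch; the buffer-size query follows my reading of the 9.0 headers):

#include <ippi.h>
#include <ipps.h>

int main()
{
    const IppiSize  roi    = { 64, 64 };
    const IppiSize  mask   = { 3, 3 };
    const IppiPoint anchor = { 1, 1 };

    int srcStep = 0, dstStep = 0;
    Ipp32f* src = ippiMalloc_32f_C1(roi.width, roi.height, &srcStep);
    Ipp32f* dst = ippiMalloc_32f_C1(roi.width, roi.height, &dstStep);
    ippiSet_32f_C1R(1.0f, src, srcStep, roi);

    int bufSize = 0;
    ippiFilterWienerGetBufferSize(roi, mask, 1 /*channels*/, &bufSize);
    Ipp8u* buffer = ippsMalloc_8u(bufSize);

    Ipp32f noise[1] = { 0.0f };   // 0 => let the function estimate the noise level
    IppStatus st = ippiFilterWiener_32f_C1R(src, srcStep, dst, dstStep,
                                            roi, mask, anchor, noise, buffer);
    // the crash happens inside this call

    ippsFree(buffer);
    ippiFree(dst);
    ippiFree(src);
    return (int)st;
}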

However, when I pass noise=0.0, ippiFilterWiener crashes (with what looks like a corrupted stack). This is with IPP 9.0.2 on Mac OS X and an Intel 2.5 GHz Core 2 Duo MacBook Pro. It happens with all ippiFilterWiener variants I tried. Below, gdb info for ippiFilterWiener_32f_C1R.

Regards,

Adriaan van Os

(gdb) bt
#0 0x0044a680 in s8_ownippsSum_32fc_Accur ()
#1 0x0044a75f in s8_ownippsSum_32f_Fast ()

(gdb) info registers all
eax 0x612a9c 6367900
ecx 0x1fe 510
edx 0x44a650 4499024
ebx 0x810 2064
esp 0xbfff8474 0xbfff8474
ebp 0xbfff848c 0xbfff848c
esi 0x62f000 6483968
edi 0xb 11
eip 0x44a680 0x44a680
eflags 0x210216 2163222
cs 0x17 23
ss 0x1f 31
ds 0x1f 31
es 0x1f 31
fs 0x0 0
gs 0x37 55
st0 -nan(0xc000000000000000) (raw 0xffffc000000000000000)
st1 1.8429914817468991589404685143008851e-37 (raw 0x3f84fadae16000000000)
st2 1.7609456388294814816641033672841063e-37 (raw 0x3f84efb0000000000000)
st3 7.3921886536716230873355441867864221e-38 (raw 0x3f83c93c094000000000)
st4 1.2518971788368537901486625344444118e-37 (raw 0x3f84aa66400000000000)
st5 1.2952924775761875792156629182813672e-38 (raw 0x3f818d0b7c0000000000)
st6 6.9833720180014973328020631173089637e-38 (raw 0x3f83be1b000000000000)
st7 3.6056194214217565132427279466616184e-38 (raw 0x3f82c44f000000000000)
fctrl 0x137f 4991
fstat 0x3863 14435
ftag 0x8000 32768
fiseg 0x17 23
fioff 0x44a6a8 4499112
foseg 0x1f 31
fooff 0x0 0
fop 0x1f7 503
xmm0 {
v4_float = {0, 0, 0, 0.187211409},
v2_double = {0, 5.1598354856965341e-315},
v16_int8 = '\0' , ">?\264Y",
v8_int16 = {0, 0, 0, 0, 0, 0, 15935, -19367},
v4_int32 = {0, 0, 0, 1044362329},
v2_int64 = {0, 1044362329},
uint128 = 6463860900704026624
} (raw 0x59b43f3e000000000000000000000000)
xmm1 {
v4_float = {0, 0, 0, 0.067506507},
v2_double = {0, 5.1010832593468362e-315},
v16_int8 = '\0' , "=\212@\332",
v8_int16 = {0, 0, 0, 0, 0, 0, 15754, 16602},
v4_int32 = {0, 0, 0, 1032470746},
v2_int64 = {0, 1032470746},
uint128 = 15726721893375410176
} (raw 0xda408a3d000000000000000000000000)
xmm2 {
v4_float = {0, 0, 0, 0.144226447},
v2_double = {0, 5.1455833123493324e-315},
v16_int8 = '\0' , ">\023\260\031",
v8_int16 = {0, 0, 0, 0, 0, 0, 15891, -20455},
v4_int32 = {0, 0, 0, 1041477657},
v2_int64 = {0, 1041477657},
uint128 = 1851000603858173952
} (raw 0x19b0133e000000000000000000000000)
xmm3 {
v4_float = {-0.0061361026, -0.0371395648, 0.222022787, 0.15741688},
v2_double = {-1.0616635290980361e-20, 3.6044674592801975e-08},
v16_int8 = "\273\311\021\\\275\030\037\250>cY\361>!1\344",
v8_int16 = {-17463, 4444, -17128, 8104, 15971, 23025, 15905, 12772},
v4_int32 = {-1144450724, -1122492504, 1046698481, 1042362852},
v2_int64 = {-4915378428291047512, 4495535745710240228},
uint128 = 0xbbc9115cbd181fa83e6359f13e2131e4
} (raw 0xe431213ef159633ea81f18bd5c11c9bb)
xmm4 {
v4_float = {0, 0, 0.248581812, 0.173564509},
v2_double = {0, 1.1379934465744102e-07},
v16_int8 = "\000\000\000\000\000\000\000\000>~\214;>1\272\345",
v8_int16 = {0, 0, 0, 0, 15998, -29637, 15921, -17691},
v4_int32 = {0, 0, 1048480827, 1043446501},
v2_int64 = {0, 4503190863491480293},
uint128 = 16553597523710475838
} (raw 0xe5ba313e3b8c7e3e0000000000000000)
xmm5 {
v4_float = {0, -0.0061361026, -0.031003464, 0.00444444502},
v2_double = {1.5565620048787301e-314, -6.6569817986579441e-15},
v16_int8 = "\000\000\000\000\273\311\021\\\274\375\372\372;\221\242\265",
v8_int16 = {0, 0, -17463, 4444, -17155, -1286, 15249, -23883},
v4_int32 = {0, -1144450724, -1124205830, 999400117},
v2_int64 = {3150516572, -4828427272823135563},
uint128 = 0x00000000bbc9115cbcfdfafa3b91a2b5
} (raw 0xb5a2913bfafafdbc5c11c9bb00000000)
xmm6 {
v4_float = {0.111111112, 0.111111112, 0.111111112, 0.111111112},
v2_double = {1.4228543251986994e-10, 1.4228543251986994e-10},
v16_int8 = "=\343\2169=\343\2169=\343\2169=\343\2169",
v8_int16 = {15843, -29127, 15843, -29127, 15843, -29127, 15843, -29127},
v4_int32 = {1038323257, 1038323257, 1038323257, 1038323257},
v2_int64 = {4459564432529526329, 4459564432529526329},
uint128 = 0x3de38e393de38e393de38e393de38e39
} (raw 0x398ee33d398ee33d398ee33d398ee33d)
xmm7 {
v4_float = {0, 0, 0, 0},
v2_double = {0, 0},
v16_int8 = '\0' ,
v8_int16 = {0, 0, 0, 0, 0, 0, 0, 0},
v4_int32 = {0, 0, 0, 0},
v2_int64 = {0, 0},
uint128 = 0
} (raw 0x00000000000000000000000000000000)
mxcsr 0x1faf 8111
mm0 {
uint64 = -4301219119115534336,
v2_int32 = {0, -1001455616},
v4_int16 = {0, 0, 0, -15281},
v8_int8 = "\000\000\000\000\000\000O\304"
} (raw 0xc44f000000000000)
mm1 {
uint64 = -4611686018427387904,
v2_int32 = {0, -1073741824},
v4_int16 = {0, 0, 0, -16384},
v8_int8 = "\000\000\000\000\000\000\000\300"
} (raw 0xc000000000000000)
mm2 {
uint64 = -370736216871534592,
v2_int32 = {0, -86318752},
v4_int16 = {0, 0, -7840, -1318},
v8_int8 = "\000\000\000\000`\341\332\372"
} (raw 0xfadae16000000000)
mm3 {
uint64 = -1175439502743699456,
v2_int32 = {0, -273678336},
v4_int16 = {0, 0, 0, -4176},
v8_int8 = "\000\000\000\000\000\000\260\357"
} (raw 0xefb0000000000000)
mm4 {
uint64 = -3946269003000840192,
v2_int32 = {0, -918812352},
v4_int16 = {0, 0, 2368, -14020},
v8_int8 = "\000\000\000\000@\t<\311"
} (raw 0xc93c094000000000)
mm5 {
uint64 = -6168172270893137920,
v2_int32 = {0, -1436139520},
v4_int16 = {0, 0, 16384, -21914},
v8_int8 = "\000\000\000\000\000@f\252"
} (raw 0xaa66400000000000)
mm6 {
uint64 = -8283390750176051200,
v2_int32 = {0, -1928627200},
v4_int16 = {0, 0, 31744, -29429},
v8_int8 = "\000\000\000\000\000|\v\215"
} (raw 0x8d0b7c0000000000)
mm7 {
uint64 = -4748201382132056064,
v2_int32 = {0, -1105526784},
v4_int16 = {0, 0, 0, -16869},
v8_int8 = "\000\000\000\000\000\000\033\276"
} (raw 0xbe1b000000000000)

Images, Stride and Memory Alignment Question (IPP)


Hello there,

Is it worth aligning scan lines in an image so that each row begins on 16-byte-aligned memory? That is, rounding the stride up to the next multiple of 16 bytes?

I assume this might help a bit when processing the entire image, but the real question is: does IPP care?

If yes, along the same lines, is it worth 32-byte-aligning scan lines on CPUs that have a 256-bit vector unit, or 64-byte-aligning for AVX-512 chips?
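
For reference, this is the kind of padded allocation I am asking about (a sketch; as I understand it, ippiMalloc rounds the stride up so that every row starts on an aligned boundary):

#include <ippi.h>
#include <cstdio>

int main()
{
    const int width = 1920, height = 1080;

    int step = 0;                                      // stride in bytes, filled in by IPP
    Ipp8u* img = ippiMalloc_8u_C3(width, height, &step);
    std::printf("tight row bytes: %d, padded stride: %d\n", width * 3, step);

    ippiFree(img);
    return 0;
}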

Thanks,
Axel

Planar vs. Interleaved and Processing Speed Question (IPP + TBB)


Hi there,

Just a general question: suppose I can choose between dealing with planar image data (4:4:4 YCbCr) or a standard interleaved RGB or BGR image.

From a processing-performance perspective, does planar data offer better performance potential than interleaved data?

With planar data, it occurs to me that I could run three single-plane IPP operations ("C1") at once using tbb::parallel_invoke. Would this be faster than a single call to the corresponding interleaved ("C3") IPP operation?
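
Something like this is what I have in mind (just a sketch of the idea, not measured code; squaring each plane in place is only a stand-in for whatever the real operation would be):

#include <ippi.h>
#include <tbb/parallel_invoke.h>

// One single-plane ("C1") IPP call per plane, three planes running in parallel.
void squarePlanesParallel(Ipp32f* plane[3], int stepBytes, IppiSize roi)
{
    auto squarePlane = [&](int p) {
        ippiMul_32f_C1R(plane[p], stepBytes, plane[p], stepBytes,
                        plane[p], stepBytes, roi);
    };
    tbb::parallel_invoke([&] { squarePlane(0); },
                         [&] { squarePlane(1); },
                         [&] { squarePlane(2); });
}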

Does that even make sense? If I have, say, 4:2:2 data, I assume planar with TBB would win against RGB, as there is less data to process to begin with?

Thanks,
Axel

 

Fourier transform in C#


Hi,

I have been looking for a working example of the Fourier transform using IPP in C#. The issue is that most of the example code here doesn't work with IPP version 9.0.

I finally wrote the following code, but unfortunately no data is assigned to the destination array. What might be the problem here?

Also, I have a real signal and I create a complex signal to apply the FFT. Would it be better to do the FFT directly on the real signal?

Thanks!

 

using System.Runtime.InteropServices;
using ipp;

namespace IntelIPP_Test
{
    unsafe public class Program
    {
        // Spec and working buffers
        static IppsFFTSpec_C_32fc* pFFTSpec;
        static byte[] pFFTSpecBuf, pFFTInitBuf, pFFTWorkBuf;

        // Allocate complex buffers
        static Ipp32fc[] pSrc, pDst;

        // Query to get buffer sizes
        static int _sizeFFTSpec, _sizeFFTInitBuf, _sizeFFTWorkBuf;

        //Set the size
        static int N = 128;
        static int order = (int)(Math.Log10((double)N) / Math.Log10(2.0));

        static void Main(string[] args)
        {
            // Query to get buffer sizes
            int ippDivisionAlgorithm = 8; // (int)sp.IPP_FFT_DIV_INV_BY_N;
            IppHintAlgorithm ippPerformanceHint = IppHintAlgorithm.ippAlgHintAccurate;

            IppStatus result;

            fixed (int* sizeFFTSpec = &_sizeFFTSpec, sizeFFTInitBuf = &_sizeFFTInitBuf, sizeFFTWorkBuf = &_sizeFFTWorkBuf)
            {
                result = sp.ippsFFTGetSize_C_32fc(order, ippDivisionAlgorithm, ippPerformanceHint,
                    sizeFFTSpec, sizeFFTInitBuf, sizeFFTWorkBuf);
            }

            // Alloc FFT buffers
            pFFTSpecBuf = new byte[_sizeFFTSpec];
            pFFTInitBuf = new byte[_sizeFFTInitBuf];
            pFFTWorkBuf = new byte[_sizeFFTWorkBuf];

            // Initialize FFT
            fixed (byte* p_dftInitBuf = pFFTInitBuf)
            fixed (byte* p_dftSpecBuf = pFFTSpecBuf)
            {
                var p_dftSpec = (IppsFFTSpec_C_32fc*)pFFTSpec;

                result = sp.ippsFFTInit_C_32fc(&p_dftSpec, order, ippDivisionAlgorithm, ippPerformanceHint, p_dftSpecBuf, p_dftInitBuf);
            }           

            getData(); // to assign data to pSrc

            fixed (Ipp32fc* pSource = pSrc, pssDst = pDst)
            fixed (byte* p_workBuffer = pFFTWorkBuf)
            fixed (byte* p_dftSpecBuf = pFFTSpecBuf)
            {
                var p_dftSpec = (IppsFFTSpec_C_32fc*)p_dftSpecBuf;

                // Fast Forward Fourier to spectra domain
                sp.ippsFFTFwd_CToC_32fc(pSource, pssDst, p_dftSpec, p_workBuffer);
            }
        }

        // getData() fills pSrc with the input samples; its body was omitted from the original post.
    }
}

 

Multi Threading Performance in Multiplication of 2 Arrays / Images - Intel IPP


I'm using Intel IPP for multiplication of 2 Images (Arrays).
I'm using Intel IPP 8.2 which comes with Intel Composer 2015 Update 6.

I created a simple function to multiply two large images (the whole project is attached, see below).
I wanted to see the gains from using the Intel IPP multi-threaded library.

Here is the simple project (I also attached the complete project from Visual Studio):

#include "ippi.h"
#include "ippcore.h"
#include "ipps.h"
#include "ippcv.h"
#include "ippcc.h"
#include "ippvm.h"

#include <ctime>
#include <iostream>

using namespace std;

const int height = 6000;
const int width  = 6000;
Ipp32f mInput_image [1 * width * height];
Ipp32f mOutput_image[1 * width * height] = {0};

int main()
{
    IppiSize size = {width, height};

    double start = clock();

    for (int i = 0; i < 200; i++)
        ippiMul_32f_C1R(mInput_image, 6000 * 4, mInput_image, 6000 * 4, mOutput_image, 6000 * 4, size);

    double end = clock();
    double duration = (end - start) / static_cast<double>(CLOCKS_PER_SEC);

    cout << duration << endl;
    cin.get();

    return 0;
}

I compiled this project once using Intel IPP Single Threaded and once using Intel IPP Multi Threaded.

I tried different array sizes, and in all of them the multi-threaded version yields no gain (sometimes it is even slower).

I wonder, how come there is no gain in this task from multi-threading?
I know Intel IPP uses AVX, and I thought maybe the task becomes memory bound?

I tried another approach: using OpenMP manually to get a multi-threaded approach on top of the Intel IPP single-threaded implementation.
This is the code:

#include "ippi.h"
#include "ippcore.h"
#include "ipps.h"
#include "ippcv.h"
#include "ippcc.h"
#include "ippvm.h"

#include <ctime>
#include <iostream>

using namespace std;

#include <omp.h>

const int height = 5000;
const int width  = 5000;
Ipp32f mInput_image [1 * width * height];
Ipp32f mOutput_image[1 * width * height] = {0};

int main()
{
    IppiSize size = {width, height};

    double start = clock();

    IppiSize blockSize = {width, height / 4};

    const int NUM_BLOCK = 4;
    omp_set_num_threads(NUM_BLOCK);

    Ipp32f*  in;
    Ipp32f*  out;

    //  ippiMul_32f_C1R(mInput_image, width * 4, mInput_image, width * 4, mOutput_image, width * 4, size);

    #pragma omp parallel            \
    shared(mInput_image, mOutput_image, blockSize) \
    private(in, out)
    {
        int id   = omp_get_thread_num();
        int step = blockSize.width * blockSize.height * id;
        in       = mInput_image  + step;
        out      = mOutput_image + step;
        ippiMul_32f_C1R(in, width * 4, in, width * 4, out, width * 4, blockSize);
    }

    double end = clock();
    double duration = (end - start) / static_cast<double>(CLOCKS_PER_SEC);

    cout << duration << endl;
    cin.get();

    return 0;
}

The results were the same: again, no performance gain.

Is there a way to benefit from multi-threading in this kind of task?
How can I validate whether a task is memory bound, and hence that there is no benefit in parallelizing it? Is there any benefit in parallelizing the multiplication of two arrays on a CPU with AVX?
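
A rough back-of-the-envelope check of the memory-bound theory would look like this (a sketch; the elapsed time is a placeholder to be replaced with the measured duration):

#include <cstdio>

int main()
{
    // Each ippiMul_32f_C1R call on a 6000x6000 Ipp32f image reads two planes
    // and writes one: 3 * 6000 * 6000 * 4 bytes of traffic per call.
    const double bytesPerCall = 3.0 * 6000 * 6000 * 4;
    const int    calls        = 200;
    const double elapsedSec   = 5.0;   // placeholder: use the measured duration here

    const double gbPerSec = bytesPerCall * calls / elapsedSec / 1e9;
    std::printf("effective bandwidth: %.1f GB/s\n", gbPerSec);

    // If this is already close to the platform's DRAM bandwidth (roughly 25 GB/s
    // for a dual-channel DDR3 Haswell desktop), extra threads cannot help.
    return 0;
}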

The computer I tried it on is based on a Core i7 4770K (Haswell).

Here is a link to the Project in Visual Studio 2013.

Thank You.

Getting stuck in e9_ownSearchOptimalPulsePos_M122_GSMAMR_16s_optSSE


One of our customers is reporting an issue which we have isolated to the Intel IPP GSMAMR processing. After forcing a core dump, we determined that we randomly get stuck in e9_ownSearchOptimalPulsePos_M122_GSMAMR_16s_optSSE. We had been using IPP 8.2.1 on Linux and, due to issues we had previously observed on Windows, updated to IPP 8.2.3, but the problem persists. In addition to the IPP update, we changed the sample code to use the ippsAlgebraicCodebookSearchEX function, as was recommended for the Windows issue. We would greatly appreciate any suggestions to resolve or work around this issue.

Thanks - Bob / Dialogic

Backtrace from the forced core dump when the thread is hung:

Thread 62 (Thread 0x7f58eb9fc700 (LWP 26864)):
#0  0x00007f598a730fe8 in e9_ownSearchOptimalPulsePos_M122_GSMAMR_16s_optSSE () from /usr/dialogic/data/ssp.mlm
#1  0x00007f598a54232f in e9_ownAlgebraicCodebookSearch_M122_GSMAMR_16s () from /usr/dialogic/data/ssp.mlm
#2  0x00007f598a541f0a in e9_ownsAlgebraicCodebookSearch_GSMAMR_16s () from /usr/dialogic/data/ssp.mlm
#3  0x00007f598a516ad0 in e9_ippsAlgebraicCodebookSearchEX_GSMAMR_16s () from /usr/dialogic/data/ssp.mlm
#4  0x00007f598a4ec7f5 in ownEncode_GSMAMR (encSt=0x7f5971e9dc18, rate=<value optimized out>, pAnaParam=0x7f58eb9fb5ce,
    pVad=<value optimized out>, pSynthVec=0x7f58eb9fb470)
    at /cm/vobs/3rdparty/components/intel/ipp-samples.7.1.1.013/sources/speech-codecs/codec/speech/gsmamr/src/encgsmamr.c:589
#5  0x00007f598a4ecefd in apiGSMAMREncode (encoderObj=0x7f5971e9dc00, src=<value optimized out>, rate=GSMAMR_RATE_12200,
    dst=0x7f589188ef10 "", pVad=0x7f58eb9fb7d4)
    at /cm/vobs/3rdparty/components/intel/ipp-samples.7.1.1.013/sources/speech-codecs/codec/speech/gsmamr/src/encgsmamr.c:313
#6  0x00007f598a068063 in GSMAMR_Encode (handle=0x7f58eb9fa8c0, src=0x2, rate=GSMAMR_RATE_DTX, dst=
    0xffff7e2f <Address 0xffff7e2f out of bounds>, pVad=0x7) at x86/gsmamrapi.c:154
#7  0x00007f598a2ae413 in GSMAMREncode (pCodec=0x7f589188ee88, pSrcData=0x2, ppCodedData=0x7f58eb9fbdb0,
    numSamples=<value optimized out>, idtmfFlag=<value optimized out>, silenceFlag=1207968416) at codec.c:1740

Environment details from the IPP debug output we have in our code:

DisplayIPPCPUFeatures: 0x4a : 0x60
ippCore 8.2.3 (r48108)
ippIP AVX2 (l9) 8.2.3 (r48108)
ippSP AVX2 (l9) 8.2.3 (r48108)
ippVC AVX2 (l9) 8.2.3 (r48108)
Processor supports Advanced Vector Extensions 2 instruction set
    4 cores on die
ippGetMaxCacheSizeB 8192 k
Available 0xefff Enabled 0xefff
MMX     A E
SSE     A E
SSE2    A E
SSE3    A E
SSSE3   A E
MOVBE   A E
SSE41   A E
SSE42   A E
AVX     A E
AVX(OS) A E
AES     A E
CLMUL   A E
ABR     X X
RDRRAND A E
F16C    A E
AVX2    A E
ADCOX     X X
RDSEED    X X
PREFETCHW X X
SHA       X X
KNC       X X

 

 


Identifier not found


I am using IPP 8.1.1 as a static library in a 64-bit environment. Because the code has to compile in an environment without VC and IPP installed, I do not use the IDE's IPP options.

I include into *.cpp file two headers:

 #include "ipp_n8.h"
#include "ippcv.h"

and I get the following error from the VC compiler:

error C3861: 'n8_ippiFilterGaussianGetBufferSize': identifier not found

The symbol ippiFilterGaussianGetBufferSize is declared in ippcv.h

and, as I understand it, should be replaced according to this macro from ipp_n8.h:

#define ippiFilterGaussianGetBufferSize n8_ippiFilterGaussianGetBufferSize

Where is my error? What is the way to correct it?

Thanks

 

IPPS FFT Initialization has changed (Headers vs. Documentation)


I don't know when it happened, but the initialization of the IPPS FFT functions has obviously changed.

ippsFFTInitAlloc is no longer available.

The current initialization process using

IppStatus ippsFFTInit_R_32f( IppsFFTSpec_R_32f** ppFFTSpec,
                             int order, int flag, IppHintAlgorithm hint,
                             Ipp8u* pSpec, Ipp8u* pSpecBuffer );

is not clear and is also not documented (the documentation still refers to the deprecated ippsFFTInitAlloc).
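
From the headers, the intended sequence appears to be GetSize -> allocate -> Init -> Fwd. A minimal sketch of my reading of it (real-to-CCS, 1024 points; not verified against the documentation):

#include <ipps.h>
#include <vector>

int main()
{
    const int order = 10;                              // 2^10 = 1024 samples
    int specSize = 0, specBufSize = 0, workBufSize = 0;

    // 1) Query the three buffer sizes.
    ippsFFTGetSize_R_32f(order, IPP_FFT_DIV_INV_BY_N, ippAlgHintNone,
                         &specSize, &specBufSize, &workBufSize);

    // 2) Allocate them.
    Ipp8u* pSpecMem = ippsMalloc_8u(specSize);
    Ipp8u* pSpecBuf = specBufSize ? ippsMalloc_8u(specBufSize) : NULL;
    Ipp8u* pWorkBuf = workBufSize ? ippsMalloc_8u(workBufSize) : NULL;

    // 3) Initialize; pSpec ends up pointing into pSpecMem.
    IppsFFTSpec_R_32f* pSpec = NULL;
    ippsFFTInit_R_32f(&pSpec, order, IPP_FFT_DIV_INV_BY_N, ippAlgHintNone,
                      pSpecMem, pSpecBuf);

    // 4) Transform (CCS-packed output needs N + 2 floats).
    std::vector<Ipp32f> src(1 << order, 1.0f), dst((1 << order) + 2);
    ippsFFTFwd_RToCCS_32f(src.data(), dst.data(), pSpec, pWorkBuf);

    ippsFree(pWorkBuf);
    ippsFree(pSpecBuf);
    ippsFree(pSpecMem);
    return 0;
}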

Any thoughts? Thanks!

 

IPP DFT Real-to-Complex in-place


What happened to the in-place functions of the real-to-complex DFTs in IPPS?

The documentation mentions additional variants with only one argument (pSrcDst) for in-place calculation. However, the header provides such in-place functions only for the FFT, not for the DFT transforms.

Calling ippsDFTFwd_RToCCS_32f(in, out, spec, buf) with in == out results in an error: ippStsContextMatchErr (invalid context structure).

Any thoughts? Thanks!

Undefined Behaviour with ippsModInv_BN


Dear Intel Community,

I am trying to compute the multiplicative inverse of a big number with respect to a modulus, using ippsModInv_BN. However, I get undefined behaviour and incorrect results. The issue occurs when executing the following:

#include <iostream>
#include <ippcore.h>
#include <ippcp.h>
#include <ipps.h>
#include <ippdefs.h>
#include <ippch.h>

// Use the BigNumber class available at https://software.intel.com/en-us/node/503848
// And initialize the values similarly to https://software.intel.com/en-us/node/503498
#include "xsample_bignum.h"

using namespace std;

void modInv (BigNumber& P, BigNumber& Q)
{

//  Calculate the multiplicative inverse of a positive integer - big number Q
//  with respect to specified modulus P, using the same routine as implemented
//  in xsample_bignum.cpp
//
//  BigNumber BigNumber::InverseMul(const BigNumber& a) const
//  {
//      BigNumber r(*this);
//      ippsModInv_BN(BN(a), BN(*this), BN(r));
//      return r;
//  }

    BigNumber R(P);
    IppStatus status = ippsModInv_BN(BN(Q), BN(P), BN(R));

    cout << "Status: ";
    switch (status) {
        case ippStsNoErr         : cout << "ippStsNoErr"<< endl; break;
        case ippStsBadArgErr     : cout << "ippStsBadArgErr"<< endl; break;
        case ippStsNullPtrErr    : cout << "ippStsNullPtrErr"<< endl; break;
        case ippStsBadModulusErr : cout << "ippStsBadModulusErr"<< endl; break;
        case ippStsOutOfRangeErr : cout << "ippStsOutOfRangeErr"<< endl; break;
        default:
            cout << "Unknown error code: "<< status << endl;
    }

    cout << "P: "<< P << endl;
    cout << "Q: "<< Q << endl;
    cout << "R: "<< R << endl;
    cout << endl << endl;
}

void test1 ()
{
    BigNumber P("0x098A0803974924E2671D9091044FE4ED0A6BA0978A9651D84EC5D2F97E3615CD555D504DD81B5832F884829D914ABD8AFEE8608A851AF569C3520C47E4D35646F");
    BigNumber Q("0x0BC612FF163A36AD648521120507CD2D4ADFC5DAC68856F6B45BBF101EFDB4A8D4656E8E2099C1DC3B7CFA16F57192ACA707E0C41E499837758E7A28E54BA6317");
    modInv(P, Q);

    // Expected:
    // 0x7F705DA34BC5CD9030BB4B5D0E4B2DEAC5734DD140076FA07C09B913F9C92B707247245BD0A96BD03EF1A84B11856519F8AB9247DC331C7B11A6B1636125AD16

    // Obtained:
    // 0x98A0803974924E2671D9091044FE4ED0A6BA0978A9651D84EC5D2F97E3615CD555D504DD81B5832F884829D914ABD8AFEE8608A851AF569C3520C47E4D35646F
}

void test2 ()
{
    BigNumber P("0x0BC612FF163A36AD648521120507CD2D4ADFC5DAC68856F6B45BBF101EFDB4A8D4656E8E2099C1DC3B7CFA16F57192ACA707E0C41E499837758E7A28E54BA6317");
    BigNumber Q("0x0B7EE5771917C2E470D42E54F8D40BE052BFA0413CC90E8DF14D983E1F490B6FE4B1856996417A0A5BDE8383BE18638D1A9DEC06E2E9386A289D6A1250D492973");
    modInv(P, Q);

    // Expected:
    // 0x5E73BECF77A0BE120FC9A1F7DBB69719755630772A95B840344737429CCAAB7E9D291B3E6E569EEDAEB92C88A389D4A50C8EEA795C5BB10401CA355878C72432

    // Obtained:
    // 0x5E73BECF77A0BE120FC9A1F7DBB69719755630772A95B840344737429CCAAB7E9D291B3E6E569EEDAEB92C88A389D4A50C8EEA795C5BB10401CA355878C72432
}

int main () {

    ippInit();
    cout << "Using Intel IPP Crypto"<< endl;
    const IppLibraryVersion * version = ippsGetLibVersion ();
    printf("%s %s %s\n", version->Name, version->Version, version->BuildDate);
    cout << "================================================"<< endl;

    test1 ();
    test2 ();
    return 0;
}

The problem occurs in the first test case (test1): instead of the correct result, I get back the same value as P. Furthermore, the status returned by ippsModInv_BN is -13, which does not correspond to any of the expected return values. The output I obtain is the following:

Using Intel IPP Crypto
ippSP AVX (e9) 9.0.3 (r51269) Apr  8 2016
================================================

Uknown error code: -13
P:  0x98A0803974924E2671D9091044FE4ED0A6BA0978A9651D84EC5D2F97E3615CD555D504DD81B5832F884829D914ABD8AFEE8608A851AF569C3520C47E4D35646F
Q:  0xBC612FF163A36AD648521120507CD2D4ADFC5DAC68856F6B45BBF101EFDB4A8D4656E8E2099C1DC3B7CFA16F57192ACA707E0C41E499837758E7A28E54BA6317
R:  0x98A0803974924E2671D9091044FE4ED0A6BA0978A9651D84EC5D2F97E3615CD555D504DD81B5832F884829D914ABD8AFEE8608A851AF569C3520C47E4D35646F

ippStsNoErr
P:  0xBC612FF163A36AD648521120507CD2D4ADFC5DAC68856F6B45BBF101EFDB4A8D4656E8E2099C1DC3B7CFA16F57192ACA707E0C41E499837758E7A28E54BA6317
Q:  0xB7EE5771917C2E470D42E54F8D40BE052BFA0413CC90E8DF14D983E1F490B6FE4B1856996417A0A5BDE8383BE18638D1A9DEC06E2E9386A289D6A1250D492973
R:  0x5E73BECF77A0BE120FC9A1F7DBB69719755630772A95B840344737429CCAAB7E9D291B3E6E569EEDAEB92C88A389D4A50C8EEA795C5BB10401CA355878C72432

Is there any workaround for this issue?

Thanks,
Alen

RGB To HLS 8u P3-C3 conversions


Hello,

I was trying to replace ippiRGBToHLS_8u_C3R with ippiBGRToHLS_8u_C3P3R.

The two functions give differing results for some RGB values, and ippiBGRToHLS_8u_C3P3R even gives differing results for identical pixel values within the same image.

Here is an example:

// the same color stored both as a BGR and as an RGB image, 5 pixels per row,
// rows padded to 16 bytes
IppStatus st;
Ipp8u BGR[16] = {  91, 182, 204,  91, 182, 204,  91, 182, 204,  91, 182, 204,  91, 182, 204, 0 };
Ipp8u RGB[16] = { 204, 182,  91, 204, 182,  91, 204, 182,  91, 204, 182,  91, 204, 182,  91, 0 };
IppiSize roi = { 5, 1 };

// convert RGB->HLS
Ipp8u HLS3ch[16];
st = ippiRGBToHLS_8u_C3R(RGB, 16, HLS3ch, 16, roi);
// extract to channels
Ipp8u H0[8 ], L0[8 ], S0[8 ];
Ipp8u* pHLS0[] = { H0, L0, S0 };
st = ippiCopy_8u_C3P3R(HLS3ch, 16, pHLS0, 8, roi);

Ipp8u H[8 ], L[8 ], S[8 ];
Ipp8u* pHLS[] = { H, L, S };
st = ippiBGRToHLS_8u_C3P3R(BGR, 16, pHLS, 8, roi);

The H0 and H channels are identical, as are the L0 and L channels. But

S0 = 134 134 134 134 134

S = 133 133 133 133 134

Using: i7-4790, IPP8.1 32bit, Win10 64bit.

Filters with Fixed Kernel corrections


In the headers of IPP 9.0.2 (ippi.h), some of the descriptions of the filters with fixed kernels are wrong (although they are right in the documentation):

1. SobelHoriz is listed twice, once with "(3x3)" and once without
2. SobelVert is listed twice, once with "(3x3)" and once without
3. Sharpen is incorrect, three times "1" instead of "-1"
4. Laplace (3x3) is incorrect, three times "1" instead of "-1"
5. SobelHorz 5x5 is incorrect, "-4" instead of "-2" in the fourth row, last column

Regards,

Adriaan van Os

ippiWarpPerspective returning ippStsRectErr


I have an application which uses ippiWarpPerspective to apply a perspective warp to an image. The parameters of the perspective warp depend on the input parameters to the application. In certain situations, the dstRoi I want is actually a single row of pixels. Unfortunately ippiWarpPerspective returns ippStsRectErr in this case (as stated in the documentation).

More often than not, when this happens it turns out that I actually don't need a full perspective warp but only a copy or a translation and/or rotation. I can detect these circumstances and use alternative functions (which also gives me a speed improvement). However, in some cases I also need the non-uniform sample spacing of an actual perspective transform.

Does anyone have any neat suggestions for how to solve my problem?

 


IPP library optimization based on CPU type


Hello,

I saw the description of ippGetCpuFeatures() for the IPP library, quoted below, but I'm wondering about IPP optimization.

ippGetCpuFeatures() (*) can be used to detect your processor features. It is declared in ippcore.h…..

 

The Intel(R)  IPP library contains a collection of functionally identical processor-specific optimized libraries that are “dispatched” at run-time. The “dispatcher” chooses which of these processor-specific optimized libraries to use when your application makes a call into the IPP library. This is done to maximize each function’s use of the underlying SIMD instructions and other architecture-specific features….
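
A minimal query of the feature mask would look something like this (a sketch; the constant names are from ippdefs.h as I understand them):

#include <ippcore.h>
#include <cstdio>

int main()
{
    ippInit();                              // let the dispatcher pick the best code path

    Ipp64u features = 0;
    ippGetCpuFeatures(&features, NULL);     // second argument not needed here

    std::printf("SSE4.2: %s  AVX: %s  AVX2: %s\n",
                (features & ippCPUID_SSE42) ? "yes" : "no",
                (features & ippCPUID_AVX)   ? "yes" : "no",
                (features & ippCPUID_AVX2)  ? "yes" : "no");
    return 0;
}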

 

Questions:

1. Does the IPP library automatically select and run the optimized code path at run time after detecting the CPU features (CPU/APU)? Especially on an APU.

2. Are there ways to optimize further, to maximize the capability of the IPP libraries?

 

 

ResizeYUV422Super / ResizeYUV422Lanczos don't exist?


The older libraries we had (Intel 11.1) supported resizing with supersampling for non-planar YUV422 signals. v16 Update 2 has only nearest-neighbor and linear for the same. Is there any long-term reason for this functionality being removed? Is the only way to split the channels and then work on them individually with ResizeAntialiasing() for best results with minimal moiré artifacts?

 

LUT Function Interpolation Method


Hello,

The current LUT function supports 3 interpolation modes:

  • Nearest Neighbor.
  • Linear.
  • Cubic.

Could you please add an option for "Monotone Cubic Interpolation"?
Most curves must maintain the monotonic property to make sense and avoid artifacts.

Moreover, if you could implement something like MATLAB's `interp1` with all its interpolation methods, that would be great.

Thank You.

Access to old version (7.0.x)?


Hello!

Does anyone know how to get the old version(s)? We purchased v7.1 for Windows a few years ago, but now we need the Linux one and they no longer sell it. I also applied for the community license, but it doesn't grant access to any of the earlier versions.

Are we supposed to buy Parallel Studio or System Studio? And is access to the older versions guaranteed?

 

PS: I have already downloaded the files, but neither our old license nor the new community license can be applied.

IPP: Linux: GCCv5 support


Does Intel provide builds of IPP for Linux (the latest, v9 I presume) built with GCC v5 (i.e. 5.3.1 or later), or only GCC v4.8 builds?
