Hi,
I am trying to measure the decoding performance of IPP. I installed the full IPP package correctly, and was able to run some encoding/decoding tests with different file formats.
Anyway, what I am interested in is how IPP performs when decoding a JP2 stream.
I have an ASUS G74S computer, which shows 8 logical cores (4 physical cores with Hyper-Threading): Intel(R) Core(TM) i7-2630QM CPU @ 2.00 GHz
The file under test is an encoded JP2 stream of about 230 KB (for reference, it is a 1920x1080 image).
What is interesting is that I did a run with the advanced timing option, looping about 30 times (which essentially loops around the decoder function).
It is worth mentioning at this point that line 741 of "uic_transcoder_con.cpp" reports the time per loop (decTime = msec / cmdOptions.loops), but I wanted to see the total time, so I removed the denominator (decTime = msec).
Surprisingly, for the image size I am testing with, it takes about 18 seconds to decode my image 30 times!
By comparison, decoding the same image 30 times with J2K-Codec takes only about 4 seconds!
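To put the two totals on the same footing: about 18 seconds over 30 loops is roughly 600 ms per decode, against roughly 133 ms per decode for J2K-Codec, a factor of about 4.5. The following is only a minimal sketch of how a total and a per-loop figure relate. It assumes a C++11 compiler (VC++ 2010 does not ship `<chrono>`), and the names `LoopTiming`, `summarize`, `timeLoops` and `slowdown` are hypothetical helpers, not part of the UIC sample or of IPP.

```cpp
#include <cassert>
#include <chrono>

// Total and per-iteration wall-clock time, both in milliseconds.
struct LoopTiming {
    double totalMs;
    double perLoopMs;
};

// Split an already measured total into a per-loop average.
inline LoopTiming summarize(double totalMs, int loops) {
    assert(loops > 0);
    return LoopTiming{ totalMs, totalMs / loops };
}

// Run `work` `loops` times and time the whole run with a monotonic clock.
// `work` stands in for one decode call; it is a placeholder, not a UIC
// or IPP function.
template <typename Work>
LoopTiming timeLoops(Work work, int loops) {
    assert(loops > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < loops; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    const double totalMs =
        std::chrono::duration<double, std::milli>(stop - start).count();
    return summarize(totalMs, loops);
}

// How many times slower a run of `slowMs` is than a run of `fastMs`.
inline double slowdown(double slowMs, double fastMs) {
    assert(fastMs > 0.0);
    return slowMs / fastMs;
}
```

Calling `summarize(18000.0, 30)` and `summarize(4000.0, 30)` reproduces the per-decode figures above. Passing the real decode call to `timeLoops` as a lambda would measure both numbers in one run, provided file loading is kept outside the timed loop.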
I don't think IPP should be this slow compared to J2K-Codec, but I am not sure what I am doing wrong.
Just as a test, I opened the project in VC++ 2010 and, under Properties, under "Intel Performance Libraries", I set "Use IPP" to "NO".
I was expecting a much slower result, since the decoder would then rely on plain CPU code. Surprisingly, when I ran the exact same test, I got about the same number (18 sec)!
So I suspect this means it was not using IPP in the first place. Out of curiosity, I set "Use IPP" to "Multi-Threaded Static Library" and it failed to compile, which may make sense, since I may not have the proper libraries for multi-threading.
When I set it back to "Single-threaded Static Library", it compiles and runs fine but, as I mentioned, it takes about 18 seconds, which is super slow!
Did I forget to set something properly? Is this using the hardware-accelerated primitives at all? If it does use hardware acceleration, is it supposed to be this slow? Do I need to do something to ensure that hardware acceleration is fully utilized?
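One way to look into the "is it using the optimized primitives at all" question is to ask the library which build it dispatched to. The sketch below is an assumption-laden illustration, not a confirmed fix: it assumes the installed IPP version provides `ippInit()` and `ippiGetLibVersion()`, and that the program is linked against the matching IPP libraries. `describeIppDispatch` and `SKETCH_HAVE_IPP` are hypothetical names, and the include guard only exists so the snippet still compiles where IPP is not installed.

```cpp
#include <cassert>
#include <string>

// Pull in IPP only when its headers are on the include path, so the
// sketch still compiles on a machine without IPP installed.
#if defined(__has_include)
#  if __has_include(<ippcore.h>) && __has_include(<ippi.h>)
#    include <ippcore.h>
#    include <ippi.h>
#    define SKETCH_HAVE_IPP 1
#  endif
#endif

// Returns the name and version string reported by the image-processing
// (ippi) library, or a note if the IPP headers are unavailable.
std::string describeIppDispatch() {
#ifdef SKETCH_HAVE_IPP
    // Assumption: this IPP version provides ippInit(), which selects the
    // code path matching the running CPU (relevant for static linking).
    ippInit();
    const IppLibraryVersion* v = ippiGetLibVersion();
    assert(v != nullptr);
    return std::string(v->Name) + " " + v->Version;
#else
    return "IPP headers not found; nothing to report";
#endif
}
```

Printing the returned string at startup shows which library build is in use. The reported name usually carries a processor-specific suffix, so a generic one would suggest that the optimized code path was not selected.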
Thanks,
--Rudy