Channel: Intel® Software - Intel® Integrated Performance Primitives

ippsDotProd_32f Performance on Haswell CPU

Hi,

At the moment I use ippsDotProd_32f from IPP 7.0 quite extensively in one of my projects. I have now tested this project with IPP 8.2 on a Haswell CPU (a Xeon E5-2650 v3 in an HP Z640 workstation), expecting it to be significantly faster (see below). Instead, the code was about 10% slower with IPP 8.2, which I found quite disturbing.

I wrote a test program (see below) to verify this and found that ippsDotProd_32f (as well as some other functions) appears to be slower in IPP 8.2 than in IPP 7.0 when it is called very many times on small arrays of about 100 elements. For larger arrays the speed appears to be the same.
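Whether the slowdown is a fixed cost per call or a cost per element can be checked by sweeping the array length and normalizing the timing by the number of elements processed. The following is a minimal, portable sketch under stated assumptions: `ns_per_element` is a hypothetical helper (not part of IPP), it uses `std::chrono::steady_clock` instead of `QueryPerformanceCounter`, and the callable is assumed to take the same argument order as `ippsDotProd_32f` (two source pointers, a length, a result pointer). If the per-element cost drops as the length grows, a fixed per-call overhead is the likely explanation at small sizes.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical benchmark helper: calls `dot` `calls` times on arrays of
// length `len` and returns the average cost per element in nanoseconds.
// `dot` must be callable as dot(const float*, const float*, int, float*),
// e.g. a lambda that forwards to ippsDotProd_32f.
template <class DotFn>
double ns_per_element(DotFn dot, int len, long long calls, float* last_result)
{
    assert(len > 0 && calls > 0);

    // Same fill pattern as the test program in the post.
    std::vector<float> a(len), b(len);
    for (int i = 0; i < len; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = static_cast<float>(1000 - i);
    }

    float result = 0.f;
    const auto start = std::chrono::steady_clock::now();
    for (long long c = 0; c < calls; ++c) {
        dot(a.data(), b.data(), len, &result);
    }
    const auto stop = std::chrono::steady_clock::now();

    *last_result = result;  // keep the result observable to the caller
    const double ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return ns / (static_cast<double>(calls) * len);
}
```

For example, wrapping the IPP call as `[](const float* x, const float* y, int n, float* r) { ippsDotProd_32f(x, y, n, r); }` and running it once with `len = 100` and once with `len = 10000` (with proportionally fewer calls) gives directly comparable per-element figures. An inlinable callable may be partly optimized away by the compiler, whereas a call into the IPP DLL is opaque to it.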

Unfortunately, many calls on small arrays are exactly what my project needs. This raises two questions:

1. What can I do to make my code run at least as fast with IPP 8.2 as it does with IPP 7.0?

2. Why is ippsDotProd_32f not significantly faster on a Haswell CPU? My expectation is based on section 3.1 of this article:

https://software.intel.com/en-us/articles/intel-xeon-processor-e5-2600-v...

There it is stated that Haswell CPUs have two FMA units and should therefore compute dot products much faster. Furthermore, https://software.intel.com/en-us/articles/haswell-support-in-intel-ipp states that ippsDotProd_32f should actually benefit from this, at least in IPP versions newer than 7.0.
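Two FMA units only help if the loop issues independent fused multiply-adds: a single running sum forms one dependency chain, so each FMA has to wait for the previous result. The sketch below illustrates the usual remedy, several independent accumulators, in portable C++ with `std::fma`. The name `dot_multi_acc` is hypothetical and this is an assumption about the general technique, not IPP's actual kernel; an AVX2 version would keep 8-wide `__m256` accumulators and use `_mm256_fmadd_ps`. Note that the tail loop and the final reduction are a fixed cost per call, which weighs more the shorter the arrays are.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Illustrative sketch: dot product with four independent accumulator chains,
// so consecutive fused multiply-adds do not depend on each other.
// std::fma may be emulated in software if the compiler does not target FMA
// hardware, so this shows the structure of the loop, not its speed.
float dot_multi_acc(const float* a, const float* b, std::size_t len)
{
    assert(len == 0 || (a != nullptr && b != nullptr));

    float acc0 = 0.f, acc1 = 0.f, acc2 = 0.f, acc3 = 0.f;
    std::size_t i = 0;

    // Main loop: four independent FMA chains per iteration.
    for (; i + 4 <= len; i += 4) {
        acc0 = std::fma(a[i],     b[i],     acc0);
        acc1 = std::fma(a[i + 1], b[i + 1], acc1);
        acc2 = std::fma(a[i + 2], b[i + 2], acc2);
        acc3 = std::fma(a[i + 3], b[i + 3], acc3);
    }

    // Tail: the remaining 0..3 elements go into the first chain.
    for (; i < len; ++i) {
        acc0 = std::fma(a[i], b[i], acc0);
    }

    // Final reduction of the partial sums (serial, paid once per call).
    return (acc0 + acc1) + (acc2 + acc3);
}
```

With the fill pattern from the test program (`vec1[i] = i`, `vec2[i] = 1000 - i`) and a length of 100, every partial sum is an integer well below 2^24, so the result is exactly 4621650.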

I'm very thankful for any help here! Have I misunderstood something? Here is my test code. It was compiled with Visual Studio 2012 on a non-Haswell computer, but the tests were run on the Haswell system mentioned above:

#include "stdafx.h"
#include <windows.h>
#include <stdio.h>
#include <tchar.h>
#include "ipp.h"
#include "ipps.h"
#include "ippcore.h"


int _tmain(int argc, _TCHAR* argv[])
{
    // initialize IPP dispatching and report which optimized library was selected
    IppStatus IPP_Init_status;
    IPP_Init_status = ippInit();
    printf("%s\n", ippGetStatusString(IPP_Init_status));
    const IppLibraryVersion *lib;
    lib = ippsGetLibVersion();
    printf("%s %s\n", lib->Name, lib->Version);
    //ippSetNumThreads(1);

    //generate two vectors
    float* vec1;
    float* vec2;
    vec1 = new float[1000]();
    vec2 = new float[1000]();

    //fill vectors with values
    for (int i = 0; i < 1000; i++) {
        vec1[i] = (float)i;
        vec2[i] = (float)(1000 - i);
    }

    //result variable
    float dotprod_result = 0.f;

    //start timing
    int dotprod_time = 0;
    LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds;
    LARGE_INTEGER Frequency;
    QueryPerformanceFrequency(&Frequency);
    QueryPerformanceCounter(&StartingTime);

    //run ippsDotProd: 500,000,000 calls on the first 100 elements
    //(vec1 is used for both inputs; vec2 is not used in the timed loop)
    for (int i = 0; i < 500000000; i++) {
        //ippsSum_32f(vec1,1000, &dotprod_result,ippAlgHintFast);
        ippsDotProd_32f(vec1, vec1, 100, &dotprod_result);
    }

    //stop timing and convert counter ticks to milliseconds
    QueryPerformanceCounter(&EndingTime);
    ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
    ElapsedMicroseconds.QuadPart *= 1000000;
    ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
    dotprod_time = (int)(ElapsedMicroseconds.QuadPart / 1000);

    printf("Total time [ms]:  %d\n", dotprod_time);

    delete[] vec1;
    delete[] vec2;

    return 0;
}

The result for IPP 7.0:

ippStsNoErr: No errors, it's OK.
ippse9-7.0.dll 7.0 build 205.105
Total time [ms]:  7558

The result for IPP 8.2:

ippStsNoErr: No errors.
ippSP AVX2 (l9) 8.2.1 (r44077)
Total time [ms]:  8141
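Dividing the two totals by the 500,000,000 calls in the timed loop gives the cost per call and the relative slowdown. The cycle figure is only an estimate that assumes the E5-2650 v3 runs at its 2.3 GHz nominal clock (turbo would raise it); for comparison, 100 floats fill only 13 eight-wide AVX registers per input. This is consistent with, but does not prove, per-call overhead dominating at this array size.

```latex
\begin{aligned}
t_{7.0} &= \frac{7558\ \text{ms}}{5\times 10^{8}} \approx 15.1\ \text{ns per call} \\
t_{8.2} &= \frac{8141\ \text{ms}}{5\times 10^{8}} \approx 16.3\ \text{ns per call} \\
\frac{t_{8.2}-t_{7.0}}{t_{7.0}} &= \frac{8141-7558}{7558} \approx 7.7\,\% \\
t_{7.0}\cdot 2.3\ \text{GHz} &\approx 35\ \text{cycles per call}, \qquad \left\lceil 100/8 \right\rceil = 13\ \text{AVX vectors per input}
\end{aligned}
```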
