ippsDotProd_32f Performance on Haswell CPU

Hi,

at the moment I'm using ippsDotProd_32f from IPP 7.0 quite extensively in one of my projects. I have now tested this project with IPP 8.2 on a Haswell CPU (Xeon E5-2650 v3 in an HP Z640 workstation) because I expected it to be significantly faster (see below). In fact, the code was about 10% slower with IPP 8.2, which I found quite disturbing.

I created a test program (see below) to verify this and found that ippsDotProd_32f (as well as some other functions) seems to be slower in IPP 8.2 than in IPP 7.0 when it is called many times on rather small arrays of about 100 entries. For larger arrays the speed seems to be equal.

Unfortunately this is exactly what I have to do in my project. Now two questions arise:

1. What can I do to make my code run at least as fast as with IPP 7.0, even if I use IPP 8.2?

2. Why is ippsDotProd_32f not significantly faster on a Haswell CPU? My assumptions are based on this article (section 3.1):

https://software.intel.com/en-us/articles/intel-xeon-processor-e5-2600-v...

There it is stated that Haswell CPUs have two FMA units and should therefore be much faster at calculating dot products. Furthermore, https://software.intel.com/en-us/articles/haswell-support-in-intel-ipp states that ippsDotProd_32f should actually profit from this, at least in IPP versions later than 7.0.
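To make that expectation concrete, here is a back-of-envelope sketch of the peak throughput implied by two FMA units. It assumes 256-bit units with 8 single-precision lanes each and counts one FMA as two floating-point operations. The helper names are hypothetical, and any clock value passed in is an assumption rather than a measured frequency.

```cpp
#include <cassert>

// Peak single-precision flops per cycle per core:
// units * lanes * 2 (one FMA counts as a multiply plus an add).
constexpr int peak_flops_per_cycle(int fma_units, int lanes_per_unit) {
    return fma_units * lanes_per_unit * 2;
}

// Lower bound on the cycles needed for a dot product of `len` floats
// (len multiplies plus len adds) at the given peak rate.
double ideal_cycles(int len, int flops_per_cycle) {
    assert(len > 0 && flops_per_cycle > 0);
    return 2.0 * len / flops_per_cycle;
}

// Convert a cycle count to nanoseconds at an assumed clock in GHz.
double cycles_to_ns(double cycles, double clock_ghz) {
    assert(clock_ghz > 0.0);
    return cycles / clock_ghz;
}
```

peak_flops_per_cycle(2, 8) gives 32, and ideal_cycles(100, 32) gives 6.25 cycles for a 100-entry dot product, which would be roughly 2.7 ns at an assumed 2.3 GHz. This bound ignores load bandwidth, loop setup and function call overhead, all of which probably matter a lot for arrays this small.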

I would be very thankful for assistance here! Have I misunderstood something? Here is my test code. It was compiled with Visual Studio 2012 on a non-Haswell computer, but the tests were run on the Haswell system mentioned above:

#include "stdafx.h"
#include <stdio.h>   // printf
#include <tchar.h>   // _TCHAR
#include "windows.h"
#include "ipp.h"
#include "ipps.h"
#include "ippcore.h"



int main(int argc, _TCHAR* argv[])
{

	IppStatus IPP_Init_status;
	IPP_Init_status=ippInit();
	printf("%s\n", ippGetStatusString(IPP_Init_status) );
	const IppLibraryVersion *lib;
	lib = ippsGetLibVersion();
	printf("%s %s\n", lib->Name, lib->Version);
	//ippSetNumThreads(1);

	//generate two vectors
	float* vec1;
	float* vec2;
	vec1=new float[1000]();
	vec2=new float[1000]();

	//fill vectors with values
	for (int i=0;i<1000;i++){
		vec1[i]=(float)i;
		vec2[i]=(float)(1000-i);
	}


	//result variable
	float dotprod_result=0.f;


	//start timing
	int dotprod_time=0;
	LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds;
	LARGE_INTEGER Frequency;
	QueryPerformanceFrequency(&Frequency);
	QueryPerformanceCounter(&StartingTime);


	//run ippsDotProd on the first 100 entries only
	//note: both inputs are vec1; vec2 is not used in the timed loop
	for (int i=0; i<500000000; i++){
		//ippsSum_32f(vec1,1000, &dotprod_result,ippAlgHintFast);
		ippsDotProd_32f(vec1, vec1, 100, &dotprod_result);
	}


	//stop timing
	QueryPerformanceCounter(&EndingTime);
	//convert counter ticks to microseconds, then to milliseconds
	ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
	ElapsedMicroseconds.QuadPart *= 1000000;
	ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
	dotprod_time=(int)(ElapsedMicroseconds.QuadPart/1000);

	printf("Total time [ms]:  %d\n", dotprod_time);



	delete[] vec1;
	delete[] vec2;

	return 0;
}
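To find the array length at which the two versions start to differ, the same timing loop can be repeated for several lengths. The following is only a portable sketch of that idea: dot_ref is a plain scalar stand-in for ippsDotProd_32f so that it builds without IPP, and both dot_ref and ns_per_call are hypothetical helper names. For a real comparison the call to dot_ref would be replaced by ippsDotProd_32f.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Plain scalar dot product, standing in for ippsDotProd_32f.
float dot_ref(const float* a, const float* b, int len) {
    float sum = 0.0f;
    for (int i = 0; i < len; ++i) sum += a[i] * b[i];
    return sum;
}

// Average time per call in nanoseconds for `calls` dot products of length `len`.
double ns_per_call(int len, long calls) {
    assert(len > 0 && calls > 0);
    std::vector<float> a(len), b(len);
    for (int i = 0; i < len; ++i) {
        a[i] = float(i);
        b[i] = float(len - i);
    }
    volatile float sink = 0.0f;  // the result is stored on every iteration
    auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < calls; ++i) {
        a[0] = float(i & 1);  // vary the input so the call is not hoisted out of the loop
        sink = dot_ref(a.data(), b.data(), len);
    }
    auto t1 = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / double(calls);
}
```

Calling ns_per_call for lengths such as 16, 100 and 1000 with the same number of calls gives one per-call figure per length, which makes a fixed per-call overhead visible as a cost that does not shrink with the array size.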

The result of my test program with IPP 7.0:

ippStsNoErr: No errors, it's OK.
ippse9-7.0.dll 7.0 build 205.105
Total time [ms]: 7558

The result of my test program with IPP 8.2:

ippStsNoErr: No errors.
ippSP AVX2 (l9) 8.2.1 (r44077)
Total time [ms]: 8141
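The totals are easier to compare per call. A small sketch of the arithmetic, using the 500000000 iterations and the array length of 100 from the test program and counting two floating-point operations per element (one multiply, one add); the helper names are hypothetical:

```cpp
#include <cassert>

// Average time per call in nanoseconds, from a total in milliseconds.
double per_call_ns(double total_ms, double calls) {
    assert(calls > 0.0);
    return total_ms * 1.0e6 / calls;
}

// Slowdown of `slow_ms` relative to `fast_ms`, in percent.
double slowdown_percent(double fast_ms, double slow_ms) {
    assert(fast_ms > 0.0);
    return (slow_ms / fast_ms - 1.0) * 100.0;
}

// Achieved GFLOP/s for a dot product of `len` entries taking `ns` nanoseconds.
double gflops(int len, double ns) {
    assert(len > 0 && ns > 0.0);
    return 2.0 * len / ns;
}
```

With the numbers above, per_call_ns(7558, 5e8) is about 15.1 ns for IPP 7.0 and per_call_ns(8141, 5e8) is about 16.3 ns for IPP 8.2, so slowdown_percent(7558, 8141) is about 7.7%. That corresponds to roughly 13.2 and 12.3 GFLOP/s for a 100-entry dot product.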
