Hi,
I'm working on a complex project that requires parallel calculations to achieve good performance.
For this purpose our company bought a machine with Intel Xeon Platinum 8168 processors (96 cores in total; I'll call it "96"). We also have a computer with an Intel Core i9-7960X processor (16 cores; I'll call it "16").
I'm using the OpenMP "#pragma omp for" directive, since all the calculations happen in for loops, and I'm getting strange results.
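A minimal sketch of the setup described here, assuming each loop iteration is an independent, CPU-bound calculation and that one thread per iteration is wanted. heavy_work and run_tasks are hypothetical names, not code from the project. Note that without a num_threads clause OpenMP sizes the team from its default (what omp_get_max_threads() reports), not from the iteration count, so with 5 iterations most of the team simply sits idle.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for one independent, CPU-bound calculation.
   It reads no shared data, so iterations can safely run in parallel. */
static double heavy_work(int task_id, long steps)
{
    double acc = 0.0;
    for (long k = 1; k <= steps; ++k)
        acc += (double)task_id / (double)k;
    return acc;
}

/* Runs n_tasks independent iterations. num_threads(n_tasks) asks for one
   thread per iteration ("number of threads = number of iterations").
   who_ran[i] records the thread that executed iteration i (0 in a serial build). */
static void run_tasks(double *results, int *who_ran, int n_tasks, long steps)
{
    if (n_tasks <= 0)
        return; /* num_threads() requires a positive value */

    #pragma omp parallel for num_threads(n_tasks) schedule(static)
    for (int i = 0; i < n_tasks; ++i) {
#ifdef _OPENMP
        who_ran[i] = omp_get_thread_num();
#else
        who_ran[i] = 0;
#endif
        results[i] = heavy_work(i, steps); /* each iteration writes only its own slot */
    }
}
```

Printing who_ran after a run is a quick way to check that the iterations really were spread over separate threads.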
When I run the code on "16" with fewer than 16 loop iterations, the number of threads, and therefore the number of cores in use, is also below 16. In that case the run times are almost identical: 5, 10, and 15 iterations all complete in about the same time. That is what I expect, since not all of the CPU capacity is being used.
Then I ran the same code on "96", and the timing results are strange. With 40 iterations (i.e. 40 threads), the run takes almost twice as long as with 1 iteration. With 90 iterations (still not the full 96 cores!), it takes almost 4 times as long.
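To narrow this down, it can help to time the parallel loop in isolation for a range of iteration counts on both machines, with thread start-up excluded by a warm-up region. The sketch below assumes the real work is CPU-bound and uses a hypothetical kernel() in its place. One effect it can expose is that Turbo Boost lowers the clock frequency as more cores become active, so per-iteration time on a many-core Xeon can rise with the thread count even though every core is free; on a multi-socket machine, thread placement across sockets is another variable.

```c
#include <stdio.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds: omp_get_wtime() with OpenMP,
   otherwise C11 timespec_get() so the sketch still builds serially. */
static double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}

/* Hypothetical CPU-bound calculation standing in for one loop iteration. */
static double kernel(int id, long steps)
{
    double acc = 0.0;
    for (long k = 1; k <= steps; ++k)
        acc += (double)id / (double)k;
    return acc;
}

/* Wall time of one parallel run with n_iter independent iterations.
   out[] keeps the results so the compiler cannot discard the work. */
static double time_run(double *out, int n_iter, long steps)
{
    double t0 = wall_seconds();
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n_iter; ++i)
        out[i] = kernel(i, steps);
    return wall_seconds() - t0;
}

/* Prints run time for 1, 1+step, 1+2*step, ... iterations (capped at 128).
   An empty parallel region runs first so thread creation is not timed. */
static void scaling_report(int max_iter, int step, long steps)
{
    double out[128];
    if (max_iter > 128)
        max_iter = 128;
    if (step < 1)
        step = 1;

    #pragma omp parallel
    { /* warm-up: the runtime starts its worker threads here */ }

#ifdef _OPENMP
    printf("procs=%d max_threads=%d\n", omp_get_num_procs(), omp_get_max_threads());
#endif
    for (int n = 1; n <= max_iter; n += step)
        printf("%3d iterations: %.3f s\n", n, time_run(out, n, steps));
}
```

For example, building with gcc -O2 -fopenmp, calling scaling_report(96, 8, 100000000L), and repeating the run with OMP_PLACES=cores and OMP_PROC_BIND set first to close and then to spread shows whether the slowdown depends on how the threads are placed. If omp_get_num_procs() reports 96 while the machine has fewer physical cores, some of the threads are sharing cores through Hyper-Threading.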
My first question: is there a known issue with this specific processor (Intel Xeon Platinum 8168) when used with the IPP libraries?
Second, what could cause such an increase in run time? I'm aware that dynamic memory allocation (we do have some) and the time needed to create a large number of threads add overhead, but that still doesn't seem to be the real reason.
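On the allocation point: calling malloc/free inside the parallel loop makes all threads share the allocator and, for large buffers, the kernel's page-fault handling, which can serialize work as the thread count grows. (Thread creation is usually a one-time cost, since OpenMP runtimes normally keep their worker threads alive between parallel regions.) A minimal sketch of the usual workaround, hoisting the allocation out of the loop, assuming each iteration needs a scratch buffer of n doubles; compute_with_scratch and the two run_* functions are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical per-iteration calculation that needs n doubles of scratch space. */
static double compute_with_scratch(double *scratch, size_t n, int id)
{
    for (size_t k = 0; k < n; ++k)
        scratch[k] = (double)id + (double)k; /* first write touches the pages */
    double sum = 0.0;
    for (size_t k = 0; k < n; ++k)
        sum += scratch[k];
    return sum;
}

/* Pattern A: allocate and free inside every iteration.
   All threads go through the shared allocator on every iteration.
   Returns -1 if any allocation failed, 0 otherwise. */
static int run_alloc_per_iteration(double *out, int n_iter, size_t n)
{
    int failed = 0;
    #pragma omp parallel for schedule(static) reduction(|:failed)
    for (int i = 0; i < n_iter; ++i) {
        double *scratch = malloc(n * sizeof *scratch);
        if (!scratch) { failed = 1; continue; }
        out[i] = compute_with_scratch(scratch, n, i);
        free(scratch);
    }
    return failed ? -1 : 0;
}

/* Pattern B: one pool of n_iter * n doubles allocated once by the caller
   and reused across calls; iteration i works on its own slice. Because each
   thread writes its slice first, first-touch page placement keeps the
   memory local to that thread's socket on a NUMA machine. */
static void run_preallocated(double *out, double *pool, int n_iter, size_t n)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n_iter; ++i)
        out[i] = compute_with_scratch(pool + (size_t)i * n, n, i);
}
```

Timing both patterns with the same iteration counts shows how much of the slowdown, if any, comes from allocation rather than from the calculation itself.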
Thanks