Hi,
I am working with a complicated algorithm that requires a lot of computation. For this purpose, an Intel Xeon Platinum 8168 CPU machine with 96 cores was purchased (I'll call it "96"). We also already have an Intel Core i9-7960X CPU with 16 cores (I'll call it "16").
I'm using the "#pragma omp parallel for" directive on a for loop to parallelize the calculations. First, I tried this approach on the 16 PC with 5, 10 and 15 iterations of that loop, and got almost the same total execution time in all three cases (which is expected, since not all of the CPU's cores were in use).
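For reference, the test is structured roughly like the sketch below. This is a simplified illustration, not my actual code: `heavy_work` and `time_parallel_loop` are placeholder names, and the real algorithm is replaced by a dummy CPU-bound computation with no shared state.

```cpp
#include <chrono>
#include <cmath>
#include <vector>

// Hypothetical stand-in for one iteration of the real algorithm:
// a fixed amount of pure CPU work that touches no shared data.
double heavy_work(int seed) {
    double acc = 0.0;
    for (int k = 1; k <= 200000; ++k) {
        acc += std::sin(seed + k * 1e-3);
    }
    return acc;
}

// Runs `iterations` independent work items under "omp parallel for" and
// returns the wall-clock time in seconds. If OpenMP is not enabled at
// compile time, the pragma is ignored and the loop runs serially.
double time_parallel_loop(int iterations, std::vector<double>& out) {
    out.assign(iterations, 0.0);
    auto start = std::chrono::steady_clock::now();
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < iterations; ++i) {
        out[i] = heavy_work(i);  // each iteration writes only its own slot
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

With one iteration per thread and fewer iterations than cores, I would expect `time_parallel_loop(5, out)`, `time_parallel_loop(10, out)` and `time_parallel_loop(15, out)` to take about the same time, which is what I observe on the 16 PC.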
Next, I ran the same code on the 96 PC. I tried different numbers of iterations there as well, and I see the total execution time increasing steadily.
With 40 iterations the total time almost doubled, and with 90 iterations it increased by a factor of 3.5 (still NOT the full power of the CPU, as it has 96 cores!).
I'm aware of the thread pool and of the time needed to create that many threads, but still, this does not seem to be working well at all. Does OpenMP have specific problems with Intel Xeon Platinum processors that I'm not aware of? Maybe there is something about its architecture that is not compatible with OpenMP?
* It is not a cooling problem, since the tests run for only about 3 minutes.
* It is not a problem of allocations or memory copies, since exactly the same amount of memory is allocated and copied.
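One check that might help narrow this down is to print what the OpenMP runtime itself reports on each machine. The sketch below is only an illustration: `OmpInfo` and `query_omp` are made-up names, while the `omp_get_*` calls are standard OpenMP routines. It assumes the code is built with OpenMP enabled; otherwise it falls back to reporting 1 for everything.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper struct: what the OpenMP runtime sees on this machine.
struct OmpInfo {
    int logical_cpus;  // processors available to the program
    int max_threads;   // upper bound on threads for the next parallel region
    int team_size;     // threads actually started in a parallel region
};

OmpInfo query_omp() {
    OmpInfo info{1, 1, 1};  // serial fallback when OpenMP is disabled
#ifdef _OPENMP
    info.logical_cpus = omp_get_num_procs();
    info.max_threads = omp_get_max_threads();
    #pragma omp parallel
    {
        // Only one thread records the team size, so there is no data race.
        #pragma omp single
        info.team_size = omp_get_num_threads();
    }
#endif
    return info;
}
```

If `team_size` turns out smaller than the number of loop iterations, several iterations share one thread and the total time would grow with the iteration count. The standard environment variables `OMP_NUM_THREADS`, `OMP_PROC_BIND` and `OMP_PLACES` control the thread count and where the threads are placed, so their values on the 96 PC may be worth comparing as well.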
Can you think of any possible causes? I have run out of ideas.
Thanks