In Figure 1 we show the speedup curves for the two test systems, where the speedup is defined as the time to run on 1 node divided by the time to run on N nodes. In other words, for a perfectly scalable problem the speedup on N nodes would be N.
The efficiency (i.e. speedup(N)/N) depends strongly on the system size: the larger the system, the better the speedup for a given number of processors. Another way to analyse the performance of a scalable code is to plot the execution time versus the number of nodes, or alternatively to plot the execution time multiplied by the number of nodes versus the number of nodes (for perfect scaling the latter is constant). These quantities are shown in Figure 2 for our two test systems:
| NP | 125 atoms | 150 atoms | 216 atoms | 343 atoms | 512 atoms |
|---|---|---|---|---|---|
| 1 | 47.89 | 79.21 | 192.04 | 708.60 | 2599.70 |
| 2 | 31.89 | 51.00 | 117.92 | 385.09 | 1173.13 |
| 4 | 19.36 | 29.77 | 65.85 | 208.98 | 603.27 |
| 8 | 13.80 | 20.17 | 41.70 | 125.39 | 345.69 |
| 16 | 10.21 | 14.46 | 27.27 | 74.45 | 198.40 |
| 32 | 9.20 | 12.36 | 22.58 | 55.44 | 138.63 |
| 64 | | | | | 131.99 |
Table 1: Time per MD step using diagonalization as a function of the number of atoms and the number of processors (NP).
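The speedup and efficiency definitions given above can be applied directly to the numbers in Table 1. The following is a minimal sketch: the names `NP`, `TIMES` and `scaling_metrics` are hypothetical, the single 64-processor entry is left out, and the 1-processor run is assumed to be the first entry of each column.

```python
# Time per MD step with diagonalization, copied from Table 1.
# Keys are numbers of atoms; each list runs over NP = 1, 2, 4, 8, 16, 32.
# The single 64-processor entry is omitted.
NP = [1, 2, 4, 8, 16, 32]
TIMES = {
    125: [47.89, 31.89, 19.36, 13.80, 10.21, 9.20],
    150: [79.21, 51.00, 29.77, 20.17, 14.46, 12.36],
    216: [192.04, 117.92, 65.85, 41.70, 27.27, 22.58],
    343: [708.60, 385.09, 208.98, 125.39, 74.45, 55.44],
    512: [2599.70, 1173.13, 603.27, 345.69, 198.40, 138.63],
}


def scaling_metrics(nprocs, times):
    """Return (N, speedup, efficiency, N*T) for each processor count N.

    speedup(N)    = T(1) / T(N)
    efficiency(N) = speedup(N) / N
    N * T(N) stays constant for a perfectly scalable code.
    """
    t1 = times[0]  # assumes the first entry is the 1-processor run
    rows = []
    for n, t in zip(nprocs, times):
        s = t1 / t
        rows.append((n, s, s / n, n * t))
    return rows


# Report the largest complete processor count (NP = 32) for each size.
for atoms, times in TIMES.items():
    n, s, e, c = scaling_metrics(NP, times)[-1]
    print(f"{atoms:4d} atoms, NP={n}: speedup {s:5.2f}, efficiency {e:4.2f}, N*T {c:8.2f}")
```

With these data the speedup on 32 processors is about 5.2 for 125 atoms but about 18.8 for 512 atoms, which is the size dependence of the efficiency described in the text.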
| NP | 125 atoms | 150 atoms | 216 atoms | 343 atoms | 512 atoms |
|---|---|---|---|---|---|
| 1 | 470.55 | 679.49 | 1455.81 | 3741.34 | 8261.92 |
| 2 | 239.15 | 352.00 | 751.29 | 1944.99 | 4259.43 |
| 4 | 123.81 | 176.23 | 369.25 | 946.92 | 2131.56 |
| 8 | 64.51 | 90.63 | 188.34 | 479.57 | 1074.53 |
| 16 | 34.52 | 50.53 | 102.34 | 253.22 | 549.01 |
| 32 | 19.64 | 28.19 | 55.32 | 133.21 | 284.44 |
| 64 | | | | | |
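Comparing the two tables locates the breakeven point between the two methods, that is, the system size above which the KPM is faster than diagonalization. The sketch below assumes that the second, uncaptioned table lists the KPM timings, as the following discussion suggests. It computes the KPM/diagonalization time ratio for each processor count and searches for the smallest tabulated size at which the KPM wins. The names `kpm_to_diag_ratio` and `breakeven_size` are hypothetical, and the sparse 64-processor rows are omitted.

```python
# Time per MD step from the two tables, keyed by processor count (NP);
# each list runs over 125, 150, 216, 343 and 512 atoms.
# ASSUMPTION: the second table holds the KPM timings.
ATOMS = [125, 150, 216, 343, 512]
DIAG = {
    1: [47.89, 79.21, 192.04, 708.60, 2599.70],
    2: [31.89, 51.00, 117.92, 385.09, 1173.13],
    4: [19.36, 29.77, 65.85, 208.98, 603.27],
    8: [13.80, 20.17, 41.70, 125.39, 345.69],
    16: [10.21, 14.46, 27.27, 74.45, 198.40],
    32: [9.20, 12.36, 22.58, 55.44, 138.63],
}
KPM = {
    1: [470.55, 679.49, 1455.81, 3741.34, 8261.92],
    2: [239.15, 352.00, 751.29, 1944.99, 4259.43],
    4: [123.81, 176.23, 369.25, 946.92, 2131.56],
    8: [64.51, 90.63, 188.34, 479.57, 1074.53],
    16: [34.52, 50.53, 102.34, 253.22, 549.01],
    32: [19.64, 28.19, 55.32, 133.21, 284.44],
}


def kpm_to_diag_ratio(nprocs):
    """KPM time divided by diagonalization time for every tabulated size."""
    return [tk / td for td, tk in zip(DIAG[nprocs], KPM[nprocs])]


def breakeven_size(nprocs):
    """Smallest tabulated atom count at which the KPM is faster, else None."""
    for atoms, ratio in zip(ATOMS, kpm_to_diag_ratio(nprocs)):
        if ratio < 1.0:
            return atoms
    return None


for nprocs in sorted(DIAG):
    ratios = ", ".join(f"{r:5.2f}" for r in kpm_to_diag_ratio(nprocs))
    print(f"NP={nprocs:2d}: KPM/diag = [{ratios}], breakeven: {breakeven_size(nprocs)}")
```

With these numbers the ratio for 125 atoms drops from about 9.8 on one processor to about 2.1 on 32, yet no tabulated size reaches breakeven for any processor count.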
The breakeven point (the system size above which the KPM becomes faster
than diagonalization) depends strongly on the number of processors,
because the KPM scales much better with the number of processors.
However, the breakeven point remains very high for this particular problem.
This situation can be improved by using a shorter range for the Hamiltonian.
Indeed, for the KPM the time per MD step is linear in the number of
neighbors, i.e. roughly cubic in the range.
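The cubic dependence follows from counting neighbors. For a roughly uniform atomic number density ρ, the number of neighbors within a cutoff radius r_c is about (4π/3) ρ r_c³. The sketch below assumes that the KPM cost per MD step is exactly proportional to this neighbor count and estimates the relative cost of a shorter-range Hamiltonian. The function names are hypothetical.

```python
import math


def neighbor_count(cutoff, density):
    """Expected number of neighbors inside a sphere of radius `cutoff`
    for a uniform number density `density` (atoms per unit volume)."""
    return density * (4.0 / 3.0) * math.pi * cutoff ** 3


def relative_kpm_cost(new_cutoff, old_cutoff, density=1.0):
    """Ratio of KPM time per MD step after changing the Hamiltonian range.

    ASSUMPTION: the cost is strictly proportional to the neighbor count,
    so the density cancels and the ratio is (new_cutoff / old_cutoff)**3.
    """
    return neighbor_count(new_cutoff, density) / neighbor_count(old_cutoff, density)


print(relative_kpm_cost(0.5, 1.0))  # halving the range
print(relative_kpm_cost(0.8, 1.0))  # shortening the range by 20 %
```

Under this assumption, halving the range cuts the KPM time per MD step by a factor of eight, while a 20 % shorter range already roughly halves it.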
Florian Kirchhoff
Tue Jun 9 16:34:36 EDT 1998