
Timings for TBMD on the SP2


The test systems used for these timings are:

Figure 1 shows the speedup curves for the two test systems. The speedup is defined as the time to run on 1 node divided by the time to run on N nodes; for a perfectly scalable problem, the speedup on N nodes would therefore be N.


Figure 1: Speedup as a function of the number of processors.


The efficiency (i.e. speedup(N)/N) depends strongly on the system size: the larger the system, the better the speedup for a given number of processors. Another way to analyse the performance of a scalable code is to plot the execution time versus the number of nodes, or alternatively the execution time multiplied by the number of nodes versus the number of nodes. These quantities are shown in Figure 2 for our two test systems:


Figure 2:

It should be noted that the limiting factor here is the diagonalization, which is done with routines from the standard ScaLAPACK library. The scaling should improve for larger systems.
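
The quantities discussed above (speedup, efficiency, and execution time multiplied by the number of nodes) can be computed from any set of timings. The following is a minimal Python sketch, not part of TBMD. Since the timings behind Figures 1 and 2 are not listed here, it uses the 512-atom diagonalization timings from Table 1 as sample input; the function names are hypothetical.

```python
# Times per MD step for the 512-atom diagonalization runs, copied from Table 1.
# Keys are the number of processors (NP). The isolated NP = 64 entry is left out.
times = {1: 2599.70, 2: 1173.13, 4: 603.27, 8: 345.69, 16: 198.40, 32: 138.63}


def speedup(times, n):
    """Time on 1 node divided by the time on n nodes."""
    return times[1] / times[n]


def efficiency(times, n):
    """Speedup divided by the number of nodes (1.0 means perfect scaling)."""
    return speedup(times, n) / n


def node_time(times, n):
    """Execution time multiplied by the number of nodes (constant if scaling is perfect)."""
    return times[n] * n


print(" NP   speedup  efficiency  time*NP")
for n in sorted(times):
    print(f"{n:3d}  {speedup(times, n):8.2f}  {efficiency(times, n):10.2f}  {node_time(times, n):9.2f}")
```

For a perfectly scalable problem the efficiency column would stay at 1 and the last column would be constant; any growth in the last column measures the parallel overhead.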

Comparison of the O(N^3) and the O(N^2) methods.

We now address the performance of the KPM method. In particular, we compare the timings per MD step for the KPM with those obtained using diagonalization (with ScaLAPACK). These tests used an NRL orthogonal tight-binding model for Pd with a range of 10.5 au. The runs were performed on the IBM SP2 at the ASC MSRC.

Number of atoms
NP         125       150       216       343       512
1        47.89     79.21    192.04    708.60   2599.70
2        31.89     51.00    117.92    385.09   1173.13
4        19.36     29.77     65.85    208.98    603.27
8        13.80     20.17     41.70    125.39    345.69
16       10.21     14.46     27.27     74.45    198.40
32        9.20     12.36     22.58     55.44    138.63
64                                              131.99

Table 1: Time per MD step (in seconds) using diagonalization, as a function of the number of atoms and the number of processors (NP).


Number of atoms
NP         125       150       216       343       512
1       470.55    679.49   1455.81   3741.34   8261.92
2       239.15    352.00    751.29   1944.99   4259.43
4       123.81    176.23    369.25    946.92   2131.56
8        64.51     90.63    188.34    479.57   1074.53
16       34.52     50.53    102.34    253.22    549.01
32       19.64     28.19     55.32    133.21    284.44
64

Table 2: Time per MD step (in seconds) using the KPM with 100 moments, as a function of the number of atoms and the number of processors (NP).
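
Tables 1 and 2 can be compared entry by entry. The following Python sketch (an illustration only, with hypothetical names) computes the ratio of the KPM time to the diagonalization time for each entry. A ratio above 1 means that diagonalization is faster for that system size and processor count. The incomplete NP = 64 row is left out.

```python
# Times per MD step, copied from Table 1 (diagonalization) and Table 2 (KPM).
# Keys are the number of processors; list entries follow the order in ATOMS.
ATOMS = [125, 150, 216, 343, 512]
DIAG = {
    1: [47.89, 79.21, 192.04, 708.60, 2599.70],
    2: [31.89, 51.00, 117.92, 385.09, 1173.13],
    4: [19.36, 29.77, 65.85, 208.98, 603.27],
    8: [13.80, 20.17, 41.70, 125.39, 345.69],
    16: [10.21, 14.46, 27.27, 74.45, 198.40],
    32: [9.20, 12.36, 22.58, 55.44, 138.63],
}
KPM = {
    1: [470.55, 679.49, 1455.81, 3741.34, 8261.92],
    2: [239.15, 352.00, 751.29, 1944.99, 4259.43],
    4: [123.81, 176.23, 369.25, 946.92, 2131.56],
    8: [64.51, 90.63, 188.34, 479.57, 1074.53],
    16: [34.52, 50.53, 102.34, 253.22, 549.01],
    32: [19.64, 28.19, 55.32, 133.21, 284.44],
}


def kpm_to_diag_ratio(nproc):
    """KPM time divided by diagonalization time, one value per system size."""
    return [k / d for k, d in zip(KPM[nproc], DIAG[nproc])]


print("NP    " + "  ".join(f"{n:5d}" for n in ATOMS))
for nproc in sorted(DIAG):
    row = "  ".join(f"{r:5.2f}" for r in kpm_to_diag_ratio(nproc))
    print(f"{nproc:<4d}  {row}")
```

Reading the output along a column shows how the ratio changes with the number of processors for a fixed system size.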


The breakeven point depends strongly on the number of processors, because the KPM scales much better with the number of processors than diagonalization does. However, the breakeven point remains very high for this particular problem. The situation can be improved by using a shorter range for the Hamiltonian: for the KPM, the time per MD step is linear in the number of neighbors, i.e. roughly cubic in the range.
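
The cubic dependence on the range can be turned into a rough estimate. The sketch below assumes a uniform atomic density, so that the number of neighbors is proportional to the volume of the cutoff sphere, and it assumes the KPM time is exactly linear in the number of neighbors. The shorter range of 8.0 au is a hypothetical value chosen for illustration, and the resulting time is an estimate, not a measurement.

```python
def relative_neighbor_count(new_range, old_range):
    """Ratio of neighbor counts, assuming uniform density (count ~ range**3)."""
    return (new_range / old_range) ** 3


def estimated_kpm_time(old_time, new_range, old_range):
    """Scale a measured KPM time, assuming it is linear in the number of neighbors."""
    return old_time * relative_neighbor_count(new_range, old_range)


old_range = 10.5   # au, range used for the timings in Table 2
new_range = 8.0    # au, hypothetical shorter range (assumption)
old_time = 284.44  # s, 512 atoms on 32 processors, from Table 2

factor = relative_neighbor_count(new_range, old_range)
print(f"neighbor count scales by {factor:.3f}")
print(f"estimated time per MD step: {estimated_kpm_time(old_time, new_range, old_range):.1f} s")
```

Under the same assumptions, halving the range would reduce the number of neighbors, and hence the KPM time per MD step, by a factor of 8.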





Florian Kirchhoff
Tue Jun 9 16:34:36 EDT 1998