Research Articles

Scalable Hierarchical Parallel Algorithm for the Solution of Super Large-Scale Sparse Linear Equations

Bin Liu

e-mail: liubin@tsinghua.edu.cn
Department of Engineering Mechanics,
Tsinghua University,
Beijing, 100084, PRC

Yuan Dong

Department of Computer Science and Technology,
Tsinghua University,
Beijing, 100084, PRC
e-mail: dongyuan@tsinghua.edu.cn

1Corresponding author.

Manuscript received December 23, 2012; final manuscript received January 4, 2013; accepted manuscript posted January 23, 2013; published online February 5, 2013. Editor: Yonggang Huang.

J. Appl. Mech. 80(2), 020901 (Feb 05, 2013) (8 pages); Paper No: JAM-12-1570; doi: 10.1115/1.4023481

A parallel linear-equation solver that can effectively use 1000+ processors has become the bottleneck of large-scale implicit engineering simulations. In this paper, we present a new hierarchical parallel iterative algorithm with a master-slave structure for solving super large-scale sparse linear equations on a distributed-memory computer cluster. By alternately performing global equilibrium computation and local relaxation, the specified accuracy requirement can be met in a few iterations. Moreover, each set/slave processor communicates mainly with its nearest neighbors, and the data transferred between the sets/slave processors and the master processor is always far less than the data exchanged between neighboring sets/slave processors. The corresponding algorithm for implicit finite element analysis has been implemented with the MPI library, and a super large two-dimensional square system of a triangle-lattice truss structure under randomly distributed loads is simulated with over 1 × 10^9 degrees of freedom (DOF) on up to 2001 processors of the “Exploration 100” cluster at Tsinghua University. The numerical experiments demonstrate that this algorithm has excellent parallel efficiency and high scalability, and it may have broad applications in other implicit simulations.
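The alternation of a global step and a local step can be illustrated with a generic two-level iteration. The sketch below is not the authors' algorithm, which the abstract does not specify in detail. It is a textbook-style stand-in under stated assumptions: a one-dimensional chain of unit springs fixed at both ends, a coarse "global equilibrium" solve with one unknown per set (a rigid shift of each set), and a "local relaxation" in which every set re-solves its own equations with its neighbors' displacements frozen (block Jacobi). It runs serially with NumPy rather than MPI, and the names `spring_chain` and `two_level_solve` are hypothetical.

```python
import numpy as np


def spring_chain(n):
    """Stiffness matrix of n nodes joined by unit springs, both ends fixed.

    Assumed test problem for illustration only.
    """
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def two_level_solve(K, f, n_sets, tol=1e-8, max_iter=20000):
    """Alternate a coarse global solve with independent local solves.

    Returns (u, iterations_used, relative_residual).
    """
    n = K.shape[0]
    assert n % n_sets == 0, "sketch assumes equally sized sets"
    size = n // n_sets
    blocks = [slice(i * size, (i + 1) * size) for i in range(n_sets)]

    # One coarse unknown per set: a uniform shift of all nodes in that set.
    P = np.zeros((n, n_sets))
    for j, b in enumerate(blocks):
        P[b, j] = 1.0
    K_coarse = P.T @ K @ P  # small system, the "master" side in this sketch

    u = np.zeros(n)
    f_norm = np.linalg.norm(f)
    rel = 1.0
    for it in range(1, max_iter + 1):
        # Global step: balance the net residual force on each set.
        r = f - K @ u
        u = u + P @ np.linalg.solve(K_coarse, P.T @ r)

        # Local step: each set solves its own block with neighbors frozen.
        # The per-set solves are independent, so they could run in parallel.
        r = f - K @ u
        du = np.zeros(n)
        for b in blocks:
            du[b] = np.linalg.solve(K[b, b], r[b])
        u = u + du

        rel = np.linalg.norm(f - K @ u) / f_norm
        if rel < tol:
            break
    return u, it, rel


rng = np.random.default_rng(0)
K = spring_chain(128)
f = rng.standard_normal(128)  # random nodal loads
u, iters, rel = two_level_solve(K, f, n_sets=8)
print(f"stopped after {iters} iterations, relative residual {rel:.2e}")
```

In this sketch each local solve needs only the displacements of the adjacent sets, while the coarse solve exchanges a single number per set. That mirrors the communication pattern the abstract describes, but the convergence behavior of this toy scheme should not be read as representative of the paper's results.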

Copyright © 2013 by ASME




Fig. 1

A meshed two-dimensional solid structure is divided into sets/subdomains assigned to slave processors; the dividing lines cut through the elements.

Fig. 2

(a) Schematic diagram of the inner nodes and the outer nodes of Set I, (b) local displacement-controlled relaxation, and (c) local force-controlled relaxation

Fig. 3

Flow chart of the hierarchical parallel algorithm

Fig. 4

Two-dimensional square truss system with a random force on each node

Fig. 5

The relative residual of the 1 × 10^9 DOF test case decreases quasi-exponentially with the number of iterations. The test case is divided into 2000 sets.

Fig. 6

The number of iterations required (NIR) to meet the specified accuracy requirement (5 × 10^-6) versus the number of DOFs in each set: a two-dimensional square system with 64 sets is tested under four different random loads.

Fig. 7

The two-dimensional square system is tested with 2 to 2000 sets, each with half a million DOFs. (a) The number of iterations required (NIR) to meet the specified accuracy requirement (5 × 10^-6) and (b) the elapsed time per iteration, both as functions of the number of sets. NIR converges rapidly, and the elapsed time per iteration grows very slowly as the number of sets increases.

Fig. 8

The parallel improvement of our algorithm is tested with 8 × 10^6 DOF and 32 × 10^6 DOF test cases.

Fig. 9

A one-dimensional spring system used to represent a general symmetric linear system


