Email:
liuyangzhuan@lbl.gov

Address:
MS 50A-3111, 1 Cyclotron Road
Berkeley, CA 94720

Phone:
734-546-7392

Google Scholar:
[Here]

Direct Solvers

  • Randomized butterfly factorization-based direct solution of the frequency-domain integral equations (IE) and differential equations (DE). Direct IE solutions for high-frequency Helmholtz problems is attractive when the system matrix becomes ill-conditioned (e.g., due to low-frequency, dense-mesh breakdown or near resonance phenomena) or the system involves many right-hand sides, in which cases iterative methods fail to provide efficient solutions. However, development of a low-complexity direct solver for high frequency problems is a challenging task as past direct solvers were low-rank factorization-based and lead to the eventual O(N^3) computational complexity for such problems. This work develops butterfly factorization-based direct solvers that construct butterfly-compressed H-matrices, HODLR, and HSS matrices and compute the matrix inverse via randomized butterfly schemes. These solvers exhibit quasi-optimal CPU and memory complexities when applied to large complex-shaped objects. The above-mentioned solvers have also been combined with a multi-frontal sparse matrix solver STRUMPACK to solve high-frequency problems from DE formulations.
  • Distributed-memory hierarchical matrix solvers for rank-structured linear systems. In addition to Helmholtz equations, many scientific and engineering applications require solving large linear systems that exhibit hierarchical rank structures. These systems can be solved in quasi-linear time with hierarchical matrix techniques. For certain applications, existing distributed-memory hierarchical matrix packages such as MUMPS and STRUMPACK exhibit scaling bottlenecks w.r.t systems sizes and/or processor counts. I developed fast algorithms such as hierarchical blocked adaptive cross approximation (H-BACA) and randomized HODLR compression to rapidly construct, factor and solve these linear systems. These algorithms have been applied to accelerate wave-equations, kernel ridge regression and quantum chemistry applications.
  • Highly-efficient parallel sparse triangular solver. Triangular solution phase is often performed following factorization phase in the sparse linear solvers and has become increasingly computationally expensive for direct solvers with many RHSs or preconditioned iterative solvers. However, the low arithmetic intensity and sequential nature of the triangular solve algorithm pose performance challenges for its large-scale parallelization. I developed an asynchronous binary-tree-based communication scheme with 2D block cyclic process layout to reduces message latency, improves communication load balance and significantly accelerates asynchronous execution of the triangular solve on distributed-memory. The algorithm is further enhanced leveraging one-sided communication models. In additioin, we are working on GPU and multi-GPU impelementations of these algorithms. These new developments have been integrated into the sparse solver SuperLU_DIST.

Algorithm developments for direct solvers: (left) butterfly and low-rank direct solvers, (middle) H-BACA algorithm, (right) sparse tirangular solver (Click on the figures to zoom in).

Thumbnail Placeholder
Thumbnail Placeholder
Thumbnail Placeholder

Time-Domain Solvers for Electromagnetic and Multi-Physics Analysis

  • Plane-wave-time-domain algorithm (PWTD)-accelerated time-domain integral equation (TDIE) solutions. The PWTD algorithm is the time domain counterpart of the frequency-domain fast multipole method (FMM) that is well-suited to accelerate the TDIE solutions for large transient electromagnetic problems. However, PWTD-TDIE solvers were not able to solve ultra-big EM problems as can be solved by the frequency-domain solvers due to stability and computation efficiency issues. This works develops provably scalable parallelization of the PWTD algorithm on modern computing architectures and explores the use of wavelet compression to reduce the computational costs. The resultant PWTD algorithms, when integrated into surface/volume integral equation solvers, enables electromagnetic characterization of real-life objects involving ten million/tens of million spatial unknowns, a significant improvement over past TDIE solvers. See the illustrations below.

Examples of PWTD-TDIE solutions: a PEC sphere (left) and helicopter (middle) using surface IE, and red blood cells (right) using volume IE (Click on the figures to zoom in).

Thumbnail Placeholder
Thumbnail Placeholder
Thumbnail Placeholder

Uncertainty Quantification and Machine Learning

  • Gaussian process (GP)-based autotuning framework for exascale and machine learning applications. The performance (e.g., runtime, memory, communication, accuracy) of high-performance computing (HPC) applications can vary significantly subject to a selected number of tuning parameters of real, integer and categorical types. However, even a single evaluation can require long execution time using many compute nodes. The GP frameworks are suitable solutions for autotuning HPC applications as they require a relatively small number of function evaluations to build surrogate models of the application function. Our work extends and improves traditional GP frameworks by considering multi-task and transfer learning algorithms using the localized coregionalization model (LCM). The resulting GPTune tool and its variants outperform existing single-task tuners for many exascale applications such as SuperLU_DIST, Hypre, ScaLAPACK, NIMROD, M3DC1, STRUMPACK, MFEM, etc. A few advanced features to note: GPTune can leverage shared and distributed-memory parallelism to accelerate the surrogate modeling, ultize historical and cloud databases to incorporate more data, combine with multi-arm bandit strategies (GPTuneBand) for multi-fidelity tuning, leverage clustered GP algorithms (cGP) for better modeling of non-smooth functions, support multi-objective tuning and incorporation of coarse performance models.
  • Statistical characterization of electromagnetic wave propagation in mine tunnels using multi-element probabilistic collocation method (ME-PC). The uncertaincy quantification of EM wave propagation in realistic mine environments is very challenging due to inefficiency and inaccuracy of existing simulators and surrogate modeling tools. We leverage the ME-PC algorithm for the surrogate and a full-wave IE simulator (FMM-FFT-SIE) for computing the observables. The pertinent statistics can be then extracted via Monte-Carlo using the combined ME-PC and FMM-FFT-SIE approach.

Autotuning Resuls using GPTune (left) and its variants GPTuneBand (middle) and cGP (right). (Click on the figures to zoom in).

Thumbnail Placeholder
Thumbnail Placeholder
Thumbnail Placeholder