In previous work, embedded ensemble propagation was proposed to improve the efficiency of sampling-based uncertainty quantification methods for computational models on emerging computational architectures. It consists of evaluating the model for a subset of samples simultaneously, instead of evaluating each sample individually. A first approach introduced to solve parametric linear systems with ensemble propagation is ensemble reduction. In Krylov methods, for example, this reduction consists of coupling the samples together using an inner product that sums the sample contributions. Ensemble reduction has two advantages: it can use optimized implementations of BLAS functions, and its stopping criterion involves only one scalar. However, the reduction potentially decreases the rate of convergence because the spectra of the samples are gathered together. In this paper, we investigate a second approach: ensemble propagation without ensemble reduction in the case of GMRES. This second approach solves all samples simultaneously but independently, which improves convergence compared to ensemble reduction. It raises two new issues, both of which are solved in this paper: optimized implementations of BLAS functions can no longer be used, and ensemble divergence, whereby individual samples within an ensemble must follow different code execution paths, can occur. We tackle these issues by implementing a high-performing ensemble GEMV and by using masks. The proposed ensemble GEMV leads to a similar cost per GMRES iteration for both approaches, i.e., with and without reduction. For illustration, we study the performance of the new linear solver in the context of a mesh tying problem. This example demonstrates improved ensemble propagation speed-up without reduction.
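The difference between the two approaches described in the abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation from the paper or from Trilinos: the function names (`reduced_dot`, `per_sample_dot`, `active_mask`, `masked_update`), the storage layout `x[i][s]` (entry `i` of the vector for sample `s`), and the tolerance-based mask are all hypothetical.

```python
# Hypothetical sketch: none of these names come from the paper or from Trilinos.
# Assumed layout: x[i][s] is entry i of the vector belonging to sample s.

def reduced_dot(x, y):
    """Ensemble reduction: a single scalar that sums all sample contributions."""
    return sum(xs * ys for xi, yi in zip(x, y) for xs, ys in zip(xi, yi))

def per_sample_dot(x, y):
    """No reduction: one inner product per sample, kept separate."""
    n_samples = len(x[0])
    return [sum(xi[s] * yi[s] for xi, yi in zip(x, y)) for s in range(n_samples)]

def active_mask(residual_norms, tol):
    """Mask is True for samples that have not yet reached the tolerance."""
    return [r > tol for r in residual_norms]

def masked_update(x, dx, mask):
    """Update only the masked samples; the others keep their current value.
    A per-sample select like this stands in for a per-sample branch."""
    return [[xs + dxs if m else xs for xs, dxs, m in zip(xi, dxi, mask)]
            for xi, dxi in zip(x, dx)]

# Two vector entries, ensemble of two samples.
x = [[1.0, 2.0], [3.0, 4.0]]
print(reduced_dot(x, x))       # 30.0
print(per_sample_dot(x, x))    # [10.0, 20.0]

mask = active_mask([1e-3, 1e-9], tol=1e-6)   # sample 0 still iterating
print(masked_update(x, [[1.0, 1.0], [1.0, 1.0]], mask))  # [[2.0, 2.0], [4.0, 2.0 + 2.0]]
```

With reduction, the stopping test sees one scalar for the whole ensemble; without it, each sample has its own scalar, so samples can converge at different iterations, which is where a mask becomes useful.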
Disciplines :
Mechanical engineering
Author, co-author :
Liegeois, Kim ; Université de Liège - ULiège > Département d'aérospatiale et mécanique > Computational and stochastic modeling
Boman, Romain ; Université de Liège - ULiège > Département d'aérospatiale et mécanique > Département d'aérospatiale et mécanique
Phipps, Eric T. ; Sandia National Laboratories > Scalable Algorithms Department
Wiesner, Tobias A. ; Sandia National Laboratories > Scalable Algorithms Department
Arnst, Maarten ; Université de Liège - ULiège > Département d'aérospatiale et mécanique > Computational and stochastic modeling
Language :
English
Title :
GMRES with embedded ensemble propagation for the efficient solution of parametric linear systems in uncertainty quantification of computational models
Publication date :
01 September 2020
Journal title :
Computer Methods in Applied Mechanics and Engineering
ISSN :
0045-7825
eISSN :
1879-2138
Publisher :
Elsevier, Amsterdam, Netherlands
Volume :
369
Peer reviewed :
Peer Reviewed verified by ORBi
Funders :
FRIA - Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture