HPC Reading List
Interconnect
- Scott, Steve, Dennis Abts, John Kim, and William J. Dally. “The BlackWidow high-radix Clos network.” ACM SIGARCH Computer Architecture News 34, no. 2 (2006): 16-28.
- Leiserson, Charles E., Zahi S. Abuhamdeh, David C. Douglas, Carl R. Feynman, Mahesh N. Ganmukhi, Jeffrey V. Hill, Daniel Hillis et al. “The network architecture of the Connection Machine CM-5.” In Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures, pp. 272-285. 1992.
- Kim, John, William J. Dally, Steve Scott, and Dennis Abts. “Technology-driven, highly-scalable dragonfly topology.” ACM SIGARCH Computer Architecture News 36, no. 3 (2008): 77-88.
- Adiga, Narasimha R., Matthias A. Blumrich, Dong Chen, Paul Coteus, Alan Gara, Mark E. Giampapa, Philip Heidelberger et al. “Blue Gene/L torus interconnection network.” IBM Journal of Research and Development 49, no. 2/3 (2005): 265-276.
Programming Model
- Walker, David W., and Jack J. Dongarra. “MPI: a standard message passing interface.” Supercomputer 12 (1996): 56-68.
- Zheng, Yili, Amir Kamil, Michael B. Driscoll, Hongzhang Shan, and Katherine Yelick. “UPC++: a PGAS extension for C++.” In 2014 IEEE 28th international parallel and distributed processing symposium, pp. 1105-1114. IEEE, 2014.
- Jiang, Weihang, Jiuxing Liu, Hyun-Wook Jin, Dhabaleswar K. Panda, William Gropp, and Rajeev Thakur. “High performance MPI-2 one-sided communication over InfiniBand.” In IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2004), pp. 531-538. IEEE, 2004.
- Kale, Laxmikant V., and Sanjeev Krishnan. “CHARM++: a portable concurrent object oriented system based on C++.” In Proceedings of the eighth annual conference on Object-oriented programming systems, languages, and applications, pp. 91-108. 1993.
Collective Communication
- Thakur, Rajeev, Rolf Rabenseifner, and William Gropp. “Optimization of collective communication operations in MPICH.” The International Journal of High Performance Computing Applications 19, no. 1 (2005): 49-66.
- Chan, Ernie, Marcel Heimlich, Avi Purkayastha, and Robert Van De Geijn. “Collective communication: theory, practice, and experience.” Concurrency and Computation: Practice and Experience 19, no. 13 (2007): 1749-1783.
- Pješivac-Grbović, Jelena, Thara Angskun, George Bosilca, Graham E. Fagg, Edgar Gabriel, and Jack J. Dongarra. “Performance analysis of MPI collective operations.” Cluster Computing 10 (2007): 127-143.
Math Library
- Balay, Satish, William D. Gropp, Lois Curfman McInnes, and Barry F. Smith. “Efficient management of parallelism in object-oriented numerical software libraries.” In Modern software tools for scientific computing, pp. 163-202. Boston, MA: Birkhäuser Boston, 1997.
- Agullo, Emmanuel, Jim Demmel, Jack Dongarra, Bilel Hadri, Jakub Kurzak, Julien Langou, Hatem Ltaief, Piotr Luszczek, and Stanimire Tomov. “Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects.” In Journal of Physics: Conference Series, vol. 180, no. 1, p. 012037. IOP Publishing, 2009.
- Choi, Jaeyoung, Jack J. Dongarra, Roldan Pozo, and David W. Walker. “ScaLAPACK: A scalable linear algebra library for distributed memory concurrent computers.” In The Fourth Symposium on the Frontiers of Massively Parallel Computation, pp. 120-121. IEEE Computer Society, 1992.
- Dongarra, Jack J., Piotr Luszczek, and Antoine Petitet. “The LINPACK benchmark: past, present and future.” Concurrency and Computation: practice and experience 15, no. 9 (2003): 803-820.
- Williams, Samuel, Leonid Oliker, Richard Vuduc, John Shalf, Katherine Yelick, and James Demmel. “Optimization of sparse matrix-vector multiplication on emerging multicore platforms.” In Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, pp. 1-12. 2007.
I/O and Storage
- Thakur, Rajeev, William Gropp, and Ewing Lusk. “On implementing MPI-IO portably and with high performance.” In Proceedings of the sixth workshop on I/O in parallel and distributed systems, pp. 23-32. 1999.
- Folk, Mike, Gerd Heber, Quincey Koziol, Elena Pourmal, and Dana Robinson. “An overview of the HDF5 technology suite and its applications.” In Proceedings of the EDBT/ICDT 2011 workshop on array databases, pp. 36-47. 2011.
- Chen, Peter M., Edward K. Lee, Garth A. Gibson, Randy H. Katz, and David A. Patterson. “RAID: High-performance, reliable secondary storage.” ACM Computing Surveys (CSUR) 26, no. 2 (1994): 145-185.
- Patil, Swapnil V., Garth A. Gibson, Sam Lang, and Milo Polte. “GIGA+: scalable directories for shared file systems.” In Proceedings of the 2nd international workshop on Petascale data storage: held in conjunction with Supercomputing’07, pp. 26-29. 2007.
Performance Models and Benchmarking
- Gibbons, Phillip B. “A more practical PRAM model.” In Proceedings of the first annual ACM symposium on Parallel algorithms and architectures, pp. 158-168. 1989.
- Culler, David, Richard Karp, David Patterson, Abhijit Sahay, Klaus Erik Schauser, Eunice Santos, Ramesh Subramonian, and Thorsten Von Eicken. “LogP: Towards a realistic model of parallel computation.” In Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming, pp. 1-12. 1993.
- Valiant, Leslie G. “A bridging model for multi-core computing.” Journal of Computer and System Sciences 77, no. 1 (2011): 154-166.
- Williams, Samuel, Andrew Waterman, and David Patterson. “Roofline: an insightful visual performance model for multicore architectures.” Communications of the ACM 52, no. 4 (2009): 65-76.
- Gropp, William, Luke N. Olson, and Philipp Samfass. “Modeling MPI communication performance on SMP nodes: Is it time to retire the ping pong test?” In Proceedings of the 23rd European MPI Users’ Group Meeting, pp. 41-50. 2016.
- Hoefler, Torsten, Timo Schneider, and Andrew Lumsdaine. “Characterizing the influence of system noise on large-scale applications by simulation.” In SC’10: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-11. IEEE, 2010.
- Hoefler, Torsten, and Roberto Belli. “Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results.” In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12. 2015.
Applications
- Solomonik, Edgar, and James Demmel. “Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms.” In European Conference on Parallel Processing, pp. 90-109. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011.
- Agarwal, Ramesh C., Susanne M. Balle, Fred G. Gustavson, Mahesh Joshi, and Prasad Palkar. “A three-dimensional approach to parallel matrix multiplication.” IBM Journal of Research and Development 39, no. 5 (1995): 575-582.
- Datta, Kaushik, Mark Murphy, Vasily Volkov, Samuel Williams, Jonathan Carter, Leonid Oliker, David Patterson, John Shalf, and Katherine Yelick. “Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures.” In SC’08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, pp. 1-12. IEEE, 2008.
- Warren, Michael S., and John K. Salmon. “Astrophysical N-body simulations using hierarchical tree data structures.” In Proceedings of the 1992 ACM/IEEE Conference on Supercomputing (SC ’92), pp. 570-576. 1992.
- Sengupta, Shubhabrata, Mark Harris, Yao Zhang, and John D. Owens. “Scan primitives for GPU computing.” (2007).
- Buluç, Aydin, and Kamesh Madduri. “Parallel breadth-first search on distributed memory systems.” In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12. 2011.
- Holst, Terry L. “Supercomputer applications in computational fluid dynamics.” In Supercomputing’88: Proceedings of the 1988 ACM/IEEE Conference on Supercomputing, Vol. II Science and Applications, pp. 51-60. IEEE, 1988.
- Dubey, Abhimanyu, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur et al. “The Llama 3 herd of models.” arXiv preprint arXiv:2407.21783 (2024).
- Ben-Nun, Tal, and Torsten Hoefler. “Demystifying parallel and distributed deep learning: An in-depth concurrency analysis.” ACM Computing Surveys (CSUR) 52, no. 4 (2019): 1-43.