Publications

Yanfei Guo, Ken Raffenetti, Hui Zhou, Pavan Balaji, Min Si, Abdelhalim Amer, Shintaro Iwasaki, Sangmin Seo, Giuseppe Congiu, Robert Latham, Lena Oden, Thomas Gillis, Rohit Zambre, Kaiming Ouyang, Charles Archer, Wesley Bland, Jithin Jose, Sayantan Sur, Hajime Fujita, Dmitry Durnov, Michael Chuvelev, Gengbin Zheng, Alex Brooks, Sagar Thapaliya, Taru Doodi, Maria Garzaran, Steve Oyanagi, Marc Snir, and Rajeev Thakur. Preparing MPICH for exascale. Int. J. High Perform. Comput. Appl., 39(2):283–305, 2025. (doi:10.1177/10943420241311608)
Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Zhaorui Zhang, Jinyang Liu, Xiaoyi Lu, Ken Raffenetti, Hui Zhou, Kai Zhao, Zizhong Chen, Franck Cappello, Yanfei Guo, and Rajeev Thakur. An optimized error-controlled MPI collective framework integrated with lossy compression. In IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024, San Francisco, CA, USA, May 27-31, 2024, pages 752–764. IEEE, 2024. (doi:10.1109/IPDPS57955.2024.00072)
Hui Zhou, Robert Latham, Ken Raffenetti, Yanfei Guo, and Rajeev Thakur. MPI progress for all. In SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA, November 17-22, 2024, pages 425–435. IEEE, 2024. (doi:10.1109/SCW63240.2024.00063)
Hui Zhou, Ken Raffenetti, Wesley Bland, and Yanfei Guo. Generating bindings in MPICH. CoRR, abs/2401.16547, 2024. (doi:10.48550/ARXIV.2401.16547)
Thomas Gillis, Ken Raffenetti, Hui Zhou, Yanfei Guo, and Rajeev Thakur. Quantifying the performance benefits of partitioned communication in MPI. In Proceedings of the 52nd International Conference on Parallel Processing, ICPP 2023, Salt Lake City, UT, USA, August 7-10, 2023, pages 285–294. ACM, 2023. (doi:10.1145/3605573.3605599)
Jiajun Huang, Kaiming Ouyang, Yujia Zhai, Jinyang Liu, Min Si, Ken Raffenetti, Hui Zhou, Atsushi Hori, Zizhong Chen, Yanfei Guo, and Rajeev Thakur. Accelerating MPI collectives with process-in-process-based multi-object techniques. In Ali Raza Butt, Ningfang Mi, and Kyle Chard, editors, Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2023, Orlando, FL, USA, June 16-23, 2023, pages 333–334. ACM, 2023. (doi:10.1145/3588195.3595955)
Jiajun Huang, Kaiming Ouyang, Yujia Zhai, Jinyang Liu, Min Si, Ken Raffenetti, Hui Zhou, Atsushi Hori, Zizhong Chen, Yanfei Guo, and Rajeev Thakur. PiP-MColl: Process-in-process-based multi-object MPI collectives. In IEEE International Conference on Cluster Computing, CLUSTER 2023, Santa Fe, NM, USA, October 31 – November 3, 2023, pages 354–364. IEEE, 2023. (doi:10.1109/CLUSTER52292.2023.00037)
Chen Wang, Yanfei Guo, Pavan Balaji, and Marc Snir. Near-lossless MPI tracing and proxy application autogeneration. IEEE Trans. Parallel Distributed Syst., 34(1):123–140, 2023. (doi:10.1109/TPDS.2022.3215942)
Hui Zhou, Ken Raffenetti, Junchao Zhang, Yanfei Guo, and Rajeev Thakur. Frustrated with MPI+threads? Try MPIxThreads! In Proceedings of the 30th European MPI Users’ Group Meeting, EuroMPI 2023, Bristol, United Kingdom, September 11-13, 2023, pages 2:1–2:10. ACM, 2023. (doi:10.1145/3615318.3615320)
Michael Wilkins, Yanfei Guo, Rajeev Thakur, Peter A. Dinda, and Nikos Hardavellas. ACCLAiM: Advancing the practicality of MPI collective communication autotuning using machine learning. In IEEE International Conference on Cluster Computing, CLUSTER 2022, Heidelberg, Germany, September 5-8, 2022, pages 161–171. IEEE, 2022. (doi:10.1109/CLUSTER51413.2022.00030)
Hui Zhou, Ken Raffenetti, Yanfei Guo, and Rajeev Thakur. MPIX stream: An explicit solution to hybrid MPI+X programming. In EuroMPI/USA’22: 29th European MPI Users’ Group Meeting, Chattanooga, TN, USA, September 26 – 28, 2022, pages 1–10. ACM, 2022. (doi:10.1145/3555819.3555820)
Sayan Ghosh, Yanfei Guo, Pavan Balaji, and Assefaw H. Gebremedhin. RMACXX: an efficient high-level C++ interface over MPI-3 RMA. In Laurent Lefèvre, Stacy Patterson, Young Choon Lee, Haiying Shen, Shashikant Ilager, Mohammad Goudarzi, Adel Nadjaran Toosi, and Rajkumar Buyya, editors, 21st IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2021, Melbourne, Australia, May 10-13, 2021, pages 143–155. IEEE, 2021. (doi:10.1109/CCGRID51090.2021.00024)
William Gropp, Rajeev Thakur, and Pavan Balaji. Translational research in the MPICH project. J. Comput. Sci., 52:101203, 2021. (doi:10.1016/J.JOCS.2020.101203)
Kaiming Ouyang, Min Si, Atsushi Hori, Zizhong Chen, and Pavan Balaji. DAPS: A dynamic asynchronous progress stealing model for MPI communication. In IEEE International Conference on Cluster Computing, CLUSTER 2021, Portland, OR, USA, September 7-10, 2021, pages 516–527. IEEE, 2021. (doi:10.1109/CLUSTER48925.2021.00027)
Min Si, Huansong Fu, Jeff R. Hammond, and Pavan Balaji. OpenSHMEM over MPI as a performance contender: Thorough analysis and optimizations. In Stephen W. Poole, Oscar R. Hernandez, Matthew B. Baker, and Tony Curtis, editors, OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Exascale and Smart Networks – 8th Workshop on OpenSHMEM and Related Technologies, OpenSHMEM 2021, Virtual Event, September 14-16, 2021, Revised Selected Papers, volume 13159 of Lecture Notes in Computer Science, pages 39–60. Springer, 2021. (doi:10.1007/978-3-031-04888-3_3)
Chen Wang, Pavan Balaji, and Marc Snir. Pilgrim: scalable and (near) lossless MPI tracing. In Bronis R. de Supinski, Mary W. Hall, and Todd Gamblin, editors, International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2021, St. Louis, Missouri, USA, November 14-19, 2021, page 52. ACM, 2021. (doi:10.1145/3458817.3476151)
Rohit Zambre, Damodar Sahasrabudhe, Hui Zhou, Martin Berzins, Aparna Chandramowlishwaran, and Pavan Balaji. Logically parallel communication for fast MPI+threads applications. IEEE Trans. Parallel Distributed Syst., 32(12):3038–3052, 2021. (doi:10.1109/TPDS.2021.3075157)
Tao Gao, Yanfei Guo, Boyu Zhang, Pietro Cicotti, Yutong Lu, Pavan Balaji, and Michela Taufer. Memory-efficient and skew-tolerant MapReduce over MPI for supercomputing systems. IEEE Trans. Parallel Distributed Syst., 31(12):2734–2748, 2020. (doi:10.1109/TPDS.2019.2932066)
Kaiming Ouyang, Min Si, Atsushi Hori, Zizhong Chen, and Pavan Balaji. CAB-MPI: exploring interprocess work-stealing towards balanced MPI communication. In Christine Cuicchi, Irene Qualters, and William T. Kramer, editors, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020, Virtual Event / Atlanta, Georgia, USA, November 9-19, 2020, page 36. IEEE/ACM, 2020. (doi:10.1109/SC41405.2020.00040)
Rohit Zambre, Aparna Chandramowlishwaran, and Pavan Balaji. How I learned to stop worrying about user-visible endpoints and love MPI. In Eduard Ayguadé, Wen-mei W. Hwu, Rosa M. Badia, and H. Peter Hofstee, editors, ICS ’20: 2020 International Conference on Supercomputing, Barcelona, Spain, June 2020, pages 35:1–35:13. ACM, 2020. (doi:10.1145/3392717.3392773)
Rohit Zambre, Aparna Chandramowlishwaran, and Pavan Balaji. How I learned to stop worrying about user-visible endpoints and love MPI. CoRR, abs/2005.00263, 2020.
Rohit Zambre, Aparna Chandramowlishwaran, and Pavan Balaji. Scalable communication endpoints for MPI+threads applications. CoRR, abs/2002.02509, 2020.
Abdelhalim Amer, Charles Archer, Michael Blocksome, Chongxiao Cao, Michael Chuvelev, Hajime Fujita, Maria Garzaran, Yanfei Guo, Jeff R. Hammond, Shintaro Iwasaki, Kenneth J. Raffenetti, Mikhail Shiryaev, Min Si, Kenjiro Taura, Sagar Thapaliya, and Pavan Balaji. Software combining to mitigate multithreaded MPI contention. In Rudolf Eigenmann, Chen Ding, and Sally A. McKee, editors, Proceedings of the ACM International Conference on Supercomputing, ICS 2019, Phoenix, AZ, USA, June 26-28, 2019, pages 367–379. ACM, 2019. (doi:10.1145/3330345.3330378)
Joshua Hoke Davis, Tao Gao, Sunita Chandrasekaran, Heike Jagode, Anthony Danalis, Jack J. Dongarra, Pavan Balaji, and Michela Taufer. Characterization of power usage and performance in data-intensive applications using MapReduce over MPI. In Ian T. Foster, Gerhard R. Joubert, Ludek Kucera, Wolfgang E. Nagel, and Frans J. Peters, editors, Parallel Computing: Technology Trends, Proceedings of the International Conference on Parallel Computing, PARCO 2019, Prague, Czech Republic, September 10-13, 2019, volume 36 of Advances in Parallel Computing, pages 287–298. IOS Press, 2019. (doi:10.3233/APC200053)
Abdelhalim Amer, Huiwei Lu, Pavan Balaji, Milind Chabbi, Yanjie Wei, Jeff R. Hammond, and Satoshi Matsuoka. Lock contention management in multithreaded MPI. ACM Trans. Parallel Comput., 5(3):12:1–12:21, 2018. (doi:10.1145/3275443)
Sudheer Chunduri, Scott Parker, Pavan Balaji, Kevin Harms, and Kalyan Kumaran. Characterization of MPI usage on a production supercomputer. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018, Dallas, TX, USA, November 11-16, 2018, pages 30:1–30:15. IEEE / ACM, 2018.
Tao Gao, Yanfei Guo, Boyu Zhang, Pietro Cicotti, Yutong Lu, Pavan Balaji, and Michela Taufer. On the power of combiner optimizations in MapReduce over MPI workflows. In 24th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2018, Singapore, December 11-13, 2018, pages 441–448. IEEE, 2018. (doi:10.1109/PADSW.2018.8644595)
Min Si, Antonio J. Peña, Jeff R. Hammond, Pavan Balaji, Masamichi Takagi, and Yutaka Ishikawa. Dynamic adaptable asynchronous progress model for MPI RMA multiphase applications. IEEE Trans. Parallel Distributed Syst., 29(9):1975–1989, 2018. (doi:10.1109/TPDS.2018.2815568)
Rohit Zambre, Aparna Chandramowlishwaran, and Pavan Balaji. Scalable communication endpoints for MPI+threads applications. In 24th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2018, Singapore, December 11-13, 2018, pages 803–812. IEEE, 2018. (doi:10.1109/PADSW.2018.8645059)
Hoang-Vu Dang, Sangmin Seo, Abdelhalim Amer, and Pavan Balaji. Advanced thread synchronization for multithreaded MPI implementations. In Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017, Madrid, Spain, May 14-17, 2017, pages 314–324. IEEE Computer Society / ACM, 2017. (doi:10.1109/CCGRID.2017.65)
Yanfei Guo, Charles J. Archer, Michael Blocksome, Scott Parker, Wesley Bland, Ken Raffenetti, and Pavan Balaji. Memory compression techniques for network address management in MPI. In 2017 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2017, Orlando, FL, USA, May 29 – June 2, 2017, pages 1008–1017. IEEE Computer Society, 2017. (doi:10.1109/IPDPS.2017.18)
Robert Latham, Leonardo Bautista-Gomez, and Pavan Balaji. Portable topology-aware MPI-I/O. In 23rd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2017, Shenzhen, China, December 15-17, 2017, pages 710–719. IEEE Computer Society, 2017. (doi:10.1109/ICPADS.2017.00096)
Seyed Hessam Mirsadeghi, Jesper Larsson Träff, Pavan Balaji, and Ahmad Afsahi. Exploiting common neighborhoods to optimize MPI neighborhood collectives. In 24th IEEE International Conference on High Performance Computing, HiPC 2017, Jaipur, India, December 18-21, 2017, pages 348–357. IEEE Computer Society, 2017. (doi:10.1109/HIPC.2017.00047)
Antonio J. Peña, Pavan Balaji, William Gropp, and Rajeev Thakur, editors. Proceedings of the 24th European MPI Users’ Group Meeting, EuroMPI/USA 2017, Chicago, IL, USA, September 25-28, 2017. ACM, 2017.
Ken Raffenetti, Abdelhalim Amer, Lena Oden, Charles Archer, Wesley Bland, Hajime Fujita, Yanfei Guo, Tomislav Janjusic, Dmitry Durnov, Michael Blocksome, Min Si, Sangmin Seo, Akhil Langer, Gengbin Zheng, Masamichi Takagi, Paul K. Coffman, Jithin Jose, Sayantan Sur, Alexander Sannikov, Sergey Oblomov, Michael Chuvelev, Masayuki Hatanaka, Xin Zhao, Paul F. Fischer, Thilina Rathnayake, Matthew Otten, Misun Min, and Pavan Balaji. Why is MPI so slow?: analyzing the fundamental limits in implementing MPI-3.1. In Bernd Mohr and Padma Raghavan, editors, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017, Denver, CO, USA, November 12 – 17, 2017, page 62. ACM, 2017. (doi:10.1145/3126908.3126963)
Min Si and Pavan Balaji. Process-based asynchronous progress model for MPI point-to-point communication. In 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2017, Bangkok, Thailand, December 18-20, 2017, pages 206–214. IEEE Computer Society, 2017. (doi:10.1109/HPCC-SMARTCITY-DSS.2017.27)
Ashwin M. Aji, Lokendra S. Panwar, Feng Ji, Karthik Murthy, Milind Chabbi, Pavan Balaji, Keith R. Bisset, James Dinan, Wu-chun Feng, John M. Mellor-Crummey, Xiaosong Ma, and Rajeev Thakur. MPI-ACC: accelerator-aware MPI for scientific applications. IEEE Trans. Parallel Distributed Syst., 27(5):1401–1414, 2016. (doi:10.1109/TPDS.2015.2446479)
James Dinan, Pavan Balaji, Darius Buntinas, David Goodell, William Gropp, and Rajeev Thakur. An implementation and evaluation of the MPI 3.0 one-sided communication interface. Concurr. Comput. Pract. Exp., 28(17):4385–4404, 2016. (doi:10.1002/CPE.3758)
Sayan Ghosh, Jeff R. Hammond, Antonio J. Peña, Pavan Balaji, Assefaw Hadish Gebremedhin, and Barbara M. Chapman. One-sided interface for matrix operations using MPI-3 RMA: A case study with Elemental. In 45th International Conference on Parallel Processing, ICPP 2016, Philadelphia, PA, USA, August 16-19, 2016, pages 185–194. IEEE Computer Society, 2016. (doi:10.1109/ICPP.2016.28)
Jichi Guo, Qing Yi, Jiayuan Meng, Junchao Zhang, and Pavan Balaji. Compiler-assisted overlapping of communication and computation in MPI applications. In 2016 IEEE International Conference on Cluster Computing, CLUSTER 2016, Taipei, Taiwan, September 12-16, 2016, pages 60–69. IEEE Computer Society, 2016. (doi:10.1109/CLUSTER.2016.62)
Xin Zhao, Pavan Balaji, and William Gropp. Scalability challenges in current MPI one-sided implementations. In Riqing Chen, Chunming Rong, and Dan Grigoras, editors, 15th International Symposium on Parallel and Distributed Computing, ISPDC 2016, Fuzhou, China, July 8-10, 2016, pages 38–47. IEEE Computer Society, 2016. (doi:10.1109/ISPDC.2016.14)
Abdelhalim Amer, Huiwei Lu, Pavan Balaji, and Satoshi Matsuoka. Characterizing MPI and hybrid MPI+threads applications at scale: Case study with BFS. In 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2015, Shenzhen, China, May 4-7, 2015, pages 1075–1083. IEEE Computer Society, 2015. (doi:10.1109/CCGRID.2015.93)
Abdelhalim Amer, Huiwei Lu, Yanjie Wei, Pavan Balaji, and Satoshi Matsuoka. MPI+threads: Runtime contention and remedies. In Albert Cohen and David Grove, editors, Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015, San Francisco, CA, USA, February 7-11, 2015, pages 239–248. ACM, 2015. (doi:10.1145/2688500.2688522)
Wesley Bland, Huiwei Lu, Sangmin Seo, and Pavan Balaji. Lessons learned implementing user-level failure mitigation in MPICH. In 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2015, Shenzhen, China, May 4-7, 2015, pages 1123–1126. IEEE Computer Society, 2015. (doi:10.1109/CCGRID.2015.51)
Yanfei Guo, Wesley Bland, Pavan Balaji, and Xiaobo Zhou. Fault tolerant MapReduce-MPI for HPC clusters. In Jackie Kern and Jeffrey S. Vetter, editors, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015, Austin, TX, USA, November 15-20, 2015, pages 34:1–34:12. ACM, 2015. (doi:10.1145/2807591.2807617)
Torsten Hoefler, James Dinan, Rajeev Thakur, Brian Barrett, Pavan Balaji, William Gropp, and Keith D. Underwood. Remote memory access programming in MPI-3. ACM Trans. Parallel Comput., 2(2):9:1–9:26, 2015. (doi:10.1145/2780584)
Huiwei Lu, Sangmin Seo, and Pavan Balaji. MPI+ULT: overlapping communication and computation with user-level threads. In 17th IEEE International Conference on High Performance Computing and Communications, HPCC 2015, 7th IEEE International Symposium on Cyberspace Safety and Security, CSS 2015, and 12th IEEE International Conference on Embedded Software and Systems, ICESS 2015, New York, NY, USA, August 24-26, 2015, pages 444–454. IEEE, 2015. (doi:10.1109/HPCC-CSS-ICESS.2015.82)
Ken Raffenetti, Antonio J. Peña, and Pavan Balaji. Toward implementing robust support for Portals 4 networks in MPICH. In 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2015, Shenzhen, China, May 4-7, 2015, pages 1173–1176. IEEE Computer Society, 2015. (doi:10.1109/CCGRID.2015.79)
Sangmin Seo, Robert Latham, Junchao Zhang, and Pavan Balaji. Implementation and evaluation of MPI nonblocking collective I/O. In 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2015, Shenzhen, China, May 4-7, 2015, pages 1084–1091. IEEE Computer Society, 2015. (doi:10.1109/CCGRID.2015.81)
Min Si, Antonio J. Peña, Jeff R. Hammond, Pavan Balaji, and Yutaka Ishikawa. Scaling NWChem with efficient and portable asynchronous communication in MPI RMA. In 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2015, Shenzhen, China, May 4-7, 2015, pages 811–816. IEEE Computer Society, 2015. (doi:10.1109/CCGRID.2015.48)
Min Si, Antonio J. Peña, Jeff R. Hammond, Pavan Balaji, Masamichi Takagi, and Yutaka Ishikawa. Casper: An asynchronous progress model for MPI RMA on many-core architectures. In 2015 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2015, Hyderabad, India, May 25-29, 2015, pages 665–676. IEEE Computer Society, 2015. (doi:10.1109/IPDPS.2015.35)
Karthikeyan Vaidyanathan, Dhiraj D. Kalamkar, Kiran Pamnany, Jeff R. Hammond, Pavan Balaji, Dipankar Das, Jongsoo Park, and Bálint Joó. Improving concurrency and asynchrony in multithreaded MPI applications using software offloading. In Jackie Kern and Jeffrey S. Vetter, editors, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015, Austin, TX, USA, November 15-20, 2015, pages 30:1–30:12. ACM, 2015. (doi:10.1145/2807591.2807602)
Xin Zhao, Pavan Balaji, and William Gropp. Runtime support for irregular computation in MPI-based applications. In 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2015, Shenzhen, China, May 4-7, 2015, pages 701–704. IEEE Computer Society, 2015. (doi:10.1109/CCGRID.2015.82)
Xiaomin Zhu, Junchao Zhang, Kazutomo Yoshii, Shigang Li, Yunquan Zhang, and Pavan Balaji. Analyzing MPI-3.0 process-level shared memory: A case study with stencil computations. In 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2015, Shenzhen, China, May 4-7, 2015, pages 1099–1106. IEEE Computer Society, 2015. (doi:10.1109/CCGRID.2015.131)
Wesley Bland, Kenneth Raffenetti, and Pavan Balaji. Simplifying the recovery model of user-level failure mitigation. In Proceedings of the 2014 Workshop on Exascale MPI, ExaMPI ’14, New Orleans, Louisiana, USA, November 16-21, 2014, pages 20–25. IEEE, 2014. (doi:10.1109/EXAMPI.2014.4)
Zhezhe Chen, James Dinan, Zhen Tang, Pavan Balaji, Hua Zhong, Jun Wei, Tao Huang, and Feng Qin. MC-Checker: Detecting memory consistency errors in MPI one-sided applications. In Trish Damkroger and Jack J. Dongarra, editors, International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2014, New Orleans, LA, USA, November 16-21, 2014, pages 499–510. IEEE Computer Society, 2014. (doi:10.1109/SC.2014.46)
James Dinan, Ryan E. Grant, Pavan Balaji, David Goodell, Douglas Miller, Marc Snir, and Rajeev Thakur. Enabling communication concurrency through flexible MPI endpoints. Int. J. High Perform. Comput. Appl., 28(4):390–405, 2014. (doi:10.1177/1094342014548772)
John Jenkins, James Dinan, Pavan Balaji, Tom Peterka, Nagiza F. Samatova, and Rajeev Thakur. Processing MPI derived datatypes on noncontiguous GPU-resident data. IEEE Trans. Parallel Distributed Syst., 25(10):2627–2637, 2014. (doi:10.1109/TPDS.2013.234)
Min Si, Antonio J. Peña, Pavan Balaji, Masamichi Takagi, and Yutaka Ishikawa. MT-MPI: multithreaded MPI for many-core environments. In Arndt Bode, Michael Gerndt, Per Stenström, Lawrence Rauchwerger, Barton P. Miller, and Martin Schulz, editors, 2014 International Conference on Supercomputing, ICS’14, Muenchen, Germany, June 10-13, 2014, pages 125–134. ACM, 2014. (doi:10.1145/2597652.2597658)
Chaoran Yang, Wesley Bland, John M. Mellor-Crummey, and Pavan Balaji. Portable, MPI-interoperable Coarray Fortran. In José E. Moreira and James R. Larus, editors, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’14, Orlando, FL, USA, February 15-19, 2014, pages 81–92. ACM, 2014. (doi:10.1145/2555243.2555270)
Junchao Zhang, Bill Long, Kenneth Raffenetti, and Pavan Balaji. Implementing the MPI-3.0 Fortran 2008 binding. In Jack J. Dongarra, Yutaka Ishikawa, and Atsushi Hori, editors, 21st European MPI Users’ Group Meeting, EuroMPI/ASIA ’14, Kyoto, Japan, September 9-12, 2014, page 1. ACM, 2014. (doi:10.1145/2642769.2642777)
Judicael A. Zounmevo, Xin Zhao, Pavan Balaji, William Gropp, and Ahmad Afsahi. Nonblocking epochs in MPI one-sided communication. In Trish Damkroger and Jack J. Dongarra, editors, International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2014, New Orleans, LA, USA, November 16-21, 2014, pages 475–486. IEEE Computer Society, 2014. (doi:10.1109/SC.2014.44)
Ashwin M. Aji, Pavan Balaji, James Dinan, Wu-chun Feng, and Rajeev Thakur. Synchronization and ordering semantics in hybrid MPI+GPU programming. In 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum, Cambridge, MA, USA, May 20-24, 2013, pages 1020–1029. IEEE, 2013. (doi:10.1109/IPDPSW.2013.256)
Ashwin M. Aji, Lokendra S. Panwar, Feng Ji, Milind Chabbi, Karthik Murthy, Pavan Balaji, Keith R. Bisset, James Dinan, Wu-chun Feng, John M. Mellor-Crummey, Xiaosong Ma, and Rajeev Thakur. On the efficacy of GPU-integrated MPI for scientific applications. In Manish Parashar, Jon B. Weissman, Dick H. J. Epema, and Renato J. O. Figueiredo, editors, The 22nd International Symposium on High-Performance Parallel and Distributed Computing, HPDC’13, New York, NY, USA, June 17-21, 2013, pages 191–202. ACM, 2013.
Pavan Balaji and Dries Kimpe. On the reproducibility of MPI reduction operations. In 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, HPCC/EUC 2013, Zhangjiajie, China, November 13-15, 2013, pages 407–414. IEEE, 2013. (doi:10.1109/HPCC.AND.EUC.2013.65)
James Dinan, Pavan Balaji, David Goodell, Douglas Miller, Marc Snir, and Rajeev Thakur. Enabling MPI interoperability through flexible communication endpoints. In Jack J. Dongarra, Javier García Blas, and Jesús Carretero, editors, 20th European MPI Users’ Group Meeting, EuroMPI ’13, Madrid, Spain, September 15-18, 2013, pages 13–18. ACM, 2013. (doi:10.1145/2488551.2488553)
Md. Ziaul Haque, Qing Yi, James Dinan, and Pavan Balaji. Enhancing performance portability of MPI applications through annotation-based transformations. In 42nd International Conference on Parallel Processing, ICPP 2013, Lyon, France, October 1-4, 2013, pages 631–640. IEEE Computer Society, 2013. (doi:10.1109/ICPP.2013.77)
Torsten Hoefler, James Dinan, Darius Buntinas, Pavan Balaji, Brian Barrett, Ron Brightwell, William Gropp, Vivek Kale, and Rajeev Thakur. MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory. Computing, 95(12):1121–1136, 2013. (doi:10.1007/S00607-013-0324-2)
Antonio J. Peña, Ralf G. Correa Carvalho, James Dinan, Pavan Balaji, Rajeev Thakur, and William Gropp. Analysis of topology-dependent MPI performance on Gemini networks. In Jack J. Dongarra, Javier García Blas, and Jesús Carretero, editors, 20th European MPI Users’ Group Meeting, EuroMPI ’13, Madrid, Spain, September 15-18, 2013, pages 61–66. ACM, 2013. (doi:10.1145/2488551.2488564)
Xin Zhao, Pavan Balaji, William Gropp, and Rajeev Thakur. MPI-interoperable generalized active messages. In 19th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2013, Seoul, Korea, December 15-18, 2013, pages 200–207. IEEE Computer Society, 2013. (doi:10.1109/ICPADS.2013.38)
Xin Zhao, Pavan Balaji, William Gropp, and Rajeev Thakur. Optimization strategies for MPI-interoperable active messages. In IEEE 11th International Conference on Dependable, Autonomic and Secure Computing, DASC 2013, Chengdu, China, December 21-22, 2013, pages 508–515. IEEE Computer Society, 2013. (doi:10.1109/DASC.2013.116)
Xin Zhao, Darius Buntinas, Judicael A. Zounmevo, James Dinan, David Goodell, Pavan Balaji, Rajeev Thakur, Ahmad Afsahi, and William Gropp. Toward asynchronous and MPI-interoperable active messages. In 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013, Delft, Netherlands, May 13-16, 2013, pages 87–94. IEEE Computer Society, 2013. (doi:10.1109/CCGRID.2013.84)
Ashwin M. Aji, James Dinan, Darius Buntinas, Pavan Balaji, Wu-chun Feng, Keith R. Bisset, and Rajeev Thakur. MPI-ACC: an integrated and extensible approach to data movement in accelerator-based systems. In Geyong Min, Jia Hu, Lei (Chris) Liu, Laurence Tianruo Yang, Seetharami Seelam, and Laurent Lefèvre, editors, 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, HPCC-ICESS 2012, Liverpool, United Kingdom, June 25-27, 2012, pages 647–654. IEEE Computer Society, 2012. (doi:10.1109/HPCC.2012.92)
James Dinan, Pavan Balaji, Jeff R. Hammond, Sriram Krishnamoorthy, and Vinod Tipparaju. Supporting the global arrays PGAS model using MPI one-sided communication. In 26th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2012, Shanghai, China, May 21-25, 2012, pages 739–750. IEEE Computer Society, 2012. (doi:10.1109/IPDPS.2012.72)
James Dinan, David Goodell, William Gropp, Rajeev Thakur, and Pavan Balaji. Efficient multithreaded context ID allocation in MPI. In Jesper Larsson Träff, Siegfried Benkner, and Jack J. Dongarra, editors, Recent Advances in the Message Passing Interface – 19th European MPI Users’ Group Meeting, EuroMPI 2012, Vienna, Austria, September 23-26, 2012. Proceedings, volume 7490 of Lecture Notes in Computer Science, pages 57–66. Springer, 2012. (doi:10.1007/978-3-642-33518-1_11)
William Gropp, Ewing L. Lusk, and Rajeev Thakur. Advanced MPI including new MPI-3 features. In Jesper Larsson Träff, Siegfried Benkner, and Jack J. Dongarra, editors, Recent Advances in the Message Passing Interface – 19th European MPI Users’ Group Meeting, EuroMPI 2012, Vienna, Austria, September 23-26, 2012. Proceedings, volume 7490 of Lecture Notes in Computer Science, page 14. Springer, 2012. (doi:10.1007/978-3-642-33518-1_5)
Torsten Hoefler, James Dinan, Darius Buntinas, Pavan Balaji, Brian W. Barrett, Ron Brightwell, William Gropp, Vivek Kale, and Rajeev Thakur. Leveraging MPI’s one-sided communication interface for shared-memory programming. In Jesper Larsson Träff, Siegfried Benkner, and Jack J. Dongarra, editors, Recent Advances in the Message Passing Interface – 19th European MPI Users’ Group Meeting, EuroMPI 2012, Vienna, Austria, September 23-26, 2012. Proceedings, volume 7490 of Lecture Notes in Computer Science, pages 132–141. Springer, 2012. (doi:10.1007/978-3-642-33518-1_18)
John Jenkins, James Dinan, Pavan Balaji, Nagiza F. Samatova, and Rajeev Thakur. Enabling fast, noncontiguous GPU data movement in hybrid MPI+GPU environments. In 2012 IEEE International Conference on Cluster Computing, CLUSTER 2012, Beijing, China, September 24-28, 2012, pages 468–476. IEEE Computer Society, 2012. (doi:10.1109/CLUSTER.2012.72)
Pavan Balaji, Darius Buntinas, David Goodell, William Gropp, Torsten Hoefler, Sameer Kumar, Ewing L. Lusk, Rajeev Thakur, and Jesper Larsson Träff. MPI on millions of cores. Parallel Process. Lett., 21(1):45–60, 2011. (doi:10.1142/S0129626411000060)
James Dinan, Pavan Balaji, Jeff R. Hammond, Sriram Krishnamoorthy, and Vinod Tipparaju. Poster: High-level, one-sided programming models on MPI: a case study with Global Arrays and NWChem. In Scott A. Lathrop, Jim Costa, and William Kramer, editors, Conference on High Performance Computing Networking, Storage and Analysis – Companion Volume, SC 2011, Seattle, WA, USA, November 12-18, 2011, pages 37–38. ACM, 2011. (doi:10.1145/2148600.2148620)
James Dinan, Sriram Krishnamoorthy, Pavan Balaji, Jeff R. Hammond, Manojkumar Krishnan, Vinod Tipparaju, and Abhinav Vishnu. Noncollective communicator creation in MPI. In Yiannis Cotronis, Anthony Danalis, Dimitrios S. Nikolopoulos, and Jack J. Dongarra, editors, Recent Advances in the Message Passing Interface – 18th European MPI Users’ Group Meeting, EuroMPI 2011, Santorini, Greece, September 18-21, 2011. Proceedings, volume 6960 of Lecture Notes in Computer Science, pages 282–291. Springer, 2011. (doi:10.1007/978-3-642-24449-0_32)
David Goodell, William Gropp, Xin Zhao, and Rajeev Thakur. Scalable memory use in MPI: A case study with MPICH2. In Yiannis Cotronis, Anthony Danalis, Dimitrios S. Nikolopoulos, and Jack J. Dongarra, editors, Recent Advances in the Message Passing Interface – 18th European MPI Users’ Group Meeting, EuroMPI 2011, Santorini, Greece, September 18-21, 2011. Proceedings, volume 6960 of Lecture Notes in Computer Science, pages 140–149. Springer, 2011. (doi:10.1007/978-3-642-24449-0_17)
Ganesh Gopalakrishnan, Robert M. Kirby, Stephen F. Siegel, Rajeev Thakur, William Gropp, Ewing L. Lusk, Bronis R. de Supinski, Martin Schulz, and Greg Bronevetsky. Formal analysis of MPI-based parallel programs. Commun. ACM, 54(12):82–91, 2011. (doi:10.1145/2043174.2043194)
William Gropp, Torsten Hoefler, Rajeev Thakur, and Jesper Larsson Träff. Performance expectations and guidelines for MPI derived datatypes. In Yiannis Cotronis, Anthony Danalis, Dimitrios S. Nikolopoulos, and Jack J. Dongarra, editors, Recent Advances in the Message Passing Interface – 18th European MPI Users’ Group Meeting, EuroMPI 2011, Santorini, Greece, September 18-21, 2011. Proceedings, volume 6960 of Lecture Notes in Computer Science, pages 150–159. Springer, 2011. (doi:10.1007/978-3-642-24449-0_18)
Torsten Hoefler, Rolf Rabenseifner, Hubert Ritzdorf, Bronis R. de Supinski, Rajeev Thakur, and Jesper Larsson Träff. The scalable process topology interface of MPI 2.2. Concurr. Comput. Pract. Exp., 23(4):293–310, 2011. (doi:10.1002/CPE.1643)
Mohammad J. Rashti, Jonathan Green, Pavan Balaji, Ahmad Afsahi, and William Gropp. Multi-core and network aware MPI topology functions. In Yiannis Cotronis, Anthony Danalis, Dimitrios S. Nikolopoulos, and Jack J. Dongarra, editors, Recent Advances in the Message Passing Interface – 18th European MPI Users’ Group Meeting, EuroMPI 2011, Santorini, Greece, September 18-21, 2011. Proceedings, volume 6960 of Lecture Notes in Computer Science, pages 50–60. Springer, 2011. (doi:10.1007/978-3-642-24449-0_8)
Rui Wang, Erlin Yao, Mingyu Chen, Guangming Tan, Pavan Balaji, and Darius Buntinas. Building algorithmically nonstop fault tolerant MPI programs. In 18th International Conference on High Performance Computing, HiPC 2011, Bengaluru, India, December 18-21, 2011, pages 1–9. IEEE Computer Society, 2011. (doi:10.1109/HIPC.2011.6152716)
Pavan Balaji, Darius Buntinas, David Goodell, William Gropp, and Rajeev Thakur. Fine-grained multithreading support for hybrid threaded MPI programming. Int. J. High Perform. Comput. Appl., 24(1):49–57, 2010. (doi:10.1177/1094342009360206)
Pavan Balaji, Anthony Chan, William Gropp, Rajeev Thakur, and Ewing L. Lusk. The importance of non-data-communication overheads in MPI. Int. J. High Perform. Comput. Appl., 24(1):5–15, 2010. (doi:10.1177/1094342009359258)
James Dinan, Pavan Balaji, Ewing L. Lusk, P. Sadayappan, and Rajeev Thakur. Hybrid parallel programming with MPI and unified parallel C. In Nancy M. Amato, Hubertus Franke, and Paul H. J. Kelly, editors, Proceedings of the 7th Conference on Computing Frontiers, 2010, Bertinoro, Italy, May 17-19, 2010, pages 177–186. ACM, 2010. (doi:10.1145/1787275.1787323)
Gábor Dózsa, Sameer Kumar, Pavan Balaji, Darius Buntinas, David Goodell, William Gropp, Joe Ratterman, and Rajeev Thakur. Enabling concurrent multithreaded MPI communication on multicore petascale systems. In Rainer Keller, Edgar Gabriel, Michael M. Resch, and Jack J. Dongarra, editors, Recent Advances in the Message Passing Interface – 17th European MPI Users’ Group Meeting, EuroMPI 2010, Stuttgart, Germany, September 12-15, 2010. Proceedings, volume 6305 of Lecture Notes in Computer Science, pages 11–20. Springer, 2010. (doi:10.1007/978-3-642-15646-5_2)
David Goodell, Pavan Balaji, Darius Buntinas, Gábor Dózsa, William Gropp, Sameer Kumar, Bronis R. de Supinski, and Rajeev Thakur. Minimizing MPI resource contention in multithreaded multicore environments. In Proceedings of the 2010 IEEE International Conference on Cluster Computing, Heraklion, Crete, Greece, 20-24 September, 2010, pages 1–8. IEEE Computer Society, 2010. (doi:10.1109/CLUSTER.2010.11)
Torsten Hoefler, William Gropp, Rajeev Thakur, and Jesper Larsson Träff. Toward performance models of MPI implementations for understanding application scaling issues. In Rainer Keller, Edgar Gabriel, Michael M. Resch, and Jack J. Dongarra, editors, Recent Advances in the Message Passing Interface – 17th European MPI Users’ Group Meeting, EuroMPI 2010, Stuttgart, Germany, September 12-15, 2010. Proceedings, volume 6305 of Lecture Notes in Computer Science, pages 21–30. Springer, 2010. (doi:10.1007/978-3-642-15646-5_3)
Jayesh Krishna, Pavan Balaji, Ewing L. Lusk, Rajeev Thakur, and Fabian Tiller. Implementing MPI on Windows: Comparison with common approaches on Unix. In Rainer Keller, Edgar Gabriel, Michael M. Resch, and Jack J. Dongarra, editors, Recent Advances in the Message Passing Interface – 17th European MPI Users’ Group Meeting, EuroMPI 2010, Stuttgart, Germany, September 12-15, 2010. Proceedings, volume 6305 of Lecture Notes in Computer Science, pages 160–169. Springer, 2010. (doi:10.1007/978-3-642-15646-5_17)
Salman Pervez, Ganesh Gopalakrishnan, Robert M. Kirby, Rajeev Thakur, and William Gropp. Formal methods applied to high-performance computing software design: a case study of MPI one-sided communication-based locking. Softw. Pract. Exp., 40(1):23–43, 2010. (doi:10.1002/SPE.946)
Jesper Larsson Träff, William D. Gropp, and Rajeev Thakur. Self-consistent MPI performance guidelines. IEEE Trans. Parallel Distributed Syst., 21(5):698–709, 2010. (doi:10.1109/TPDS.2009.120)
Sriram Aananthakrishnan, Michael Delisi, Sarvani S. Vakkalanka, Anh Vo, Ganesh Gopalakrishnan, Robert M. Kirby, and Rajeev Thakur. How formal dynamic verification tools facilitate novel concurrency visualizations. In Matti Ropo, Jan Westerholm, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users’ Group Meeting, Espoo, Finland, September 7-10, 2009. Proceedings, volume 5759 of Lecture Notes in Computer Science, pages 261–270. Springer, 2009. (doi:10.1007/978-3-642-03770-2_32)
Pavan Balaji, Darius Buntinas, David Goodell, William Gropp, Sameer Kumar, Ewing L. Lusk, Rajeev Thakur, and Jesper Larsson Träff. MPI on a million processors. In Matti Ropo, Jan Westerholm, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users’ Group Meeting, Espoo, Finland, September 7-10, 2009. Proceedings, volume 5759 of Lecture Notes in Computer Science, pages 20–30. Springer, 2009. (doi:10.1007/978-3-642-03770-2_9)
Pavan Balaji, Anthony Chan, Rajeev Thakur, William Gropp, and Ewing L. Lusk. Toward message passing for a million processes: characterizing MPI on a massive scale Blue Gene/P. Comput. Sci. Res. Dev., 24(1-2):11–19, 2009. (doi:10.1007/S00450-009-0095-3)
Robert B. Ross, Robert Latham, William Gropp, Ewing L. Lusk, and Rajeev Thakur. Processing MPI datatypes outside MPI. In Matti Ropo, Jan Westerholm, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users’ Group Meeting, Espoo, Finland, September 7-10, 2009. Proceedings, volume 5759 of Lecture Notes in Computer Science, pages 42–53. Springer, 2009. (doi:10.1007/978-3-642-03770-2_11)
Saba Sehrish, Jun Wang, and Rajeev Thakur. Conflict detection algorithm to minimize locking for MPI-IO atomicity. In Matti Ropo, Jan Westerholm, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users’ Group Meeting, Espoo, Finland, September 7-10, 2009. Proceedings, volume 5759 of Lecture Notes in Computer Science, pages 143–153. Springer, 2009. (doi:10.1007/978-3-642-03770-2_21)
Rajeev Thakur and William Gropp. Test suite for evaluating performance of multithreaded MPI communication. Parallel Comput., 35(12):608–617, 2009. (doi:10.1016/J.PARCO.2008.12.013)
Vinod Tipparaju, William Gropp, Hubert Ritzdorf, Rajeev Thakur, and Jesper Larsson Träff. Investigating high performance RMA interfaces for the MPI-3 standard. In ICPP 2009, International Conference on Parallel Processing, Vienna, Austria, 22-25 September 2009, pages 293–300. IEEE Computer Society, 2009. (doi:10.1109/ICPP.2009.54)
Sarvani S. Vakkalanka, Grzegorz Szubzda, Anh Vo, Ganesh Gopalakrishnan, Robert M. Kirby, and Rajeev Thakur. Static-analysis assisted dynamic verification of MPI Waitany programs (poster abstract). In Matti Ropo, Jan Westerholm, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users’ Group Meeting, Espoo, Finland, September 7-10, 2009. Proceedings, volume 5759 of Lecture Notes in Computer Science, pages 329–330. Springer, 2009. (doi:10.1007/978-3-642-03770-2_43)
Anh Vo, Sarvani S. Vakkalanka, Michael Delisi, Ganesh Gopalakrishnan, Robert M. Kirby, and Rajeev Thakur. Formal verification of practical MPI programs. In Daniel A. Reed and Vivek Sarkar, editors, Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2009, Raleigh, NC, USA, February 14-18, 2009, pages 261–270. ACM, 2009. (doi:10.1145/1504176.1504214)
Anh Vo, Sarvani S. Vakkalanka, Jason Williams, Ganesh Gopalakrishnan, Robert M. Kirby, and Rajeev Thakur. Sound and efficient dynamic verification of MPI programs with probe non-determinism. In Matti Ropo, Jan Westerholm, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users’ Group Meeting, Espoo, Finland, September 7-10, 2009. Proceedings, volume 5759 of Lecture Notes in Computer Science, pages 271–281. Springer, 2009. (doi:10.1007/978-3-642-03770-2_33)
Hao Zhu, David Goodell, William Gropp, and Rajeev Thakur. Hierarchical collectives in MPICH2. In Matti Ropo, Jan Westerholm, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users’ Group Meeting, Espoo, Finland, September 7-10, 2009. Proceedings, volume 5759 of Lecture Notes in Computer Science, pages 325–326. Springer, 2009. (doi:10.1007/978-3-642-03770-2_41)
Pavan Balaji, Darius Buntinas, David Goodell, William Gropp, and Rajeev Thakur. Toward efficient support for multithreaded MPI communication. In Alexey L. Lastovetsky, M. Tahar Kechadi, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users’ Group Meeting, Dublin, Ireland, September 7-10, 2008. Proceedings, volume 5205 of Lecture Notes in Computer Science, pages 120–129. Springer, 2008. (doi:10.1007/978-3-540-87475-1_20)
Pavan Balaji, Anthony Chan, William Gropp, Rajeev Thakur, and Ewing L. Lusk. Non-data-communication overheads in MPI: analysis on Blue Gene/P. In Alexey L. Lastovetsky, M. Tahar Kechadi, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users’ Group Meeting, Dublin, Ireland, September 7-10, 2008. Proceedings, volume 5205 of Lecture Notes in Computer Science, pages 13–22. Springer, 2008. (doi:10.1007/978-3-540-87475-1_9)
Pavan Balaji, Wu-chun Feng, Jeremy S. Archuleta, Heshan Lin, Rajkumar Kettimuthu, Rajeev Thakur, and Xiaosong Ma. Semantics-based distributed I/O for mpiBLAST. In Siddhartha Chatterjee and Michael L. Scott, editors, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2008, Salt Lake City, UT, USA, February 20-23, 2008, pages 293–294. ACM, 2008. (doi:10.1145/1345206.1345262)
Surendra Byna, Yong Chen, Xian-He Sun, Rajeev Thakur, and William Gropp. Parallel I/O prefetching using MPI file caching and I/O signatures. In Proceedings of the ACM/IEEE Conference on High Performance Computing, SC 2008, November 15-21, 2008, Austin, Texas, USA, page 44. IEEE/ACM, 2008. (doi:10.1109/SC.2008.5213604)
William D. Gropp, Dries Kimpe, Robert B. Ross, Rajeev Thakur, and Jesper Larsson Träff. Self-consistent MPI-IO performance requirements and expectations. In Alexey L. Lastovetsky, M. Tahar Kechadi, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users’ Group Meeting, Dublin, Ireland, September 7-10, 2008. Proceedings, volume 5205 of Lecture Notes in Computer Science, pages 167–176. Springer, 2008. (doi:10.1007/978-3-540-87475-1_25)
Subodh Sharma, Sarvani S. Vakkalanka, Ganesh Gopalakrishnan, Robert M. Kirby, Rajeev Thakur, and William Gropp. A formal approach to detect functionally irrelevant barriers in MPI programs. In Alexey L. Lastovetsky, M. Tahar Kechadi, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users’ Group Meeting, Dublin, Ireland, September 7-10, 2008. Proceedings, volume 5205 of Lecture Notes in Computer Science, pages 265–273. Springer, 2008. (doi:10.1007/978-3-540-87475-1_36)
Jesper Larsson Träff, Andreas Ripke, Christian Siebert, Pavan Balaji, Rajeev Thakur, and William Gropp. A simple, pipelined algorithm for large, irregular all-gather problems. In Alexey L. Lastovetsky, M. Tahar Kechadi, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users’ Group Meeting, Dublin, Ireland, September 7-10, 2008. Proceedings, volume 5205 of Lecture Notes in Computer Science, pages 84–93. Springer, 2008. (doi:10.1007/978-3-540-87475-1_16)
Sarvani S. Vakkalanka, Michael Delisi, Ganesh Gopalakrishnan, Robert M. Kirby, Rajeev Thakur, and William Gropp. Implementing efficient dynamic formal verification methods for MPI programs. In Alexey L. Lastovetsky, M. Tahar Kechadi, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users’ Group Meeting, Dublin, Ireland, September 7-10, 2008. Proceedings, volume 5205 of Lecture Notes in Computer Science, pages 248–256. Springer, 2008. (doi:10.1007/978-3-540-87475-1_34)
Pavan Balaji, Darius Buntinas, Satish Balay, Barry F. Smith, Rajeev Thakur, and William Gropp. Nonuniformly communicating noncontiguous data: A case study with PETSc and MPI. In 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), Proceedings, 26-30 March 2007, Long Beach, California, USA, pages 1–10. IEEE, 2007. (doi:10.1109/IPDPS.2007.370223)
William Gropp and Rajeev Thakur. Thread-safety in an MPI implementation: Requirements and analysis. Parallel Comput., 33(9):595–604, 2007. (doi:10.1016/J.PARCO.2007.07.002)
William D. Gropp and Rajeev Thakur. Revealing the performance of MPI RMA implementations. In Franck Cappello, Thomas Hérault, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User’s Group Meeting, Paris, France, September 30 – October 3, 2007, Proceedings, volume 4757 of Lecture Notes in Computer Science, pages 272–280. Springer, 2007. (doi:10.1007/978-3-540-75416-9_38)
Robert Latham, William Gropp, Robert B. Ross, and Rajeev Thakur. Extending the MPI-2 generalized request interface. In Franck Cappello, Thomas Hérault, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User’s Group Meeting, Paris, France, September 30 – October 3, 2007, Proceedings, volume 4757 of Lecture Notes in Computer Science, pages 223–232. Springer, 2007. (doi:10.1007/978-3-540-75416-9_33)
Robert Latham, Robert B. Ross, and Rajeev Thakur. Implementing MPI-IO atomic mode and shared file pointers using MPI one-sided communication. Int. J. High Perform. Comput. Appl., 21(2):132–143, 2007. (doi:10.1177/1094342007077859)
Salman Pervez, Ganesh Gopalakrishnan, Robert M. Kirby, Robert Palmer, Rajeev Thakur, and William Gropp. Practical model-checking method for verifying correctness of MPI programs. In Franck Cappello, Thomas Hérault, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User’s Group Meeting, Paris, France, September 30 – October 3, 2007, Proceedings, volume 4757 of Lecture Notes in Computer Science, pages 344–353. Springer, 2007. (doi:10.1007/978-3-540-75416-9_46)
Rajeev Thakur and William Gropp. Open issues in MPI implementation. In Lynn Choi, Yunheung Paek, and Sangyeun Cho, editors, Advances in Computer Systems Architecture, 12th Asia-Pacific Conference, ACSAC 2007, Seoul, Korea, August 23-25, 2007, Proceedings, volume 4697 of Lecture Notes in Computer Science, pages 327–338. Springer, 2007. (doi:10.1007/978-3-540-74309-5_31)
Rajeev Thakur and William Gropp. Test suite for evaluating performance of MPI implementations that support MPI_THREAD_MULTIPLE. In Franck Cappello, Thomas Hérault, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User’s Group Meeting, Paris, France, September 30 – October 3, 2007, Proceedings, volume 4757 of Lecture Notes in Computer Science, pages 46–55. Springer, 2007. (doi:10.1007/978-3-540-75416-9_13)
Jesper Larsson Träff, William Gropp, and Rajeev Thakur. Self-consistent MPI performance requirements. In Franck Cappello, Thomas Hérault, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User’s Group Meeting, Paris, France, September 30 – October 3, 2007, Proceedings, volume 4757 of Lecture Notes in Computer Science, pages 36–45. Springer, 2007. (doi:10.1007/978-3-540-75416-9_12)
Surendra Byna, Xian-He Sun, Rajeev Thakur, and William Gropp. Automatic memory optimizations for improving MPI derived datatype performance. In Bernd Mohr, Jesper Larsson Träff, Joachim Worringen, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 13th European PVM/MPI User’s Group Meeting, Bonn, Germany, September 17-20, 2006, Proceedings, volume 4192 of Lecture Notes in Computer Science, pages 238–246. Springer, 2006. (doi:10.1007/11846802_36)
Kenin Coloma, Avery Ching, Alok N. Choudhary, Wei-keng Liao, Robert B. Ross, Rajeev Thakur, and Lee Ward. A new flexible MPI collective I/O implementation. In Proceedings of the 2006 IEEE International Conference on Cluster Computing, September 25-28, 2006, Barcelona, Spain. IEEE Computer Society, 2006. (doi:10.1109/CLUSTR.2006.311865)
William D. Gropp and Rajeev Thakur. Issues in developing a thread-safe MPI implementation. In Bernd Mohr, Jesper Larsson Träff, Joachim Worringen, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 13th European PVM/MPI User’s Group Meeting, Bonn, Germany, September 17-20, 2006, Proceedings, volume 4192 of Lecture Notes in Computer Science, pages 12–21. Springer, 2006. (doi:10.1007/11846802_11)
William Gropp, Ewing L. Lusk, Rajeev Thakur, and Robert B. Ross. S01 – advanced MPI: I/O and one-sided communication. In Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, November 11-17, 2006, Tampa, FL, USA, page 202. ACM Press, 2006. (doi:10.1145/1188455.1188666)
Robert Latham, Robert B. Ross, and Rajeev Thakur. Can MPI be used for persistent parallel services? In Bernd Mohr, Jesper Larsson Träff, Joachim Worringen, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 13th European PVM/MPI User’s Group Meeting, Bonn, Germany, September 17-20, 2006, Proceedings, volume 4192 of Lecture Notes in Computer Science, pages 275–284. Springer, 2006. (doi:10.1007/11846802_40)
Jonghyun Lee, Robert B. Ross, Scott Atchley, Micah Beck, and Rajeev Thakur. MPI-IO/L: efficient remote I/O for MPI-IO via logistical networking. In 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), Proceedings, 25-29 April 2006, Rhodes Island, Greece. IEEE, 2006. (doi:10.1109/IPDPS.2006.1639305)
Salman Pervez, Ganesh Gopalakrishnan, Robert M. Kirby, Rajeev Thakur, and William D. Gropp. Formal verification of programs that use MPI one-sided communication. In Bernd Mohr, Jesper Larsson Träff, Joachim Worringen, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 13th European PVM/MPI User’s Group Meeting, Bonn, Germany, September 17-20, 2006, Proceedings, volume 4192 of Lecture Notes in Computer Science, pages 30–39. Springer, 2006. (doi:10.1007/11846802_13)
William D. Gropp and Rajeev Thakur. An evaluation of implementation options for MPI one-sided communication. In Beniamino Di Martino, Dieter Kranzlmüller, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 12th European PVM/MPI Users’ Group Meeting, Sorrento, Italy, September 18-21, 2005, Proceedings, volume 3666 of Lecture Notes in Computer Science, pages 415–424. Springer, 2005. (doi:10.1007/11557265_53)
Robert Latham, Robert B. Ross, Rajeev Thakur, and Brian R. Toonen. Implementing MPI-IO shared file pointers without file system support. In Beniamino Di Martino, Dieter Kranzlmüller, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 12th European PVM/MPI Users’ Group Meeting, Sorrento, Italy, September 18-21, 2005, Proceedings, volume 3666 of Lecture Notes in Computer Science, pages 84–93. Springer, 2005. (doi:10.1007/11557265_15)
Robert B. Ross, Robert Latham, William Gropp, Rajeev Thakur, and Brian R. Toonen. Implementing MPI-IO atomic mode without file system support. In 5th International Symposium on Cluster Computing and the Grid (CCGrid 2005), 9-12 May, 2005, Cardiff, UK, pages 1135–1142. IEEE Computer Society, 2005. (doi:10.1109/CCGRID.2005.1558687)
Rajeev Thakur, Rolf Rabenseifner, and William Gropp. Optimization of collective communication operations in MPICH. Int. J. High Perform. Comput. Appl., 19(1):49–66, 2005. (doi:10.1177/1094342005051521)
Rajeev Thakur, Robert B. Ross, and Robert Latham. Implementing byte-range locks using MPI one-sided communication. In Beniamino Di Martino, Dieter Kranzlmüller, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 12th European PVM/MPI Users’ Group Meeting, Sorrento, Italy, September 18-21, 2005, Proceedings, volume 3666 of Lecture Notes in Computer Science, pages 119–128. Springer, 2005. (doi:10.1007/11557265_19)
Weihang Jiang, Jiuxing Liu, Hyun-Wook Jin, Dhabaleswar K. Panda, Darius Buntinas, Rajeev Thakur, and William D. Gropp. Efficient implementation of MPI-2 passive one-sided communication on InfiniBand clusters. In Dieter Kranzlmüller, Péter Kacsuk, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 11th European PVM/MPI Users’ Group Meeting, Budapest, Hungary, September 19-22, 2004, Proceedings, volume 3241 of Lecture Notes in Computer Science, pages 68–76. Springer, 2004. (doi:10.1007/978-3-540-30218-6_16)
Weihang Jiang, Jiuxing Liu, Hyun-Wook Jin, Dhabaleswar K. Panda, William Gropp, and Rajeev Thakur. High performance MPI-2 one-sided communication over InfiniBand. In 4th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2004), April 19-22, 2004, Chicago, Illinois, USA, pages 531–538. IEEE Computer Society, 2004. (doi:10.1109/CCGRID.2004.1336648)
Robert Latham, Robert B. Ross, and Rajeev Thakur. The impact of file systems on MPI-IO scalability. In Dieter Kranzlmüller, Péter Kacsuk, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 11th European PVM/MPI Users’ Group Meeting, Budapest, Hungary, September 19-22, 2004, Proceedings, volume 3241 of Lecture Notes in Computer Science, pages 87–96. Springer, 2004. (doi:10.1007/978-3-540-30218-6_18)
Jonghyun Lee, Robert B. Ross, Rajeev Thakur, Xiaosong Ma, and Marianne Winslett. RFS: efficient and flexible remote file access for MPI-IO. In 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), September 20-23 2004, San Diego, California, USA, pages 71–81. IEEE Computer Society, 2004. (doi:10.1109/CLUSTR.2004.1392604)
Rajeev Thakur, William D. Gropp, and Brian R. Toonen. Minimizing synchronization overhead in the implementation of MPI one-sided communication. In Dieter Kranzlmüller, Péter Kacsuk, and Jack J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 11th European PVM/MPI Users’ Group Meeting, Budapest, Hungary, September 19-22, 2004, Proceedings, volume 3241 of Lecture Notes in Computer Science, pages 57–67. Springer, 2004. (doi:10.1007/978-3-540-30218-6_15)
Surendra Byna, William D. Gropp, Xian-He Sun, and Rajeev Thakur. Improving the performance of MPI derived datatypes by optimizing memory-access cost. In 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 1-4 December 2003, Kowloon, Hong Kong, China, pages 412–419. IEEE Computer Society, 2003. (doi:10.1109/CLUSTR.2003.1253341)
William D. Gropp, Ewing L. Lusk, Robert B. Ross, and Rajeev Thakur. Using MPI-2: advanced features of the message passing interface. In 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 1-4 December 2003, Kowloon, Hong Kong, China. IEEE Computer Society, 2003. (doi:10.1109/CLUSTER.2003.10010)
Rajeev Thakur and William Gropp. Improving the performance of collective operations in MPICH. In Jack J. Dongarra, Domenico Laforenza, and Salvatore Orlando, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI Users’ Group Meeting, Venice, Italy, September 29 – October 2, 2003, Proceedings, volume 2840 of Lecture Notes in Computer Science, pages 257–267. Springer, 2003. (doi:10.1007/978-3-540-39924-7_38)
William Gropp. MPICH2: A new start for MPI implementations. In Dieter Kranzlmüller, Péter Kacsuk, Jack J. Dongarra, and Jens Volkert, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 9th European PVM/MPI Users’ Group Meeting, Linz, Austria, September 29 – October 2, 2002, Proceedings, volume 2474 of Lecture Notes in Computer Science, page 7. Springer, 2002. (doi:10.1007/3-540-45825-5_5)
Rajeev Thakur, William Gropp, and Ewing L. Lusk. Optimizing noncontiguous accesses in MPI-IO. Parallel Comput., 28(1):83–105, 2002. (doi:10.1016/S0167-8191(01)00129-6)
Alain J. Roy, Ian T. Foster, William Gropp, Nicholas T. Karonis, Volker Sander, and Brian R. Toonen. MPICH-GQ: quality-of-service for message passing programs. In Jed Donnelley, editor, Proceedings Supercomputing 2000, November 4-10, 2000, Dallas, Texas, USA. IEEE Computer Society, CD-ROM, page 19. IEEE Computer Society, 2000. (doi:10.1109/SC.2000.10017)
Rajeev Thakur, William Gropp, and Ewing L. Lusk. On implementing MPI-IO portably and with high performance. In Proceedings of the Sixth Workshop on I/O in Parallel and Distributed Systems, IOPADS 1999, May 5, 1999, Atlanta, GA, USA, pages 23–32. ACM, 1999. (doi:10.1145/301816.301826)
Rajeev Thakur, William Gropp, and Ewing L. Lusk. A case for using MPI’s derived datatypes to improve I/O performance. In Proceedings of the ACM/IEEE Conference on Supercomputing, SC 1998, November 7-13, 1998, Orlando, FL, USA, page 1. IEEE Computer Society, 1998. (doi:10.1109/SC.1998.10006)
William Gropp and Ewing L. Lusk. Sowing MPICH: a case study in the dissemination of a portable environment for parallel scientific computing. Int. J. High Perform. Comput. Appl., 11(2):103–114, 1997. (doi:10.1177/109434209701100204)