Publications
- Yanfei Guo, Ken Raffenetti,
Hui Zhou, Pavan Balaji, Min Si,
Abdelhalim Amer, Shintaro Iwasaki,
Sangmin Seo, Giuseppe Congiu,
Robert Latham, Lena Oden, Thomas
Gillis, Rohit Zambre, Kaiming Ouyang,
Charles Archer, Wesley Bland,
Jithin Jose, Sayantan Sur, Hajime
Fujita, Dmitry Durnov, Michael Chuvelev,
Gengbin Zheng, Alex Brooks, Sagar
Thapaliya, Taru Doodi, Maria Garzaran,
Steve Oyanagi, Marc Snir, and
Rajeev Thakur.
Preparing MPICH for
exascale.
Int. J. High Perform. Comput. Appl., 39(2):283–305, 2025.
(doi:10.1177/10943420241311608)
- Jiajun Huang, Sheng Di,
Xiaodong Yu, Yujia Zhai, Zhaorui
Zhang, Jinyang Liu, Xiaoyi Lu,
Ken Raffenetti, Hui Zhou, Kai
Zhao, Zizhong Chen, Franck Cappello,
Yanfei Guo, and Rajeev Thakur.
An optimized
error-controlled MPI collective framework integrated with lossy
compression.
In IEEE International Parallel and Distributed Processing Symposium,
IPDPS 2024, San Francisco, CA, USA, May 27-31, 2024, pages 752–764.
IEEE, 2024.
(doi:10.1109/IPDPS57955.2024.00072)
- Hui Zhou, Robert Latham,
Ken Raffenetti, Yanfei Guo, and
Rajeev Thakur.
MPI progress for
all.
In SC24-W: Workshops of the International Conference for High
Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA,
November 17-22, 2024, pages 425–435. IEEE, 2024.
(doi:10.1109/SCW63240.2024.00063)
- Hui Zhou, Ken Raffenetti,
Wesley Bland, and Yanfei Guo.
Generating bindings in
MPICH.
CoRR, abs/2401.16547, 2024.
(doi:10.48550/ARXIV.2401.16547)
- Thomas Gillis, Ken
Raffenetti, Hui Zhou, Yanfei Guo, and
Rajeev Thakur.
Quantifying the performance
benefits of partitioned communication in MPI.
In Proceedings of the 52nd International Conference on Parallel
Processing, ICPP 2023, Salt Lake City, UT, USA, August 7-10, 2023,
pages 285–294. ACM, 2023.
(doi:10.1145/3605573.3605599)
- Jiajun Huang, Kaiming
Ouyang, Yujia Zhai, Jinyang Liu,
Min Si, Ken Raffenetti, Hui
Zhou, Atsushi Hori, Zizhong Chen,
Yanfei Guo, and Rajeev Thakur.
Accelerating MPI
collectives with process-in-process-based multi-object techniques.
In Ali Raza Butt, Ningfang Mi, and
Kyle Chard, editors, Proceedings of the 32nd
International Symposium on High-Performance Parallel and Distributed
Computing, HPDC 2023, Orlando, FL, USA, June 16-23, 2023, pages
333–334. ACM, 2023.
(doi:10.1145/3588195.3595955)
- Jiajun Huang, Kaiming
Ouyang, Yujia Zhai, Jinyang Liu,
Min Si, Ken Raffenetti, Hui
Zhou, Atsushi Hori, Zizhong Chen,
Yanfei Guo, and Rajeev Thakur.
PiP-MColl:
Process-in-process-based multi-object MPI collectives.
In IEEE International Conference on Cluster Computing, CLUSTER 2023,
Santa Fe, NM, USA, October 31 – Nov. 3, 2023, pages 354–364. IEEE,
2023.
(doi:10.1109/CLUSTER52292.2023.00037)
- Chen Wang, Yanfei Guo,
Pavan Balaji, and Marc Snir.
Near-lossless MPI tracing
and proxy application autogeneration.
IEEE Trans. Parallel Distributed Syst., 34(1):123–140, 2023.
(doi:10.1109/TPDS.2022.3215942)
- Hui Zhou, Ken Raffenetti,
Junchao Zhang, Yanfei Guo, and
Rajeev Thakur.
Frustrated with MPI+Threads?
Try MPIxThreads!
In Proceedings of the 30th European MPI Users’ Group Meeting, EuroMPI
2023, Bristol, United Kingdom, September 11-13, 2023, pages 2:1–2:10.
ACM, 2023.
(doi:10.1145/3615318.3615320)
- Michael Wilkins, Yanfei Guo,
Rajeev Thakur, Peter A. Dinda, and
Nikos Hardavellas.
ACCLAiM: Advancing
the practicality of MPI collective communication autotuning using machine
learning.
In IEEE International Conference on Cluster Computing, CLUSTER 2022,
Heidelberg, Germany, September 5-8, 2022, pages 161–171. IEEE,
2022.
(doi:10.1109/CLUSTER51413.2022.00030)
- Hui Zhou, Ken Raffenetti,
Yanfei Guo, and Rajeev Thakur.
MPIX stream: An explicit
solution to hybrid MPI+X programming.
In EuroMPI/USA’22: 29th European MPI Users’ Group Meeting, Chattanooga,
TN, USA, September 26 – 28, 2022, pages 1–10. ACM, 2022.
(doi:10.1145/3555819.3555820)
- Sayan Ghosh, Yanfei Guo,
Pavan Balaji, and Assefaw H. Gebremedhin.
RMACXX: an efficient
high-level C++ interface over MPI-3 RMA.
In Laurent Lefèvre, Stacy Patterson,
Young Choon Lee, Haiying Shen,
Shashikant Ilager, Mohammad Goudarzi,
Adel Nadjaran Toosi, and Rajkumar Buyya,
editors, 21st IEEE/ACM International Symposium on Cluster, Cloud and
Internet Computing, CCGrid 2021, Melbourne, Australia, May 10-13,
2021, pages 143–155. IEEE, 2021.
(doi:10.1109/CCGRID51090.2021.00024)
- William Gropp, Rajeev Thakur,
and Pavan Balaji.
Translational research in
the MPICH project.
J. Comput. Sci., 52:101203, 2021.
(doi:10.1016/J.JOCS.2020.101203)
- Kaiming Ouyang, Min Si,
Atsushi Hori, Zizhong Chen, and
Pavan Balaji.
Daps: A dynamic
asynchronous progress stealing model for MPI communication.
In IEEE International Conference on Cluster Computing, CLUSTER 2021,
Portland, OR, USA, September 7-10, 2021, pages 516–527. IEEE, 2021.
(doi:10.1109/CLUSTER48925.2021.00027)
- Min Si, Huansong Fu,
Jeff R. Hammond, and Pavan Balaji.
OpenSHMEM over MPI as
a performance contender: Thorough analysis and optimizations.
In Stephen W. Poole, Oscar R. Hernandez,
Matthew B. Baker, and Tony Curtis, editors,
OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Exascale
and Smart Networks – 8th Workshop on OpenSHMEM and Related Technologies,
OpenSHMEM 2021, Virtual Event, September 14-16, 2021, Revised Selected
Papers, volume 13159 of Lecture Notes in Computer
Science, pages 39–60. Springer, 2021.
(doi:10.1007/978-3-031-04888-3_3)
- Chen Wang, Pavan Balaji, and
Marc Snir.
Pilgrim: scalable and (near)
lossless MPI tracing.
In Bronis R. de Supinski, Mary W. Hall, and
Todd Gamblin, editors, International Conference for High
Performance Computing, Networking, Storage and Analysis, SC 2021, St.
Louis, Missouri, USA, November 14-19, 2021, page 52. ACM, 2021.
(doi:10.1145/3458817.3476151)
- Rohit Zambre, Damodar
Sahasrabudhe, Hui Zhou, Martin Berzins,
Aparna Chandramowlishwaran, and Pavan Balaji.
Logically parallel
communication for fast MPI+threads applications.
IEEE Trans. Parallel Distributed Syst., 32(12):3038–3052, 2021.
(doi:10.1109/TPDS.2021.3075157)
- Noah Evans, Jan Ciesko,
Stephen L. Olivier, Howard Pritchard,
Shintaro Iwasaki, Ken Raffenetti, and
Pavan Balaji.
Implementing flexible
threading support in Open MPI.
In Workshop on Exascale MPI, ExaMPI@SC 2020, Atlanta, GA, USA, November
13, 2020, pages 21–30. IEEE, 2020.
(doi:10.1109/EXAMPI52011.2020.00008)
- Tao Gao, Yanfei Guo,
Boyu Zhang, Pietro Cicotti,
Yutong Lu, Pavan Balaji, and
Michela Taufer.
Memory-efficient and
skew-tolerant MapReduce over MPI for supercomputing systems.
IEEE Trans. Parallel Distributed Syst., 31(12):2734–2748, 2020.
(doi:10.1109/TPDS.2019.2932066)
- Kaiming Ouyang, Min Si,
Atsushi Hori, Zizhong Chen, and
Pavan Balaji.
CAB-MPI: exploring
interprocess work-stealing towards balanced MPI communication.
In Christine Cuicchi, Irene Qualters, and
William T. Kramer, editors, Proceedings of the
International Conference for High Performance Computing, Networking, Storage
and Analysis, SC 2020, Virtual Event / Atlanta, Georgia, USA, November
9-19, 2020, page 36. IEEE/ACM, 2020.
(doi:10.1109/SC41405.2020.00040)
- Rohit Zambre, Aparna
Chandramowlishwaran, and Pavan Balaji.
How I learned to stop
worrying about user-visible endpoints and love MPI.
In Eduard Ayguadé, Wen-mei W. Hwu,
Rosa M. Badia, and H. Peter Hofstee, editors,
ICS ’20: 2020 International Conference on Supercomputing, Barcelona,
Spain, June 2020, pages 35:1–35:13. ACM, 2020.
(doi:10.1145/3392717.3392773)
- Rohit Zambre, Aparna
Chandramowlishwaran, and Pavan Balaji.
How I learned to stop worrying
about user-visible endpoints and love MPI.
CoRR, abs/2005.00263, 2020.
- Rohit Zambre, Aparna
Chandramowlishwaran, and Pavan Balaji.
Scalable communication endpoints for
MPI+threads applications.
CoRR, abs/2002.02509, 2020.
- Abdelhalim Amer, Charles
Archer, Michael Blocksome, Chongxiao Cao,
Michael Chuvelev, Hajime Fujita,
Maria Garzaran, Yanfei Guo,
Jeff R. Hammond, Shintaro Iwasaki,
Kenneth J. Raffenetti, Mikhail Shiryaev,
Min Si, Kenjiro Taura, Sagar
Thapaliya, and Pavan Balaji.
Software combining to
mitigate multithreaded MPI contention.
In Rudolf Eigenmann, Chen Ding, and
Sally A. McKee, editors, Proceedings of the ACM
International Conference on Supercomputing, ICS 2019, Phoenix, AZ, USA,
June 26-28, 2019, pages 367–379. ACM, 2019.
(doi:10.1145/3330345.3330378)
- Joshua Hoke Davis, Tao Gao,
Sunita Chandrasekaran, Heike Jagode,
Anthony Danalis, Jack J. Dongarra,
Pavan Balaji, and Michela Taufer.
Characterization of power usage and
performance in data-intensive applications using MapReduce over MPI.
In Ian T. Foster, Gerhard R. Joubert,
Ludek Kucera, Wolfgang E. Nagel, and
Frans J. Peters, editors, Parallel Computing: Technology
Trends, Proceedings of the International Conference on Parallel Computing,
PARCO 2019, Prague, Czech Republic, September 10-13, 2019, volume 36
of Advances in Parallel Computing, pages 287–298. IOS Press,
2019.
(doi:10.3233/APC200053)
- Abdelhalim Amer, Huiwei Lu,
Pavan Balaji, Milind Chabbi,
Yanjie Wei, Jeff R. Hammond, and
Satoshi Matsuoka.
Lock contention management in
multithreaded MPI.
ACM Trans. Parallel Comput., 5(3):12:1–12:21, 2018.
(doi:10.1145/3275443)
- Sudheer Chunduri, Scott
Parker, Pavan Balaji, Kevin Harms, and
Kalyan Kumaran.
Characterization of MPI
usage on a production supercomputer.
In Proceedings of the International Conference for High Performance
Computing, Networking, Storage, and Analysis, SC 2018, Dallas, TX, USA,
November 11-16, 2018, pages 30:1–30:15. IEEE / ACM, 2018.
- Tao Gao, Yanfei Guo,
Boyu Zhang, Pietro Cicotti,
Yutong Lu, Pavan Balaji, and
Michela Taufer.
On the power of combiner
optimizations in MapReduce over MPI workflows.
In 24th IEEE International Conference on Parallel and Distributed
Systems, ICPADS 2018, Singapore, December 11-13, 2018, pages
441–448. IEEE, 2018.
(doi:10.1109/PADSW.2018.8644595)
- Min Si, Antonio J.
Peña, Jeff R. Hammond, Pavan Balaji,
Masamichi Takagi, and Yutaka Ishikawa.
Dynamic adaptable
asynchronous progress model for MPI RMA multiphase applications.
IEEE Trans. Parallel Distributed Syst., 29(9):1975–1989, 2018.
(doi:10.1109/TPDS.2018.2815568)
- Rohit Zambre, Aparna
Chandramowlishwaran, and Pavan Balaji.
Scalable communication
endpoints for MPI+threads applications.
In 24th IEEE International Conference on Parallel and Distributed
Systems, ICPADS 2018, Singapore, December 11-13, 2018, pages
803–812. IEEE, 2018.
(doi:10.1109/PADSW.2018.8645059)
- Hoang-Vu Dang, Sangmin Seo,
Abdelhalim Amer, and Pavan Balaji.
Advanced thread
synchronization for multithreaded MPI implementations.
In Proceedings of the 17th IEEE/ACM International Symposium on Cluster,
Cloud and Grid Computing, CCGRID 2017, Madrid, Spain, May 14-17,
2017, pages 314–324. IEEE Computer Society / ACM, 2017.
(doi:10.1109/CCGRID.2017.65)
- Yanfei Guo, Charles J.
Archer, Michael Blocksome, Scott Parker,
Wesley Bland, Ken Raffenetti, and
Pavan Balaji.
Memory compression techniques
for network address management in MPI.
In 2017 IEEE International Parallel and Distributed Processing
Symposium, IPDPS 2017, Orlando, FL, USA, May 29 – June 2, 2017,
pages 1008–1017. IEEE Computer Society, 2017.
(doi:10.1109/IPDPS.2017.18)
- Robert Latham, Leonardo
Bautista-Gomez, and Pavan Balaji.
Portable topology-aware
MPI-I/O.
In 23rd IEEE International Conference on Parallel and Distributed
Systems, ICPADS 2017, Shenzhen, China, December 15-17, 2017, pages
710–719. IEEE Computer Society, 2017.
(doi:10.1109/ICPADS.2017.00096)
- Seyed Hessam Mirsadeghi,
Jesper Larsson Träff, Pavan Balaji, and
Ahmad Afsahi.
Exploiting common
neighborhoods to optimize MPI neighborhood collectives.
In 24th IEEE International Conference on High Performance Computing,
HiPC 2017, Jaipur, India, December 18-21, 2017, pages 348–357. IEEE
Computer Society, 2017.
(doi:10.1109/HIPC.2017.00047)
- Antonio J. Peña, Pavan
Balaji, William Gropp, and Rajeev Thakur,
editors.
Proceedings of the
24th European MPI Users’ Group Meeting, EuroMPI/USA 2017, Chicago, IL, USA,
September 25-28, 2017. ACM, 2017.
- Ken Raffenetti, Abdelhalim
Amer, Lena Oden, Charles Archer,
Wesley Bland, Hajime Fujita,
Yanfei Guo, Tomislav Janjusic,
Dmitry Durnov, Michael Blocksome,
Min Si, Sangmin Seo, Akhil
Langer, Gengbin Zheng, Masamichi Takagi,
Paul K. Coffman, Jithin Jose,
Sayantan Sur, Alexander Sannikov,
Sergey Oblomov, Michael Chuvelev,
Masayuki Hatanaka, Xin Zhao,
Paul F. Fischer, Thilina Rathnayake,
Matthew Otten, Misun Min, and
Pavan Balaji.
Why is MPI so slow?:
analyzing the fundamental limits in implementing MPI-3.1.
In Bernd Mohr and Padma Raghavan, editors,
Proceedings of the International Conference for High Performance
Computing, Networking, Storage and Analysis, SC 2017, Denver, CO, USA,
November 12 – 17, 2017, page 62. ACM, 2017.
(doi:10.1145/3126908.3126963)
- Min Si and Pavan Balaji.
Process-based
asynchronous progress model for MPI point-to-point communication.
In 19th IEEE International Conference on High Performance Computing and
Communications; 15th IEEE International Conference on Smart City; 3rd
IEEE International Conference on Data Science and Systems,
HPCC/SmartCity/DSS 2017, Bangkok, Thailand, December 18-20, 2017,
pages 206–214. IEEE Computer Society, 2017.
(doi:10.1109/HPCC-SMARTCITY-DSS.2017.27)
- Ashwin M. Aji, Lokendra S.
Panwar, Feng Ji, Karthik Murthy,
Milind Chabbi, Pavan Balaji,
Keith R. Bisset, James Dinan,
Wu-chun Feng, John M. Mellor-Crummey,
Xiaosong Ma, and Rajeev Thakur.
MPI-ACC:
accelerator-aware MPI for scientific applications.
IEEE Trans. Parallel Distributed Syst., 27(5):1401–1414, 2016.
(doi:10.1109/TPDS.2015.2446479)
- James Dinan, Pavan Balaji,
Darius Buntinas, David Goodell,
William Gropp, and Rajeev Thakur.
An implementation and evaluation of
the MPI 3.0 one-sided communication interface.
Concurr. Comput. Pract. Exp., 28(17):4385–4404, 2016.
(doi:10.1002/CPE.3758)
- Sayan Ghosh, Jeff R. Hammond,
Antonio J. Peña, Pavan Balaji,
Assefaw Hadish Gebremedhin, and Barbara M.
Chapman.
One-sided interface for matrix
operations using MPI-3 RMA: A case study with Elemental.
In 45th International Conference on Parallel Processing, ICPP 2016,
Philadelphia, PA, USA, August 16-19, 2016, pages 185–194. IEEE
Computer Society, 2016.
(doi:10.1109/ICPP.2016.28)
- Jichi Guo, Qing Yi,
Jiayuan Meng, Junchao Zhang, and
Pavan Balaji.
Compiler-assisted overlapping
of communication and computation in MPI applications.
In 2016 IEEE International Conference on Cluster Computing, CLUSTER
2016, Taipei, Taiwan, September 12-16, 2016, pages 60–69. IEEE
Computer Society, 2016.
(doi:10.1109/CLUSTER.2016.62)
- Xin Zhao, Pavan Balaji, and
William Gropp.
Scalability challenges in
current MPI one-sided implementations.
In Riqing Chen, Chunming Rong, and
Dan Grigoras, editors, 15th International Symposium on
Parallel and Distributed Computing, ISPDC 2016, Fuzhou, China, July 8-10,
2016, pages 38–47. IEEE Computer Society, 2016.
(doi:10.1109/ISPDC.2016.14)
- Abdelhalim Amer, Huiwei Lu,
Pavan Balaji, and Satoshi Matsuoka.
Characterizing MPI and
hybrid MPI+threads applications at scale: Case study with BFS.
In 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid
Computing, CCGrid 2015, Shenzhen, China, May 4-7, 2015, pages
1075–1083. IEEE Computer Society, 2015.
(doi:10.1109/CCGRID.2015.93)
- Abdelhalim Amer, Huiwei Lu,
Yanjie Wei, Pavan Balaji, and
Satoshi Matsuoka.
MPI+Threads: runtime
contention and remedies.
In Albert Cohen and David Grove, editors,
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and
Practice of Parallel Programming, PPoPP 2015, San Francisco, CA, USA,
February 7-11, 2015, pages 239–248. ACM, 2015.
(doi:10.1145/2688500.2688522)
- Wesley Bland, Huiwei Lu,
Sangmin Seo, and Pavan Balaji.
Lessons learned implementing
user-level failure mitigation in MPICH.
In 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid
Computing, CCGrid 2015, Shenzhen, China, May 4-7, 2015, pages
1123–1126. IEEE Computer Society, 2015.
(doi:10.1109/CCGRID.2015.51)
- Yanfei Guo, Wesley Bland,
Pavan Balaji, and Xiaobo Zhou.
Fault tolerant MapReduce-MPI
for HPC clusters.
In Jackie Kern and Jeffrey S. Vetter, editors,
Proceedings of the International Conference for High Performance
Computing, Networking, Storage and Analysis, SC 2015, Austin, TX, USA,
November 15-20, 2015, pages 34:1–34:12. ACM, 2015.
(doi:10.1145/2807591.2807617)
- Torsten Hoefler, James Dinan,
Rajeev Thakur, Brian Barrett,
Pavan Balaji, William Gropp, and
Keith D. Underwood.
Remote memory access programming in
MPI-3.
ACM Trans. Parallel Comput., 2(2):9:1–9:26, 2015.
(doi:10.1145/2780584)
- Huiwei Lu, Sangmin Seo, and
Pavan Balaji.
MPI+ULT: overlapping
communication and computation with user-level threads.
In 17th IEEE International Conference on High Performance Computing and
Communications, HPCC 2015, 7th IEEE International Symposium on Cyberspace
Safety and Security, CSS 2015, and 12th IEEE International Conference on
Embedded Software and Systems, ICESS 2015, New York, NY, USA, August 24-26,
2015, pages 444–454. IEEE, 2015.
(doi:10.1109/HPCC-CSS-ICESS.2015.82)
- Ken Raffenetti, Antonio J.
Peña, and Pavan Balaji.
Toward implementing robust
support for Portals 4 networks in MPICH.
In 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid
Computing, CCGrid 2015, Shenzhen, China, May 4-7, 2015, pages
1173–1176. IEEE Computer Society, 2015.
(doi:10.1109/CCGRID.2015.79)
- Sangmin Seo, Robert Latham,
Junchao Zhang, and Pavan Balaji.
Implementation and evaluation
of MPI nonblocking collective I/O.
In 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid
Computing, CCGrid 2015, Shenzhen, China, May 4-7, 2015, pages
1084–1091. IEEE Computer Society, 2015.
(doi:10.1109/CCGRID.2015.81)
- Min Si, Antonio J.
Peña, Jeff R. Hammond, Pavan Balaji, and
Yutaka Ishikawa.
Scaling NWChem with efficient
and portable asynchronous communication in MPI RMA.
In 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid
Computing, CCGrid 2015, Shenzhen, China, May 4-7, 2015, pages
811–816. IEEE Computer Society, 2015.
(doi:10.1109/CCGRID.2015.48)
- Min Si, Antonio J.
Peña, Jeff R. Hammond, Pavan Balaji,
Masamichi Takagi, and Yutaka Ishikawa.
Casper: An asynchronous
progress model for MPI RMA on many-core architectures.
In 2015 IEEE International Parallel and Distributed Processing
Symposium, IPDPS 2015, Hyderabad, India, May 25-29, 2015, pages
665–676. IEEE Computer Society, 2015.
(doi:10.1109/IPDPS.2015.35)
- Karthikeyan Vaidyanathan,
Dhiraj D. Kalamkar, Kiran Pamnany,
Jeff R. Hammond, Pavan Balaji,
Dipankar Das, Jongsoo Park, and
Bálint Joó.
Improving concurrency and
asynchrony in multithreaded MPI applications using software offloading.
In Jackie Kern and Jeffrey S. Vetter, editors,
Proceedings of the International Conference for High Performance
Computing, Networking, Storage and Analysis, SC 2015, Austin, TX, USA,
November 15-20, 2015, pages 30:1–30:12. ACM, 2015.
(doi:10.1145/2807591.2807602)
- Xin Zhao, Pavan Balaji, and
William Gropp.
Runtime support for irregular
computation in MPI-based applications.
In 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid
Computing, CCGrid 2015, Shenzhen, China, May 4-7, 2015, pages
701–704. IEEE Computer Society, 2015.
(doi:10.1109/CCGRID.2015.82)
- Xiaomin Zhu, Junchao Zhang,
Kazutomo Yoshii, Shigang Li,
Yunquan Zhang, and Pavan Balaji.
Analyzing MPI-3.0
process-level shared memory: A case study with stencil computations.
In 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid
Computing, CCGrid 2015, Shenzhen, China, May 4-7, 2015, pages
1099–1106. IEEE Computer Society, 2015.
(doi:10.1109/CCGRID.2015.131)
- Wesley Bland, Kenneth
Raffenetti, and Pavan Balaji.
Simplifying the recovery model
of user-level failure mitigation.
In Proceedings of the 2014 Workshop on Exascale MPI, ExaMPI ’14, New
Orleans, Louisiana, USA, November 16-21, 2014, pages 20–25. IEEE,
2014.
(doi:10.1109/EXAMPI.2014.4)
- Zhezhe Chen, James Dinan,
Zhen Tang, Pavan Balaji, Hua
Zhong, Jun Wei, Tao Huang, and
Feng Qin.
MC-Checker: Detecting memory
consistency errors in MPI one-sided applications.
In Trish Damkroger and Jack J. Dongarra, editors,
International Conference for High Performance Computing, Networking,
Storage and Analysis, SC 2014, New Orleans, LA, USA, November 16-21,
2014, pages 499–510. IEEE Computer Society, 2014.
(doi:10.1109/SC.2014.46)
- James Dinan, Ryan E. Grant,
Pavan Balaji, David Goodell,
Douglas Miller, Marc Snir, and
Rajeev Thakur.
Enabling communication
concurrency through flexible MPI endpoints.
Int. J. High Perform. Comput. Appl., 28(4):390–405, 2014.
(doi:10.1177/1094342014548772)
- John Jenkins, James Dinan,
Pavan Balaji, Tom Peterka,
Nagiza F. Samatova, and Rajeev Thakur.
Processing MPI derived
datatypes on noncontiguous GPU-resident data.
IEEE Trans. Parallel Distributed Syst., 25(10):2627–2637, 2014.
(doi:10.1109/TPDS.2013.234)
- Min Si, Antonio J.
Peña, Pavan Balaji, Masamichi Takagi,
and Yutaka Ishikawa.
MT-MPI: multithreaded MPI
for many-core environments.
In Arndt Bode, Michael Gerndt, Per
Stenström, Lawrence Rauchwerger,
Barton P. Miller, and Martin Schulz, editors,
2014 International Conference on Supercomputing, ICS’14, Muenchen,
Germany, June 10-13, 2014, pages 125–134. ACM, 2014.
(doi:10.1145/2597652.2597658)
- Chaoran Yang, Wesley Bland,
John M. Mellor-Crummey, and Pavan Balaji.
Portable, MPI-interoperable
Coarray Fortran.
In José E. Moreira and James R. Larus,
editors, ACM SIGPLAN Symposium on Principles and Practice of
Parallel Programming, PPoPP ’14, Orlando, FL, USA, February 15-19,
2014, pages 81–92. ACM, 2014.
(doi:10.1145/2555243.2555270)
- Junchao Zhang, Bill Long,
Kenneth Raffenetti, and Pavan Balaji.
Implementing the MPI-3.0
Fortran 2008 binding.
In Jack J. Dongarra, Yutaka Ishikawa, and
Atsushi Hori, editors, 21st European MPI Users’ Group
Meeting, EuroMPI/ASIA ’14, Kyoto, Japan – September 09 – 12, 2014,
page 1. ACM, 2014.
(doi:10.1145/2642769.2642777)
- Judicael A. Zounmevo, Xin
Zhao, Pavan Balaji, William Gropp, and
Ahmad Afsahi.
Nonblocking epochs in MPI
one-sided communication.
In Trish Damkroger and Jack J. Dongarra, editors,
International Conference for High Performance Computing, Networking,
Storage and Analysis, SC 2014, New Orleans, LA, USA, November 16-21,
2014, pages 475–486. IEEE Computer Society, 2014.
(doi:10.1109/SC.2014.44)
- Ashwin M. Aji, Pavan Balaji,
James Dinan, Wu-chun Feng, and
Rajeev Thakur.
Synchronization and ordering
semantics in hybrid MPI+GPU programming.
In 2013 IEEE International Symposium on Parallel & Distributed
Processing, Workshops and PhD Forum, Cambridge, MA, USA, May 20-24,
2013, pages 1020–1029. IEEE, 2013.
(doi:10.1109/IPDPSW.2013.256)
- Ashwin M. Aji, Lokendra S.
Panwar, Feng Ji, Milind Chabbi,
Karthik Murthy, Pavan Balaji,
Keith R. Bisset, James Dinan,
Wu-chun Feng, John M. Mellor-Crummey,
Xiaosong Ma, and Rajeev Thakur.
On the efficacy of
GPU-integrated MPI for scientific applications.
In Manish Parashar, Jon B. Weissman,
Dick H. J. Epema, and Renato J. O. Figueiredo,
editors, The 22nd International Symposium on High-Performance Parallel
and Distributed Computing, HPDC’13, New York, NY, USA – June 17 – 21,
2013, pages 191–202. ACM, 2013.
- Pavan Balaji and Dries Kimpe.
On the reproducibility
of MPI reduction operations.
In 10th IEEE International Conference on High Performance Computing and
Communications & 2013 IEEE International Conference on Embedded and
Ubiquitous Computing, HPCC/EUC 2013, Zhangjiajie, China, November 13-15,
2013, pages 407–414. IEEE, 2013.
(doi:10.1109/HPCC.AND.EUC.2013.65)
- James Dinan, Pavan Balaji,
David Goodell, Douglas Miller,
Marc Snir, and Rajeev Thakur.
Enabling MPI
interoperability through flexible communication endpoints.
In Jack J. Dongarra, Javier García Blas,
and Jesús Carretero, editors, 20th European MPI
Users’ Group Meeting, EuroMPI ’13, Madrid, Spain – September 15 – 18,
2013, pages 13–18. ACM, 2013.
(doi:10.1145/2488551.2488553)
- Md. Ziaul Haque, Qing Yi,
James Dinan, and Pavan Balaji.
Enhancing performance
portability of MPI applications through annotation-based
transformations.
In 42nd International Conference on Parallel Processing, ICPP 2013,
Lyon, France, October 1-4, 2013, pages 631–640. IEEE Computer
Society, 2013.
(doi:10.1109/ICPP.2013.77)
- Torsten Hoefler, James Dinan,
Darius Buntinas, Pavan Balaji,
Brian Barrett, Ron Brightwell,
William Gropp, Vivek Kale, and
Rajeev Thakur.
MPI + MPI: a new hybrid
approach to parallel programming with MPI plus shared memory.
Computing, 95(12):1121–1136, 2013.
(doi:10.1007/S00607-013-0324-2)
- Antonio J. Peña, Ralf
G. Correa Carvalho, James Dinan, Pavan Balaji,
Rajeev Thakur, and William Gropp.
Analysis of
topology-dependent MPI performance on Gemini networks.
In Jack J. Dongarra, Javier García Blas,
and Jesús Carretero, editors, 20th European MPI
Users’ Group Meeting, EuroMPI ’13, Madrid, Spain – September 15 – 18,
2013, pages 61–66. ACM, 2013.
(doi:10.1145/2488551.2488564)
- Xin Zhao, Pavan Balaji,
William Gropp, and Rajeev Thakur.
MPI-interoperable generalized
active messages.
In 19th IEEE International Conference on Parallel and Distributed
Systems, ICPADS 2013, Seoul, Korea, December 15-18, 2013, pages
200–207. IEEE Computer Society, 2013.
(doi:10.1109/ICPADS.2013.38)
- Xin Zhao, Pavan Balaji,
William Gropp, and Rajeev Thakur.
Optimization strategies for
MPI-interoperable active messages.
In IEEE 11th International Conference on Dependable, Autonomic and
Secure Computing, DASC 2013, Chengdu, China, December 21-22, 2013,
pages 508–515. IEEE Computer Society, 2013.
(doi:10.1109/DASC.2013.116)
- Xin Zhao, Darius Buntinas,
Judicael A. Zounmevo, James Dinan,
David Goodell, Pavan Balaji,
Rajeev Thakur, Ahmad Afsahi, and
William Gropp.
Toward asynchronous and
MPI-interoperable active messages.
In 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid
Computing, CCGrid 2013, Delft, Netherlands, May 13-16, 2013, pages
87–94. IEEE Computer Society, 2013.
(doi:10.1109/CCGRID.2013.84)
- Ashwin M. Aji, James Dinan,
Darius Buntinas, Pavan Balaji,
Wu-chun Feng, Keith R. Bisset, and
Rajeev Thakur.
MPI-ACC: an integrated and
extensible approach to data movement in accelerator-based systems.
In Geyong Min, Jia Hu, Lei (Chris)
Liu, Laurence Tianruo Yang, Seetharami Seelam,
and Laurent Lefèvre, editors, 14th IEEE
International Conference on High Performance Computing and Communication &
9th IEEE International Conference on Embedded Software and Systems,
HPCC-ICESS 2012, Liverpool, United Kingdom, June 25-27, 2012, pages
647–654. IEEE Computer Society, 2012.
(doi:10.1109/HPCC.2012.92)
- James Dinan, Pavan Balaji,
Jeff R. Hammond, Sriram Krishnamoorthy, and
Vinod Tipparaju.
Supporting the Global Arrays
PGAS model using MPI one-sided communication.
In 26th IEEE International Parallel and Distributed Processing
Symposium, IPDPS 2012, Shanghai, China, May 21-25, 2012, pages
739–750. IEEE Computer Society, 2012.
(doi:10.1109/IPDPS.2012.72)
- James Dinan, David Goodell,
William Gropp, Rajeev Thakur, and
Pavan Balaji.
Efficient multithreaded
context ID allocation in MPI.
In Jesper Larsson Träff, Siegfried Benkner,
and Jack J. Dongarra, editors, Recent Advances in the
Message Passing Interface – 19th European MPI Users’ Group Meeting, EuroMPI
2012, Vienna, Austria, September 23-26, 2012. Proceedings, volume 7490
of Lecture Notes in Computer Science, pages 57–66. Springer,
2012.
(doi:10.1007/978-3-642-33518-1_11)
- William Gropp, Ewing L. Lusk,
and Rajeev Thakur.
Advanced MPI including
new MPI-3 features.
In Jesper Larsson Träff, Siegfried Benkner,
and Jack J. Dongarra, editors, Recent Advances in the
Message Passing Interface – 19th European MPI Users’ Group Meeting, EuroMPI
2012, Vienna, Austria, September 23-26, 2012. Proceedings, volume 7490
of Lecture Notes in Computer Science, page 14. Springer, 2012.
(doi:10.1007/978-3-642-33518-1_5)
- Torsten Hoefler, James Dinan,
Darius Buntinas, Pavan Balaji,
Brian W. Barrett, Ron Brightwell,
William Gropp, Vivek Kale, and
Rajeev Thakur.
Leveraging MPI’s
one-sided communication interface for shared-memory programming.
In Jesper Larsson Träff, Siegfried Benkner,
and Jack J. Dongarra, editors, Recent Advances in the
Message Passing Interface – 19th European MPI Users’ Group Meeting, EuroMPI
2012, Vienna, Austria, September 23-26, 2012. Proceedings, volume 7490
of Lecture Notes in Computer Science, pages 132–141. Springer,
2012.
(doi:10.1007/978-3-642-33518-1_18)
- John Jenkins, James Dinan,
Pavan Balaji, Nagiza F. Samatova, and
Rajeev Thakur.
Enabling fast, noncontiguous
GPU data movement in hybrid MPI+GPU environments.
In 2012 IEEE International Conference on Cluster Computing, CLUSTER
2012, Beijing, China, September 24-28, 2012, pages 468–476. IEEE
Computer Society, 2012.
(doi:10.1109/CLUSTER.2012.72)
- Pavan Balaji, Darius
Buntinas, David Goodell, William Gropp,
Torsten Hoefler, Sameer Kumar,
Ewing L. Lusk, Rajeev Thakur, and
Jesper Larsson Träff.
MPI on millions of
cores.
Parallel Process. Lett., 21(1):45–60, 2011.
(doi:10.1142/S0129626411000060)
- James Dinan, Pavan Balaji,
Jeff R. Hammond, Sriram Krishnamoorthy, and
Vinod Tipparaju.
Poster: High-level, one-sided
programming models on MPI: a case study with Global Arrays and NWChem.
In Scott A. Lathrop, Jim Costa, and
William Kramer, editors, Conference on High Performance
Computing Networking, Storage and Analysis – Companion Volume, SC 2011,
Seattle, WA, USA, November 12-18, 2011, pages 37–38. ACM, 2011.
(doi:10.1145/2148600.2148620)
- James Dinan, Sriram
Krishnamoorthy, Pavan Balaji, Jeff R. Hammond,
Manojkumar Krishnan, Vinod Tipparaju, and
Abhinav Vishnu.
Noncollective
communicator creation in MPI.
In Yiannis Cotronis, Anthony Danalis,
Dimitrios S. Nikolopoulos, and Jack J.
Dongarra, editors, Recent Advances in the Message Passing Interface –
18th European MPI Users’ Group Meeting, EuroMPI 2011, Santorini, Greece,
September 18-21, 2011. Proceedings, volume 6960 of Lecture Notes
in Computer Science, pages 282–291. Springer, 2011.
(doi:10.1007/978-3-642-24449-0_32)
- David Goodell, William Gropp,
Xin Zhao, and Rajeev Thakur.
Scalable memory use in
MPI: A case study with MPICH2.
In Yiannis Cotronis, Anthony Danalis,
Dimitrios S. Nikolopoulos, and Jack J.
Dongarra, editors, Recent Advances in the Message Passing Interface –
18th European MPI Users’ Group Meeting, EuroMPI 2011, Santorini, Greece,
September 18-21, 2011. Proceedings, volume 6960 of Lecture Notes
in Computer Science, pages 140–149. Springer, 2011.
(doi:10.1007/978-3-642-24449-0_17)
- Ganesh Gopalakrishnan,
Robert M. Kirby, Stephen F. Siegel,
Rajeev Thakur, William Gropp,
Ewing L. Lusk, Bronis R. de Supinski,
Martin Schulz, and Greg Bronevetsky.
Formal analysis of MPI-based
parallel programs.
Commun. ACM, 54(12):82–91, 2011.
(doi:10.1145/2043174.2043194)
- William Gropp, Torsten
Hoefler, Rajeev Thakur, and Jesper Larsson
Träff.
Performance
expectations and guidelines for MPI derived datatypes.
In Yiannis Cotronis, Anthony Danalis,
Dimitrios S. Nikolopoulos, and Jack J.
Dongarra, editors, Recent Advances in the Message Passing Interface –
18th European MPI Users’ Group Meeting, EuroMPI 2011, Santorini, Greece,
September 18-21, 2011. Proceedings, volume 6960 of Lecture Notes
in Computer Science, pages 150–159. Springer, 2011.
(doi:10.1007/978-3-642-24449-0_18)
- Torsten Hoefler, Rolf
Rabenseifner, Hubert Ritzdorf, Bronis R.
de Supinski, Rajeev Thakur, and Jesper Larsson
Träff.
The scalable process topology
interface of MPI 2.2.
Concurr. Comput. Pract. Exp., 23(4):293–310, 2011.
(doi:10.1002/CPE.1643)
- Mohammad J. Rashti, Jonathan
Green, Pavan Balaji, Ahmad Afsahi, and
William Gropp.
Multi-core and network
aware MPI topology functions.
In Yiannis Cotronis, Anthony Danalis,
Dimitrios S. Nikolopoulos, and Jack J.
Dongarra, editors, Recent Advances in the Message Passing Interface –
18th European MPI Users’ Group Meeting, EuroMPI 2011, Santorini, Greece,
September 18-21, 2011. Proceedings, volume 6960 of Lecture Notes
in Computer Science, pages 50–60. Springer, 2011.
(doi:10.1007/978-3-642-24449-0_8)
- Rui Wang, Erlin Yao,
Mingyu Chen, Guangming Tan, Pavan
Balaji, and Darius Buntinas.
Building algorithmically
nonstop fault tolerant MPI programs.
In 18th International Conference on High Performance Computing, HiPC
2011, Bengaluru, India, December 18-21, 2011, pages 1–9. IEEE
Computer Society, 2011.
(doi:10.1109/HIPC.2011.6152716)
- Pavan Balaji, Darius
Buntinas, David Goodell, William Gropp, and
Rajeev Thakur.
Fine-grained multithreading
support for hybrid threaded MPI programming.
Int. J. High Perform. Comput. Appl., 24(1):49–57, 2010.
(doi:10.1177/1094342009360206)
- Pavan Balaji, Anthony Chan,
William Gropp, Rajeev Thakur, and
Ewing L. Lusk.
The importance of
non-data-communication overheads in MPI.
Int. J. High Perform. Comput. Appl., 24(1):5–15, 2010.
(doi:10.1177/1094342009359258)
- James Dinan, Pavan Balaji,
Ewing L. Lusk, P. Sadayappan, and
Rajeev Thakur.
Hybrid parallel programming
with MPI and Unified Parallel C.
In Nancy M. Amato, Hubertus Franke, and
Paul H. J. Kelly, editors, Proceedings of the 7th
Conference on Computing Frontiers, 2010, Bertinoro, Italy, May 17-19,
2010, pages 177–186. ACM, 2010.
(doi:10.1145/1787275.1787323)
- Gábor Dózsa,
Sameer Kumar, Pavan Balaji,
Darius Buntinas, David Goodell,
William Gropp, Joe Ratterman, and
Rajeev Thakur.
Enabling concurrent
multithreaded MPI communication on multicore petascale systems.
In Rainer Keller, Edgar Gabriel,
Michael M. Resch, and Jack J. Dongarra,
editors, Recent Advances in the Message Passing Interface – 17th
European MPI Users’ Group Meeting, EuroMPI 2010, Stuttgart, Germany,
September 12-15, 2010. Proceedings, volume 6305 of Lecture Notes
in Computer Science, pages 11–20. Springer, 2010.
(doi:10.1007/978-3-642-15646-5_2)
- David Goodell, Pavan Balaji,
Darius Buntinas, Gábor Dózsa,
William Gropp, Sameer Kumar,
Bronis R. de Supinski, and Rajeev Thakur.
Minimizing MPI resource
contention in multithreaded multicore environments.
In Proceedings of the 2010 IEEE International Conference on Cluster
Computing, Heraklion, Crete, Greece, 20-24 September, 2010, pages
1–8. IEEE Computer Society, 2010.
(doi:10.1109/CLUSTER.2010.11)
- Torsten Hoefler, William
Gropp, Rajeev Thakur, and Jesper Larsson
Träff.
Toward performance
models of MPI implementations for understanding application scaling
issues.
In Rainer Keller, Edgar Gabriel,
Michael M. Resch, and Jack J. Dongarra,
editors, Recent Advances in the Message Passing Interface – 17th
European MPI Users’ Group Meeting, EuroMPI 2010, Stuttgart, Germany,
September 12-15, 2010. Proceedings, volume 6305 of Lecture Notes
in Computer Science, pages 21–30. Springer, 2010.
(doi:10.1007/978-3-642-15646-5_3)
- Jayesh Krishna, Pavan Balaji,
Ewing L. Lusk, Rajeev Thakur, and
Fabian Tiller.
Implementing MPI on
Windows: Comparison with common approaches on Unix.
In Rainer Keller, Edgar Gabriel,
Michael M. Resch, and Jack J. Dongarra,
editors, Recent Advances in the Message Passing Interface – 17th
European MPI Users’ Group Meeting, EuroMPI 2010, Stuttgart, Germany,
September 12-15, 2010. Proceedings, volume 6305 of Lecture Notes
in Computer Science, pages 160–169. Springer, 2010.
(doi:10.1007/978-3-642-15646-5_17)
- Salman Pervez, Ganesh
Gopalakrishnan, Robert M. Kirby, Rajeev
Thakur, and William Gropp.
Formal methods applied to
high-performance computing software design: a case study of MPI one-sided
communication-based locking.
Softw. Pract. Exp., 40(1):23–43, 2010.
(doi:10.1002/SPE.946)
- Jesper Larsson Träff,
William D. Gropp, and Rajeev Thakur.
Self-consistent MPI
performance guidelines.
IEEE Trans. Parallel Distributed Syst., 21(5):698–709, 2010.
(doi:10.1109/TPDS.2009.120)
- Sriram Aananthakrishnan,
Michael Delisi, Sarvani S. Vakkalanka,
Anh Vo, Ganesh Gopalakrishnan,
Robert M. Kirby, and Rajeev Thakur.
How formal dynamic
verification tools facilitate novel concurrency visualizations.
In Matti Ropo, Jan Westerholm, and
Jack J. Dongarra, editors, Recent Advances in Parallel
Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users’
Group Meeting, Espoo, Finland, September 7-10, 2009. Proceedings,
volume 5759 of Lecture Notes in Computer Science, pages
261–270. Springer, 2009.
(doi:10.1007/978-3-642-03770-2_32)
- Pavan Balaji, Darius
Buntinas, David Goodell, William Gropp,
Sameer Kumar, Ewing L. Lusk,
Rajeev Thakur, and Jesper Larsson Träff.
MPI on a million
processors.
In Matti Ropo, Jan Westerholm, and
Jack J. Dongarra, editors, Recent Advances in Parallel
Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users’
Group Meeting, Espoo, Finland, September 7-10, 2009. Proceedings,
volume 5759 of Lecture Notes in Computer Science, pages 20–30.
Springer, 2009.
(doi:10.1007/978-3-642-03770-2_9)
- Pavan Balaji, Anthony Chan,
Rajeev Thakur, William Gropp, and
Ewing L. Lusk.
Toward message passing for
a million processes: characterizing MPI on a massive scale Blue Gene/P.
Comput. Sci. Res. Dev., 24(1-2):11–19, 2009.
(doi:10.1007/S00450-009-0095-3)
- Robert B. Ross, Robert
Latham, William Gropp, Ewing L. Lusk, and
Rajeev Thakur.
Processing MPI
datatypes outside MPI.
In Matti Ropo, Jan Westerholm, and
Jack J. Dongarra, editors, Recent Advances in Parallel
Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users’
Group Meeting, Espoo, Finland, September 7-10, 2009. Proceedings,
volume 5759 of Lecture Notes in Computer Science, pages 42–53.
Springer, 2009.
(doi:10.1007/978-3-642-03770-2_11)
- Saba Sehrish, Jun Wang, and
Rajeev Thakur.
Conflict detection
algorithm to minimize locking for MPI-IO atomicity.
In Matti Ropo, Jan Westerholm, and
Jack J. Dongarra, editors, Recent Advances in Parallel
Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users’
Group Meeting, Espoo, Finland, September 7-10, 2009. Proceedings,
volume 5759 of Lecture Notes in Computer Science, pages
143–153. Springer, 2009.
(doi:10.1007/978-3-642-03770-2_21)
- Rajeev Thakur and William
Gropp.
Test suite for evaluating
performance of multithreaded MPI communication.
Parallel Comput., 35(12):608–617, 2009.
(doi:10.1016/J.PARCO.2008.12.013)
- Vinod Tipparaju, William
Gropp, Hubert Ritzdorf, Rajeev Thakur, and
Jesper Larsson Träff.
Investigating high performance
RMA interfaces for the MPI-3 standard.
In ICPP 2009, International Conference on Parallel Processing, Vienna,
Austria, 22-25 September 2009, pages 293–300. IEEE Computer
Society, 2009.
(doi:10.1109/ICPP.2009.54)
- Sarvani S. Vakkalanka,
Grzegorz Szubzda, Anh Vo, Ganesh
Gopalakrishnan, Robert M. Kirby, and Rajeev
Thakur.
Static-analysis
assisted dynamic verification of MPI Waitany programs (poster
abstract).
In Matti Ropo, Jan Westerholm, and
Jack J. Dongarra, editors, Recent Advances in Parallel
Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users’
Group Meeting, Espoo, Finland, September 7-10, 2009. Proceedings,
volume 5759 of Lecture Notes in Computer Science, pages
329–330. Springer, 2009.
(doi:10.1007/978-3-642-03770-2_43)
- Anh Vo, Sarvani S.
Vakkalanka, Michael Delisi, Ganesh
Gopalakrishnan, Robert M. Kirby, and Rajeev
Thakur.
Formal verification of
practical MPI programs.
In Daniel A. Reed and Vivek Sarkar, editors,
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and
Practice of Parallel Programming, PPOPP 2009, Raleigh, NC, USA, February
14-18, 2009, pages 261–270. ACM, 2009.
(doi:10.1145/1504176.1504214)
- Anh Vo, Sarvani S.
Vakkalanka, Jason Williams, Ganesh
Gopalakrishnan, Robert M. Kirby, and Rajeev
Thakur.
Sound and efficient
dynamic verification of MPI programs with probe non-determinism.
In Matti Ropo, Jan Westerholm, and
Jack J. Dongarra, editors, Recent Advances in Parallel
Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users’
Group Meeting, Espoo, Finland, September 7-10, 2009. Proceedings,
volume 5759 of Lecture Notes in Computer Science, pages
271–281. Springer, 2009.
(doi:10.1007/978-3-642-03770-2_33)
- Hao Zhu, David Goodell,
William Gropp, and Rajeev Thakur.
Hierarchical
collectives in MPICH2.
In Matti Ropo, Jan Westerholm, and
Jack J. Dongarra, editors, Recent Advances in Parallel
Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users’
Group Meeting, Espoo, Finland, September 7-10, 2009. Proceedings,
volume 5759 of Lecture Notes in Computer Science, pages
325–326. Springer, 2009.
(doi:10.1007/978-3-642-03770-2_41)
- Pavan Balaji, Darius
Buntinas, David Goodell, William Gropp, and
Rajeev Thakur.
Toward efficient
support for multithreaded MPI communication.
In Alexey L. Lastovetsky, M. Tahar Kechadi, and
Jack J. Dongarra, editors, Recent Advances in Parallel
Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users’
Group Meeting, Dublin, Ireland, September 7-10, 2008. Proceedings,
volume 5205 of Lecture Notes in Computer Science, pages
120–129. Springer, 2008.
(doi:10.1007/978-3-540-87475-1_20)
- Pavan Balaji, Anthony Chan,
William Gropp, Rajeev Thakur, and
Ewing L. Lusk.
Non-data-communication
overheads in MPI: analysis on Blue Gene/P.
In Alexey L. Lastovetsky, M. Tahar Kechadi, and
Jack J. Dongarra, editors, Recent Advances in Parallel
Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users’
Group Meeting, Dublin, Ireland, September 7-10, 2008. Proceedings,
volume 5205 of Lecture Notes in Computer Science, pages 13–22.
Springer, 2008.
(doi:10.1007/978-3-540-87475-1_9)
- Pavan Balaji, Wu-chun
Feng, Jeremy S. Archuleta, Heshan Lin,
Rajkumar Kettimuthu, Rajeev Thakur, and
Xiaosong Ma.
Semantics-based distributed
I/O for mpiBLAST.
In Siddhartha Chatterjee and Michael L. Scott,
editors, Proceedings of the 13th ACM SIGPLAN Symposium on
Principles and Practice of Parallel Programming, PPOPP 2008, Salt Lake
City, UT, USA, February 20-23, 2008, pages 293–294. ACM, 2008.
(doi:10.1145/1345206.1345262)
- Surendra Byna, Yong Chen,
Xian-He Sun, Rajeev Thakur, and
William Gropp.
Parallel I/O prefetching
using MPI file caching and I/O signatures.
In Proceedings of the ACM/IEEE Conference on High Performance
Computing, SC 2008, November 15-21, 2008, Austin, Texas, USA,
page 44. IEEE/ACM, 2008.
(doi:10.1109/SC.2008.5213604)
- William D. Gropp, Dries
Kimpe, Robert B. Ross, Rajeev Thakur, and
Jesper Larsson Träff.
Self-consistent
MPI-IO performance requirements and expectations.
In Alexey L. Lastovetsky, M. Tahar Kechadi, and
Jack J. Dongarra, editors, Recent Advances in Parallel
Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users’
Group Meeting, Dublin, Ireland, September 7-10, 2008. Proceedings,
volume 5205 of Lecture Notes in Computer Science, pages
167–176. Springer, 2008.
(doi:10.1007/978-3-540-87475-1_25)
- Subodh Sharma, Sarvani S.
Vakkalanka, Ganesh Gopalakrishnan, Robert M.
Kirby, Rajeev Thakur, and William Gropp.
A formal approach to
detect functionally irrelevant barriers in MPI programs.
In Alexey L. Lastovetsky, M. Tahar Kechadi, and
Jack J. Dongarra, editors, Recent Advances in Parallel
Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users’
Group Meeting, Dublin, Ireland, September 7-10, 2008. Proceedings,
volume 5205 of Lecture Notes in Computer Science, pages
265–273. Springer, 2008.
(doi:10.1007/978-3-540-87475-1_36)
- Jesper Larsson Träff,
Andreas Ripke, Christian Siebert,
Pavan Balaji, Rajeev Thakur, and
William Gropp.
A simple, pipelined
algorithm for large, irregular all-gather problems.
In Alexey L. Lastovetsky, M. Tahar Kechadi, and
Jack J. Dongarra, editors, Recent Advances in Parallel
Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users’
Group Meeting, Dublin, Ireland, September 7-10, 2008. Proceedings,
volume 5205 of Lecture Notes in Computer Science, pages 84–93.
Springer, 2008.
(doi:10.1007/978-3-540-87475-1_16)
- Sarvani S. Vakkalanka, Michael
Delisi, Ganesh Gopalakrishnan, Robert M.
Kirby, Rajeev Thakur, and William Gropp.
Implementing efficient
dynamic formal verification methods for MPI programs.
In Alexey L. Lastovetsky, M. Tahar Kechadi, and
Jack J. Dongarra, editors, Recent Advances in Parallel
Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users’
Group Meeting, Dublin, Ireland, September 7-10, 2008. Proceedings,
volume 5205 of Lecture Notes in Computer Science, pages
248–256. Springer, 2008.
(doi:10.1007/978-3-540-87475-1_34)
- Pavan Balaji, Darius
Buntinas, Satish Balay, Barry F. Smith,
Rajeev Thakur, and William Gropp.
Nonuniformly communicating
noncontiguous data: A case study with PETSc and MPI.
In 21st International Parallel and Distributed Processing Symposium
(IPDPS 2007), Proceedings, 26-30 March 2007, Long Beach, California,
USA, pages 1–10. IEEE, 2007.
(doi:10.1109/IPDPS.2007.370223)
- William Gropp and Rajeev
Thakur.
Thread-safety in an MPI
implementation: Requirements and analysis.
Parallel Comput., 33(9):595–604, 2007.
(doi:10.1016/J.PARCO.2007.07.002)
- William D. Gropp and Rajeev
Thakur.
Revealing the
performance of MPI RMA implementations.
In Franck Cappello, Thomas Hérault, and
Jack J. Dongarra, editors, Recent Advances in Parallel
Virtual Machine and Message Passing Interface, 14th European PVM/MPI User’s
Group Meeting, Paris, France, September 30 – October 3, 2007,
Proceedings, volume 4757 of Lecture Notes in Computer
Science, pages 272–280. Springer, 2007.
(doi:10.1007/978-3-540-75416-9_38)
- Robert Latham, William
Gropp, Robert B. Ross, and Rajeev Thakur.
Extending the MPI-2
generalized request interface.
In Franck Cappello, Thomas Hérault, and
Jack J. Dongarra, editors, Recent Advances in Parallel
Virtual Machine and Message Passing Interface, 14th European PVM/MPI User’s
Group Meeting, Paris, France, September 30 – October 3, 2007,
Proceedings, volume 4757 of Lecture Notes in Computer
Science, pages 223–232. Springer, 2007.
(doi:10.1007/978-3-540-75416-9_33)
- Robert Latham, Robert B.
Ross, and Rajeev Thakur.
Implementing MPI-IO atomic
mode and shared file pointers using MPI one-sided communication.
Int. J. High Perform. Comput. Appl., 21(2):132–143, 2007.
(doi:10.1177/1094342007077859)
- Salman Pervez, Ganesh
Gopalakrishnan, Robert M. Kirby, Robert
Palmer, Rajeev Thakur, and William Gropp.
Practical
model-checking method for verifying correctness of MPI programs.
In Franck Cappello, Thomas Hérault, and
Jack J. Dongarra, editors, Recent Advances in Parallel
Virtual Machine and Message Passing Interface, 14th European PVM/MPI User’s
Group Meeting, Paris, France, September 30 – October 3, 2007,
Proceedings, volume 4757 of Lecture Notes in Computer
Science, pages 344–353. Springer, 2007.
(doi:10.1007/978-3-540-75416-9_46)
- Rajeev Thakur and William
Gropp.
Open issues in MPI
implementation.
In Lynn Choi, Yunheung Paek, and
Sangyeun Cho, editors, Advances in Computer Systems
Architecture, 12th Asia-Pacific Conference, ACSAC 2007, Seoul, Korea,
August 23-25, 2007, Proceedings, volume 4697 of Lecture Notes in
Computer Science, pages 327–338. Springer, 2007.
(doi:10.1007/978-3-540-74309-5_31)
- Rajeev Thakur and William
Gropp.
Test suite for
evaluating performance of MPI implementations that support
MPI_THREAD_MULTIPLE.
In Franck Cappello, Thomas Hérault, and
Jack J. Dongarra, editors, Recent Advances in Parallel
Virtual Machine and Message Passing Interface, 14th European PVM/MPI User’s
Group Meeting, Paris, France, September 30 – October 3, 2007,
Proceedings, volume 4757 of Lecture Notes in Computer
Science, pages 46–55. Springer, 2007.
(doi:10.1007/978-3-540-75416-9_13)
- Jesper Larsson Träff,
William Gropp, and Rajeev Thakur.
Self-consistent MPI
performance requirements.
In Franck Cappello, Thomas Hérault, and
Jack J. Dongarra, editors, Recent Advances in Parallel
Virtual Machine and Message Passing Interface, 14th European PVM/MPI User’s
Group Meeting, Paris, France, September 30 – October 3, 2007,
Proceedings, volume 4757 of Lecture Notes in Computer
Science, pages 36–45. Springer, 2007.
(doi:10.1007/978-3-540-75416-9_12)
- Surendra Byna, Xian-He Sun,
Rajeev Thakur, and William Gropp.
Automatic memory optimizations
for improving MPI derived datatype performance.
In Bernd Mohr, Jesper Larsson Träff,
Joachim Worringen, and Jack J. Dongarra,
editors, Recent Advances in Parallel Virtual Machine and Message
Passing Interface, 13th European PVM/MPI User’s Group Meeting, Bonn,
Germany, September 17-20, 2006, Proceedings, volume 4192 of
Lecture Notes in Computer Science, pages 238–246. Springer,
2006.
(doi:10.1007/11846802_36)
- Kenin Coloma, Avery Ching,
Alok N. Choudhary, Wei-keng Liao,
Robert B. Ross, Rajeev Thakur, and
Lee Ward.
A new flexible MPI
collective I/O implementation.
In Proceedings of the 2006 IEEE International Conference on Cluster
Computing, September 25-28, 2006, Barcelona, Spain. IEEE Computer
Society, 2006.
(doi:10.1109/CLUSTR.2006.311865)
- William D. Gropp and Rajeev
Thakur.
Issues in developing a
thread-safe MPI implementation.
In Bernd Mohr, Jesper Larsson Träff,
Joachim Worringen, and Jack J. Dongarra,
editors, Recent Advances in Parallel Virtual Machine and Message
Passing Interface, 13th European PVM/MPI User’s Group Meeting, Bonn,
Germany, September 17-20, 2006, Proceedings, volume 4192 of
Lecture Notes in Computer Science, pages 12–21. Springer, 2006.
(doi:10.1007/11846802_11)
- William Gropp, Ewing L. Lusk,
Rajeev Thakur, and Robert B. Ross.
S01 – advanced MPI: I/O
and one-sided communication.
In Proceedings of the ACM/IEEE SC2006 Conference on High Performance
Networking and Computing, November 11-17, 2006, Tampa, FL, USA, page
202. ACM Press, 2006.
(doi:10.1145/1188455.1188666)
- Robert Latham, Robert B.
Ross, and Rajeev Thakur.
Can MPI be used for persistent
parallel services?
In Bernd Mohr, Jesper Larsson Träff,
Joachim Worringen, and Jack J. Dongarra,
editors, Recent Advances in Parallel Virtual Machine and Message
Passing Interface, 13th European PVM/MPI User’s Group Meeting, Bonn,
Germany, September 17-20, 2006, Proceedings, volume 4192 of
Lecture Notes in Computer Science, pages 275–284. Springer,
2006.
(doi:10.1007/11846802_40)
- Jonghyun Lee, Robert B. Ross,
Scott Atchley, Micah Beck, and
Rajeev Thakur.
MPI-IO/L: efficient
remote I/O for MPI-IO via logistical networking.
In 20th International Parallel and Distributed Processing Symposium
(IPDPS 2006), Proceedings, 25-29 April 2006, Rhodes Island, Greece.
IEEE, 2006.
(doi:10.1109/IPDPS.2006.1639305)
- Salman Pervez, Ganesh
Gopalakrishnan, Robert M. Kirby, Rajeev
Thakur, and William D. Gropp.
Formal verification of programs
that use MPI one-sided communication.
In Bernd Mohr, Jesper Larsson Träff,
Joachim Worringen, and Jack J. Dongarra,
editors, Recent Advances in Parallel Virtual Machine and Message
Passing Interface, 13th European PVM/MPI User’s Group Meeting, Bonn,
Germany, September 17-20, 2006, Proceedings, volume 4192 of
Lecture Notes in Computer Science, pages 30–39. Springer, 2006.
(doi:10.1007/11846802_13)
- William D. Gropp and Rajeev
Thakur.
An evaluation of implementation
options for MPI one-sided communication.
In Beniamino Di Martino, Dieter
Kranzlmüller, and Jack J. Dongarra, editors,
Recent Advances in Parallel Virtual Machine and Message Passing
Interface, 12th European PVM/MPI Users’ Group Meeting, Sorrento, Italy,
September 18-21, 2005, Proceedings, volume 3666 of Lecture Notes
in Computer Science, pages 415–424. Springer, 2005.
(doi:10.1007/11557265_53)
- Robert Latham, Robert B.
Ross, Rajeev Thakur, and Brian R. Toonen.
Implementing MPI-IO shared
file pointers without file system support.
In Beniamino Di Martino, Dieter
Kranzlmüller, and Jack J. Dongarra, editors,
Recent Advances in Parallel Virtual Machine and Message Passing
Interface, 12th European PVM/MPI Users’ Group Meeting, Sorrento, Italy,
September 18-21, 2005, Proceedings, volume 3666 of Lecture Notes
in Computer Science, pages 84–93. Springer, 2005.
(doi:10.1007/11557265_15)
- Robert B. Ross, Robert
Latham, William Gropp, Rajeev Thakur, and
Brian R. Toonen.
Implementing MPI-IO
atomic mode without file system support.
In 5th International Symposium on Cluster Computing and the Grid (CCGrid
2005), 9-12 May, 2005, Cardiff, UK, pages 1135–1142. IEEE
Computer Society, 2005.
(doi:10.1109/CCGRID.2005.1558687)
- Rajeev Thakur, Rolf
Rabenseifner, and William Gropp.
Optimization of collective
communication operations in MPICH.
Int. J. High Perform. Comput. Appl., 19(1):49–66, 2005.
(doi:10.1177/1094342005051521)
- Rajeev Thakur, Robert B.
Ross, and Robert Latham.
Implementing byte-range locks
using MPI one-sided communication.
In Beniamino Di Martino, Dieter
Kranzlmüller, and Jack J. Dongarra, editors,
Recent Advances in Parallel Virtual Machine and Message Passing
Interface, 12th European PVM/MPI Users’ Group Meeting, Sorrento, Italy,
September 18-21, 2005, Proceedings, volume 3666 of Lecture Notes
in Computer Science, pages 119–128. Springer, 2005.
(doi:10.1007/11557265_19)
- Weihang Jiang, Jiuxing Liu,
Hyun-Wook Jin, Dhabaleswar K. Panda,
Darius Buntinas, Rajeev Thakur, and
William D. Gropp.
Efficient
implementation of MPI-2 passive one-sided communication on InfiniBand
clusters.
In Dieter Kranzlmüller, Péter Kacsuk,
and Jack J. Dongarra, editors, Recent Advances in
Parallel Virtual Machine and Message Passing Interface, 11th European
PVM/MPI Users’ Group Meeting, Budapest, Hungary, September 19-22, 2004,
Proceedings, volume 3241 of Lecture Notes in Computer
Science, pages 68–76. Springer, 2004.
(doi:10.1007/978-3-540-30218-6_16)
- Weihang Jiang, Jiuxing Liu,
Hyun-Wook Jin, Dhabaleswar K. Panda,
William Gropp, and Rajeev Thakur.
High performance MPI-2
one-sided communication over InfiniBand.
In 4th IEEE/ACM International Symposium on Cluster Computing and the
Grid (CCGrid 2004), April 19-22, 2004, Chicago, Illinois, USA, pages
531–538. IEEE Computer Society, 2004.
(doi:10.1109/CCGRID.2004.1336648)
- Robert Latham, Robert B.
Ross, and Rajeev Thakur.
The impact of file
systems on MPI-IO scalability.
In Dieter Kranzlmüller, Péter Kacsuk,
and Jack J. Dongarra, editors, Recent Advances in
Parallel Virtual Machine and Message Passing Interface, 11th European
PVM/MPI Users’ Group Meeting, Budapest, Hungary, September 19-22, 2004,
Proceedings, volume 3241 of Lecture Notes in Computer
Science, pages 87–96. Springer, 2004.
(doi:10.1007/978-3-540-30218-6_18)
- Jonghyun Lee, Robert B. Ross,
Rajeev Thakur, Xiaosong Ma, and
Marianne Winslett.
RFS: efficient and
flexible remote file access for MPI-IO.
In 2004 IEEE International Conference on Cluster Computing (CLUSTER
2004), September 20-23 2004, San Diego, California, USA, pages
71–81. IEEE Computer Society, 2004.
(doi:10.1109/CLUSTR.2004.1392604)
- Rajeev Thakur, William D.
Gropp, and Brian R. Toonen.
Minimizing
synchronization overhead in the implementation of MPI one-sided
communication.
In Dieter Kranzlmüller, Péter Kacsuk,
and Jack J. Dongarra, editors, Recent Advances in
Parallel Virtual Machine and Message Passing Interface, 11th European
PVM/MPI Users’ Group Meeting, Budapest, Hungary, September 19-22, 2004,
Proceedings, volume 3241 of Lecture Notes in Computer
Science, pages 57–67. Springer, 2004.
(doi:10.1007/978-3-540-30218-6_15)
- Surendra Byna, William D.
Gropp, Xian-He Sun, and Rajeev Thakur.
Improving the performance
of MPI derived datatypes by optimizing memory-access cost.
In 2003 IEEE International Conference on Cluster Computing (CLUSTER
2003), 1-4 December 2003, Kowloon, Hong Kong, China, pages 412–419.
IEEE Computer Society, 2003.
(doi:10.1109/CLUSTR.2003.1253341)
- William D. Gropp, Ewing L.
Lusk, Robert B. Ross, and Rajeev Thakur.
Using
MPI-2: advanced features of the message passing interface.
In 2003 IEEE International Conference on Cluster Computing (CLUSTER
2003), 1-4 December 2003, Kowloon, Hong Kong, China. IEEE Computer
Society, 2003.
(doi:10.1109/CLUSTER.2003.10010)
- Rajeev Thakur and William
Gropp.
Improving the
performance of collective operations in MPICH.
In Jack J. Dongarra, Domenico Laforenza, and
Salvatore Orlando, editors, Recent Advances in Parallel
Virtual Machine and Message Passing Interface, 10th European PVM/MPI Users’
Group Meeting, Venice, Italy, September 29 – October 2, 2003,
Proceedings, volume 2840 of Lecture Notes in Computer
Science, pages 257–267. Springer, 2003.
(doi:10.1007/978-3-540-39924-7_38)
- Rajeev Thakur, William Gropp,
and Ewing L. Lusk.
Optimizing
noncontiguous accesses in MPI-IO.
Parallel Comput., 28(1):83–105, 2002.
(doi:10.1016/S0167-8191(01)00129-6)
- Rajeev Thakur, William Gropp,
and Ewing L. Lusk.
On implementing MPI-IO
portably and with high performance.
In Proceedings of the Sixth Workshop on I/O in Parallel and Distributed
Systems, IOPADS 1999, May 5, 1999, Atlanta, GA, USA, pages 23–32.
ACM, 1999.
(doi:10.1145/301816.301826)
- Rajeev Thakur, William Gropp,
and Ewing L. Lusk.
A case for using MPI’s derived
datatypes to improve I/O performance.
In Proceedings of the ACM/IEEE Conference on Supercomputing, SC 1998,
November 7-13, 1998, Orlando, FL, USA, page 1. IEEE Computer
Society, 1998.
(doi:10.1109/SC.1998.10006)