Development of a hybrid MPI-thread programming system called PCU.
Inter-thread message passing, including non-blocking collectives.
A novel, scalable termination detection technique for communication rounds.
Demonstrated hybrid parallel scalability to 16K cores on an IBM Blue Gene/Q.