OSU Micro Benchmarks v5.9 (03/02/2022) * New Features & Enhancements - Add support for data validation in mpi pt2pt benchmarks * Enabled with the `-c` option - Add support for data validation in mpi collective benchmarks * Enabled with the `-c` option - Add support for NCCL collective benchmarks * osu_nccl_alltoall * Bug Fixes (since v5.8) - Remove osu_latency_mp & osu_latency_mt from list of benchmarks that have GPU support * Thanks to Carl Ponder @NVIDIA for reporting the issue. - Ignore the `--accelerator` option when using pt2pt benchmarks with GPU enabled builds. * Thanks to Adamn Goldman @Intel for the report OSU Micro Benchmarks v5.8 (08/12/2021) * New Features & Enhancements - Add support for NCCL pt2pt benchmarks * osu_nccl_bibw * osu_nccl_bw * osu_nccl_latency - Add support for NCCL collective benchmarks * osu_nccl_allgather * osu_nccl_allreduce * osu_nccl_bcast * osu_nccl_reduce * osu_nccl_reduce_scatter - Add data validation support for * osu_allreduce * osu_nccl_allreduce * osu_reduce * osu_nccl_reduce * osu_alltoall * Bug Fixes (since v5.7.1) - Fix bug in support for CUDA managed memory benchmarks - Thanks to Adam Goldman @Intel for the report and the initial patch - Protect managed memory functionality with appropriate compile time flag OSU Micro Benchmarks v5.7.1 (05/11/2021) * New Features & Enhancements - Add support to send and receive data from different buffers for osu_latency, osu_bw, osu_bibw, and osu_mbw_mr - Enhance support for CUDA managed memory benchmarks - Thanks to Ian Karlin and Nathan Hanford @LLNL for the feedback - Add support to print minimum and maximum communication times for non-blocking benchmarks * Bug Fixes (since v5.7) - Update README file with updated description for osu_latency_mp - Thanks to Honggang Li @RedHat for the suggestion - Fix error in setting benchmark name in osu_allgatherv.c and osu_allgatherv.c - Thanks to Brandon Cook @LBL for the report OSU Micro Benchmarks v5.7 (12/11/2020) * New Features & Enhancements - Add support to OMB to evaluate the performance of various primitives with AMD GPU device and ROCm support - This functionality is exposed when configured with --enable-rocm option - Thanks to AMD for the initial patch * Bug Fixes (since v5.6.3) - Enhance one-sided window creation and fix a potential issue when using MPI_Win_allocate * Thanks to Bert Wesarg and George Katevenis for reporting the issue and providing initial patch - Remove additional '-M' option that gets printed with help message for osu_latency_mt and osu_latency_mp * Thanks to Nick Papior for the report - Added missing '-W' option support for one-sided bandwidth tests OSU Micro Benchmarks v5.6.3 (06/01/2020) * New Features & Enhancements - Add support for benchmarking applications that use 'fork' system call * osu_latency_mp * Bug Fixes (since v5.6.2) - Allow passing flags to nvcc compiler - Fix compilation issue with IBM XLC++ compilers and CUDA 10.2 - Fix issues in window creation with host-to-device and device-to-host transfers for one-sided tests OSU Micro Benchmarks v5.6.2 (08/08/2019) * New Features & Enhancements - Add support for benchmarking GPU-Aware multi-threaded point-to-point operations * osu_latency_mt * Bug Fixes (since v5.6.1) - Fix issue with freeing in osu_get_bw benchmark - Fix issues with out of tree builds - Thanks to Joseph Schuchart@HLRS for reporting the issue - Fix incorrect header in osu_mbw_mr benchmark - Fix memory alignment for non-heap allocations in openshmem message rate benchmarks - Thanks to Yossi@Mellanox for pointing out the issue OSU Micro Benchmarks v5.6.1 (03/15/2019) * Bug Fixes (since v5.6) - Fix issue with latency computation in osu_latency_mt benchmark. OSU Micro Benchmarks v5.6 (03/01/2019) * New Features & Enhancements - Add support for benchmarking GPU-Aware multi-pair point-to-point operations * osu_mbw_mr * osu_multi_lat - Add support to specify different number of sender and receiver threads for osu_latency_mt benchmark * Bug Fixes - Remove -t option for blocking collectives - Fix issue when only building OpenSHMEM benchmarks - Improve error reporting for one-sided benchmarks - Fix compilation issues with PGI compilers for GPU-Aware benchmarks - Fix incorrect handling of configure options - Thanks to Tony Curtis @StonyBrook for the report - Fix compilation warnings OSU Micro Benchmarks v5.5 (11/10/18) * New Features & Enhancements - Introduce new MPI non-blocking collective benchmarks with support to measure overlap of computation and communication for CPUs and GPUs - osu_ireduce - osu_iallreduce OSU Micro Benchmarks v5.4.4 (09/21/18) * Bug Fixes - Deprecate the need to specify get_local_rank when running GPU-based benchmarks built with MVAPICH2 and launched with mpirun_rsh - Enhance error checking for GPU-based benchmarks - Fix issues in GPU-based OpenACC benchmarks - Fix race condition with OpenSHMEM non-blocking and overlap benchmarks OSU Micro Benchmarks v5.4.3 (07/23/18) * Bug Fixes - Fix buffer overflow in osu_reduce_scatter - Thanks to Matias A Cabral @Intel for reporting the issue and patch - Thanks to Gilles Gouaillardet for creating the patch - Fix buffer overflow in one sided tests - Thanks to John Byrne @HPE for reporting this issue - Fix buffer overflow in multi threaded latency test - Fix issues with freeing buffers for one-sided tests - Fix issues with freeing buffers for CUDA-enabled tests - Fix warning messages for benchmarks that do not support CUDA and/or Managed memory - Thanks to Carl Ponder@NVIDIA for reporting this issue - Fix compilation warnings OSU Micro Benchmarks v5.4.2 (04/30/18) * New Features & Enhancements - Add "-W --window-size" option to osu_bw and osu_bibw * Bug Fixes - Fix issues with out of tree builds - Thanks to Adam Moody @LLNL for reporting the issue - Fix PGI and XLC builds by using the correct generated cpp files - Fix crash with osu_mbw_mr for large messages - Fix minor error in Makefile - Fix compilation warnings OSU Micro Benchmarks v5.4.1 (02/19/18) * New Features & Enhancements - Enhanced help messages and runtime parameters * Bug Fixes - Fix compile and runtime issues in PGAS benchmarks (OpenSHMEM, UPC, and UPC++) exposed by PGI compiler - Added warning message to display memory limitation when running benchmarks with very large messages - Fix memory leaks for device buffers - Fix issues with type overflows - Fix an issue with pWork symmetric heap allocation in oshm_reduce benchmark - Thanks to Naveen Ravichandrasekaran@Cray for the report OSU Micro Benchmarks v5.4.0 (10/30/17) * New Features & Enhancements - Introduce new OpenSHMEM Non-blocking Benchmarks * osu_oshm_get_mr_nb * osu_oshm_get_nb * osu_oshm_put_mr_nb * osu_oshm_put_nb * osu_oshm_put_overlap - Automatically build OpenSHMEM 1.3 benchmarks when library support is detected - Add ability to specify min and max message size for point-to-point and one-sided benchmarks - Enhanced error handling for MPI benchmarks - Code clean-ups and unification of utility functions across benchmarks - Enhanced help messages and runtime parameters * Bug Fixes - Fix compile-time warnings - Fix peer calculation formula in UPC/UPC++ benchmarks - Fix correct number of warmup iterations in osu_barrier benchmark OSU Micro Benchmarks v5.3.2 (09/08/16) * New Features & Enhancements - Allow specifying very large message sizes (>2GB) for collective benchmarks * Bug Fixes - Fix compilation errors due to missing type casting OSU Micro Benchmarks v5.3.1 (08/08/16) * New Features & Enhancements - Add option to control whether CUDA kernels are built - Add runtime option to specify number of threads for osu_latency_mt * Bug Fixes - Check if -lrt or -lpthread is needed - Fix compilation warnings - Fix non-blocking collective memory leak - Correct documentation for osu_multi_lat OSU Micro Benchmarks v5.3 (03/25/16) * New Features & Enhancements - Introduce new UPC++ Benchmarks * osu_upcxx_allgather * osu_upcxx_alltoall * osu_upcxx_async_copy_get * osu_upcxx_async_copy_put * osu_upcxx_bcast * osu_upcxx_gather * osu_upcxx_reduce * osu_upcxx_scatter * Bug Fixes - Determine page size at runtime in OpenSHMEM benchmarks (fixes issue seen on OpenPower machines) OSU Micro Benchmarks v5.2 (02/05/16) * New Features & Enhancements - Support for CUDA-Aware Managed memory * osu_bibw * osu_bw * osu_latency * osu_allgather * osu_allgatherv * osu_allreduce * osu_alltoall * osu_alltoallv * osu_bcast * osu_gather * osu_gatherv * osu_reduce * osu_reduce_scatter * osu_scatter * osu_scatterv - Add ability to specify minimum message size in addition to maximum message size for all collective benchmarks OSU Micro Benchmarks v5.1 (11/10/15) * New Features & Enhancements - Introduce non-blocking collective v-variants as well as ialltoallw * osu_iallgatherv * osu_ialltoallv * osu_igatherv * osu_iscatterv * osu_ialltoallw - Add support for benchmarking GPU-Aware non-blocking collectives. Overlap can be computed using either CPU or GPU kernels * osu_iallgather * osu_iallgatherv * osu_ialltoall * osu_ialltoallv * osu_ialltoallw * osu_ibcast * osu_igather * osu_igatherv * osu_iscatter * osu_iscatterv - Allow users the ability to specify zero warmup iterations * Bug Fixes - fix openacc pragma OSU Micro Benchmarks v5.0 (08/17/15) * New Features & Enhancements - Support for a set of non-blocking collectives. The benchmarks can display both the amount of time spent in the collectives and the amount of overlap achievable * osu_iallgather * osu_ialltoall * osu_ibarrier * osu_ibcast * osu_igather * osu_iscatter - Add startup benchmarks to facilitate the ability to measure the amount of time it takes for an MPI library to complete MPI_Init * osu_init * osu_hello - Allocate and align data dynamically - Thanks to Devendar Bureddy from Mellanox for the suggestion - Add options for number of warmup iterations [-x] and number of iterations used per message size [-i] to MPI benchmarks - Thanks to Devendar Bureddy from Mellanox for the suggestion * Bug Fixes - Do not truncate user specified max memory limits - Thanks to Devendar Bureddy from Mellanox for the report and patch OSU Micro Benchmarks v4.4.1 (10/30/14) * Bug Fixes - adding missing MPI3 guard for WIN_ALLOCATE - capture getopt return value in an int instead of char OSU Micro Benchmarks v4.4 (8/23/14) * New Features & Enhancements - Support for MPI-3 RMA (one-sided) and atomic operations using GPU buffers * osu_acc_latency * osu_cas_latency * osu_fop_latency * osu_get_bw * osu_get_latency * osu_put_bibw * osu_put_bw * osu_put_latency * Bug Fixes - remove use of AC_FUNC_MALLOC to avoid undefined rpl_malloc reference - add missing upc benchmarks for make dist rule OSU Micro Benchmarks v4.3.1 (6/20/14) * Bug Fixes - Fix typo in MPI collective benchmark help message - Explicitly mention that -m and -M parameters are specified in bytes OSU Micro Benchmarks v4.3 (3/24/14) * New Features & Enhancements - This new suite includes several new (or updated) benchmarks to measure performance of MPI-3 RMA communication operations with options to select different window creation (WIN_CREATE, WIN_DYNAMIC, and WIN_ALLOCATE) and synchronization functions (LOCK, PSCW, FENCE, FLUSH, FLUSH_LOCAL, and LOCK_ALL) in each benchmark * osu_acc_latency * osu_cas_latency * osu_fop_latency * osu_get_acc_latency * osu_get_bw * osu_get_latency * osu_put_bibw * osu_put_bw * osu_put_latency - New UPC Collective Benchmarks * osu_upc_all_barrier * osu_upc_all_broadcast * osu_upc_all_exchange * osu_upc_all_gather * osu_upc_all_gather_all * osu_upc_all_reduce * osu_upc_all_scatter - Build MPI3 benchmarks when MPI library support is detected * Bug Fixes - Add shmem_quiet() in OpenSHMEM Message Rate benchmark to ensure all previously issued operations are completed - Allocate pWrk from symmetric heap in OpenSHMEM Reduce benchmark OSU Micro Benchmarks v4.2 (11/08/13) * New Features & Enhancements - New OpenSHMEM benchmarks * osu_oshm_fcollect - Enable handling of GPU device buffers in all MPI collective benchmarks - Add device binding for OpenACC benchmarks * Bug Fixes - Add upc_fence after memput in osu_upc_memput benchmark - Correct CUDA configuration example in README - Fix several warnings OSU Micro Benchmarks v4.1 (8/24/13) * New Features & Enhancements - New OpenSHMEM benchmarks * osu_oshm_barrier * osu_oshm_broadcast * osu_oshm_collect * osu_oshm_reduce - New MPI-3 RMA Atomics benchmarks * osu_cas_flush * osu_fop_flush OSU Micro Benchmarks v4.0.1 (5/06/13) * Bug Fixes - Fix several warnings OSU Micro Benchmarks v4.0 (4/16/13) * New Features & Enhancements - Support buffer allocation using OpenACC and CUDA in osu_alltoall, osu_gather, and osu_scatter benchmarks - Limit amount of memory allocated by collective benchmarks dynamically based on number of processes - Memory limit can also be explicitly set by the user through the -m option - Support for 64-bit atomic operations in osu_oshm_atomics * Bug Fixes - Fix numerical overflow error with reporting bandwidth in osu_mbw_mr OSU Micro Benchmarks v3.9 (2/28/13) * New Features & Enhancements - Support buffer allocation using OpenACC in GPU benchmarks - Use average time instead of max time for calculating the bandwidth and message rate in osu_mbw_mr - Thanks to Alex Mikheev from Mellanox for the patch * Bug Fixes - Properly initialize host buffers for DH and HD transfers in GPU benchmarks OSU Micro Benchmarks v3.8 (11/07/12) * New Features & Enhancements - New UPC benchmarks * osu_upc_memput * osu_upc_memget OSU Micro Benchmarks v3.7 (9/07/12) * New Features & Enhancements - New OpenSHMEM benchmarks * osu_oshm_get * osu_oshm_put_mr * osu_oshm_atomics * osu_oshm_put - Organize installation directory according to benchmark type * Bug Fixes - Destroy cuda context before exiting OSU Micro Benchmarks v3.6 (4/30/12) * New Features & Enhancements - New collective benchmarks * osu_allgather * osu_allgatherv * osu_allreduce * osu_alltoall * osu_alltoallv * osu_barrier * osu_bcast * osu_gather * osu_gatherv * osu_reduce * osu_reduce_scatter * osu_scatter * osu_scatterv * Bug Fixes - Fix GPU binding issue when running with HH mode OSU Micro Benchmarks v3.5.2 (3/22/12) * Bug Fixes - Fix typo which led to use of incorrect buffers OSU Micro Benchmarks v3.5.1 (2/02/12) * New Features & Enhancements - Provide script to set GPU affinity for MPI processes * Bug Fixes - Removed GPU binding after MPI_Init to avoid switching context OSU Micro Benchmarks v3.5 (11/09/11) * New Features & Enhancements - Extension of osu_latency, osu_bw, and osu_bibw benchmarks to evaluate the performance of MPI_Send/MPI_Recv operation with NVIDIA GPU device and CUDA support - This functionality is exposed when configured with --enable-cuda option - Flexibility for using buffers in NVIDIA GPU device (D) and host memory (H) - Flexibility for selecting data movement between D->D, D->H and H->D OSU Micro Benchmarks v3.4 (09/13/11) * New Features & Enhancements - Add passive one-sided communication benchmarks - Update one-sided communication benchmarks to provide shared memory hint in MPI_Alloc_mem calls - Update one-sided communication benchmarks to use MPI_Alloc_mem for buffer allocation - Give default values to configure definitions (can now build directly with mpicc) - Update latency benchmarks to begin from 0 byte message * Bug Fixes - Remove memory leaks in one-sided communication benchmarks - Update benchmarks to touch buffers before using them for communication - Fix osu_get_bw test to use different buffers for concurrent communication operations - Fix compilation warnings