TU Berlin

KBS Publications

Communication-aware process and thread mapping using online communication detection
Citation key Diener:2015:CDSM
Authors Matthias Diener, Eduardo H.M. Cruz, Philippe O.A. Navaux, Anselm Busse, and Hans-Ulrich Heiß
Pages 43–63
Year 2015
DOI 10.1016/j.parco.2015.01.005
Journal Parallel Computing
Volume 43
Month March
Abstract The rising complexity of memory hierarchies and interconnections in parallel shared memory architectures leads to differences in communication performance. These differences can be exploited to perform a communication-aware mapping of parallel applications to the hardware topology, improving their performance and energy efficiency. To perform the mapping, it is necessary to determine the communication behavior of the processes and threads of the application. Previous methods rely on static communication traces to detect communication, require hardware changes, or support only a subset of parallelization models. We propose CDSM, Communication Detection in Shared Memory, a mechanism that detects communication from page faults and uses this information to perform the mapping. CDSM works on the operating system level during the execution of the parallel application and supports all parallelization models that use shared memory for communication. It does not require modifications to the applications, previous knowledge about their behavior, or changes to the hardware and runtime libraries. Experiments with the MPI, MPI+OpenMP, and OpenMP implementations of the NAS parallel benchmarks, the HPCC benchmark, and the PARSEC benchmark suite on a shared memory machine show that CDSM has a high detection accuracy with a negligible overhead. Execution time and processor energy consumption were reduced by up to 35.9% and 18.9%, respectively (10.2% and 7.3% on average). Experiments on a cluster system, where CDSM optimizes the communication within each node, showed an average execution time reduction of 10.4%.