Locality-aware task scheduling and data distribution for OpenMP programs on NUMA systems and manycore processors
KTH Royal Institute of Technology, Sweden.
RISE, Swedish ICT, SICS.
RISE, Swedish ICT, SICS. KTH Royal Institute of Technology, Sweden. ORCID iD: 0000-0002-9637-2065
2015 (English). In: Scientific Programming, ISSN 1058-9244, E-ISSN 1875-919X, Vol. 2015, article id 981759. Article in journal (Refereed). Published.
Abstract [en]

Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can now be felt on-chip in manycore processors. Distributing data across NUMA nodes and manycore processor caches is necessary to reduce the impact of nonuniform latencies. However, techniques for distributing data are error-prone and fragile and require low-level architectural knowledge. Existing task scheduling policies favor quick load-balancing at the expense of locality and ignore NUMA node/manycore cache access latencies while scheduling. Locality-aware scheduling, in conjunction with or as a replacement for existing scheduling, is necessary to minimize NUMA effects and sustain performance. We present a data distribution and locality-aware scheduling technique for task-based OpenMP programs executing on NUMA systems and manycore processors. Our technique relieves the programmer of reasoning about NUMA system/manycore processor architecture details by delegating data distribution to the runtime system, and uses task data dependence information to guide the scheduling of OpenMP tasks to reduce data stall times. We demonstrate our technique on a four-socket AMD Opteron machine with eight NUMA nodes and on the TILEPro64 processor, and find that data distribution and locality-aware task scheduling improve performance by up to 69% for scientific benchmarks compared to default policies, while providing an architecture-oblivious approach for programmers.

Place, publisher, year, edition, pages
IOS Press, 2015. Vol. 2015, article id 981759.
Keywords [en]
Application programming interfaces (API), Architectural design, Benchmarking, Computer architecture, Multiprocessing systems, Network management, Scheduling, Scheduling algorithms, Software architecture, Architectural knowledge, Data distribution, Improve performance, Many-core processors, Non uniform data, Performance degradation, Processor architectures, Scheduling techniques, Multitasking
National Category
Natural Sciences
Identifiers
URN: urn:nbn:se:ri:diva-41881
DOI: 10.1155/2015/981759
Scopus ID: 2-s2.0-84947272497
OAI: oai:DiVA.org:ri-41881
DiVA, id: diva2:1377782
Available from: 2019-12-12. Created: 2019-12-12. Last updated: 2019-12-12. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records BETA

Brorsson, Mats

Search in DiVA

By author/editor
Brorsson, Mats
By organisation
SICS
In the same journal
Scientific Programming
Natural Sciences

Search outside of DiVA

Google
Google Scholar
