1 - 6 of 6
  • 1.
    Bhatti, Muhammad Khurram
    et al.
    RISE, Swedish ICT, SICS.
    Oz, Isil
    RISE, Swedish ICT, SICS.
    Popov, Konstantin
    RISE, Swedish ICT, SICS.
    Muddukrishna, Ananya
    KTH Royal Institute of Technology, Sweden.
    Brorsson, Mats
    RISE, Swedish ICT, SICS. KTH Royal Institute of Technology, Sweden.
    Noodle: A heuristic algorithm for task scheduling in MPSoC architectures. 2014. In: Proceedings - 2014 17th Euromicro Conference on Digital System Design, DSD 2014, Institute of Electrical and Electronics Engineers Inc., 2014, p. 667-670, article id 6927309. Conference paper (Refereed)
    Abstract [en]

    Task scheduling is crucial for the performance of parallel applications. Given dependence constraints between tasks, their arbitrary sizes, and the bounded resources available for execution, optimal task scheduling is an NP-hard problem. Therefore, proposed scheduling algorithms are based on heuristics. This paper presents a novel heuristic algorithm, called the Noodle heuristic, which differs from existing list scheduling techniques in the way it assigns task priorities. We conduct an extensive experimental evaluation to validate Noodle on task graphs taken from Standard Task Graph (STG). Results show that Noodle produces schedules that are within 12% of the optimal schedule in the worst case for 2-, 4-, and 8-core systems. We also compare Noodle with existing scheduling heuristics and perform a comparative analysis of its performance.

  • 2.
    Muddukrishna, A.
    et al.
    KTH Royal Institute of Technology, Sweden.
    Jonsson, Peter
    RISE, Swedish ICT, SICS.
    Brorsson, Mats
    RISE, Swedish ICT, SICS. KTH Royal Institute of Technology, Sweden.
    Characterizing task-based OpenMP programs. 2015. In: PLOS ONE, E-ISSN 1932-6203, Vol. 10, no 4, article id e0123545. Article in journal (Refereed)
    Abstract [en]

    Programmers struggle to understand the performance of task-based OpenMP programs since profiling tools only report thread-based performance. Performance tuning also requires task-based performance information in order to balance per-task memory hierarchy utilization against exposed task parallelism. We provide a cost-effective method to extract detailed task-based performance information from OpenMP programs. We demonstrate the utility of our method by quickly diagnosing performance problems and characterizing exposed task parallelism and per-task instruction profiles of benchmarks in the widely used Barcelona OpenMP Tasks Suite. By using our method to characterize task-based performance, programmers can tune performance faster and understand performance tradeoffs more effectively than with existing tools.

  • 3.
    Muddukrishna, A.
    et al.
    KTH Royal Institute of Technology, Sweden.
    Jonsson, Peter
    RISE, Swedish ICT, SICS.
    Brorsson, Mats
    RISE, Swedish ICT, SICS. KTH Royal Institute of Technology, Sweden.
    Locality-aware task scheduling and data distribution for OpenMP programs on NUMA systems and manycore processors. 2015. In: Scientific Programming, ISSN 1058-9244, E-ISSN 1875-919X, Vol. 2015, article id 981759. Article in journal (Refereed)
    Abstract [en]

    Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can now be felt on-chip in manycore processors. Distributing data across NUMA nodes and manycore processor caches is necessary to reduce the impact of nonuniform latencies. However, techniques for distributing data are error-prone and fragile and require low-level architectural knowledge. Existing task scheduling policies favor quick load-balancing at the expense of locality and ignore NUMA node/manycore cache access latencies while scheduling. Locality-aware scheduling, in conjunction with or as a replacement for existing scheduling, is necessary to minimize NUMA effects and sustain performance. We present a data distribution and locality-aware scheduling technique for task-based OpenMP programs executing on NUMA systems and manycore processors. Our technique relieves the programmer from thinking about NUMA system/manycore processor architecture details by delegating data distribution to the runtime system, and it uses task data dependence information to guide the scheduling of OpenMP tasks to reduce data stall times. We demonstrate our technique on a four-socket AMD Opteron machine with eight NUMA nodes and on the TILEPro64 processor, and find that data distribution and locality-aware task scheduling improve performance by up to 69% for scientific benchmarks compared to default policies while still providing an architecture-oblivious approach for programmers.

  • 4.
    Muddukrishna, Ananya
    et al.
    KTH Royal Institute of Technology, Sweden.
    Jonsson, Peter A.
    RISE, Swedish ICT, SICS.
    Vlassov, Vladimir
    KTH Royal Institute of Technology, Sweden.
    Brorsson, Mats
    RISE, Swedish ICT, SICS. KTH Royal Institute of Technology, Sweden.
    Locality-aware task scheduling and data distribution on NUMA systems. 2013. In: Lecture Notes in Computer Science, 2013, Vol. 8122, p. 156-170. Conference paper (Refereed)
    Abstract [en]

    Modern parallel computer systems exhibit Non-Uniform Memory Access (NUMA) behavior. For best performance, any parallel program therefore has to match data allocation and scheduling of computations to the memory architecture of the machine. When done manually, this becomes a tedious process, and since each individual system has its own peculiarities, it also leads to programs that are not performance-portable. We propose the use of a data distribution scheme in which NUMA hardware peculiarities are abstracted away from the programmer and data distribution is delegated to a runtime system which is generated once for each machine. In addition, we propose using task data dependence information, now possible with the OpenMP 4.0RC2 proposal, to guide the scheduling of OpenMP tasks to further reduce data stall times. We demonstrate the viability and performance of our proposals on a four-socket AMD Opteron machine with eight NUMA nodes. We find that both data distribution and locality-aware task scheduling improve performance compared to default policies while still providing an architecture-oblivious approach for the programmer.

  • 5.
    Podobas, Artur
    et al.
    Brorsson, Mats
    RISE - Research Institutes of Sweden (2017-2019), ICT, SICS.
    Faxén, Karl-Filip
    RISE, Swedish ICT, SICS.
    A Comparison of some recent Task-based Parallel Programming Models. 2010. Conference paper (Refereed)
    Abstract [en]

    The need for parallel programming models that are simple to use and at the same time efficient for current and future parallel platforms has led to recent attention to task-based models such as Cilk++, Intel TBB and the task concept in OpenMP version 3.0. The choice of model and implementation can have a major impact on the final performance, and in order to understand some of the trade-offs we have made a quantitative study comparing four implementations of OpenMP (gcc, Intel icc, Sun studio and the research compiler Mercurium/nanos mcc), Cilk++ and Wool, a high-performance task-based library developed at SICS. We use microbenchmarks to characterize costs for task creation and stealing, and the Barcelona OpenMP Tasks Suite for characterizing application performance. Wool and Cilk++ have by far the lowest overhead in both spawning and stealing tasks. This is reflected in application performance when many tasks with small granularity are spawned, where Cilk++ and, in particular, Wool have the highest performance. For coarse-granularity applications, the OpenMP implementations have performance quite similar to the more lightweight Cilk++ and Wool, except for one application where mcc is superior thanks to a better task scheduler. The OpenMP implementations are generally not yet ready for use when the task granularity becomes very small. There is no inherent reason for this, so we expect future implementations of OpenMP to focus on this issue.

  • 6.
    Varisteas, Georgios
    et al.
    KTH Royal Institute of Technology, Sweden.
    Brorsson, Mats
    RISE, Swedish ICT, SICS. KTH Royal Institute of Technology, Sweden.
    Palirria: accurate on-line parallelism estimation for adaptive work-stealing. 2016. In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 28, no 2, p. 472-491. Article in journal (Refereed)
    Abstract [en]

    We present Palirria, a self-adapting work-stealing scheduling method for nested fork/join parallelism that can be used to estimate the number of utilizable workers and self-adapt accordingly. The estimation mechanism is optimized for accuracy, minimizing the requested resources without degrading performance. We implemented Palirria for both the Linux and Barrelfish operating systems and evaluated it on two platforms: a 48-core Non-Uniform Memory Access (NUMA) multiprocessor and a simulated 32-core system. Compared with the state of the art, we observed higher accuracy in estimating resource requirements. This leads to improved resource utilization and performance on par with or better than executing with fixed resource allotments.
