Publications (10 of 14)
Oz, I., Bhatti, M. K. K., Popov, K. & Brorsson, M. (2019). Regression-Based Prediction for Task-Based Program Performance. Journal of Circuits, Systems and Computers, 8(4), Article ID 1950060.
Regression-Based Prediction for Task-Based Program Performance
2019 (English). In: Journal of Circuits, Systems and Computers, ISSN 0218-1266, Vol. 8, no. 4, article id 1950060. Article in journal (Refereed). Published
Abstract [en]

As multicore systems evolve by increasing the number of parallel execution units, parallel programming models have been released to exploit parallelism in applications. The task-based programming model uses task abstractions to specify parallel tasks and schedules tasks onto processors at runtime. To achieve the highest efficiency and performance, it is necessary to identify which runtime configuration is needed and how processor cores should be shared among tasks. Exploring the design space of all possible scheduling and runtime options, especially for large input data, becomes infeasible and calls for statistical modeling. Regression-based modeling determines the effects of multiple factors on a response variable and makes predictions based on statistical analysis. In this work, we propose a regression-based modeling approach to predict task-based program performance for different scheduling parameters with variable data size. We execute a set of task-based programs while varying the runtime parameters, and conduct a systematic measurement of the factors influencing execution time. Our approach uses executions with different configurations for a set of input data and derives regression models to predict execution time for larger input data. Our results show that the regression models provide accurate predictions for validation inputs, with a mean error rate as low as 6.3%, and 14% on average, across four task-based programs.
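The core idea in the abstract above, fitting a regression on measured runs at small configurations and extrapolating to a larger input, can be sketched as follows. This is an illustrative stand-in rather than the paper's models: the training data, the single predictor (input size), and the plain least-squares fit are all hypothetical.

```python
# Hypothetical sketch: fit ordinary least squares to execution times
# measured at small input sizes, then predict a larger, unseen size.
# All numbers are illustrative, not taken from the paper.

def fit_linear(xs, ys):
    """Least-squares fit of y = a*x + b on a single predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Training runs: (input size, measured execution time in seconds).
runs = [(1000, 2.1), (2000, 4.0), (4000, 8.2), (8000, 16.1)]
a, b = fit_linear([r[0] for r in runs], [r[1] for r in runs])

# Extrapolate to a larger input size, as the paper does for
# validation inputs.
predicted = a * 16000 + b
```

In the paper's setting the model would have several factors (scheduler, core count, data size); a single predictor keeps the sketch readable.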

Keywords
Performance prediction, regression, task-based programs, Computer systems programming, Forecasting, Input output programs, Parallel processing systems, Parallel programming, Regression analysis, Scheduling, Parallel programming model, Regression-based model, Run-time configuration, Scheduling parameters, Task-based, Task-based programming, Multicore programming
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-34593 (URN)
10.1142/S0218126619500609 (DOI)
2-s2.0-85049081368 (Scopus ID)
Available from: 2018-08-14 Created: 2018-08-14 Last updated: 2023-05-25. Bibliographically approved
Bhatti, M. K., Oz, I., Amin, S., Mushtaq, M., Farooq, U., Popov, K. & Brorsson, M. (2018). Locality-aware task scheduling for homogeneous parallel computing systems. Computing, 100(6), 557-595
Locality-aware task scheduling for homogeneous parallel computing systems
2018 (English). In: Computing, ISSN 0010-485X, E-ISSN 1436-5057, Vol. 100, no. 6, p. 557-595. Article in journal (Refereed). Published
Abstract [en]

In systems with a complex many-core cache hierarchy, exploiting data locality can significantly reduce the execution time and energy consumption of parallel applications. Locality can be exploited at various hardware and software layers. For instance, by implementing private and shared caches in a multi-level fashion, recent hardware designs are already optimised for locality. However, these optimisations are wasted if software scheduling does not cast the execution in a manner that exploits the locality available in the programs themselves. Since programs for parallel systems consist of tasks executed simultaneously, task scheduling becomes crucial for performance on multi-level cache architectures. This paper presents a heuristic algorithm for homogeneous multi-core systems called locality-aware task scheduling (LeTS). The LeTS heuristic is a work-conserving algorithm that takes both locality and load balancing into account in order to reduce the execution time of target applications. LeTS operates in two distinct phases: the working task group formation phase (WTG-FP) and the working task group ordering phase (WTG-OP). The WTG-FP forms groups of tasks in order to capture data reuse across tasks, while the WTG-OP determines an optimal order of execution for task groups, minimizing the reuse distance of shared data between tasks. We have performed experiments using randomly generated task graphs, varying three major performance parameters: (1) communication-to-computation ratio (CCR) between 0.1 and 1.0, (2) application size, i.e., task graphs comprising 50, 100, and 300 tasks per graph, and (3) number of cores, with 2-, 4-, 8-, and 16-core execution scenarios. We have also performed experiments using selected real-world applications. The LeTS heuristic reduces the overall execution time of applications by exploiting inter-task data locality. Results show that LeTS outperforms state-of-the-art algorithms in amortizing inter-task communication cost.
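The two-phase structure described above can be sketched on a toy task set. The greedy single-pass grouping and the overlap-based ordering below are simplified stand-ins for the published WTG-FP and WTG-OP rules, not the LeTS heuristic itself; the tasks and data blocks are hypothetical.

```python
# Toy tasks, each naming the data blocks it touches.
tasks = {
    "t1": {"A"}, "t2": {"A", "B"}, "t3": {"B"},
    "t4": {"C"}, "t5": {"C", "D"},
}

# Phase 1 (in the spirit of WTG-FP): greedily group tasks that share at
# least one data block, so data reuse stays inside a group.
groups = []
for name, data in tasks.items():
    for g in groups:
        if data & g["data"]:
            g["tasks"].append(name)
            g["data"] |= data
            break
    else:
        groups.append({"tasks": [name], "data": set(data)})

# Phase 2 (in the spirit of WTG-OP): greedily order groups so that
# successive groups share as much data as possible, a stand-in for
# minimising the reuse distance of shared data between task groups.
order = [groups.pop(0)]
while groups:
    last = order[-1]["data"]
    nxt = max(range(len(groups)),
              key=lambda i: len(groups[i]["data"] & last))
    order.append(groups.pop(nxt))

schedule = [t for g in order for t in g["tasks"]]
```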

Keywords
Directed acyclic graph (DAG), Embedded systems, Homogeneous systems, Multicore scheduling, Parallel computing, Runtime resource management, Directed graphs, Distributed computer systems, Energy utilization, Hardware, Heuristic algorithms, Multitasking, Optimization, Parallel processing systems, Scheduling algorithms, Homogeneous system, Multi core, Multi-level cache architecture, Parallel computing system, Performance parameters, Resource management, State-of-the-art algorithms, Scheduling
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-33981 (URN)
10.1007/s00607-017-0581-6 (DOI)
2-s2.0-85032798462 (Scopus ID)
Available from: 2018-07-03 Created: 2018-07-03 Last updated: 2023-05-25. Bibliographically approved
Bhatti, M. K., Oz, I., Popov, K., Brorsson, M. & Farooq, U. (2016). Scheduling of Parallel Tasks with Proportionate Priorities. Arabian Journal for Science and Engineering, 41(8), 3279-3295
Scheduling of Parallel Tasks with Proportionate Priorities
2016 (English). In: Arabian Journal for Science and Engineering, ISSN 2193-567X, Vol. 41, no. 8, p. 3279-3295. Article in journal (Refereed). Published
Abstract [en]

Parallel computing systems promise higher performance for computationally intensive applications. Since programs for parallel systems consist of tasks that can be executed simultaneously, task scheduling becomes crucial for the performance of these applications. Given dependence constraints between tasks, their arbitrary sizes, and the bounded resources available for execution, optimal task scheduling is considered an NP-hard problem; proposed scheduling algorithms are therefore based on heuristics. This paper presents a novel list scheduling heuristic, called the Noodle heuristic. Noodle is a simple yet effective scheduling heuristic that differs from existing list scheduling techniques in the way it assigns task priorities: its priority mechanism maintains proportionate fairness among all ready tasks belonging to all paths within a task graph. We conduct an extensive experimental evaluation of the Noodle heuristic with task graphs taken from the Standard Task Graph set. Our study includes results for task graphs comprising 50, 100, and 300 tasks per graph, and execution scenarios with 2-, 4-, 8-, and 16-core systems. We report the average Schedule Length Ratio (SLR) obtained by varying the communication-to-computation cost ratio, and we also analyse results for different degrees of parallelism and numbers of edges in the task graphs. Our results demonstrate that Noodle produces schedules that are within at most 12% (worst case) of the optimal schedule for 2-, 4-, and 8-core systems. We also compare Noodle with existing scheduling heuristics and perform a comparative analysis of its performance; Noodle outperforms the existing heuristics on average SLR.
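The family of techniques Noodle belongs to, list scheduling over a task DAG, can be sketched as follows. The paper's proportionate-priority formula is not reproduced here; as a hedged stand-in, ready tasks are prioritised by their bottom level (the longest remaining path to a sink), and the toy DAG and task costs are hypothetical.

```python
# Toy task DAG: edges map each task to its successors; cost is the
# (hypothetical) computation cost of each task.
edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
cost = {"a": 2, "b": 3, "c": 1, "d": 2}

def bottom_level(t, memo={}):
    """Cost of the longest path from t to a sink, inclusive of t."""
    if t not in memo:
        memo[t] = cost[t] + max(
            (bottom_level(s) for s in edges[t]), default=0)
    return memo[t]

# Generic list scheduling: repeatedly pick the ready task (all
# predecessors done) with the highest priority.
indeg = {t: 0 for t in edges}
for succs in edges.values():
    for s in succs:
        indeg[s] += 1

ready = [t for t in edges if indeg[t] == 0]
schedule = []
while ready:
    ready.sort(key=bottom_level, reverse=True)
    t = ready.pop(0)
    schedule.append(t)
    for s in edges[t]:
        indeg[s] -= 1
        if indeg[s] == 0:
            ready.append(s)
```

Noodle's contribution lies precisely in replacing the priority key above with one that keeps priorities proportionate across all paths of the graph.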

Place, publisher, year, edition, pages
Springer Verlag, 2016
Keywords
Directed acyclic graph (DAG), List scheduling, Multicore, Multiprocessor, Parallel computing, Static task scheduling
National Category
Natural Sciences
Identifiers
urn:nbn:se:ri:diva-41860 (URN)
10.1007/s13369-016-2180-9 (DOI)
2-s2.0-84976530428 (Scopus ID)
Available from: 2019-12-12 Created: 2019-12-12 Last updated: 2023-05-25. Bibliographically approved
Bhatti, M. K., Oz, I., Popov, K., Muddukrishna, A. & Brorsson, M. (2014). Noodle: A heuristic algorithm for task scheduling in MPSoC architectures. In: Proceedings - 2014 17th Euromicro Conference on Digital System Design, DSD 2014. Paper presented at 17th Euromicro Conference on Digital System Design, DSD 2014, 27 August 2014 through 29 August 2014 (pp. 667-670). Institute of Electrical and Electronics Engineers Inc., Article ID 6927309.
Noodle: A heuristic algorithm for task scheduling in MPSoC architectures
2014 (English). In: Proceedings - 2014 17th Euromicro Conference on Digital System Design, DSD 2014, Institute of Electrical and Electronics Engineers Inc., 2014, p. 667-670, article id 6927309. Conference paper, Published paper (Refereed)
Abstract [en]

Task scheduling is crucial for the performance of parallel applications. Given dependence constraints between tasks, their arbitrary sizes, and the bounded resources available for execution, optimal task scheduling is considered an NP-hard problem; proposed scheduling algorithms are therefore based on heuristics. This paper presents a novel heuristic algorithm, called the Noodle heuristic, which differs from existing list scheduling techniques in the way it assigns task priorities. We conduct an extensive experimental evaluation to validate Noodle on task graphs taken from the Standard Task Graph (STG) set. Results show that Noodle produces schedules that are within at most 12% (worst case) of the optimal schedule for 2-, 4-, and 8-core systems. We also compare Noodle with existing scheduling heuristics and perform a comparative analysis of its performance.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2014
Keywords
Directed Acyclic Graph (DAG), List Scheduling, Multiprocessor System-on-Chip(MPSoC), Parallel Computing, Algorithms, Application specific integrated circuits, Computational complexity, Directed graphs, Heuristic algorithms, Microprocessor chips, Multiprocessing systems, Multitasking, Optimization, Parallel processing systems, Scheduling, System-on-chip, Comparative analysis, List-scheduling, MPSoC architectures, Multiprocessor system on chips, Optimal schedule, Parallel application, Scheduling heuristics, Scheduling algorithms
National Category
Engineering and Technology
Identifiers
urn:nbn:se:ri:diva-46489 (URN)
10.1109/DSD.2014.71 (DOI)
2-s2.0-84928812217 (Scopus ID)
9781479957934 (ISBN)
Conference
17th Euromicro Conference on Digital System Design, DSD 2014, 27 August 2014 through 29 August 2014
Available from: 2020-08-24 Created: 2020-08-24 Last updated: 2023-05-25. Bibliographically approved
Al-Shishtawy, A., Asif Fayyaz, M., Popov, K. & Vlassov, V. (2010). Achieving Robust Self-Management for Large-Scale Distributed Applications (7ed.). Kista, Sweden
Achieving Robust Self-Management for Large-Scale Distributed Applications
2010 (English). Report (Other academic)
Abstract [en]

Autonomic managers are the main architectural building blocks for constructing self-management capabilities of computing systems and applications. One of the major challenges in developing self-managing applications is the robustness of the management elements that form autonomic managers. We believe that transparent handling of the effects of resource churn (joins/leaves/failures) on management should be an essential feature of a platform for self-managing large-scale dynamic distributed applications, because it facilitates the development of robust autonomic managers and hence improves the robustness of self-managing applications. This feature can be achieved by providing a robust management element abstraction that hides churn from the programmer. In this paper, we present a generic approach to achieving robust services that is based on finite state machine replication with dynamic reconfiguration of replica sets. We contribute a decentralized algorithm that maintains the set of nodes hosting service replicas in the presence of churn. We use this approach to implement robust management elements as robust services that can operate despite churn. Our proposed decentralized algorithm uses peer-to-peer replica placement schemes to automate replicated state machine migration in order to tolerate churn. The algorithm exploits the lookup and failure detection facilities of a structured overlay network to manage the set of active replicas. Using the proposed approach, we can achieve a long-running and highly available service, without human intervention, in the presence of resource churn. To validate and evaluate our approach, we have implemented a prototype that includes the proposed algorithm.
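The replica-set maintenance idea above, keeping a fixed number of replicas on the overlay nodes responsible for a service key and recomputing placement when churn strikes, can be sketched as follows. The successor-list placement rule, the node identifiers, and the replication degree are illustrative assumptions, not the paper's algorithm.

```python
# Illustrative sketch: maintain a replica set on an identifier ring.
REPLICATION_DEGREE = 3

def place_replicas(service_key, live_nodes):
    """Pick the REPLICATION_DEGREE nodes whose ids follow the service
    key on the ring (successor-list placement, an assumed rule)."""
    ring = sorted(live_nodes)
    start = next((i for i, n in enumerate(ring) if n >= service_key), 0)
    return [ring[(start + i) % len(ring)]
            for i in range(REPLICATION_DEGREE)]

nodes = {10, 25, 40, 55, 70}
replicas = place_replicas(30, nodes)

# Churn: node 40 fails. A maintenance step (driven, in the paper, by
# the overlay's failure detection) recomputes the set, and the
# replicated state machine migrates to the newly responsible node.
nodes.discard(40)
replicas_after = place_replicas(30, nodes)
```

The point of the sketch is that placement is a pure function of the live membership, so any surviving replica can recompute it locally, without human intervention.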

Place, publisher, year, edition, pages
Kista, Sweden, 2010. Edition: 7
Series
SICS Technical Report, ISSN 1100-3154 ; 2010:02
Keywords
autonomic computing, distributed systems, self-management, replicated state machines, service migration, peer-to-peer
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-23666 (URN)
Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2023-05-25. Bibliographically approved
De Palma, N., Parlavanzas, N., Popov, K., Brand, P. & Vlassov, V. (2009). Tools for Autonomic Computing (10ed.). In: 5th International Conference on Autonomic and Autonomous Systems (ICAS 2009). Paper presented at 5th International Conference on Autonomic and Autonomous Systems (ICAS 2009), 21-25 April 2009, Valencia, Spain (pp. 313-320). IEEE Computer Society
Tools for Autonomic Computing
2009 (English). In: 5th International Conference on Autonomic and Autonomous Systems (ICAS 2009), IEEE Computer Society, 2009, 10, p. 313-320. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
IEEE Computer Society, 2009. Edition: 10
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-23669 (URN)
Conference
5th International Conference on Autonomic and Autonomous Systems (ICAS 2009), 21-25 April 2009, Valencia, Spain.
Projects
Grid4All
Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2023-05-25. Bibliographically approved
Faxén, K.-F., Popov, K., Janson, S. & Albertsson, L. (2008). Embla - Data Dependence Profiling for Parallel Programming (10ed.). In: Proceedings of the 2008 International Conference on Complex, Intelligent and Software Intensive Systems. Paper presented at 2008 International Conference on Complex, Intelligent and Software Intensive Systems (pp. 780-785).
Embla - Data Dependence Profiling for Parallel Programming
2008 (English). In: Proceedings of the 2008 International Conference on Complex, Intelligent and Software Intensive Systems, 2008, 10, p. 780-785. Conference paper, Published paper (Refereed)
Abstract [en]

With the proliferation of multicore processors, there is an urgent need for tools and methodologies supporting the parallelization of existing applications. In this paper, we present a novel tool for aiding programmers in parallelizing programs. The tool, Embla, is based on the Valgrind framework and allows the user to discover the data dependences in a sequential program, thereby exposing opportunities for parallelization. Embla performs an off-line dynamic analysis and records dependences as they arise during program execution. It reports an optimistic view of parallelizable sequences, ignoring dependences that do not arise during execution. Moreover, since the tool instruments the machine code of the program, it is largely language independent. Since Embla finds the dependences that occur for particular executions, the confidence one would assign to its results depends on whether different executions yield different (bad) or largely the same (good) dependences. We present a preliminary investigation into this issue using 84 different inputs to the SPEC CPU 2006 benchmark 403.gcc. The results indicate a strong correlation between coverage and finding dependences; executing the entire program is likely to reveal all dependences.
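The dynamic-analysis idea described above, recording dependences as they arise during execution, can be illustrated with a toy trace: remember the last writer of each address and emit a read-after-write dependence on each later read. The trace and statement names are hypothetical; Embla itself instruments machine code via Valgrind rather than replaying a Python-level trace.

```python
# Hypothetical memory-access trace: (statement, operation, address).
trace = [
    ("s1", "write", 0x100),
    ("s2", "write", 0x104),
    ("s3", "read",  0x100),   # s3 reads what s1 wrote
    ("s4", "read",  0x104),   # s4 reads what s2 wrote
    ("s5", "read",  0x108),   # no recorded writer: no dependence
]

# Replay the trace, tracking the last writer of every address and
# collecting read-after-write dependences as they arise.
last_writer = {}
deps = set()
for stmt, op, addr in trace:
    if op == "write":
        last_writer[addr] = stmt
    elif addr in last_writer:
        deps.add((last_writer[addr], stmt))

# Statements with no dependence between them (here s3 and s4) are
# candidates for parallel execution.
```

As the abstract notes, this view is optimistic: only dependences that actually occur in the profiled executions are recorded, which is why input coverage matters.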

National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-22255 (URN)
Conference
2008 International Conference on Complex, Intelligent and Software Intensive Systems
Projects
Embla
Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2023-06-08. Bibliographically approved
Nimar, G., Vlassov, V. & Popov, K. (2005). Practical Experience in Building an Agent System for Semantics-Based Provision and Selection of Grid Services (1ed.). Paper presented at the Sixth International Conference on Parallel Processing and Applied Mathematics.
Practical Experience in Building an Agent System for Semantics-Based Provision and Selection of Grid Services
2005 (English). Conference paper, Published paper (Refereed)
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-21099 (URN)
Conference
Sixth International Conference on Parallel Processing and Applied Mathematics
Projects
GES3
Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2023-05-25. Bibliographically approved
Groleau, W., Vlassov, V. & Popov, K. (2005). Towards Semantics-Based Resource Discovery for the Grid (1ed.). Paper presented at the CoreGRID Integration Workshop.
Towards Semantics-Based Resource Discovery for the Grid
2005 (English). Conference paper, Published paper (Refereed)
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-21100 (URN)
Conference
CoreGRID Integration Workshop
Projects
GES3
Note

Preliminary proceedings.

Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2023-05-25. Bibliographically approved
Popov, K., Brand, P. & Haridi, S. (2003). An efficient marshaling framework for distributed systems (2ed.). Paper presented at PaCT 2003: 7th Int'l Conference on Parallel Computing Technologies, 15-19 Sept 2003, Nizhni Novgorod, Russia.
An efficient marshaling framework for distributed systems
2003 (English). Conference paper, Published paper (Refereed)
Abstract [en]

An efficient (un)marshaling framework is presented. It is designed for distributed applications implemented in languages such as C++. A marshaler/unmarshaler pair converts arbitrary structured data between its host and network representations. The technology can also be used for persistent storage. Our framework simplifies the design of efficient and flexible marshalers, and network latency is reduced by concurrent execution of (un)marshaling and network operations. The framework is used in Mozart, a distributed programming system that implements Oz, a multi-paradigm concurrent language. Mozart, including the implementation of the framework, is available at http://www.mozart-oz.org.
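The abstract's core idea, a marshaler/unmarshaler pair converting arbitrary structured data between host and network representations, can be sketched in miniature. This is an illustrative Python analogue, not Mozart's C++ framework: the type tags and length-prefixed wire format below are assumptions made for the sketch.

```python
import struct

def marshal(value):
    """Encode ints, strings, and (nested) lists with type tags and
    length prefixes, so structured data round-trips."""
    if isinstance(value, int):
        return b"I" + struct.pack(">q", value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return b"S" + struct.pack(">I", len(data)) + data
    if isinstance(value, list):
        return b"L" + struct.pack(">I", len(value)) + b"".join(
            marshal(v) for v in value)
    raise TypeError(f"unsupported type: {type(value)!r}")

def unmarshal(buf, pos=0):
    """Decode one value starting at pos; return (value, next_pos)."""
    tag = buf[pos:pos + 1]
    pos += 1
    if tag == b"I":
        return struct.unpack_from(">q", buf, pos)[0], pos + 8
    if tag == b"S":
        n = struct.unpack_from(">I", buf, pos)[0]
        pos += 4
        return buf[pos:pos + n].decode("utf-8"), pos + n
    if tag == b"L":
        n = struct.unpack_from(">I", buf, pos)[0]
        pos += 4
        items = []
        for _ in range(n):
            item, pos = unmarshal(buf, pos)
            items.append(item)
        return items, pos
    raise ValueError("bad tag")

wire = marshal([1, "two", [3, 4]])
value, _ = unmarshal(wire)
```

The streaming quality the abstract highlights (overlapping (un)marshaling with network I/O) follows naturally from such a format: the unmarshaler consumes the buffer strictly left to right and can resume wherever the network delivery pauses.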

National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-22429 (URN)
Conference
PaCT 2003: 7th Int'l Conference on Parallel Computing Technologies, 15-19 Sept 2003, Nizhni Novgorod, Russia
Note

http://www.sics.se/~kost/pubs/marshaling-PaCT03.ps

Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2023-06-07. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-9431-5139
