Vasiloudis, Theodore
ORCID iD: orcid.org/0000-0002-8180-7521
Publications (4 of 4)
Vasiloudis, T., Beligianni, F. & De Francisci Morales, G. (2017). BoostVHT: Boosting distributed streaming decision trees. In: International Conference on Information and Knowledge Management, Proceedings. Paper presented at 26th ACM International Conference on Information and Knowledge Management, CIKM 2017, 6 November 2017 through 10 November 2017 (pp. 899-908).
2017 (English). In: International Conference on Information and Knowledge Management, Proceedings, 2017, p. 899-908. Conference paper, Published paper (Refereed)
Abstract [en]

Online boosting improves the accuracy of classifiers for unbounded streams of data by chaining them into an ensemble. Due to its sequential nature, boosting has proven hard to parallelize, even more so in the online setting. This paper introduces BoostVHT, a technique to parallelize online boosting algorithms. Our proposal leverages a recently developed model-parallel learning algorithm for streaming decision trees as a base learner. This design makes it possible to neatly separate the model boosting from its training. As a result, BoostVHT provides a flexible learning framework which can employ any existing online boosting algorithm, while at the same time it can leverage the computing power of modern parallel and distributed cluster environments. We implement our technique on Apache SAMOA, an open-source platform for mining big data streams that can be run on several distributed execution engines, and demonstrate order-of-magnitude speedups compared to the state-of-the-art. © 2017 Copyright held by the owner/author(s).

Keywords
Boosting, Decision trees, Distributed systems, Online learning, Big data, Cluster computing, Clustering algorithms, Data mining, Distributed computer systems, Forestry, Knowledge management, Online systems, Trees (mathematics), Distributed clusters, Distributed streaming, Flexible Learning, Open source platforms, Parallel learning algorithms, Learning algorithms
National Category
Natural Sciences
Identifiers
urn:nbn:se:ri:diva-33210 (URN), 10.1145/3132847.3132974 (DOI), 2-s2.0-85037345394 (Scopus ID), 9781450349185 (ISBN)
Conference
26th ACM International Conference on Information and Knowledge Management, CIKM 2017, 6 November 2017 through 10 November 2017
Available from: 2018-01-31. Created: 2018-01-31. Last updated: 2018-08-13. Bibliographically approved.
Görnerup, O., Gillblad, D. & Vasiloudis, T. (2017). Domain-Agnostic Discovery of Similarities and Concepts at Scale (7 ed.). Knowledge and Information Systems, 51, 531-560.
2017 (English). In: Knowledge and Information Systems, ISSN 0219-1377, E-ISSN 0219-3116, Vol. 51, p. 531-560. Article in journal (Refereed), Published
Abstract [en]

Appropriately defining and efficiently calculating similarities from large data sets are often essential in data mining, both for gaining understanding of data and generating processes, and for building tractable representations. Given a set of objects and their correlations, we here rely on the premise that each object is characterized by its context, i.e. its correlations to the other objects. The similarity between two objects can then be expressed in terms of the similarity between their contexts. In this way, similarity pertains to the general notion that objects are similar if they are exchangeable in the data. We propose a scalable approach for calculating all relevant similarities among objects by relating them in a correlation graph that is transformed to a similarity graph. These graphs can express rich structural properties among objects. Specifically, we show that concepts - abstractions of objects - are constituted by groups of similar objects that can be discovered by clustering the objects in the similarity graph. These principles and methods are applicable in a wide range of fields, and will here be demonstrated in three domains: computational linguistics, music and molecular biology, where the numbers of objects and correlations range from small to very large.

Place, publisher, year, edition, pages
Springer, 2017. Edition: 7
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-24561 (URN), 10.1007/s10115-016-0984-2 (DOI), 2-s2.0-84984793995 (Scopus ID)
Note

This paper is an extended version of Görnerup, O., Gillblad, D. and Vasiloudis, T. (2015), Knowing an object by the company it keeps: A domain-agnostic scheme for similarity discovery, in "IEEE International Conference on Data Mining (ICDM 2015)".

Available from: 2016-10-31. Created: 2016-10-31. Last updated: 2019-01-07. Bibliographically approved.
Vasiloudis, T., Vahabi, H., Kravitz, R. & Rashkov, V. (2017). Predicting session length in media streaming. In: SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. Paper presented at 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2017, 7 August 2017 through 11 August 2017 (pp. 977-980).
2017 (English). In: SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017, p. 977-980. Conference paper, Published paper (Refereed)
Abstract [en]

Session length is a very important aspect in determining a user's satisfaction with a media streaming service. Being able to predict how long a session will last can be of great use for various downstream tasks, such as recommendations and ad scheduling. Most of the related literature on user interaction duration has focused on dwell time for websites, usually in the context of approximating post-click satisfaction either in search results, or display ads. In this work we present the first analysis of session length in a mobile-focused online service, using a real world data-set from a major music streaming service. We use survival analysis techniques to show that the characteristics of the length distributions can differ significantly between users, and use gradient boosted trees with appropriate objectives to predict the length of a session using only information available at its beginning. Our evaluation on real world data illustrates that our proposed technique outperforms the considered baseline. © 2017 Copyright held by the owner/author(s).

Keywords
Dwell Time, Session Length, Survival Analysis, User Behavior, Behavioral research, Bioinformatics, Forecasting, Information retrieval, Trees (mathematics), Length distributions, Media streaming services, User behaviors, User interaction, User's satisfaction, Media streaming
National Category
Natural Sciences
Identifiers
urn:nbn:se:ri:diva-33211 (URN), 10.1145/3077136.3080695 (DOI), 2-s2.0-85029395373 (Scopus ID), 9781450350228 (ISBN)
Conference
40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2017, 7 August 2017 through 11 August 2017
Available from: 2018-01-31. Created: 2018-01-31. Last updated: 2019-01-22. Bibliographically approved.
Görnerup, O., Gillblad, D. & Vasiloudis, T. (2015). Knowing an Object by the Company It Keeps: A Domain-Agnostic Scheme for Similarity Discovery (18 ed.). In: 2015 IEEE International Conference on Data Mining. Paper presented at 15th IEEE International Conference on Data Mining (ICDM 2015), November 14-17, 2015, Atlantic City, US (pp. 121-130), Article ID 7373316.
2015 (English). In: 2015 IEEE International Conference on Data Mining, 2015, 18, p. 121-130, article id 7373316. Conference paper, Published paper (Refereed)
Abstract [en]

Appropriately defining and then efficiently calculating similarities from large data sets are often essential in data mining, both for building tractable representations and for gaining understanding of data and generating processes. Here we rely on the premise that given a set of objects and their correlations, each object is characterized by its context, i.e. its correlations to the other objects, and that the similarity between two objects therefore can be expressed in terms of the similarity between their respective contexts. Resting on this principle, we propose a data-driven and highly scalable approach for discovering similarities from large data sets by representing objects and their relations as a correlation graph that is transformed to a similarity graph. Together these graphs can express rich structural properties among objects. Specifically, we show that concepts - representations of abstract ideas and notions - are constituted by groups of similar objects that can be identified by clustering the objects in the similarity graph. These principles and methods are applicable in a wide range of domains, and will here be demonstrated for three distinct types of objects: codons, artists and words, where the numbers of objects and correlations range from small to very large.

National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-24463 (URN), 10.1109/ICDM.2015.85 (DOI), 978-1-4673-9504-5 (ISBN)
Conference
15th IEEE International Conference on Data Mining (ICDM 2015), November 14-17, 2015, Atlantic City, US
Available from: 2016-10-31. Created: 2016-10-31. Last updated: 2019-07-11. Bibliographically approved.