  • 1.
    Görnerup, Olof
    et al.
    RISE, Swedish ICT, SICS, Decisions, Networks and Analytics lab.
    Gillblad, Daniel
    RISE, Swedish ICT, SICS, Decisions, Networks and Analytics lab.
    Vasiloudis, Theodore
    RISE, Swedish ICT, SICS, Decisions, Networks and Analytics lab.
    Domain-Agnostic Discovery of Similarities and Concepts at Scale. 2017. In: Knowledge and Information Systems, ISSN 0219-1377, E-ISSN 0219-3116, Vol. 51, p. 531-560. Article in journal (Refereed)
    Abstract [en]

    Appropriately defining and efficiently calculating similarities from large data sets are often essential in data mining, both for gaining understanding of data and generating processes, and for building tractable representations. Given a set of objects and their correlations, we here rely on the premise that each object is characterized by its context, i.e. its correlations to the other objects. The similarity between two objects can then be expressed in terms of the similarity between their contexts. In this way, similarity pertains to the general notion that objects are similar if they are exchangeable in the data. We propose a scalable approach for calculating all relevant similarities among objects by relating them in a correlation graph that is transformed to a similarity graph. These graphs can express rich structural properties among objects. Specifically, we show that concepts - abstractions of objects - are constituted by groups of similar objects that can be discovered by clustering the objects in the similarity graph. These principles and methods are applicable in a wide range of fields, and will here be demonstrated in three domains: computational linguistics, music and molecular biology, where the numbers of objects and correlations range from small to very large.

  • 2.
    Görnerup, Olof
    et al.
    RISE, Swedish ICT, SICS, Decisions, Networks and Analytics lab.
    Gillblad, Daniel
    RISE, Swedish ICT, SICS, Decisions, Networks and Analytics lab.
    Vasiloudis, Theodore
    RISE, Swedish ICT, SICS.
    Knowing an Object by the Company It Keeps: A Domain-Agnostic Scheme for Similarity Discovery. 2015. In: 2015 IEEE International Conference on Data Mining, 2015, 18, p. 121-130, article id 7373316. Conference paper (Refereed)
    Abstract [en]

    Appropriately defining and then efficiently calculating similarities from large data sets are often essential in data mining, both for building tractable representations and for gaining understanding of data and generating processes. Here we rely on the premise that given a set of objects and their correlations, each object is characterized by its context, i.e. its correlations to the other objects, and that the similarity between two objects therefore can be expressed in terms of the similarity between their respective contexts. Resting on this principle, we propose a data-driven and highly scalable approach for discovering similarities from large data sets by representing objects and their relations as a correlation graph that is transformed to a similarity graph. Together these graphs can express rich structural properties among objects. Specifically, we show that concepts - representations of abstract ideas and notions - are constituted by groups of similar objects that can be identified by clustering the objects in the similarity graph. These principles and methods are applicable in a wide range of domains, and will here be demonstrated for three distinct types of objects: codons, artists and words, where the numbers of objects and correlations range from small to very large.

  • 3.
    Vasiloudis, Theodore
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Beligianni, Foteini
    KTH Royal Institute of Technology, Sweden.
    De Francisci Morales, Gianmarco
    Qatar Computing Research Institute, Qatar.
    BoostVHT: Boosting distributed streaming decision trees. 2017. In: International Conference on Information and Knowledge Management, Proceedings, 2017, p. 899-908. Conference paper (Refereed)
    Abstract [en]

    Online boosting improves the accuracy of classifiers for unbounded streams of data by chaining them into an ensemble. Due to its sequential nature, boosting has proven hard to parallelize, even more so in the online setting. This paper introduces BoostVHT, a technique to parallelize online boosting algorithms. Our proposal leverages a recently developed model-parallel learning algorithm for streaming decision trees as a base learner. This design neatly separates the model boosting from its training. As a result, BoostVHT provides a flexible learning framework that can employ any existing online boosting algorithm while leveraging the computing power of modern parallel and distributed cluster environments. We implement our technique on Apache SAMOA, an open-source platform for mining big data streams that can be run on several distributed execution engines, and demonstrate order-of-magnitude speedups compared to the state of the art. © 2017 Copyright held by the owner/author(s).

  • 4.
    Vasiloudis, Theodore
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Cho, Hyunsu
    Amazon Web Services, US.
    Boström, Henrik
    KTH Royal Institute of Technology, Sweden.
    Block-distributed gradient boosted trees. 2019. In: SIGIR 2019 - Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Association for Computing Machinery, Inc, 2019, p. 1025-1028. Conference paper (Refereed)
    Abstract [en]

    The Gradient Boosted Tree (GBT) algorithm is one of the most popular machine learning algorithms used in production, for tasks that include Click-Through Rate (CTR) prediction and learning-to-rank. To deal with the massive datasets available today, many distributed GBT methods have been proposed. However, they all assume a row-distributed dataset, addressing scalability only with respect to the number of data points and not the number of features, and increasing communication cost for high-dimensional data. In order to allow for scalability across both the data point and feature dimensions, and reduce communication cost, we propose block-distributed GBTs. We achieve communication efficiency by making full use of the data sparsity and adapting the Quickscorer algorithm to the block-distributed setting. We evaluate our approach using datasets with millions of features, and demonstrate that we are able to achieve multiple orders of magnitude reduction in communication cost for sparse data, with no loss in accuracy, while providing a more scalable design. As a result, we are able to reduce the training time for high-dimensional data, and allow more cost-effective scale-out without the need for expensive network communication. 

  • 5.
    Vasiloudis, Theodore
    et al.
    RISE - Research Institutes of Sweden (2017-2019), ICT, SICS.
    de Francisci Morales, G.
    ISI Foundation, Italy.
    Boström, H.
    KTH Royal Institute of Technology, Sweden.
    Quantifying uncertainty in online regression forests. 2019. In: Journal of Machine Learning Research, ISSN 1532-4435, E-ISSN 1533-7928, Vol. 20. Article in journal (Refereed)
    Abstract [en]

    Accurately quantifying uncertainty in predictions is essential for the deployment of machine learning algorithms in critical applications where mistakes are costly. Most approaches to quantifying prediction uncertainty have focused on settings where the data is static, or bounded. In this paper, we investigate methods that quantify the prediction uncertainty in a streaming setting, where the data is potentially unbounded. We propose two meta-algorithms that produce prediction intervals for online regression forests of arbitrary tree models; one based on conformal prediction, and the other based on quantile regression. We show that the approaches are able to maintain specified error rates, with constant computational cost per example and bounded memory usage. We provide empirical evidence that the methods outperform the state-of-the-art in terms of maintaining error guarantees, while being an order of magnitude faster. We also investigate how the algorithms are able to recover from concept drift. © 2019 Theodore Vasiloudis, Gianmarco De Francisci Morales, Henrik Boström.

  • 6.
    Vasiloudis, Theodore
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Vahabi, Hossein
    Pandora Media Inc, USA.
    Kravitz, Ross
    Pandora Media Inc, USA.
    Rashkov, Valery
    Pandora Media Inc, USA.
    Predicting session length in media streaming. 2017. In: SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017, p. 977-980. Conference paper (Refereed)
    Abstract [en]

    Session length is a very important aspect in determining a user's satisfaction with a media streaming service. Being able to predict how long a session will last can be of great use for various downstream tasks, such as recommendations and ad scheduling. Most of the related literature on user interaction duration has focused on dwell time for websites, usually in the context of approximating post-click satisfaction in either search results or display ads. In this work we present the first analysis of session length in a mobile-focused online service, using a real-world dataset from a major music streaming service. We use survival analysis techniques to show that the characteristics of the length distributions can differ significantly between users, and use gradient boosted trees with appropriate objectives to predict the length of a session using only information available at its beginning. Our evaluation on real-world data illustrates that our proposed technique outperforms the considered baseline. © 2017 Copyright held by the owner/author(s).
