Domain-Agnostic Discovery of Similarities and Concepts at Scale
RISE, Swedish ICT, SICS, Decisions, Networks and Analytics lab. ORCID iD: 0000-0001-9244-4546
RISE, Swedish ICT, SICS, Decisions, Networks and Analytics lab. ORCID iD: 0000-0001-8952-3542
RISE, Swedish ICT, SICS, Decisions, Networks and Analytics lab. ORCID iD: 0000-0002-8180-7521
2017 (English). In: Knowledge and Information Systems, ISSN 0219-1377, E-ISSN 0219-3116, Vol. 51, p. 531-560. Article in journal (Refereed). Published
Abstract [en]

Appropriately defining and efficiently calculating similarities from large data sets are often essential in data mining, both for gaining understanding of data and generating processes, and for building tractable representations. Given a set of objects and their correlations, we here rely on the premise that each object is characterized by its context, i.e. its correlations to the other objects. The similarity between two objects can then be expressed in terms of the similarity between their contexts. In this way, similarity pertains to the general notion that objects are similar if they are exchangeable in the data. We propose a scalable approach for calculating all relevant similarities among objects by relating them in a correlation graph that is transformed to a similarity graph. These graphs can express rich structural properties among objects. Specifically, we show that concepts - abstractions of objects - are constituted by groups of similar objects that can be discovered by clustering the objects in the similarity graph. These principles and methods are applicable in a wide range of fields, and will here be demonstrated in three domains: computational linguistics, music and molecular biology, where the numbers of objects and correlations range from small to very large.
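The pipeline outlined in the abstract (correlation graph, then similarity graph, then concepts as clusters) can be illustrated with a minimal sketch. The abstract does not state the paper's actual correlation measure, similarity measure, or clustering algorithm, so everything below is an assumption made for illustration: cosine similarity between contexts stands in for the similarity measure, connected components of a thresholded graph stand in for the clustering step, and the function names and toy data are hypothetical. The brute-force comparison of all pairs is also not the scalable approach the paper proposes.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def correlation_graph(pairs):
    """Build a symmetric weighted graph from (a, b, weight) correlations."""
    graph = defaultdict(dict)
    for a, b, weight in pairs:
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def context_similarity(graph, i, j):
    """Cosine similarity between the contexts of i and j.

    The context of an object is its set of weighted neighbours in the
    correlation graph; i and j themselves are left out of both contexts.
    Cosine similarity is an assumed stand-in, not the paper's measure.
    """
    ctx_i = {k: w for k, w in graph[i].items() if k not in (i, j)}
    ctx_j = {k: w for k, w in graph[j].items() if k not in (i, j)}
    dot = sum(w * ctx_j[k] for k, w in ctx_i.items() if k in ctx_j)
    norm_i = sqrt(sum(w * w for w in ctx_i.values()))
    norm_j = sqrt(sum(w * w for w in ctx_j.values()))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)


def similarity_graph(graph, threshold):
    """Keep an edge between every pair whose context similarity >= threshold."""
    sim = defaultdict(dict)
    for i, j in combinations(sorted(graph), 2):
        s = context_similarity(graph, i, j)
        if s >= threshold:
            sim[i][j] = s
            sim[j][i] = s
    return sim


def concepts(sim, nodes):
    """Group objects into connected components of the similarity graph."""
    seen, clusters = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(sim.get(node, {}).keys() - component)
        seen |= component
        clusters.append(component)
    return clusters


# Hypothetical toy correlations: two animals and two vehicles,
# each correlated with the same pair of context objects.
pairs = [
    ("cat", "pet", 1.0), ("cat", "fur", 1.0),
    ("dog", "pet", 1.0), ("dog", "fur", 1.0),
    ("car", "road", 1.0), ("car", "wheel", 1.0),
    ("bus", "road", 1.0), ("bus", "wheel", 1.0),
]
graph = correlation_graph(pairs)
sim = similarity_graph(graph, threshold=0.5)
found = concepts(sim, graph.keys())
```

In this toy data "cat" and "dog" are never directly correlated, yet they end up in the same group because they share the same context; that is the sense in which similar objects are exchangeable in the data.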

Place, publisher, year, edition, pages
Springer, 2017, 7. Vol. 51, p. 531-560
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:ri:diva-24561; DOI: 10.1007/s10115-016-0984-2; Scopus ID: 2-s2.0-84984793995; OAI: oai:DiVA.org:ri-24561; DiVA, id: diva2:1043646
Note

This paper is an extended version of Görnerup, O., Gillblad, D. and Vasiloudis, T. (2015), Knowing an object by the company it keeps: A domain-agnostic scheme for similarity discovery, in "IEEE International Conference on Data Mining (ICDM 2015)".

Available from: 2016-10-31. Created: 2016-10-31. Last updated: 2023-06-02. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Görnerup, Olof; Gillblad, Daniel; Vasiloudis, Theodore
