DeGPar: Large Scale Topic Detection Using Node-Cut Partitioning on Dense Weighted Graphs
KTH Royal Institute of Technology, Sweden.
KTH Royal Institute of Technology, Sweden. ORCID iD: 0000-0003-4516-7317
RISE - Research Institutes of Sweden, ICT, SICS.
2017 (English)
In: Proceedings - International Conference on Distributed Computing Systems, 2017, p. 775-785
Conference paper, Published paper (Refereed)
Abstract [en]

Topic Detection (TD) refers to automatic techniques for locating topically related material in web documents. Nowadays, massive amounts of documents are generated by users of Online Social Networks (OSNs), in the form of very short texts, tweets, and news snippets. While topic detection in its traditional form is applied to a few documents containing a lot of information, the problem has now changed to dealing with a massive number of documents with very little information each. Traditional solutions thus fall short in either scalability (due to the huge number of input items) or sparsity (due to insufficient information per input item). In this paper we address the scalability problem by introducing an efficient and scalable graph-based algorithm for TD on short texts, leveraging dimensionality reduction and clustering techniques. We first compress the input set of documents into a dense graph, such that frequent co-occurrence patterns in the documents create multiple dense topological areas in the graph. Then, we partition the graph into multiple dense sub-graphs, each representing a topic. We compare the accuracy and scalability of our solution with two state-of-the-art solutions (the standard LDA and BiTerm). The results on two widely used benchmark datasets show that our algorithm not only maintains similar or better accuracy, but also runs an order of magnitude faster than the state-of-the-art approaches.
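
The two-step idea in the abstract (fold documents into a weighted co-occurrence graph, then split that graph into dense regions) can be illustrated with a toy sketch. This is not the DeGPar algorithm: the paper's keywords mention Random Indexing and node-cut partitioning, whereas the sketch below assumes a much simpler stand-in, namely pruning weak edges and taking connected components. The function names, the `min_weight` threshold, and the sample documents are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_graph(docs):
    """Build a weighted word graph.

    Edge weight = number of documents in which both words appear.
    """
    weights = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        for u, v in combinations(words, 2):
            weights[(u, v)] += 1
    return weights


def dense_components(weights, min_weight=2):
    """Assumed stand-in for graph partitioning (not node-cut partitioning).

    Drops edges lighter than min_weight, then returns the connected
    components of what remains as candidate topics.
    """
    adj = defaultdict(set)
    for (u, v), w in weights.items():
        if w >= min_weight:
            adj[u].add(v)
            adj[v].add(u)

    seen, topics = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        # Depth-first traversal of one component.
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        topics.append(comp)
    return topics


# Hypothetical short texts covering two themes.
docs = [
    "goal striker match",
    "striker goal penalty",
    "match goal referee",
    "election vote ballot",
    "vote ballot recount",
    "election vote turnout",
]
topics = dense_components(cooccurrence_graph(docs))
for topic in topics:
    print(sorted(topic))
```

On these six sample texts, words that co-occur in only one document are pruned, leaving one sports-related and one election-related word group. A real system would need a partitioning method that copes with overlapping topics and much larger graphs, which is the problem the paper targets.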

Place, publisher, year, edition, pages
2017. p. 775-785
Keywords [en]
Dense Weighted Graph Partitioning, Dimensionality Reduction, Distributed Algorithms, Node-cut Graph Partitioning, Online Social Networks, Random Indexing, Topic Detection, Clustering algorithms, Distributed computer systems, Graphic methods, Parallel algorithms, Scalability, Scales (weighing instruments), Social networking (online), Topology, Graph Partitioning, On-line social networks, Weighted graph, Graph theory
National Category
Natural Sciences
Identifiers
URN: urn:nbn:se:ri:diva-30836
DOI: 10.1109/ICDCS.2017.19
Scopus ID: 2-s2.0-85027258993
ISBN: 9781538617915 (print)
OAI: oai:DiVA.org:ri-30836
DiVA, id: diva2:1139350
Conference
37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017, 5 June 2017 through 8 June 2017
Available from: 2017-09-07
Created: 2017-09-07
Last updated: 2023-06-07
Bibliographically approved


Authority records

Girdzijauskas, Sarunas
