DeGPar: Large Scale Topic Detection Using Node-Cut Partitioning on Dense Weighted Graphs
KTH Royal Institute of Technology, Sweden.
KTH Royal Institute of Technology, Sweden. ORCID iD: 0000-0003-4516-7317
RISE - Research Institutes of Sweden, ICT, SICS.
2017 (English). In: Proceedings - International Conference on Distributed Computing Systems, 2017, p. 775-785. Conference paper, Published paper (Refereed)
Abstract [en]

Topic Detection (TD) refers to automatic techniques for locating topically related material in web documents. Nowadays, users of Online Social Networks (OSNs) generate massive amounts of documents in the form of very short texts, tweets and news snippets. While topic detection in its traditional form is applied to a few documents containing a lot of information, the problem has now changed to dealing with a massive number of documents containing very little information. The traditional solutions thus fall short either in scalability (due to the huge number of input items) or in handling sparsity (due to insufficient information per input item). In this paper we address the scalability problem by introducing an efficient and scalable graph-based algorithm for TD on short texts, leveraging dimensionality reduction and clustering techniques. We first compress the input set of documents into a dense graph, such that frequent co-occurrence patterns in the documents create multiple dense topological areas in the graph. Then, we partition the graph into multiple dense sub-graphs, each representing a topic. We compare the accuracy and scalability of our solution with two state-of-the-art solutions (standard LDA and BiTerm). The results on two widely used benchmark datasets show that our algorithm not only maintains similar or better accuracy, but also runs an order of magnitude faster than the state-of-the-art approaches.
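
The two-step idea in the abstract (compress documents into a weighted co-occurrence graph, then split it into dense sub-graphs) can be illustrated with a toy sketch. This is not the DeGPar algorithm: the record does not describe its dimensionality reduction or its node-cut partitioning in detail, so the sketch swaps in plain word co-occurrence counts and a simple edge-weight threshold followed by connected components. The function names `cooccurrence_graph` and `dense_components`, the `min_weight` parameter, and the sample documents are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_graph(docs):
    """Return {(u, v): weight}, counting documents in which words u < v co-occur."""
    weights = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        for u, v in combinations(words, 2):
            weights[(u, v)] += 1
    return weights


def dense_components(weights, min_weight):
    """Drop edges lighter than min_weight, then return the connected components.

    Assumption: this threshold-and-components step is a stand-in for the
    paper's node-cut partitioning, which is not specified in this record.
    """
    adj = defaultdict(set)
    for (u, v), w in weights.items():
        if w >= min_weight:
            adj[u].add(v)
            adj[v].add(u)
    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components


# Hypothetical short texts covering two topics.
docs = [
    "goal match striker",
    "striker goal penalty",
    "match goal striker",
    "election vote senate",
    "senate vote ballot",
    "vote election senate",
]
topics = dense_components(cooccurrence_graph(docs), min_weight=2)
```

With these sample documents, words that co-occur in at least two documents end up grouped into one component per topic, while rare words such as "penalty" and "ballot" are dropped with their weak edges.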

Place, publisher, year, edition, pages
2017. p. 775-785
Keywords [en]
Dense Weighted Graph Partitioning, Dimensionality Reduction, Distributed Algorithms, Node-cut Graph Partitioning, Online Social Networks, Random Indexing, Topic Detection, Clustering algorithms, Distributed computer systems, Graphic methods, Parallel algorithms, Scalability, Scales (weighing instruments), Social networking (online), Topology, Graph Partitioning, On-line social networks, Weighted graph, Graph theory
National subject category
Natural Sciences
Identifiers
URN: urn:nbn:se:ri:diva-30836; DOI: 10.1109/ICDCS.2017.19; Scopus ID: 2-s2.0-85027258993; ISBN: 9781538617915 (print); OAI: oai:DiVA.org:ri-30836; DiVA, id: diva2:1139350
Conference
37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017, 5 June 2017 through 8 June 2017
Available from: 2017-09-07. Created: 2017-09-07. Last updated: 2023-06-07. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Person

Girdzijauskas, Sarunas
