DeGPar: Large Scale Topic Detection Using Node-Cut Partitioning on Dense Weighted Graphs
KTH Royal Institute of Technology, Sweden.
KTH Royal Institute of Technology, Sweden. ORCID iD: 0000-0003-4516-7317
RISE - Research Institutes of Sweden, ICT, SICS.
2017 (English). In: Proceedings - International Conference on Distributed Computing Systems, 2017, pp. 775-785. Conference paper, published paper (peer-reviewed)
Abstract [en]

Topic Detection (TD) refers to automatic techniques for locating topically related material in web documents. Nowadays, users of Online Social Networks (OSNs) generate massive numbers of documents in the form of very short texts, such as tweets and news snippets. While traditional topic detection is applied to a few documents containing a lot of information, the problem has now shifted to dealing with a massive number of documents, each containing very little information. Traditional solutions therefore fall short either in scalability (due to the huge number of input items) or in coping with sparsity (due to insufficient information per input item). In this paper we address the scalability problem by introducing an efficient and scalable graph-based algorithm for TD on short texts that leverages dimensionality reduction and clustering techniques. We first compress the input set of documents into a dense graph, such that frequent co-occurrence patterns in the documents create multiple dense topological areas in the graph. Then, we partition the graph into multiple dense sub-graphs, each representing a topic. We compare the accuracy and scalability of our solution with two state-of-the-art solutions: the standard LDA and BiTerm. The results on two widely used benchmark datasets show that our algorithm not only maintains similar or better accuracy, but also runs an order of magnitude faster than the state-of-the-art approaches.
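The two-step pipeline described in the abstract (compress documents into a weighted graph whose dense regions reflect frequent co-occurrences, then split the graph into dense sub-graphs, one per topic) can be illustrated with a toy sketch. It assumes whitespace tokenization and raw document-level co-occurrence counts as edge weights. DeGPar itself relies on Random Indexing-based dimensionality reduction and a distributed node-cut partitioner, neither of which is reproduced here; the partitioning step below is a deliberately simple stand-in (connected components over edges above a weight threshold). The names `cooccurrence_graph`, `dense_groups` and `min_weight` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(docs):
    """Toy compression step: map short texts to an undirected weighted graph.

    Nodes are lower-cased whitespace tokens; the weight of edge (u, v), u < v,
    is the number of documents in which u and v co-occur.
    """
    weights = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))  # dedupe within a document
        for u, v in combinations(words, 2):       # every unordered word pair
            weights[(u, v)] += 1
    return weights


def dense_groups(weights, min_weight=2):
    """Hypothetical stand-in for the partitioning step (NOT node-cut).

    Keeps only edges with weight >= min_weight and returns the connected
    components of that subgraph as sorted word lists. Words with no
    surviving edge are dropped.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), w in weights.items():
        if w >= min_weight:
            parent[find(u)] = find(v)  # merge the two components

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


if __name__ == "__main__":
    docs = [
        "election vote poll",
        "vote poll results",
        "election poll",
        "goal match striker",
        "match goal referee",
    ]
    print(dense_groups(cooccurrence_graph(docs), min_weight=2))
```

Running the script prints `[['election', 'poll', 'vote'], ['goal', 'match']]`, i.e. two toy topics. A node-cut (vertex-cut) partitioner instead assigns edges to partitions and replicates vertices that span several of them, which would allow a shared word to belong to more than one topic; the component-based stand-in above cannot do that.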

Place, publisher, year, edition, pages
2017. pp. 775-785
Keywords [en]
Dense Weighted Graph Partitioning, Dimensionality Reduction, Distributed Algorithms, Node-cut Graph Partitioning, Online Social Networks, Random Indexing, Topic Detection, Clustering algorithms, Distributed computer systems, Graphic methods, Parallel algorithms, Scalability, Scales (weighing instruments), Social networking (online), Topology, Graph Partitioning, On-line social networks, Weighted graph, Graph theory
Identifiers
URN: urn:nbn:se:ri:diva-30836
DOI: 10.1109/ICDCS.2017.19
Scopus ID: 2-s2.0-85027258993
ISBN: 9781538617915 (print)
OAI: oai:DiVA.org:ri-30836
DiVA, id: diva2:1139350
Conference
37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017, 5 June 2017 through 8 June 2017
Available from: 2017-09-07. Created: 2017-09-07. Last updated: 2023-06-07. Bibliographically approved.

Open Access in DiVA

Full text not available in DiVA


Person

Girdzijauskas, Sarunas
