DeGPar: Large Scale Topic Detection Using Node-Cut Partitioning on Dense Weighted Graphs
2017 (English). In: Proceedings - International Conference on Distributed Computing Systems, 2017, p. 775-785. Conference paper, Published paper (Refereed)
Abstract [en]
Topic Detection (TD) refers to automatic techniques for locating topically related material in web documents. Nowadays, users of Online Social Networks (OSNs) generate massive amounts of documents in the form of very short texts, such as tweets and news snippets. While topic detection, in its traditional form, is applied to a few documents that each contain a lot of information, the problem has now changed to dealing with a massive number of documents that each contain very little information. Traditional solutions thus fall short in either scalability (due to the huge number of input items) or sparsity (due to insufficient information per input item). In this paper we address the scalability problem by introducing an efficient and scalable graph-based algorithm for TD on short texts, leveraging dimensionality reduction and clustering techniques. We first compress the input set of documents into a dense graph, such that frequent co-occurrence patterns in the documents create multiple dense topological areas in the graph. Then, we partition the graph into multiple dense sub-graphs, each representing a topic. We compare the accuracy and scalability of our solution with two state-of-the-art solutions (including the standard LDA and BiTerm). The results on two widely used benchmark datasets show that our algorithm not only maintains similar or better accuracy, but also runs an order of magnitude faster than the state-of-the-art approaches.
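The two-step idea in the abstract (compress documents into a weighted graph, then split it into dense sub-graphs) can be illustrated with a toy sketch. This is not the DeGPar algorithm: it uses neither random indexing nor node-cut partitioning. As a stand-in it assumes plain word co-occurrence counts as edge weights and takes the connected components that remain after dropping light edges. The function names, the `min_weight` threshold, and the sample documents are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_graph(docs):
    """Return {(w1, w2): weight}, counting documents in which both words appear."""
    weights = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        for pair in combinations(words, 2):
            weights[pair] += 1
    return weights


def dense_components(weights, min_weight=2):
    """Keep edges with weight >= min_weight and return the connected components.

    Assumption: thresholding plus connected components is only a simple
    stand-in for a real dense-graph partitioning method.
    """
    adj = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= min_weight:
            adj[a].add(b)
            adj[b].add(a)
    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components


# Hypothetical short texts covering two themes.
docs = [
    "goal match striker",
    "striker goal penalty",
    "match goal striker",
    "election vote senate",
    "senate vote ballot",
    "vote election senate",
]
topics = dense_components(build_cooccurrence_graph(docs))
print(topics)
```

On this sample, words that co-occur in at least two documents group into one cluster per theme, while rare words such as "penalty" and "ballot" are dropped. In this simplification each word lands in at most one cluster.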
Place, publisher, year, edition, pages
2017. p. 775-785
Keywords [en]
Dense Weighted Graph Partitioning, Dimensionality Reduction, Distributed Algorithms, Node-cut Graph Partitioning, Online Social Networks, Random Indexing, Topic Detection, Clustering algorithms, Distributed computer systems, Graphic methods, Parallel algorithms, Scalability, Scales (weighing instruments), Social networking (online), Topology, Graph Partitioning, On-line social networks, Weighted graph, Graph theory
National Category
Natural Sciences
Identifiers
URN: urn:nbn:se:ri:diva-30836
DOI: 10.1109/ICDCS.2017.19
Scopus ID: 2-s2.0-85027258993
ISBN: 9781538617915 (print)
OAI: oai:DiVA.org:ri-30836
DiVA, id: diva2:1139350
Conference
37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017, 5 June 2017 through 8 June 2017
2017-09-07, 2017-09-07, 2023-06-07. Bibliographically approved