GDTM: Graph-based Dynamic Topic Models
KTH Royal Institute of Technology, Sweden.
RISE Research Institutes of Sweden, Digital Systems, Data Science. ORCID iD: 0000-0001-5100-0535
2020 (English). In: Progress in Artificial Intelligence, ISSN 2192-6352, E-ISSN 2192-6360, Vol. 9, p. 195-207. Article in journal (Refereed). Published
Abstract [en]

Dynamic Topic Modeling (DTM) extracts topics from short texts generated in Online Social Networks (OSNs) such as Twitter. A DTM method must be scalable and able to account for the sparsity and dynamicity of short texts. Current solutions combine probabilistic mixture models, such as the Dirichlet Multinomial or the Pitman-Yor Process, with approximate inference approaches, such as Gibbs Sampling and Stochastic Variational Inference, to account for dynamicity and scalability, respectively. However, these methods rely on weak probabilistic language models that do not account for the sparsity of short texts, and their inference is based on iterative optimizations, which raise scalability issues for DTM. We present GDTM, a single-pass graph-based DTM algorithm, to solve the problem. GDTM combines a context-rich, incremental feature representation method with graph partitioning to address scalability and dynamicity, and uses a rich language model to account for sparsity. We run multiple experiments over a large-scale Twitter dataset to analyze the accuracy and scalability of GDTM and compare the results with four state-of-the-art models. GDTM outperforms the best model by 11% in accuracy, runs an order of magnitude faster, and produces four times better topic quality on standard evaluation metrics. © 2020, The Author(s).
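The abstract describes GDTM only at a high level; the full algorithm is in the paper. As a rough illustration of the general idea of graph-based topic extraction (not the authors' actual method, and with a much weaker feature representation than GDTM's), the sketch below builds a word co-occurrence graph in a single pass over short documents and then partitions it with simple label propagation, treating each partition as a topic. All names and details here are illustrative assumptions.

```python
# Illustrative sketch only: single-pass co-occurrence graph + label propagation.
# This is NOT the GDTM algorithm from the paper; the tokenizer, weighting, and
# partitioning scheme are toy assumptions chosen for brevity.
from collections import Counter, defaultdict
from itertools import combinations
import random

def build_graph(docs):
    """Single pass over documents: accumulate word co-occurrence edge weights."""
    graph = defaultdict(Counter)            # word -> {neighbor: weight}
    for doc in docs:
        words = set(doc.lower().split())    # toy tokenizer; real systems do more
        for u, v in combinations(sorted(words), 2):
            graph[u][v] += 1
            graph[v][u] += 1
    return graph

def label_propagation(graph, iterations=10, seed=0):
    """Partition the graph into communities; each community acts as one topic."""
    rng = random.Random(seed)
    labels = {w: w for w in graph}          # start: every word is its own topic
    nodes = list(graph)
    for _ in range(iterations):
        rng.shuffle(nodes)
        for node in nodes:
            # Adopt the label carrying the most total edge weight among neighbors.
            votes = Counter()
            for nbr, weight in graph[node].items():
                votes[labels[nbr]] += weight
            if votes:
                labels[node] = votes.most_common(1)[0][0]
    topics = defaultdict(list)
    for word, lab in labels.items():
        topics[lab].append(word)
    return list(topics.values())

if __name__ == "__main__":
    tweets = [
        "stock market rally today",
        "market crash fears grow",
        "new phone camera review",
        "phone battery life review",
    ]
    for i, topic in enumerate(label_propagation(build_graph(tweets))):
        print(f"topic {i}: {sorted(topic)}")
```

Because the graph is built incrementally, new documents can be folded in without reprocessing the corpus, which is the scalability property the abstract attributes to single-pass, graph-based approaches.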

Place, publisher, year, edition, pages
Springer, 2020. Vol. 9, p. 195-207
Keywords [en]
Dimensionality reduction, Distributional semantics, Graph partitioning, Language modeling, Topic modeling
National Category
Natural Sciences
Identifiers
URN: urn:nbn:se:ri:diva-45108
DOI: 10.1007/s13748-020-00206-2
Scopus ID: 2-s2.0-85085024680
OAI: oai:DiVA.org:ri-45108
DiVA, id: diva2:1443240
Available from: 2020-06-18. Created: 2020-06-18. Last updated: 2024-06-25. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Sahlgren, Magnus
