Publications (10 of 46)
Espinoza, F., Hamfors, O., Karlgren, J., Olsson, F., Persson, P., Hamberg, L. & Sahlgren, M. (2018). Analysis of Open Answers to Survey Questions through Interactive Clustering and Theme Extraction. In: Proceedings of the 2018 Conference on Human Information Interaction & Retrieval (pp. 317-320). New Brunswick, NJ, USA.
Analysis of Open Answers to Survey Questions through Interactive Clustering and Theme Extraction
2018 (English) Conference paper, Published paper (Other academic)
Abstract [en]

This paper describes design principles for and the implementation of Gavagai Explorer, a new application which builds on interactive text clustering to extract themes from topically coherent text sets such as open text answers to surveys or questionnaires. An automated system is quick, consistent, and has full coverage over the study material. Such a system allows an analyst to analyze more answers in a given time period; provides the same initial results regardless of who does the analysis, reducing the risk of inter-rater discrepancy; and does not risk missing responses due to fatigue or boredom. These factors reduce the cost and increase the reliability of the service. The most important feature, however, is relieving the human analyst from the frustrating aspects of the coding task, freeing effort for the central challenge of understanding themes. Gavagai Explorer is available online at http://explorer.gavagai.se
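
The abstract does not spell out Gavagai Explorer's internals. As a rough illustration of the analyst-in-the-loop workflow it describes, the Python sketch below suggests candidate theme terms by how many answers mention them, then assigns answers to analyst-defined themes and reports the answers no theme covers yet. The stopword list, the tokenizer and the function names (`suggest_terms`, `assign_themes`) are hypothetical; this is not the Gavagai implementation, which builds on interactive text clustering.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the sketch.
STOPWORDS = {"the", "a", "an", "and", "or", "is", "was", "it", "to", "of", "i", "my", "very", "too"}


def tokenize(text):
    """Lowercase, keep letter runs, drop stopwords (a naive tokenizer)."""
    return [t for t in re.findall(r"[a-zåäöé]+", text.lower()) if t not in STOPWORDS]


def suggest_terms(answers, top_n=5):
    """Rank candidate theme terms by document frequency across answers."""
    df = Counter()
    for answer in answers:
        df.update(set(tokenize(answer)))
    return [term for term, _ in df.most_common(top_n)]


def assign_themes(answers, themes):
    """Assign each answer to every theme whose terms it mentions.

    `themes` maps a theme name to a set of terms chosen by the analyst.
    Answers matching no theme are returned separately so the analyst can
    inspect them and refine the themes in the next iteration.
    """
    coverage = {name: [] for name in themes}
    unassigned = []
    for answer in answers:
        tokens = set(tokenize(answer))
        hits = [name for name, terms in themes.items() if tokens & terms]
        for name in hits:
            coverage[name].append(answer)
        if not hits:
            unassigned.append(answer)
    return coverage, unassigned
```

Each pass over `unassigned` mimics the iterative refinement step: the analyst adds or merges terms and reruns the assignment until coverage is acceptable.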

Keywords
Information systems → Clustering; Online analytical processing;
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-34879 (URN), 10.1145/3176349.3176892 (DOI), 978-1-4503-4925-3 (ISBN)
Conference
Proceedings of the 2018 Conference on Human Information Interaction & Retrieval. New Brunswick, NJ, USA
Available from: 2018-08-21 Created: 2018-08-21 Last updated: 2018-08-21. Bibliographically approved
Sandin, F., Emruli, B. & Sahlgren, M. (2017). Random indexing of multidimensional data. Knowledge and Information Systems, 52(1), 267-290
Random indexing of multidimensional data
2017 (English) In: Knowledge and Information Systems, ISSN 0219-1377, E-ISSN 0219-3116, Vol. 52, no 1, p. 267-290. Article in journal (Refereed), Published
Abstract [en]

Random indexing (RI) is a lightweight dimension reduction method, which is used, for example, to approximate vector semantic relationships in online natural language processing systems. Here we generalise RI to multidimensional arrays and therefore enable approximation of higher-order statistical relationships in data. The generalised method is a sparse implementation of random projections, which is also the theoretical basis for ordinary RI and other randomisation approaches to dimensionality reduction and data representation. We present numerical experiments which demonstrate that a multidimensional generalisation of RI is feasible, including comparisons with ordinary RI and principal component analysis. The RI method is well suited for online processing of data streams because relationship weights can be updated incrementally in a fixed-size distributed representation, and inner products can be approximated on the fly at low computational cost. An open source implementation of generalised RI is provided. © 2016, The Author(s).
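
A minimal Python sketch of ordinary (one-dimensional) random indexing, not the multidimensional generalisation the article introduces: each word receives a sparse ternary index vector, and its context vector is built incrementally by adding the index vectors of the words in a sliding window, so the representation stays fixed-size as the stream grows. The dimensionality, the number of non-zero entries, the window size and the class name `RandomIndexing` are assumptions chosen for illustration; the authors' open source implementation is the reference for the generalised method.

```python
import math
import random
from collections import defaultdict

DIM = 512      # dimensionality of the fixed-size representation (assumed)
NONZEROS = 8   # number of +1/-1 entries per sparse index vector (assumed)


def make_index_vector(rng):
    """Sparse ternary random vector, stored as {position: +1 or -1}."""
    positions = rng.sample(range(DIM), NONZEROS)
    return {pos: rng.choice((-1, 1)) for pos in positions}


class RandomIndexing:
    """Ordinary random indexing over token sequences (hypothetical helper class)."""

    def __init__(self, window=2, seed=0):
        self.window = window
        self.rng = random.Random(seed)
        self.index = {}                                   # word -> sparse index vector
        self.context = defaultdict(lambda: [0.0] * DIM)   # word -> dense context vector

    def _index_vector(self, word):
        # Index vectors are created lazily and never change afterwards.
        if word not in self.index:
            self.index[word] = make_index_vector(self.rng)
        return self.index[word]

    def update(self, tokens):
        """Add each neighbor's index vector to the token's context vector."""
        for i, word in enumerate(tokens):
            ctx = self.context[word]
            lo, hi = max(0, i - self.window), min(len(tokens), i + self.window + 1)
            for j in range(lo, hi):
                if j != i:
                    for pos, sign in self._index_vector(tokens[j]).items():
                        ctx[pos] += sign

    def similarity(self, a, b):
        """Cosine similarity of two context vectors (0.0 for unseen words)."""
        u, v = self.context.get(a), self.context.get(b)
        if u is None or v is None:
            return 0.0
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        return dot / norm if norm else 0.0
```

Because `update` only adds sparse vectors into existing rows, new text can be folded in at any time without recomputing earlier results, which is the streaming property the abstract highlights.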

Keywords
Data mining, Dimensionality reduction, Natural language processing, Random embeddings, Semantic similarity, Sparse coding, Streaming algorithm
National Category
Natural Sciences
Identifiers
urn:nbn:se:ri:diva-30273 (URN), 10.1007/s10115-016-1012-2 (DOI), 2-s2.0-85001755138 (Scopus ID)
Available from: 2017-08-11 Created: 2017-08-11 Last updated: 2018-08-21. Bibliographically approved
Karlgren, J., Eriksson, G., Täckström, O. & Sahlgren, M. (2010). Between Bags and Trees - Constructional Patterns in Text Used for Attitude Identification (13ed.). Paper presented at ECIR 2010, 32nd European Conference on Information Retrieval, March 28-31, 2010, Milton Keynes, Great Britain.
Between Bags and Trees - Constructional Patterns in Text Used for Attitude Identification
2010 (English) Conference paper, Published paper (Refereed)
Abstract [en]

This paper describes experiments in using non-terminological information to find attitudinal expressions in written English text. The experiments are based on an analysis of text with respect not only to the vocabulary of content terms present in it (which most other approaches use as a basis for analysis) but also to the presence of structural features of the text, represented by constructional features (typically disregarded by most other analyses). In our analysis, following a construction grammar framework, structural features are treated as occurrences, similarly to vocabulary features. The constructional features in play are chosen to potentially signify opinion but are not specific to negative or positive expressions. The framework is used to classify clauses, headlines, and sentences from three different shared collections of attitudinal data. We find that constructional features transfer well across different text collections and that the information couched in them integrates easily with a vocabulary-based approach, yielding improvements in classification without complicating the application end of the processing framework.
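
The key representational idea, treating constructions as occurrences just like words, can be sketched as a single bag of features. The construction detectors below are hypothetical regular expressions chosen to possibly signal opinion without being polarity-specific; they are not the constructional features used in the paper.

```python
import re
from collections import Counter

# Hypothetical constructional patterns (not the paper's feature set): each maps
# a feature name to a regex applied to the lowercased text.
CONSTRUCTIONS = {
    "CX:negation": re.compile(r"\b(?:not|never|no)\b|n't\b"),
    "CX:degree_adverb": re.compile(r"\b(?:very|really|extremely|quite)\b"),
    "CX:contrast": re.compile(r"\b(?:but|however|although)\b"),
    "CX:first_person_verb": re.compile(r"\bi (?:think|feel|believe)\b"),
}


def features(text):
    """One bag in which vocabulary terms and constructions are plain occurrence counts."""
    lowered = text.lower()
    bag = Counter(re.findall(r"[a-z']+", lowered))        # vocabulary features
    for name, pattern in CONSTRUCTIONS.items():          # constructional features
        count = len(pattern.findall(lowered))
        if count:
            bag[name] += count
    return bag
```

Since both feature types share one representation, the bag can be fed to any standard linear classifier (for example after vectorizing it with scikit-learn's `DictVectorizer`) without changing the downstream pipeline.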

Keywords
NLP for IR, Text Categorization, Clustering, Opinion mining, Sentiment Analysis, Constructional features
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-23619 (URN)
Conference
ECIR 2010, 32nd European Conference on Information Retrieval, March 28-31, 2010, Milton Keynes, Great Britain
Projects
Attityd
Note

The original publication will be available at www.springerlink.com

Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2018-08-21. Bibliographically approved
Recchia, G., Jones, M., Sahlgren, M. & Kanerva, P. (2010). Encoding Sequential Information in Vector Space Models of Semantics: Comparing Holographic Reduced Representation and Random Permutation (11ed.). In: Proceedings of the 32nd Annual Cognitive Science Society (pp. 865-870).
Encoding Sequential Information in Vector Space Models of Semantics: Comparing Holographic Reduced Representation and Random Permutation
2010 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Encoding information about the order in which words typically appear has been shown to improve the performance of high-dimensional semantic space models. This requires an encoding operation capable of binding together vectors in an order-sensitive way, and efficient enough to scale to large text corpora. Although both circular convolution and random permutations have been enlisted for this purpose in semantic models, these operations have never been systematically compared. In Experiment 1 we compare their storage capacity and probability of correct retrieval; in Experiments 2 and 3 we compare their performance on semantic tasks when integrated into existing models. We conclude that random permutations are a scalable alternative to circular convolution with several desirable properties.
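
To make the two binding operations concrete, the sketch below implements circular convolution (the HRR binding operation) and random-permutation binding in plain Python. Plain circular convolution is commutative, so binding a with b gives the same vector as binding b with a; permuting one operand before convolving is one simple way to make it order-sensitive and is an assumption of this sketch, as are the vector dimensionality and all helper names. It does not reproduce the paper's experiments.

```python
import math
import random


def random_vector(n, rng):
    """Gaussian vector with variance 1/n per element, as is usual for HRR."""
    return [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]


def circular_convolution(a, b):
    """HRR binding: c[k] = sum_j a[j] * b[(k - j) mod n]; O(n^2) for clarity."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


def permute(v, perm):
    """Random-permutation binding: element i moves to position perm[i]."""
    out = [0.0] * len(v)
    for i, target in enumerate(perm):
        out[target] = v[i]
    return out


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def encode_pair_permutation(first, second, perm):
    """'first then second': permute the following word's vector, then superpose."""
    return [x + y for x, y in zip(first, permute(second, perm))]


def encode_pair_convolution(first, second, perm):
    """Order-sensitive convolution: permute the first operand before binding (assumption)."""
    return circular_convolution(permute(first, perm), second)
```

A permutation costs O(n) per application and needs no arithmetic, whereas convolution needs O(n^2) directly or O(n log n) with an FFT; this cost difference is one intuition for why permutations can scale well, though the paper's conclusions rest on its own experiments.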

National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-23736 (URN)
Conference
Proceedings of the 32nd Annual Cognitive Science Society
Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2018-08-21. Bibliographically approved
Sahlgren, M. & Knutsson, O. (2010). Workshop on Extracting and Using Constructions in Computational Linguistics (7ed.). Los Angeles, California, USA: ACL
Workshop on Extracting and Using Constructions in Computational Linguistics
2010 (English) Book (Refereed)
Place, publisher, year, edition, pages
Los Angeles, California, USA: ACL, 2010. Edition: 7
Series
NAACL HLT 2010
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-23780 (URN)
Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2018-08-21. Bibliographically approved
Sahlgren, M. & Knutsson, O. (2009). Proceedings of the workshop on extracting and using constructions in NLP (1ed.). Kista, Sweden: Swedish Institute of Computer Science
Proceedings of the workshop on extracting and using constructions in NLP
2009 (English) Report (Other academic)
Abstract [en]

This is a collection of papers presented at the Nodalida 2009 workshop on extracting and using constructions in NLP.

Place, publisher, year, edition, pages
Kista, Sweden: Swedish Institute of Computer Science, 2009. p. 37. Edition: 1
Series
SICS Technical Report, ISSN 1100-3154 ; 2009:10
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-23526 (URN)
Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2018-08-21. Bibliographically approved
Sahlgren, M. & Karlgren, J. (2009). Terminology mining in social media (1ed.). Paper presented at The 18th ACM Conference on Information and Knowledge Management (CIKM 2009), 2-5 November 2009, Hong Kong.
Terminology mining in social media
2009 (English) Conference paper, Published paper (Refereed)
Abstract [en]

The highly variable and dynamic word usage in social media presents serious challenges for both research and those commercial applications that are geared towards blogs or other user-generated non-editorial texts. This paper discusses and exemplifies a terminology mining approach for dealing with the productive character of the textual environment in social media. We explore the challenges of practically acquiring new terminology, and of modeling similarity and relatedness of terms from observing realistic amounts of data. We also discuss semantic evolution and density, and investigate novel measures for characterizing the preconditions for terminology mining.

Publisher
p. 10
Keywords
Word Space, Distributional Semantics, Random Indexing, Terminology Mining, Social Media, Language Technology, Linguistics
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-23546 (URN)
Conference
The 18th ACM Conference on Information and Knowledge Management (CIKM 2009), 2-5 November 2009, Hong Kong
Projects
Attityd
Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2018-08-21. Bibliographically approved
Täckström, O., Bergh, C., Sahlgren, M., Sjölinder, M., Södersten, P. & Zandian, M. (2008). An Embodied question answering system for use in the treatment of eating disorders (1ed.). In: Proceedings of The 4th International Workshop on Human-Computer Conversation. Paper presented at The Fourth International Workshop on Human-Computer Conversation, 6-7 October 2008, Bellagio, Italy.
An Embodied question answering system for use in the treatment of eating disorders
2008 (English) In: Proceedings of The 4th International Workshop on Human-Computer Conversation, 2008, 1, p. 4. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents work in progress on implementing an embodied question answering system, Dr. Cecilia, in the form of a virtual caregiver, for use in the treatment of eating disorders. The rationale for the system is grounded in one of the few effective treatments for anorexia and bulimia nervosa. The questions and answers database is encoded using natural language, and is easily updatable by human caregivers without any technical expertise. Matching of users' questions with database entries is performed using a weighted and normalized n-gram similarity function. In this paper we give a comprehensive background to and an overview of the system, with a focus on aspects pertaining to natural language processing and user interaction. The system is currently only implemented for Swedish.
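
The abstract names a weighted and normalized n-gram similarity function without giving its exact form. The Python sketch below is one plausible instance under stated assumptions: word n-grams for n = 1 to 3, weight n for n-grams of length n (so longer shared phrases count more), and a Dice coefficient per n. All names, weights and the English example data are hypothetical; the system itself is implemented for Swedish.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of word n-grams of length n."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_similarity(query, entry, max_n=3):
    """Weighted, normalized n-gram overlap in [0, 1] (assumed form, see lead-in)."""
    q, e = query.lower().split(), entry.lower().split()
    total, weight_sum = 0.0, 0.0
    for n in range(1, max_n + 1):
        gq, ge = ngrams(q, n), ngrams(e, n)
        size = sum(gq.values()) + sum(ge.values())
        if size == 0:
            continue                                  # neither text has n-grams this long
        overlap = sum((gq & ge).values())             # multiset intersection
        total += n * (2.0 * overlap / size)           # Dice coefficient, weighted by n
        weight_sum += n
    return total / weight_sum if weight_sum else 0.0


def best_match(query, database):
    """Return the (question, answer) pair whose question best matches the query."""
    return max(database, key=lambda qa: ngram_similarity(query, qa[0]))
```

Normalizing per n keeps scores comparable between short and long questions, so caregivers can add entries of any length without retuning a threshold.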

Publisher
p. 4
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-22914 (URN)
Conference
The Fourth International Workshop on Human-Computer Conversation, 6-7 October 2008, Bellagio, Italy
Projects
KARMA
Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2018-08-21. Bibliographically approved
Sahlgren, M. & Karlgren, J. (2008). Buzz monitoring in word space (1ed.). Paper presented at European Conference on Intelligence and Security Informatics (EuroISI 2008), 3-5 December 2008, Esbjerg, Denmark.
Buzz monitoring in word space
2008 (English) Conference paper, Published paper (Refereed)
Abstract [en]

This paper discusses the task of tracking mentions of some topically interesting textual entity in a continuously and dynamically changing flow of text, such as a news feed, the output from an Internet crawler or a similar text source, a task sometimes referred to as buzz monitoring. Standard approaches from the field of information access for identifying salient textual entities are reviewed, and it is argued that the dynamics of buzz monitoring call for more accomplished analysis mechanisms than typical text analysis tools provide today. The notion of word space is introduced, and it is argued that word spaces can be used to select the most salient markers for topicality and to find the associations those observations engender, and that they constitute an attractive foundation for building a representation well suited for tracking and monitoring mentions of the entity under consideration.
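
A minimal sketch of the monitoring loop described above: each batch from the text stream updates a profile of words seen near the target entity, and older evidence fades by a decay factor so the profile follows the changing flow of text. The decay mechanism, window size and class name `BuzzMonitor` are assumptions for illustration; the paper argues for word spaces in general rather than prescribing this update rule.

```python
from collections import Counter


class BuzzMonitor:
    """Track words co-occurring with a target entity in a text stream (hypothetical)."""

    def __init__(self, target, window=3, decay=0.5):
        self.target = target.lower()
        self.window = window
        self.decay = decay                 # weight kept by old evidence per batch (assumed)
        self.associations = Counter()

    def process_batch(self, texts):
        # Fade old evidence first, so recent mentions dominate the profile.
        for word in list(self.associations):
            self.associations[word] *= self.decay
        for text in texts:
            tokens = text.lower().split()
            for i, token in enumerate(tokens):
                if token != self.target:
                    continue
                lo, hi = max(0, i - self.window), min(len(tokens), i + self.window + 1)
                for j in range(lo, hi):
                    if j != i:
                        self.associations[tokens[j]] += 1.0

    def top_associations(self, k=3):
        """Currently most salient associates of the target entity."""
        return self.associations.most_common(k)
```

In the usage pattern assumed here, a sudden shift in `top_associations` (for example from product-launch vocabulary to recall vocabulary) is the kind of change a buzz monitor would surface to an analyst.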

National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-22912 (URN)
Conference
European Conference on Intelligence and Security Informatics (EuroISI 2008), 3-5 December 2008, Esbjerg, Denmark
Projects
Attityd
Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2018-08-21. Bibliographically approved
Karlgren, J., Holst, A. & Sahlgren, M. (2008). Filaments of Meaning in Word Space (1ed.). Paper presented at European Conference on Information Retrieval, 30 March - 3 April 2008, Glasgow, Scotland.
Filaments of Meaning in Word Space
2008 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Word space models, in the sense of vector space models built on distributional data taken from texts, are used to model semantic relations between words. We argue that the high dimensionality of typical vector space models leads to unintuitive effects when modeling likeness of meaning, and that the local structure of word spaces is where interesting semantic relations reside. We show that the local structure of word spaces has substantially different dimensionality and character than the global space, and that this structure shows potential to be exploited for further semantic analysis using methods for local analysis of vector space structure, rather than the globally scoped methods typically in use today, such as singular value decomposition or principal component analysis.
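
One way to probe the claim that local neighborhoods have a different dimensionality from the global space is to estimate intrinsic dimensionality on a local sample. The sketch below swaps in the TwoNN maximum-likelihood estimator, which is not the method used in the paper: for each point the ratio of second- to first-nearest-neighbor distance is Pareto-distributed with shape equal to the intrinsic dimension d under local uniformity, giving d = N / sum(log ratio). The function names and the synthetic test data are hypothetical.

```python
import math
import random


def twonn_dimension(points):
    """TwoNN estimate of intrinsic dimensionality (brute-force O(N^2) distances)."""
    logs = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0:                          # skip exact duplicates
            logs.append(math.log(r2 / r1))
    total = sum(logs)
    if total == 0:
        raise ValueError("degenerate sample: all neighbor ratios are 1")
    return len(logs) / total


def sample_plane(n, ambient_dim=10, seed=0):
    """Synthetic check: n points on a 2-D plane embedded in ambient_dim dimensions."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        u, v = rng.random(), rng.random()
        points.append([u, v, u + v] + [0.0] * (ambient_dim - 3))
    return points
```

On the synthetic plane the estimate lands near 2 regardless of the ambient dimension. Applied to the k nearest neighbors of a word vector and then to a random sample of the whole space, the two estimates could be compared; the choice of k is left open here.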

Publisher
p. 8
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-22250 (URN)
Conference
European Conference on Information Retrieval, 30 March - 3 April 2008, Glasgow, Scotland
Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2018-08-21. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-5100-0535
