  • 1. Argaw, Atelach Alemu
    et al.
    Asker, Lars
    Cöster, Rickard
    RISE, Swedish ICT, SICS.
    Karlgren, Jussi
    RISE, Swedish ICT, SICS.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Dictionary-based Amharic-French Information Retrieval. 2006. Conference paper (Refereed)
  • 2. Bigert, Johnny
    et al.
    Sjöbergh, Jonas
    Knutsson, Ola
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Unsupervised Evaluation of Parser Robustness. 2005. Conference paper (Refereed)
  • 3.
    Boman, Magnus
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS. KTH Royal Institute of Technology, Sweden.
    Ben Abdesslem, Fehmi
    RISE - Research Institutes of Sweden, ICT, SICS.
    Forsell, Erik
    Karolinska Institute, Sweden; Stockholm County Council, Sweden.
    Gillblad, Daniel
    RISE - Research Institutes of Sweden, ICT, SICS.
    Görnerup, Olof
    RISE - Research Institutes of Sweden, ICT, SICS.
    Isacsson, Nils
    Karolinska Institute, Sweden; Stockholm County Council, Sweden.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Kaldo, Viktor
    Karolinska Institute, Sweden; Stockholm County Council, Sweden; Linnaeus University, Sweden.
    Learning machines in Internet-delivered psychological treatment. 2019. In: Progress in Artificial Intelligence, ISSN 2192-6352. Article in journal (Refereed)
    Abstract [en]

    A learning machine, in the form of a gating network that governs a finite number of different machine learning methods, is described at the conceptual level with examples of concrete prediction subtasks. A historical data set with data from over 5000 patients in Internet-based psychological treatment will be used to equip healthcare staff with decision support for questions pertaining to ongoing and future cases in clinical care for depression, social anxiety, and panic disorder. The organizational knowledge graph is used to inform the weight adjustment of the gating network and for routing subtasks to the different methods employed locally for prediction. The result is an operational model for assisting therapists in their clinical work, about to be subjected to validation in a clinical trial.

  • 4.
    Cöster, Rickard
    et al.
    RISE, Swedish ICT, SICS.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Karlgren, Jussi
    RISE, Swedish ICT, SICS.
    Selective compound splitting of Swedish queries for boolean combination of truncated terms. 2003. Conference paper (Refereed)
    Abstract [en]

    In compounding languages such as Swedish, it is often necessary to split compound words when indexing documents or queries. One of the problems is that it is difficult to find constituents that express a concept similar to that expressed by the compound. The approach taken here is to expand a query with the leading constituents of the compound words. Every query term is truncated so as to increase recall by hopefully finding other compounds with the leading constituent as prefix. This approach increases recall in a rather uncontrolled way, so we use a Boolean quorum-level type of search to rank documents both according to a tf-idf factor and according to the number of matching Boolean combinations. The Boolean combinations performed relatively well, taking into consideration that the queries were very short (maximum five search terms). Also included in this paper are the results of two other methods we are currently working on in our lab: one for re-ranking search results on the basis of stylistic analysis of documents, and one for dimensionality reduction using Random Indexing.

  • 5.
    Espinoza, Fredrik
    et al.
    Gavagai, Sweden.
    Hamfors, Ola
    Gavagai, Sweden.
    Karlgren, Jussi
    Gavagai, Sweden.
    Olsson, Fredrik
    Gavagai, Sweden.
    Persson, Per
    Gavagai, Sweden.
    Hamberg, Lars
    Gavagai, Sweden.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS. Gavagai, Sweden.
    Analysis of Open Answers to Survey Questions through Interactive Clustering and Theme Extraction. 2018. Conference paper (Other academic)
    Abstract [en]

    This paper describes design principles for and the implementation of Gavagai Explorer, a new application which builds on interactive text clustering to extract themes from topically coherent text sets such as open text answers to surveys or questionnaires. An automated system is quick, consistent, and has full coverage over the study material. A system allows an analyst to analyze more answers in a given time period; provides the same initial results regardless of who does the analysis, reducing the risks of interrater discrepancy; and does not risk missing responses due to fatigue or boredom. These factors reduce the cost and increase the reliability of the service. The most important feature, however, is relieving the human analyst from the frustrating aspects of the coding task, freeing the effort to the central challenge of understanding themes. Gavagai Explorer is available on-line at http://explorer.gavagai.se

  • 6.
    Gambäck, Björn
    et al.
    RISE, Swedish ICT, SICS.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Argaw, Atelach Alemu
    Asker, Lars
    Applying machine learning to Amharic text classification. 2006. Conference paper (Refereed)
  • 7.
    Gambäck, Björn
    et al.
    RISE, Swedish ICT, SICS.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Hansen, Preben
    RISE, Swedish ICT, SICS.
    A spoken Swedish e-mail interface. 2003. In: Proceedings of the 14th Nordic Conference of Computational Linguistics, 2003, 2. Conference paper (Refereed)
    Abstract [en]

    The paper describes the Swedish involvement in the EU project DUMAS (Dynamic Universal Mobility for Adaptive Speech Interfaces), a project which aims at developing multilingual speech-based applications, and more specifically, investigating adaptive multilingual interaction techniques to handle both spoken and text input and to provide coordinated linguistic responses to the user. The project has a clear focus on Northern Europe with two of the eight partners coming from Sweden and four from Finland; and the languages we aim at treating are English, Swedish and Finnish. We will construct an agent-based generic framework for multilingual speech applications, supporting adaptivity to both the individual user and the particular domain. Applications based on the general architecture will benefit from the advantages of fault-tolerant semantic analysis, which combined with the dialogue management routines will handle user interaction in a very robust manner. As an initial such application, we are building a mobile phone-based e-mail interface that will deal with multilingual issues in several forms and environments, and whose functionality can be adapted to different users, different situations and tasks. Such a system produces speech output only (in the form of spoken responses and read e-mails) to the user, but gets two types of input: user speech and textual e-mail messages. It must be able to distinguish between languages, both in e-mails and in the user utterances. The contents of a user's inbox must be continuously analysed in order to enable advanced search functions.

  • 8.
    Gyllensten, Amaru Cuba
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Distributional term set expansion. 2018. In: LREC 2018 - 11th International Conference on Language Resources and Evaluation, 2018, p. 2554-2558. Conference paper (Refereed)
    Abstract [en]

    This paper is a short empirical study of the performance of centrality and classification based iterative term set expansion methods for distributional semantic models. Iterative term set expansion is an interactive process using distributional semantics models where a user labels terms as belonging to some sought after term set, and a system uses this labeling to supply the user with new candidate terms to label, trying to maximize the number of positive examples found. While centrality based methods have a long history in term set expansion (Sarmento et al., 2007; Pantel et al., 2009), we compare them to classification methods based on the Simple Margin method, an Active Learning approach to classification using Support Vector Machines (Tong and Koller, 2002). Examining the performance of various centrality and classification based methods for a variety of distributional models over five different term sets, we can show that active learning based methods consistently outperform centrality based methods.

  • 9.
    Gyllensten, Amaru Cuba
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Measuring Issue Ownership using Word Embeddings. 2018. Conference paper (Other academic)
    Abstract [en]

    Sentiment and topic analysis are common methods used for social media monitoring. Essentially, these methods answer questions such as, “what is being talked about, regarding X”, and “what do people feel, regarding X”. In this paper, we investigate another venue for social media monitoring, namely issue ownership and agenda setting, which are concepts from political science that have been used to explain voter choice and electoral outcomes. We argue that issue alignment and agenda setting can be seen as a kind of semantic source similarity of the kind “how similar is source A to issue owner P, when talking about issue X”, and as such can be measured using word/document embedding techniques. We present work in progress towards measuring that kind of conditioned similarity, and introduce a new notion of similarity for predictive embeddings. We then test this method by measuring the similarity between politically aligned media and political parties, conditioned on bloc-specific issues.

  • 10.
    Hansen, Preben
    et al.
    RISE, Swedish ICT, SICS.
    Karlgren, Jussi
    RISE, Swedish ICT, SICS.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Cooperation, bookmarking, and thesaurus in interactive bilingual question answering. 2004. In: Multilingual Information Access for Text, Speech and Images (5th Workshop of the Cross-Language Evaluation Forum, CLEF 2004, Bath, UK, September 15-17, 2004, Revised Selected Papers), Springer, 2004, 1, p. 5, p. 343-347. Chapter in book (Refereed)
    Abstract [en]

    The study presented involves several different contextual aspects and is the latest in a continuing series of exploratory experiments on information access behaviour in a multi-lingual context [1, 2]. This year’s interactive cross-lingual information access experiment was designed to measure three parameters we expected would affect the performance of users in cross-lingual tasks in languages in which the users are less than fluent. Firstly, introducing new technology, we measure the effect of topic-tailored term expansion on query formulation. Secondly, introducing a new component in the interactive interface, we investigate - without measuring by using a control group - the effect of a bookmark panel on user confidence in the reported result. Thirdly, we ran subjects pair-wise and allowed them to communicate verbally, to investigate how people may cooperate and collaborate with a partner during a search session performing a similar but non-identical search task.

  • 11. Holmlund, Jon
    et al.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Karlgren, Jussi
    RISE, Swedish ICT, SICS.
    Creating Bilingual Lexica Using Reference Wordlists for Alignment of Monolingual Semantic Vector Spaces. 2005. Conference paper (Refereed)
    Abstract [en]

    This paper proposes a novel method for automatically acquiring multi-lingual lexica from non-parallel data and reports some initial experiments to prove the viability of the approach. Using established techniques for building mono-lingual vector spaces, two independent semantic vector spaces are built from textual data. These vector spaces are related to each other using a small reference word list of manually chosen reference points taken from available bi-lingual dictionaries. Other words can then be related to these reference points first in the one language and then in the other. In the present experiments, we apply the proposed method to comparable but non-parallel English-German data. The resulting bi-lingual lexicon is evaluated using an online English-German lexicon as gold standard. The results clearly demonstrate the viability of the proposed methodology.

  • 12.
    Holst, Anders
    et al.
    RISE, Swedish ICT, SICS, Decisions, Networks and Analytics lab.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Dispersing the conceptual confusion. 2001. Conference paper (Refereed)
    Abstract [en]

    In few subjects is it as easy to talk past each other as when discussing consciousness. Not only is the subject elusive and everyone has their own opinion of what it is all about; different people also make quite different use of words and language when discussing consciousness. This contribution tries to exemplify some common misunderstandings between people with different starting points and different use of language. The suggestion is that 'the problem of consciousness' is after all quite similar to all of us, although this is muddled by the way we talk about it, and the way we have locked ourselves into our different slogans and world views.

  • 13.
    Kanerva, Pentti
    et al.
    RISE, Swedish ICT, SICS.
    Sjödin, Gunnar
    RISE, Swedish ICT, SICS.
    Kristoferson, Jan
    RISE, Swedish ICT, SICS.
    Karlsson, R.
    Levin, Björn
    RISE - Research Institutes of Sweden, ICT, SICS.
    Holst, Anders
    RISE, Swedish ICT, SICS, Decisions, Networks and Analytics lab.
    Karlgren, Jussi
    RISE, Swedish ICT, SICS.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Computing with large random patterns. 2001. In: Foundations of Real-World Intelligence, Stanford, California: CSLI Publications, 2001, 1, p. 251-311. Chapter in book (Refereed)
    Abstract [en]

    We describe a style of computing that differs from traditional numeric and symbolic computing and is suited for modeling neural networks. We focus on one aspect of "neurocomputing," namely, computing with large random patterns, or high-dimensional random vectors, and ask what kind of computing they perform and whether they can help us understand how the brain processes information and how the mind works. Rapidly developing hardware technology will soon be able to produce the massive circuits that this style of computing requires. This chapter develops a theory on which the computing could be based.

  • 14.
    Karlgren, Jussi
    et al.
    RISE, Swedish ICT, SICS.
    Eriksson, Gunnar
    RISE, Swedish ICT, SICS.
    Täckström, Oscar
    RISE, Swedish ICT, SICS.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Between Bags and Trees - Constructional Patterns in Text Used for Attitude Identification. 2010. Conference paper (Refereed)
    Abstract [en]

    This paper describes experiments to use non-terminological information to find attitudinal expressions in written English text. The experiments are based on an analysis of text with respect to not only the vocabulary of content terms present in it (which most other approaches use as a basis for analysis) but also with respect to presence of structural features of the text represented by constructional features (typically disregarded by most other analyses). In our analysis, following a construction grammar framework, structural features are treated as occurrences, similarly to the treatment of vocabulary features. The constructional features in play are chosen to potentially signify opinion but are not specific to negative or positive expressions. The framework is used to classify clauses, headlines, and sentences from three different shared collections of attitudinal data. We find that constructional features transfer well across different text collections and that the information couched in them integrates easily with a vocabulary based approach, yielding improvements in classification without complicating the application end of the processing framework.

  • 15.
    Karlgren, Jussi
    et al.
    RISE, Swedish ICT, SICS.
    Holst, Anders
    RISE, Swedish ICT, SICS, Decisions, Networks and Analytics lab.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Filaments of Meaning in Word Space. 2008. Conference paper (Refereed)
    Abstract [en]

    Word space models, in the sense of vector space models built on distributional data taken from texts, are used to model semantic relations between words. We argue that the high dimensionality of typical vector space models leads to unintuitive effects on modeling likeness of meaning and that the local structure of word spaces is where interesting semantic relations reside. We show that the local structure of word spaces has substantially different dimensionality and character than the global space and that this structure shows potential to be exploited for further semantic analysis using methods for local analysis of vector space structure rather than globally scoped methods typically in use today such as singular value decomposition or principal component analysis.

  • 16.
    Karlgren, Jussi
    et al.
    RISE, Swedish ICT, SICS.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    From Words to Understanding. 2001. In: Foundations of Real-World Intelligence, Stanford, California: CSLI Publications, 2001, 1, p. 294-308. Chapter in book (Refereed)
  • 17.
    Karlgren, Jussi
    et al.
    RISE, Swedish ICT, SICS.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Vector-based semantic analysis using random indexing and morphological analysis for cross-lingual information retrieval. 2002. In: Revised Papers from the Second Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems, Darmstadt, Germany, September 3 - 4, 2001, Springer-Verlag, 2002, 1, p. 169-176. Chapter in book (Refereed)
    Abstract [en]

    Meaning, the main object of study in information access, is most decidedly situation-dependent. While much of meaning appears to achieve consistency across usage situations (a term will seem to mean much the same thing in many of its contexts), most everything can be negotiated on the go. Human processing appears to be flexible in this respect, and oriented towards learning from prototypes rather than learning by definition: learning new words, and adding new meanings or shades of meaning to an existing word, does not need a formal re-training process. We have built a query expansion and translation tool for information retrieval systems. When used in one single language it will expand the terms of a query using a thesaurus built for that purpose; when used across languages it will provide numerous translations and near translations for the source language terms. The underlying technology we are testing is that of vector-based semantic analysis, an analysis method related to latent semantic indexing based on stochastic pattern computing. This paper will briefly describe how we acquired training data, aligned it, analyzed it using morphological analysis tools, and finally built a thesaurus using the data, but will concentrate on an overview of vector-based semantic analysis and how stochastic pattern computing differs from latent semantic indexing in its current form.

  • 18.
    Karlgren, Jussi
    et al.
    RISE, Swedish ICT, SICS.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Cöster, Rickard
    RISE, Swedish ICT, SICS.
    Weighting Query Terms Based on Distributional Statistics. 2006. In: Accessing Multilingual Information Repositories, 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005, Vienna, Austria, 21-23 September, 2005: Revised Papers, 2006, 1, p. 5. Conference paper (Refereed)
    Abstract [en]

    This year, the SICS team has concentrated on query processing and on the internal topical structure of the query, specifically compound translation. Compound translation is non-trivial due to dependencies between compound elements. This year, we have investigated topical dependencies between query terms: if a query term happens to be non-topical or noise, it should be discarded or given a low weight when ranking retrieved documents; if a query term shows high topicality its weight should be boosted. The two experiments described here are based on the analysis of the distributional character of query terms: one using similarity of occurrence context between query terms globally across the entire collection; the other using the likelihood of individual terms to appear topically in individual texts. Both boosting schemes tested, which are complementary, delivered improved results.

  • 19.
    Karlgren, Jussi
    et al.
    RISE, Swedish ICT, SICS.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Järvinen, Timo
    Cöster, Rickard
    RISE, Swedish ICT, SICS.
    Dynamic lexica for query translation. 2005. In: Multilingual Information Access for Text, Speech and Images, Third Workshop of the Cross-Language Evaluation Forum (CLEF), 2005, 1. Conference paper (Refereed)
    Abstract [en]

    This experiment tests a simple, scalable, and effective approach to building a domain-specific translation lexicon using distributional statistics over parallelized bilingual corpora. A bilingual lexicon is extracted from aligned Swedish-French data and used to translate CLEF topics from Swedish to French; the resulting French queries are then in turn used to retrieve documents from the French language CLEF collection. The results give 34 of 50 queries on or above median for the "precision at 1000 documents" recall oriented score, with many of the errors possible to handle by the use of string-matching and cognate search. We conclude that the approach presented here is a simple and efficient component in an automatic query translation system.

  • 20.
    Kucher, Kostiantyn
    et al.
    Linnaeus University, Sweden.
    Paradis, Carita
    Lund University, Sweden.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Kerren, Andreas
    Linnaeus University, Sweden.
    Active learning and visual analytics for stance classification with ALVA. 2017. In: ACM Transactions on Interactive Intelligent Systems, ISSN 2160-6455, E-ISSN 1084-6654, Vol. 7, no 3, article id 14. Article in journal (Refereed)
    Abstract [en]

    The automatic detection and classification of stance (e.g., certainty or agreement) in text data using natural language processing and machine-learning methods creates an opportunity to gain insight into the speakers' attitudes toward their own and other people's utterances. However, identifying stance in text presents many challenges related to training data collection and classifier training. To facilitate the entire process of training a stance classifier, we propose a visual analytics approach, called ALVA, for text data annotation and visualization. ALVA's interplay with the stance classifier follows an active learning strategy to select suitable candidate utterances for manual annotation. Our approach supports annotation process management and provides the annotators with a clean user interface for labeling utterances with multiple stance categories. ALVA also contains a visualization method to help analysts of the annotation and training process gain a better understanding of the categories used by the annotators. The visualization uses a novel visual representation, called CatCombos, which groups individual annotation items by the combination of stance categories. Additionally, our system makes a visualization of a vector space model available that is itself based on utterances. ALVA is already being used by our domain experts in linguistics and computational linguistics to improve the understanding of stance phenomena and to build a stance classifier for applications such as social media monitoring.

  • 21. Moscoso del Prado Martin, Fermin
    et al.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    An integration of vector-based semantic analysis and simple recurrent networks for the automatic acquisition of lexical representations from unlabeled corpora. 2002. Conference paper (Refereed)
    Abstract [en]

    This study presents an integration of Simple Recurrent Networks to extract grammatical knowledge and Vector-Based Semantic Analysis to acquire semantic information from large corpora. Starting from a large, untagged sample of English text, we use Simple Recurrent Networks to extract morpho-syntactic vectors in an unsupervised way. These vectors are then used in place of random vectors to perform Vector-Based Semantic Analysis. In this way, we obtain rich lexical representations in the form of high-dimensional vectors that integrate morpho-syntactic and semantic information about words. Apart from incorporating data from the different levels, we argue how these vectors can be used to account for the particularities of each different word token of a given word type. The amount of lexical knowledge acquired by the technique is evaluated both by statistical analyses comparing the information contained in the vectors with existing 'hand-crafted' lexical resources such as CELEX and WordNet, and by performance in language proficiency tests. We conclude by outlining the cognitive implications of this model and its potential use in the bootstrapping of lexical resources.

  • 22.
    Olsson, Fredrik
    et al.
    RISE, Swedish ICT, SICS.
    Karlgren, Jussi
    RISE, Swedish ICT, SICS.
    Hansen, Preben
    RISE, Swedish ICT, SICS.
    Svensson, Martin
    Cöster, Rickard
    RISE, Swedish ICT, SICS.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Consensus and opinions; quality and churn. 2006. Conference paper (Refereed)
    Abstract [en]

    The role of the web user is under transformation from merely being an information consumer to also being a content provider, "from information age to participation age", in the words of Sun CEO Scott McNealy. This increase in participation is most obviously manifested by the growth of online communities, weblogs (blogs), and various forms of cooperative and participatory publication of information. One main factor in the shift towards participation is the advent of authoring tools for wikipedias and blogs. Such tools have decreased the threshold for publishing material online considerably: it is no longer necessary to have knowledge about the technical workings of the web to be able to use it for making information available to a massive number of potential readers. (Although the lion's share of information produced will probably remain in text form in the foreseeable future, it should be noted that other modalities, such as podcasts, screencasts, films and images, are increasingly attracting interest.) The dynamic nature of blogs and wikipedias poses new challenges to the field of information access and refinement; new theories, methods, and tools for alleviating the burden of digesting information on behalf of the readers are clearly needed. This paper presents some issues on readership and participation we are currently considering.

  • 23. Recchia, Gabriel
    et al.
    Jones, Michael
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Kanerva, Pentti
    Encoding Sequential Information in Vector Space Models of Semantics: Comparing Holographic Reduced Representation and Random Permutation. 2010. Conference paper (Refereed)
    Abstract [en]

    Encoding information about the order in which words typically appear has been shown to improve the performance of high-dimensional semantic space models. This requires an encoding operation capable of binding together vectors in an order-sensitive way, and efficient enough to scale to large text corpora. Although both circular convolution and random permutations have been enlisted for this purpose in semantic models, these operations have never been systematically compared. In Experiment 1 we compare their storage capacity and probability of correct retrieval; in Experiments 2 and 3 we compare their performance on semantic tasks when integrated into existing models. We conclude that random permutations are a scalable alternative to circular convolution with several desirable properties.

  • 24.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    An Introduction to Random Indexing. 2005. Conference paper (Refereed)
  • 25.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Automatic bilingual lexicon acquisition using random indexing of aligned bilingual data. 2004. Conference paper (Refereed)
    Abstract [en]

    This paper presents a very simple and effective approach to automatic bilingual lexicon acquisition. The approach is cooccurrence-based, and uses the Random Indexing vector space methodology applied to aligned bilingual data. The approach is simple, efficient and scalable, and generates promising results when compared to a manually compiled lexicon. The paper also discusses some of the methodological problems with the preferred evaluation procedure.

  • 26.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Concept-based text representations for categorization problems. 2006. In: ERCIM News, no 64. Article in journal (Refereed)
  • 27.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Content-based adaptivity in multilingual dialogue systems. 2003. Conference paper (Refereed)
  • 28.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Random indexing of linguistic units for vector-based semantic analysis. 2002. In: ERCIM News, ISSN 0926-4981, E-ISSN 1564-0094, no 50. Article in journal (Other (popular science, discussion, etc.))
    Abstract [en]

    The Stochastic Pattern Computing project at SICS studied the mathematical foundations of humanlike flexible information processing methods that compute with high-dimensional random vectors. The project ended in 2001 and led to the development of the Random Indexing technique for acquiring and representing semantic information about linguistic units.

  • 29.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Representing word meanings based on random labels2001Conference paper (Refereed)
  • 30.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    The Distributional Hypothesis2008In: Italian Journal of Linguistics, ISSN 1120-2726, E-ISSN 2036-590X, Vol. 20, p. 33-53Article in journal (Refereed)
  • 31.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    The Word-Space Model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces2006Doctoral thesis, monograph (Other academic)
  • 32.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Towards a flexible model of word meaning2002Conference paper (Refereed)
    Abstract [en]

    We would like to build a model of semantic knowledge that has the capacity to acquire and represent semantic information that is ambiguous, vague and incomplete. Furthermore, the model should be able to acquire this knowledge in an unsupervised fashion from unstructured text data. Such a model needs to be both highly adaptive and very robust. In this submission, we will first try to identify some fundamental principles that a flexible model of word meaning must adhere to, and then present a possible implementation of these principles in a technique we call Random Indexing. We will also discuss current limitations of the technique and set the direction for future research.

  • 33.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Towards pertinent evaluation methodologies for word-space models2006Conference paper (Refereed)
  • 34.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Vector-based semantic analysis: representing word meanings based on random labels2001Conference paper (Refereed)
    Abstract [en]

    Vector-based semantic analysis is the practice of representing word meanings as semantic vectors, calculated from the co-occurrence statistics of words in large text data. This paper discusses the theoretical presumptions behind this practice, and a representational scheme based on the Distributional Hypothesis is identified as the rationale for vector-based semantic analysis. A new method for calculating semantic word vectors is then described. The method uses random labelling of words in narrow context windows to calculate semantic context vectors for each word type in the text data. The method is evaluated with a standardised synonym test, and it is shown that incorporating linguistic information in the context vectors can enhance the results.

  • 35.
    Sahlgren, Magnus
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Cöster, Rickard
    Using bag-of-concepts to improve the performance of support vector machines in text categorization2004Conference paper (Refereed)
    Abstract [en]

    This paper investigates the use of concept-based representations for text categorization. We introduce a new approach to create concept-based text representations, and apply it to a standard text categorization collection. The representations are used as input to a Support Vector Machine classifier, and the results show that there are certain categories for which concept-based representations constitute a viable supplement to word-based ones. We also demonstrate how the performance of the Support Vector Machine can be improved by combining representations.

  • 36.
    Sahlgren, Magnus
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Hansen, Preben
    RISE, Swedish ICT, SICS.
    Karlgren, Jussi
    RISE, Swedish ICT, SICS.
    English-Japanese cross-lingual query expansion using random indexing of aligned bilingual text data2002Conference paper (Refereed)
    Abstract [en]

    Vector-space techniques can be used for extracting semantically similar words from the co-occurrence statistics of words in large text data. In this paper, we report on experiments with using the Random Indexing vector-space technique for extracting a cross-lingual thesaurus from aligned English-Japanese bilingual data. The cross-lingual thesaurus has been used for automatic cross-lingual query expansion in the NTCIR patent retrieval task.

  • 37.
    Sahlgren, Magnus
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Holst, Anders
    RISE, Swedish ICT, SICS, Decisions, Networks and Analytics lab.
    Kanerva, Pentti
    Permutations as a means to encode order in word space2008Conference paper (Refereed)
    Abstract [en]

    We show that sequence information can be encoded into high-dimensional fixed-width vectors using permutations of coordinates. Computational models of language often represent words with high-dimensional semantic vectors compiled from word-use statistics. A word's semantic vector usually encodes the contexts in which the word appears in a large body of text but ignores word order. However, word order often signals a word's grammatical role in a sentence and thus tells of the word's meaning. Jones and Mewhort (2007) show that word order can be included in the semantic vectors using holographic reduced representation and convolution. We show here that the order information can also be captured by permuting vector coordinates, thus providing a general and computationally light alternative to convolution.

  • 38.
    Sahlgren, Magnus
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Karlgren, Jussi
    RISE, Swedish ICT, SICS.
    Automatic Bilingual Lexicon Acquisition Using Random Indexing of Parallel Corpora2005In: Natural Language Engineering, ISSN 1351-3249, E-ISSN 1469-8110, Vol. 11, no 3, p. 327-341Article in journal (Refereed)
    Abstract [en]

    This paper presents a very simple and effective approach to using parallel corpora for automatic bilingual lexicon acquisition. The approach, which uses the Random Indexing vector space methodology, is based on finding correlations between terms based on their distributional characteristics. The approach requires a minimum of preprocessing and linguistic knowledge, and is efficient, fast and scalable. In this paper, we explain how our approach differs from traditional cooccurrence-based word alignment algorithms, and we demonstrate how to extract bilingual lexica using the Random Indexing approach applied to aligned parallel data. The acquired lexica are evaluated by comparing them to manually compiled gold standards, and we report overlap of around 60%. We also discuss methodological problems with evaluating lexical resources of this kind.

  • 39.
    Sahlgren, Magnus
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Karlgren, Jussi
    RISE, Swedish ICT, SICS. Attityd.
    Buzz monitoring in word space2008Conference paper (Refereed)
    Abstract [en]

    This paper discusses the task of tracking mentions of some topically interesting textual entity from a continuously and dynamically changing flow of text, such as a news feed, the output from an Internet crawler or a similar text source - a task sometimes referred to as buzz monitoring. Standard approaches from the field of information access for identifying salient textual entities are reviewed, and it is argued that the dynamics of buzz monitoring call for more accomplished analysis mechanisms than the typical text analysis tools provide today. The notion of word space is introduced, and it is argued that word spaces can be used to select the most salient markers for topicality and to find the associations those observations engender, and that they constitute an attractive foundation for building a representation well suited for the tracking and monitoring of mentions of the entity under consideration.

  • 40.
    Sahlgren, Magnus
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Karlgren, Jussi
    RISE, Swedish ICT, SICS.
    Counting Lumps in Word Space: Density as a Measure of Corpus Homogeneity2005Conference paper (Refereed)
    Abstract [en]

    This paper introduces a measure of corpus homogeneity that indicates the amount of topical dispersion in a corpus. The measure is based on the density of neighborhoods in semantic word spaces. We evaluate the measure by comparing the results for five different corpora. Our initial results indicate that the proposed density measure can indeed identify differences in topical dispersion.

  • 41.
    Sahlgren, Magnus
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Karlgren, Jussi
    RISE, Swedish ICT, SICS. Attityd.
    Terminology mining in social media2009Conference paper (Refereed)
    Abstract [en]

    The highly variable and dynamic word usage in social media presents serious challenges for both research and those commercial applications that are geared towards blogs or other user-generated non-editorial texts. This paper discusses and exemplifies a terminology mining approach for dealing with the productive character of the textual environment in social media. We explore the challenges of practically acquiring new terminology, and of modeling similarity and relatedness of terms from observing realistic amounts of data. We also discuss semantic evolution and density, and investigate novel measures for characterizing the preconditions for terminology mining.

  • 42.
    Sahlgren, Magnus
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Karlgren, Jussi
    RISE, Swedish ICT, SICS.
    Cöster, Rickard
    RISE, Swedish ICT, SICS.
    SICS at CLEF 2002: automatic query expansion using random indexing2002Conference paper (Refereed)
    Abstract [en]

    Vector-space techniques can be used for extracting semantically similar words from the co-occurrence statistics of words in large text data. We have used a technique called Random Indexing to accumulate context vectors for Swedish, French and Italian. We have then used the context vectors to perform automatic query expansion. In this paper, we report on our CLEF 2002 experiments on Swedish, French and Italian monolingual query expansion.

  • 43.
    Sahlgren, Magnus
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Karlgren, Jussi
    RISE, Swedish ICT, SICS.
    Cöster, Rickard
    RISE, Swedish ICT, SICS.
    Järvinen, Timo
    Automatic query expansion using random indexing2003In: Advances in Cross-Language Information Retrieval: Third Workshop of the Cross-Language Evaluation Forum, CLEF 2002. Rome, Italy, September 19-20, 2002: Revised Papers, Springer-Verlag , 2003, 1, p. 311-320Chapter in book (Refereed)
    Abstract [en]

    Vector-space techniques can be used for extracting semantically similar words from the co-occurrence statistics of words in large text data. We have used a technique called Random Indexing to accumulate context vectors for Swedish, French and Italian. We have then used the context vectors to perform automatic query expansion. In this paper, we report on our CLEF 2002 experiments on Swedish, French and Italian monolingual query expansion.

  • 44.
    Sahlgren, Magnus
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Karlgren, Jussi
    RISE, Swedish ICT, SICS.
    Eriksson, Gunnar
    RISE, Swedish ICT, SICS.
    SICS: Valence annotation based on seeds in word space2007Conference paper (Refereed)
  • 45.
    Sahlgren, Magnus
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Knutsson, Ola
    Proceedings of the workshop on extracting and using constructions in NLP2009Report (Other academic)
    Abstract [en]

    This is a collection of papers presented at the Nodalida 2009 workshop on extracting and using constructions in NLP.

  • 46.
    Sahlgren, Magnus
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Knutsson, Ola
    Proceedings of the Workshop Semantic Content Acquisition and Representation (SCAR) 20072007Report (Other academic)
  • 47.
    Sahlgren, Magnus
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Knutsson, Ola
    Workshop on Extracting and Using Constructions in Computational Linguistics2010 (ed. 7)Book (Refereed)
  • 48.
    Sahlgren, Magnus
    et al.
    RISE - Research Institutes of Sweden, ICT, SICS.
    Swanberg, David
    Using linguistic information to improve the performance of vector-based semantic analysis2001In: NoDaLiDa '01: 13th Nordic Conference on Computational Linguistics, 2001, 1Conference paper (Refereed)
    Abstract [en]

    The use of vector-based models of information for the purpose of semantic analysis is an area of research that has gained substantial recognition over the last decade. However, the application of high-dimensional vector representations to linguistic data has, to a large extent, remained exclusively statistical and consequently paid minimal or no attention to the linguistic structures of the data used in the experiments. In this paper, we show that the performance of vector-based semantic analysis can be improved by considering basic linguistic structures - e.g. morphology - in the data.

  • 49.
    Sandin, Fredrik
    et al.
    Luleå University of Technology, Sweden.
    Emruli, Blerim
    RISE - Research Institutes of Sweden, ICT, SICS.
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Random indexing of multidimensional data2017In: Knowledge and Information Systems, ISSN 0219-1377, E-ISSN 0219-3116, Vol. 52, no 1, p. 267-290Article in journal (Refereed)
    Abstract [en]

    Random indexing (RI) is a lightweight dimension reduction method, which is used, for example, to approximate vector semantic relationships in online natural language processing systems. Here we generalise RI to multidimensional arrays and therefore enable approximation of higher-order statistical relationships in data. The generalised method is a sparse implementation of random projections, which is the theoretical basis also for ordinary RI and other randomisation approaches to dimensionality reduction and data representation. We present numerical experiments which demonstrate that a multidimensional generalisation of RI is feasible, including comparisons with ordinary RI and principal component analysis. The RI method is well suited for online processing of data streams because relationship weights can be updated incrementally in a fixed-size distributed representation, and inner products can be approximated on the fly at low computational cost. An open source implementation of generalised RI is provided. © 2016, The Author(s).

  • 50.
    Täckström, Oscar
    et al.
    RISE, Swedish ICT, SICS.
    Bergh, Cecilia
    Sahlgren, Magnus
    RISE - Research Institutes of Sweden, ICT, SICS.
    Sjölinder, Marie
    RISE, Swedish ICT, SICS.
    Södersten, Per
    Zandian, Modjtaba
    An Embodied question answering system for use in the treatment of eating disorders2008In: Proceedings of The 4th International Workshop on Human-Computer Conversation, 2008, 1, p. 4Conference paper (Refereed)
    Abstract [en]

    This paper presents work in progress on implementing an embodied question answering system, Dr. Cecilia, in the form of a virtual caregiver, for use in the treatment of eating disorders. The rationale for the system is grounded in one of the few effective treatments for anorexia and bulimia nervosa. The questions and answers database is encoded using natural language, and is easily updatable by human caregivers without any technical expertise. Matching of users' questions with database entries is performed using a weighted and normalized n-gram similarity function. In this paper we give a comprehensive background to and an overview of the system, with a focus on aspects pertaining to natural language processing and user interaction. The system is currently only implemented for Swedish.
