Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
On Practical machine Learning and Data Analysis
RISE, Swedish ICT, SICS, Decisions, Networks and Analytics lab.ORCID iD: 0000-0001-8952-3542
2008 (English)Doctoral thesis, monograph (Other academic)
Abstract [en]

This thesis discusses and addresses some of the difficulties associated with practical machine learning and data analysis. Introducing data driven methods in e.g industrial and business applications can lead to large gains in productivity and efficiency, but the cost and complexity are often overwhelming. Creating machine learning applications in practise often involves a large amount of manual labour, which often needs to be performed by an experienced analyst without significant experience with the application area. We will here discuss some of the hurdles faced in a typical analysis project and suggest measures and methods to simplify the process. One of the most important issues when applying machine learning methods to complex data, such as e.g. industrial applications, is that the processes generating the data are modelled in an appropriate way. Relevant aspects have to be formalised and represented in a way that allow us to perform our calculations in an efficient manner. We present a statistical modelling framework, Hierarchical Graph Mixtures, based on a combination of graphical models and mixture models. It allows us to create consistent, expressive statistical models that simplify the modelling of complex systems. Using a Bayesian approach, we allow for encoding of prior knowledge and make the models applicable in situations when relatively little data are available. Detecting structures in data, such as clusters and dependency structure, is very important both for understanding an application area and for specifying the structure of e.g. a hierarchical graph mixture. We will discuss how this structure can be extracted for sequential data. By using the inherent dependency structure of sequential data we construct an information theoretical measure of correlation that does not suffer from the problems most common correlation measures have with this type of data. In many diagnosis situations it is desirable to perform a classification in an iterative and interactive manner. The matter is often complicated by very limited amounts of knowledge and examples when a new system to be diagnosed is initially brought into use. We describe how to create an incremental classification system based on a statistical model that is trained from empirical data, and show how the limited available background information can still be used initially for a functioning diagnosis system. To minimise the effort with which results are achieved within data analysis projects, we need to address not only the models used, but also the methodology and applications that can help simplify the process. We present a methodology for data preparation and a software library intended for rapid analysis, prototyping, and deployment. Finally, we will study a few example applications, presenting tasks within classification, prediction and anomaly detection. The examples include demand prediction for supply chain management, approximating complex simulators for increased speed in parameter optimisation, and fraud detection and classification within a media-on-demand system.

Place, publisher, year, edition, pages
2008, 1. , p. 214
Series
SICS dissertation series, ISSN 1101-1335 ; 49
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:ri:diva-22950OAI: oai:DiVA.org:ri-22950DiVA, id: diva2:1042515
Available from: 2016-10-31 Created: 2016-10-31 Last updated: 2020-12-02Bibliographically approved

Open Access in DiVA

fulltext(2167 kB)105 downloads
File information
File name FULLTEXT01.pdfFile size 2167 kBChecksum SHA-512
fb4dcc1a7bec4b9433ec3da951ce81ffa635c0b6eeda620b36ff6a151744401de7cceabcab03709413f63b63c82c8c8377bda7773eadfe3727812330dbf614d1
Type fulltextMimetype application/pdf

Authority records

Gillblad, Daniel

Search in DiVA

By author/editor
Gillblad, Daniel
By organisation
Decisions, Networks and Analytics lab
Computer and Information Sciences

Search outside of DiVA

GoogleGoogle Scholar
Total: 106 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 342 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf