Domain expertise–agnostic feature selection for the analysis of breast cancer data
KTH Royal Institute of Technology, Sweden; Politecnico di Milano, Italy.
RISE Research Institutes of Sweden.
KTH Royal Institute of Technology, Sweden.
Karolinska Institute, Sweden.
2020 (English). In: Artificial Intelligence in Medicine, ISSN 0933-3657, E-ISSN 1873-2860, Vol. 108, article id 101928. Article in journal (Refereed), Published
Abstract [en]

Progress in proteomics has enabled biologists to accurately measure the amount of protein in a tumor. This work is based on a breast cancer data set, the result of a proteomics analysis of a cohort of tumors carried out at Karolinska Institutet. While evidence suggests that an anomaly in the protein content is related to the cancerous nature of tumors, the proteins that could be markers of cancer types and subtypes and the underlying interactions are not completely known. This work sheds light on the potential of unsupervised learning in the analysis of such data sets, namely in the detection of distinctive proteins for the identification of the cancer subtypes, in the absence of domain expertise. In the analyzed data set, the number of samples, or tumors, is significantly lower than the number of features, or proteins; consequently, the input data can be thought of as high-dimensional data. The use of high-dimensional data has already become widespread, and a great deal of effort has been put into high-dimensional data analysis by means of feature selection, but it is still largely based on prior specialist knowledge, which in this case is not complete. There is a growing need for unsupervised feature selection, which raises the issue of how to generate promising subsets of features among all the possible combinations, as well as how to evaluate the quality of these subsets in the absence of specialist knowledge. We hereby propose a new wrapper method for the generation and evaluation of subsets of features via spectral clustering and modularity, respectively. We conduct experiments to test the effectiveness of the new method in the analysis of the breast cancer data, in a domain expertise–agnostic context. Furthermore, we show that we can successfully augment our method by incorporating an external source of data on known protein complexes.
Our approach reveals a large number of subsets of features that are better at clustering the samples than the state-of-the-art classification in terms of modularity and shows a potential to be useful for future proteomics research.
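
The abstract names spectral clustering and modularity but gives no implementation details. The following is only a minimal sketch of how a candidate feature subset might be scored by the modularity of a sample clustering. It is not the authors' method: the Gaussian similarity graph, the two-way split by the Fiedler vector, the toy data, and all function names (similarity_graph, modularity, spectral_bipartition, score_subset) are assumptions made for illustration, and spectral clustering is used here to partition the samples rather than to generate the feature subsets.

```python
import numpy as np

def similarity_graph(X, sigma=1.0):
    """Gaussian-kernel similarity between the rows (samples) of X, zero diagonal."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return A

def modularity(A, labels):
    """Newman modularity Q of a partition of a weighted undirected graph."""
    k = A.sum(axis=1)              # weighted degrees
    two_m = k.sum()                # twice the total edge weight
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)

def spectral_bipartition(A):
    """Two-way split by the sign of the Fiedler vector of the normalized Laplacian."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)    # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)

def score_subset(X, feature_idx, sigma=1.0):
    """Hypothetical wrapper score: modularity of the sample split on the chosen features."""
    A = similarity_graph(X[:, feature_idx], sigma)
    return modularity(A, spectral_bipartition(A))

# Toy data (assumed, not from the paper): 20 samples in two groups,
# features 0-1 separate the groups, features 2-3 are pure noise.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1], 10)
informative = 3.0 * groups[:, None] + 0.3 * rng.standard_normal((20, 2))
noise = rng.standard_normal((20, 2))
X = np.hstack([informative, noise])

q_info = score_subset(X, [0, 1])
q_noise = score_subset(X, [2, 3])
print("informative subset:", q_info)
print("noise subset:", q_noise)
```

In this toy setting the subset that actually separates the two groups should score higher than the noise subset, which is the sense in which modularity can rank subsets without labels or specialist knowledge.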

Place, publisher, year, edition, pages
Elsevier B.V., 2020. Vol. 108, article id 101928
Keywords [en]
Breast cancer, Clustering, Clustering performance evaluation, Dimensionality reduction, Feature selection, Proteomics, Unsupervised learning, Clustering algorithms, Diseases, Feature extraction, Proteins, Set theory, Tumors, Breast cancer data, High dimensional data, High-dimensional data analysis, Number of samples, Protein complexes, Proteomics research, Spectral clustering, Unsupervised feature selection, Quality control
National Category
Natural Sciences
Identifiers
URN: urn:nbn:se:ri:diva-45613
DOI: 10.1016/j.artmed.2020.101928
Scopus ID: 2-s2.0-85088878526
OAI: oai:DiVA.org:ri-45613
DiVA, id: diva2:1458198
Note

Funding text 1: The research project has partially received funding under the Marie Skłodowska-Curie Actions (MCSA) funded project Real-Time Analytics for Internet of Sports (RAIS) (Grant Agreement No. 813162). The research was also partially funded under the "AI for Proteomics" initiative by RISE Research Institutes of Sweden.

Available from: 2020-08-14 Created: 2020-08-14 Last updated: 2025-09-23 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Girdzijauskas, Sarunas
