INPUTMIX: A STRATEGY TO REGULARIZE AND BALANCE MULTI-MODALITY AND MULTI-VIEW MODEL LEARNING
Umeå University, Sweden.
RISE Research Institutes of Sweden, Digital Systems, Mobility and Systems. Umeå University, Sweden.
2024 (English). In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ISSN 1520-6149, p. 5455-5459. Article in journal (Refereed). Published.
Abstract [en]

Real-world perception tasks often involve multiple modalities or views of input. While joint training of multi-modality classification models has been explored previously, it has not consistently outperformed the best single-modality model. This paper aims to address one of the reasons for this: the difficulty of balancing the contributions of each input in the end-to-end training of multi-input models. Additionally, the increased capacity of multi-input networks can lead to overfitting. To solve these issues, we propose InputMix, a simple yet effective method for optimally mixing different inputs. Our method mixes a certain proportion p of input pairs to relieve the increased-capacity problem and assigns a weighting factor λ to each input to generate a mixed target, allowing us to specify the contribution of each input. Experimental results on three multi-input classification tasks demonstrate that our method significantly improves the generalization performance of multi-input neural networks. Code is available at https://github.com/JesseWong333/inputmix/.
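
The abstract does not say how input pairs are formed or exactly how λ enters the mixed target, so the following is only a minimal sketch of one possible reading, not the authors' implementation (see the linked repository for that). It assumes a two-input model, swaps the second input of a proportion p of samples with that of a randomly chosen partner sample, and builds the target as a λ-weighted combination of the two labels. The function name `inputmix_batch` and all of its parameters are hypothetical.

```python
import random


def inputmix_batch(inputs_a, inputs_b, targets, p=0.5, lam=0.5, rng=None):
    """Hypothetical sketch of mixing a proportion p of two-input samples.

    inputs_a, inputs_b: per-sample inputs for the two modalities or views.
    targets: one-hot (or soft) label vectors, one per sample.
    p: proportion of samples in the batch to mix (assumed meaning of p).
    lam: weight of input A's label in the mixed target; input B's label
         gets 1 - lam (assumed meaning of the weighting factor).
    """
    if rng is None:
        rng = random.Random()
    n = len(targets)
    mixed_a = list(inputs_a)                 # input A is left untouched
    mixed_b = list(inputs_b)
    mixed_t = [list(t) for t in targets]
    n_mix = int(round(p * n))                # how many samples to mix
    for i in rng.sample(range(n), n_mix):
        j = rng.randrange(n)                 # partner sample supplying input B
        mixed_b[i] = inputs_b[j]
        # Mixed target: weighted combination of the two samples' labels.
        mixed_t[i] = [lam * ti + (1 - lam) * tj
                      for ti, tj in zip(targets[i], targets[j])]
    return mixed_a, mixed_b, mixed_t
```

Under these assumptions, setting `lam` closer to 1 makes the target follow input A's label more strongly, which is one way the contribution of each input could be specified; `p=0` leaves the batch unchanged.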

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc. , 2024. p. 5455-5459
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:ri:diva-74868
DOI: 10.1109/ICASSP48485.2024.10446664
Scopus ID: 2-s2.0-85195364390
OAI: oai:DiVA.org:ri-74868
DiVA, id: diva2:1895059
Conference
49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024. Seoul, South Korea. 14 April 2024 through 19 April 2024
Note

The computations were enabled by resources in project NAISS 2023/22-19 provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) at Alvis, funded by the Swedish Research Council through grant agreement no. 2022-06725.

Available from: 2024-09-04. Created: 2024-09-04. Last updated: 2025-09-23. Bibliographically approved.
