DMEL: THE DIFFERENTIABLE LOG-MEL SPECTROGRAM AS A TRAINABLE LAYER IN NEURAL NETWORKS
RISE Research Institutes of Sweden, Digitala system, Datavetenskap. Lund University, Sweden. ORCID iD: 0000-0002-5032-4367
Lund University, Sweden.
2024 (English). In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, Institute of Electrical and Electronics Engineers Inc., 2024, pp. 5005-5009. Conference paper, published paper (refereed).
Abstract [en]

In this paper we present the differentiable log-Mel spectrogram (DMEL) for audio classification. DMEL uses a Gaussian window, with a window length that can be jointly optimized with the neural network. DMEL is used as the input layer in different neural networks and evaluated on standard audio datasets. We show that DMEL achieves a higher average test accuracy for sub-optimal initial choices of the window length when compared to a baseline with a fixed window length. In addition, we analyse the computational cost of DMEL and compare it to a standard hyperparameter search over different window lengths, showing favorable results for DMEL. Finally, an empirical evaluation on a carefully designed dataset is performed to investigate whether the differentiable spectrogram actually learns the optimal window length. The design of the dataset relies on the theory of spectrogram resolution. We also empirically evaluate the convergence rate to the optimal window length.
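
The abstract states that DMEL uses a Gaussian window whose length is optimized jointly with the network. The sketch below illustrates why that is possible. It is not the authors' implementation, which this record does not include. It assumes the window is parameterized by a standard deviation `sigma` measured in samples, evaluates a single DFT bin of a single frame, and omits the Mel filterbank and framing entirely. All function names are hypothetical, and the gradient is derived by hand where a deep learning framework would use automatic differentiation.

```python
import cmath
import math

EPS = 1e-10  # keeps the logarithm finite for silent frames


def gaussian_window(num_samples, sigma):
    """Gaussian window centred on the frame; sigma (in samples) sets its effective length."""
    centre = (num_samples - 1) / 2.0
    return [math.exp(-0.5 * ((n - centre) / sigma) ** 2) for n in range(num_samples)]


def gaussian_window_grad(num_samples, sigma):
    """Derivative of every window sample with respect to sigma: g * t^2 / sigma^3."""
    centre = (num_samples - 1) / 2.0
    window = gaussian_window(num_samples, sigma)
    return [g * (n - centre) ** 2 / sigma ** 3 for n, g in enumerate(window)]


def dft_bin(frame, window, k):
    """DFT coefficient k of the windowed frame."""
    num_samples = len(frame)
    return sum(
        x * g * cmath.exp(-2j * math.pi * k * n / num_samples)
        for n, (x, g) in enumerate(zip(frame, window))
    )


def log_power(frame, sigma, k):
    """Log power of bin k; a log-Mel spectrogram would apply a Mel filterbank before the log."""
    coeff = dft_bin(frame, gaussian_window(len(frame), sigma), k)
    return math.log(abs(coeff) ** 2 + EPS)


def log_power_grad(frame, sigma, k):
    """Chain rule: d log(P + eps) / d sigma with P = |c|^2 and dP = 2 Re(conj(c) dc)."""
    coeff = dft_bin(frame, gaussian_window(len(frame), sigma), k)
    dcoeff = dft_bin(frame, gaussian_window_grad(len(frame), sigma), k)
    dpower = 2.0 * (coeff.conjugate() * dcoeff).real
    return dpower / (abs(coeff) ** 2 + EPS)


# A 64-sample sine that falls exactly on DFT bin 8.
frame = [math.sin(2.0 * math.pi * 8 * n / 64) for n in range(64)]
analytic = log_power_grad(frame, sigma=6.0, k=8)

# Central finite difference as an independent check of the hand-derived gradient.
step = 1e-5
numeric = (log_power(frame, 6.0 + step, 8) - log_power(frame, 6.0 - step, 8)) / (2.0 * step)
```

Because the Gaussian is smooth in `sigma`, the log power has a well-defined gradient with respect to the window length, so a gradient-based optimizer can update `sigma` together with the network weights. A rectangular window of integer length would not offer this.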

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2024. pp. 5005-5009
National subject category
Mathematics
Identifiers
URN: urn:nbn:se:ri:diva-74873
DOI: 10.1109/ICASSP48485.2024.10446816
Scopus ID: 2-s2.0-85195408870
OAI: oai:DiVA.org:ri-74873
DiVA, id: diva2:1895056
Conference
49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024. Seoul, South Korea. 14 April 2024 through 19 April 2024
Note

Thanks to the Swedish Foundation for Strategic Research for funding.

Available from: 2024-09-04. Created: 2024-09-04. Last updated: 2024-09-06. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Person

Martinsson, John
