FEW-SHOT BIOACOUSTIC EVENT DETECTION USING AN EVENT-LENGTH ADAPTED ENSEMBLE OF PROTOTYPICAL NETWORKS
RISE Research Institutes of Sweden, Digital Systems, Data Science; Lund University, Sweden. ORCID iD: 0000-0002-5032-4367
RISE Research Institutes of Sweden, Digital Systems, Data Science. ORCID iD: 0009-0004-1803-4193
RISE Research Institutes of Sweden, Digital Systems, Data Science. ORCID iD: 0000-0002-5299-142X
RISE Research Institutes of Sweden, Digital Systems, Data Science. ORCID iD: 0000-0002-9567-2218
2022 (English) Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we study two major challenges in few-shot bioacoustic event detection: variable event lengths and false positives. We use prototypical networks where the embedding function is trained using a multi-label sound event detection model, instead of using episodic training as the proxy task, on the provided training dataset. This is motivated by polyphonic sound events being present in the base training data. We propose a method to choose the embedding function based on the average event length of the few-shot examples and show that this makes the method more robust towards variable event lengths. Further, we show that an ensemble of prototypical neural networks trained on different training and validation splits of time-frequency images with different loudness normalizations reduces false positives. In addition, we present an analysis of the effect that the studied loudness normalization techniques have on the performance of the prototypical network ensemble. Overall, per-channel energy normalization (PCEN) outperforms the standard log transform for this task. The method uses no data augmentation and no external data. The proposed approach achieves an F-score of 48.0% when evaluated on the hidden test set of the Detection and Classification of Acoustic Scenes and Events (DCASE) task 5.
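
As a rough illustration of the approach summarized above (a minimal sketch, not the authors' implementation), the Python snippet below builds a class prototype from PCEN- or log-scaled mel spectrograms of the few-shot support clips, and chooses an embedding function from the average length of the support events. The embedding functions, the length threshold, and all parameter values are illustrative placeholders; in the paper the embedding functions are trained as multi-label sound event detectors on the base training data.

import numpy as np
import librosa


def features(y, sr, normalization="pcen"):
    # Mel spectrogram with either PCEN or the standard log loudness normalization.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
    if normalization == "pcen":
        return librosa.pcen(mel, sr=sr)   # per-channel energy normalization
    return librosa.power_to_db(mel)       # log transform


def embed_short(spec):
    # Placeholder for an embedding network suited to short events.
    return spec.mean(axis=1)


def embed_long(spec):
    # Placeholder for an embedding network suited to long events.
    return spec.max(axis=1)


def choose_embedder(support_event_lengths_s, threshold_s=1.0):
    # Pick the embedding function from the average support event length.
    # The 1.0 s threshold is illustrative, not a value from the paper.
    if np.mean(support_event_lengths_s) < threshold_s:
        return embed_short
    return embed_long


def prototype(support_clips, sr, embed, normalization="pcen"):
    # Average the embeddings of the few-shot support clips into one prototype.
    embeddings = [embed(features(y, sr, normalization)) for y in support_clips]
    return np.mean(embeddings, axis=0)


if __name__ == "__main__":
    sr = 22050
    # Synthetic stand-ins for the annotated positive support events.
    support = [0.1 * np.random.randn(sr) for _ in range(3)]
    event_lengths_s = [0.4, 0.6, 0.5]     # illustrative event lengths in seconds

    embed = choose_embedder(event_lengths_s)
    proto = prototype(support, sr, embed, normalization="pcen")
    print("prototype shape:", proto.shape)

At inference time, query windows would be embedded in the same way and scored by their distance to this prototype, as in standard prototypical networks; the paper additionally combines such detectors in an ensemble trained on different data splits and loudness normalizations.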

Place, publisher, year, edition, pages
2022.
Keywords [en]
Machine listening, bioacoustics, few-shot learning, ensemble
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:ri:diva-62540
OAI: oai:DiVA.org:ri-62540
DiVA, id: diva2:1727434
Conference
Detection and Classification of Acoustic Scenes and Events 2022. 3–4 November 2022, Nancy, France
Available from: 2023-01-16. Created: 2023-01-16. Last updated: 2024-07-28. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authority records

Martinsson, John; Willbo, Martin; Pirinen, Aleksis; Mogren, Olof
