Continual Learning Under Language Shift
2024 (English). In: Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349, Vol. 15048 LNAI, p. 71-84. Article in journal (Refereed), Published
Abstract [en]
The recent increase in data and model scale for language model pre-training has led to huge training costs. In scenarios where new data become available over time, updating a model instead of fully retraining it would therefore provide significant gains. We study the pros and cons of updating a language model when new data comes from new languages – the case of continual learning under language shift. Starting from a monolingual English language model, we incrementally add data from Danish, Icelandic and Norwegian to investigate how forward and backward transfer effects depend on pre-training order and characteristics of languages, for models with 126M, 356M and 1.3B parameters. Our results show that, while forward transfer is largely positive and independent of language order, backward transfer can be positive or negative depending on the order and characteristics of new languages. We explore a number of potentially explanatory factors and find that a combination of language contamination and syntactic similarity best fits our results.
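The abstract describes measuring forward and backward transfer as data from new languages is added in sequence. The sketch below is not the authors' code; it illustrates one common continual-learning convention (following Lopez-Paz and Ranzato, 2017) for computing such transfer metrics from a matrix of per-stage evaluation losses. The language order, loss values, and exact metric definitions are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the authors' code): forward/backward transfer computed
# from a matrix of evaluation losses recorded after each training stage.
# All values and names below are hypothetical, for illustration only.

languages = ["en", "da", "is", "no"]          # assumed training order

# loss[i][j] = evaluation loss on language j after finishing training stage i
# (stage 0 = the initial English-only model). Values are made up.
loss = [
    [2.10, 5.40, 6.10, 5.20],
    [2.15, 2.60, 5.00, 3.90],
    [2.30, 2.70, 2.80, 3.70],
    [2.40, 2.75, 2.95, 2.65],
]

def backward_transfer(loss, stage, seen):
    """Change in loss on an already-seen language `seen` between the stage
    where it was trained and a later `stage`. Negative values mean the later
    training helped (positive backward transfer); positive values indicate
    forgetting."""
    return loss[stage][seen] - loss[seen][seen]

def forward_transfer(loss, stage, unseen):
    """Change in loss on a not-yet-trained language `unseen` relative to the
    initial English-only model; negative values mean positive forward
    transfer."""
    return loss[stage][unseen] - loss[0][unseen]

if __name__ == "__main__":
    # Backward transfer on Danish (trained at stage 1) after the final stage.
    print("BWT da:", backward_transfer(loss, stage=3, seen=1))
    # Forward transfer to Norwegian before it is trained on (after stage 2).
    print("FWT no:", forward_transfer(loss, stage=2, unseen=3))
```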
Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2024. Vol. 15048 LNAI, p. 71-84
Keywords [en]
Adversarial machine learning; Federated learning; Modeling languages; Continual learning; English languages; Forward-and-backward; Icelandic; Language model; Large language model; Model scale; Multilingual NLP; Pre-training; Training costs; Contrastive Learning
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:ri:diva-75655
DOI: 10.1007/978-3-031-70563-2_6
Scopus ID: 2-s2.0-85203595319
OAI: oai:DiVA.org:ri-75655
DiVA, id: diva2:1909799
Conference
27th International Conference on Text, Speech, and Dialogue (TSD 2024), Brno, 9-13 September 2024
Note
The research presented in this paper was supported by the Swedish Research Council (grant no. 2022-02909). A significant part of the computations was enabled by the Berzelius resource provided by the Knut and Alice Wallenberg Foundation at the National Supercomputer Centre at Linköping University, Sweden (Berzelius-2023-178). In addition, the authors gratefully acknowledge the HPC RIVR consortium (www.hpc-rivr.si) and EuroHPC JU (eurohpc-ju.europa.eu) for funding this research by providing computing resources of the HPC system Vega at the Institute of Information Science (www.izum.si). Magnus Boman acknowledges funding from the Swedish Research Council on Scalable Federated Architectures.
Available from: 2024-11-01. Created: 2024-11-01. Last updated: 2025-09-23. Bibliographically approved.