ELOQUENT CLEF Shared Tasks for Evaluation of Generative Language Model Quality
Silo AI, Finland.
RISE Research Institutes of Sweden, Digital Systems, Data Science. ORCID iD: 0000-0003-3246-1664
RISE Research Institutes of Sweden, Digital Systems, Data Science. ORCID iD: 0000-0002-9162-6433
RISE Research Institutes of Sweden, Digital Systems, Data Science.
2024 (English). In: Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349, Vol. 14612 LNCS, p. 459-465. Article in journal (Refereed). Published.
Abstract [en]

ELOQUENT is a set of shared tasks for evaluating the quality and usefulness of generative language models. ELOQUENT aims to bring together some high-level quality criteria, grounded in experiences from deploying models in real-life tasks, and to formulate tests for those criteria, preferably implemented to require minimal human assessment effort and in a multilingual setting. The selected tasks for this first year of ELOQUENT are (1) probing a language model for topical competence; (2) assessing the ability of models to generate and detect hallucinations; (3) assessing the robustness of a model output given variation in the input prompts; and (4) establishing the possibility to distinguish human-generated text from machine-generated text.

Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2024. Vol. 14612 LNCS, p. 459-465
Keywords [en]
Benchmarking; CLEF; Generative language model; Human assessment; Language model; LLM; Modeling quality; Multilinguality; Quality benchmark; Quality criteria; Shared task; Computational linguistics
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:ri:diva-72876
DOI: 10.1007/978-3-031-56069-9_63
Scopus ID: 2-s2.0-85189366495
OAI: oai:DiVA.org:ri-72876
DiVA, id: diva2:1854721
Conference
46th European Conference on Information Retrieval, ECIR 2024. Glasgow, UK. 24 March 2024 through 28 March 2024
Available from: 2024-04-26 Created: 2024-04-26 Last updated: 2024-05-15 Bibliographically approved

Authority records

Dürlich, Luise; Gogoulou, Evangelia; Nivre, Joakim