Overview of the CLEF-2024 Eloquent Lab: Task 2 on HalluciGen
RISE Research Institutes of Sweden, Digital Systems, Data Science. ORCID iD: 0000-0003-3246-1664
RISE Research Institutes of Sweden, Digital Systems, Data Science. ORCID iD: 0000-0002-9162-6433
University of Edinburgh, UK.
RISE Research Institutes of Sweden, Digital Systems, Data Science. ORCID iD: 0000-0002-7873-3971
2024 (English). In: CEUR Workshop Proceedings, CEUR-WS, 2024, Vol. 3740, p. 691-702. Conference paper, Published paper (Refereed)
Abstract [en]

In the HalluciGen task we aim to discover whether LLMs have an internal representation of hallucination. Specifically, we investigate whether LLMs can be used both to generate and to detect hallucinated content. In the cross-model evaluation setting we take this a step further and explore the viability of using one LLM to evaluate output produced by another LLM. We include generation, detection, and cross-model evaluation steps for two scenarios: paraphrase and machine translation. Overall, we find that the performance of the baselines and submitted systems is highly variable; however, initial results are promising, and lessons learned from this year's task will provide a solid foundation for future iterations. In particular, we highlight that human validation of generated output is necessary to ensure the robustness of the cross-model evaluation results. We aim to address this challenge in future iterations of HalluciGen.

Place, publisher, year, edition, pages
CEUR-WS, 2024. Vol. 3740, p. 691-702
Keywords [en]
Computational linguistics; Computer aided language translation; Modeling languages; Cross model; Detection models; Evaluation; Generative language model; Hallucination; Internal representation; Language model; Machine translations; Model evaluation; Performance; Machine translation
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:ri:diva-75021
Scopus ID: 2-s2.0-85201646530
OAI: oai:DiVA.org:ri-75021
DiVA, id: diva2:1895608
Conference
25th Working Notes of the Conference and Labs of the Evaluation Forum, CLEF 2024, Grenoble, 9-12 September 2024
Note

This lab has been partially supported by the Swedish Research Council (grant number 2022-02909) and by UK Research and Innovation (UKRI) under the UK government's Horizon Europe funding guarantee (grant number 10039436, Utter).

Available from: 2024-09-06. Created: 2024-09-06. Last updated: 2025-09-23. Bibliographically approved

Open Access in DiVA

fulltext (1212 kB), 129 downloads
File information
File name: FULLTEXT01.pdf
File size: 1212 kB
Checksum: SHA-512
2f9a1080f2d767c69b0973a20723ed777249ae0ae0628d7b546f86035dc6ee65b31082d72e0460b2630721909f51388850f73e6a7debc0084c73045601182a91
Type: fulltext. Mimetype: application/pdf

Other links

Scopus (full text)

Authority records

Dürlich, Luise; Gogoulou, Evangelia; Nivre, Joakim; Zahra, Shorouq

Total: 129 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
