Publications (4 of 4)
Fahria, K., Mowla, N. & Stenberg, S. (2025). Human-Centric Ground Truth Evaluation and Acceptance (Hu-GTVA): An Oversight by Design process for RAG-LLM evaluation. Borås: RISE Research Institutes of Sweden
2025 (English) Report (Other academic)
Abstract [en]

We present Hu-GTVA, a framework for human-grounded test case generation, validation, and acceptance designed to create contextual ground truths for the evaluation of retrieval-augmented generation (RAG) systems in high-stakes public-sector contexts. The framework addresses the challenge of aligning RAG system evaluations with expert-grounded domain knowledge by combining automated test case generation, structured expert annotation, and dual-review protocols. We demonstrate its application in collaboration with the Swedish National Financial Management Authority (ESV), where it supports the evaluation of Konsekvenshjälpen, a RAG-LLM system for regulatory impact assessment assistance. Hu-GTVA takes conceptual motivation from both the principle of Oversight by Design and the regulatory requirement of Human Oversight under Article 14 of the EU AI Act. Oversight by Design emphasizes integrating oversight considerations already during the design phase, while Human Oversight defines who, when, and what must be governed to ensure accountable AI use. Drawing from both, Hu-GTVA introduces structured expert review, acceptance criteria, and quantitative agreement metrics to bring human judgment into the evaluation process before deployment. Designed for modularity and domain adaptability, the framework can be extended to other high-risk settings such as healthcare or critical infrastructure. Hu-GTVA offers a reproducible and human-centered pre-hoc RAG-LLM evaluation pipeline.
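The abstract mentions dual-review protocols and quantitative agreement metrics but does not name the metric. As a minimal sketch, assuming two expert reviewers each label every generated test case as `accept` or `reject`, and assuming Cohen's kappa as the agreement measure (an illustrative choice, not confirmed by the report), inter-reviewer agreement could be computed as follows. The function name `cohens_kappa` is hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two reviewers' categorical labels.

    Assumption: illustrative agreement metric only; the report does not
    state which metric Hu-GTVA uses.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of test cases both reviewers labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each reviewer's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    p_e = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        # Both reviewers used one and the same label for every case.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

For reviewer labels `["accept", "accept", "reject", "accept"]` and `["accept", "reject", "reject", "accept"]`, observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5. Kappa corrects raw agreement for the agreement two reviewers would reach by chance given their label frequencies, which is why it is often preferred over plain percentage agreement.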

Place, publisher, year, edition, pages
Borås: RISE Research Institutes of Sweden, 2025. 30 p.
Series
RISE Rapport
Keywords
Human oversight, oversight by design, ground truth, RAG, RAG LLM, RAG evaluation, RAGChecker, RAGAS, TruLens
National subject category
Computer Sciences
Identifiers
urn:nbn:se:ri:diva-80077 (URN) 978-91-90109-25-0 (ISBN)
Available from: 2025-12-22 Created: 2025-12-22 Last updated: 2026-01-22 Bibliographically approved
Fahria, K. & Mowla, N. (2025). Testing AI in relation to Traditional Software Testing: A Comparative Overview. RISE Research Institutes of Sweden
2025 (English) Report (Other academic)
Place, publisher, year, edition, pages
RISE Research Institutes of Sweden, 2025. 13 p.
Series
RISE Rapport ; 2025:40
National subject category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-78272 (URN) 978-91-90036-27-3 (ISBN)
Available from: 2025-03-27 Created: 2025-03-27 Last updated: 2026-01-22 Bibliographically approved
Fahria, K., Kabir, F., Mowla, N. & Fakhrul Abedin, S. (2025). Towards Explainable Automotive Intrusion Detection: A Chunk-based Framework for CAN Traffic. Paper presented at the Swedish National Computer Networking and Cloud Computing Workshop (SNCNW), arranged at University West in Trollhättan, June 10-11, 2025.
2025 (English) Conference paper, Published paper (Other academic)
Abstract [en]

In this work, we propose an explainable intrusion detection framework for Controller Area Network (CAN) bus traffic using the ROAD dataset. By segmenting raw traffic into fixed-size chunks, we extract multi-level features that capture timing behavior, entropy, payload statistics, and CAN ID survival rates. We evaluate three classifiers: Decision Tree, Random Forest (explained with TreeSHAP), and Feedforward Neural Network (explained with KernelSHAP). The explainability analysis reveals that tree models detect protocol anomalies while neural networks capture signal-level distortions, underscoring the role of model choice in explainable IDS design.
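The chunking step can be illustrated with a minimal sketch. It assumes each CAN frame is available as a timestamp, an arbitration ID, and a payload of up to 8 bytes, and that chunks are consecutive, non-overlapping windows of a fixed number of frames. The `CanFrame` type, the function names, and the specific feature set are hypothetical; the CAN ID survival-rate feature is not reproduced because its definition is not given in the abstract.

```python
import math
import statistics
from collections import Counter
from dataclasses import dataclass


@dataclass
class CanFrame:
    timestamp: float  # seconds (assumed unit)
    can_id: int       # arbitration ID
    payload: bytes    # 0-8 data bytes


def shannon_entropy(values):
    """Shannon entropy in bits of a non-empty sequence of hashable values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def chunk_frames(frames, chunk_size):
    """Split frames into consecutive fixed-size chunks, dropping a short tail."""
    return [frames[i:i + chunk_size]
            for i in range(0, len(frames) - chunk_size + 1, chunk_size)]


def chunk_features(chunk):
    """Hypothetical per-chunk timing, ID-entropy and payload features."""
    gaps = [b.timestamp - a.timestamp for a, b in zip(chunk, chunk[1:])]
    payload_bytes = [byte for frame in chunk for byte in frame.payload]
    return {
        # Timing behavior: inter-arrival gaps between consecutive frames.
        "mean_gap": statistics.fmean(gaps) if gaps else 0.0,
        "std_gap": statistics.pstdev(gaps) if gaps else 0.0,
        # Entropy of the arbitration IDs seen in this chunk.
        "id_entropy": shannon_entropy([frame.can_id for frame in chunk]),
        # Payload statistics over all data bytes in the chunk.
        "payload_mean": statistics.fmean(payload_bytes) if payload_bytes else 0.0,
        "payload_entropy": shannon_entropy(payload_bytes) if payload_bytes else 0.0,
    }
```

Each feature dictionary would form one row of the input matrix for a classifier such as a decision tree or random forest. The chunk size trades temporal resolution against the stability of the per-chunk statistics.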

National subject category
Computer and Information Sciences
Identifiers
urn:nbn:se:ri:diva-78747 (URN)
Conference
Swedish National Computer Networking and Cloud Computing Workshop (SNCNW), arranged at University West in Trollhättan, June 10-11, 2025
Note

This work is supported by the EU project Citcom.AI, Vinnova INTERSTICE project (reference number: 2024-00661), and VINNOVA FFI Project MAGIC (reference number: 2024-03687). This work is also partially supported by KKS Research Profile NIIT, and Data Communication Security Laboratory at Ewha Womans University, South Korea.

Available from: 2025-08-15 Created: 2025-08-15 Last updated: 2026-01-22 Bibliographically approved
Mowla, N. (2024). From AI Act to Structured Testing of AI Systems. RISE Research Institutes of Sweden
2024 (English) Report (Other academic)
Abstract [en]

The Citcom.AI RISE testing approach is a step towards structured evaluation and testing of AI systems under the regulatory framework of the AI Act. It defines context across different AI application domains, AI subfields, and use cases. In particular, it presents a systematic evaluation that proceeds from defining the context and application to detailed risk assessments, linking each AI application to the corresponding testing standards and methodologies. The approach translates the AI Act's high-level regulatory requirements for different AI system risk levels into appropriate technical testing techniques for achieving trustworthiness across domains and AI subfields, promoting responsible AI deployment and fostering trust in AI applications.

Place, publisher, year, edition, pages
RISE Research Institutes of Sweden, 2024. 12 p.
Series
RISE Rapport ; 2024:84
Keywords
Testing AI, AI Act, AI systems, Standards, Context
National subject category
Computer Systems; Computer Sciences
Identifiers
urn:nbn:se:ri:diva-76073 (URN) 9789189971462 (ISBN)
Available from: 2024-11-13 Created: 2024-11-13 Last updated: 2026-01-22 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0009-0004-8393-1683