We present Hu-GTVA, a framework for human-grounded test case generation, validation, and acceptance, designed to create contextual ground truths for evaluating retrieval-augmented generation (RAG) systems in high-stakes public-sector contexts. The framework addresses the challenge of aligning RAG system evaluations with expert domain knowledge by combining automated test case generation, structured expert annotation, and dual-review protocols. We demonstrate its application in collaboration with the Swedish National Financial Management Authority (ESV), where it supports the evaluation of Konsekvenshjälpen, a RAG-LLM system that assists with regulatory impact assessment. Hu-GTVA draws conceptual motivation from both the principle of Oversight by Design and the regulatory requirement of Human Oversight under Article 14 of the EU AI Act. Oversight by Design emphasizes integrating oversight considerations from the design phase onward, while Human Oversight specifies who must exercise oversight, when, and over what, to ensure accountable AI use. Drawing on both, Hu-GTVA introduces structured expert review, acceptance criteria, and quantitative agreement metrics to bring human judgment into the evaluation process before deployment. Designed for modularity and domain adaptability, the framework can be extended to other high-risk settings such as healthcare or critical infrastructure. Hu-GTVA thus offers a reproducible, human-centered, pre-hoc (pre-deployment) evaluation pipeline for RAG-LLM systems.