Existing cooperative perception systems often require joint training among participating agents. However, this assumption clashes with practical deployments where agents belong to diverse entities, each employing its own model and performing its own downstream tasks. Sharing model details for joint training is often hard to achieve because of concerns about intellectual property (IP) leakage, and centralized training may conflict with the downstream task of each agent. To address these challenges, we introduce IMCP, a robust Intermediate Model-Agnostic Cooperative Perception framework. IMCP enables universal agent fusion without the need for joint training or model sharing. Each agent undergoes independent initial training, followed by a cooperative fine-tuning stage in which the feature encoder of each agent remains frozen. To effectively fuse features from diverse domains, we incorporate parameter-efficient hierarchical feature adaptation layers that map features into a common representation space. Furthermore, deformable attention is employed to selectively aggregate multiple Bird's-Eye View (BEV) features of varying sizes. Extensive experiments on two real-world cooperative perception datasets demonstrate that IMCP achieves performance comparable to that of existing joint training methods. Code is available at https://github.com/JesseWong333/IMCP
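The aggregation of BEV features of varying sizes can be illustrated with a minimal sketch. It is not the released IMCP implementation: it assumes single-channel maps, one sampling point per agent, and hand-set offsets and attention logits (learned in a real deformable attention layer). The names `bilinear_sample`, `softmax`, and `fuse_at_point` are hypothetical.

```python
import math

def bilinear_sample(feat, x, y):
    """Sample a single-channel BEV map feat[row][col] at normalized (x, y) in [0, 1]."""
    h, w = len(feat), len(feat[0])
    # Map normalized coordinates to pixel coordinates and clamp to the map.
    px = min(max(x * (w - 1), 0.0), w - 1)
    py = min(max(y * (h - 1), 0.0), h - 1)
    x0, y0 = int(math.floor(px)), int(math.floor(py))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = px - x0, py - y0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def softmax(scores):
    """Numerically stable softmax over a list of logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_at_point(bev_maps, ref, offsets, scores):
    """Aggregate one query location across agents.

    bev_maps: one 2D map per agent, possibly of different sizes
    ref:      normalized (x, y) reference point shared by all agents
    offsets:  per-agent normalized (dx, dy) sampling offsets (assumed fixed here)
    scores:   per-agent attention logits (assumed fixed here)
    """
    weights = softmax(scores)
    samples = [
        bilinear_sample(m, ref[0] + dx, ref[1] + dy)
        for m, (dx, dy) in zip(bev_maps, offsets)
    ]
    return sum(w * s for w, s in zip(weights, samples))

# Two agents with different BEV resolutions covering the same area.
ego_map = [[0.0, 2.0], [4.0, 6.0]]
other_map = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
fused = fuse_at_point(
    [ego_map, other_map],
    ref=(0.5, 0.5),
    offsets=[(0.0, 0.0), (0.0, 0.0)],
    scores=[0.0, 0.0],
)
```

Because sampling uses normalized coordinates, maps of different resolutions can be queried at the same physical location without resizing them first, which is one plausible reason to prefer this style of aggregation for heterogeneous agents.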