Publications
(* These authors contributed equally and should be considered co-first authors.)
2025
- Huiyi Chen, Jiawei Peng, Dehai Min, Changchang Sun, Kaijie Chen, Yan Yan, Xu Yang, Lu Cheng. MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs (Preprint, submitted to CVPR 2026)
The paper proposes MVI-Bench, the first comprehensive benchmark designed to evaluate how misleading visual inputs undermine the robustness of LVLMs. Empirical results across 18 state-of-the-art LVLMs uncover pronounced vulnerabilities to misleading visual inputs, and in-depth analyses provide actionable insights that can guide the development of more reliable and robust LVLMs.
- Huiyi Chen, Jiawei Peng, Kaihua Tang, Xin Geng, Xu Yang. Enhancing Multimodal In-Context Learning for Image Classification through Coreset Optimization (ACM MM 2025, Oral)
The paper proposes KeCO, a novel coreset construction framework for image classification tasks, aimed at enhancing the in-context learning capabilities of Large Vision-Language Models. KeCO leverages untapped data from the support set to aggregate category-relevant information into the coreset via feature-level updates. Notably, KeCO achieves strong performance in a simulated online scenario, demonstrating its practical applicability.
2024
- Li Li*, Jiawei Peng*, Huiyi Chen*, Chongyang Gao, and Xu Yang. How to Configure Good In-Context Sequence for Visual Question Answering (CVPR 2024, Poster)
The paper explores how to configure effective in-context sequences for Visual Question Answering (VQA) tasks to enhance the in-context learning (ICL) capabilities of Large Vision-Language Models (LVLMs). It examines the roles of different in-context configurations in LVLMs and designs new configuration methods, providing valuable insights for optimizing LVLMs' ICL performance on VQA tasks.
