Publications

You can also find my articles on my Google Scholar profile.

(* denotes equal contribution; these authors should be considered co-first authors.)

2025

The paper proposes MVI-Bench, the first comprehensive benchmark for evaluating how Misleading Visual Inputs undermine the robustness of Large Vision-Language Models (LVLMs). Empirical results across 18 state-of-the-art LVLMs reveal pronounced vulnerabilities to misleading visual inputs, and in-depth analyses provide actionable insights to guide the development of more reliable and robust LVLMs.

The paper proposes a novel coreset construction framework, KeCO, for image classification tasks, aimed at enhancing the in-context learning capabilities of Large Vision-Language Models. KeCO leverages untapped data from the support set to aggregate category-relevant information into the coreset via feature-level updates. Notably, KeCO achieves strong performance in a simulated online scenario, demonstrating its practical applicability.

2024

The paper explores how to configure effective in-context sequences for Visual Question Answering (VQA) tasks to enhance the in-context learning (ICL) capabilities of Large Vision-Language Models (LVLMs). It analyzes the role of different in-context configurations in LVLMs and designs new configuration methods, providing valuable insights for optimizing LVLMs' ICL performance on VQA tasks.