Institute of Information Science, Academia Sinica

Events

TIGP (SNHCC) -- Multimodal Prompting with Missing Modalities for Visual Recognition

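  • Speaker: Prof. Wei-Chen Chiu (邱維辰) (Department of Computer Science, National Yang Ming Chiao Tung University)
    Host: TIGP (SNHCC)
  • Time: 2023-05-29 (Mon.) 14:00 ~ 16:00
  • Venue: Auditorium 106, New Building, Institute of Information Science
Abstract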
The observations we perceive in daily life are typically multimodal, comprising visual, linguistic, and acoustic signals, so modeling and coordinating multimodal information is of great interest and has broad application potential. Recently, multimodal transformers have emerged as pre-trained backbone models for several multimodal downstream tasks, including genre classification, multimodal sentiment analysis, and cross-modal retrieval. Although they offer promising performance and generalization across various tasks, applying multimodal transformers in practical scenarios still poses challenges: 1) how to adapt multimodal transformers efficiently, without using heavy computational resources to fine-tune the entire model; and 2) how to ensure robustness when modalities are missing, e.g., with incomplete training data or incomplete observations at test time. In this talk, I will introduce our simple but efficient approach, which uses prompt learning to mitigate both challenges together. If time allows, I will also briefly introduce other recent work from my research group.