Page 38 - My FlipBook

P. 38

工
智
慧
計
畫

Arti cial Intelligence Projects

Natural language driven computer vision applications

We develop an attention-to-attention DNN framework network architecture considers bottom-up and top-down
to tackle the movie question answering (QA) problem. aggregations of visual-textual information to achieve the
Our proposed method was ranked as the top-1 model in segmentation task. This work is presented in ICCV 2019,
the MovieQA leaderboard until August 2018. The task and to the best of our knowledge, is the current SOTA
of referring image segmentation is to correctly segment technique for referring image segmentation on the main
the target region(s) in the image, according to a provided benchmark datasets.
sentence hint. As illustrated in Figure 2, our proposed

Few-shot learning for computer vision applications

The ability of humans to learn new concepts under limited a preliminary study, we tackle the problem of one-shot
guidance is remarkable. Even without prior knowledge object detection by proposing a co-attention and co-
about an object's category, the human visual system has excitation DNN module. Figure 3 shows how the module
evolved to handle such a task by performing different of squeeze and co-excitation enables one-shot learning to
functionalities that include grouping the pixels of an object seek common features between the query and the target
as a whole, extracting distinctive cues for comparison, image. Our work is published in NeurIPS 2019, and we are
and exhibiting attention or fixation for localization. In currently engaged in further developing this project.

Figure 3 : Non-local proposals and co-excitation for one-shot object detection.

36

33 34 35 36 37 38 39 40 41 42 43