The recent success of deep neural networks in visual recognition has stimulated research into new network architectures. Building feature encoders on top of convolutional features is a popular approach to this goal. I will introduce bilinear CNNs, an architecture that efficiently represents an image as a pooled outer product of two CNN features, and its applications to fine-grained classification and texture recognition. The talk will: (1) introduce a general formulation of the bilinear model for classification, (2) derive a family of end-to-end trainable bilinear models that generalize classical image representations, (3) discuss dimensionality reduction techniques for bilinear models, (4) evaluate performance on fine-grained recognition and texture recognition tasks, and (5) visualize the attributes learned by the models. The source code for the complete system is available at http://vis-www.cs.umass.edu/bcnn
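As a rough illustration of the "pooled outer product of two CNN features" described above, the following minimal sketch sum-pools the outer product of two feature vectors over image locations. It is not taken from the released source code: the function names, the use of plain Python lists instead of real CNN activations, the choice of sum pooling, and the signed square-root plus L2 normalization step are all assumptions made for illustration.

```python
import math


def bilinear_pool(feats_a, feats_b):
    """Sum-pool the outer product of two feature streams over locations.

    feats_a: list of L vectors of length M (hypothetical stream A outputs)
    feats_b: list of L vectors of length N (hypothetical stream B outputs)
    Returns a flattened M*N descriptor in row-major order.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("both streams must cover the same locations")
    m, n = len(feats_a[0]), len(feats_b[0])
    pooled = [0.0] * (m * n)
    for a, b in zip(feats_a, feats_b):
        # Accumulate the outer product a b^T for this location.
        for i in range(m):
            for j in range(n):
                pooled[i * n + j] += a[i] * b[j]
    return pooled


def normalize(vec, eps=1e-12):
    """Signed square root followed by L2 normalization (assumed step)."""
    signed_sqrt = [math.copysign(math.sqrt(abs(v)), v) for v in vec]
    norm = math.sqrt(sum(v * v for v in signed_sqrt))
    return [v / (norm + eps) for v in signed_sqrt]


# Two locations, a 2-dim feature from stream A and a 3-dim one from stream B.
feats_a = [[1.0, 2.0], [0.0, 1.0]]
feats_b = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0]]
descriptor = normalize(bilinear_pool(feats_a, feats_b))
```

The resulting descriptor has M*N entries regardless of how many locations the image contains, which is what lets a fixed-size classifier sit on top of it.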
Tsung-Yu Lin is a third-year PhD student in the College of Information and Computer Sciences at the University of Massachusetts Amherst, working with Prof. Subhransu Maji. He received his BS and MS degrees in computer science from National Tsing Hua University in 2008 and 2010, respectively. Prior to his arrival at UMass, he worked at Academia Sinica as a research assistant with Dr. Tyng-Luh Liu. His research interests include image recognition and analysing visual patterns in radar images.