=================================================================
Q:
You have done great work on image search with neural networks. I am interested in your Yahoo-1M results.
1. Could you share the dataset with me for further testing and research?
2. Could you share a copy of your model trained on the Yahoo-1M dataset?
A:
Since the Yahoo-1M dataset and the clothing model are copyrighted and belong to Yahoo, we are unfortunately not authorized to release them to the public. Sorry for the inconvenience.
However, our source code and training guide are available online, so you can train a deep hashing model on your own dataset. The source code is at: https://github.com/kevinlin311tw/caffe-cvprw15
We hope this helps. Should you have any questions, please feel free to contact me.
=================================================================
Q:
I have recently been reading your latest paper, "Deep learning of binary hash codes for fast image retrieval", and I find it interesting and well written. I have some questions about it:
1. How do you initialize the latent layer H between F7 and F8? The paper mentions LSH random projection. Do you mean applying LSH to the 4096-dimensional feature activations from F7?
2. Could you give more details about back-propagation and SGD for the latent layer H? Are any parameters in H optimized differently from those of its neighboring layers F7 and F8? If so, could H be represented as H = wx + b and optimized via the parameters w and b?
A:
1) We initialize the weights between F7 and the latent layer (i.e., the layer that produces the binary codes) with Gaussian random values. Because LSH uses random projections to produce codes, the network in its initial stage can be regarded as applying LSH (i.e., random weights) to map the deep features learned on ImageNet (the AlexNet F7 features) to binary codes. The whole network is then fine-tuned to jointly improve the feature representations, the binary codes, and the final classifier.
2) The activations of the latent layer can be represented by H = wx + b. The way we optimized the parameters w and b of the latent layer is the same as we did for the weights and biases of other layers. We applied SGD to fine-tune the entire network, that is, to optimize the weights and biases of every layer.
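The two answers above can be illustrated with a minimal sketch in plain Python. It is not the Caffe implementation: the function names are hypothetical, the standard deviation of 0.01 and the zero bias are assumed values, and the sigmoid is included only because the latent layer is followed by a sigmoid layer in the network.

```python
import math
import random


def init_latent_layer(in_dim, code_bits, std=0.01, seed=0):
    """Gaussian random weights (assumed std) and zero biases for the latent layer."""
    rng = random.Random(seed)
    w = [[rng.gauss(0.0, std) for _ in range(in_dim)] for _ in range(code_bits)]
    b = [0.0] * code_bits
    return w, b


def latent_forward(w, b, x):
    """Compute sigmoid(w x + b) for a single F7 feature vector x."""
    out = []
    for row, bias in zip(w, b):
        z = sum(wi * xi for wi, xi in zip(row, x)) + bias
        out.append(1.0 / (1.0 + math.exp(-z)))
    return out


def sgd_step(w, b, grad_w, grad_b, lr=0.001):
    """Plain SGD update, applied to w and b exactly as for any other layer."""
    for j in range(len(w)):
        for i in range(len(w[j])):
            w[j][i] -= lr * grad_w[j][i]
        b[j] -= lr * grad_b[j]


# Example: map a 4096-dimensional F7 feature to 48 latent activations.
w, b = init_latent_layer(in_dim=4096, code_bits=48)
h = latent_forward(w, b, [1.0] * 4096)
```

Before fine-tuning, the weights are purely random, which is the sense in which the initial network resembles an LSH-style random projection of the F7 features.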
=================================================================
Q:
Could I know more detailed things about your hash paper?
1. Is the latent layer in your CVPRW'15 paper simply a fully connected layer with the number of neurons set to 48 or 128?
2. You added two layers between the fully connected layers fc7 and fc8: the latent layer H and a sigmoid layer that maps the real values to [0,1]. Is that right? Do you need to insert an additional layer or a threshold layer after the sigmoid layer?
A:
1) Yes, the latent layer mentioned in the paper is actually the fully connected layer with 48 or 128 neurons.
2) Yes, we added only two additional layers (fully connected layer + sigmoid layer) between fc7 and fc8.
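Since no threshold layer is added to the network, binarization can happen outside it at retrieval time. The sketch below assumes the sigmoid outputs are thresholded at 0.5 and the resulting codes are compared by Hamming distance; the function names and the example activations are made up for illustration.

```python
def binarize(activations, threshold=0.5):
    """Turn sigmoid outputs in [0, 1] into a list of 0/1 bits (assumed threshold 0.5)."""
    return [1 if a >= threshold else 0 for a in activations]


def hamming_distance(code_a, code_b):
    """Count the bit positions where two equal-length codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(code_a, code_b))


# Made-up 4-bit example; the real codes have 48 or 128 bits.
query_code = binarize([0.9, 0.2, 0.7, 0.4])     # [1, 0, 1, 0]
database_code = binarize([0.8, 0.6, 0.1, 0.3])  # [1, 1, 0, 0]
distance = hamming_distance(query_code, database_code)  # 2
```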
=================================================================
Q:
Which Caffe version are you using? I ran into problems compiling runtest and running demo.m in your folder.
A:
Caffe has been updated many times. Since we started this work one year ago, it is hard to identify the exact version we used. We therefore advise you to download the copy of Caffe we used from https://github.com/kevinlin311tw/caffe-cvprw15.
=================================================================
Q:
I have recently become interested in deep hashing. I read your CVPRW'15 paper, then downloaded and ran your code from GitHub. You really did a good job! Thanks very much for releasing the source code. I am currently doing some research on deep hashing, and I would like to know whether you could release the code for the experiments so that I can reproduce your results and save some time.
A:
We have released the evaluation scripts on GitHub, so you can now reproduce the experimental results presented in the paper. Since there have been several improvements and updates, please download the latest source code and compile it again.
=================================================================
Q:
I am very interested in your recent work, "Deep Learning of Binary Hash Codes for Fast Image Retrieval". I am trying to run the experiments in your paper, and I have two questions:
1. Which version of Caffe does the syntax in train.prototxt correspond to? I have not used this kind of prototxt, with layer{ } nested inside layers{ }:
layers {
  layer {
    name: "conv1"
    type: "conv"
    num_output: 96
    kernelsize: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.
    }
    blobs_lr: 1.
    blobs_lr: 2.
    weight_decay: 1.
    weight_decay: 0.
  }
  bottom: "data"
  top: "conv1"
}
2. How do you preprocess the MNIST and CIFAR-10 data, and how do you fit the 28*28 or 32*32 inputs to the 227*227 size? Taking CIFAR-10 as an example, do the train/test leveldb files contain the original data downloaded from the internet, or data that you processed yourself?
A:
1) Caffe has been updated many times. Since we started this work one year ago, it is hard to identify the exact version we used. We therefore advise you to download the copy of Caffe we used from https://github.com/kevinlin311tw/caffe-cvprw15.
2) Images from all datasets are resized to 256*256 and then center-cropped to 227*227 to form the network input. We first download the dataset and convert it to JPEG files, and then convert the images to leveldb format. Our preprocessing is in this script: https://github.com/kevinlin311tw/caffe-cvprw15/blob/master/prepare.sh
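The resize-then-crop step can be sketched in plain Python on a nested-list image. This is only an illustration of the geometry, not the code in prepare.sh: the function names are hypothetical, and nearest-neighbor interpolation is an assumption made to keep the example dependency-free.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2-D nested-list image with nearest-neighbor sampling (assumed method)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def center_crop(image, crop_h, crop_w):
    """Cut a crop_h x crop_w window out of the middle of the image."""
    in_h, in_w = len(image), len(image[0])
    top = (in_h - crop_h) // 2
    left = (in_w - crop_w) // 2
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def preprocess(image):
    """Resize to 256x256, then center-crop to 227x227, as described above."""
    return center_crop(resize_nearest(image, 256, 256), 227, 227)


# Example: a 32x32 single-channel image (CIFAR-10 sized) becomes 227x227.
small = [[r * 32 + c for c in range(32)] for r in range(32)]
net_input = preprocess(small)
```

With a 256*256 image and a 227*227 crop, the crop window starts at row and column (256 - 227) // 2 = 14.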