DroneFace: An Open Dataset for Drone Research

Hwai-Jung Hsu
Kuan-Ta Chen

Abstract

In this paper, we present DroneFace, an open dataset for testing how well face recognition can work on drones. Because of their high mobility, drones, i.e., unmanned aerial vehicles (UAVs), are well suited to surveillance, daily patrol, and searching for lost people on the streets, and thus need the capability to track human targets' faces from the air. In this context, a drone's distance and height from the target influence the accuracy of face recognition. To test whether a face recognition technique is suitable for drones, we establish DroneFace, which is composed of facial images taken from various combinations of distances and heights, for evaluating how well a face recognition technique recognizes designated faces from the air. Face recognition is one of the most successful applications of image analysis and understanding, and many face recognition databases exist for various purposes. However, to the best of our knowledge, DroneFace is the only dataset that includes facial images taken from controlled distances and heights in an unconstrained environment, and it can be valuable for future studies on integrating face recognition techniques onto drones.

1  Introduction

DroneFace is an open dataset for testing how well face recognition can work on drones. Because of their high mobility, drones, i.e., unmanned aerial vehicles (UAVs), are well suited to surveillance, daily patrol, and searching for lost people on the streets. Consequently, the capability to track human targets' faces from the air is essential to such applications. The distance between a drone and its target poses the first challenge. Drones may not be able to recognize targets correctly at long distances, and can be threatened if they approach a malicious target too closely. In addition, the altitude of a drone influences the pitch angle of the captured faces. A large depression angle resulting from a high altitude diminishes the accuracy of face recognition. Testing the appropriate range of distances and altitudes for drones tracking targets is therefore essential to the feasibility of applying drones to surveillance missions on the streets.
DroneFace contains a series of static human face images taken in an uncontrolled outdoor environment with a popular commercial sports camera from various fixed distances and heights. In [8], Hsu and Chen used DroneFace to simulate the context in which a drone searches for lost people on the streets and tries to recognize a specific target from the air based on a face recognition model built from a few portrait photos. Two face recognition techniques that were modern at the time, ReKognition API [1] and Face++ [15], were evaluated using DroneFace. Although many face recognition databases exist for various purposes [2, 6], to the best of our knowledge, DroneFace is the only dataset that includes facial images taken from controlled distances and heights in an unconstrained environment. Since [8] was published, many researchers have requested access to DroneFace for their own studies. We believe that DroneFace is valuable to the face recognition research community for the following reasons:
Figure 1: How raw images within DroneFace are collected conceptually [8]
  1. The images are taken from various specific combinations of distances and heights for all the subjects. Therefore, DroneFace is suitable for testing the effective range of an approach in recognizing faces from various distances and heights. DroneFace can also be adopted for comparing the capability of different face recognition methods when applied on drones.
  2. The raw images, which are taken in an uncontrolled outdoor environment, are also included in the dataset; therefore, DroneFace can also be adopted for testing the capability and effective range of face detection methods.
  3. Multiple subjects (2 to 3) appear in each raw image, so the raw images can be used for verifying whether a face detection or recognition method is able to track multiple target faces within one picture.
  4. DroneFace includes frontal and side portrait images of each subject for model building and training. The portrait images can also be used for constructing 3D face models of the subjects to study how 3D models may help face recognition.
  5. Seven of the 11 subjects provided their own portrait images, which are also included in DroneFace. These images are helpful for simulating the recognition of a target in the streets with a face recognition model constructed from the target's profile picture.
With the consent of all the subjects, DroneFace is now open to research communities under the Open Database License (ODbL) [3], which requires researchers using DroneFace to attribute any public use of DroneFace and to share alike any adapted database or works produced from DroneFace. Beyond the terms described in ODbL [3], DroneFace is released without further limits. All the information about DroneFace can be found on its GitHub project page: https://hjhsu.github.io/DroneFace/.
Figure 2: To simulate taking pictures from UAVs

2  Background

Face recognition is one of the most successful applications of image analysis, understanding, and computer vision [16]. Thus, there are various open or limited-usage databases and datasets for all kinds of face recognition studies [2, 6]. "Labeled Faces in the Wild" (LFW) [11, 10] is one of the most famous datasets for the study of unconstrained face recognition. LFW contains more than 13,000 images of faces collected from the web, and each image is labeled with the name of the person pictured [11, 10]. The FERET database is another well-known face recognition database [14]. The FERET database, established by the National Institute of Standards and Technology, is a large database with a total of 14,126 images taken from 1,199 individuals [14].
Besides the well-known face recognition databases mentioned above, some other databases have been established with purposes similar to that of DroneFace. The FiA ("Face-in-Action") dataset consists of 20-second videos of face data from 180 individuals captured by 6 synchronized cameras from 3 different angles [4]. SCface, the Surveillance Cameras Face Database, collected facial images of 130 subjects with 8 different surveillance cameras from various distances in both bright and dark (near-infrared photography) environments [7]. Maeng et al. collected visible and near-infrared facial images from 100 subjects at distances of 60 m, 100 m, and 150 m outdoors, and at a distance of 1 m indoors [13].
Figure 3: The extracted facial images taken from various distances and heights
Figure 4: The sample portraits, in which (a), (b), and (c) are the frontal, left, and right faces of subject a, and (d) is the portrait image provided by subject a
To the best of our knowledge, there is still no face recognition database like DroneFace that contains images captured from various controlled combinations of distances and depression angles using a commercial sports camera. SCface [7] is a limited-access face recognition database built from facial images captured by surveillance cameras located in an indoor environment with outdoor light entering through the surrounding windows. SCface is unique because it uses commercial surveillance cameras and simulates real-world contexts for study. Although DroneFace does not include as many subjects as SCface does, DroneFace includes facial images captured from various specific combinations of distances and heights at fine-grained intervals (0.5-meter intervals in ground distance and 1-meter intervals in height). SCface is suitable for verifying face recognition approaches in real-world indoor scenarios, whereas DroneFace is well suited to testing the limits of face recognition methods for moving cameras in outdoor environments. The work of Maeng et al. [13] provides facial images captured at various distances. However, it differs considerably from DroneFace in distance scale (60-150 meters versus 2-17 meters). Besides, DroneFace includes four kinds of portrait images for training face recognition models: portrait images of the subjects' frontal, left, and right faces, and the portraits provided by the subjects themselves. These portraits are useful for simulating the training of face recognition models in the real world, and they make DroneFace unique among face recognition databases.

3  Database Description

DroneFace was designed for testing whether modern face recognition techniques can be properly applied to UAV surveillance missions in the real world. For UAV surveillance, the distance and height from the drone to the target influence the accuracy of face recognition differently. When the UAV is far from the target, the distance makes face recognition difficult. However, when the UAV approaches the target, although the shorter distance makes face recognition easier, the depression angle caused by the flying height of the UAV affects the pitch angle of the target's face image, and thus makes recognizing the target correspondingly more difficult. We simulated the scenario of looking for a designated target (e.g., a lost person) in the streets using highly mobile UAVs. The face recognition model is built from the target's profile portrait picture(s) or from daily pictures obtained from the target's relatives. DroneFace was established to answer, in this context, how far away and how well drones equipped with a commercial sports camera (GoPro Hero3+ Silver Edition [5]) can recognize targets in uncontrolled environments.
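To make the trade-off between distance and height concrete, the depression angle can be estimated from simple geometry. The following is a minimal sketch and not part of the DroneFace release: the function name `depression_angle_deg` and the default face height of 1.5 meters are assumptions introduced here for illustration, and the camera is treated as a point looking straight at the face.

```python
import math


def depression_angle_deg(camera_height_m, ground_distance_m, face_height_m=1.5):
    """Angle below the horizontal from the camera to the face, in degrees.

    face_height_m is an assumed value (roughly head height), not a
    per-subject measurement.
    """
    if ground_distance_m <= 0:
        raise ValueError("ground distance must be positive")
    drop_m = camera_height_m - face_height_m
    return math.degrees(math.atan2(drop_m, ground_distance_m))


# A camera at head height sees the face level, whatever the distance.
level = depression_angle_deg(1.5, 10.0)

# A high camera close to the target looks down steeply.
steep = depression_angle_deg(5.0, 2.0)

# The same height from far away gives a much shallower angle.
shallow = depression_angle_deg(5.0, 17.0)
```

Under these assumptions, moving closer at a fixed height always increases the angle, which is the effect the paragraph above describes.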
Figure 5: Heat map comparing the face detection performance of various methods [8]

3.1  Image Acquisition

First, 11 subjects, 7 males and 4 females, were recruited. All of them are Taiwanese, aged from 23 to 36 years and from 157 to 183 centimeters tall. All the subjects were asked to have frontal and side portrait images taken before we started collecting raw images. The portrait pictures were taken 1 meter away from the subject using our GoPro and the camera built into a smartphone (HTC One M8 [9]). Three portrait pictures (front, left, and right) were taken of each subject with each of the two cameras, for a total of 66 portrait pictures. In addition, 7 of the 11 subjects handed in their own portrait pictures for model training. After the portrait images were taken, the subjects were divided into 5 groups (4 groups of 2 subjects and 1 group of 3 subjects) and asked to stand side by side right behind a baseline, waiting to be photographed group by group.
As sketched in Figure 1, while taking the raw images, we tried to simulate the context in which the UAV approaches the targets and takes frontal pictures of them with the sports camera attached to the UAV. The sports camera mounted on the UAV is fixed to point in the direction in which the UAV is heading. We simulate a UAV flying at three different heights, 3, 4, and 5 meters above the ground, and we also take pictures from a height of 1.5 meters, which is about the height of the subjects' heads, for comparison. However, controlling a UAV to repeat its flight so that pictures are taken at exactly the same positions and heights for different subjects is difficult. Instead, as Figure 2 illustrates, we attach our GoPro to the top of a long stick and change the length of the stick to mimic the UAV flying at different heights. The photographer took pictures in 4 rounds. In each round, the GoPro was set at a different height (1.5, 3, 4, or 5 meters above the ground), and we took pictures from 17 meters away from the subjects down to 2 meters, moving 0.5 meters closer at each step. Thus, 31 pictures with a resolution of 3,680x2,760 were taken in each round. Therefore, 124 pictures were taken for each group, and in total we took 620 raw images of the 11 subjects from various distances and heights.
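The counts in this procedure follow directly from the sampling grid. As a small check, the sketch below enumerates the grid and reproduces the totals; the constant names are illustrative only and do not come from the dataset.

```python
# Camera heights used in the four rounds, in meters.
HEIGHTS_M = [1.5, 3.0, 4.0, 5.0]

# Ground distances from 17 m down to 2 m in 0.5 m steps.
DISTANCES_M = [17.0 - 0.5 * step for step in range(31)]

# Four groups of two subjects and one group of three.
GROUP_SIZES = [2, 2, 2, 2, 3]

pictures_per_round = len(DISTANCES_M)
pictures_per_group = pictures_per_round * len(HEIGHTS_M)
raw_images = pictures_per_group * len(GROUP_SIZES)

# Every subject in a group appears in every picture of that group.
target_faces = pictures_per_group * sum(GROUP_SIZES)

print(pictures_per_round, pictures_per_group, raw_images, target_faces)
# prints: 31 124 620 1364
```

The last figure assumes that every subject of a group is visible in each of that group's pictures.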
The raw images are then labeled with the corresponding distance and height. After the labeling, we apply several face detection methods in OpenCV [12] to ease the hard work of extracting faces from the raw images. The alternative Haar-based method [12] detected most of the faces (1,050 out of the 1,364 faces), and the remaining undetected faces were extracted manually. The automatically detected faces are also cropped based on the face area indicated by the algorithm with 10
11 subjects including 7 males and 4 females.
2,057 pictures including 620 raw images, 1,364 frontal face images, and 73 portrait images.
The raw images are taken at 3,680x2,760 resolution with an ultra-wide field of view (170°) in daylight.
The resolutions of the facial images are between 23x31 and 384x384.
The raw images are taken from heights of 1.5, 3, 4, and 5 meters.
The raw images are taken from 2 to 17 meters away from the subjects at 0.5-meter intervals.
The 3-direction portrait images are taken with both the sports camera and the phone camera for comparison.

3.2  The Dataset Forming

All the images in DroneFace are named in the following manner:
 
subjectID_cameraType_heightID_imageType_distanceID.jpg
subjectID [[a-k] | ab | cd | ef | gh | ijk]
cameraType [gp | cam | na]
heightID [0 | 3 | 4 | 5 | na]
imageType [eo | ef | por | por[F | L | R]]
distanceID [00-30 | na]
Figure 6: Heat map comparing the face recognition performance of Face++ and ReKognition API [8]
The 11 subjects are named with the English letters a to k. Subjects a, b, c, e, g, j, and k are males, and the remainder are females. If the subjectID part contains only one letter, only one subject is in the image; otherwise, there are multiple subjects. The code "gp" in cameraType means that the picture was taken using our sports camera (GoPro Hero3+ Silver Edition), and "cam" indicates that the picture was taken using the HTC One M8 smartphone. The heightID values 0, 3, 4, and 5 indicate that the camera was 1.5, 3, 4, and 5 meters above the ground, respectively, when the picture was taken. The imageType "eo" means that the picture is a raw image (i.e., the original picture), "ef" means that the image is a frontal facial image extracted from a raw image, "por" means that the picture is the portrait handed in by the subject, and "porF", "porL", or "porR" means that the picture is a portrait image of the subject's front, left, or right face. The distanceID is a two-digit number, and the actual distance from the subject to the camera equals 17-(distanceID/2) meters. For any component of the filename, "na" means that the corresponding information is not available.
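The naming scheme can be decoded mechanically. The sketch below is one possible reading of it, not a tool shipped with DroneFace: `parse_name`, `ImageInfo`, and the regular expression are names introduced here, and the two file names in the example are hypothetical strings built from the pattern rather than files verified to exist in the dataset.

```python
import re
from typing import NamedTuple, Optional

# Pattern transcribed from the naming scheme; subjectID is read as one or
# more letters from a to k (an assumption about the intended grammar).
NAME_RE = re.compile(
    r"^(?P<subjects>[a-k]+)"
    r"_(?P<camera>gp|cam|na)"
    r"_(?P<height>0|3|4|5|na)"
    r"_(?P<image>eo|ef|por[FLR]?)"
    r"_(?P<distance>[0-9]{2}|na)\.jpg$"
)

# heightID to camera height in meters; "na" maps to None via dict.get.
HEIGHT_M = {"0": 1.5, "3": 3.0, "4": 4.0, "5": 5.0}


class ImageInfo(NamedTuple):
    subjects: str
    camera: str
    image_type: str
    height_m: Optional[float]
    distance_m: Optional[float]


def parse_name(filename: str) -> ImageInfo:
    """Decode a DroneFace-style file name into its labeled components."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized file name: {filename!r}")
    distance_m = None
    if match["distance"] != "na":
        distance_id = int(match["distance"])
        if distance_id > 30:
            raise ValueError(f"distanceID out of range: {filename!r}")
        # distance = 17 - (distanceID / 2) meters, as stated in the text.
        distance_m = 17 - distance_id / 2
    return ImageInfo(
        subjects=match["subjects"],
        camera=match["camera"],
        image_type=match["image"],
        height_m=HEIGHT_M.get(match["height"]),
        distance_m=distance_m,
    )


# Hypothetical names constructed from the pattern.
raw = parse_name("ab_gp_3_eo_10.jpg")        # two subjects, 3 m high, 12 m away
portrait = parse_name("a_cam_na_porF_na.jpg")  # phone portrait, no height or distance
```

Returning `None` for "na" fields keeps portraits and raw images in one record type, so downstream code can filter on `distance_m is not None`.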

4  How DroneFace can be applied for study work

In this section, we summarize our previous work in [8] to demonstrate how DroneFace can be used to evaluate face recognition on drones.
The evaluation is separated into two parts. First, the face detection performance of Face++, ReKognition API, and the methods in OpenCV [12] (four Haar-based methods and one LBP-based method) is evaluated using the 620 raw images. Table 1 shows the corresponding results, which consist of (1) the total number of faces detected (# of faces), (2) the true positive rate (TPR, number of correctly detected faces / total number of target faces), and (3) the false positive rate (FPR, number of wrongly detected faces / total number of faces detected).
Table 1: Performance of face detection [8]
Method            # of faces   TPR    FPR
Face++                    20   0.14   0.05
ReKognition API           37   0.27   0.13
Haar (default)        14,777   0.71   0.93
Haar (alt)             1,700   0.77   0.37
Haar (alt2)            2,545   0.78   0.57
Haar (alt tree)          510   0.37   0.002
LBP                    2,964   0.63   0.70
*The methods are asked to detect 1,364 target faces from the 620 raw images.
As Table 1 shows, the alternative Haar-based method (Haar alt) performs the best, with a relatively high TPR and a low FPR. In contrast, both Face++ and ReKognition API perform poorly in detecting faces directly from the raw images because of their built-in resizing mechanisms. The sports camera we used produces pictures (10 megapixels) much larger than Face++ and ReKognition API can handle. Face++ and ReKognition API accepted the raw images after resizing, and their face detection performance was diminished accordingly.
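The two rates reported in Table 1 can be computed from three counts. The sketch below follows the definitions given in the text, where FPR is taken relative to the number of detections (a ratio that is elsewhere sometimes called the false discovery rate); the function name and the counts in the example are hypothetical and are not taken from the table.

```python
def detection_rates(num_detected, num_correct, num_targets):
    """Return (TPR, FPR) using the definitions stated for Table 1.

    TPR = correctly detected faces / target faces
    FPR = wrongly detected faces / all detected faces
    """
    if num_targets <= 0:
        raise ValueError("there must be at least one target face")
    if not 0 <= num_correct <= num_detected:
        raise ValueError("correct detections must be between 0 and all detections")
    tpr = num_correct / num_targets
    # With no detections at all there are no wrong detections either.
    fpr = (num_detected - num_correct) / num_detected if num_detected else 0.0
    return tpr, fpr


# Hypothetical counts: 10 detections, 8 of them real faces, 16 target faces.
tpr, fpr = detection_rates(num_detected=10, num_correct=8, num_targets=16)
```

Keeping the number of correct detections as an explicit input makes it clear that both rates require ground-truth labels, not just the detector's output.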
We further compare the face detection capability of Face++, ReKognition API, and the OpenCV methods using the 1,364 facial images extracted from the raw images. Face++ and ReKognition API detect 885 and 984 faces from all the facial images, respectively. Figure 5 shows heat maps of the face detection rate (# detected faces / # faces in the images) of Face++, ReKognition API, and two OpenCV methods for various settings of height and distance. The influences of distance and height are obvious. Haar (alt2) performs better at distances beyond 12 meters, while Face++ and ReKognition API detect better at heights of 3 and 4 meters. All the methods perform poorly in detecting faces from short distances (less than 4 meters) at the greatest height (5 meters), i.e., at large depression angles.
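A heat map of this kind reduces to one detection rate per (height, distance) cell. The following sketch shows one way to aggregate per-face outcomes into such cells; the record layout and the function name are assumptions made for illustration, and the sample records are invented.

```python
from collections import defaultdict


def detection_rate_grid(records):
    """Aggregate (height_m, distance_m, detected) records into per-cell rates.

    Returns a dict mapping (height_m, distance_m) to
    detected faces / faces in that cell.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for height_m, distance_m, detected in records:
        cell = (height_m, distance_m)
        totals[cell] += 1
        hits[cell] += 1 if detected else 0
    return {cell: hits[cell] / totals[cell] for cell in totals}


# Invented outcomes for two cells.
sample = [
    (5.0, 2.0, False),
    (5.0, 2.0, False),
    (3.0, 10.0, True),
    (3.0, 10.0, False),
]
grid = detection_rate_grid(sample)
# grid == {(5.0, 2.0): 0.0, (3.0, 10.0): 0.5}
```

The resulting dictionary can be passed to any plotting tool, with heights on one axis and distances on the other.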
To evaluate the face recognition performance of Face++ and ReKognition API, we define matched and mismatched cases as follows. In a matched case, the face under recognition belongs to the owner of the model used for recognition. In contrast, a mismatched case consists of the recognitions between a face and the models belonging to subjects other than the owner of the face. The rating for a mismatched case is the mean of the ratings for all such recognitions. Because Face++ and ReKognition API use different scales for rating the faces under recognition, we set different thresholds for Face++ and ReKognition API, according to our study, for deciding whether a face is recognized. The details are described in [8]; here, Figure 6 presents how well Face++ and ReKognition API distinguish matched cases from mismatched cases.
The area under the ROC curve (AUC) is used as the metric of distinguishability. We assumed that both methods rate undetected faces as 0 in both matched and mismatched cases, and took 0.75 as the threshold for acceptable distinguishability. Face++ is applicable on drones when the distance is within 12 meters; for ReKognition API, the limit is 14 meters. Both methods show no distinguishability at large depression angles (at a height of 5 meters and ground distances of less than 3 meters). Both methods need to keep some distance from the targets to avoid the influence of the depression angle. Face++ needs about 3 and 5 meters of ground distance at heights of 4 and 5 meters, respectively, and ReKognition API needs 3 meters of ground distance at a height of 5 meters.
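For readers who wish to reproduce this kind of analysis, the area under the ROC curve can be computed without plotting the curve, because it equals the probability that a matched rating exceeds a mismatched rating, with ties counted as one half. The sketch below uses that pairwise formulation; it is not necessarily how the values in [8] were computed, the function names are introduced here, and the ratings in the example are invented.

```python
def rate_or_zero(score):
    """Treat an undetected face (score None) as a rating of 0."""
    return 0.0 if score is None else float(score)


def roc_auc(matched_scores, mismatched_scores):
    """Area under the ROC curve via pairwise comparison of ratings."""
    if not matched_scores or not mismatched_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for matched in matched_scores:
        for mismatched in mismatched_scores:
            if matched > mismatched:
                wins += 1.0
            elif matched == mismatched:
                wins += 0.5  # a tie carries no evidence either way
    return wins / (len(matched_scores) * len(mismatched_scores))


# Invented ratings; None marks a face the service failed to detect.
matched = [rate_or_zero(s) for s in (0.9, 0.8, None)]
mismatched = [rate_or_zero(s) for s in (0.3, 0.2, None)]

auc = roc_auc(matched, mismatched)
acceptable = auc >= 0.75  # threshold for acceptable distinguishability
```

In this invented example the single undetected matched face pulls the AUC to 6.5/9, just under the 0.75 threshold, which illustrates how detection failures alone can make a setting unusable.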

5  The potential applications of DroneFace

Because all the images are taken outdoors at specific distances and heights from the subjects, DroneFace is especially suitable for testing the limits of face recognition approaches deployed at elevated positions, such as on drones or on fixed surveillance cameras mounted above the streets. For the same reason, the raw images are also appropriate for probing the limits of face detection algorithms at various distances and pitch angles. Besides, with the side portraits included in the dataset, DroneFace can also be helpful in evaluating how 3D facial models may improve the performance of face recognition. In addition, the facial images can be adopted for training face tracking algorithms and for checking whether a tracking algorithm can continually track targets' faces that appear at different distances with various pitch angles.

6  Concluding Remarks

In this paper, we present DroneFace, a dataset composed of facial images taken from various distances and heights for testing the applicability of face recognition techniques on drones. DroneFace has been shown to be useful in exposing the limits of face recognition approaches applied on drones [8]. Moreover, since DroneFace is unique in containing pictures taken at controlled distances and heights in an unconstrained environment, it can also be applied to other kinds of applications, such as training face tracking algorithms and testing face detection approaches. Besides, DroneFace includes frontal and side portrait images of the subjects for training face recognition models or 3D face models. By opening DroneFace, we hope the dataset will benefit future studies on integrating face recognition onto drones.

References

[1] Amazon. 2017. Amazon Rekognition. https://aws.amazon.com/rekognition/. (2017).
[2] Cole Calistra. 2015. 60 Facial Recognition Databases. https://www.kairos.com/blog/60-facial-recognition-databases. (May 2015).
[3] Open Data Commons. 2009. Open Data Commons Open Database License (ODbL). https://opendatacommons.org/licenses/odbl/. (2009).
[4] Rodney Goh, Lihao Liu, Xiaoming Liu, and Tsuhan Chen. 2005. The CMU face in action (FIA) database. In International Workshop on Analysis and Modeling of Faces and Gestures. Springer, 255-263.
[5] GoPro. 2015. GoPro Hero3+ Silver Edition. https://gopro.com/update/hero3_plus. (2015).
[6] Mislav Grgic and Kresimir Delac. 2007. Face Recognition Homepage. http://www.face-rec.org/databases/. (2007).
[7] Mislav Grgic, Kresimir Delac, and Sonja Grgic. 2011. SCface-surveillance cameras face database. Multimedia tools and applications 51, 3 (2011), 863-879.
[8] Hwai-Jung Hsu and Kuan-Ta Chen. 2015. Face Recognition on Drones: Issues and Limitations. In Proceedings of the First Workshop on Micro Aerial Vehicle Networks, Systems, and Applications for Civilian Use (DroNet '15). 39-44.
[9] HTC. 2014. HTC One M8. http://www.htc.com/us/support/htc-one-m8/. (2014).
[10] Gary B. Huang and Erik Learned-Miller. 2014. Labeled Faces in the Wild: Updates and New Reporting Procedures. Technical Report UM-CS-2014-003. University of Massachusetts, Amherst.
[11] Gary B. Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller. 2007. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. Technical Report 07-49. University of Massachusetts, Amherst.
[12] Intel. 1999. Open Source Computer Vision Library. http://opencv.org/. (1999).
[13] Hyunju Maeng, Shengcai Liao, Dongoh Kang, Seong-Whan Lee, and Anil K Jain. 2012. Nighttime face recognition at long distance: Cross-distance and cross-spectral matching. In Asian Conference on Computer Vision. Springer, 708-721.
[14] P Jonathon Phillips, Harry Wechsler, Jeffery Huang, and Patrick J Rauss. 1998. The FERET database and evaluation procedure for face-recognition algorithms. Image and vision computing 16, 5 (1998), 295-306.
[15] Megvii Tech. 2017. Face++. https://www.faceplusplus.com.cn/. (2017).
[16] Wenyi Zhao, Rama Chellappa, P Jonathon Phillips, and Azriel Rosenfeld. 2003. Face recognition: A literature survey. ACM computing surveys (CSUR) 35, 4 (2003), 399-458.


Sheng-Wei Chen (also known as Kuan-Ta Chen)
http://www.iis.sinica.edu.tw/~swc 
Last Update September 19, 2017