Abstract
Face recognition is often limited by its training datasets, which tend to be too small and may contain manual labeling errors. This project introduces a dataset construction and filtering process that addresses the problem at lower cost. FaceNet[35] and Sphereface[29] are used to filter a dataset scraped from Google. Results show that automatic filtering is highly effective and improves dataset purity, with considerable attention given to labeling errors arising from web search. In addition to the exclusively self-constructed dataset, filtered and merged datasets from CASIA-WebFace[54] and VGG Face[32] were also tested and analyzed. Future research and experiments can aim to improve the filtering process further by lowering its false negative rate and by eliminating labeling errors caused by web search. These improvements are expected to contribute further to unsupervised learning in general fine-grained object recognition.