Deep convolutional neural networks (DCNNs) have recently demonstrated impressive capabilities in areas such as image classification and object detection, but they require large datasets of high-quality labeled data to achieve high levels of performance. Most data is unlabeled when it is captured, and manually labeling datasets large enough for effective learning is impractical in many real-world applications. Recent studies have shown that synthetic data, generated from a simulated environment, can serve as effective training data for DCNNs. However, synthetic data is only as good as the simulation that produces it, and there is often a significant trade-off between designing a simulation that faithfully models real-world conditions and simply gathering more real-world data. Using generative network architectures, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), it is possible to produce new synthetic samples that capture the features of real-world data. These samples can be used to augment small datasets and increase DCNN performance, much like traditional augmentation methods such as scaling, translation, rotation, and adding noise. In this paper, we compare the advantages of synthetic data from GANs and VAEs to traditional data augmentation techniques. Initial results are promising, indicating that using synthetic data for augmentation can improve the accuracy of DCNN classifiers.
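The traditional augmentation methods mentioned above (rotation, translation, and additive noise) can be sketched as simple array operations. The following is a minimal illustration in NumPy, not code from the paper; the function name `augment` and the specific shift and noise parameters are hypothetical choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, noise_std=0.05):
    """Produce augmented copies of a single grayscale image
    (an H x W float array with values in [0, 1]) using
    rotation, translation, and additive Gaussian noise."""
    augmented = []
    # Rotation by multiples of 90 degrees (lossless on a pixel grid).
    for k in (1, 2, 3):
        augmented.append(np.rot90(image, k))
    # Translation: shift right by 2 pixels, zero-filling the vacated columns.
    shifted = np.roll(image, shift=2, axis=1)
    shifted[:, :2] = 0.0
    augmented.append(shifted)
    # Additive Gaussian noise, clipped back to the valid [0, 1] range.
    noisy = np.clip(image + rng.normal(0.0, noise_std, image.shape), 0.0, 1.0)
    augmented.append(noisy)
    return augmented

sample = rng.random((28, 28))  # stand-in for one grayscale training image
batch = augment(sample)
print(len(batch))  # 5 augmented variants per input image
```

Each input image thus yields several label-preserving variants, which is the baseline against which GAN- and VAE-generated samples are compared.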
Aaron Choi, Albert Giang, Sajit Jumani, David Luong, Fabio Di Troia