Eyal Horowicz, Tal Shapira, Yuval Shavitt
Internet traffic classification has been intensively studied over the past decade due to its importance for traffic engineering and cyber security. A promising approach to several traffic classification problems is the FlowPic approach, where histograms of packet sizes in consecutive time slices are transformed into a picture that is fed into a Convolutional Neural Network (CNN) model for classification. However, CNNs (the FlowPic approach included) require a relatively large labeled flow dataset, which is not always easy to obtain. In this paper, we show that this obstacle can be overcome by using Contrastive Representation Learning to learn a flow representation from an unlabeled flow dataset. The representation is embedded in a latent space in which flows belonging to the same class cluster together. We then show that by using just a few labeled flows (a few shots) from each class, we can achieve high accuracy in flow classification. We show that common picture augmentation techniques can help, but accuracy improves further when introducing augmentation techniques that mimic network behavior, such as changes in the round-trip time (RTT). Finally, we show that we can replace the large FlowPics suggested in the past with much smaller mini-FlowPics and achieve two advantages: improved model performance and easier engineering. Interestingly, the smaller representation even improves accuracy in some cases.
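As a rough illustration of the representation described above, the following sketch builds a small FlowPic as a 2D histogram of packet arrival time versus packet size, and applies a simple time-rescaling augmentation as a crude stand-in for an RTT change. The function names `flowpic` and `rtt_augment`, the 15-second window, the 1500-byte size cap, the 32x32 resolution, and the scaling range are all assumptions made for this example; they are not taken from the text and this is not the authors' implementation.

```python
import numpy as np


def flowpic(times, sizes, duration=15.0, max_size=1500, dim=32):
    """Build a dim x dim histogram: rows are time bins, columns are size bins.

    duration, max_size and dim are illustrative assumptions.
    """
    times = np.asarray(times, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    # Keep only packets inside the time window and the size range.
    keep = (times >= 0) & (times < duration) & (sizes > 0) & (sizes <= max_size)
    hist, _, _ = np.histogram2d(
        times[keep],
        sizes[keep],
        bins=dim,
        range=[[0.0, duration], [0.0, max_size]],
    )
    return hist


def rtt_augment(times, rng, low=0.5, high=1.5):
    """Rescale arrival times by one random factor (hypothetical RTT-like change)."""
    alpha = rng.uniform(low, high)
    return np.asarray(times, dtype=float) * alpha


# Synthetic flow: 200 packets spread over the window with random sizes.
rng = np.random.default_rng(0)
times = np.sort(rng.uniform(0.0, 15.0, size=200))
sizes = rng.integers(40, 1501, size=200)

pic = flowpic(times, sizes)                     # original view
aug = flowpic(rtt_augment(times, rng), sizes)   # augmented view of the same flow
```

In a contrastive setup, `pic` and `aug` would serve as two views of the same flow that the encoder is trained to map close together. Packets pushed past the window by the rescaling are simply dropped in this sketch.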
Anh-Khoa Tho Nguyen, Tin T. Tran, Phuc Hong Nguyen, Vinh Quang Dinh
Ji Li, Chunxiang Gu, Luan Luan, Fushan Wei, Wenfen Liu
Ojas Kishore Shirekar, Hadi Jamali‐Rad
Yuexuan An, Hui Xue, Xingyu Zhao, Lu Zhang
Tang Xu-wen, Teng Zhu, Baopeng Zhang, Jianping Fan