Abstract: Malware has long been one of the biggest cyber threats in the digital world. Existing machine learning-based malware classification methods rely on handcrafted features extracted from raw binary files or disassembled code. The diversity of these features makes it hard to build generic malware classification systems that work effectively across different operational environments. To strike a balance between generality and performance, we explore new machine learning techniques that classify malware programs represented as control flow graphs (CFGs). Existing malware analysis methods rely on inefficient, non-adaptive graph matching techniques. To overcome these drawbacks, we build a new system that uses a deep graph convolutional neural network to embed the structural information inherent in CFGs, enabling malware classification that is both effective and efficient. We evaluate the proposed system on two large, independent datasets that together contain more than 20K malware samples. The experimental results show that it classifies CFG-represented malware programs with performance comparable to that of state-of-the-art methods applied to handcrafted malware features.
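As a rough illustration of how a graph convolution can embed a CFG's structure, here is one propagation step of the form Z = tanh(D^-1 (A + I) X W), a rule used in some deep graph convolutional networks, followed by a simple mean readout that produces a fixed-length vector for the whole graph. This is a minimal sketch under stated assumptions, not the system described in the talk. The function names, the toy three-block CFG, the per-block features (instruction count, call count), and the identity weights are all hypothetical. Real models stack several trained layers and may use a different pooling step, such as sort pooling.

```python
import math


def graph_conv(adj, features, weights):
    """One graph convolution layer: Z = tanh(D^-1 (A + I) X W).

    adj[i][j] == 1 means control can flow from basic block i to block j.
    features[i] is the (hypothetical) feature vector of basic block i.
    weights is a learned matrix; here it is supplied by the caller.
    """
    n = len(adj)
    # Add self-loops so each basic block keeps its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    # Row degrees of A + I (always >= 1 because of the self-loop).
    deg = [sum(row) for row in a_hat]
    # Average each block's features with those of its successor blocks.
    num_feats = len(features[0])
    propagated = [
        [sum(a_hat[i][k] * features[k][f] for k in range(n)) / deg[i]
         for f in range(num_feats)]
        for i in range(n)
    ]
    # Linear transform by the weight matrix, then a tanh nonlinearity.
    out_dim = len(weights[0])
    return [
        [math.tanh(sum(p[f] * weights[f][o] for f in range(num_feats)))
         for o in range(out_dim)]
        for p in propagated
    ]


def mean_readout(node_embeddings):
    """Average node embeddings into one fixed-length graph embedding."""
    n = len(node_embeddings)
    dim = len(node_embeddings[0])
    return [sum(row[d] for row in node_embeddings) / n for d in range(dim)]


# Hypothetical toy CFG: block 0 branches to blocks 1 and 2; block 1 falls through to 2.
adj = [[0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]
features = [[4.0, 1.0], [2.0, 0.0], [3.0, 1.0]]  # [instruction count, call count]
weights = [[1.0, 0.0], [0.0, 1.0]]               # identity, for illustration only

node_embeddings = graph_conv(adj, features, weights)
graph_vector = mean_readout(node_embeddings)
print(graph_vector)
```

In a classifier, a vector like `graph_vector` would be fed to a dense layer that predicts the malware family, and the weights would be learned end to end rather than fixed by hand.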
Bio: Jiaqi Yan is a PhD candidate in Computer Science at the Illinois Institute of Technology, advised by Dong (Kevin) Jin. His research spans the modeling, simulation, and emulation of software-defined networks and blockchain systems, as well as deep learning techniques for network intrusion detection and malware classification.