This talk will explore a scalable data analytics pipeline for real-time attack detection, built around customized honeypots at the National Center for Supercomputing Applications (NCSA). Attack detection tools are common and constantly improving, but validating them is challenging. Validation requires: (i) identifying the data (e.g., system-level events) that is essential for detecting attacks, (ii) extracting this data from the multiple logs collected by runtime monitors, and (iii) presenting the data to the attack detection tools. In addition, the approach must scale with an ever-increasing volume of data while allowing new monitors and attack detection tools to be integrated. All of this requires an infrastructure to host and validate the developed tools before they are deployed into a production environment.
We will present a generalized architecture for a real-time, scalable, and extensible pipeline that can be deployed in diverse infrastructures to validate arbitrary attack detection tools. To motivate our approach, we will show an example deployment of the pipeline built on open-source tools. The example deployment draws on two data sources: (i) a customized honeypot environment at NCSA and (ii) a container-based testbed infrastructure for interactive attack replay. Each data source is equipped with network- and host-based monitoring tools, such as Bro (a network-based intrusion detection system) and OSSEC (a host-based intrusion detection system), to allow runtime collection of data on system and user behavior. Finally, we will present an attack detection tool that we developed and aim to validate through our pipeline. The talk will conclude by discussing the challenges of transitioning attack detection from theory to practice and how the proposed data analytics pipeline can help with that transition.
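As a rough illustration of the extract-and-present steps described above, the following minimal Python sketch normalizes records from one network monitor and one host monitor into a common event type, merges them in timestamp order, and hands each event to pluggable detectors. It is not the pipeline presented in the talk. The `Event` schema, both log layouts, the parser functions, and the `RepeatedFailureDetector` are hypothetical and do not reflect the actual Bro or OSSEC output formats. The sketch also assumes each log is already sorted by time.

```python
import heapq
import json
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    """Common schema shared by all monitors (hypothetical)."""
    source: str  # which kind of monitor produced the record
    host: str
    kind: str
    ts: float


def parse_network(line: str) -> Event:
    # Assumed layout: tab-separated "timestamp, host, event kind".
    ts, host, kind = line.rstrip("\n").split("\t")
    return Event("network", host, kind, float(ts))


def parse_host(line: str) -> Event:
    # Assumed layout: one JSON object per line with ts, host, rule.
    rec = json.loads(line)
    return Event("host", rec["host"], rec["rule"], float(rec["ts"]))


class RepeatedFailureDetector:
    """Toy detector: fires when a host reaches `threshold` failed logins."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.counts = Counter()

    def __call__(self, event: Event) -> bool:
        if event.kind != "login_failed":
            return False
        self.counts[event.host] += 1
        return self.counts[event.host] == self.threshold


Source = Tuple[Callable[[str], Event], Iterable[str]]


def run_pipeline(
    sources: List[Source],
    detectors: Dict[str, Callable[[Event], bool]],
) -> List[Tuple[str, Event]]:
    # map() binds each parser to its own log; heapq.merge lazily interleaves
    # the already time-sorted streams without loading them fully into memory.
    streams = [map(parse, lines) for parse, lines in sources]
    alerts = []
    for event in heapq.merge(*streams, key=lambda e: e.ts):
        for name, detect in detectors.items():
            if detect(event):
                alerts.append((name, event))
    return alerts
```

In this sketch, a new monitor is integrated by adding a parser and a log to `sources`, and a new detection tool by adding a callable to `detectors`, which mirrors the extensibility goal at a very small scale.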