AICS 2017 Workshop Mini-Challenge Problem

Classification algorithms are known to learn poorly when applied to imbalanced data. A dataset is imbalanced if the classification categories are not approximately equally represented. Imbalance becomes especially problematic when the proportion of target to non-target observations is very small. In the field of cyber security, such severely imbalanced data is common. Two examples are detecting system intrusions from network traffic data and identifying malicious insiders who use cyber means to gain access to restricted network resources.
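
To make the difficulty concrete, the following minimal sketch shows why overall accuracy can be misleading on imbalanced data. The counts used (100,000 observations, 50 of them targets) are hypothetical and chosen only for illustration; they are not taken from any dataset discussed here. A classifier that always predicts the non-target class scores very high accuracy while detecting no targets at all.

```python
def majority_baseline_metrics(n_total, n_positive):
    """Accuracy and recall of a classifier that always predicts the
    majority (non-target) class. Both arguments are observation counts."""
    n_negative = n_total - n_positive
    true_negatives = n_negative   # every non-target is labeled correctly
    true_positives = 0            # no target is ever detected
    accuracy = (true_positives + true_negatives) / n_total
    recall = true_positives / n_positive if n_positive else 0.0
    return accuracy, recall


# Hypothetical counts, for illustration only.
accuracy, recall = majority_baseline_metrics(n_total=100_000, n_positive=50)
print(f"accuracy={accuracy:.4f}, recall={recall:.1f}")
```

For this reason, measures that focus on the target class, such as recall and precision, are usually more informative than accuracy when the target class is rare.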

Recently, techniques from the field of Machine Learning (ML) have been developed to improve the performance of classifier learning algorithms for problems with imbalanced data distributions. These techniques are general in nature and do not leverage domain-specific aspects of the processes that generate the data.
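
As one illustration of such a general technique, the sketch below computes inverse-frequency class weights, which make errors on the rare class cost more during training. This is only an assumed example of the kind of method meant above, not a technique prescribed by the challenge. The function name `balanced_class_weights`, the label names, and the counts are hypothetical; the formula is the common "balanced" heuristic, weight = n_samples / (n_classes * class_count).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class, inversely proportional to its frequency.

    Uses the common "balanced" heuristic:
        weight[c] = n_samples / (n_classes * count[c])
    so that every class contributes the same total weight.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution, for illustration only.
labels = ["benign"] * 9990 + ["malicious"] * 10
weights = balanced_class_weights(labels)
print(weights)
```

Note that the weights depend only on label counts. Nothing in the calculation reflects how the data was generated, which is the sense in which such techniques are general rather than domain-specific.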

The challenge problem focuses on the intersection of ML techniques and domain-based knowledge unique to cyber security. It is of interest to explore how general ML techniques for classifying imbalanced data can be combined with domain-specific knowledge of the network environment to optimize learning performance for cyber security classification problems.

Below is an example solution for the cyber event classification problem described above. The example solution describes results from experiments using the Los Alamos dataset. This mini-challenge is the first in a series of planned challenges that seek to stimulate research into the emerging field of artificial intelligence applied to cyber security.