knitr::opts_chunk$set(echo = TRUE)

Build Status AppVeyor Build Status

1.1 Name

IDPS Log Autoencoder Anomaly Detector (IDPS-LAAD)

1.2: IDPS-LAAD Title

This analytic data product is designed to construct an Autoencoder Neural Network (ANN) to detect anomalous data observations within Intrusion Detection Prevention System (IDPS) log file data. This analytic will test multiple ANN hyperparameters to determine the optimal settings supporting anomaly detection assuming that anomalous observations correspond to potentially adverse network traffic

1.3: IDPS Log Autoencoder Anomaly Detector Description

IDPS-LAAD is designed aid expeditious discovery of potentially hostile actions on a computer network protected by various intrusion detection sensors. The IDPS log files are consolidated by independent software systems to be analyzed by human cyber security experts. Most cyber security agencies have a limited cyber security staff, making the review of the number of cyber logs collected during normal day-to-day operations problematic. Additionally, cyber security staffs require a spectrum of analytic tools to analyze cyber security files to limit the probability of hostile cyber actions going undetected.

IDPS-LAAD is a tool to be used by cyber security experts with limited programming experience, and little-to-no statistical and/or machine learning expertise. Users will interact with the program via graphical user interface using keyboard and mouse inputs a with which a standard computer user is familiar. There are no known security concerns nor are there appearance constrains to which the program must adhere.

IDPS-LAAD is designed to read in a .csv containing the IDPS cyber log. The program will allow users to select what features (variables) in the data they wish to analyze. The program will then automatically prepare the data for ANN analysis. Once the data is prepared for ANN analysis, the IDPS-LAAD will automatically generate a designed experiment to test ANN hyperparameters, and subsequently run ANNs testing each of the hyperparameters. Once the 'optimal' hyperparameters are found the the analytic will analyze the IDPS cyber data and graphically display the results as well as output a user defined top 'n' anomalous observations.

Section 2

| Analytic Feature | Priority | Status | Value to User | User Inputs | Outputs | Purpose of Output | Sufficient Time to Deadline? | Required in Version? |
|:----------------------------------------------------------------------|---------:|:---------------------|:------------------------------------------------------------------|:---------------------------------|:-----------------------|:---------------------|:-----------------------------|:---------------------| | Import IDPS Cyber Log File | 7 | Not Started | GUI based file import | .csv IDPS Log | data frame | Information to User | No | No | | User feature Selector | 8 | Not Started | GUI based data preparation | GUI Input Selection | data frame | Information to User | No | No | | Adjust Data Frame for Missing Categorical Observations | 9 | Not Started | automatically format data for ANN analysis | None | data frame | Internal to Analytic | No | No | | Adjust Data Frame for Missing Port Number Observations | 10 | Not Started | automatically format data for ANN analysis | None | data frame | Internal to Analytic | No | No | | Adjust Data Frame for Missing IP Address Observations | 11 | Not Started | automatically format data for ANN analysis | None | data frame | Internal to Analytic | No | No | | Adjust Data Frame for Missing Continuous Observations | 12 | Not Started | automatically format data for ANN analysis | None | data frame | Internal to Analytic | No | No | | One-Hot Encode Categorical Features | 13 | Not Started | automatically format data for ANN analysis | None | data frame | Internal to Analytic | No | No | | Convert IP Addresses to 32 Binary Columns | 14 | Not Started | automatically format data for ANN analysis | None | data frame | Internal to Analytic | No | No | | Convert Port Number to 16 Binary Columns | 15 | Not Started | automatically format data for ANN analysis | None | data frame | Internal to Analytic | No | No | | Scale each column of the data frame | 16 | Not Started | automatically format data for ANN analysis | None | data frame | Internal to Analytic | No | No | | Split data frame into train and test subsets | 17 | Not Started | automatically format data for ANN analysis | None | data frame | Internal to Analytic | No | No | | Query User for Custom or Default Hyperparameter Designed Experiment | 18 | Not Started | GUI based method to select novice/expert modes of operation | GUI Input Selection | None | Internal to Analytic | No | No | | Build Default or Import Custom Experimental Design | 1 | Default Complete | GUI based method to select novice/expert modes of operation | .csv Designed Experiment or None | DOE data frame | Internal to Analytic | Yes | Yes (Default Only) | | Run Designed Experiment | 2 | Complete | Identify best ANN hyperparameter settings | GUI Execute Button | Status Bar | Information to User | Yes | Yes | | Present Designed Experiment Results | 3 | Complete | Display Hyperparameter experimental design results | None | Test Design Results | Information to User | Yes | Yes | | Construct ANN using Best Hyperparameters Found In Designed Experiment | 4 | Complete | ANN trained to detect anomalies | GUI Execute Button | Status Bar | Information to User | Yes | Yes | | Use ANN to Detect Anomalies | 5 | Complete | Anomaly Detection | GUI Execute Button | Histogram of Anomalies | Identify Anomalies | Yes | Yes | | Display Top 'n' Anomalies | 6 | Complete | List of most likely anomales to evaluate for hostile cyber action | Number (n) anomalies to display | List/Table | Identify Anomalies | Yes | Yes |

# IDPS-LAAD will perform the following functions:
# 
# 1. Read in a .csv file composed of individual IDPS cyber log observations described by features
# 
# 2. Allow the user to select which features the user wants to retain for analysis
# 
# 3. Automatically convert the selected features into a format compatible with Autoencoder Neural Networks
#     i) Adjust for missing data
#         A.  Add a missing column for Port Numbers and IP Addresses with a value of '1' where that data is missing
#         B.  Missing categorical data will be flagged with a 'missing' label
#     ii) One-Hot Encode Categorical Features
#     iii) Convert IP Addresses into 32-bit binary number, each column containing one bit
#     iiii) Scale the data frame
#     iiiii) Split the data frame into training and test data frames
# 
# 4. Construct a designed experiment to test ANN hyperparameters
# 
# 5. Evaluate the ANNs using the hyperparameters in 4.
# 
# 6. Automatically select the optimal hyperparameters and build an ANN using those hyperparameters
# 
# 7. Graphically display via histogram anomalous observations using the ANN built in 6. 
# 
# 8. Output a user specified top 'n' anomalous observations


SpencerButt/IDPS-LAAD documentation built on April 20, 2020, 8:45 p.m.