2.6 KiB
DGA detection testing workflow
Overview
nDPI provides a set of threat detection features available through NDPI_RISK detection.
As part of these features, we provide DGA detection.
Domain generation algorithms (DGA) are algorithms seen in various families of malware that are used to periodically generate a large number of domain names that can be used as rendezvous points with their command and control servers.
DGA detection heuristic is implemented here.
DGA performance tests and tracking allow us to detect automatically if a modification is harmful.
The modification can be a simple threshold change or a future lightweight ML approach.
Developers interested in DGA detection using ML should also visit this folder.
Used data
Original used dataset is a collection of legit and DGA domains (balanced) that can be obtained as follows:
wget https://raw.githubusercontent.com/chrmor/DGA_domains_dataset/master/dga_domains_full.csv
We split the dataset into DGA and NON-DGA and we keep 10% of each as test set and 90% as training set.
python3 -m pip install pandas
python3 -m pip install sklearn
Instruction using python3
from sklearn.model_selection import train_test_split
import pandas as pd
df = pd.read_csv("dga_domains_full.csv", header=None, names=["type", "family", "domain"])
df_dga = df[df.type=="dga"]
df_non_dga = df[df.type=="legit"]
train_non_dga, test_non_dga = train_test_split(df_non_dga, test_size=0.1, shuffle=True, random_state=27)
train_dga, test_dga = train_test_split(df_dga, test_size=0.1, shuffle=True, random_state=27)
test_dga["domain"].to_csv("test_dga.csv", header=False, index=False)
test_non_dga["domain"].to_csv("test_non_dga.csv", header=False, index=False)
train_dga["domain"].to_csv("train_dga.csv", header=False, index=False)
test_non_dga["domain"].to_csv("test_non_dga.csv", header=False, index=False)
Detection approach must be built on top of training set only, test set must be kept as unseen cases for testing
dga_evaluate
After nDPI compilation, you can use dga_evaluate helper to check number of detections out of an input file.
dga_evaluate <file name>
You can evaluate your modifications performances before submitting it as follows:
./do-dga.sh
If your modifications decreases baseline performances, test will fail. If not (well done), test passed, and you must update the baseline metrics with your obtained ones.