# DGA detection testing workflow

## Overview

nDPI provides a set of threat detection features exposed through the NDPI_RISK flow-risk mechanism.

As part of these features, we provide DGA detection.

Domain generation algorithms (DGAs) are algorithms, seen in various families of malware, that periodically generate a large number of domain names that can be used as rendezvous points with the malware's command-and-control servers.
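To make the idea concrete, here is a purely illustrative toy generator, not modeled on any real malware family: bot and operator share a seed, so both can independently compute the same date-dependent domain names. The high-entropy, meaningless labels it emits are exactly the kind of pattern a DGA heuristic tries to flag.

```python
# Toy DGA sketch (illustrative only, not any real malware family):
# derive pseudo-random domains from a shared seed and the current date.
import hashlib
from datetime import date

def toy_dga(seed, day, count=5):
    domains = []
    for i in range(count):
        digest = hashlib.md5(f"{seed}-{day.isoformat()}-{i}".encode()).hexdigest()
        domains.append(digest[:12] + ".com")  # e.g. "4f6e2a9c0b1d.com"
    return domains

print(toy_dga("example-seed", date.today()))
```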

The DGA detection heuristic is implemented in the nDPI core library.

DGA performance tests and tracking allow us to detect automatically whether a modification is harmful.

The modification can be a simple threshold change or a future lightweight ML approach.

Developers interested in DGA detection using ML should also visit this folder.

## Used data

The original dataset is a balanced collection of legitimate and DGA domains that can be obtained as follows:

```sh
wget https://raw.githubusercontent.com/chrmor/DGA_domains_dataset/master/dga_domains_full.csv
```

We split the dataset into DGA and non-DGA subsets and keep 10% of each as the test set and 90% as the training set.

```sh
python3 -m pip install pandas
python3 -m pip install scikit-learn
```

Instructions using python3:

```python
from sklearn.model_selection import train_test_split
import pandas as pd

# Columns in the raw CSV: label (dga/legit), malware family, domain name
df = pd.read_csv("dga_domains_full.csv", header=None, names=["type", "family", "domain"])
df_dga = df[df.type == "dga"]
df_non_dga = df[df.type == "legit"]

# 90/10 train/test split for each class
train_non_dga, test_non_dga = train_test_split(df_non_dga, test_size=0.1, shuffle=True, random_state=27)
train_dga, test_dga = train_test_split(df_dga, test_size=0.1, shuffle=True, random_state=27)

test_dga["domain"].to_csv("test_dga.csv", header=False, index=False)
test_non_dga["domain"].to_csv("test_non_dga.csv", header=False, index=False)
train_dga["domain"].to_csv("train_dga.csv", header=False, index=False)
train_non_dga["domain"].to_csv("train_non_dga.csv", header=False, index=False)
```
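Since shuffle=True is combined with a fixed random_state, the split is deterministic: rerunning the snippet reproduces the same held-out test_dga.csv and test_non_dga.csv files.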

The detection approach must be built on top of the training set only; the test set must be kept as unseen cases for evaluation.

## dga_evaluate

After compiling nDPI, you can use the dga_evaluate helper to check the number of detections for a given input file:

```sh
dga_evaluate <file name>
```

You can evaluate the performance of your modifications before submitting them as follows:

```sh
./do-dga.sh
```

If your modifications decrease the baseline performance, the test will fail. If they do not (well done!), the test passes, and you must update the baseline metrics with the values you obtained.
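For reference, below is a minimal Python sketch of the kind of regression check such a workflow performs; do-dga.sh remains the authoritative script. The baseline values are hypothetical placeholders, and it assumes (rather than knows) that dga_evaluate prints its detection count as the last integer on stdout.

```python
# Hedged sketch of a baseline regression check (see do-dga.sh for the
# real logic). BASELINES values are hypothetical placeholders, and we
# assume dga_evaluate prints its detection count as the last integer.
import re
import subprocess
import sys

BASELINES = {"test_dga.csv": 900, "test_non_dga.csv": 60}  # hypothetical

def detections(csv_file):
    out = subprocess.run(["./dga_evaluate", csv_file],
                         capture_output=True, text=True, check=True).stdout
    return int(re.findall(r"\d+", out)[-1])

dga_hits = detections("test_dga.csv")        # true positives: higher is better
legit_hits = detections("test_non_dga.csv")  # false positives: lower is better

if dga_hits < BASELINES["test_dga.csv"] or legit_hits > BASELINES["test_non_dga.csv"]:
    sys.exit("FAIL: detection performance regressed below the baseline")
print("PASS: if your numbers improved, update the recorded baselines")
```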