907 commits

Author SHA1 Message Date
Ivan Nardi
d577508727
fuzz: extend fuzzing coverage (#2281) 2024-01-24 21:16:58 +01:00
Ivan Nardi
9b26e74bb7
example: rework code between ndpiReader.c and reader_util.c (#2273) 2024-01-22 18:12:06 +01:00
Ivan Nardi
82e8bf91dd
Improve handling of custom rules (#2276)
Avoid collisions between user-ids and internal-ids protocols in the
`example/protos.txt` file.
Add a new value for the classification confidence:
`NDPI_CONFIDENCE_CUSTOM_RULE`

With `./example/ndpiReader -p example/protos.txt -H` we now also see the
custom protocols and their internal/external ids:

```
nDPI supported protocols:
 Id Userd-id Protocol               Layer_4    Nw_Proto Breed        Category
  0        0 Unknown                TCP        X        Unrated      Unspecified

...

387      387 Mumble                 UDP        X        Fun          VoIP
388      388 iSCSI                  TCP                 Acceptable   Unspecified
389      389 Kibana                 TCP                 Acceptable   Unspecified
390      390 TestProto              TCP                 Acceptable   Unspecified
391      391 HomeRouter             TCP                 Acceptable   Unspecified
392      392 CustomProtocol         TCP                 Acceptable   Unspecified
393      393 AmazonPrime            TCP                 Acceptable   Unspecified
394      394 CustomProtocolA        TCP                 Acceptable   Unspecified
395      395 CustomProtocolB        TCP                 Acceptable   Unspecified
396      800 CustomProtocolC        TCP                 Acceptable   Unspecified
397     1024 CustomProtocolD        TCP                 Acceptable   Unspecified
398     2048 CustomProtocolE        TCP                 Acceptable   Unspecified
399     2049 CustomProtocolF        TCP                 Acceptable   Unspecified
400     2050 CustomProtocolG        TCP                 Acceptable   Unspecified
401    65535 CustomProtocolH        TCP                 Acceptable   Unspecified
```

We likely need to take a better look in general at the interaction between
internal and external protocol ids...
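
A minimal sketch of the internal/external id relationship shown in the table above, assuming a plain lookup table. The struct and function names are hypothetical and are not part of the nDPI API; only the sample rows come from the output.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: one row of the table printed by `ndpiReader -H`. */
struct proto_ids {
  uint16_t internal_id; /* dense index assigned by the library */
  uint16_t user_id;     /* id chosen in the custom rules file */
};

/* Sample rows copied from the output above. */
static const struct proto_ids id_map[] = {
  { 395, 395 }, { 396, 800 }, { 397, 1024 }, { 401, 65535 },
};

/* Return 1 and store the internal id if user_id is known, 0 otherwise. */
static int user_to_internal(uint16_t user_id, uint16_t *internal_id) {
  size_t i;

  for (i = 0; i < sizeof(id_map) / sizeof(id_map[0]); i++) {
    if (id_map[i].user_id == user_id) {
      *internal_id = id_map[i].internal_id;
      return 1;
    }
  }
  return 0;
}
```

Keeping the two id spaces in separate fields is what prevents a user-chosen id such as 800 from being mistaken for an internal index.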

This PR fixes the issue observed in
https://github.com/ntop/nDPI/pull/2274#discussion_r1460674874 and in
https://github.com/ntop/nDPI/pull/2275.
2024-01-21 19:53:32 +01:00
Ivan Nardi
42d23cff6a
config: follow-up (#2268)
Some changes in the parameters names.
Add a fuzzer to fuzz the configuration file format.
Add the infrastructure for configuration callbacks.
Add a helper to map LRU cache indexes to names.
2024-01-20 16:14:41 +01:00
Nardi Ivan
0712d496fe config: allow configuration of guessing algorithms 2024-01-18 10:21:24 +01:00
Nardi Ivan
6c85f10cd5 config: move debug/log configuration to the new API 2024-01-18 10:21:24 +01:00
Nardi Ivan
88720331ae config: remove enum ndpi_prefs 2024-01-18 10:21:24 +01:00
Nardi Ivan
1289951b32 config: remove ndpi_set_detection_preferences() 2024-01-18 10:21:24 +01:00
Nardi Ivan
311d8b6dae config: move cfg of aggressiveness and opportunistic TLS to the new API 2024-01-18 10:21:24 +01:00
Nardi Ivan
f55358973f config: move LRU cache configurations to the new API 2024-01-18 10:21:24 +01:00
Nardi Ivan
3107a95881 Make ndpi_finalize_initialization() return an error code
We should check if the initialization was fine or not
2024-01-18 10:21:24 +01:00
Nardi Ivan
d72a760ac3 New API for library configuration
This is the first step into providing (more) configuration options in nDPI.

The idea is to have a simple way to configure (most of) nDPI: only one
function (`ndpi_set_config()`) to set any configuration parameters
(now or in the future) and we try to keep this function
prototype as agnostic as possible.

You can configure the library:
* via API, using `ndpi_set_config()`
* via a configuration file, in a text format

This way, anytime we need to add a new configuration parameter:
* we don't need to add two public functions (a getter and a setter)
* we don't break API/ABI compatibility of the library; even changing
the parameter type (from integer to a list of integers, for example)
doesn't break the compatibility.

The complete list of configuration options is provided in
`doc/configuration_parameters.md`.

As a first example, two configuration knobs are provided:
* the ability to enable/disable the extraction of the sha1 fingerprint of
the TLS certificates.
* the upper limit on the number of packets per flow that will be subject
to inspection
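
The design described above can be sketched as a single string-keyed setter. This is only an illustration under stated assumptions: `demo_config`, `demo_set_config()` and both parameter names are hypothetical, not the real `ndpi_set_config()` prototype or the real parameter names (see `doc/configuration_parameters.md` for those).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical library state holding the two example knobs. */
struct demo_config {
  int tls_sha1_fingerprint_enabled; /* 0 or 1 */
  int max_packets_per_flow;         /* upper limit on inspected packets */
};

/* Parse a base-10 integer in [min, max]; return 0 on success, -1 on error. */
static int parse_long(const char *s, long min, long max, int *out) {
  char *end;
  long v = strtol(s, &end, 10);

  if (end == s || *end != '\0' || v < min || v > max) return -1;
  *out = (int)v;
  return 0;
}

/* Single entry point: parameters are named and valued by strings, so
 * adding a parameter never changes this prototype.
 * Returns 0 on success, -1 on unknown parameter, -2 on invalid value. */
static int demo_set_config(struct demo_config *cfg,
                           const char *param, const char *value) {
  if (cfg == NULL || param == NULL || value == NULL) return -2;
  if (strcmp(param, "tls_sha1_fingerprint") == 0)
    return parse_long(value, 0, 1, &cfg->tls_sha1_fingerprint_enabled) ? -2 : 0;
  if (strcmp(param, "max_packets_per_flow") == 0)
    return parse_long(value, 0, 65535, &cfg->max_packets_per_flow) ? -2 : 0;
  return -1;
}
```

Because values travel as strings, a parameter could later accept a list such as `"1,2,3"` without touching the function signature, which is the ABI argument made above.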
2024-01-18 10:21:24 +01:00
Luca
ca7df1db82 Improved ndpi_get_host_domain 2024-01-16 07:25:03 +01:00
Luca
1637a991a4 Added ndpi_get_host_domain() for returning the host domain
vs ndpi_get_host_domain_prefix() that instead returns the host TLD
2024-01-16 06:56:51 +01:00
Ivan Nardi
111015b872
ndpiReader: improve the check on max number of pkts processed per flow (#2261)
Allow disabling this check.

I don't know how much sense these limits have in the application
(especially with those default values...) since we have always had a
hard limit on the library itself (`max_packets_to_process` set to 32).
The only value might be that they provide different limits for TCP and
UDP traffic.

Keep them for the time being...
2024-01-15 20:12:57 +01:00
Nardi Ivan
b22fa558ff ndpiReader: fix memory leak
Change the working directory of `ndpiReader` in the Github Actions so
that it can load the domain suffix list during `domainsUnitTest()`
2024-01-15 19:49:27 +01:00
Luca
162c38f18f Added new API calls
- ndpi_load_domain_suffixes()
- ndpi_get_host_domain_suffix()

whose goal is to find the domain suffix of a hostname. Example:

www.bbc.co.uk   -> co.uk
mail.apple.com  -> com
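
A minimal sketch of the suffix lookup illustrated by the two examples above, assuming a longest-match search over an in-memory list. The list contents and `host_domain_suffix()` are hypothetical; this is not the implementation behind `ndpi_get_host_domain_suffix()`.

```c
#include <stddef.h>
#include <string.h>

/* Tiny hypothetical suffix list; a real one would be loaded from a file. */
static const char *const suffixes[] = { "com", "uk", "co.uk" };

/* Return the longest listed suffix that ends `host` on a label boundary,
 * or NULL if none matches. The result points into `host`. */
static const char *host_domain_suffix(const char *host) {
  size_t host_len = strlen(host), best_len = 0, i;
  const char *best = NULL;

  for (i = 0; i < sizeof(suffixes) / sizeof(suffixes[0]); i++) {
    size_t len = strlen(suffixes[i]);
    const char *tail;

    if (len > host_len || len <= best_len) continue;
    tail = host + (host_len - len);
    if (strcmp(tail, suffixes[i]) != 0) continue;
    /* The suffix must be the whole host or be preceded by a dot. */
    if (len != host_len && tail[-1] != '.') continue;
    best = tail;
    best_len = len;
  }
  return best;
}
```

Preferring the longest match is what makes `www.bbc.co.uk` resolve to `co.uk` rather than `uk`.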
2024-01-15 19:03:46 +01:00
Ivan Nardi
dd8be1fcb1
Fix some warnings reported by CODESonar (#2227)
Remove some unreached/duplicated code.

Add error checking for `atoi()` calls.

About `isdigit()` and similar functions. The warning reported is:
```
Negative Character Value help
isdigit() is invoked here with an argument of signed type char, but only
has defined behavior for int arguments that are either representable
as unsigned char or equal to the value of macro EOF(-1).
Casting the argument to unsigned char will avoid the undefined behavior.
In a number of libc implementations, isdigit() is implemented using lookup
tables (arrays): passing in a negative value can result in a read underrun.
```
Switching to our macros fixes that.
Add a check to `check_symbols.sh` to avoid using the original functions
from libc.
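
The warning quoted above can be avoided in two common ways, sketched here. Both helper names are hypothetical and are not the macros introduced by this commit.

```c
#include <ctype.h>

/* Convert to unsigned char first, so a byte >= 0x80 stored in a signed
 * char is never passed to isdigit() as a negative int. */
static int safe_isdigit(char c) {
  return isdigit((unsigned char)c) != 0;
}

/* Alternative that avoids the libc lookup table entirely by testing the
 * digit range directly ('0'..'9' are contiguous in C). */
static int range_isdigit(char c) {
  return c >= '0' && c <= '9';
}
```

The second form is the kind of check a macro can express without calling into libc at all.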
2024-01-12 13:30:43 +01:00
Toni
6c3d162cd6
Add realtime protocol output to ndpiReader. (#2197)
* support for using a new flow callback invoked before the flow memory is free'd
* minor fixes

Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2024-01-09 00:39:59 +01:00
Ivan Nardi
40797521af
ndpiReader: add breed stats on output used for CI (#2236) 2024-01-05 13:02:39 +01:00
Ivan Nardi
f23e9dc7bb
Add an implementation of the BSD function strtonum (#2238)
The main difference from the original function is that we allow the
base to be specified.
Credit for the original idea and the first implementation to @0xA50C1A1
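
A minimal sketch of a `strtonum`-style helper with an explicit base, built on `strtoll()`. The name `strtonum_base()` and its argument order are assumptions made for illustration, not the prototype added by this commit.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse `s` in the given base and check it against [minval, maxval].
 * On success returns the value and sets *errstr to NULL; on failure
 * returns 0 and points *errstr at a short reason. */
static long long strtonum_base(const char *s, long long minval,
                               long long maxval, int base,
                               const char **errstr) {
  char *end;
  long long v;

  *errstr = "invalid";
  if (s == NULL || minval > maxval) return 0;
  if (base != 0 && (base < 2 || base > 36)) return 0;

  errno = 0;
  v = strtoll(s, &end, base);
  if (end == s || *end != '\0') return 0; /* no digits or trailing junk */
  if (errno == ERANGE) {
    *errstr = (v < 0) ? "too small" : "too large";
    return 0;
  }
  if (v < minval) { *errstr = "too small"; return 0; }
  if (v > maxval) { *errstr = "too large"; return 0; }

  *errstr = NULL;
  return v;
}
```

Unlike `atoi()`, a caller can tell a genuine `0` from a parse failure by checking whether `*errstr` is `NULL`.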
2024-01-04 13:16:39 +01:00
Luca
2f657cb8f9 Implemented ndpi_is_outlier() for detecting outliers using z-score 2023-12-28 19:59:54 +01:00
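
A minimal sketch of z-score outlier detection, assuming the population standard deviation and a non-negative threshold. `is_outlier_zscore()` is a hypothetical name with a hypothetical signature; it is not the `ndpi_is_outlier()` prototype.

```c
#include <stddef.h>

/* Return 1 if `value` lies more than `threshold` standard deviations
 * from the mean of `samples`, 0 otherwise. Squares are compared so that
 * no square root is needed; `threshold` is assumed to be >= 0. */
static int is_outlier_zscore(const double *samples, size_t n,
                             double value, double threshold) {
  double mean = 0.0, var = 0.0, diff;
  size_t i;

  if (samples == NULL || n < 2) return 0;

  for (i = 0; i < n; i++) mean += samples[i];
  mean /= (double)n;

  for (i = 0; i < n; i++) {
    double d = samples[i] - mean;
    var += d * d;
  }
  var /= (double)n; /* population variance */

  /* All samples identical: anything different is an outlier. */
  if (var == 0.0) return value != mean;

  diff = value - mean;
  /* |z| > threshold  <=>  diff^2 > threshold^2 * var */
  return diff * diff > threshold * threshold * var;
}
```

A threshold of 3 is a common default for this kind of test.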
Luca Deri
1366518bff Implements ndpi_pearson_correlation for measuring how correlated two series are 2023-12-27 22:42:37 +01:00
Luca Deri
8285fffdae Implements JA4 Support (#2191) 2023-12-22 20:40:42 +01:00
Ivan Nardi
a5595d16c0
CI: update list of compilers (#2223)
Try using latest gcc and clang versions.
We still care about RHEL7: since handling a RHEL7 runner on GitHub is
quite complex, let's try to use a similar version of gcc, at least.
2023-12-20 19:22:22 +01:00
Ivan Nardi
8e14aac5e0
ndpiReader: avoid creating two detection modules when processing traffic/traces (#2209) 2023-12-12 19:44:29 +01:00
Ivan Nardi
241c42ad7e
ndpiReader: fix guessed_flow_protocols statistic (#2203)
Increment the counter only if the flow has been guessed
2023-12-12 19:44:03 +01:00
Ivan Nardi
b3f2b1bb7f
STUN: rework extra dissection (#2202)
Keep looking for RTP packets but remove the monitoring concept.
We will re-introduce a more general concept of "flow in monitoring
state" later.
The function was disabled by default.
Some configuration knobs will be provided when/if #2190 is merged.
2023-12-11 14:53:12 +01:00
Ivan Nardi
adf8982d8e
fuzz: extend fuzzing coverage (#2205) 2023-12-11 12:48:50 +01:00
rl1987
59d476195c
Fix typos (#2204)
* Fix typo in ndpiSimpleIntegration.c

* Fix misspelling in a comment
2023-12-10 19:58:22 +01:00
Ivan Nardi
7b0c16a70d
TLS: remove JA3+ fingerprints. (#2192)
See: #2191
2023-12-05 08:05:44 +01:00
Toni
0cb6f4cb75
Fixed hash buffer size in ndpiSimpleIntegration. (#2143)
Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2023-11-10 10:23:37 +01:00
Toni
0673da54b5
Fixed implicit u32 cast in ndpi_data_min() / ndpi_data_max(). (#2139)
Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2023-11-09 10:16:57 +01:00
Toni
6dcecd73d3
Added malicious sites from the Polish CERT. (#2121)
* added handling of parsing errors

Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2023-11-02 09:04:04 +01:00
Luca Deri
76829b413f Implements support for symbolic host names (#2123) 2023-10-29 22:54:45 +01:00
Ivan Nardi
03fd155ae3
IPv6: add support for custom categories (#2126) 2023-10-29 12:56:44 +01:00
Ivan Nardi
32b50f5aa4
IPv6: add support for IPv6 risk exceptions (#2122) 2023-10-29 12:14:20 +01:00
Ivan Nardi
c711251578
IPv6: add support for custom rules (#2120) 2023-10-29 11:26:35 +01:00
Ivan Nardi
e8e4b9e8ff
IPv6: add support for IPv6 risk tree (#2118)
Fix the script to download crawler addresses
2023-10-27 13:58:15 +02:00
Ivan Nardi
611c3b66f0
ipv6: add support for ipv6 addresses lists (#2113) 2023-10-26 20:15:44 +02:00
Nardi Ivan
4a0eda69ad QUIC: export QUIC version as metadata 2023-10-11 15:15:20 +02:00
Nardi Ivan
86115a8a65 fuzz: extend fuzzing coverage 2023-10-07 13:34:37 +02:00
Luca
77e5daf03e Cleaned up mining datastructure 2023-09-27 17:05:12 +02:00
Toni
ef3adb9830
Added printf/fprintf replacement for some internal modules. (#1974)
* logging is instead redirected to `ndpi_debug_printf`

Signed-off-by: lns <matzeton@googlemail.com>
Signed-off-by: Toni Uhlig <matzeton@googlemail.com>
2023-09-26 23:10:57 +02:00
Luca Deri
1bf7e5face Fixes matches with domain name strings that start with a dot 2023-09-11 22:50:03 +02:00
Ivan Nardi
2a0052f25e
fuzz: add fuzzers to test reader_util code (#2080) 2023-09-10 15:07:52 +02:00
Luca Deri
1d480c18e3 Reworked domain classification based on binary filters 2023-09-02 19:16:40 +02:00
Luca Deri
854c2d80f1 Improvement for reducing false positives 2023-09-01 10:26:07 +02:00
Luca Deri
36abf06c6f Swap from Aho-Corasick to an experimental/home-grown algorithm that uses a probabilistic
approach for handling Internet domain names.

For switching back to Aho-Corasick it is necessary to edit
ndpi-typedefs.h and uncomment the line
// #define USE_LEGACY_AHO_CORASICK

[1] With Aho-Corasick
$ ./example/ndpiReader -G ./lists/ -i tests/pcap/ookla.pcap | grep Memory
nDPI Memory statistics:
nDPI Memory (once):      37.34 KB
Flow Memory (per flow):  960 B
Actual Memory:           33.09 MB
Peak Memory:             33.09 MB

[2] With the new algorithm
$ ./example/ndpiReader -G ./lists/ -i tests/pcap/ookla.pcap | grep Memory
nDPI Memory statistics:
nDPI Memory (once):      37.31 KB
Flow Memory (per flow):  960 B
Actual Memory:           7.42 MB
Peak Memory:             7.42 MB

In essence from ~33 MB to ~7 MB

This new algorithm will enable larger lists to be loaded (e.g. top 1M domains
https://s3-us-west-1.amazonaws.com/umbrella-static/index.html)

In ./lists there are files named <category>_<string>.list
With -G ndpiReader can load all of them at startup
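
The commit does not say which probabilistic structure is used. As an illustration of the general idea only, here is a small Bloom filter, a well-known probabilistic set that trades a small false-positive rate for much lower memory than an exact automaton. All names and sizes are hypothetical, and this is not presented as the algorithm nDPI adopted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS 8192u /* hypothetical filter size in bits */
#define BLOOM_HASHES 3   /* hypothetical number of probes per key */

struct bloom { uint8_t bits[BLOOM_BITS / 8]; };

/* FNV-1a with a caller-chosen seed, used to derive two hash values. */
static uint32_t fnv1a(const char *s, uint32_t seed) {
  uint32_t h = 2166136261u ^ seed;

  while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
  return h;
}

static void bloom_init(struct bloom *b) {
  memset(b->bits, 0, sizeof(b->bits));
}

/* Set BLOOM_HASHES bits derived from the key (double hashing). */
static void bloom_add(struct bloom *b, const char *key) {
  uint32_t h1 = fnv1a(key, 0), h2 = fnv1a(key, 0x9e3779b9u) | 1u;
  uint32_t i;

  for (i = 0; i < BLOOM_HASHES; i++) {
    uint32_t bit = (h1 + i * h2) % BLOOM_BITS;
    b->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
  }
}

/* 0 means definitely absent; 1 means probably present. */
static int bloom_maybe_contains(const struct bloom *b, const char *key) {
  uint32_t h1 = fnv1a(key, 0), h2 = fnv1a(key, 0x9e3779b9u) | 1u;
  uint32_t i;

  for (i = 0; i < BLOOM_HASHES; i++) {
    uint32_t bit = (h1 + i * h2) % BLOOM_BITS;
    if (!(b->bits[bit / 8] & (1u << (bit % 8)))) return 0;
  }
  return 1;
}
```

The filter never stores the domain strings themselves, which is where the memory saving comes from; the cost is that a lookup can occasionally report a domain that was never added.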
2023-08-29 17:34:04 +02:00
Luca Deri
34986de297 Search fixes 2023-08-26 19:47:50 +02:00