Commit graph

232 commits

Author SHA1 Message Date
Ivan Nardi
c37937a211
fuzz: improve fuzzing coverage (#3020)
We should take care to tell ndpiReader configuration files apart from
libnDPI configuration files!! Better solution?

Be sure that configuration files are located where they are expected.
In the oss-fuzz environment we can't make any assumptions about the current
working directory of your fuzz target.
2025-11-04 21:04:29 +01:00
Ivan Nardi
83d85775a8
Provide an explicit state for the flow classification process (#2942)
Applications should keep calling nDPI until the flow state becomes
`NDPI_STATE_CLASSIFIED`.

The main loop in the application is simplified to something like:
```
res = ndpi_detection_process_packet(...);
if(res->state == NDPI_STATE_CLASSIFIED) {
  /* Done: you can get the final classification and all metadata.
     nDPI doesn't need more packets for this flow */
} else {
  /* nDPI needs more packets for this flow. The provided
     classification is not final and more metadata might be
     extracted.
     If `res->state` is `NDPI_STATE_PARTIAL`, partial/initial
     classification is available in `res->proto`
     as usual but it can be updated later.
  */
}

/*
    Example A (QUIC flow):
     pkt 1: proto QUIC state NDPI_STATE_PARTIAL
     pkt 2: proto QUIC/Youtube  state NDPI_STATE_CLASSIFIED
    Example B (GoogleMeet call):
     pkt 1:   proto STUN state NDPI_STATE_PARTIAL
     pkt N:   proto DTLS state NDPI_STATE_PARTIAL
     pkt N+M: proto DTLS/GoogleCall state NDPI_STATE_CLASSIFIED
    Example C (standard TLS flow):
     pkt 1:   proto Unknown state NDPI_STATE_INSPECTING
     pkt 2:   proto Unknown state NDPI_STATE_INSPECTING
     pkt 3:   proto Unknown state NDPI_STATE_INSPECTING
     pkt 4:   proto TLS/Facebook state NDPI_STATE_PARTIAL
     pkt N:   proto TLS/Facebook state NDPI_STATE_CLASSIFIED
 */
```
You can take a look at `ndpiReader` for a slightly more complex example.
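
As a self-contained illustration of the loop above, here is a minimal sketch that replays a per-packet trace of states (as in Example C) and reports when the application can stop feeding packets. The `sketch_*` names and the enum are hypothetical stand-ins, not nDPI definitions; the real states and the `res->state` field come from the nDPI headers.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the states named in the commit message;
   the real definitions live in the nDPI headers and may differ. */
enum sketch_flow_state {
  SKETCH_STATE_INSPECTING,
  SKETCH_STATE_PARTIAL,
  SKETCH_STATE_CLASSIFIED
};

/* Return how many packets must be fed before the flow reaches the final
   state, or 0 if the trace ends before the classification is final. */
size_t sketch_packets_until_classified(const enum sketch_flow_state *per_pkt_state,
                                       size_t num_pkts) {
  size_t i;

  for(i = 0; i < num_pkts; i++) {
    if(per_pkt_state[i] == SKETCH_STATE_CLASSIFIED)
      return i + 1; /* Done: no more packets are needed for this flow */
    /* INSPECTING or PARTIAL: keep feeding packets */
  }
  return 0;
}

/* In this sketch, a usable (but not final) classification exists only
   while the flow is in the PARTIAL state. */
int sketch_has_partial_result(enum sketch_flow_state s) {
  return s == SKETCH_STATE_PARTIAL;
}
```

With the Example C trace (three INSPECTING packets, one PARTIAL, one CLASSIFIED), the first helper reports that five packets are needed.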

API changes:
* remove the third parameter from `ndpi_detection_giveup()`. If you need
to know whether the flow classification has been guessed, you can access
`flow->protocol_was_guessed`
* remove `ndpi_extra_dissection_possible()`
* change some prototypes from accepting `ndpi_protocol foo` to
`ndpi_master_app_protocol bar`. The update is trivial: from `foo` to
`foo.proto`
2025-11-03 12:08:15 +01:00
Ivan Nardi
6ab338928c
Add support for out-of-tree builds (#2993)
Initial work to support out-of-tree builds
```
./autogen.sh
mkdir build
cd build
../configure
make
make check
```
IMPORTANT: `autogen.sh` doesn't call `configure` automatically anymore!!

You have to do: `./autogen.sh && ./configure --$OPTIONS`.
A little bit annoying but the pattern `autogen && configure && make` is
very common on Linux.

Known issues:
* `make doc` doesn't work in out-of-tree builds, yet
* Windows/MinGW/DPDK (out-of-tree) builds have not been tested, so they are unlikely to work

See: #2992
2025-11-03 11:58:59 +01:00
Ivan Nardi
20892cf4fc
Extend values saved in hash data structure to u_int64_t (#3013)
Move from `u_int32_t` to `u_int64_t`.
We want to be able to save protocol + category + breed in the same
entry.
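
To illustrate why 64 bits are convenient here, a minimal sketch of packing three 16-bit fields into one `uint64_t` value. The bit layout and the `sketch_*` names are assumptions made for illustration only; the commit message does not state how nDPI actually lays out the entry.

```c
#include <stdint.h>

/* Hypothetical layout (an assumption, not taken from nDPI):
   bits 0-15 protocol id, bits 16-31 category, bits 32-47 breed. */
uint64_t sketch_pack_value(uint16_t proto, uint16_t category, uint16_t breed) {
  return (uint64_t)proto | ((uint64_t)category << 16) | ((uint64_t)breed << 32);
}

/* Extract each field again by shifting and masking */
uint16_t sketch_value_proto(uint64_t v) {
  return (uint16_t)(v & 0xFFFF);
}

uint16_t sketch_value_category(uint64_t v) {
  return (uint16_t)((v >> 16) & 0xFFFF);
}

uint16_t sketch_value_breed(uint64_t v) {
  return (uint16_t)((v >> 32) & 0xFFFF);
}
```

Three 16-bit fields need 48 bits, which cannot fit in a `u_int32_t` entry.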
2025-10-24 17:58:08 +02:00
Ivan Nardi
95aae105f9
fuzz: keep only real/interesting corpora (#3009) 2025-10-23 14:18:11 +02:00
Ivan Nardi
dae135151e Rework parsing of protocol parameters from custom rules
Note that you can specify custom id mappings for internal protocols, yet
2025-10-22 20:14:43 +02:00
Ivan Nardi
9d22805954
Add statistics about hash data structures (#2995) 2025-10-17 20:39:15 +02:00
Ivan Nardi
cc799c1872
fuzz: fix makefile (#2996) 2025-10-17 19:38:07 +02:00
Ivan Nardi
b99d942d89
fuzz: simplify Makefile (#2991)
Add proper `clean` target
2025-10-13 21:49:09 +02:00
Ivan Nardi
a07d55005d
fuzz: try to improve fuzzing coverage (#2981) 2025-10-06 20:44:31 +02:00
Ivan Nardi
3a06d2037f
ndpiReader: create a wrapper to configure nDPI (local) context (#2979)
Use it to better test domains, too
2025-10-05 11:39:46 +02:00
Ivan Nardi
ddd277fc44
HTTP: add further configuration to enable/disable metadata extraction (#2972)
Rename existing configuration knobs, to better separate metadata from
requests, from metadata from responses
2025-09-23 15:11:25 +02:00
Ivan Nardi
2619729661
fuzz: improve per-fuzzer introspector statistics (#2970)
See: f2bccee04
This is clearly a workaround for an introspector bug/limitation. It
seems that we need a separate file for every fuzzer to get per-fuzzer
coverage stats
2025-09-21 17:20:45 +02:00
Ivan Nardi
f2bccee04e
fuzz: an attempt to get better introspector stats (#2968)
The idea: one c file for each fuzzer.
If it works, we can extend the same logic to all the `fuzz_ndpi_reader*`
fuzzers, otherwise we can revert that in a few days...
2025-09-16 16:57:05 +02:00
Ivan Nardi
efccc7d5e4
Rework flow breed (#2926)
Right now, there is, in essence, a static mapping between flow protocols
and flow breeds.
Make it dynamic: allow different flows with the same classification to
have different breeds. This is the same logic that we already have for
categories....

Preliminary work to support breed in category lists.

API change from the app POV: to get the flow breed, don't use
`ndpi_get_proto_breed()` anymore; access `struct ndpi_proto->breed` directly

The functions `ndpi_domain_classify_*()` and
`ndpi_get_host_domain_suffix()` now have a `u_int32_t` parameter as
`class_id` (instead of `u_int16_t`), with the following logic:
```
class_id = (breed << 16) | category
```
instead of the old:
```
class_id = category
```
Please note that this change is backward compatible: if you are not
interested in breeds, you don't need to update the application code.
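
A minimal sketch of the `class_id` encoding above, with hypothetical helper names (`sketch_*` is not part of the nDPI API). It also shows why the change stays compatible: a caller that passes a plain category leaves the upper 16 bits at zero. Reading a zero breed as "no breed requested" is an assumption of this sketch.

```c
#include <stdint.h>

/* class_id = (breed << 16) | category, as described in the commit message */
uint32_t sketch_make_class_id(uint16_t breed, uint16_t category) {
  return ((uint32_t)breed << 16) | category;
}

/* Lower 16 bits: the category, exactly as in the old encoding */
uint16_t sketch_class_id_category(uint32_t class_id) {
  return (uint16_t)(class_id & 0xFFFF);
}

/* Upper 16 bits: the breed (0 when the caller passed only a category) */
uint16_t sketch_class_id_breed(uint32_t class_id) {
  return (uint16_t)(class_id >> 16);
}
```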
2025-09-02 16:54:34 +02:00
Ivan Nardi
8640bd6d76
fuzz: add new fuzzers for bitmask and filter data structures (#2937) 2025-09-02 16:54:08 +02:00
Ivan Nardi
44c94e924f
fuzz: extend fuzzing coverage (#2951) 2025-08-31 20:12:53 +02:00
Ivan Nardi
b7cb6cf408
Follow-up of 8e1b17215: NDPI_UNRESOLVED_HOSTNAME (#2933)
Add fuzzing, documentation and unit tests
2025-08-05 11:32:29 +02:00
Ivan Nardi
eb5f8a037c
fuzz: improve coverage (#2931)
Sync `pl7m` code with upstream.
Add a new fuzzer to test the same flows with different L4 ports
2025-08-04 12:52:51 +02:00
Ivan Nardi
8dd2220116
Add the concept of protocols stack: more than 2 protocols per flow (#2913)
The idea is to remove the limitation of only two protocols ("master" and
"app") in the flow classification.
This is quite handy especially for STUN flows and, in general, for any
flows where there is some kind of transition from a cleartext protocol
to TLS: HTTP_PROXY -> TLS/Youtube; SMTP -> SMTPS (via STARTTLS msg).

In the vast majority of the cases, the protocol stack is simply
Master/Application.

Examples of real stacks (from the unit tests) different from the standard
"master/app":
* "STUN.WhatsAppCall.SRTP": a WA call
* "STUN.DTLS.GoogleCall": a Meet call
* "Telegram.STUN.DTLS.TelegramVoip": a Telegram call
* "SMTP.SMTPS.Google": an SMTP connection to a Google server started in
  cleartext and upgraded to TLS
* "HTTP.Google.ntop": an HTTP connection to a Google domain (match via
  "Host" header) and to an ntop server (match via "Server" header)

The logic to create the stack is still a bit coarse: we have a decade of
code trying to push everything into only two protocols... Therefore, the
content of the stack is still **highly experimental** and might change
in the near future; do you have any suggestions?

It is quite likely that the legacy fields "master_protocol" and
"app_protocol" will be there for a long time.

Add some helpers to use the stack:
```
ndpi_stack_get_upper_proto();
ndpi_stack_get_lower_proto();
bool ndpi_stack_contains(struct ndpi_proto_stack *s, u_int16_t proto_id);
bool ndpi_stack_is_tls_like(struct ndpi_proto_stack *s);
bool ndpi_stack_is_http_like(struct ndpi_proto_stack *s);

```

Be sure new stack logic is compatible with legacy code:
```
assert(ndpi_stack_get_upper_proto(&flow->detected_protocol.protocol_stack) ==
       ndpi_get_upper_proto(flow->detected_protocol));
assert(ndpi_stack_get_lower_proto(&flow->detected_protocol.protocol_stack) ==
       ndpi_get_lower_proto(flow->detected_protocol));
```
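
To make the idea of a stack with more than two entries concrete, here is a self-contained sketch. The struct layout, the fixed capacity, the `sketch_*` names and the protocol ids are all hypothetical; the real `struct ndpi_proto_stack` and its helpers may be organized differently. The sketch assumes the lowest layer is stored first and uses 0 for "unknown".

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_STACK 8 /* hypothetical capacity */

struct sketch_proto_stack {
  uint16_t ids[SKETCH_MAX_STACK]; /* lowest layer first */
  uint8_t num;                    /* number of valid entries */
};

/* Append a protocol on top of the stack; fails when the stack is full */
bool sketch_stack_push(struct sketch_proto_stack *s, uint16_t proto_id) {
  if(s->num >= SKETCH_MAX_STACK)
    return false;
  s->ids[s->num++] = proto_id;
  return true;
}

/* Linear search: stacks are tiny, so this is cheap */
bool sketch_stack_contains(const struct sketch_proto_stack *s, uint16_t proto_id) {
  uint8_t i;

  for(i = 0; i < s->num; i++)
    if(s->ids[i] == proto_id)
      return true;
  return false;
}

/* First entry, or 0 ("unknown" in this sketch) for an empty stack */
uint16_t sketch_stack_lower(const struct sketch_proto_stack *s) {
  return s->num ? s->ids[0] : 0;
}

/* Last entry, or 0 ("unknown" in this sketch) for an empty stack */
uint16_t sketch_stack_upper(const struct sketch_proto_stack *s) {
  return s->num ? s->ids[s->num - 1] : 0;
}
```

For a stack such as "STUN.DTLS.GoogleCall", the lower helper would return the STUN id and the upper helper the GoogleCall id, while the middle DTLS entry remains reachable via the contains check.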
2025-08-01 10:05:50 +02:00
Ivan Nardi
c216c09e2c fuzz: extend fuzzing coverage
Remove some unused code
2025-06-24 15:04:35 +02:00
Ivan Nardi
978ca1ba1a
New API to enable/disable protocols. Removed NDPI_LAST_IMPLEMENTED_PROTOCOL (#2894)
Change the API to enable/disable protocols: you can set that via the
standard `ndpi_set_config()` function, like every other configuration
parameter. By default, all protocols are enabled.

Split the (local) context initialization into two phases:
* `ndpi_init_detection_module()`: generic part. It does not depend on the
configuration and on the protocols being enabled or not. It also
calculates the real number of internal protocols
* `ndpi_finalize_initialization()`: apply the configuration. All the
initialization stuff that depends on protocols being enabled or not
must be put here

This is the last step to have the protocols number fully calculated at
runtime

Remove a (now) useless fuzzer.

Important API changes:
* remove `NDPI_LAST_IMPLEMENTED_PROTOCOL` define
* remove `ndpi_get_num_internal_protocols()`. To get the number of
configured protocols (internal and custom) you must use
`ndpi_get_num_protocols()` after having called `ndpi_finalize_initialization()`
2025-06-23 11:24:18 +02:00
Ivan Nardi
6cbc8d1471
fuzz: fuzz loading of external protocols lists (#2897) 2025-06-22 20:43:16 +02:00
Ivan Nardi
c319509abf fuzz: fix compilation 2025-06-18 08:12:32 +02:00
Ivan Nardi
458b658eec
Preliminary work to remove NDPI_LAST_IMPLEMENTED_PROTOCOL (#2885) 2025-06-16 20:22:45 +02:00
Ivan Nardi
2b3fdb4f8a
fuzz: try to improve coverage (#2883)
Revert of 2b14b46df3
2025-06-14 10:48:16 +02:00
Ivan Nardi
2b14b46df3 fuzz: make allocation failures a bit more unlikely 2025-06-12 16:57:50 +02:00
Ivan Nardi
6da6991320
Rework sanity checks and remove some functions from API (#2882) 2025-06-12 16:07:56 +02:00
Ivan Nardi
e07fc3dfb8
fuzz: improve coverage (#2878) 2025-06-10 13:51:57 +02:00
Ivan Nardi
bcfa3f5477 Rename ndpi_bitmask_dealloc into ndpi_bitmask_free 2025-06-09 09:30:30 +02:00
Ivan Nardi
cbd7136b34
Remove NDPI_PROTOCOL_BITMASK; add a new generic bitmask data structure (#2871)
The main difference is that the memory is allocated at runtime

Typical use case:
```
struct ndpi_bitmask b;

ndpi_bitmask_alloc(&b, ndpi_get_num_internal_protocols());

ndpi_bitmask_set(&b, $BIT);
ndpi_bitmask_is_set(&b, $BIT);
[...]

ndpi_bitmask_dealloc(&b);

```
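
For readers unfamiliar with the pattern, a self-contained sketch of a bitmask whose storage is sized at runtime. This is not the nDPI implementation: the struct layout and the `sketch_*` names are hypothetical, and ignoring out-of-range bits is a design choice of this sketch only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct sketch_bitmask {
  uint32_t *words;   /* heap storage, 32 bits per word */
  uint32_t num_bits; /* number of addressable bits */
};

/* Allocate zeroed storage for num_bits bits; returns 0 on success */
int sketch_bitmask_alloc(struct sketch_bitmask *b, uint32_t num_bits) {
  b->words = calloc(num_bits / 32 + 1, sizeof(uint32_t));
  if(b->words == NULL) {
    b->num_bits = 0;
    return -1;
  }
  b->num_bits = num_bits;
  return 0;
}

/* Out-of-range bits are silently ignored in this sketch */
void sketch_bitmask_set(struct sketch_bitmask *b, uint32_t bit) {
  if(bit < b->num_bits)
    b->words[bit / 32] |= (uint32_t)1 << (bit % 32);
}

bool sketch_bitmask_is_set(const struct sketch_bitmask *b, uint32_t bit) {
  return bit < b->num_bits && ((b->words[bit / 32] >> (bit % 32)) & 1u) != 0;
}

/* Release the storage and leave the struct in a safe, empty state */
void sketch_bitmask_free(struct sketch_bitmask *b) {
  free(b->words);
  b->words = NULL;
  b->num_bits = 0;
}
```

Because the size is a runtime argument, the same structure works whether the number of protocols is known at compile time or only after loading custom protocols.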

See #2136
2025-06-09 09:00:17 +02:00
Vladimir Gavrilov
75395cb264
Add category and breed support for custom rules (#2872)
Close #2594
2025-06-08 17:34:21 +02:00
Ivan Nardi
f287a6e7f8
Add a configuration to test a huge number of custom protocols (#2865)
File taken from #2136
2025-06-03 20:46:58 +02:00
Ivan Nardi
5e54531282
Remove ndpi_set_proto_defaults() from the API (#2863)
Add an explicit field to indicate if the protocol is custom or internal
2025-06-03 17:43:28 +02:00
Ivan Nardi
ed21057710
First step into a dynamic number of protocols (#2857)
We want to get rid of the defines `NDPI_MAX_SUPPORTED_PROTOCOLS` and
`NDPI_MAX_NUM_CUSTOM_PROTOCOLS`.

You can use:
```
ndpi_get_num_protocols()
```

See #2136

Removed some unused functions from public API
2025-06-03 10:22:15 +02:00
Ivan Nardi
70a72f1638
New API to enable/disable protocols; remove ndpi_set_protocol_detection_bitmask2() (#2853)
The main goal is not to have the bitmask depending on the total number
of protocols anymore: `NDPI_INTERNAL_PROTOCOL_BITMASK` depends only on
internal protocols, i.e. on `NDPI_MAX_INTERNAL_PROTOCOLS`, i.e.
custom-defined protocols are not counted.
See #2136

Keep the old data structure `NDPI_PROTOCOL_BITMASK` with the old
semantics.

Since we need to change the API (and all the application code...)
anyway, simplify the API: by default all the protocols are enabled.
If you need otherwise, please use `ndpi_init_detection_module_ext()`
instead of `ndpi_init_detection_module()` (you can find an example in
the `ndpiReader` code).

To update the application code you likely only need to remove these 3
lines from your code:
```
- NDPI_PROTOCOL_BITMASK all;
- NDPI_BITMASK_SET_ALL(all);
- ndpi_set_protocol_detection_bitmask2(ndpi_str, &all);
```

Removed an unused field and struct definition.
2025-06-03 09:45:46 +02:00
Ivan Nardi
8df79a7354
Follow-up of c1d372860 (TCP fingerprint format) (#2850) 2025-05-26 12:32:47 +02:00
Ivan Nardi
c7b71d9e55
UBNTAC2,Ookla: improve detection (#2793) 2025-04-10 13:18:44 +02:00
Ivan Nardi
3e2d69b92a Follow-up of latest Signal call change (see: 4d41588a7) 2025-04-05 14:22:05 +02:00
Ivan Nardi
f4691c518a
fuzz: extend coverage (#2786) 2025-03-31 17:54:14 +02:00
Ivan Nardi
8bb3a9faf7 fuzz: fix configuration 2025-03-26 11:38:24 +01:00
Ivan Nardi
4a66f95808 fuzz: fix configuration after latest updates 2025-03-26 10:02:50 +01:00
Ivan Nardi
29eb89a88f
Improved configuration to enable/disable export of flow risk info (#2780)
Follow-up of f568313363: now the
configuration is for flow-risk, not global
2025-03-25 21:35:01 +01:00
Ivan Nardi
0cf735b12a
fuzz: try to run one (ndpiReader-) fuzzer with a slightly different cfg (#2771) 2025-03-18 17:26:23 +01:00
Leonardo Teixeira Alves
c49d126d36
Add Autonomous System Organization to geoip (#2763)
Co-authored-by: Leonardo Teixeira Alves <leonardo.alves@zerum.com>
2025-03-06 14:47:17 +01:00
Ivan Nardi
f568313363
Add configuration parameter to enable/disable export of flow risk info (#2761)
For the most common protocols, avoid creating the string message if we
are not going to use it
2025-03-05 16:14:03 +01:00
Ivan Nardi
e786472f0d Address cache: fix some bugs on cache traversal
Add a new fuzzer to test it
2025-03-01 19:03:35 +01:00
Ivan Nardi
8ee59bb9b9
fuzz: extend fuzzing coverage (#2750) 2025-02-28 12:38:15 +01:00
Leonardo Teixeira Alves
3d0bfc7bfe
Add city as a geoip possibility (#2746) 2025-02-24 19:41:02 +01:00
Ivan Nardi
2d3f08362e
RTP: payload type info should be set only for real RTP flows (#2742) 2025-02-22 13:35:40 +01:00