Application should keep calling nDPI until flow state became
`NDPI_STATE_CLASSIFIED`.
The main loop in the application is simplified to something like:
```
res = ndpi_detection_process_packet(...);
if(res->state == NDPI_STATE_CLASSIFIED) {
/* Done: you can get finale classification and all metadata.
nDPI doesn't need more packets for this flow */
} else {
/* nDPI needs more packets for this flow. The provided
classification is not final and more metadata might be
extracted.
If `res->state` is `NDPI_STATE_PARTIAL`, partial/initial
classification is available in `res->proto`
as usual but it can be updated later.
*/
}
/*
Example A (QUIC flow):
pkt 1: proto QUIC state NDPI_STATE_PARTIAL
pkt 2: proto QUIC/Youtube state NDPI_STATE_CLASSIFIED
Example B (GoogleMeet call):
pkt 1: proto STUN state NDPI_STATE_PARTIAL
pkt N: proto DTLS state NDPI_STATE_PARTIAL
pkt N+M: proto DTLS/GoogleCall state NDPI_STATE_CLASSIFIED
Example C (standard TLS flow):
pkt 1: proto Unknown state NDPI_STATE_INSPECTING
pkt 2: proto Unknown state NDPI_STATE_INSPECTING
pkt 3: proto Unknown state NDPI_STATE_INSPECTING
pkt 4: proto TLS/Facebook state NDPI_STATE_PARTIAL
pkt N: proto TLS/Facebook state NDPI_STATE_CLASSIFIED
*/
}
```
You can take a look at `ndpiReader` for a slightly more complex example.
API changes:
* remove the third parameter from `ndpi_detection_giveup()`. If you need
to know if the classification flow has been guessed, you can access
`flow->protocol_was_guessed`
* remove `ndpi_extra_dissection_possible()`
* change some prototypes from accepting `ndpi_protocol foo` to
`ndpi_master_app_protocol bar`. The update is trivial: from `foo` to
`foo.proto`
Right now, there is, in essence, a static mapping between flow protocols
and flow breeds.
Make it dynamic: allow to have different flows, with the same
classification but differents breeds. This is the same logic that we
already have for categories....
Preliminary work to support breed in category lists.
API change from the app POV: to get the flow breed don't use anymore
`ndpi_get_proto_breed()`, but access directly `struct ndpi_proto->breed`
The functions `ndpi_domain_classify_*()` and
`ndpi_get_host_domain_suffix()` now have a `u_int32_t` parameter as
`class_id` (instead of `u_int_16_t`), with the following logic:
```
class_id = (breed << 16) | category
```
instead of the old:
```
class_id = category
```
Please note that this change is back-compatible: if you are not
interested into breeds, you don't need to update the application code.
The idea is to remove the limitation of only two protocols ("master" and
"app") in the flow classifcation.
This is quite handy expecially for STUN flows and, in general, for any
flows where there is some kind of transitionf from a cleartext protocol
to TLS: HTTP_PROXY -> TLS/Youtube; SMTP -> SMTPS (via STARTTLS msg).
In the vast majority of the cases, the protocol stack is simply
Master/Application.
Examples of real stacks (from the unit tests) different from the standard
"master/app":
* "STUN.WhatsAppCall.SRTP": a WA call
* "STUN.DTLS.GoogleCall": a Meet call
* "Telegram.STUN.DTLS.TelegramVoip": a Telegram call
* "SMTP.SMTPS.Google": a SMTP connection to Google server started in
cleartext and updated to TLS
* "HTTP.Google.ntop": a HTTP connection to a Google domain (match via
"Host" header) and to a ntop server (match via "Server" header)
The logic to create the stack is still a bit coarse: we have a decade of
code try to push everything in only ywo protocols... Therefore, the
content of the stack is still **highly experimental** and might change
in the next future; do you have any suggestions?
It is quite likely that the legacy fields "master_protocol" and
"app_protocol" will be there for a long time.
Add some helper to use the stack:
```
ndpi_stack_get_upper_proto();
ndpi_stack_get_lower_proto();
bool ndpi_stack_contains(struct ndpi_proto_stack *s, u_int16_t proto_id);
bool ndpi_stack_is_tls_like(struct ndpi_proto_stack *s);
bool ndpi_stack_is_http_like(struct ndpi_proto_stack *s);
```
Be sure new stack logic is compatible with legacy code:
```
assert(ndpi_stack_get_upper_proto(&flow->detected_protocol.protocol_stack) ==
ndpi_get_upper_proto(flow->detected_protocol));
assert(ndpi_stack_get_lower_proto(&flow->detected_protocol.protocol_stack) ==
ndpi_get_lower_proto(flow->detected_protocol));
```
Change the API to enable/disable protocols: you can set that via the
standard `ndpi_set_config()` function, as every configuration
parameters. By default, all protocols are enabled.
Split the (local) context initialization into two phases:
* `ndpi_init_detection_module()`: generic part. It does not depend on the
configuration and on the protocols being enabled or not. It also
calculates the real number of internal protocols
* `ndpi_finalize_initialization()`: apply the configuration. All the
initialization stuff that depend on protocols being enabled or not
must be put here
This is the last step to have the protocols number fully calculated at
runtime
Remove a (now) useless fuzzer.
Important API changes:
* remove `NDPI_LAST_IMPLEMENTED_PROTOCOL` define
* remove `ndpi_get_num_internal_protocols()`. To get the number of
configured protocols (internal and custom) you must use
`ndpi_get_num_protocols()` after having called `ndpi_finalize_initialization()`
The main difference is that the memory is allocated at runtime
Typical usercase:
```
struct ndpi_bitmask b;
ndpi_bitmask_alloc(&b, ndpi_get_num_internal_protocols());
ndpi_bitmask_set(&b, $BIT);
ndpi_bitmask_is_set(&b, $BIT);
[...]
ndpi_bitmask_dealloc(&b);
```
See #2136
We want to get rid of the defines `NDPI_MAX_SUPPORTED_PROTOCOLS` and
`NDPI_MAX_NUM_CUSTOM_PROTOCOLS`.
You can use:
```
ndpi_get_num_protocols()
```
See #2136
Removed some unused functions from public API
The main goal is not to have the bitmask depending on the total number
of protocols anymore: `NDPI_INTERNAL_PROTOCOL_BITMASK` depends only on
internal protocols, i.e. on `NDPI_MAX_INTERNAL_PROTOCOLS`, i.e.
custom-defined protocols are not counted.
See #2136
Keep the old data structure `NDPI_PROTOCOL_BITMASK` with the old
semantic.
Since we need to change the API (and all the application code...)
anyway, simplify the API: by default all the protocols are enabled.
If you need otherwise, please use `ndpi_init_detection_module_ext()`
instead of `ndpi_init_detection_module()` (you can find an example in
the `ndpiReader` code).
To update the application code you likely only need to remove these 3
lines from your code:
```
- NDPI_PROTOCOL_BITMASK all;
- NDPI_BITMASK_SET_ALL(all);
- ndpi_set_protocol_detection_bitmask2(ndpi_str, &all);
```
Removed an unused field and struct definition.
In some scenarios, you might not be interested in flow metadata or
flow-risks at all, but you might want only flow (sub-)classification.
Examples: you only want to forward the traffic according to the
classification or you are only interested in some protocol statistics.
Create a new configuration file (for `ndpiReader`, but you can trivially
adapt it for the library itself) allowing exactly that. You can use it
via: `ndpiReader --conf=example/only_classification.conf ...`
Note that this way, the nDPI overhead is lower because it might need
less packets per flow:
* TLS: nDPI processes only the CH (in most cases) and not also the SH
and certificates
* DNS: only the request is processed (instead of both request and
response)
We might extend the same "shortcut-logic" (stop processing the flow
immediately when there is a final sub-classification) for others
protocols.
Add the configuration options to enable/disable the extraction of some
TLS metadata.
Last step of removing JA3C fingerprint
Remove some duplicate tests: testing with ja4c/ja3s disabled is already
performed by `disable_metadata_and_flowrisks` configuration.
Close:#2551
It might be usefull to be able to match traffic against a list of
suspicious JA4C fingerprints
Use the same code/logic/infrastructure used for JA3C (note that we are
going to remove JA3C...)
See: #2551
Allow nDPI to process the entire flows and not only the first N packets.
Usefull when the application is interested in some metadata spanning the
entire life of the session.
As initial step, only STUN flows can be put in monitoring.
See `doc/monitoring.md` for further details.
This feature is disabled by default.
Close#2583
Add configurable options for whether to include client port or client IP
in the flow's protocol guesses. This defaults to include both client
port/IP if the protocol is not guessed with the server IP/port.
This is intended for when flow direction detection is enabled, so we
know that sport = client port, dport = server port.
Based on the paper: "Fingerprinting Obfuscated Proxy Traffic with
Encapsulated TLS Handshakes".
See: https://www.usenix.org/conference/usenixsecurity24/presentation/xue-fingerprinting
Basic idea:
* the packets/bytes distribution of a TLS handshake is quite unique
* this fingerprint is still detectable if the handshake is
encrypted/proxied/obfuscated
All heuristics are disabled by default.
Based on the paper: "OpenVPN is Open to VPN Fingerprinting"
See: https://www.usenix.org/conference/usenixsecurity22/presentation/xue-diwen
Basic idea:
* the distribution of the first byte of the messages (i.e. the distribution
of the op-codes) is quite unique
* this fingerprint might be still detectable even if the OpenVPN packets are
somehow fully encrypted/obfuscated
The heuristic is disabled by default.
Allow sub-classification of OpenVPN/Wireguard flows using their server IP.
That is useful to detect the specific VPN application/app used.
At the moment, the supported protocols are: Mullvad, NordVPN, ProtonVPN.
This feature is configurable.