Use `NDPI_OBFUSCATED_TRAFFIC` instead; this way, all the obfuscated
traffic is identified via `NDPI_OBFUSCATED_TRAFFIC` flow risk.
Disable fully-encryption detection by default, like all the obfuscation
heuristics.
That flow risk was introduced in 79b89d2866
but we can now use the generic `NDPI_TLS_SUSPICIOUS_EXTENSION` instead:
ESNI is quite suspicious nowadays in itself (i.e. even without SNI).
Note that ESNI support has been removed in cae9fb9989
Avoid forcing `DLT_EN10MB` but use the same data link type of the input
pcap.
This way, we can use extcap functionality with input traces having Linux
"cooked" capture encapsulation, i.e. traces captured on "any" interface
Export some metadata (for the moment, SNI and TLS fingerprints) to
Wireshark/tshark via extcap.
Note that:
* metadata are exported only once per flow
* metadata are exported (all together) when nDPI stopped processing
the flow
Still room for a lot of improvements!
In particular:
* we need to add some boundary checks (if we are going to export other
attributes)
* we should try to have a variable length trailer
A fully encrypted session is a flow where every bytes of the
payload is encrypted in an attempt to “look like nothing”.
The heuristic needs only the very first packet of the flow.
See: https://www.usenix.org/system/files/sec23fall-prepub-234-wu-mingshi.pdf
A basic, but generic, inplementation of the popcpunt alg has been added
RFC 6066 3: "Literal IPv4 and IPv6 addresses are not permitted in
"HostName"."
Don't set this risk if we have a valid sub-classification (example:
via certificate)
Since a similar risk already exists for HTTP hostnames, reuse it, with a
more generic name.
The main goal of a DPI engine is usually to determine "what", i.e. which
types of traffic flow on the network.
However the applications using DPI are often interested also in "who",
i.e. which "user/subscriber" generated that traffic.
The association between a flow and a subscriber is usually done via some
kind of DHCP/GTP/RADIUS/NAT mappings. In all these cases the key element
of the flow used to identify the user is the source ip address.
That usually happens for the vast majority of the traffic.
However, depending on the protocols involved and on the position on the net
where the traffic is captured, the source ip address might have been
changed/anonymized. In that case, that address is useless for any
flow-username association.
Example: iCloud Private Relay traffic captured between the exit relay and
the server.
See the picture at page 5 on:
https://www.apple.com/privacy/docs/iCloud_Private_Relay_Overview_Dec2021.PDF
This commit adds new generic flow risk `NDPI_ANONYMOUS_SUBSCRIBER` hinting
that the ip addresses shouldn't be used to identify the user associated
with the flow.
As a first example of this new feature, the entire list of the relay ip
addresses used by Private Relay is added.
A key point to note is that list is NOT used for flow classification
(unlike all the other ip lists present in nDPI) but only for setting this
new flow risk.
TODO: IPv6