Commit graph

42 commits

Author SHA1 Message Date
Ivan Nardi
faca0a6565 ndpiReader: improve statistics 2025-10-22 20:34:29 +02:00
Luca Deri
d69446893d Added NDPI_MISMATCHING_PROTOCOL_WITH_IP flow risk
Fixed host protocol matching
Added NDPI_PROTOCOL_AKAMAI protocol
2025-10-17 23:48:44 +02:00
Ivan Nardi
9d22805954
Add statistics about hash data structures (#2995) 2025-10-17 20:39:15 +02:00
Ivan Nardi
ceb9a4e69c Workarounf for breed configuration with categories lists 2025-10-05 11:41:59 +02:00
Ivan Nardi
113170cca4
New protocols for Amazon/AWS sub-classification (#2975)
Add:
* Cognito
* API Gateway
* Kinesis
* EC2
* EMR
* S3
* Cloudfront
* DynamoDB

Keep `NDPI_PROTOCOL_AMAZON_AWS` for generic AWS traffic
2025-10-02 11:48:25 +02:00
Ivan Nardi
5aaab7f354
Fix ndpi_is_valid_hostname() (#2974)
It was completly broken.
Pay some attention to HTTP case where we might have Host header in the
"$DOMAIN:$PORT" form: we usually want to strip the port part

`memrchr` is not available on macOS and on Windows: create a wrapper
2025-09-29 12:27:21 +02:00
Ivan Nardi
efccc7d5e4
Rework flow breed (#2926)
Right now, there is, in essence, a static mapping between flow protocols
and flow breeds.
Make it dynamic: allow to have different flows, with the same
classification but differents breeds. This is the same logic that we
already have for categories....

Preliminary work to support breed in category lists.

API change from the app POV: to get the flow breed don't use anymore
`ndpi_get_proto_breed()`, but access directly `struct ndpi_proto->breed`

The functions `ndpi_domain_classify_*()` and
`ndpi_get_host_domain_suffix()` now have a `u_int32_t` parameter as
`class_id` (instead of `u_int_16_t`), with the following logic:
```
class_id = (breed << 16) | category
```
instead of the old:
```
class_id = category
```
Please note that this change is back-compatible: if you are not
interested into breeds, you don't need to update the application code.
2025-09-02 16:54:34 +02:00
Ivan Nardi
f4995e5d5f Revert "Always compute nDPI fingerprint (#2950)"
This reverts commit 2531c2555e.
2025-08-31 19:07:13 +02:00
Ivan Nardi
2531c2555e
Always compute nDPI fingerprint (#2950) 2025-08-31 16:11:56 +02:00
Luca Deri
0aca481a0a Tests update 2025-08-29 11:53:35 +02:00
Luca Deri
d403d900de
nDPI Fingerprint Changes (#2946)
* Modified boundary check
nDPI fingeprint now defaults on client only (it can be changed via runtime configuration)

* Undated testcases

* Added lenght check

* Typo
2025-08-21 14:58:20 +02:00
Luca Deri
11d74ea286 Implemented nDPI fingerprint that is computed using
- TCP fingerprint
- JA4 fingepriint
- TLS SHA1 certificate (if present), or JA3S fingerprint (is SHA1 is missing)

By default the fingerprint uses the client and server fingerprints (format 0)
and combines them. However you can chnge it format (eg. use only the client info,
format 1) with

--cfg NULL,metadata.ndpi_fingerprint_format,X

where X is the fingerprint format.

By default nDPI fingerprint is enabled but you can enable/disble it as follows

--cfg NULL,metadata.ndpi_fingerprint,0
2025-08-21 10:34:49 +02:00
Ivan Nardi
087c558202 Rework calling check_tcp_flags() and check_probing_attempt() 2025-08-13 10:58:40 +02:00
Ivan Nardi
c5c309708b Sync unit tests results 2025-08-06 20:07:40 +02:00
Ivan Nardi
8dd2220116
Add the concept of protocols stack: more than 2 protocols per flow (#2913)
The idea is to remove the limitation of only two protocols ("master" and
"app") in the flow classifcation.
This is quite handy expecially for STUN flows and, in general, for any
flows where there is some kind of transitionf from a cleartext protocol
to TLS: HTTP_PROXY -> TLS/Youtube; SMTP -> SMTPS (via STARTTLS msg).

In the vast majority of the cases, the protocol stack is simply
Master/Application.

Examples of real stacks (from the unit tests)  different from the standard
"master/app":
* "STUN.WhatsAppCall.SRTP": a WA call
* "STUN.DTLS.GoogleCall": a Meet call
* "Telegram.STUN.DTLS.TelegramVoip": a Telegram call
* "SMTP.SMTPS.Google": a SMTP connection to Google server started in
  cleartext and updated to TLS
* "HTTP.Google.ntop": a HTTP connection to a Google domain (match via
  "Host" header) and to a ntop server (match via "Server" header)

The logic to create the stack is still a bit coarse: we have a decade of
code try to push everything in only ywo protocols... Therefore, the
content of the stack is still **highly experimental** and might change
in the next future; do you have any suggestions?

It is quite likely that the legacy fields "master_protocol" and
"app_protocol" will be there for a long time.

Add some helper to use the stack:
```
ndpi_stack_get_upper_proto();
ndpi_stack_get_lower_proto();
bool ndpi_stack_contains(struct ndpi_proto_stack *s, u_int16_t proto_id);
bool ndpi_stack_is_tls_like(struct ndpi_proto_stack *s);
bool ndpi_stack_is_http_like(struct ndpi_proto_stack *s);

```

Be sure new stack logic is compatible with legacy code:
```
assert(ndpi_stack_get_upper_proto(&flow->detected_protocol.protocol_stack) ==
       ndpi_get_upper_proto(flow->detected_protocol));
assert(ndpi_stack_get_lower_proto(&flow->detected_protocol.protocol_stack) ==
       ndpi_get_lower_proto(flow->detected_protocol));
```
2025-08-01 10:05:50 +02:00
Ivan Nardi
44b9a2da81
ndpiReader: add breed to flow information (#2924) 2025-07-30 18:46:28 +02:00
Ivan Nardi
5db40f7598
Classify Tracking/ADS/Analytics traffic only via category (#2900)
See 3a243bb40 for similar work about porn and LLM
2025-06-23 16:03:44 +02:00
Ivan Nardi
aa6dcad15e
ndpiReader: print categories summary (#2895) 2025-06-21 12:41:00 +02:00
Vladimir Gavrilov
6d0a891d1e
Normalize breed/category names: use _ instead of spaces and slashes (#2873) 2025-06-07 12:38:14 +02:00
Ivan Nardi
6ae0eee5f0
Update all IP/domain lists (#2795)
ProtonVPN script have been not working in the last week.
```
Error	"Invalid access token"
```

ProtonVPN is doing a major upgrade in its infrastructure:
```
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Apr 09, 2025 - 11:30 CEST
Scheduled - In the following period from the 9th of April up to the 30th of April, various Proton VPN dedicated servers will be in temporary maintenance mode, for a short duration period, in order to allow us to perform a major infrastructure upgrade, paving the way for overall increased performance and efficiency of our Proton VPN infrastructure.

We apologize for the occasional inconvenience.
Apr 9, 2025 11:30 - Apr 30, 2025 23:30 CEST
```

Let's wait if it works again in the future...
2025-04-16 13:50:22 +02:00
Ivan Nardi
dca1e54cf6
Extend list of domains for SNI matching (#2791) 2025-04-05 13:15:18 +02:00
Ivan Nardi
85fb7eb2e5 Flow risk infos are always exported "in order" (by flow risk id)
This way, the `ndpiReader` output doesn't change if we change the
internal logic about the order we set/check the various flow risks.

Note that the flow risk *list* is already printed by `ndpiReader`
in order.
2025-03-04 13:23:58 +01:00
Ivan Nardi
72fd940301
Remove JA3C output from ndpiReader (#2667)
Removing JA3C is an big task. Let's start with a simple change having an
huge impact on unit tests: remove printing of JA3C information from
ndpiReader.

This way, when we will delete the actual code, the unit tests diffs
should be a lot simpler to look at.

Note that the information if the client/server cipher is weak or
obsolete is still available via flow risk

See: #2551
2025-01-12 13:24:27 +01:00
Ivan Nardi
c3d19be26f
ndpiReader: update JA statistics (#2646)
Show JA4C and JA3S information (instead of JA3C and JA3S)
See #2551 for context
2025-01-06 15:09:25 +01:00
Ivan Nardi
1140d28c3d Sync unit tests results 2024-11-21 09:53:10 +01:00
Luca Deri
14b076a58b Improved TCP fingerprint 2024-10-20 22:25:55 +02:00
Luca Deri
0cc84e4fdd Improved TCP fingepring calculation
Adde basidc OS detection based on TCP fingerprint
2024-10-18 23:47:34 +02:00
Luca Deri
0ef0752c80
Increased struct ndpi_flow_struct size (#2596)
Build fix
2024-10-18 07:17:03 +02:00
Luca Deri
fc4fb4d409 Fixed probing attempt risk that was creating false positives 2024-08-07 11:38:41 +02:00
Ivan Nardi
65e31b0ea3
FPC: small improvements (#2512)
Add printing of fpc_dns statistics and add a general cconfiguration option.
Rework the code to be more generic and ready to handle other logics.
2024-07-22 17:42:23 +02:00
Ivan Nardi
843e487270
Add infrastructure for explicit support of Fist Packet Classification (#2488)
Let's start with some basic helpers and with FPC based on flow addresses.

See: #2322
2024-07-03 18:02:07 +02:00
Nardi Ivan
526cf6f291 Zoom: remove "stun_zoom" LRU cache
Since 070a0908b we are able to detect P2P calls directly from the packet
content, without any correlation among flows
2024-06-17 10:19:55 +02:00
Luca
44a290286b More NDPI_PROBING_ATTEMPT changes 2024-05-22 18:04:33 +02:00
Ivan Nardi
95fe21015d
Remove "zoom" cache (#2420)
This cache was added in b6b4967aa, when there was no real Zoom support.
With 63f349319, a proper identification of multimedia stream has been
added, making this cache quite useless: any improvements on Zoom
classification should be properly done in Zoom dissector.

Tested for some months with a few 10Gbits links of residential traffic: the
cache pretty much never returned a valid hit.
2024-05-06 12:51:45 +02:00
Luca Deri
57ecbf38c0 Updated JA4 test results 2024-05-02 17:40:24 +02:00
Ivan Nardi
21da53d3a0
ahocorasick: improve matching with subdomains (#2331)
The basic idea is to have the following logic:
* pattern "DOMAIN" matches the domain itself (i.e exact match) *and* any
subdomains (i.e. "ANYTHING.DOMAIN")
* pattern "DOMAIN." matches *also* any strings for which is a prefix
[please, note that this kind of match is handy but it is quite
dangerous...]
* pattern "-DOMAIN" matches *also* any strings for which is a postfix

Examples:
* pattern "wikipedia.it":
  * "wikipiedia.it" -> OK
  * "foo.wikipedia.it -> OK
  * "foowikipedia.it -> NO MATCH
  * "wikipedia.it.com -> NO MATCH
* pattern "wikipedia.":
  * "wikipedia.it" -> OK
  * "foo.wikipedia.it -> OK
  * "foowikipedia.it -> NO MATCH
  * "wikipedia.it.com -> OK
* pattern "-wikipedia.it":
  * "wikipedia.it" -> NO MATCH
  * "foo.wikipedia.it -> NO MATCH
  * "0001-wikipedia.it -> OK
  * "foo.0001-wikipedia.it -> OK

Bottom line:
* exact match
* prefix with "." (always, implicit)
* prefix with "-" (only if esplicitly set)
* postfix with "." (only if esplicitly set)

That means that the patterns cannot start with '.' anymore.

Close #2330
2024-03-06 19:25:59 +01:00
Ivan Nardi
40797521af
ndpiReader: add breed stats on output used for CI (#2236) 2024-01-05 13:02:39 +01:00
Luca Deri
8285fffdae Implements JA4 Support (#2191) 2023-12-22 20:40:42 +01:00
Ivan Nardi
32b50f5aa4
IPv6: add support for IPv6 risk exceptions (#2122) 2023-10-29 12:14:20 +01:00
Ivan Nardi
e8e4b9e8ff
IPv6: add support for IPv6 risk tree (#2118)
Fix the script to download crawler addressess
2023-10-27 13:58:15 +02:00
Ivan Nardi
611c3b66f0
ipv6: add support for ipv6 addresses lists (#2113) 2023-10-26 20:15:44 +02:00
Ivan Nardi
7714507f81
Test multiple ndpiReader configurations (#1931)
Extend internal unit tests to handle multiple configurations.
As some examples, add tests about:
* disabling some protocols
* disabling Ookla aggressiveness

Every configurations data is stored in a dedicated directory under
`tests\cfgs`
2023-04-06 11:30:36 +02:00
Renamed from tests/result/reddit.pcap.out (Browse further)