vrr/nDPI

mirror of https://github.com/vel21ripn/nDPI.git synced 2026-05-04 18:00:17 +00:00

Author	SHA1	Message	Date
Nardi Ivan	b250f9d86c	Extend content match lists	2022-09-22 09:30:09 +02:00
Ivan Nardi	a7c2734b38	Remove classification "by-ip" from protocol stack (#1743) Basically: * "classification by-ip" (i.e. `flow->guessed_protocol_id_by_ip`) is NEVER returned in the protocol stack (i.e. `flow->detected_protocol_stack[]`); * if the application is interested in such information, it can access `ndpi_protocol->protocol_by_ip` itself. There are mainly 4 points in the code that set the "classification by-ip" in the protocol stack: the generic `ndpi_set_detected_protocol()`/`ndpi_detection_giveup()` functions and the HTTP/STUN dissectors. In the unit tests output, a print about `ndpi_protocol->protocol_by_ip` has been added for each flow: the huge diff of this commit is mainly due to that. Strictly speaking, this change is NOT an API/ABI breakage, but there are important differences in the classification results. For example: * TLS flows without the initial handshake (or without a matching SNI/certificate) are simply classified as `TLS`; * similar for HTTP or QUIC flows; * DNS flows without a matching request domain are simply classified as `DNS`; we don't have `DNS/Google` anymore just because the server is 8.8.8.8 (that was an outrageous behaviour...); * flows previously classified only "by-ip" are now classified as `NDPI_PROTOCOL_UNKNOWN`. See #1425 for other examples of why adding the "classification by-ip" in the protocol stack is a bad idea. Please note that IPv6 is not supported :( (long standing issue in nDPI) i.e. `ndpi_protocol->protocol_by_ip` will always be `NDPI_PROTOCOL_UNKNOWN` for IPv6 flows. The define `NDPI_CONFIDENCE_MATCH_BY_IP` has been removed. Close #1687	2022-09-20 22:24:47 +02:00
Ivan Nardi	0a47f745cc	Avoid useless host automa lookup (#1724) The host automa is used for two tasks: * protocol sub-classification (obviously); * DGA evaluation: the idea is that if a domain is present in this automa, it can't be a DGA, regardless of its format/name. In most dissectors both checks are executed, i.e. the code is something like: ``` ndpi_match_host_subprotocol(..., flow->host_server_name, ...); ndpi_check_dga_name(..., flow->host_server_name,...); ``` In that common case, we can perform only one automa lookup: if we check the sub-classification before the DGA, we can avoid the second lookup in the DGA function itself.	2022-09-05 13:59:51 +02:00
Ivan Nardi	405a52ed65	Patricia tree, Aho-Corasick automa, LRU cache: add statistics (#1683) Add (basic) internal stats to the main data structures used by the library; they might be useful to check how effective these structures are. Add an option to `ndpiReader` to dump them; enabled by default in the unit tests. This new option enables/disables dumping of "num dissectors calls" values, too (see `b4cb14ec`).	2022-07-29 15:25:00 +02:00
Ivan Nardi	d8d525fff2	Update the protocol bitmask for some protocols (#1675) TCP retransmissions should be ignored. Remove some unused protocol bitmasks. Update script to download Whatsapp IP list.	2022-07-27 11:46:45 +02:00
Toni	ae2bedce3a	Improved Jabber/XMPP detection. (#1661) Signed-off-by: Toni Uhlig <matzeton@googlemail.com>	2022-07-13 17:55:33 +02:00
Ivan Nardi	b4cb14ec19	Keep track of how many dissector calls we made for each flow (#1657)	2022-07-11 09:47:47 +02:00
Ivan Nardi	7645909460	Fix handling of NDPI_UNIDIRECTIONAL_TRAFFIC risk (#1636)	2022-07-05 17:01:00 +02:00
Luca Deri	ab09b8ce2e	Added unidirectional traffic flow risk	2022-06-20 00:22:13 +02:00
Ivan Nardi	3a087e951d	Add a "confidence" field about the reliability of the classification. (#1395) As a general rule, the higher the confidence value, the higher the "reliability/precision" of the classification. In other words, this new field provides a hint about "how" the flow classification has been obtained. For example, the application may want to ignore classification "by-port" (they are not real DPI classifications, after all) or give a second glance at flows classified via LRU caches (because of false positives). Setting only one value for the confidence field is a bit tricky: more work is probably needed in the near future to tweak/fix/improve the logic.	2022-01-11 15:23:39 +01:00
Ivan Nardi	7153b8933c	Improve/add several protocols (#1383) Improve Microsoft, GMail, Likee, Whatsapp, DisneyPlus and Tiktok detection. Add Vimeo, Fuze, Alibaba and Firebase Crashlytics detection. Try to differentiate between Messenger/Signal standard flows (i.e. chat) and their VOIP (video)calls (like we already do for Whatsapp and Snapchat). Add a partial list of some ADS/Tracking stuff. Fix Cassandra, Radius and GTP false positives. Fix DNS, Syslog and SIP false negatives. Improve GTP (sub)classification: differentiate among GTP-U, GTP_C and GTP_PRIME. Fix 3 LGTM warnings.	2021-12-18 13:24:51 +01:00
Ivan Nardi	b1e9245d94	ndpiReader: slight simplification of the output (#1378)	2021-11-27 17:32:23 +01:00
Luca Deri	ea435c46f5	Reworked HTTP protocol dissection including HTTP proxy and HTTP connect	2021-11-25 22:53:46 +01:00
Ivan Nardi	a8ffcd8bb0	Rework how hostname/SNI info is saved (#1330) Looking at `struct ndpi_flow_struct` the two biggest fields are `host_server_name[240]` (mainly for HTTP hostnames and DNS domains) and `protos.tls_quic.client_requested_server_name[256]` (for TLS/QUIC SNIs). This commit aims to reduce `struct ndpi_flow_struct` size, according to two simple observations: 1) at most one of these two fields is used for each flow. So it seems safe to merge them; 2) even if hostnames/SNIs might be very long, in practice they are rarely longer than a few tens of bytes. So, using a (single) large buffer is a waste of memory for all kinds of flows. If we need to truncate the name, we keep the last characters, easing domain matching. Analyzing some real traffic, it seems safe to assume that the vast majority of hostnames/SNIs are shorter than 80 bytes. Hostnames/SNIs are always converted to lowercase. Attention was given so as to be sure that unit-tests outputs are not affected by this change. Because of a bug, TLS/QUIC SNIs were always truncated to 64 bytes (the first 64 ones): as a consequence, there were some "Suspicious DGA domain name" and "TLS Certificate Mismatch" false positives.	2021-11-24 10:46:48 +01:00
Ivan Nardi	b6d9536533	Fixed cleartext protocol assignment (#1357)	2021-10-25 15:04:04 +02:00
Luca Deri	9e97d20c25	Refreshed results list	2021-10-16 12:03:16 +02:00
Luca Deri	e8455236bd	Updated output	2021-08-07 17:38:33 +02:00
Ivan Nardi	cccf794265	ndpiReader: add statistics about nDPI performance (#1240) The goal is to have a (rough) idea about how many packets nDPI needs to properly classify a flow. Log this information (and guessed flows number too) during unit tests, to keep track of improvements/regressions across commits.	2021-07-13 12:28:39 +02:00
Luca Deri	732bcecd17	Added flow risk score	2021-05-18 21:05:47 +02:00
Luca Deri	ac1eaca8a6	Added browser TLS heuristic	2021-05-13 20:00:27 +02:00
Luca Deri	0f8a994841	Improved DGA detection. Before: Accuracy 66%, Precision 86%, Recall 38%. After: Accuracy 71%, Precision 89%, Recall 49%.	2021-03-03 19:30:01 +01:00
Luca Deri	1a37595de0	Removed check for known protocols (major and app protocols)	2021-03-03 00:57:56 +01:00
Luca Deri	56bfb439f8	Improved DGA detection with trigrams. Disadvantage: slower startup time. Reworked Tor dissector embedded in TLS (fixes #1141). Removed false positive on HTTP User-Agent.	2021-03-03 00:41:07 +01:00
Luca Deri	23a15bae5f	Fixes #1029	2020-11-27 18:51:56 +01:00
Zied Aouini	22780da8d5	Add Reddit support. (#1060) * Add Reddit protocol. * Add Reddit test file and result. Co-authored-by: Luca Deri <lucaderi@users.noreply.github.com>	2020-11-16 21:13:01 +01:00
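
Commit a8ffcd8bb0 above describes storing hostnames/SNIs in a smaller buffer, keeping the last characters on truncation and always converting to lowercase. The following is a minimal sketch of that idea only, not the code from the commit: the function name `save_hostname_tail` and its signature are hypothetical and are not part of the nDPI API.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not an nDPI function).
 * Copies the `src_len` bytes at `src` (which need not be NUL-terminated)
 * into `dst`, converting them to lowercase. If the name does not fit in
 * `dst_len - 1` characters, the LAST characters are kept, so that the
 * registrable domain at the end of the name survives truncation.
 * Returns the number of characters stored, excluding the trailing NUL.
 */
size_t save_hostname_tail(char *dst, size_t dst_len,
                          const char *src, size_t src_len)
{
  size_t max_chars, skip, n, i;

  assert(dst != NULL && dst_len > 0 && src != NULL);

  max_chars = dst_len - 1;                        /* room left after the NUL */
  skip = (src_len > max_chars) ? (src_len - max_chars) : 0;
  n = src_len - skip;                             /* characters actually kept */

  for (i = 0; i < n; i++)
    dst[i] = (char)tolower((unsigned char)src[skip + i]);
  dst[n] = '\0';

  return n;
}
```

With a 12-byte destination buffer, the 25-character input `Very.Long.Sub.Example.COM` is stored as `example.com`: the leading labels are dropped and the tail that matters for domain matching is preserved.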

25 commits
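
The single-lookup idea from commit 0a47f745cc can be illustrated with a toy model. Everything in this sketch is an assumption made for illustration: the `toy_*` names, the two-entry host table standing in for the host automa, and the "no vowels" check standing in for a DGA heuristic are all hypothetical and do not reflect nDPI's actual data structures, function signatures, or DGA logic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PROTO_UNKNOWN 0

/* Toy stand-in for the host automa: a short list of known domains. */
struct toy_host_entry { const char *suffix; int proto_id; };

static const struct toy_host_entry toy_hosts[] = {
  { "example.com", 1 },
  { "example.org", 2 },
};

/* Counts how many times the (toy) automa has been queried. */
unsigned int toy_lookup_count = 0;

/* Returns the protocol id whose domain is a label-aligned suffix of `host`. */
int toy_host_lookup(const char *host)
{
  size_t i, hlen = strlen(host);

  toy_lookup_count++;
  for (i = 0; i < sizeof(toy_hosts) / sizeof(toy_hosts[0]); i++) {
    size_t slen = strlen(toy_hosts[i].suffix);

    if (hlen >= slen &&
        strcmp(host + hlen - slen, toy_hosts[i].suffix) == 0 &&
        (hlen == slen || host[hlen - slen - 1] == '.'))
      return toy_hosts[i].proto_id;
  }
  return TOY_PROTO_UNKNOWN;
}

/* Placeholder heuristic: first label of 8+ characters with no vowels. */
int toy_looks_random(const char *host)
{
  size_t i, len = strcspn(host, ".");

  if (len < 8)
    return 0;
  for (i = 0; i < len; i++)
    if (strchr("aeiou", host[i]) != NULL)
      return 0;
  return 1;
}

/* DGA check; `check_automa` says whether the automa must still be queried. */
int toy_check_dga(const char *host, int check_automa)
{
  assert(host != NULL);
  if (check_automa && toy_host_lookup(host) != TOY_PROTO_UNKNOWN)
    return 0;                      /* known domain: cannot be a DGA */
  return toy_looks_random(host);
}

/* Sub-classification first, then DGA check reusing the lookup result. */
int toy_classify_host(const char *host, int *is_dga)
{
  int proto = toy_host_lookup(host);          /* the only automa lookup */

  if (proto != TOY_PROTO_UNKNOWN)
    *is_dga = 0;                              /* matched: skip the DGA check */
  else
    *is_dga = toy_check_dga(host, 0);         /* no match: skip second lookup */
  return proto;
}
```

Because the sub-classification runs first, its outcome already answers the question the DGA function would ask the automa, so each hostname costs exactly one lookup whether or not it matches.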