vrr/nDPI

mirror of https://github.com/vel21ripn/nDPI.git synced 2026-04-28 23:19:42 +00:00

Author	SHA1	Message	Date
Ivan Nardi	faca0a6565	ndpiReader: improve statistics	2025-10-22 20:34:29 +02:00
Ivan Nardi	9d22805954	Add statistics about hash data structures (#2995)	2025-10-17 20:39:15 +02:00
Luca Deri	11d74ea286	Implemented nDPI fingerprint that is computed using: TCP fingerprint, JA4 fingerprint, and TLS SHA1 certificate (if present) or JA3S fingerprint (if SHA1 is missing). By default the fingerprint uses the client and server fingerprints (format 0) and combines them. However, you can change its format (e.g. use only the client info, format 1) with --cfg NULL,metadata.ndpi_fingerprint_format,X where X is the fingerprint format. By default the nDPI fingerprint is enabled, but you can enable/disable it as follows: --cfg NULL,metadata.ndpi_fingerprint,0	2025-08-21 10:34:49 +02:00
Ivan Nardi	8dd2220116	Add the concept of protocols stack: more than 2 protocols per flow (#2913) The idea is to remove the limitation of only two protocols ("master" and "app") in the flow classification. This is quite handy, especially for STUN flows and, in general, for any flows where there is some kind of transition from a cleartext protocol to TLS: HTTP_PROXY -> TLS/Youtube; SMTP -> SMTPS (via STARTTLS msg). In the vast majority of the cases, the protocol stack is simply Master/Application. Examples of real stacks (from the unit tests) different from the standard "master/app": * "STUN.WhatsAppCall.SRTP": a WA call * "STUN.DTLS.GoogleCall": a Meet call * "Telegram.STUN.DTLS.TelegramVoip": a Telegram call * "SMTP.SMTPS.Google": a SMTP connection to Google server started in cleartext and updated to TLS * "HTTP.Google.ntop": a HTTP connection to a Google domain (match via "Host" header) and to a ntop server (match via "Server" header) The logic to create the stack is still a bit coarse: we have a decade of code trying to push everything into only two protocols... Therefore, the content of the stack is still highly experimental and might change in the near future; do you have any suggestions? It is quite likely that the legacy fields "master_protocol" and "app_protocol" will be there for a long time. Add some helpers to use the stack: ``` ndpi_stack_get_upper_proto(); ndpi_stack_get_lower_proto(); bool ndpi_stack_contains(struct ndpi_proto_stack s, u_int16_t proto_id); bool ndpi_stack_is_tls_like(struct ndpi_proto_stack s); bool ndpi_stack_is_http_like(struct ndpi_proto_stack *s); ``` Be sure new stack logic is compatible with legacy code: ``` assert(ndpi_stack_get_upper_proto(&flow->detected_protocol.protocol_stack) == ndpi_get_upper_proto(flow->detected_protocol)); assert(ndpi_stack_get_lower_proto(&flow->detected_protocol.protocol_stack) == ndpi_get_lower_proto(flow->detected_protocol)); ```	2025-08-01 10:05:50 +02:00
Ivan Nardi	44b9a2da81	ndpiReader: add breed to flow information (#2924)	2025-07-30 18:46:28 +02:00
Adrian Pekar	5f312c0cd6	Fix JA4 fingerprinting (#2915) * Fix JA4 ALPN fingerprint to use first and last characters According to the JA4 specification (line 2139), the ALPN field should contain the first and last characters of the first ALPN extension value. Currently, nDPI uses the first and second characters (alpn[0] and alpn[1]), which produces incorrect fingerprints that don't match other JA4 implementations like Wireshark. For example, with ALPN 'http/1.1': - Current (incorrect): 'ht' (first + second char) - Fixed (correct): 'h1' (first + last char) This change ensures nDPI's JA4 implementation conforms to the official specification and maintains interoperability with other JA4 tools. Fixes: Incorrect JA4 ALPN fingerprint generation * Fix JA4 ALPN implementation to correctly parse first ALPN protocol The previous fix attempted to use strlen(ja->client.alpn)-1 but this was insufficient because nDPI modifies the ALPN string by: 1. Adding null terminators that truncate the last character 2. Converting semicolons to dashes, affecting multi-protocol ALPNs This complete fix: - Adds alpn_original_last field to store the true last character - Captures the last character of the FIRST ALPN protocol only (before ;/,) - Preserves the original character before nDPI's string modifications Now correctly implements JA4 spec: first + last characters of first ALPN protocol Examples: - ALPN 'h2;http/1.1' -> 'h2' (not 'h.' or 'h1') - ALPN 'http/1.1' -> 'h1' (not 'ht' or 'h.') Fixes: #2914 * Fix JA4 SNI detection to properly handle missing SNI extensions Previously, nDPI incorrectly set JA4 SNI flag to 'd' (domain present) for flows without any SNI extension. This was because the logic only checked for NDPI_NUMERIC_IP_HOST risk (set when SNI contains IP) but didn't distinguish between missing SNI and domain SNI. Now properly detects: - No SNI extension → 'i' flag - SNI with IP address → 'i' flag - SNI with domain → 'd' flag This matches the JA4 specification.	2025-07-10 14:03:27 +02:00
Ivan Nardi	aa6dcad15e	ndpiReader: print categories summary (#2895)	2025-06-21 12:41:00 +02:00
Ivan Nardi	34dcf18128	Add a new internal function `internal_giveup()` This function is always called once for every flow, as the last code processing the flow itself. As a first usage example, check here if the flow is unidirectional (instead of checking it at every packet)	2025-03-05 20:51:06 +01:00
Ivan Nardi	72fd940301	Remove JA3C output from ndpiReader (#2667) Removing JA3C is a big task. Let's start with a simple change having a huge impact on unit tests: remove printing of JA3C information from ndpiReader. This way, when we delete the actual code, the unit tests diffs should be a lot simpler to look at. Note that the information about whether the client/server cipher is weak or obsolete is still available via flow risk. See: #2551	2025-01-12 13:24:27 +01:00
Ivan Nardi	4756904222	QUIC: remove extraction of user-agent (#2650) In very old (G)QUIC versions by Google, the user agent was available in plain text. That is not true anymore, since about the end of 2021. See: `f282c934f4`	2025-01-07 19:58:43 +01:00
Ivan Nardi	c3d19be26f	ndpiReader: update JA statistics (#2646) Show JA4C and JA3S information (instead of JA3C and JA3S) See #2551 for context	2025-01-06 15:09:25 +01:00
Ivan Nardi	2e20f670dd	QUIC: extract "max idle timeout" parameter (#2649) Even if it is only the value proposed by the client (and not the negotiated one), it might be used as a hint for timeout by the (external) flows manager	2025-01-06 13:45:12 +01:00
Luca Deri	2b40611082	Fixed JA4 invalid computation due to code bug and uninitialized values	2024-10-13 20:45:20 +02:00
Ivan Nardi	85501c9aaa	FPC: add DPI information (#2514) If the flow is classified (via DPI) after the first packet, we should use this information as FPC	2024-07-23 08:50:27 +02:00
Ivan Nardi	65e31b0ea3	FPC: small improvements (#2512) Add printing of fpc_dns statistics and add a general configuration option. Rework the code to be more generic and ready to handle other logics.	2024-07-22 17:42:23 +02:00
Ivan Nardi	843e487270	Add infrastructure for explicit support of First Packet Classification (#2488) Let's start with some basic helpers and with FPC based on flow addresses. See: #2322	2024-07-03 18:02:07 +02:00
Nardi Ivan	526cf6f291	Zoom: remove "stun_zoom" LRU cache Since `070a0908b` we are able to detect P2P calls directly from the packet content, without any correlation among flows	2024-06-17 10:19:55 +02:00
Ivan Nardi	95fe21015d	Remove "zoom" cache (#2420) This cache was added in `b6b4967aa`, when there was no real Zoom support. With `63f349319`, a proper identification of multimedia stream has been added, making this cache quite useless: any improvements on Zoom classification should be properly done in Zoom dissector. Tested for some months with a few 10Gbits links of residential traffic: the cache pretty much never returned a valid hit.	2024-05-06 12:51:45 +02:00
Ivan Nardi	7a83a8dc91	QUIC: fix decryption with CH fragments with different Destination CID (#2278) QUIC decryption fails when the Client Hello is split into multiple UDP packets and these packets have different Destination Connection IDs (because the server told the client to switch to a different CID; see RFC 9000 7.2) ``` The Destination Connection ID field from the first Initial packet sent by a client is used to determine packet protection keys for Initial packets. [..] Upon first receiving an Initial or Retry packet from the server, the client uses the Source Connection ID supplied by the server as the Destination Connection ID for subsequent packets ``` From a logical point of view, the ciphers used for decryption should be initialized only once, with the first Initial pkt sent by the client and kept for later usage with the following packets (if any). However it seems that we can safely initialize them at each packet, if we keep using the DCID of the first packet sent by the client. Initializing the ciphers at each packet greatly simplifies this patch. This issue has been undetected for so long because: * in the vast majority of the cases we only decrypt one packet per flow; * the available traces with the Client Hello split into multiple packets (i.e. cases where we need to decrypt at least two packets per flow) were created in a simple test environment to simulate Post-Quantum handshake, and in that scenario the client sent all the packets (with the same DCID) before any reply from the server. However, in the last months all major browsers started supporting PQ keys, so it is now common to have split CH in real traffic. Please note that in the attached example, the CH is split into 2 (in-order) fragments (in different UDP packets) and the second one in turn is divided into 9 (out-of-order) CRYPTO frames; the reassembler code works out-of-the-box even in this (new) scenario.	2024-01-24 09:57:28 +01:00

19 commits
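
The JA4 ALPN rule described in commit 5f312c0cd6 (first and last characters of the first ALPN protocol only) can be illustrated with a minimal sketch. This is not nDPI's actual code: `ja4_alpn_code` is a hypothetical helper name, the input is assumed to be the unmodified ALPN list with protocols separated by `;` or `,`, and the `"00"` fallback for a missing ALPN is an assumption taken from the JA4 specification rather than from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the nDPI API).
 * Writes the two-character JA4 ALPN code into out:
 * first and last characters of the FIRST protocol in alpn_list.
 * Assumption: protocols are separated by ';' or ','.
 * Assumption: "00" is emitted when no ALPN is available.
 */
static void ja4_alpn_code(const char *alpn_list, char out[3]) {
  size_t first_len;

  assert(out != NULL);

  out[0] = '0';
  out[1] = '0';
  out[2] = '\0';

  if(alpn_list == NULL)
    return;

  /* Length of the first protocol: stop at the first separator or at the end */
  first_len = strcspn(alpn_list, ";,");
  if(first_len == 0)
    return;

  out[0] = alpn_list[0];
  out[1] = alpn_list[first_len - 1];
}
```

With this sketch, `ja4_alpn_code("http/1.1", out)` yields `h1` and `ja4_alpn_code("h2;http/1.1", out)` yields `h2`, matching the examples in the commit message. Reading the last character from the original list, before any in-place string rewriting, mirrors the reason the commit gives for adding the `alpn_original_last` field.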