1100 commits

Author SHA1 Message Date
Ivan Nardi
c37937a211
fuzz: improve fuzzing coverage (#3020)
We should take care to tell ndpiReader configuration files apart from
libnDPI configuration files!! Better solution?

Be sure that configuration files are located where they are expected.
In the oss-fuzz environment we can't make any assumptions about the current
working directory of the fuzz target.
2025-11-04 21:04:29 +01:00
Ivan Nardi
433f708951
Fix compilation when using external libgcrypt (#3018)
ndpiReader: fix encodeDomainsUnitTest test
2025-11-04 10:41:00 +01:00
Ivan Nardi
a9e38cc504 ndpiReader: fix typo
Credits to @s4n-cz.
Close #3015
2025-11-03 12:36:12 +01:00
Ivan Nardi
83d85775a8
Provide an explicit state for the flow classification process (#2942)
The application should keep calling nDPI until the flow state becomes
`NDPI_STATE_CLASSIFIED`.

The main loop in the application is simplified to something like:
```
res = ndpi_detection_process_packet(...);
if(res->state == NDPI_STATE_CLASSIFIED) {
  /* Done: you can get the final classification and all metadata.
     nDPI doesn't need more packets for this flow */
} else {
  /* nDPI needs more packets for this flow. The provided
     classification is not final and more metadata might be
     extracted.
     If `res->state` is `NDPI_STATE_PARTIAL`, partial/initial
     classification is available in `res->proto`
     as usual but it can be updated later.
  */
}

/*
    Example A (QUIC flow):
     pkt 1: proto QUIC state NDPI_STATE_PARTIAL
     pkt 2: proto QUIC/Youtube  state NDPI_STATE_CLASSIFIED
    Example B (GoogleMeet call):
     pkt 1:   proto STUN state NDPI_STATE_PARTIAL
     pkt N:   proto DTLS state NDPI_STATE_PARTIAL
     pkt N+M: proto DTLS/GoogleCall state NDPI_STATE_CLASSIFIED
    Example C (standard TLS flow):
     pkt 1:   proto Unknown state NDPI_STATE_INSPECTING
     pkt 2:   proto Unknown state NDPI_STATE_INSPECTING
     pkt 3:   proto Unknown state NDPI_STATE_INSPECTING
     pkt 4:   proto TLS/Facebook state NDPI_STATE_PARTIAL
     pkt N:   proto TLS/Facebook state NDPI_STATE_CLASSIFIED
 */
```
You can take a look at `ndpiReader` for a slightly more complex example.
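
As a rough illustration of the loop above, the sketch below replays a sequence of per-packet states and stops at the first final one. The `demo_*` names are stand-ins invented for this example, not nDPI types or functions; a real application would read the state from the result of `ndpi_detection_process_packet()`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the three states named above;
   these are NOT the nDPI definitions. */
enum demo_state { DEMO_INSPECTING, DEMO_PARTIAL, DEMO_CLASSIFIED };

/* Walk the per-packet states of one flow and stop feeding packets as
   soon as the classification is final.
   Returns true if the flow reached the final state; *used is set to the
   number of packets that had to be processed. */
static bool demo_feed_until_classified(const enum demo_state *states,
                                       size_t num_pkts, size_t *used) {
  size_t i;

  for(i = 0; i < num_pkts; i++) {
    if(states[i] == DEMO_CLASSIFIED) {
      *used = i + 1;
      return true;
    }
    /* DEMO_INSPECTING or DEMO_PARTIAL: more packets are needed */
  }
  *used = num_pkts;
  return false;
}
```

Fed with the states of Example C (three inspecting packets, one partial, then classified), the helper reports that the flow became final on the fifth packet.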

API changes:
* remove the third parameter from `ndpi_detection_giveup()`. If you need
to know if the flow classification has been guessed, you can access
`flow->protocol_was_guessed`
* remove `ndpi_extra_dissection_possible()`
* change some prototypes from accepting `ndpi_protocol foo` to
`ndpi_master_app_protocol bar`. The update is trivial: from `foo` to
`foo.proto`
2025-11-03 12:08:15 +01:00
Ivan Nardi
6ab338928c
Add support for out-of-tree builds (#2993)
Initial work to support out-of-tree builds
```
./autogen.sh
mkdir build
cd build
../configure
make
make check
```
IMPORTANT: `autogen.sh` doesn't call `configure` automatically anymore!!

You have to do: `./autogen.sh && ./configure --$OPTIONS`.
A little bit annoying but the pattern `autogen && configure && make` is
very common on Linux.

Known issues:
* `make doc` doesn't work in out-of-tree builds, yet
* Windows/MinGW/DPDK (out-of-tree) builds have not been tested, so it is unlikely they work

See: #2992
2025-11-03 11:58:59 +01:00
Luca Deri
e9751cec26
Added TLS Block Analysis (#3016)
* Enabled TLS block analysis via --cfg=tls,blocks_analysis,1

* Added comment and optimization

* Updated output format

* Code cleanup
2025-10-27 10:21:26 +01:00
Ivan Nardi
71033e0370
Extend http-url custom rules: support for category and breed (#3014) 2025-10-24 19:17:48 +02:00
Ivan Nardi
20892cf4fc
Extend values saved in hash data structure to u_int64_t (#3013)
Move from `u_int32_t` to `u_int64_t`.
We want to be able to save protocol + category + breed in the same
entry.
2025-10-24 17:58:08 +02:00
Ivan Nardi
01836e0071
Proper handling of internal/external ids in FPC; fix FPC with custom rules (#3007) 2025-10-22 21:28:12 +02:00
Ivan Nardi
faca0a6565 ndpiReader: improve statistics 2025-10-22 20:34:29 +02:00
Ivan Nardi
dae135151e Rework parsing of protocol parameters from custom rules
Note that you can specify custom id mappings for internal protocols, yet
2025-10-22 20:14:43 +02:00
Luca Deri
5abe185e2c Added support for urlXXXX@proto in protos.txt
Fixed various protocol mappings in custom protocol definitions
2025-10-22 09:00:58 +02:00
Ivan Nardi
b9c847a176 config: fix "only_classification" configuration 2025-10-21 20:19:56 +02:00
Luca Deri
79b74115d2 Fixes invalid initialization that caused the two commands below to return different results
./example/ndpiReader -t -i ./tests/pcap/bets.pcapng -L ./lists/public_suffix_list.dat -G ./lists/
 ./example/ndpiReader -t -i ./tests/pcap/bets.pcapng -G ./lists/
2025-10-21 15:10:28 +02:00
Ivan Nardi
9c27c2df3a
Allow to overwrite domain matching via custom rules (#2999)
This is basically the revert of 0db12b1390 and 43d9caac00.
Add some tests about this feature
2025-10-20 15:28:16 +02:00
Ivan Nardi
6eb63d9cf9
tests: fixed protocol ids for all custom rules (#3000)
To ease PR/Commit comparisons
2025-10-20 14:59:15 +02:00
Luca Deri
735e0df40c Updated test 2025-10-18 00:22:14 +02:00
Ivan Nardi
9d22805954
Add statistics about hash data structures (#2995) 2025-10-17 20:39:15 +02:00
Ivan Nardi
523fe3ebc4
doc: improve public API header documentation (#2985)
This commit significantly improves the documentation quality in ndpi_api.h,
the main public API header file for nDPI.

Changes include:

1. Fixed 11 typos:
   - "fucntion" → "function"
   - "ckeck" → "check"
   - "guesses" → "guessed"
   - "searhing" → "searching"
   - "@paw" → "@par" (incorrect Doxygen tag)
   - "addeed" → "added"
   - "readeable" → "readable" (function name)
   - "creaign" → "creating"
   - "lenght" → "length" (3 occurrences)
   - "hosti tself" → "host itself"

2. Added comprehensive documentation for memory management functions:
   - ndpi_malloc(), ndpi_calloc(), ndpi_realloc()
   - ndpi_strdup(), ndpi_strndup()
   - ndpi_free()
   - ndpi_flow_malloc(), ndpi_flow_free()
   - ndpi_get_tot_allocated_memory()

   These critical functions were previously undocumented, which could
   confuse users about custom allocator support and memory tracking.

3. Documented high-priority utility functions:
   - ndpi_match_string_value() - automaton string matching
   - ndpi_strip_leading_trailing_spaces() - string trimming
   - ndpi_handle_risk_exceptions() - risk exception handling
   - set_ndpi_malloc(), set_ndpi_free() - custom allocator setup
   - set_ndpi_flow_malloc(), set_ndpi_flow_free() - flow allocator setup
   - set_ndpi_debug_function() - custom debug logging

4. Added detailed documentation for Community ID hash functions:
   - ndpi_flowv4_flow_hash() - IPv4 flow hashing
   - ndpi_flowv6_flow_hash() - IPv6 flow hashing
   - Added reference to Community ID specification
   - Clarified parameter byte ordering and buffer requirements

All documentation follows Doxygen format with @param and @return tags.
Build and tests verified: all tests pass (3/3).

Stats: +173 lines of documentation, -19 lines (typo fixes)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-14 21:07:16 +02:00
Ivan Nardi
a9cc75d634
ndpiReader: fix memory accounting (#2988)
We don't know how much memory we are currently using: we only know the
amount of total memory allocated. Use proper label to report this
information in a correct way
2025-10-12 18:12:01 +02:00
Ivan Nardi
dc5214b764
We are not interested in entropy for encrypted flows (#2983)
Update `only_classification.conf` configuration
2025-10-09 14:35:26 +02:00
Ivan Nardi
a07d55005d
fuzz: try to improve fuzzing coverage (#2981) 2025-10-06 20:44:31 +02:00
Ivan Nardi
3a06d2037f
ndpiReader: create a wrapper to configure nDPI (local) context (#2979)
Use it to better test domains, too
2025-10-05 11:39:46 +02:00
Ivan Nardi
8ad62d7e7f
ndpiReader: quick test for a list of domains (#2978) 2025-10-03 20:06:51 +02:00
Ivan Nardi
c9dfc946ff example: fix some proto ids in custom rules to ease unit test differences 2025-10-02 11:06:43 +02:00
Ivan Nardi
5aaab7f354
Fix ndpi_is_valid_hostname() (#2974)
It was completely broken.
Pay some attention to HTTP case where we might have Host header in the
"$DOMAIN:$PORT" form: we usually want to strip the port part

`memrchr` is not available on macOS and on Windows: create a wrapper
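
A minimal sketch of both ideas, under stated assumptions: `demo_memrchr()` is a hypothetical portable replacement for `memrchr`, and `demo_host_len_without_port()` is an illustrative way to drop a trailing `:$PORT`. Neither is the code from this commit, and the IPv6 handling is a guess at what a caller would want.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical portable stand-in for memrchr(): last occurrence of
   byte c in the first n bytes of s, or NULL if it is not there. */
static const void *demo_memrchr(const void *s, int c, size_t n) {
  const unsigned char *p = (const unsigned char *)s;

  while(n > 0) {
    n--;
    if(p[n] == (unsigned char)c)
      return p + n;
  }
  return NULL;
}

/* Length of the host part of a "$DOMAIN:$PORT" value (illustrative).
   The port is stripped only when everything after the last ':' is a
   non-empty run of digits. A bare IPv6 literal such as "::1" is left
   untouched; a bracketed one ("[::1]:80") loses its port. */
static size_t demo_host_len_without_port(const char *host, size_t len) {
  const char *colon = (const char *)demo_memrchr(host, ':', len);
  size_t pos, i;

  if(colon == NULL)
    return len;
  pos = (size_t)(colon - host);
  if(pos + 1 == len)
    return len; /* trailing ':' with no digits */
  for(i = pos + 1; i < len; i++) {
    if(!isdigit((unsigned char)host[i]))
      return len;
  }
  /* Another ':' before this one means IPv6, unless it is bracketed */
  if(demo_memrchr(host, ':', pos) != NULL && host[pos - 1] != ']')
    return len;
  return pos;
}
```

The function returns a length instead of modifying the buffer, so it also works on header values that are not NUL-terminated.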
2025-09-29 12:27:21 +02:00
Luca Deri
15f8dad9e8 Modified ndpi_ranking_add_epoch() API 2025-09-27 22:16:25 +02:00
Ivan Nardi
ddd277fc44
HTTP: add further configuration to enable/disable metadata extraction (#2972)
Rename existing configuration knobs, to better separate metadata from
requests, from metadata from responses
2025-09-23 15:11:25 +02:00
Ivan Nardi
1c1535738f ndpiReader: ranking unit tests: disable logging 2025-09-23 14:38:25 +02:00
Luca
52ce501355 Improved ndpi_ranking calculation by
- keeping track of the number of updates without rank changes
- not creating new slots (but overwriting the last one) when a new update with no rank changes is computed. This way, the ranking data structure contains only entries that caused ranking changes
2025-09-17 19:45:04 +02:00
Ivan Nardi
8c81859467 Update "only_classification" configuration 2025-09-09 16:08:20 +02:00
Ivan Nardi
6a3228388b ndpiReader: improve debug option '-x' to test category matches 2025-09-05 19:58:25 +02:00
Luca Deri
52d4607bbd Extended ndpi_ranking_add_epoch() API 2025-09-05 07:33:08 +02:00
Ivan Nardi
efccc7d5e4
Rework flow breed (#2926)
Right now, there is, in essence, a static mapping between flow protocols
and flow breeds.
Make it dynamic: allow different flows with the same
classification but different breeds. This is the same logic that we
already have for categories....

Preliminary work to support breed in category lists.

API change from the app POV: to get the flow breed, no longer use
`ndpi_get_proto_breed()`, but access `struct ndpi_proto->breed` directly

The functions `ndpi_domain_classify_*()` and
`ndpi_get_host_domain_suffix()` now have a `u_int32_t` parameter as
`class_id` (instead of `u_int16_t`), with the following logic:
```
class_id = (breed << 16) | category
```
instead of the old:
```
class_id = category
```
Please note that this change is backward compatible: if you are not
interested in breeds, you don't need to update the application code.
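
The encoding above can be sketched with a few helpers. The `demo_*` names are hypothetical (they are not part of the nDPI API), and `uint32_t`/`uint16_t` from `<stdint.h>` are used in place of `u_int32_t`/`u_int16_t` for portability.

```c
#include <stdint.h>

/* Build a class_id as described above: breed in the upper 16 bits,
   category in the lower 16 bits. */
static uint32_t demo_make_class_id(uint16_t breed, uint16_t category) {
  return ((uint32_t)breed << 16) | category;
}

/* Extract the category (lower 16 bits) */
static uint16_t demo_class_id_category(uint32_t class_id) {
  return (uint16_t)(class_id & 0xFFFF);
}

/* Extract the breed (upper 16 bits) */
static uint16_t demo_class_id_breed(uint32_t class_id) {
  return (uint16_t)(class_id >> 16);
}
```

With a breed of 0 the result equals the plain category value, which is why code that only ever passed a category keeps working.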
2025-09-02 16:54:34 +02:00
Ivan Nardi
c25c1be778 tests: add an example of custom rule with nDPI fingerprint 2025-08-31 19:10:05 +02:00
Ivan Nardi
1da8b85ee7
Fix compilation and unit tests (#2948)
```
ndpi_analyze.c: In function ‘ndpi_deserialize_ranking’:
ndpi_analyze.c:2244:3: warning: ignoring return value of ‘fread’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
 2244 |   fread(&rank->header, sizeof(ndpi_ranking_header), 1, fd);
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ndpi_main.c: In function ‘ndpi_match_host_subprotocol’:
ndpi_main.c:11798:9: warning: ‘__builtin_strncpy’ output may be truncated copying between 0 and 63 bytes from a string of length 255 [-Wstringop-truncation]
11798 |         strncpy(str, string_to_match, ndpi_min(string_to_match_len, sizeof(str)-1));
      |         ^
ndpi_main.c:11811:7: warning: ‘__builtin_strncpy’ output may be truncated copying between 0 and 63 bytes from a string of length 255 [-Wstringop-truncation]
11811 |       strncpy(str, string_to_match, ndpi_min(string_to_match_len, sizeof(str)-1));

```
2025-08-30 21:05:40 +02:00
Luca Deri
a6e2b4e252 Initial (WiP/basic) implementation of the ranking detection API used to
determine rank changes

  void ndpi_init_ranking(ndpi_ranking *rank, u_int16_t max_num_items, u_int16_t num_epochs);
  void ndpi_term_ranking(ndpi_ranking *rank);
  bool ndpi_serialize_ranking(ndpi_ranking *rank, const char *path);
  bool ndpi_deserialize_ranking(ndpi_ranking *rank, const char *path);
  void ndpi_print_ranking(ndpi_ranking *rank);
  u_int16_t ndpi_ranking_add_epoch(ndpi_ranking *rank, u_int32_t epoch,
                                  ndpi_ranking_epoch_entry *entries,
                                  u_int16_t num_epoch_entries,
                                  ndpi_ranking_change *changes /* Out */);
2025-08-28 16:26:44 +02:00
Luca Deri
7c53fcde85 Code cleanup
Added check in fingerprinting code
2025-08-21 12:30:40 +02:00
Luca Deri
11d74ea286 Implemented nDPI fingerprint that is computed using
- TCP fingerprint
- JA4 fingerprint
- TLS SHA1 certificate (if present), or JA3S fingerprint (if SHA1 is missing)

By default the fingerprint uses the client and server fingerprints (format 0)
and combines them. However you can change its format (e.g. use only the client info,
format 1) with

--cfg NULL,metadata.ndpi_fingerprint_format,X

where X is the fingerprint format.

By default nDPI fingerprint is enabled but you can enable/disable it as follows

--cfg NULL,metadata.ndpi_fingerprint,0
2025-08-21 10:34:49 +02:00
Luca Deri
087726d12d Added support for JA4 in protos.txt
Format: ja4:XXXXX@CustomProtoJA4
2025-08-20 21:31:10 +02:00
fanxb
7a2ca82c9d
ndpiReader: Fix the crash issue during protocol guessing in multi-core scenarios. (#2939) 2025-08-08 11:58:17 +02:00
Ivan Nardi
eb5f8a037c
fuzz: improve coverage (#2931)
Sync `pl7m` code with upstream.
Add a new fuzzer to test the same flows with different L4 ports
2025-08-04 12:52:51 +02:00
Ivan Nardi
8dd2220116
Add the concept of protocols stack: more than 2 protocols per flow (#2913)
The idea is to remove the limitation of only two protocols ("master" and
"app") in the flow classification.
This is quite handy especially for STUN flows and, in general, for any
flows where there is some kind of transition from a cleartext protocol
to TLS: HTTP_PROXY -> TLS/Youtube; SMTP -> SMTPS (via STARTTLS msg).

In the vast majority of the cases, the protocol stack is simply
Master/Application.

Examples of real stacks (from the unit tests) different from the standard
"master/app":
* "STUN.WhatsAppCall.SRTP": a WA call
* "STUN.DTLS.GoogleCall": a Meet call
* "Telegram.STUN.DTLS.TelegramVoip": a Telegram call
* "SMTP.SMTPS.Google": an SMTP connection to a Google server started in
  cleartext and upgraded to TLS
* "HTTP.Google.ntop": an HTTP connection to a Google domain (match via
  "Host" header) and to an ntop server (match via "Server" header)

The logic to create the stack is still a bit coarse: we have a decade of
code trying to push everything into only two protocols... Therefore, the
content of the stack is still **highly experimental** and might change
in the near future; do you have any suggestions?

It is quite likely that the legacy fields "master_protocol" and
"app_protocol" will be there for a long time.

Add some helpers to use the stack:
```
ndpi_stack_get_upper_proto();
ndpi_stack_get_lower_proto();
bool ndpi_stack_contains(struct ndpi_proto_stack *s, u_int16_t proto_id);
bool ndpi_stack_is_tls_like(struct ndpi_proto_stack *s);
bool ndpi_stack_is_http_like(struct ndpi_proto_stack *s);

```
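
To make the idea concrete, here is a toy model of such a stack. Everything in it is an assumption made for illustration: the `demo_proto_stack` layout, the size limit, and the protocol ids are invented, and the real `struct ndpi_proto_stack` and its helpers may look quite different.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_STACK_MAX 8 /* arbitrary limit for this sketch */
#define DEMO_UNKNOWN   0 /* hypothetical "no protocol" id */

/* Hypothetical stack layout: lowest protocol first */
struct demo_proto_stack {
  uint16_t protos[DEMO_STACK_MAX];
  size_t num;
};

/* Append a protocol on top of the stack; false if the stack is full */
static bool demo_stack_push(struct demo_proto_stack *s, uint16_t proto_id) {
  if(s->num >= DEMO_STACK_MAX)
    return false;
  s->protos[s->num++] = proto_id;
  return true;
}

/* True if proto_id appears anywhere in the stack */
static bool demo_stack_contains(const struct demo_proto_stack *s,
                                uint16_t proto_id) {
  size_t i;

  for(i = 0; i < s->num; i++) {
    if(s->protos[i] == proto_id)
      return true;
  }
  return false;
}

/* Bottom and top of the stack; DEMO_UNKNOWN when the stack is empty */
static uint16_t demo_stack_lower(const struct demo_proto_stack *s) {
  return s->num ? s->protos[0] : DEMO_UNKNOWN;
}

static uint16_t demo_stack_upper(const struct demo_proto_stack *s) {
  return s->num ? s->protos[s->num - 1] : DEMO_UNKNOWN;
}
```

For a "STUN.DTLS.GoogleCall"-like flow, pushing three made-up ids in that order gives the first as the lower protocol, the last as the upper one, and a `contains` lookup that also finds the middle entry, which a plain master/app pair cannot represent.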

Be sure new stack logic is compatible with legacy code:
```
assert(ndpi_stack_get_upper_proto(&flow->detected_protocol.protocol_stack) ==
       ndpi_get_upper_proto(flow->detected_protocol));
assert(ndpi_stack_get_lower_proto(&flow->detected_protocol.protocol_stack) ==
       ndpi_get_lower_proto(flow->detected_protocol));
```
2025-08-01 10:05:50 +02:00
Ivan Nardi
44b9a2da81
ndpiReader: add breed to flow information (#2924) 2025-07-30 18:46:28 +02:00
Luca Deri
8f661f9aa3 Cosmetic changes 2025-07-18 21:46:43 +02:00
Fábio Depin
4eff2cdb99
Refactor: make src_name/dst_name dynamically allocated to reduce RAM usage (#2908)
- Changed ndpi_flow_info: replaced fixed-size char arrays (always INET6_ADDRSTRLEN) for src_name and dst_name with char* pointers.
- Now IPv4 flows use only INET_ADDRSTRLEN when needed, instead of always reserving IPv6 size.
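
The allocation strategy described above could look roughly like the following. This is a sketch, not the code from the commit: `demo_addr_to_name()` is a hypothetical helper, and it relies on the POSIX `inet_ntop()` call, so it would need adapting on Windows.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <sys/socket.h>

/* Return a heap-allocated textual form of an IP address, sized by
   address family: INET_ADDRSTRLEN bytes for IPv4, INET6_ADDRSTRLEN
   for IPv6. The caller must free() the result. Returns NULL on error. */
static char *demo_addr_to_name(int family, const void *addr) {
  size_t len = (family == AF_INET6) ? INET6_ADDRSTRLEN : INET_ADDRSTRLEN;
  char *name = malloc(len);

  if(name == NULL)
    return NULL;
  if(inet_ntop(family, addr, name, (socklen_t)len) == NULL) {
    free(name);
    return NULL;
  }
  return name;
}
```

The saving comes from the size difference between the two constants: an IPv4 flow no longer carries two IPv6-sized buffers.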
2025-07-02 07:41:55 +02:00
Fábio Depin
8987a2c184
Fix logic: reset stats once per thread after clearing all flow roots (#2905)
Call ndpi_stats_reset() once per thread instead of once per flow root

Moved ndpi_stats_reset() outside the loop that destroys ndpi_flows_root[]
to avoid redundant resets. The stats structure is shared per thread and
should only be reset once after all roots are cleared.
2025-06-24 15:07:20 +02:00
Fábio Depin
c2526cffc1
Fix stats memory reuse and cleanup across duration loops in ndpiReader (#2903) (#2904)
Refactored stats allocation and reset logic to avoid segmentation faults
when running ndpiReader in live_capture mode with the -m (duration) option.

- Introduced ndpi_stats_init(), ndpi_stats_reset(), and ndpi_stats_free()
  to encapsulate lifecycle management of stats.
- Applied these functions in ndpiReader.c and reader_util.{c,h}.
- Prevented multiple allocations and ensured safe reuse of cumulative_stats
  and per-thread stats structures between capture iterations.

Fixes: https://github.com/ntop/nDPI/issues/2903
2025-06-24 09:48:34 +02:00
Ivan Nardi
06a49b4086 ndpiReader: fix check on max number of packets per flow 2025-06-23 17:27:39 +02:00
Ivan Nardi
978ca1ba1a
New API to enable/disable protocols. Removed NDPI_LAST_IMPLEMENTED_PROTOCOL (#2894)
Change the API to enable/disable protocols: you can set that via the
standard `ndpi_set_config()` function, like every other configuration
parameter. By default, all protocols are enabled.

Split the (local) context initialization into two phases:
* `ndpi_init_detection_module()`: generic part. It does not depend on the
configuration and on the protocols being enabled or not. It also
calculates the real number of internal protocols
* `ndpi_finalize_initialization()`: apply the configuration. All the
initialization stuff that depends on protocols being enabled or not
must be put here

This is the last step to have the protocols number fully calculated at
runtime

Remove a (now) useless fuzzer.

Important API changes:
* remove `NDPI_LAST_IMPLEMENTED_PROTOCOL` define
* remove `ndpi_get_num_internal_protocols()`. To get the number of
configured protocols (internal and custom) you must use
`ndpi_get_num_protocols()` after having called `ndpi_finalize_initialization()`
2025-06-23 11:24:18 +02:00