* added feature to extract filename from http attachment
* fixed some issues
* added check for filename format
* added check for filename format
* remove an unnecessary print
* changed the size from 952 to 960
* modified some test result files
* small changes string size
* comment removed and mallocs checked
The two fields `flow->flow_type` and `flow->protos.rtp.stream_type` are
pretty much identical: rename the former in `flow->flow_multimedia_type`
and remove the latter.
Remove `FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` define from
`fuzz/Makefile.am`; it is already included by the main configure script
(when fuzzing).
Add a knob to force disabling of AESNI optimizations: this way we can
fuzz also no-aesni crypto code.
Move CRC32 algorithm into the library.
Add some fake traces to extend fuzzing coverage. Note that these traces
are hand-made (via scapy/curl) and must not be used as "proof" that the
dissectors are really able to identify this kind of traffic.
Some small updates to some dissectors:
CSGO: remove a wrong rule (never triggered, BTW). Any UDP packet starting
with "VS01" will be classified as STEAM (see steam.c around line 111).
Googling it, it seems right so.
XBOX: XBOX only analyses UDP flows while HTTP only TCP ones; therefore
that condition is false.
RTP, STUN: removed useless "break"s
Zattoo: `flow->zattoo_stage` is never set to any values greater or equal
to 5, so these checks are never true.
PPStream: `flow->l4.udp.ppstream_stage` is never read. Delete it.
TeamSpeak: we check for `flow->packet_counter == 3` just above, so the
following check `flow->packet_counter >= 3` is always false.
All dissector callbacks should not be exported by the library; make static
some other local functions.
The callback logic in `ndpiReader` has never been used.
With internal libgcrypt, `gcry_control()` should always return no
errors.
We can check `categories` length at compilation time.
Tell "Advertised" ALPN list from "Negotiated" ALPN; the former is
extracted from the CH, the latter from the SH.
Add some entries to the known ALPN list.
Fix printing of "TLS Supported Versions" field.
* fixed autoconf CFLAGS/LDFLAGS MSAN issue which could lead to build errors
* introduced portable version of gmtime_r aka ndpi_gmtime_r
* do as most as possible of the serialization work in ndpi_utils.c
* use flow2json in ndpiReader
Signed-off-by: lns <matzeton@googlemail.com>
Add (basic) internal stats to the main data structures used by the
library; they might be usefull to check how effective these structures
are.
Add an option to `ndpiReader` to dump them; enabled by default in the
unit tests.
This new option enables/disables dumping of "num dissectors calls"
values, too (see b4cb14ec).
Add (basic) internal stats to the main data structures used by the
library; they might be usefull to check how effective these structures
are.
Add an option to `ndpiReader` to dump them; disabled by default to avoid
too much fuss with the unit tests.
* Fixes#1528
* Serialization Interface should also fuzzed
* libjson-c may only be used in the unit test to verify the internal serialization interface
* Serialization Interface supports tlv(broken), csv and json
* Unit test does work again and requires libjson-c
Signed-off-by: lns <matzeton@googlemail.com>
At every fuzz iteration (i.e for every trace file):
* keep the same ndpi context (`ndpi_init_detection_module` is very
slow);
* reset the flow table, otherwise it grows indefinitely.
This change should fix the "out-of-memory" errors reported by oss-fuzz.
As a general rule, the higher the confidence value, the higher the
"reliability/precision" of the classification.
In other words, this new field provides an hint about "how" the flow
classification has been obtained.
For example, the application may want to ignore classification "by-port"
(they are not real DPI classifications, after all) or give a second
glance at flows classified via LRU caches (because of false positives).
Setting only one value for the confidence field is a bit tricky: more
work is probably needed in the next future to tweak/fix/improve the logic.
Looking at `struct ndpi_flow_struct` the two bigger fields are
`host_server_name[240]` (mainly for HTTP hostnames and DNS domains) and
`protos.tls_quic.client_requested_server_name[256]`
(for TLS/QUIC SNIs).
This commit aims to reduce `struct ndpi_flow_struct` size, according to
two simple observations:
1) maximum one of these two fields is used for each flow. So it seems safe
to merge them;
2) even if hostnames/SNIs might be very long, in practice they are rarely
longer than a fews tens of bytes. So, using a (single) large buffer is a
waste of memory for all kinds of flows. If we need to truncate the name,
we keep the *last* characters, easing domain matching.
Analyzing some real traffic, it seems safe to assume that the vast
majority of hostnames/SNIs is shorter than 80 bytes.
Hostnames/SNIs are always converted to lowercase.
Attention was given so as to be sure that unit-tests outputs are not
affected by this change.
Because of a bug, TLS/QUIC SNI were always truncated to 64 bytes (the
*first* 64 ones): as a consequence, there were some "Suspicious DGA
domain name" and "TLS Certificate Mismatch" false positives.
We can write to `flow->protos` only after a proper classification.
This issue has been found in Kerberos, DHCP, HTTP, STUN, IMO, FTP,
SMTP, IMAP and POP code.
There are two kinds of fixes:
* write to `flow->protos` only if a final protocol has been detected
* move protocol state out of `flow->protos`
The hard part is to find, for each protocol, the right tradeoff between
memory usage and code complexity.
Handle Kerberos like DNS: if we find a request, we set the protocol
and an extra callback to further parsing the reply.
For all the other protocols, move the state out of `flow->protos`. This
is an issue only for the FTP/MAIL stuff.
Add DHCP Class Identification value to the output of ndpiReader and to
the Jason serialization.
Extend code coverage of fuzz tests.
Close#1343Close#1342
`ndpiReader` is only an example, aiming to show nDPI capabilities
and integration, without any claim about performances.
Nonetheless its memory usage per flow is *huge*, limiting the kinds
of traces that we can test on a "normal" hardware (example: scan
attacks).
The key reason of that behaviour is that we preallocate all the memory
needed for *all* the available features.
Try to reduce memory usage simply allocating some structures only
when they are really needed. Most significant example: JOY algorithms.
This way we should use a lot less memory in the two most common
user-cases:
* `ndpiReader` invoked without any particular flag (i.e `ndpiReader -i
$FILENAME_OR_IFACE`)
* internal unit tests
Before (on x86_64):
```
struct ndpi_flow_info {
[...]
/* size: 7320, cachelines: 115, members: 72 */
```
After:
```
struct ndpi_flow_info {
[...]
/* size: 2128, cachelines: 34, members: 75 */
```
The goal is to have a (roughly) idea about how many packets nDPI needs
to properly classify a flow.
Log this information (and guessed flows number too) during unit tests,
to keep track of improvements/regressions across commits.
Introduced in 5f7b9d802
reader_util.c: In function ‘process_ndpi_collected_info’:
reader_util.c:1148:60: warning: ‘%s’ directive output may be truncated writing up to 255 bytes into a region of size 64 [-Wformat-truncation=]
1148 | sizeof(flow->ssh_tls.client_requested_server_name), "%s",
| ^~
reader_util.c:1147:5: note: ‘snprintf’ output between 1 and 256 bytes into a destination of size 64
1147 | snprintf(flow->ssh_tls.client_requested_server_name,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1148 | sizeof(flow->ssh_tls.client_requested_server_name), "%s",
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1149 | flow->ndpi_flow->protos.tls_quic_stun.tls_quic.client_requested_server_name);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Some BSD APIs called in example/ return `struct bpf_timeval`, where nDPI
APIs expect `struct timeval`. These two structs, besides having
a different name, share the exact same set of fields.