ktlint requires a blank line between a top-level declaration and the
next declaration that's preceded by a comment block (rule
standard:spacing-between-declarations-with-comments). Adds the two
missing blank lines around the new ZYGISK_DEBUG_LOGGING_PERSIST and
ZYGISK_DEBUG_LOGGING_FILE constants from the previous commit.
The Zygisk debug_logging flag lived only in the module dir
(/data/adb/modules/vpnhide_zygisk/), which KSU/Magisk wipe on every
module reinstall. Until the user next opened the app, all newly
forked target apps under that module would fall back to the OFF
default — even when the user had toggled debug logging ON. Same
problem the targets.txt file solved long ago by living in the
persistent dir; apply the same fix here.
- Canonical location: /data/adb/vpnhide_zygisk/debug_logging
(alongside the existing targets.txt). Survives module reinstall.
- writeDebugFlagFiles writes BOTH the persistent file and the
module-dir mirror on every toggle, so the .so still reads its
current value via get_module_dir() fd at next fork.
- zygisk/module/service.sh copies persistent → module-dir at every
boot, mirroring the existing targets.txt copy. Restores the
module-dir mirror after a reinstall without needing the app
to open.
- MainActivity.onCreate re-propagation comment updated: this call
is now a safety-net for "reinstall mid-session, no reboot yet"
rather than the only defence against module-dir wipe.
- docs/state.md sections 1, 2, 3 updated.
No changelog fragment — internal refactor; the user-visible effect
(less likely to need to re-toggle after zygisk reinstall) is too
subtle for a CHANGELOG line.
The "Debug logging" toggle in Diagnostics drove three sinks: app
process (VpnHideLog.enabled), system_server hooks (/data/system/
vpnhide_debug_logging via inotify FileObserver), and the Zygisk
module (/data/adb/modules/vpnhide_zygisk/debug_logging at fork).
The kernel module's /proc/vpnhide_debug was its own little channel,
flipped only inside exportDebugZip's capture window — so a user
who enabled "Debug logging" persistently still got silent kmod,
and the kmod-flip in exportDebugZip would even reset it back to
0 at the end regardless of the user's preference.
Make kmod the fourth sink:
- writeDebugFlagFiles now drives /proc/vpnhide_debug alongside the
other two file flags, batched into a single su invocation so the
toggle UI doesn't pay three round-trips.
- /proc/vpnhide_debug is per-boot in-kernel state, so kmod's
service.sh re-seeds it from the canonical /data/system/
vpnhide_debug_logging at every boot — same model as targets.txt
→ /proc/vpnhide_targets.
- Manual echo 1/echo 0 > /proc/vpnhide_debug calls in exportDebugZip
go away — applyDebugLoggingRuntime + the existing loggingWasForced
save/restore block now handle kmod uniformly with the other sinks,
fixing the "kmod stays OFF after capture even if user toggle was
ON" bug as a side effect.
- Updated en + ru toggle description to mention dmesg, since the
toggle is no longer logcat-only.
- docs/state.md §3 and §4 updated; new "Debug-logging fan-out"
subsection in §3 with a diagram.
Behaviour change worth a changelog entry: a user with toggle ON
will now get verbose kmod entries in dmesg too.
Adds docs/state.md — a reference doc grouped by location prefix
(/data/adb/modules/, /data/adb/vpnhide_*/, /data/system/vpnhide_*,
/proc/vpnhide_*, app SharedPrefs + filesDir, iptables chains,
external paths we read). Each entry records format, writers,
readers (with file:line cites), lifetime, and mode/owner/SELinux
label where it matters.
Includes a heads-up that Vector LSPosed redirects the app's
SharedPreferences to /data/misc/<vector-uuid>/prefs/<pkg>/, so
inspecting /data/data/<pkg>/shared_prefs/ from a root shell
shows nothing even after the user toggles a setting.
Linked from CLAUDE.md's "Read before touching code" list.
The three writeToParcel hooks reach into private NetworkCapabilities /
NetworkInfo / LinkProperties fields by reflection (mIfaceName, mRoutes,
mStackedLinks, mTransportTypes, mNetworkCapabilities, mTransportInfo,
mNetworkType, mState, mDetailedState, mIsAvailable) plus two private
constructors. If a future AOSP rev renames or retypes any of those, the
hook silently fails open per call: getObjectField throws NoSuchFieldError,
the inner catch logs it but doesn't replace param.result, the original
writeToParcel runs and the VPN flag leaks to the target app. Worse, the
NC hook reads its critical field BEFORE the isTarget check, so a rename
spams an exception on every system_server NetworkCapabilities IPC, not
just target ones.
HookEntry.installSystemServerHooks now runs runReflectionSmokeCheck()
once at install (one boot, ~12 reflection lookups, not hot path) over a
declarative probe table for the 10 fields + 2 ctors. Each FieldProbe
carries a typeCheck predicate (e.g. mTransportTypes must be `long`,
mRoutes must be MutableList-assignable) and an optional minSdk gate so
mTransportInfo (API 29) is skipped on Android 9 / API 28 rather than
being reported as broken there. Per-hook gates split the keys into
critical vs non-critical: LP installs as long as mIfaceName + copy ctor
are intact (mRoutes / mStackedLinks already have inner try/catch and
degrade gracefully); NC installs as long as the two long bitmasks are
intact (mTransportInfo's existing inner try/catch covers it); NI is
all-or-nothing because its body has no per-field defensive catches.
A broken critical key skips the hook entirely instead of installing
one that would throw NoSuchFieldError on every call.
writeHookStatusFile now always writes `aosp_sdk=<int>` and, when the
smoke-check rejected anything, `broken_fields=<comma list>` (the keys
include `:type=<actual>` for type mismatches, so a bug report carries
exact AOSP-drift information). DashboardData reads both and emits an
ERROR issue with both the SDK and the broken list so users can see and
report the drift even when the lsposed module otherwise heartbeats as
Active.
Verified empirically on Pixel 8 Pro / Android 16 (SDK 36): post-reboot
status file shows aosp_sdk=36 and no broken_fields line; logcat shows
all three hooks firing per call without exceptions.
Lives at CLAUDE.md (Claude Code's filename), exposed as AGENTS.md via
symlink so vendor-neutral tooling that follows the AGENTS-convention
(Codex, Cursor, etc.) finds the same file. Both names point at one
canonical source.
Content:
- layout overview (kmod / zygisk / lsposed / portshide)
- links to CONTRIBUTING / docs / kmod/BUILDING (the rest of the repo
rules already live there; this file just makes them prominent)
- workflow rules: changelog fragment per user-visible change,
don't bump VERSION / don't run release.py without ask, don't put
`#NN` in commit messages (auto-links to wrong PRs)
- build entry points: kmod/build.py, zygisk/build.py, gradle commands
- design note about a possible KPatch-Next KPM port
The only side effect is that the auto-review workflow (claude-code-review)
now has CLAUDE.md to check compliance against — two of its four agents
were running idle without one. Scope is plain markdown, useful as a
short orientation file regardless of which (if any) AI tooling reads it.
The `code-review` plugin is dry-run by default — it formats the review
into the Job Summary but does NOT publish a PR comment unless `--comment`
is passed in the slash invocation. That's why every auto-review run so
far finished green with permission_denials_count > 0 and zero comments
on the PR: the action ran, Claude reviewed, but the plugin's last
sentence was "No --comment argument was provided, so no GitHub comment
will be posted."
Adding `--comment` to the prompt. Permissions were never the blocker —
the `claude[bot]` GitHub App already has write on issues + PRs at
install time, and the on-demand workflow (`@claude` mentions) has been
posting fine without any of our workflow-level changes.
service.sh waits for netd's bw_OUTPUT chain to appear, then applies our
rules once. On slow boots netd has been observed to flush/rebuild its
own chains AFTER bw_OUTPUT first appears — wiping the rules we just
installed. The rules then stay missing until the user opens the app
and Save runs apply again.
Add a backgrounded second pass 30 seconds after the first apply. The
apply script is already idempotent (chains created with
`-N ... 2>/dev/null || true`, atomically rebuilt via
`iptables-restore --noflush`), so the extra pass is harmless when
nothing was wiped and self-healing when it was. Cost is one shell
process and one iptables-restore per boot.
shellcheck clean (no new lint).
Earlier commit e977af0 raised pull-requests/issues to `write` in both
workflows on the assumption that workflow-level permissions gate the
Claude action's GitHub API. They don't — the action uses an OIDC token
exchanged for the `claude[bot]` GitHub App installation token, whose
permissions are configured at App install time and already include
read+write on issues + pull requests. Workflow `permissions:` only
control the default `secrets.GITHUB_TOKEN`, which the action doesn't
use unless `github_token: ${{ secrets.GITHUB_TOKEN }}` is passed in.
Revert the perms back to read-only (now matches reality) and add a
short comment explaining why — so the next reader doesn't try the
same dead end.
Also enable `display_report: true` on the auto-review job so Claude's
full review text shows up in the Actions log, even when it classifies
its inline findings as low-confidence and skips publishing them. Right
now we couldn't tell whether the action had nothing to flag or whether
the inline classifier filtered everything out — display_report makes
that visible without leaking secrets (it's the same review body that
would have been posted as a comment).
Both Claude workflows had only read-level scopes, so the auto-review
job ran for ~3 minutes per PR and silently dropped its findings — the
post-buffered-inline-comments step reported "No buffered inline
comments" because GitHub denied the API write. Same with the on-demand
@claude responder.
claude-code-review.yml: pull-requests read -> write
claude.yml: pull-requests read -> write
issues read -> write
Deliberately keeping `contents: read` everywhere — Claude advises,
doesn't commit. If we ever want @claude-driven code edits, we'd add
that scope explicitly to the on-demand workflow only.
Two small Kotlin fixes — both prevent silent UX failures.
ShellUtils.suExec — bound the runtime of a single su invocation.
Previously `proc.waitFor()` blocked indefinitely if the su binary hung
(e.g. waiting on a GUI prompt the user dismissed). UI was already on
Dispatchers.IO so the main thread didn't block, but the spinner just
never went away.
* Drain stdout on a dedicated thread (an AtomicReference holds the
result), same as the existing stderr drain. `readText()` directly
on the process input stream blocks until EOF — without a separate
drain, a hung child means waitFor(timeout) never even gets reached.
* `proc.waitFor(timeoutSec, TimeUnit.SECONDS)` with a 10-second cap.
On timeout, destroyForcibly() the child and return exit=-1.
* 1-second join on each drain to bound worst-case cleanup.
* `SU_DEFAULT_TIMEOUT_SEC` constant + `timeoutSec` parameter on both
`suExec` and `suExecAsync` so callers with a longer-running command
(none today, but easy to add) can override.
AppHidingScreen / AppPickerScreen / PortsHidingScreen — don't drop
unsaved checkbox edits when a cache refresh fires under us.
* `LaunchedEffect(cachedApps, targets)` rebuilds `allApps` from disk
snapshot whenever either flow emits — ON_RESUME via
`TargetsCache.ensureLoaded`, or another screen calling
`TargetsCache.refresh()`. With dirty=true (user has unsaved
checkbox toggles), the rebuild silently overwrote local state and
flipped dirty back to false. Edits gone.
* Add an `if (dirty) return@LaunchedEffect` guard at the top of each
of the three rebuilders. Saves restore the snapshot-as-truth path
naturally (Save flips dirty=false, then the next emission rebuilds).
Local: ktlint clean, `:app:lintDebug + :app:testDebugUnitTest` BUILD SUCCESSFUL.
Verified on Pixel 8 Pro: dashboard 26/26 PASS in Enforcing — no
regression in any probe.
Four small touches across zygisk and lsposed/native, all on adversarial-
or kernel-edge-case paths. Happy paths and the per-app filter behaviour
are unchanged.
zygisk hooks: recvmsg multi-iov no longer bypasses the filter
Old code bailed out as soon as `msg_iovlen != 1`. A caller could
hand recvmsg an iov array with one extra zero-length iov to skip
filtering entirely. Now we filter the portion of `ret` that landed
in iov[0] (recvmsg fills iovs in order) and propagate any shrink
back into the returned byte count. Bytes that fell into iov[1..]
pass through — bionic only ever uses iov[0], so legitimate dumps
are unaffected; an attacker who deliberately splits a netlink
message across iovs at least loses everything that landed in
iov[0], which is strictly better than the previous bypass.
zygisk install_hooks: roll back partial installs on error
install_hooks() walks five libc symbols and `?`-propagates the
first failure. Previously, if hook 1-4 succeeded and hook 5 failed,
the process kept hooks 1-4 in place — the app would see filtered
ioctl/getifaddrs/openat/recvmsg but unfiltered recv, a torn plan
that's worse than no install at all. Now we collect each
shadowhook stub as it installs and, on the first failure, walk
back through them in LIFO order calling shadowhook_unhook before
surfacing the original error. Added a thin `shadowhook::unhook`
wrapper so the FFI call site stays out of lib.rs.
zygisk hooks: document the TOCTOU window in is_dirfd_proc_net
match_rel_proc_net resolves dirfd via readlink before the open,
so a caller who races dup2/fchdir between the two can pass the
classifier and have the open land elsewhere. This is accepted
exposure (caller-controlled self-DoS, real detectors don't go
through dirfd anyway) — added a comment so the next reader
doesn't think it's an oversight.
lsposed/native for_each_rtattr: pass bounds-checked payload slice
Old callbacks computed `b.as_ptr().add(rta_off + 4)` and read
payload from there. If a kernel response set rta_len == 4
(header only, zero payload), the read would pass the rtattr
loop check (`if rta.rta_len < 4 { break }`) and then access
bytes belonging to the next attr or past the message end. Now
for_each_rtattr verifies `off + rta_len <= end` and yields a
`&[u8]` payload slice; callbacks check its length before
reading (e.g. `payload.len() >= 4` for an i32 ifindex).
Replaces the `*const i32` read with a safe `i32::from_ne_bytes`.
Verified on Pixel 8 Pro (husky, android14-6.1, Android 16):
Enforcing : 26/26 PASS, COLD start ~1.9 s.
Permissive : 22/26 PASS — same four by-design FAILs as before
(proc_dev / sys_class_net / proc_ipv6_route /
proc_if_inet6). No regression in netlink_getlink,
netlink_getroute, getifaddrs, ioctl_*, proc_route,
proc_fib_trie. zygisk and APK-native md5 match local
build artifacts.
Eight small touches in vpnhide_kmod.c plus a README sync. No behaviour
change for normal probe paths — the per-UID filter, the matcher, and
all six hooks behave identically. Edits target failure modes and
maintainability around the existing logic.
Code:
arm64 ABI guard
Handlers read syscall arguments via regs->regs[N] (AAPCS64). Add
`#ifndef CONFIG_ARM64 #error … #endif` so a non-arm64 build fails
loudly instead of silently producing a module that reads garbage.
maxactive 20 -> VPNHIDE_KRETPROBE_MAXACTIVE = 64
All six probes used .maxactive = 20, only marginally above the
NR_CPUS*2 default (~18 on a 9-core Pixel 8 Pro). Hot ioctl/netlink
paths under multi-app concurrency can exhaust that and silently
bump nmissed (= leaked iface). 64 buys headroom for ~30 KB total.
dev_ioctl: replace `data->cmd = 0` magic flag with `bool active`
The old code set cmd to 0 in the entry handler when the caller
wasn't a target, then keyed the ret handler on `cmd == 0`. Magic
sentinel; if any future ioctl number ever hashed to 0 the flag
would silently misbehave.
filter_ifconf_buf returns enum, sock_ioctl_ret handles partial
writes
Old function silently bailed on copy_from_user / copy_to_user
failure and could leave userspace with a half-compacted buffer
plus the original (now-stale) ifc_len. Now the function returns
`FILTER_IFCONF_NO_CHANGE / CHANGED / COPY_FAULT` and the caller
skips the put_user(ifc_len) on COPY_FAULT — better to leak all
ifaces visibly than to expose a length-vs-content mismatch.
put_user(ifc_len) error checked
Previously dropped on the floor — if updating ifc_len failed,
userspace would see compacted buffer with old length. Now logs
via vpnhide_dbg and returns; userspace falls back to the
pre-compaction view.
READ_ONCE/WRITE_ONCE around debug_enabled
Single bool, written from /proc/vpnhide_debug, read from every
probe handler. Compiler can't tear or hoist now — kosher style
for unsynchronised flags.
Header comment: dev_ioctl/sock_ioctl
Corrected the file-top hook list — it still claimed `dev_ifconf`
for SIOCGIFCONF, but the actual probe is on `sock_ioctl` (LTO
inlines dev_ifconf on 5.10 + the symbol moves out of
sock_do_ioctl on 6.1+, both rationale already in the inline
comment block at hook 2).
Doc:
README.md `rtnl_fill_ifinfo` table row + the standalone
`-EMSGSIZE trick` and `why NOT -EMSGSIZE` sections were stale
after #103 (which made all three netlink fill probes use
`skb_trim` + return 0). Replaced with one short joint section
pointing at issue #38 for context.
Verified on Pixel 8 Pro (husky, android14-6.1, Android 16):
Enforcing : 26/26 PASS, COLD start ~1020 ms.
Permissive : 22/26 PASS, same 4 by-design FAILs as before, no
regression in netlink_getlink / netlink_getroute /
getifaddrs / ioctl_* / proc_route / proc_fib_trie.
Six small review-list items rolled together — all CI/dev-tooling, no
runtime behaviour change.
#12 Dockerfile: pin Rust 1.95.0 and cargo-ndk 4.1.2 (was floating
`stable` + latest cargo-ndk on monthly rebuild). Versions live
in ENV vars to make the next bump a one-line edit.
#13 Add shellcheck to lint job. SC2034/SC3043 excluded — Magisk
reads SKIPUNZIP externally; Android's /system/bin/sh (mksh on
Pixel) does support `local` despite POSIX. Verified locally
that the 11 .sh files (module-side + dev tooling) pass.
shellcheck baked into the CI image via apt; inline apt-get
fallback covers the window before image rebuild.
#24 ci.yml keystore.properties: replace heredoc with `printf '%s\n'`.
Heredoc without single-quoted EOF re-expands $, backticks and
backslashes in the password — printf takes the value verbatim.
#31 scripts/release.py::patch_file now hard-fails when a regex
pattern doesn't match (was silently leaving stale versions).
#32 Split rotate_fragments_into_history into rotate + delete steps
so release.py can save_json + write_md *before* unlinking the
fragment files. If anything in between fails, fragments are
still on disk and the run is retryable.
#37 codegen-interfaces.py: emit `assert!(matches_vpn(…), msg)` /
`assert!(!matches_vpn(…), msg)` instead of
`assert_eq!(matches_vpn(…), true/false, msg)` —
clippy::bool_assert_comparison was firing on every generated
row under `cargo clippy --tests`. Both generated test modules
regenerated. CI's clippy steps now also pass `--tests` so this
class of regression is caught.
#106 added \`cargo install uniffi-bindgen --version "^0.29" --locked\`,
which fails ci-image rebuilds:
error: could not find \`uniffi-bindgen\` in registry \`crates-io\`
with version \`^0.29\`
Two errors in the original change:
1. The crate Gobley installs is \`gobley-uniffi-bindgen\` (its own
fork on crates.io at 0.3.7), not upstream \`uniffi-bindgen\`.
2. Gobley installs the binary into \`app/build/gobley-tools-install/
uniffi-bindgen/\`, not \`~/.cargo/bin\`. A globally pre-installed
binary wouldn't satisfy the task's UP-TO-DATE check anyway.
\`org.gradle.caching=true\` from #106 already makes
\`installUniffiBindgen\` go UP-TO-DATE on warm runs (verified locally),
so the optimisation is in effect via the build cache instead.
Profiling the warm-cache run on PR #105 showed three remaining hot spots
in the Gradle phase:
installUniffiBindgen 52s ← cargo install on every CI build
cargoBuildAndroidArm64Debug 30s ← Rust crate compile
lintAnalyze* (3 variants) 43s ← AGP Lint × main + unit + androidTest
This PR cuts the first one entirely and trims the third.
- Dockerfile: pre-install uniffi-bindgen 0.29.x in the CI image so
Gobley's :app:installUniffiBindgen task finds it ready instead of
rebuilding it from sources on every run. Triggers a ci-image
rebuild on merge — wait for that workflow to finish before merging
consumers (or the first lint/lsposed run will still hit the old
image and behave as before).
- lsposed/gradle.properties: enable build cache + configuration
cache. Verified locally: `./gradlew :app:assembleDebug
--configuration-cache` reports "Configuration cache entry stored"
cleanly with Gobley 0.3.7 + AGP 8.9.3 + Kotlin 2.1.20.
- lsposed/app/build.gradle.kts: `lint { checkTestSources = false }`.
Skips lintAnalyzeDebugUnitTest / lintAnalyzeDebugAndroidTest. Test
sources here are pure JVM unit-test logic — functional bugs caught
by :app:testDebugUnitTest, no Android-lifecycle code to lint.
Deliberately leave `checkReleaseBuilds` at its default so ad-hoc
`./gradlew :app:lint` still catches R8/ProGuard issues.
- .github/workflows/ci.yml: `:app:lint` -> `:app:lintDebug`. Lints
the debug variant only on PRs; release-variant Lint stays
available locally / for future tag-time CI.
- docs/development.md: refresh local-lint snippet.
Expected effect on warm cache (cumulative on top of PR #105):
lint 286s -> ~190s (3m10s, -32%)
lsposed 227s -> ~130s (2m10s, -42%)
Repo had ~1800 lines of Python (kmod/build.py, scripts/*, zygisk/build.py,
portshide/build-zip.py) with no formatter or linter. Long-lived scripts
like scripts/release.py and scripts/codegen-interfaces.py benefit from
catching unused-import / undefined-name / outdated-syntax issues early.
pyproject.toml — ruff config, target-py312, line-length 100,
rules E F W I B UP SIM. Excludes zygisk/third_party,
target/, .claude/.
ci.yml — astral-sh/ruff-action@v4 for `format --check` and `check`,
ahead of the slow Rust/Gradle steps so it fails fast.
docs/development.md — add `uvx ruff …` to the local-lint snippet.
Cleanup applied (`ruff format` + `ruff check --fix`):
- reformat: kmod/build.py, scripts/{changelog_lib,codegen-interfaces,
release,stats}.py, zygisk/build.py
- I001: split multi-name imports onto separate lines after the
sys.path.insert prelude (kmod/build.py, zygisk/build.py)
- E501 manual: wrap one console.print line in scripts/release.py
Stdlib-only invariant from scripts/build_lib.py is preserved — ruff is
a dev/CI tool, not imported at runtime.
The two slowest CI jobs were both Gradle:
lint 6m41s (Android lint = 264s of it)
lsposed 6m44s (assembleRelease = 307s of it)
Cargo was already cached; Gradle was not.
Changes:
- gradle/actions/setup-gradle@v6 in lint and lsposed jobs. Caches
~/.gradle/caches, wrapper, configuration cache. cache-read-only on
PRs so only main pushes write it.
- lint job now has a cargo cache too (was missing). Combined key for
both Cargo.lock files.
- lint: Android lint and Kotlin unit tests run in one Gradle
invocation (./gradlew :app:lint :app:testDebugUnitTest). Saves a
second Gobley/AGP configuration phase + JVM startup.
- lsposed: assembleDebug for PRs and main pushes, assembleRelease only
for v* tags. R8/ProGuard runs only when the artifact actually goes
into a release.
- Drop --no-daemon: with one invocation per job (or one warm daemon
between two), keeping the daemon is cheaper than killing it.
- Drop the manual `export ANDROID_NDK_ROOT="$ANDROID_NDK_HOME"`. The
CI image's Dockerfile already sets ANDROID_NDK_ROOT (line 63), so
the workaround is redundant.
Two related changes that ship together because they touch the same
build-script + docs surface and were verified together on-device.
16 KiB alignment
- zygisk/build.rs: pass `-Wl,-z,max-page-size=16384` to lld so the
cdylib's LOAD segments line up on 16 KiB pages. NDK r28+ already
does this by default, but the flag keeps r27 builds compatible.
- lsposed/native/build.rs: new file, same flag, for libvpnhide_checks.so.
- docs/development.md: bumped the NDK requirement to r28+ and noted
the 16 KiB rationale.
Verified via `llvm-readelf -l`: both libvpnhide_zygisk.so and
libvpnhide_checks.so now show `Align 0x4000` on every LOAD segment.
Unified build entry points
- kmod/build.py replaces kmod/build-zip.py. Single script that
auto-detects whether to build natively (we're inside the DDK image
or `--kdir` was passed) or to spawn `ghcr.io/ylarod/ddk-min` via
podman/docker. CI uses the same script with `--inside-container`.
- zygisk/build-zip.py renamed to zygisk/build.py for symmetry; logic
unchanged.
- kmod/BUILDING.md rewritten — local build is now one command:
`./kmod/build.py --kmi android14-6.1` (or `--all`). The old
hand-rolled podman/docker recipes are gone.
- .github/workflows/ci.yml updated to call the new entry points.
The DDK image tag in CI now has a comment pointing at
`DDK_IMAGE_TAG` in kmod/build.py as the source of truth.
- README.{md,en.md}, kmod/README.md, zygisk/README.md, docs/releasing.md,
scripts/build_lib.py: reference updates.
- README.en.md: also fixes a "bacame" typo and tightens the Windows
zygisk-build note (the aux.rs / libgit2 issue is still real).
Verified end-to-end on Pixel 8 Pro (husky, android14-6.1, Android 16):
APK installs, kmod + zygisk modules load, all 26 self-checks PASS in
Enforcing, 22/26 PASS in Permissive (the same 4 by-design FAILs as
before — kmod doesn't cover those paths in Permissive).
Returning -EMSGSIZE from rtnl_fill_ifinfo for VPN devs hangs RTM_GETLINK
dumps on android14-6.1: the dump iterator retries the same entry forever
on an empty skb. inet6_fill_ifaddr in this same file already documented
and worked around the issue using skb_trim + return-0; mirror that for
rtnl_fill_ifinfo so RTM_GETLINK skips VPN ifaces cleanly.
Reproduced on Pixel 8 Pro (husky, android14-6.1) with SELinux Permissive:
the netlink_getlink check inside dev.okhsunrog.vpnhide hung on splash
screen. After the fix all three netlink-backed checks (getifaddrs,
netlink_getlink, netlink_getroute) PASS in Permissive in <1s.
`open_filtered_proc_net` used a fixed 65 536-byte stack array as the
read buffer. On devices with several hundred concurrent TCP sockets
(p2p clients, browsers with many tabs, dev devices with tethering),
`/proc/net/tcp6` exceeds 64 KiB — the previous code silently dropped
the tail and handed the truncated content back to the caller via
memfd. An app reading its own socket list could miss real entries,
which is a correctness bug, not a security one (the truncated tail
might contain non-VPN sockets the app needs).
Replace the stack array with a thread-local growable Vec<u8>:
* Initial capacity 64 KiB matches the previous fixed buffer, so the
first call (and most subsequent ones) does no reallocation.
* When the read fills capacity, `Vec::reserve(8 * 1024)` triggers
amortised doubling — large files cost O(log size) reallocations.
* `clear()` between calls keeps capacity intact: the second and later
calls in the same thread are zero-allocation.
* Bounded memory: per-thread overhead is `max-observed-size`. On
Android only one or two threads ever open /proc/net/* (network-
info worker, diagnostics probe), so the steady state is well under
a megabyte of RAM.
Trade-off: had to drop the `const { ... }` block on the thread_local
because `Vec::with_capacity` is not const fn. The per-`with()` lazy-
init flag check is negligible relative to the syscalls we're about
to issue.
Empirically: on the device used for testing /proc/net/tcp6 stays at
~15 KiB (no truncation triggered with the old buffer either). The
fix is preventive for power-user scenarios where the limit is
actually reached. SELinux on stock Android blocks /proc/net/* for
untrusted_app domain entirely, so the new path is exercised only on
permissive ROMs or by privileged callers — testing was therefore
limited to verifying no regression on the happy path.
The three files written under /data/system/ to coordinate state between
the LSPosed system_server hook and the app — vpnhide_uids.txt,
vpnhide_hidden_pkgs.txt, vpnhide_observer_uids.txt — were chmodded
0644 root:root. /data/system/ itself is mode 0775 system:system,
traversable by untrusted apps, so any "other"-readable file there is
both enumerable (`ls /data/system/`) and openable by name. Untrusted
apps could:
cat /data/system/vpnhide_uids.txt # all target UIDs
cat /data/system/vpnhide_hidden_pkgs.txt # the hide list
cat /data/system/vpnhide_observer_uids.txt # observer UIDs
If the reader's own UID is in vpnhide_uids.txt, that's a positive
"vpnhide is filtering me right now" detection — strictly stronger than
the presence-of-marker fingerprint we already closed for
vpnhide_hook_active in PR #100.
Switch every write site to mode 0640 + chown root:system. system_server
runs as UID 1000 with `system` (GID 1000) in its supplementary groups,
so it still gets read via the group bit. Untrusted apps fall to the
"other" octet (now ---) and get EACCES on open.
Empirically verified on Pixel:
before: 644 root:root → `cat` from untrusted shell succeeds
after: 640 root:system → untrusted shell EACCES;
`su system -c cat` (uid=1000) reads fine,
mirroring what system_server sees
Boot-time service.sh in both kmod and zygisk modules also include an
idempotent migration block that re-stamps any pre-PR files left at
0644 by an older version on the next boot. Closes#36 in REVIEW.
Workflow-level `contents: write` was granted to every job — lint,
zygisk build, lsposed build, portshide build, kmod matrix — even
though only the release job needs it (to create the draft GitHub
release via softprops/action-gh-release@v2). Tighten to the
least-privilege default of `contents: read` at the workflow level
and override with `permissions: contents: write` on the release job
alone. Reduces blast radius if any of the lint/build jobs ever runs
untrusted code from a PR.
The Diagnostics → Export debug bundle ran logcat with -s filtering
on VPNHideTest, VpnHide, and VpnHide-Dashboard tags only. zygisk
writes under `vpnhide-zygisk` (zygisk/src/lib.rs LOG_TAG), so the
zip the user ends up sharing was missing every native-side hook
trace — exactly the half that's hardest to reproduce locally.
`cp $ZYGISK_TARGETS $ZYGISK_MODULE_TARGETS 2>/dev/null; true` hid
both stderr and the exit code, so a read-only mount or SELinux
denial silently broke "edits in the app aren't picked up by zygisk
on next app launch" — the user had no signal.
Rewrite as `if [ -d ... ]; then cp ... 2>&1; fi`, capture exit and
output, and log a warn() only when both indicate a real failure
(non-zero exit + non-empty stderr). The dir-not-installed case
naturally produces exit=0 + empty output and stays quiet.
The else-branch in sanitizeLinkProperties' stacked-links loop set
`modified = true` whenever `stackedCopy !== value`, i.e. after any
successful clone. Cloning is the common case (the catch fallback is
just a defensive shim for a missing copy ctor), so the flag was
effectively always-true: every non-empty stacked map triggered the
trailing `stacked.clear() + putAll(filtered)`, even when no VPN data
was present and no modification happened.
Drop the `stackedCopy !== value` disjunct. `modified` now reflects
only real sanitization changes — the rewrite is skipped when there's
nothing to write back, and the clones in `filtered` are discarded
along with the never-mutated original entries staying in place. No
behavioural change for VPN-stripping callers; just less wasted work
on the non-VPN-bearing path.
inotify MODIFY fires on every write(2), not just on close. With the
current single-write `echo $b64 | base64 -d > $file` writers (see
ShellUtils.ensureSelfInTargets and the AppPicker save path) it
effectively always fires before CLOSE_WRITE, causing the hook to
invalidate the cache and re-read the file. If a writer ever switches
to a multi-write pattern (`echo line >> file` repeated, or anything
that flushes mid-buffer), the hook would see a partially-written
file. CLOSE_WRITE + MOVED_TO covers both single-write and atomic-
rename writers without the mid-write race.
`/data/system/vpnhide_hook_active` was being chmodded to 0644 via
`setReadable(true, false)` so the VPN Hide app could read it
directly. But the app already runs every other read of files under
/data/system/ via `suExec` (see DashboardData.kt's read of this
exact file), so the world-readable bit was redundant — and a
discoverable marker for anti-tamper SDKs that scan /data/system/
for known filenames.
Drop the line; the file stays at the default 0640 system:system
mode, readable by system_server (which writes it) and by anything
running as root (which is how the app reads it).
`c` iterates one character at a time over the test name string;
a single character is never the empty string, so the disjunct was
always False. The remaining `0x20 <= ord(c) < 0x7F` already
correctly rejects non-ASCII names, so behaviour is unchanged.