From a133271a6c9b31f89af3c7563ae42f32df5f0e1b Mon Sep 17 00:00:00 2001 From: Flo Date: Mon, 30 Mar 2026 10:31:44 +0200 Subject: [PATCH] Address Daria's review: tree-sitter, path matching, MCP injection Changes based on reviewer feedback: - Promote tree-sitter-bash to primary parser recommendation (over bash Hex package). Battle-tested grammar, robust against adversarial input. - Fix path matching: directory-boundary matching instead of string prefix. reads_file("~/.ssh") no longer matches ~/.ssh_backup/key. - Tilde expansion uses HOME from hook payload, not daemon environment. Correct for containers, remote SSH. - MCP parameter injection: replaced regex with McpParameterInjection validator that runs string values through BashAnalyzer for consistency. - Regex safety: add 1ms evaluation timeout to prevent catastrophic backtracking. Timed-out regex falls through to AST pass. - Document that regex rules can only deny/suspect, never allow. Co-Authored-By: Claude Opus 4.6 (1M context) --- .../specs/2026-03-26-security-hooks-design.md | 37 +++++++++++-------- 1 file changed, 22 insertions(+), 15 deletions(-) diff --git a/docs/superpowers/specs/2026-03-26-security-hooks-design.md b/docs/superpowers/specs/2026-03-26-security-hooks-design.md index 246e4d3..1757f34 100644 --- a/docs/superpowers/specs/2026-03-26-security-hooks-design.md +++ b/docs/superpowers/specs/2026-03-26-security-hooks-design.md @@ -178,7 +178,8 @@ security-hooks/ │ │ │ └── validators/ │ │ │ ├── unknown_executable.ex │ │ │ ├── dependency_mutation.ex -│ │ │ └── secret_access.ex +│ │ │ ├── secret_access.ex +│ │ │ └── mcp_parameter_injection.ex │ │ └── security_hooks.ex │ ├── mix.exs │ └── test/ @@ -374,7 +375,9 @@ Matches when a path appears in any of these AST positions: - Source/dot commands: `source ~/.bashrc`, `. ~/.profile` - Here-string file references: `command <<< $(cat ~/.ssh/id_rsa)` -Paths are prefix-matched: `reads_file("~/.ssh")` matches `~/.ssh/id_rsa`, `~/.ssh/config`, etc. Tilde expansion is applied before matching. +Paths are matched by **directory boundary**, not string prefix: `reads_file("~/.ssh")` matches `~/.ssh/id_rsa` and `~/.ssh/config`, but does not match `~/.ssh_backup/key` or `~/.sshrc`. The rule path is treated as a directory — the candidate path must either equal it exactly or have it as an ancestor with a `/` separator. + +Tilde expansion uses the `HOME` value from the hook payload (the user's environment), not the daemon's process environment. This ensures correct resolution in containerized or remote SSH scenarios. #### `writes_file` semantics @@ -383,7 +386,7 @@ Matches when a path appears in any of these AST positions: - Command arguments to known file-writing commands: `tee /etc/hosts`, `cp src /etc/` - The `dd` `of=` argument: `dd of=/dev/sda` -Same prefix-matching and tilde expansion as `reads_file`. +Same directory-boundary matching and tilde expansion as `reads_file`. #### `sets_env` semantics @@ -471,7 +474,9 @@ For bash rules, `{base_command}` is extracted as the first whitespace-delimited Rules are evaluated in **two passes**, grouped by matching strategy. First match within either pass wins. -**Pass 1: Regex rules** — all regex-based rules are checked in file order (fast, microsecond-level). If any matches, the verdict is returned immediately and the AST parser is never invoked. +**Pass 1: Regex rules** — all regex-based rules are checked in file order (fast, microsecond-level). If any matches, the verdict is returned immediately and the AST parser is never invoked. Note: regex rules can only deny or flag as suspicious — they never produce an `allow` verdict. A regex false positive blocks a safe command (annoying but not a security hole); it can never greenlight a dangerous one. + +All regex patterns are compiled with a 1ms evaluation timeout to prevent catastrophic backtracking from becoming a denial-of-service vector. If a regex times out, the rule is skipped (not matched), and the command falls through to the AST pass. **Pass 2: AST rules** — if no regex rule matched, the command is parsed into an AST (once, cached for the request). All AST-based rules are then checked in file order against the parsed tree. @@ -600,11 +605,7 @@ MCP rules match against `tool_name` (format: `mcp__servername__toolname`). The s ``` # Tier: block — injection patterns in MCP tool parameters block "mcp-parameter-injection" - match_any - \$\(.*\) - `.*` - ;\s*\w+ - \|\s*\w+ + validator SecurityHooks.Validators.McpParameterInjection nudge "MCP tool parameters contain shell injection patterns" # Tier: suspicious — external resource access @@ -925,6 +926,10 @@ defmodule SecurityHooks.Validators.DependencyMutation do end ``` +### `McpParameterInjection` validator + +Scans all string values in MCP `tool_input` for shell injection patterns. Rather than using regex on serialized JSON (which is fragile), it iterates over parameter values and runs any string that looks like it could be shell-executed through the BashAnalyzer. If the AST contains command substitutions, pipes, or semicolons, it denies. This gives MCP parameter checking the same evasion resistance as bash command checking. + ## Bash Parser Strategy The AST matching layer requires a bash parser that produces a traversable tree of command nodes. This is the highest-risk technical dependency in the system. @@ -942,15 +947,15 @@ The parser must handle: - Variable assignments: `FOO=bar cmd`, `export FOO=bar` - Quoting: single quotes, double quotes, `$"..."`, `$'...'` -### Parser evaluation (to be validated in implementation spike) +### Parser options -**Option A: `bash` Hex package** — if it exposes a stable parse API with the node types above, this is the simplest integration. Risk: the package is primarily an interpreter and its internal AST may not be a stable public API. +**Option A: tree-sitter-bash via Rust NIF (recommended)** — the tree-sitter bash grammar is battle-tested and widely used (ShellCheck, GitHub syntax highlighting, every major editor). A Rust NIF wrapping `tree-sitter` + `tree-sitter-bash` provides a robust, well-documented AST with concrete node types (command, pipeline, subshell, redirected_statement, variable_assignment, etc.). The grammar is maintained by the tree-sitter community and handles adversarial inputs well. The Rust NIF compiles cleanly into the Burrito binary. -**Option B: tree-sitter-bash via NIF** — the tree-sitter bash grammar is battle-tested and widely used (ShellCheck, GitHub syntax highlighting). A Rust NIF wrapping `tree-sitter` + `tree-sitter-bash` would provide a robust, well-documented AST. Higher implementation effort but lower risk for adversarial inputs. +**Option B: `bash` Hex package** — simpler integration if it exposes a stable parse API. Risk: the package is primarily a bash interpreter and its internal AST may not be a stable public API. Suitable as a fallback if the tree-sitter NIF proves too complex to build. -**Option C: `shlex` for tokenization + custom parser** — use `shlex` (Rust NIF) for POSIX-compliant tokenization, then build a lightweight parser on top for the structural features we need (pipelines, redirections, subshells). Middle ground: less effort than tree-sitter, more robust than depending on `bash` internals. +**Option C: `shlex` for tokenization + custom parser** — use `shlex` (Rust NIF) for POSIX-compliant tokenization, then build a lightweight parser for structural features. Less robust than tree-sitter for deeply nested or adversarial inputs. -**Decision:** Validate Option A first during the implementation spike. If the `bash` package's parse API is insufficient or unstable, fall back to Option B (tree-sitter-bash). The rule engine's interface to the parser is abstracted behind `BashAnalyzer`, so the parser can be swapped without affecting rules or the rest of the system. +**Decision:** Use tree-sitter-bash via Rust NIF as the primary parser. The tree-sitter grammar is the most battle-tested option and is the standard choice for security-sensitive shell analysis. The rule engine's interface to the parser is abstracted behind `BashAnalyzer`, so the parser can be swapped if needed. ### Security considerations for the parser @@ -975,4 +980,6 @@ Elixir/Hex packages required by the daemon: - `toml` — TOML config parsing - `file_system` — cross-platform file watcher for hot-reload - `burrito` — compile to single-binary for distribution -- Bash parser — one of: `bash` (Hex), tree-sitter-bash (via Rust NIF), or `shlex` + custom parser (see Bash Parser Strategy) + +Rust NIF (compiled into the Burrito binary): +- `tree-sitter` + `tree-sitter-bash` — bash AST parser for structural command analysis (primary parser, see Bash Parser Strategy)