Address Daria's review: tree-sitter, path matching, MCP injection
Changes based on reviewer feedback:
- Promote tree-sitter-bash to primary parser recommendation (over bash
Hex package). Battle-tested grammar, robust against adversarial input.
- Fix path matching: directory-boundary matching instead of string prefix.
reads_file("~/.ssh") no longer matches ~/.ssh_backup/key.
- Tilde expansion uses HOME from hook payload, not daemon environment.
Correct for containers, remote SSH.
- MCP parameter injection: replaced regex with McpParameterInjection
validator that runs string values through BashAnalyzer for consistency.
- Regex safety: add 1ms evaluation timeout to prevent catastrophic
backtracking. Timed-out regex falls through to AST pass.
- Document that regex rules can only deny/suspect, never allow.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -178,7 +178,8 @@ security-hooks/
|
|||||||
│ │ │ └── validators/
|
│ │ │ └── validators/
|
||||||
│ │ │ ├── unknown_executable.ex
|
│ │ │ ├── unknown_executable.ex
|
||||||
│ │ │ ├── dependency_mutation.ex
|
│ │ │ ├── dependency_mutation.ex
|
||||||
│ │ │ └── secret_access.ex
|
│ │ │ ├── secret_access.ex
|
||||||
|
│ │ │ └── mcp_parameter_injection.ex
|
||||||
│ │ └── security_hooks.ex
|
│ │ └── security_hooks.ex
|
||||||
│ ├── mix.exs
|
│ ├── mix.exs
|
||||||
│ └── test/
|
│ └── test/
|
||||||
@@ -374,7 +375,9 @@ Matches when a path appears in any of these AST positions:
|
|||||||
- Source/dot commands: `source ~/.bashrc`, `. ~/.profile`
|
- Source/dot commands: `source ~/.bashrc`, `. ~/.profile`
|
||||||
- Here-string file references: `command <<< $(cat ~/.ssh/id_rsa)`
|
- Here-string file references: `command <<< $(cat ~/.ssh/id_rsa)`
|
||||||
|
|
||||||
Paths are prefix-matched: `reads_file("~/.ssh")` matches `~/.ssh/id_rsa`, `~/.ssh/config`, etc. Tilde expansion is applied before matching.
|
Paths are matched by **directory boundary**, not string prefix: `reads_file("~/.ssh")` matches `~/.ssh/id_rsa` and `~/.ssh/config`, but does not match `~/.ssh_backup/key` or `~/.sshrc`. The rule path is treated as a directory — the candidate path must either equal it exactly or have it as an ancestor with a `/` separator.
|
||||||
|
|
||||||
|
Tilde expansion uses the `HOME` value from the hook payload (the user's environment), not the daemon's process environment. This ensures correct resolution in containerized or remote SSH scenarios.
|
||||||
|
|
||||||
#### `writes_file` semantics
|
#### `writes_file` semantics
|
||||||
|
|
||||||
@@ -383,7 +386,7 @@ Matches when a path appears in any of these AST positions:
|
|||||||
- Command arguments to known file-writing commands: `tee /etc/hosts`, `cp src /etc/`
|
- Command arguments to known file-writing commands: `tee /etc/hosts`, `cp src /etc/`
|
||||||
- The `dd` `of=` argument: `dd of=/dev/sda`
|
- The `dd` `of=` argument: `dd of=/dev/sda`
|
||||||
|
|
||||||
Same prefix-matching and tilde expansion as `reads_file`.
|
Same directory-boundary matching and tilde expansion as `reads_file`.
|
||||||
|
|
||||||
#### `sets_env` semantics
|
#### `sets_env` semantics
|
||||||
|
|
||||||
@@ -471,7 +474,9 @@ For bash rules, `{base_command}` is extracted as the first whitespace-delimited
|
|||||||
|
|
||||||
Rules are evaluated in **two passes**, grouped by matching strategy. First match within either pass wins.
|
Rules are evaluated in **two passes**, grouped by matching strategy. First match within either pass wins.
|
||||||
|
|
||||||
**Pass 1: Regex rules** — all regex-based rules are checked in file order (fast, microsecond-level). If any matches, the verdict is returned immediately and the AST parser is never invoked.
|
**Pass 1: Regex rules** — all regex-based rules are checked in file order (fast, microsecond-level). If any matches, the verdict is returned immediately and the AST parser is never invoked. Note: regex rules can only deny or flag as suspicious — they never produce an `allow` verdict. A regex false positive blocks a safe command (annoying but not a security hole); it can never greenlight a dangerous one.
|
||||||
|
|
||||||
|
All regex patterns are compiled with a 1ms evaluation timeout to prevent catastrophic backtracking from becoming a denial-of-service vector. If a regex times out, the rule is skipped (not matched), and the command falls through to the AST pass.
|
||||||
|
|
||||||
**Pass 2: AST rules** — if no regex rule matched, the command is parsed into an AST (once, cached for the request). All AST-based rules are then checked in file order against the parsed tree.
|
**Pass 2: AST rules** — if no regex rule matched, the command is parsed into an AST (once, cached for the request). All AST-based rules are then checked in file order against the parsed tree.
|
||||||
|
|
||||||
@@ -600,11 +605,7 @@ MCP rules match against `tool_name` (format: `mcp__servername__toolname`). The s
|
|||||||
```
|
```
|
||||||
# Tier: block — injection patterns in MCP tool parameters
|
# Tier: block — injection patterns in MCP tool parameters
|
||||||
block "mcp-parameter-injection"
|
block "mcp-parameter-injection"
|
||||||
match_any
|
validator SecurityHooks.Validators.McpParameterInjection
|
||||||
\$\(.*\)
|
|
||||||
`.*`
|
|
||||||
;\s*\w+
|
|
||||||
\|\s*\w+
|
|
||||||
nudge "MCP tool parameters contain shell injection patterns"
|
nudge "MCP tool parameters contain shell injection patterns"
|
||||||
|
|
||||||
# Tier: suspicious — external resource access
|
# Tier: suspicious — external resource access
|
||||||
@@ -925,6 +926,10 @@ defmodule SecurityHooks.Validators.DependencyMutation do
|
|||||||
end
|
end
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### `McpParameterInjection` validator
|
||||||
|
|
||||||
|
Scans all string values in MCP `tool_input` for shell injection patterns. Rather than using regex on serialized JSON (which is fragile), it iterates over parameter values and runs any string that looks like it could be shell-executed through the BashAnalyzer. If the AST contains command substitutions, pipes, or semicolons, it denies. This gives MCP parameter checking the same evasion resistance as bash command checking.
|
||||||
|
|
||||||
## Bash Parser Strategy
|
## Bash Parser Strategy
|
||||||
|
|
||||||
The AST matching layer requires a bash parser that produces a traversable tree of command nodes. This is the highest-risk technical dependency in the system.
|
The AST matching layer requires a bash parser that produces a traversable tree of command nodes. This is the highest-risk technical dependency in the system.
|
||||||
@@ -942,15 +947,15 @@ The parser must handle:
|
|||||||
- Variable assignments: `FOO=bar cmd`, `export FOO=bar`
|
- Variable assignments: `FOO=bar cmd`, `export FOO=bar`
|
||||||
- Quoting: single quotes, double quotes, `$"..."`, `$'...'`
|
- Quoting: single quotes, double quotes, `$"..."`, `$'...'`
|
||||||
|
|
||||||
### Parser evaluation (to be validated in implementation spike)
|
### Parser options
|
||||||
|
|
||||||
**Option A: `bash` Hex package** — if it exposes a stable parse API with the node types above, this is the simplest integration. Risk: the package is primarily an interpreter and its internal AST may not be a stable public API.
|
**Option A: tree-sitter-bash via Rust NIF (recommended)** — the tree-sitter bash grammar is battle-tested and widely used (ShellCheck, GitHub syntax highlighting, every major editor). A Rust NIF wrapping `tree-sitter` + `tree-sitter-bash` provides a robust, well-documented AST with concrete node types (command, pipeline, subshell, redirected_statement, variable_assignment, etc.). The grammar is maintained by the tree-sitter community and handles adversarial inputs well. The Rust NIF compiles cleanly into the Burrito binary.
|
||||||
|
|
||||||
**Option B: tree-sitter-bash via NIF** — the tree-sitter bash grammar is battle-tested and widely used (ShellCheck, GitHub syntax highlighting). A Rust NIF wrapping `tree-sitter` + `tree-sitter-bash` would provide a robust, well-documented AST. Higher implementation effort but lower risk for adversarial inputs.
|
**Option B: `bash` Hex package** — simpler integration if it exposes a stable parse API. Risk: the package is primarily a bash interpreter and its internal AST may not be a stable public API. Suitable as a fallback if the tree-sitter NIF proves too complex to build.
|
||||||
|
|
||||||
**Option C: `shlex` for tokenization + custom parser** — use `shlex` (Rust NIF) for POSIX-compliant tokenization, then build a lightweight parser on top for the structural features we need (pipelines, redirections, subshells). Middle ground: less effort than tree-sitter, more robust than depending on `bash` internals.
|
**Option C: `shlex` for tokenization + custom parser** — use `shlex` (Rust NIF) for POSIX-compliant tokenization, then build a lightweight parser for structural features. Less robust than tree-sitter for deeply nested or adversarial inputs.
|
||||||
|
|
||||||
**Decision:** Validate Option A first during the implementation spike. If the `bash` package's parse API is insufficient or unstable, fall back to Option B (tree-sitter-bash). The rule engine's interface to the parser is abstracted behind `BashAnalyzer`, so the parser can be swapped without affecting rules or the rest of the system.
|
**Decision:** Use tree-sitter-bash via Rust NIF as the primary parser. The tree-sitter grammar is the most battle-tested option and is the standard choice for security-sensitive shell analysis. The rule engine's interface to the parser is abstracted behind `BashAnalyzer`, so the parser can be swapped if needed.
|
||||||
|
|
||||||
### Security considerations for the parser
|
### Security considerations for the parser
|
||||||
|
|
||||||
@@ -975,4 +980,6 @@ Elixir/Hex packages required by the daemon:
|
|||||||
- `toml` — TOML config parsing
|
- `toml` — TOML config parsing
|
||||||
- `file_system` — cross-platform file watcher for hot-reload
|
- `file_system` — cross-platform file watcher for hot-reload
|
||||||
- `burrito` — compile to single-binary for distribution
|
- `burrito` — compile to single-binary for distribution
|
||||||
- Bash parser — one of: `bash` (Hex), tree-sitter-bash (via Rust NIF), or `shlex` + custom parser (see Bash Parser Strategy)
|
|
||||||
|
Rust NIF (compiled into the Burrito binary):
|
||||||
|
- `tree-sitter` + `tree-sitter-bash` — bash AST parser for structural command analysis (primary parser, see Bash Parser Strategy)
|
||||||
|
|||||||
Reference in New Issue
Block a user