fix(security): harden destructive paths and add audit tiers (v0.6.0)

Address critical and high findings from an external security review. Critical/high fixes: - reset-signing no longer treats general-purpose keys (id_ed25519, etc.) as deletion candidates, defaults the delete prompt to No, and never deletes files in -y mode - FIDO2 retry now re-runs the same attempt (for-loop reassignment bug silently advanced to the next fallback key type) - core.hooksPath redirection installs dispatch stubs for all client-side hook types so repo-local hooks (husky, lefthook, pre-commit) keep running; pre-commit combines gitleaks with dispatch and warns loudly when gitleaks is absent - public-key validation everywhere a key path is consumed, preventing private key material in allowed_signers or user.signingkey - config backups written mode 600 (may contain tokens) - SSH config audit/apply is scope-aware (global vs host-specific), appends new directives at EOF to preserve precedence, scans Include-d files for keys - pubkey algorithm restriction guarded against RSA/DSA-only lockout and chooses the directive name by OpenSSH version Added: - audit tiers (security/hygiene/preference); --audit exit 2 reflects security-tier issues only - signing smoke test catching No-principal-matched at setup time - http.sslVerify audit distinguishes unset from insecure override Docs: correct fsmonitor precedence, log.showSignature and fsckObjects breakage, SSH scoping semantics in REASONING.md; plan for agent-backed keys (1Password/Bitwarden/forwarded agents) in docs/specs. 126/126 BATS tests pass; shellcheck clean. Closes #53 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 23:55:31 +02:00
parent e27bbaaa43
commit 382a35c47e
5 changed files with 1091 additions and 179 deletions
@@ -4,6 +4,16 @@ Every setting `git-harden.sh` audits or applies exists because of a specific att

 Settings are grouped the same way they appear in the script's audit output.

+## Audit Tiers
+
+Not every audited item carries the same weight, so each one belongs to one of three tiers:
+
+- **security** — protects against a concrete attack vector (protocol restrictions, object integrity, hook control, credential storage, signing, …)
+- **hygiene** — operational robustness and forensic readiness (reflog retention, `fetch.prune`, `user.useConfigOnly`, `pull.ff`, …)
+- **preference** — ecosystem alignment with no security impact (`init.defaultBranch`, `log.showSignature`)
+
+The audit summary reports issue counts per tier. In `--audit` mode, **only security-tier issues produce exit code 2** — hygiene and preference findings are reported but never fail a CI or fleet-compliance check. This keeps the exit code meaningful as a gate: a machine that fails the audit has an actual attack-surface problem, not a branch-naming opinion.
+
 ---

 ## Identity
@@ -28,9 +38,19 @@ Settings are grouped the same way they appear in the script's audit output.

 **Attack/risk mitigated:** Malicious or corrupted packfiles that exploit parsing vulnerabilities in the git binary. Historical CVEs include integer overflows in packfile handling and crafted objects that trigger code execution. Also catches silent data corruption from disk/network errors.

-**What could break:** Adds ~5-10% overhead to clone and fetch operations on large repositories. Some very old repositories with technically malformed (but benign) objects may fail to clone until the upstream runs `git fsck --full` and fixes them.
+**What could break:** Adds ~5-10% overhead to clone and fetch operations on large repositories. More importantly: a meaningful number of popular, actively maintained repositories contain historic objects that are technically malformed but benign (zero-padded file modes, missing tagger lines, malformed committer dates) — cloning those will **fail**, not just warn. This is not a rare-edge-case setting.

-**Why this default:** The performance cost is small. The alternative — silently accepting corrupted objects — has no upside.
+**Mitigation when it bites:** Don't turn fsck back off globally. Downgrade the specific, known-benign message class instead, e.g.:
+
+```
+git config --global fetch.fsck.zeroPaddedFilemode ignore
+git config --global fetch.fsck.badTimezone ignore
+git config --global fetch.fsck.missingTaggerEntry ignore
+```
+
+This keeps structural/hash validation intact while tolerating the legacy quirk you actually hit.
+
+**Why this default:** The performance cost is small and the failure mode is loud and fixable per message class. The alternative — silently accepting corrupted or malicious objects — has no upside.

 ### `transfer.bundleURI = false`

@@ -48,9 +68,9 @@ Settings are grouped the same way they appear in the script's audit output.

 **Attack/risk mitigated:** Stale remote refs can be confusing and misleading. In a security context, a deleted branch that still appears locally may cause a developer to base work on abandoned or reverted code.

-**What could break:** Nothing. This matches the behavior of `git fetch --prune`. Pruning only affects remote-tracking refs, not local branches.
+**What could break:** Pruning only affects remote-tracking refs, not local branches. One caveat: when a remote-tracking ref was the *only* reference to some commits, pruning makes them unreachable and therefore eligible for garbage collection once the reflog entries expire. The extended reflog retention configured under Forensic Readiness keeps that window long (90+ days), but "zero downside" would overstate it.

-**Why this default:** Pure hygiene with zero downside.
+**Why this default:** Hygiene with a negligible, reflog-mitigated downside. Tier: hygiene.

 ---

@@ -101,11 +121,11 @@ Settings are grouped the same way they appear in the script's audit output.

 **What it does:** Disables the filesystem monitor integration (fsmonitor, Watchman). This feature speeds up `git status` in large repos by using OS-level file change notifications.

-**Attack/risk mitigated:** The fsmonitor hook (`core.fsmonitor--hook-version`) can execute arbitrary commands. A malicious repository could set this in its local config. Disabling it globally prevents this vector.
+**Attack/risk mitigated:** `core.fsmonitor` can point at an arbitrary command, which git then executes. **An important precision about what this setting does and does not protect against:** git config precedence means a *local* (`.git/config`) value overrides the global one — so setting `false` globally does **not** neutralize a repository that carries its own fsmonitor setting. What actually limits that attack is that `.git/config` is not transferred on clone; the realistic delivery vector is an embedded/planted `.git` directory, which is what `safe.bareRepository` and git's ownership checks (CVE-2022-24765 fix) address. The global `false` here is defense-in-depth: it prevents the feature from being silently enabled by tooling or copied configs, and it removes a code-execution-capable knob from the default environment.

 **What could break:** Performance of `git status` in very large repositories (100k+ files) where fsmonitor provides significant speedups. Developers working on such repos can override this per-repo.

-**Why this default:** Most repositories are not large enough to notice the difference. The attack surface is not worth the performance gain for typical use.
+**Why this default:** Most repositories are not large enough to notice the difference. The attack surface is not worth the performance gain for typical use — but be aware it is a hardening layer, not the primary defense.

 ### `core.symlinks = false` (interactive-only, skipped in `-y` mode)

@@ -125,11 +145,13 @@ Settings are grouped the same way they appear in the script's audit output.

 **What it does:** Redirects git hook execution from each repository's `.git/hooks/` directory to a centralized, user-controlled directory.

-**Attack/risk mitigated:** Malicious repositories can include hooks (e.g., `pre-commit`, `post-checkout`) that execute on clone, commit, or checkout. By redirecting to a user-managed directory, repo-local hooks are ignored unless explicitly installed.
+**Attack/risk mitigated:** Malicious repositories can include hooks (e.g., `pre-commit`, `post-checkout`) that execute on clone, commit, or checkout. By redirecting to a user-managed directory, repo-local hooks are ignored unless explicitly dispatched.

-**What could break:** Project-specific hooks defined in `.git/hooks/` or installed by frameworks like `husky`, `lefthook`, or `pre-commit`. Teams using these must either: (a) install their hooks into the global hooks directory, or (b) override `core.hooksPath` per-repo via `git config --local`.
+**How repo-local hooks keep working:** Redirecting hooks would otherwise *silently* disable every repo-local hook of every type — including security hooks teams installed deliberately (husky, lefthook, the `pre-commit` framework). To prevent that, the script installs **dispatch stubs** for all client-side hook types in the global hooks directory. Each stub forwards to the repository's own `.git/hooks/<name>` when present and executable. The `pre-commit` stub additionally runs the gitleaks secret scan first. The result: you get centralized control plus a visible choke point, without breaking per-repo workflows. (Repos that set `core.hooksPath` in their local config — husky v9 does — override the global value entirely and are unaffected.)

-**Why this default:** The attack is trivial to execute and devastating (arbitrary code execution). Teams that need repo-local hooks can override per-repo.
+**What could break:** Hooks installed into `.git/hooks/` still run, but now *after* the global stub's own logic (for `pre-commit`: after the secret scan). Tools that verify their hook is "installed" by checking the effective `core.hooksPath` may complain.
+
+**Why this default:** The attack is trivial to execute and devastating (arbitrary code execution). With dispatch stubs the usual breakage objection no longer applies.

 ---

@@ -143,7 +165,9 @@ Settings are grouped the same way they appear in the script's audit output.

 **What could break:** False positives on test fixtures or example credentials may require bypassing with `SKIP_GITLEAKS=1 git commit`. Adds ~1-2 seconds to each commit.

-**Why this default:** Both research reports rank pre-commit secret scanning as the #1 workstation-level defense. The hook is safe without gitleaks installed (guards with `command -v`). The `SKIP_GITLEAKS` bypass avoids the need for `--no-verify` which skips ALL hooks.
+**Fail-open by design, but loudly:** If gitleaks is not installed, the hook does not block commits — but it prints a clearly visible "secret scan SKIPPED" warning on every commit instead of silently doing nothing. Failing *closed* (blocking all commits until gitleaks is installed) was considered and rejected: it punishes machines the user doesn't fully control and trains people to delete the hook. A loud warning preserves the signal without the lockout.
+
+**Why this default:** Both research reports rank pre-commit secret scanning as the #1 workstation-level defense. The `SKIP_GITLEAKS` bypass avoids the need for `--no-verify` which skips ALL hooks. After scanning, the hook dispatches to the repository's own `pre-commit` hook (see Hook Control above).

 ---

@@ -213,6 +237,8 @@ Settings are grouped the same way they appear in the script's audit output.

 **Attack/risk mitigated:** Disabling SSL verification allows MITM attacks even over HTTPS — the attacker presents any certificate and git accepts it.

+**Audit semantics:** Unset is **not** the same as overridden. The audit reports an unset `http.sslVerify` as OK (git's built-in default is `true`) and only flags an explicit insecure override. The apply phase still pins it to `true` when other changes are being made, as a guard against future overrides in lower-precedence scopes.
+
 **What could break:** Self-signed certificates on internal git servers. The proper fix is to add the CA certificate to git's trust store (`http.sslCAInfo`), not to disable verification globally.

 **Why this default:** Ensuring the default hasn't been overridden. This is a safety net, not a new restriction.
@@ -297,7 +323,13 @@ Settings are grouped the same way they appear in the script's audit output.

 **Attack/risk mitigated:** Makes unsigned or invalid signatures visible in normal workflow. Without this, developers must remember to use `git log --show-signature` to check.

-**What could break:** Log output is slightly more verbose. Some terminal environments may not render the verification status cleanly.
+**What could break:** More than "slightly more verbose" — be honest about this one:
+
+- **Anything that parses `git log` output breaks.** Scripts, release tooling, and integrations that read log output without `--no-show-signature` get signature blocks interleaved with the text they expect to parse.
+- **Every log invocation pays a verification cost** (an `ssh-keygen`/`gpg` call per displayed commit), which is noticeable on large histories.
+- **Without a matching allowed_signers entry, every entry shows "No principal matched"** noise — which is why the script now smoke-tests the signature round-trip at setup time.
+
+Scripts you control should use `git log --no-show-signature` or `git -c log.showSignature=false`. Tier: preference — turn it off if it fights your tooling; verification on the hosting platform is unaffected.

 **Why this default:** Signature verification is only useful if people see the results. Making it visible by default closes the gap between "we sign commits" and "we verify signatures."

@@ -331,10 +363,14 @@ Settings are grouped the same way they appear in the script's audit output.

 **What could break:** Nothing — the file is additive. Without it, local verification simply doesn't work (signatures are only verified on the hosting platform).

+**The "No principal matched" trap:** verification matches the *committer email* against the principals in this file. If a repo overrides `user.email` (work vs. personal identities) or the entry was written with a different address, `git log` shows "Good signature … No principal matched" on every commit. To catch this at setup time instead of in every future log, the script offers a signing smoke test after enabling signing: it signs a test message with the configured key and verifies it against allowed_signers with the recorded principal. Per-identity setups need one allowed_signers line per email (the same key can appear on multiple lines).
+
 ---

 ## SSH Configuration

+**How the script reads and writes `~/.ssh/config`:** ssh resolves options with *first-obtained-wins* semantics, and a directive inside a `Host github.com` block does not apply globally. The audit therefore only counts directives in **global scope** (top-level lines or a `Host *` block) — a directive that exists only in host-specific blocks is reported as "no global default". When applying, the script replaces values in global scope only, and appends new directives in a `Host *` block at the **end** of the file, so existing host-specific settings always keep precedence. `Include`-d files are scanned for keys (one level deep) but their directives are not audited or modified; the audit prints a notice when Includes are present.
+
 ### `StrictHostKeyChecking = accept-new`

 **What it does:** Automatically accepts host keys on first connection (TOFU — Trust On First Use) but rejects changed keys on subsequent connections.
@@ -371,6 +407,11 @@ Settings are grouped the same way they appear in the script's audit output.

 **What could break:** Connections to legacy servers that only support RSA. These servers should be upgraded; RSA-SHA1 is deprecated by OpenSSH since version 8.7.

+**Guards the script applies before setting this:**
+
+1. **Key inventory.** If the only available keys (on disk *or* loaded in the SSH agent) are RSA/DSA, applying this directive would lock the user out of every server those keys authenticate to. The script warns, requires an explicit default-No confirmation, and skips entirely in `-y` mode.
+2. **OpenSSH version.** The option is spelled `PubkeyAcceptedAlgorithms` from OpenSSH 8.5 and `PubkeyAcceptedKeyTypes` before that — and an unknown option in `~/.ssh/config` makes *every* ssh invocation fail with "Bad configuration option". The script detects the client version and writes the correct spelling, or skips the directive when the client is too old or not OpenSSH.
+
 **Why these algorithms:** Ed25519 is the recommended default (fast, small keys, no parameter pitfalls). ECDSA P-256 is included because some FIDO2 hardware keys only support it. RSA is excluded because accepting it creates a fallback path to weaker cryptography.

 ---
@@ -0,0 +1,112 @@
+# git-harden.sh — Agent-Backed Keys: 1Password / Bitwarden / Forwarded Agents, Zero Plaintext Keys on Disk
+
+Status: **proposed** (plan for v0.7.0)
+Depends on: v0.6.0 (scope-aware SSH config handling, key inventory via `ssh-add -L`, signing smoke test)
+
+## Motivation
+
+The script currently assumes file-based keys: it scans `~/.ssh/*.pub`, generates keys to disk, and treats "no key files" as "no keys". That assumption breaks — and in some places actively fights — the strongest available key-storage model:
+
+- **1Password / Bitwarden SSH agents** hold private keys encrypted in a vault and expose them only through an agent socket, with per-use approval prompts. No private key ever touches disk.
+- **Hardware keys** (FIDO2 resident keys) keep the private key in the authenticator.
+- **Forwarded agents** (`SSH_AUTH_SOCK` in a remote/container session) let agent-less environments — including agent containers used for AI-assisted development — sign and authenticate without holding any identity locally.
+
+The end state this plan targets: **a machine can pass the full audit and have working signing + authentication with zero plaintext private keys on disk.** Today it can't: the audit reports "No SSH public keys found", the signing wizard only offers to generate file-based keys, and `IdentitiesOnly yes` (which we set) can silently break agent-only setups.
+
+## Current gaps
+
+| # | Gap | Where |
+|---|-----|-------|
+| 1 | Key hygiene audit ignores agent-held keys ("No SSH public keys found" on a fully provisioned 1Password machine) | `audit_ssh_key_hygiene` |
+| 2 | Signing wizard offers only on-disk generation (software file or FIDO2 file handle) | `signing_wizard` |
+| 3 | `IdentitiesOnly yes` is applied without checking that configured `IdentityFile` stubs exist — agent-only keys stop being offered | `apply_ssh_config` |
+| 4 | No detection of *unencrypted* private keys on disk — the exact artifact this plan wants to eliminate | missing audit |
+| 5 | No `ForwardAgent` policy; forwarded-agent risk/benefit is undocumented | missing audit + docs |
+| 6 | Signing smoke test requires a private key file next to the `.pub` | `verify_signing_setup` |
+
+## Design
+
+### Phase 1 — Detection & audit (no behavior changes)
+
+**1a. Agent detection helper.** `detect_ssh_agents` reports every reachable agent:
+
+- `SSH_AUTH_SOCK` (generic; also covers forwarded agents — detect forwarding via `SSH_CONNECTION`/`SSH_TTY` being set alongside it)
+- 1Password: `~/Library/Group Containers/2BUA8C4S2C.com.1password/t/agent.sock` (macOS), `~/.1password/agent.sock` (Linux)
+- Bitwarden: `~/.bitwarden-ssh-agent.sock`
+- gpg-agent: `gpgconf --list-dirs agent-ssh-socket`
+
+Probe each socket with `SSH_AUTH_SOCK=<sock> ssh-add -L`; report agent type, reachability, and key count. Sockets are *probed read-only*; nothing is written.
+
+**1b. Agent keys join the key-hygiene audit.** `audit_ssh_key_hygiene` merges `ssh-add -L` output (per detected agent) with on-disk `.pub` files, deduplicated by key blob. Agent keys get the same weak-algorithm checks and are labeled `(agent: 1password)` etc. "No SSH public keys found" only appears when disk *and* agents are empty.
+
+**1c. New audit: plaintext private keys on disk** (tier: security).
+
+- Enumerate candidate private keys in `~/.ssh` (files whose first line is an OpenSSH/PEM private key header — never by filename guessing).
+- Classify encrypted vs. unencrypted: `ssh-keygen -y -P "" -f <key>` succeeding in batch mode ⇒ **unencrypted**.
+- `[WARN security]` unencrypted private key on disk → recommend vault import + passphrase or deletion after migration.
+- `[INFO]` encrypted private key on disk when an agent holds the same public key → candidate for cleanup (Phase 4).
+- Audit-only, like the existing credential-hygiene section: never modify or delete.
+
+**1d. `ForwardAgent` audit** (tier: security). `[OK]` when unset/`no` globally; `[WARN]` when `yes` in global scope ("any root user on any host you ssh to can use your keys while connected — scope it per-host, prefer agents with per-use confirmation"). Uses the v0.6.0 scope-aware reader.
+
+### Phase 2 — Signing without key files
+
+**2a. Wizard option 3: "Use a key from your SSH agent".** Lists agent keys (`ssh-add -L`, modern algorithms only), lets the user pick one, then:
+
+- `user.signingkey = key::<keytype> <blob> <comment>` (literal pub key, no file needed; supported since Git 2.34 for `gpg.format=ssh`)
+- allowed_signers entry written from the same literal key (existing `setup_allowed_signers` path, fed by string instead of file — refactor it to accept key material, keeping the public-key validation guard)
+- In `-y` mode with no file-based key found: if exactly one modern agent key exists, use it (consistent with the existing "found existing key" auto-pick); otherwise skip with the current message.
+
+**2b. Smoke test via agent.** `verify_signing_setup` learns a second path: when no private key file exists, sign with `ssh-keygen -Y sign -U -f <tmp pubkey file>` (`-U`: private half lives in the agent). Same prompt gating — 1Password/Bitwarden will pop an approval dialog, which is exactly what we want the user to see once at setup time.
+
+**2c. Audit accepts `key::` signing keys.** `audit_signing` currently prints `(inline key)` for `ssh-*` values; extend the case to `key::*` and verify the *allowed_signers* file contains the same blob.
+
+### Phase 3 — SSH config integration
+
+**3a. `IdentitiesOnly` guard.** Before applying `IdentitiesOnly yes`, check: does any `IdentityFile` directive exist in global scope, or do on-disk `.pub` stubs match agent keys? If the user is agent-only with no stubs, offer to write **public-key stubs** (`~/.ssh/<name>.pub` — public material only, consistent with the zero-plaintext goal) plus matching `IdentityFile` lines, so `IdentitiesOnly yes` keeps working with the agent. Decline ⇒ skip the directive with a warning instead of applying a lockout.
+
+**3b. `IdentityAgent` offering.** When a 1Password/Bitwarden socket is detected and `SSH_AUTH_SOCK` doesn't already point at it, offer a global `IdentityAgent <socket>` directive (written with the v0.6.0 append-at-EOF semantics). Never overwrite an existing `IdentityAgent`.
+
+**3c. `ForwardAgent no` as an applied default** (matching the Phase 1d audit), with documentation of the per-host override pattern and a note that confirmation-prompting agents (1Password, `ssh-add -c`) are the safe way to use forwarding when it's unavoidable.
+
+### Phase 4 — Migration assistant (interactive only, opt-in)
+
+Goal: walk an existing file-based user to zero plaintext keys. Strictly guided, never automatic:
+
+1. Inventory on-disk private keys (from 1c) and show which are already represented in an agent.
+2. Print vault-import instructions per detected agent (1Password and Bitwarden import flows differ; we print, we don't drive their CLIs).
+3. After the user confirms a key is imported **and** the agent probe shows its public key, offer deletion of the on-disk private key under the v0.6.0 reset-signing safety rules: interactive only, default **No**, rename-to-`.bak` middle option, never in `-y` mode, keep the `.pub` stub for `IdentitiesOnly`.
+4. Re-run the relevant audits to show the end state.
+
+### Documentation
+
+- New REASONING.md section "Agent-Backed Keys" covering: why vault agents beat files, forwarded-agent threat model, `key::` signing, pub-stub pattern for `IdentitiesOnly`.
+- README quick-start for 1Password/Bitwarden users.
+
+## Acceptance criteria
+
+1. On a machine whose only keys live in a 1Password agent: full audit passes (given config applied), `audit_ssh_key_hygiene` lists the agent keys, signing works via `key::`, and the smoke test round-trips through the agent.
+2. `git-harden.sh -y` never blocks on an agent approval prompt and never deletes/moves any key file.
+3. Applying `IdentitiesOnly yes` is impossible in a state that would stop agent keys from being offered (stub-write or skip-with-warning).
+4. An unencrypted private key in `~/.ssh` is flagged as a security-tier finding; an encrypted one is not flagged red.
+5. Forwarded-agent sessions (`SSH_AUTH_SOCK` + `SSH_CONNECTION`) are detected and labeled; no file-based assumptions break inside such a session or an agent container.
+6. All existing BATS tests keep passing; new tests use a throwaway `ssh-agent` (start, `ssh-add` a generated key, delete the private file, point `SSH_AUTH_SOCK` at it) so CI needs no vault products.
+
+## Test plan
+
+- **Unit (BATS):** agent detection with fake sockets; key inventory merge/dedup; unencrypted-key classifier (generated with/without `-N`); `key::` signingkey audit; `IdentitiesOnly` guard paths (stubs present / agent-only / decline).
+- **Agent integration (BATS, real `ssh-agent`):** sign + verify via `-U` with the private key file removed — proves the zero-plaintext path end to end.
+- **Container e2e:** extend `test/containers/*` with an `ssh-agent`-only variant (no `~/.ssh/id_*` private files) running `--audit` and `-y`.
+- **Interactive:** new `test/interactive/test-signing-agent.sh` covering wizard option 3.
+
+## Non-goals
+
+- Driving 1Password/Bitwarden CLIs (`op`, `bws`) — instructions only; their CLIs change and require auth sessions.
+- gpg-agent-backed *GPG* signing (this project standardizes on SSH signing).
+- Hardware-token provisioning changes (FIDO2 path already exists and is orthogonal).
+
+## Open questions
+
+1. Should `-y` auto-adopt a single agent key for signing (2a) or stay fully passive? Current plan: adopt only when exactly one modern key exists, mirroring file-based behavior. Approval prompts make even this debatable in headless runs — the smoke test stays interactive-only either way.
+2. Bitwarden's agent socket path is configurable; do we read its `data.json` to find it, or only check the default path and document the rest? Plan: default path + docs.
+3. Stub naming for written `.pub` stubs (3a): derive from the key comment (sanitized) or `agent_<fingerprint-prefix>.pub`? Plan: comment-derived with fingerprint fallback.