git-hardening/docs/specs/2026-03-30-e2e-container-tests.md
Flo 6dd2e6aec8 docs: update e2e spec for safety review gate
Add safety gate prompt to all interactive tmux scenarios, add new
"safety gate decline" test scenario, update acceptance criteria.

Closes: #10

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-30 23:47:11 +02:00


# spec: End-to-End Container Tests
## Overview
A test harness that runs the existing BATS test suite inside Docker/Podman containers across multiple Linux distributions. A developer invokes a single command (`test/e2e.sh`) and gets a pass/fail result per distro, confirming that `git-harden.sh` works correctly on each target platform.
## Purpose
Catch platform-specific regressions that the host-only BATS tests cannot surface: different default git versions, missing utilities, musl vs glibc edge cases, different `sed`/`grep` flavors, and package-layout differences (e.g. `git-credential-libsecret` paths).
### Non-Goals
- Testing macOS in containers (no official macOS Docker images; macOS is covered by running BATS on the host).
- Testing FIDO2 hardware key prompts (requires physical security key; cannot be simulated).
- CI/CD pipeline integration (GitHub Actions matrix YAML) -- that can be layered on later without spec changes.
- Building or publishing container images for end users.
- Testing with real SSH keys or real remotes.
## User Stories
**As a** contributor
**I want** to run `test/e2e.sh` and see per-distro pass/fail output
**So that** I know the script works on all supported Linux distributions before merging.

**As a** contributor
**I want** to run tests against a single distro for faster iteration
**So that** I can debug a platform-specific failure without waiting for the full matrix.
## Functional Requirements
### Runner Script (`test/e2e.sh`)
- Accepts an optional `--runtime` flag: `docker` or `podman`. Without the flag, `docker` is used if it is installed; if only `podman` is installed, it is selected automatically.
- Accepts an optional positional argument to run a single distro by name (e.g. `test/e2e.sh alpine`).
- Without arguments, runs all distros in the matrix sequentially and prints a summary table at the end.
- Exit code: 0 if all distros pass; 1 if any distro fails or no container runtime is installed.
- Each distro run builds a container image (if not cached) and executes `test/run.sh` inside it.
- Passes `--tap` to BATS so output is machine-readable; the runner reformats it into a human-friendly per-distro summary.
- Build context is the repo root; only the files needed for testing are copied (script, test dir, submodules).
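The TAP-to-summary reformatting could be as small as a `while read` loop over the TAP stream. The sketch below is illustrative only: the function name `summarize_tap` and its output format are assumptions, and it relies only on the TAP result lines BATS emits (`ok N name`, `not ok N name`, and the `# skip` directive).

```bash
#!/usr/bin/env bash
set -euo pipefail

# summarize_tap DISTRO < tap_output
# Hypothetical helper: reads BATS `--tap` output on stdin, prints a one-line
# summary for DISTRO followed by any failing test lines, and returns 1 if
# at least one test failed.
summarize_tap() {
    local distro="$1"
    local line="" passed=0 failed=0 skipped=0 failures=""
    # `|| [[ -n $line ]]` also handles a final line without a trailing newline.
    while IFS= read -r line || [[ -n "$line" ]]; do
        case "$line" in
            "not ok "*)
                failed=$(( failed + 1 ))
                failures+="    ${line}"$'\n'
                ;;
            "ok "*"# skip"*)
                skipped=$(( skipped + 1 ))
                ;;
            "ok "*)
                passed=$(( passed + 1 ))
                ;;
            *) ;;  # plan line ("1..N") and diagnostics ("# ...") are ignored
        esac
    done
    printf '%-8s %d passed, %d failed, %d skipped\n' \
        "$distro" "$passed" "$failed" "$skipped"
    printf '%s' "$failures"
    if (( failed > 0 )); then
        return 1
    fi
}
```

Usage would be along the lines of piping the container's output through it, e.g. `... | summarize_tap alpine`; listing the failing lines keeps the summary actionable without scrolling back through the raw log.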
### Containerfiles (`test/containers/`)
- One `Containerfile.<distro>` per target distro. Each file:
  1. Starts from the distro's official base image, pinned to a specific release tag (not `latest`).
  2. Installs the minimum packages: `bash`, `git` (>= 2.34), the OpenSSH client tools (`ssh`, `ssh-keygen`), and `tmux`. Package names differ per distro (e.g. `openssh-client` on Debian/Ubuntu, `openssh-clients` on Fedora, `openssh` on Arch).
  3. Creates a non-root test user and switches to it.
  4. Copies `git-harden.sh` and `test/` into the image.
  5. Sets `CMD` to `test/run.sh`.
### Distro Matrix
| Name | Base Image | Package Manager | Notes |
|------|-----------|-----------------|-------|
| `ubuntu` | `ubuntu:24.04` | apt-get | Mainstream deb-based |
| `debian` | `debian:trixie` | apt-get | Current stable (Debian 13) |
| `fedora` | `fedora:42` | dnf | rpm-based |
| `alpine` | `alpine:3.21` | apk | musl libc, BusyBox coreutils |
| `arch` | `archlinux:base` | pacman | Rolling release, latest packages |
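In the runner, the matrix might be a plain bash array, with the optional positional argument validated against it. A sketch, assuming a hypothetical `resolve_distros` helper; its error path follows the "Unknown distro name passed" row under Edge Cases.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Distro names from the matrix table, in run order. Each maps to
# test/containers/Containerfile.<name>.
DISTROS=(ubuntu debian fedora alpine arch)

# resolve_distros [NAME]
# Hypothetical helper: with no argument, prints every distro, one per line.
# With a known NAME, prints just that name. With an unknown NAME, lists the
# available distros on stderr and returns 1 (the caller then exits 1).
resolve_distros() {
    local wanted="${1:-}" d
    if [[ -z "$wanted" ]]; then
        printf '%s\n' "${DISTROS[@]}"
        return 0
    fi
    for d in "${DISTROS[@]}"; do
        if [[ "$d" == "$wanted" ]]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    printf 'error: unknown distro: %s\navailable: %s\n' "$wanted" "${DISTROS[*]}" >&2
    return 1
}
```

Keeping the list in one array means adding a distro is a two-file change: the array entry and its `Containerfile.<name>`.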
### Interactive Testing via `tmux`
The signing wizard and the interactive apply flow read from `/dev/tty`, which cannot be opened in a container started without a terminal (the process has no controlling tty). Instead of `expect` (Tcl), interactive tests use `tmux send-keys` to drive the prompts. This keeps all test code in bash, consistent with the rest of the project.
#### How it works
1. `tmux` is installed in every container alongside the other test dependencies.
2. Interactive test scripts live in `test/interactive/` as plain bash scripts.
3. Each script starts a `tmux` session, runs `git-harden.sh` inside it, and drives the interaction:
   - `tmux new-session -d -s test "bash /path/to/git-harden.sh"` -- starts the script in a detached session with a real tty.
   - A `wait_for` helper polls `tmux capture-pane -t test -p` until a pattern appears (or a timeout fires, defaulting to 10 seconds).
   - `tmux send-keys -t test "y" Enter` -- sends keystrokes to the session.
   - After the script exits, `tmux capture-pane -t test -p -S -` captures the final output, including scrollback, for assertions. tmux destroys a session as soon as its command exits, so the window must have `remain-on-exit` set (or the command must be wrapped to stay alive) for the pane to still be readable at that point.
4. No `--tty` flag needed on `docker run` / `podman run` -- `tmux` creates its own pseudo-terminal inside the container.
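A self-contained sketch of one session lifecycle, assuming a hypothetical stand-in prompt script in place of `git-harden.sh` so it runs wherever tmux is installed. It shows the step that is easy to miss: tmux otherwise destroys the session when its command exits, so `remain-on-exit` keeps the pane around until the final output has been captured. The function name `run_prompt_demo` is also hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# run_prompt_demo: hypothetical walk-through of one tmux-driven interaction.
# A stand-in script with a single /dev/tty prompt replaces git-harden.sh.
# Prints the captured pane contents.
run_prompt_demo() {
    local workdir output i
    workdir="$(mktemp -d)"
    # Stand-in for git-harden.sh: one prompt, answer read from the terminal.
    cat > "$workdir/prompt.sh" <<'EOF'
read -r -p 'Proceed? [y/N] ' answer </dev/tty
printf 'answer=%s\n' "$answer"
EOF

    # Detached session with a real pty (no --tty needed on docker/podman run).
    tmux new-session -d -s test -x 120 -y 40 "bash $workdir/prompt.sh"
    # Keep the pane after the command exits so its output can still be
    # captured. Setting it after start is safe: the script blocks on read.
    tmux set-option -w -t test remain-on-exit on

    # Poll for the prompt (wait_for does the same, with a timeout message).
    for (( i = 0; i < 50; i++ )); do
        [[ "$(tmux capture-pane -t test -p)" == *'Proceed?'* ]] && break
        sleep 0.2
    done
    tmux send-keys -t test "y" Enter

    # Wait until the command has exited and tmux marks the pane dead.
    for (( i = 0; i < 50; i++ )); do
        [[ "$(tmux display-message -p -t test '#{pane_dead}')" == 1 ]] && break
        sleep 0.2
    done

    # -S - starts the capture at the beginning of the scrollback history.
    output="$(tmux capture-pane -t test -p -S -)"
    tmux kill-session -t test
    rm -rf "$workdir"
    printf '%s\n' "$output"
}
```

Calling `run_prompt_demo` prints the pane, which contains `answer=y` once the keystrokes have reached the stand-in script.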
#### `wait_for` helper
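```bash
# Wait for a fixed string to appear in the pane of tmux session "test".
# Polls every 0.2s; times out after $2 seconds (default 10).
wait_for() {
    local pattern="$1"
    local timeout="${2:-10}"
    local polls=0 pane=""
    while true; do
        # Capture into a variable rather than piping into `grep -q`: an early
        # grep exit can SIGPIPE tmux, which `set -o pipefail` reports as a miss.
        pane="$(tmux capture-pane -t test -p 2>/dev/null)" || pane=""
        [[ "$pane" == *"$pattern"* ]] && return 0
        sleep 0.2
        polls=$(( polls + 1 ))
        if (( polls > timeout * 5 )); then
            printf 'TIMEOUT waiting for: %s\n' "$pattern" >&2
            printf '%s\n' "$pane" >&2
            return 1
        fi
    done
}
```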
#### Interactive scenarios to cover
**Note:** Every interactive run hits the **safety review gate** first ("Have you reviewed this script...?"). Every scenario below except "Safety gate decline" must send `y` + Enter to pass the gate before reaching the audit/apply flow.
| Scenario | `tmux send-keys` sequence | Verifies |
|----------|---------------------------|----------|
| Full interactive apply (accept all) | `y` + Enter (safety gate), `y` + Enter (proceed with hardening), `y` + Enter to each setting prompt | All settings applied; re-audit exits 0 |
| Interactive apply (decline some) | `y` + Enter (safety gate), `y` + Enter (proceed), then `n` + Enter for specific prompts | Declined settings remain unchanged |
| Safety gate decline | `n` + Enter (safety gate) | Script exits 0; prints AI review instructions; no config changes |
| Signing wizard: generate ed25519 key | `y` + Enter (safety gate), then through apply prompts, `1` + Enter for menu, Enter for empty passphrase (twice) | Key created at `~/.ssh/id_ed25519.pub`; signing config set |
| Signing wizard: use existing key | `y` + Enter (safety gate), then through apply prompts, `y` + Enter when prompted "Use this key?" | `user.signingkey` set to the existing key path |
| Signing wizard: skip | `y` + Enter (safety gate), then through apply prompts, `s` + Enter for menu | No signing key configured; `commit.gpgsign` not set |
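For the "no config changes" check in the safety-gate-decline scenario, one option is to snapshot the global git config before and after the run and compare the two. A sketch with hypothetical helper names (`config_snapshot`, `assert_config_unchanged`); checking only `--global` config assumes that is where `git-harden.sh` writes.

```bash
#!/usr/bin/env bash
set -euo pipefail

# config_snapshot: hypothetical helper. Prints the global git config as
# sorted key=value lines; prints nothing if there is no global config file
# yet (git exits non-zero in that case, which is expected here).
config_snapshot() {
    { git config --global --list 2>/dev/null || true; } | LC_ALL=C sort
}

# assert_config_unchanged BEFORE
# Hypothetical helper: returns 1 and prints a diff on stderr if the global
# git config differs from the BEFORE snapshot.
assert_config_unchanged() {
    local before="$1" after
    after="$(config_snapshot)"
    if [[ "$before" != "$after" ]]; then
        printf 'global git config changed:\n' >&2
        diff <(printf '%s\n' "$before") <(printf '%s\n' "$after") >&2 || true
        return 1
    fi
}
```

A scenario script would take `before="$(config_snapshot)"` before starting the `tmux` session, answer `n` at the gate, and call `assert_config_unchanged "$before"` after the script exits.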
#### What is NOT tested interactively
- FIDO2 key generation (`ssh-keygen -t ed25519-sk`) -- requires physical hardware token touch.
- Real passphrase entry with confirmation -- tests use empty passphrases to keep scripts simple.
### Test Isolation
- The existing BATS tests already create a fresh `$HOME` via `mktemp` per test. No changes to the test suite are required.
- Containers run with `--network=none` -- the tests do not need network access, and this prevents accidental external calls.
- Containers are removed after each run (`--rm`).
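The isolation flags translate directly into the run command. A sketch, assuming a hypothetical `run_tests_in_container` helper, an image tag scheme of `git-harden-e2e:<distro>`, and a `DRY_RUN=1` switch that prints the command instead of executing it (handy for the "Containers run with no network and no root" acceptance check). No `--tty` is passed, since interactive tests get their terminal from `tmux`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# run_tests_in_container RUNTIME DISTRO
# Hypothetical helper: runs the image's default CMD (test/run.sh) in a
# throwaway, offline container. With DRY_RUN=1 (an assumed switch) it
# prints the command instead of executing it.
run_tests_in_container() {
    local runtime="$1" distro="$2"
    local -a cmd=(
        "$runtime" run
        --rm                       # delete the container when it exits
        --network=none             # no network inside the container
        "git-harden-e2e:${distro}" # assumed image tag scheme
    )
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        printf '%s\n' "${cmd[*]}"
        return 0
    fi
    "${cmd[@]}"
}
```

Building the command as an array keeps each flag a separate argument, so the same list serves both the dry-run printout and the real invocation.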
## Edge Cases & Error States
### Input Boundaries
| Condition | Expected Behavior |
|-----------|-------------------|
| Unknown distro name passed | Print available distros and exit 1 |
| Neither docker nor podman installed | Print clear error with install hint and exit 1 |
| `--runtime` points to missing binary | Print error naming the binary and exit 1 |
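Runtime selection and the runtime-related rows above could be handled as follows. A sketch: `pick_runtime` and `check_runtime_ready` are hypothetical names, and using `<runtime> info` as a liveness probe (it fails when the Docker daemon is unreachable) is an assumption that also covers the "daemon not running" failure mode below.

```bash
#!/usr/bin/env bash
set -euo pipefail

# pick_runtime [REQUESTED]
# Hypothetical helper. REQUESTED is the --runtime value (empty if the flag
# was not given). Prints the runtime to use, or prints an error on stderr
# and returns 1.
pick_runtime() {
    local requested="${1:-}"
    if [[ -n "$requested" ]]; then
        if [[ "$requested" != docker && "$requested" != podman ]]; then
            printf 'error: --runtime must be docker or podman (got: %s)\n' "$requested" >&2
            return 1
        fi
        if ! command -v "$requested" >/dev/null 2>&1; then
            printf 'error: --runtime %s: binary not found on PATH\n' "$requested" >&2
            return 1
        fi
        printf '%s\n' "$requested"
        return 0
    fi
    # No flag: docker is the default; podman is used if it is the only one.
    if command -v docker >/dev/null 2>&1; then
        printf 'docker\n'
    elif command -v podman >/dev/null 2>&1; then
        printf 'podman\n'
    else
        printf 'error: neither docker nor podman is installed; install Docker Engine or Podman\n' >&2
        return 1
    fi
}

# check_runtime_ready RUNTIME
# Hypothetical helper: `<runtime> info` serves as a liveness probe.
check_runtime_ready() {
    if ! "$1" info >/dev/null 2>&1; then
        printf 'error: %s is not responding. Is the Docker/Podman daemon running?\n' "$1" >&2
        return 1
    fi
}
```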
### Failure Modes
| Failure | Response |
|---------|----------|
| Container build fails (e.g. package 404) | Print build log, mark distro as FAIL, continue to next |
| BATS tests fail inside container | Capture TAP output, mark distro as FAIL, continue to next |
| Container runtime daemon not running | Print clear error ("Is the Docker/Podman daemon running?") and exit 1 |
| Disk full during image build | Container runtime's own error propagates; distro marked FAIL |
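The "mark FAIL, continue to next" behavior and the summary table from the Should Have criteria fit in one loop. A sketch: `run_matrix` is hypothetical and takes the per-distro build-and-test step as a function name, so it can be exercised without a container runtime; durations come from bash's `$SECONDS`, so they are whole seconds.

```bash
#!/usr/bin/env bash
set -euo pipefail

# run_matrix RUN_ONE DISTRO...
# Hypothetical sketch. RUN_ONE names a function or command that builds and
# tests one distro and returns non-zero on failure (a build failure counts
# as FAIL too). Every distro runs even if an earlier one failed. A summary
# table goes to stderr; the return value is 1 if any distro failed.
run_matrix() {
    local run_one="$1"
    shift
    local distro status start any_failed=0 rows=""
    for distro in "$@"; do
        start=$SECONDS
        if "$run_one" "$distro"; then
            status=PASS
        else
            status=FAIL
            any_failed=1
        fi
        rows+="$(printf '%-8s %-6s %ss' "$distro" "$status" "$(( SECONDS - start ))")"$'\n'
    done
    printf '\n%-8s %-6s %s\n' DISTRO RESULT DURATION >&2
    printf '%s' "$rows" >&2
    return "$any_failed"
}
```

Because RUN_ONE is called as an `if` condition, `set -e` is suspended inside it, so it should propagate failures explicitly (for example `build_step || return 1`) rather than rely on errexit.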
### Security Boundaries
| Threat | Mitigation |
|--------|------------|
| Test run affects the host (writes outside the container, network calls) | `--network=none`, non-root user, no volume mounts (files are `COPY`'d) |
| Stale base images with CVEs | Pinned image tags; updating tags is a deliberate, reviewable change |
## Non-Functional Requirements
### Performance
- Full matrix (5 distros, cold build): under 5 minutes on a machine with a reasonable internet connection.
- Full matrix (warm cache, images already built): under 60 seconds.
- Single distro (warm cache): under 15 seconds.
### Portability
- `test/e2e.sh` itself must pass `shellcheck` and follow the project's shell standards (AGENTS.md).
- Works with Docker Engine >= 20.10 and Podman >= 4.0.
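- Files are named `Containerfile.<distro>` (`Containerfile` is the name Podman looks for by default; the contents use ordinary Dockerfile syntax). The runner passes each file explicitly with `-f`, so neither Docker nor Podman depends on its default filename lookup.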
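With an explicit `-f`, the build invocation is identical for both runtimes. A sketch, assuming the working directory is the repo root, a hypothetical `build_image` helper, an assumed `git-harden-e2e:<distro>` tag scheme, and a `DRY_RUN=1` switch for inspection.

```bash
#!/usr/bin/env bash
set -euo pipefail

# build_image RUNTIME DISTRO
# Hypothetical helper, run from the repo root (the build context). The
# Containerfile is named explicitly with -f, so neither runtime's default
# filename lookup matters. With DRY_RUN=1 (an assumed switch) it prints the
# command instead of running it.
build_image() {
    local runtime="$1" distro="$2"
    local -a cmd=(
        "$runtime" build
        -f "test/containers/Containerfile.${distro}"
        -t "git-harden-e2e:${distro}" # assumed image tag scheme
        .
    )
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        printf '%s\n' "${cmd[*]}"
        return 0
    fi
    "${cmd[@]}"
}
```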
## Pre-Mortem
### Likely Failure Modes
| Failure | Why It Could Happen |
|---------|---------------------|
| Alpine tests fail due to BusyBox `sed`/`grep` differences | `git-harden.sh` uses `sed` and `grep` features that differ between GNU and BusyBox |
| Arch image breaks on next pacman keyring rotation | Rolling distro; base image may need periodic tag bumps |
| `wait_for` polling misses fast prompts or races | Prompt appears and is overwritten before `capture-pane` sees it, or script advances before `send-keys` arrives |
| `tmux` version differences across distros | Older tmux may lack `capture-pane -p` flag or have different `send-keys` behavior |
| BATS submodules missing in container | Build context doesn't include submodule contents |
### Mitigations
| Failure | Addressed By | Status |
|---------|--------------|--------|
| BusyBox incompatibilities | Testing on Alpine surfaces these; fixes go into `git-harden.sh` | Mitigated |
| Arch keyring breakage | Pinned to `archlinux:base` (monthly snapshots); update in a PR when needed | Accepted Risk |
| `wait_for` race conditions | 0.2s polling interval is fast enough for human-speed prompts; `git-harden.sh` blocks on `read` so prompts persist until input arrives | Mitigated |
| tmux version differences | `capture-pane -p` available since tmux 1.8 (2013); all target distros ship tmux >= 3.x | Mitigated |
| Missing BATS submodules | Containerfile copies `test/libs/` explicitly; build-time check | Mitigated |
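The build-time check for BATS submodules could be a host-side pre-flight before any image is built. A sketch, assuming a hypothetical `check_bats_libs` helper and that every subdirectory of `test/libs/` is a submodule that must be non-empty.

```bash
#!/usr/bin/env bash
set -euo pipefail

# check_bats_libs [DIR]
# Hypothetical host-side pre-flight, run before any image is built. Fails if
# DIR (default: test/libs) is missing or any subdirectory of it is empty,
# which usually means the submodules were never initialized; COPY would
# then ship empty directories into the image.
check_bats_libs() {
    local libs="${1:-test/libs}" dir status=0
    if [[ ! -d "$libs" ]]; then
        printf 'error: %s not found\n' "$libs" >&2
        return 1
    fi
    for dir in "$libs"/*/; do
        [[ -d "$dir" ]] || continue  # glob matched nothing
        if [[ -z "$(ls -A "$dir")" ]]; then
            printf 'error: %s is empty; run: git submodule update --init --recursive\n' "$dir" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running this on the host catches the problem before a multi-minute build, and the error message names the fix.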
## Acceptance Criteria
### Must Have
- [ ] **`test/e2e.sh` runs full matrix and reports per-distro results**
  - Given: Docker or Podman is installed and running
  - When: `test/e2e.sh` is invoked with no arguments
  - Then: All 5 distros are tested; output shows PASS/FAIL per distro; exit code reflects overall result
- [ ] **Single-distro mode works**
  - Given: Docker or Podman is installed
  - When: `test/e2e.sh ubuntu` is invoked
  - Then: Only the Ubuntu container is built and tested
- [ ] **`--runtime` flag selects container engine**
  - Given: Both Docker and Podman are installed
  - When: `test/e2e.sh --runtime podman` is invoked
  - Then: Podman is used exclusively
- [ ] **All existing BATS tests pass on every distro in the matrix**
  - Given: Containers are built from Containerfiles
  - When: `test/run.sh` executes inside each container
  - Then: All tests pass (exit 0) on Ubuntu, Debian, Fedora, Alpine, and Arch
- [ ] **Containers run with no network and no root**
  - Given: Any distro container
  - When: Inspecting the `docker run` / `podman run` command
  - Then: `--network=none` is set and the test user is non-root
- [ ] **Runner handles missing container runtime gracefully**
  - Given: Neither docker nor podman is on `$PATH`
  - When: `test/e2e.sh` is invoked
  - Then: Prints actionable error and exits 1
- [ ] **`test/e2e.sh` passes shellcheck**
  - Given: The runner script exists
  - When: `shellcheck test/e2e.sh` is run
  - Then: No warnings or errors
- [ ] **Interactive apply flow works end-to-end via `tmux`**
  - Given: A container with no prior git hardening and `tmux` installed
  - When: A `tmux`-driven script runs `git-harden.sh` (no flags), answering `y` to the safety review gate and then `y` to all subsequent prompts
  - Then: All settings applied; `git-harden.sh --audit` exits 0 afterward
- [ ] **Safety review gate decline exits cleanly**
  - Given: A container with `tmux` installed
  - When: A `tmux`-driven script runs `git-harden.sh` (no flags), answering `n` to the safety review gate
  - Then: Script exits 0; output contains AI review instructions; no config changes made
- [ ] **Signing wizard key generation works via `tmux`**
  - Given: A container with no existing SSH keys
  - When: A `tmux`-driven script runs `git-harden.sh`, passes the safety review gate, selects option 1 (generate ed25519), and provides an empty passphrase
  - Then: `~/.ssh/id_ed25519.pub` exists; `user.signingkey` is configured; `commit.gpgsign=true`
- [ ] **Signing wizard skip leaves signing unconfigured**
  - Given: A container with no existing SSH keys
  - When: A `tmux`-driven script runs `git-harden.sh`, passes the safety review gate, and selects `s` (skip) at the signing menu
  - Then: `user.signingkey` is not set; `commit.gpgsign` is not set
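The Then clauses of the signing-wizard criteria can be checked with plain `git config` queries inside the container once `git-harden.sh` has exited. A sketch with hypothetical helper names; normalizing `commit.gpgsign` with `--type=bool` (available in git >= 2.18, below the spec's 2.34 minimum) is a choice, not something the spec requires.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical assertion helpers for the signing-wizard criteria. They read
# the test user's global git config and $HOME/.ssh.

# assert_signing_configured: key file exists, user.signingkey is non-empty,
# and commit.gpgsign is true (normalized with --type=bool).
assert_signing_configured() {
    if [[ ! -f "$HOME/.ssh/id_ed25519.pub" ]]; then
        echo "missing $HOME/.ssh/id_ed25519.pub" >&2
        return 1
    fi
    if [[ -z "$(git config --global --get user.signingkey || true)" ]]; then
        echo "user.signingkey is not set" >&2
        return 1
    fi
    if [[ "$(git config --global --type=bool --get commit.gpgsign || true)" != true ]]; then
        echo "commit.gpgsign is not true" >&2
        return 1
    fi
}

# assert_signing_unset: neither user.signingkey nor commit.gpgsign is set.
assert_signing_unset() {
    local key
    for key in user.signingkey commit.gpgsign; do
        if git config --global --get "$key" >/dev/null; then
            echo "$key is set but should not be" >&2
            return 1
        fi
    done
}
```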
### Should Have
- [ ] **Build failures don't abort the full matrix**
  - Given: One distro's Containerfile has a broken package install
  - When: `test/e2e.sh` runs the full matrix
  - Then: The broken distro is marked FAIL; remaining distros still run
- [ ] **Summary table at end of full run**
  - Given: Full matrix completes
  - When: Runner finishes
  - Then: A table showing distro name + PASS/FAIL + duration is printed to stderr
### Could Have
- [ ] Parallel distro execution (run containers concurrently for faster feedback)
- [ ] `--rebuild` flag to force image rebuild ignoring cache
### Won't Have (This Release)
- [ ] GitHub Actions / CI integration (separate concern, separate spec)
- [ ] macOS container testing
- [ ] Windows container testing
- [ ] Automatic base image tag bumping / Dependabot-style updates