CI Validation¶
Pulp validates branches on macOS (local), Ubuntu (SSH), and Windows (SSH) before merging.
Setting up a dedicated machine as a persistent CI runner? See self-hosted-runner.md for the walkthrough + first-run gotchas (git-lfs hook conflict, Xcode license, Apple Clang version skew).
Primary: Shipyard¶
Shipyard is Pulp's primary CI tool. It delivers exact SHAs via git bundles, runs your build/test commands on each platform, and gates merges on per-SHA evidence.
./tools/install-shipyard.sh # install pinned version
./tools/install-shipyard.sh --status # compare installed vs pinned
shipyard run # validate current branch
shipyard pr # create, track, validate, and merge on green
shipyard cloud run build <branch> # dispatch to Namespace
shipyard rescue <PR> # recover a wedged PR
shipyard runner watch --kill-hung-workers # prevent self-hosted runner wedges
shipyard update --check --json # report installed vs latest
Runner timing metrics¶
Pulp does not store CI timing history in the Pulp CLI or MCP server. When a checkout uses Shipyard, and optionally tartci for disposable local VMs, Shipyard owns the timing database and query surface:
- Shipyard can import GitHub Actions job timings and local command evidence.
- tartci can optionally emit per-VM runtime records for macOS, Linux, and Windows VM lanes: boot/setup/run/cleanup durations, labels, host, provider, golden/cache hints, outcome, and failure class.
- Shipyard imports those tartci records into its local metrics store and exposes agent-readable summaries, slowest lanes, trend/drift checks, comparisons, and placement advice.
This is mainly for agents watching Pulp CI over time. It gives them enough history to answer "is this runner behaving normally?", "did boot/build time regress?", "which lane should I monitor next?", and "is this worth investigating or just within the usual range?" Humans can use the same commands for high-level platform comparisons, but no observability service is required.
The shipyard metrics commands require a Shipyard build that includes the
metrics subcommand. Pulp's current pinned source-checkout version in
tools/shipyard.toml is v0.68.0, which does not include that surface yet. Use
a newer Shipyard binary or source checkout for this optional metrics workflow
until the Pulp pin is bumped.
# Enable VM runtime records on tartci hosts or LaunchAgents.
export TARTCI_RUNTIME_MEASURE=1
export TARTCI_RUNTIME_GH_ENRICH=1
# Inspect tartci's local VM timing records.
tartci runtime recent --repo danielraffel/pulp --limit 20 --json
tartci runtime summary --repo danielraffel/pulp --json
# Import both GitHub Actions and tartci VM timing into Shipyard's metrics store.
shipyard metrics import github --repo danielraffel/pulp --limit 50 --json
tartci runtime export --repo danielraffel/pulp --since-days 14 \
| shipyard metrics import tartci --json
# Agent-friendly queries.
shipyard metrics summary --project pulp --json
shipyard metrics slowest --project pulp --limit 20 --json
shipyard metrics watch --project pulp --since 14d --json
shipyard metrics advise --project pulp --json
Use the Shipyard and tartci docs for setup details; this guide only records how
Pulp expects agents and contributors to consume the optional integration.
Without tartci, shipyard metrics import github and manual/command metrics
still work for GitHub-hosted or SSH-backed CI lanes.
Pulp intentionally pins Shipyard in tools/shipyard.toml even if your daily
global shipyard is newer. Use shipyard pin bump --to vX.Y.Z for pin
updates instead of hand-editing the file; newer Rust Shipyard releases changed
the macOS asset shape to a signed/notarized .dmg, and the bump command keeps
the version and asset metadata in sync.
The public Pulp installer does not install Shipyard or GitHub CLI (gh).
That is intentional: ordinary Pulp users do not need either tool to create,
build, run, or upgrade projects. They are source-checkout contributor tools.
pulp pr defaults to Shipyard and fails with install/switch guidance if
Shipyard is missing; contributors who prefer their own PR flow can set
pulp config set pr.workflow github or manual. The github workflow uses
gh directly and requires it to be installed and authenticated. Run
pulp status to see the effective workflow and local tool health.
Optional local VM routing¶
Core Pulp development can also use local, disposable VMs through tartci. This is optional: a normal contributor can open a PR and let GitHub Actions run on hosted runners. The value of the local VM setup is faster feedback on trusted Apple Silicon hardware while keeping every job clean-per-run.
The current Pulp routing policy is intentionally kept in parseable TOML at
.shipyard/ci-profiles/normal-local-fast.toml
instead of copied into this guide. It names the PR, release, coverage,
scheduled, and issue-on-failure policies and maps stable target IDs to concrete
GitHub runs-on selectors. Shipyard owns orchestration and profile selection;
tartci owns the local VM providers, goldens, host caches, and per-host status.
Use the upstream docs for details:
- Shipyard profiles explain how profiles and fallback resolution work.
- tartci explains the Tart/QEMU VM
lanes,
tartci status --json, andtartci profile explain|plan. - mac-ci-host-setup.md is the Pulp-specific host setup guide for joining the macOS VM pool.
Shipping a PR: shipyard pr¶
shipyard pr is the single "ship this" orchestrator. Agents and humans should
route every normal ship cycle through it rather than pairing gh pr create
with shipyard ship manually. It:
- Runs
tools/scripts/skill_sync_check.py(hard-fails on missing SKILL.md updates). - Runs
tools/scripts/version_bump_check.py --mode=applyto bump SDK / Claude plugin / marketplace versions consistently. - Commits the bump (if any) as
chore: bump <surfaces>. - Pushes the branch, creates the PR, and records Shipyard tracking state.
- Runs cross-platform validate + merge on green.
- The auto-release workflow tags and publishes binaries on merge.
shipyard pr # primary ship path
shipyard pr --base develop/package-manager # ship to a develop branch
shipyard pr --title "..." # override PR title
shipyard pr --dry-run # print the plan without executing
pulp pr is a compatibility wrapper that delegates to shipyard pr by
default; it is valid, but guidance should name shipyard pr directly so
humans and agents understand where PR tracking state lives. Its github and
manual workflows are explicit local opt-outs and do not create Shipyard
tracking state.
Direct gh pr create is an emergency/manual bypass only. If it is used, call
out that the PR may not appear in Shipyard-managed state until it is reconciled
or re-shipped through Shipyard.
Shipyard v0.3.0 workflow surface¶
Shipyard v0.3.0 adds stateful ship resume, SSH --resume-from staging, and
incremental git bundles on top of the basic run / ship / cloud run
surface above.
# Resume an interrupted ship
shipyard ship --resume # pick up where the last session left off
shipyard ship --no-resume # discard stale state and ship fresh
# Inspect in-flight ship state
shipyard ship-state list # self-describing inventory: PR, title, URL, tip SHA, dispatched run IDs
shipyard ship-state show <pr> # full state for one PR
shipyard ship-state discard <pr> # archive stale state
# Prune old ship state + evidence
shipyard cleanup --ship-state # dry run — show what would be pruned
shipyard cleanup --ship-state --apply # prune closed-PR state + aged records
# Fast test iteration on any target
shipyard run --resume-from build # skip configure+setup, start at the build stage
shipyard run --resume-from test # skip configure+build, run tests only
shipyard run --targets windows --smoke # fast Windows-only preflight
shipyard run --targets windows --resume-from test # ~2 min rerun vs ~15 min full
# Target and config inspection
shipyard targets # list configured targets with reachability
shipyard targets test windows # probe a single target
shipyard config show # effective merged config
shipyard config profiles # list profiles plus the active one
Ship state lives at <state_dir>/ship/<pr>.json. Shipyard auto-resumes the
next time you run shipyard ship on the same PR — it refuses to resume if
the PR's head SHA or merge policy changed since the state was written, so a
rebase or force-push deliberately forces a fresh ship.
--resume-from works on both local and SSH targets. On SSH targets,
Shipyard probes the remote for a marker file proving the previous stage
passed for the exact SHA, and skips earlier stages when it finds one.
Incremental bundles — SSH validation now sends only the git delta between the remote HEAD and the target SHA. Typical cycles drop from ~443 MB to a few KB. No configuration needed — Shipyard falls back to a full bundle automatically when the delta would be larger than the full pack.
Validation Profiles¶
Shipyard validates from a profile (shipyard run --pipeline <name>).
Pulp's .shipyard/config.toml defines three:
| Profile | When to use | What it runs |
|---|---|---|
default |
Most PRs. The lane every cross-platform target gates on. | Full setup → configure → build → test. Examples ON. Excludes the slow ctest label. |
parser |
PRs that only touch runtime-import parser code. | Same stages with PULP_BUILD_EXAMPLES=OFF; tests filter to --label-include parser-import. Skips plugin validators (auval / pluginval / clap-validator) and the broader format-adapter smoke surface. |
smoke |
Quick downstream-scaffold check after dependency or install-layout edits. | Configure + build only; runs the SDK-smoke export against a downstream scaffold. |
gates |
Version-bump / skill-sync gate scripts. | tools/scripts/skill_sync_check.py + tools/scripts/version_bump_check.py in report mode. |
shipyard config profiles lists what is installed locally and which one is active.
Auto-selecting the parser profile¶
tools/scripts/validation_profile_select.py classifies the current diff
and prints parser or default:
# Default: diff HEAD against origin/main
shipyard run --pipeline "$(python3 tools/scripts/validation_profile_select.py)"
# Explicit diff base
shipyard run --pipeline "$(python3 tools/scripts/validation_profile_select.py --base origin/develop)"
# Operate on a literal file list (e.g. piped from gh pr diff)
gh pr diff <PR> --name-only \
| python3 tools/scripts/validation_profile_select.py --paths-from -
# JSON envelope (profile + matched + unmatched)
python3 tools/scripts/validation_profile_select.py --json
The script returns parser only when every changed path falls inside
the explicit parser-only scope (the standalone tools/import-design
tool, tools/import-validation scripts, the packages/pulp-import-ir
package, test/fixtures/imports/**, the parser test files in test/,
the core/view/.../design_import* family, and the import-runtime JS).
Any path outside that set forces default — the safety bias is toward
broad validation.
To opt out for an individual run, pass --pipeline default explicitly.
macOS overflow routing (Plan B)¶
Namespace is OFF (cost). We build macOS on local Macs + GitHub-hosted only.
PULP_NAMESPACE_BUILD_MACOS_RUNS_ON_JSONis kept UNSET, so the Namespace overflow described here never fires — it's a documented break-glass option, not the active path. The local overflow tier is the M5 Mac (PULP_OVERFLOW_BUILD_MACOS_RUNS_ON_JSON = pulp-build-m5); the required gate is the Mac Studio (PULP_LOCAL_MACOS_RUNS_ON_JSON). Do not repurpose the Namespace var to point at self-hosted runners (see CLAUDE.md "Runner priority").
When the local self-hosted Mac runner is saturated, build.yml's
resolve-provider job could route new macOS legs to a Namespace cloud runner
instead of queueing on local — but only if the Namespace var is set, which it is
not. Snap-back is automatic — once local clears, fresh dispatches return to
local. Source of truth:
planning/2026-05-13-namespace-overflow-implementation.md (Plan B,
reviewed by /codex 2026-05-13).
Precedence (highest first, resolved per dispatch):
- Operator override —
gh workflow run build.yml --field macos_runner_selector_json='"<label>"'. Always wins. - Overflow —
BUSY >= PULP_LOCAL_MAC_OVERFLOW_THRESHOLD(default2) ANDPULP_NAMESPACE_BUILD_MACOS_RUNS_ON_JSONis set AND the trigger ispull_request. Routes tonamespace-profile-generouscorp-macos. - Local default —
PULP_LOCAL_MACOS_RUNS_ON_JSON(currently["self-hosted","sanitizer"]).
Tuning knobs (repo variables):
| Variable | Default | Purpose |
|---|---|---|
PULP_LOCAL_MAC_OVERFLOW_THRESHOLD |
2 |
BUSY count that triggers overflow. Raise when Plan A's 2nd local runner lands. |
PULP_LOCAL_MAC_RUNNER_LABEL |
sanitizer |
Label the busy probe filters runners by. Must match PULP_LOCAL_MACOS_RUNS_ON_JSON's target label. |
PULP_NAMESPACE_BUILD_MACOS_RUNS_ON_JSON |
(unset) | Namespace selector JSON, e.g. "namespace-profile-generouscorp-macos". When empty, overflow is disabled — everything stays local. |
Disabling overflow (revert to local-only):
The next PR's resolve-provider falls through to local since condition #2 fails.
Inspecting routing decisions: resolve-provider's stderr prints a one-line summary, e.g. resolve-provider: macOS route = overflow (BUSY=2 >= 2); selector = "namespace-profile-generouscorp-macos". Find it via the GitHub Actions UI under the resolve-provider job's log, or:
Manual overflow / rescue is still available via shipyard rescue <PR> and remains useful for in-flight PRs that queued before the overflow logic kicked in. With Plan B in place, manual rescue should be needed much less frequently.
pulp overflow — operator surface¶
tools/cli/cmd_overflow.cpp wraps the three repo variables behind a discoverable CLI:
# Show current routing state (local target, overflow target, threshold,
# plus self-hosted runner registration if visible to the default token):
pulp overflow status
# Turn overflow on (defaults to free GH-hosted "macos-15"):
pulp overflow enable
pulp overflow enable --to '"namespace-profile-generouscorp-macos"' # paid Namespace
# Turn overflow off — every macOS leg goes to the local target.
# In-flight cloud jobs continue to completion; only new dispatches change.
pulp overflow disable
# Read / set the BUSY threshold (default 2; set to 1 for single-runner setups):
pulp overflow threshold
pulp overflow threshold 1
pulp overflow disable does not cancel in-flight cloud runs — it's a
config-change only. To force a currently-routed-to-cloud PR back to local,
use pulp macos retarget --pr N --to local (see "Per-PR macOS retargeting"
below).
Per-PR macOS retargeting (pulp macos)¶
For the case where automatic overflow picked the "wrong" pool — e.g. you want to push a specific PR to Namespace for paid-fast turnaround, or pull a queued GH-hosted job back to the local Mac because local just freed up — use the build-macos.yml workflow + the pulp macos CLI:
# Switch PR's macOS leg to the local self-hosted Mac, freeing the GH-hosted slot:
pulp macos retarget --pr 1910 --to local
# Pay to skip the queue (Namespace billable, fast parallel):
pulp macos retarget --pr 1910 --to namespace
# Force GH-hosted macos-15 (free, slower):
pulp macos retarget --pr 1910 --to github-hosted
# See where the current macOS check is routed:
pulp macos status --pr 1910
pulp macos retarget cancels any in-flight macOS-bearing workflow_run for the PR and fires a fresh build-macos.yml dispatch on the chosen runner. Branch protection's required macos check is satisfied by whichever workflow most recently produced that check name, so retargeting supersedes the previous macOS leg without re-running Linux/Windows.
build-macos.yml is independent of build.yml's matrix — they share check names but not workflow_runs. The matrix workflow continues running Linux/Windows as usual; only the macOS leg is replaced.
Workflow inputs (visible in gh workflow run build-macos.yml --help):
| Input | Default | Effect |
|---|---|---|
runner |
local |
Routes to PULP_LOCAL_MACOS_RUNS_ON_JSON |
runner=namespace |
— | Routes to PULP_NAMESPACE_BUILD_MACOS_RUNS_ON_JSON |
runner=github-hosted |
— | Routes to "macos-15" (free GH-hosted) |
target_ref |
(workflow's ref) |
Branch / SHA to build |
Opportunistic reroute daemon¶
tools/scripts/macos_reroute_watcher.py is a long-running watcher (intended as a launchd agent on the self-hosted Mac) that automates the "when local frees up, claw back queued GH-hosted jobs" pattern. It polls every 30 seconds:
- Is the local Mac runner idle? (process-based detection via
ps; no admin token needed.) - Is there a queued Build-and-Test workflow_run whose macOS job has
macos-15(ornscloud-*/namespace-profile-*) labels — i.e., dispatched to cloud but not yet picked up?
When both conditions hold, the watcher invokes pulp macos retarget --pr N --to local to cancel the cloud dispatch and rerun on local with warm caches. A 5-minute flap-guard prevents repeatedly bouncing the same PR.
Install (one-time per host):
# Copy the template into LaunchAgents, substituting your Pulp checkout path:
sed "s|\$PULP_REPO|$PWD|g" \
tools/launchd/pulp-macos-reroute-watcher.plist.template \
> ~/Library/LaunchAgents/com.danielraffel.pulp.macos-reroute-watcher.plist
launchctl load ~/Library/LaunchAgents/com.danielraffel.pulp.macos-reroute-watcher.plist
# Logs:
tail -F ~/Library/Logs/pulp/macos-reroute-watcher.log
Run by hand for testing:
Stop:
The watcher is safe to run alongside the overflow probe in build.yml — they cooperate. The probe decides where to dispatch initially; the watcher opportunistically reroutes dispatches that landed on cloud while local was busy, once local frees up before the cloud runner has picked up the job. If the cloud runner has already started, the watcher takes no action.
Self-hosted runner operations: prevent, recover, maintain¶
Shipyard v0.55.0+ is the minimum pin for the full self-hosted-runner
operational toolkit, and Pulp pins v0.56.2+ so rescue, update, and
shipyard wait pr have REST fallback paths when GraphQL is rate-limited. The
commands are discoverable from shipyard --help and replace the legacy
planning/scripts/runner-watchdog.sh + manual reinstall workflow.
# Recover one PR whose required macOS check is wedged or stale
shipyard rescue <PR> # cancel queued runs + redispatch to github-hosted
shipyard rescue <PR> --rerun-failed # also re-arm cancelled/failed runs
shipyard rescue <PR> --dry-run # preview without acting
shipyard rescue --all-stuck # repo-wide stuck-run sweep
shipyard rescue <PR> --to github-hosted # explicit destination provider
# Prevent future wedges on a self-hosted runner host
shipyard runner watch --kill-hung-workers # implies --fix; pair with launchd/systemd
# Keep the installed Shipyard CLI current
shipyard update --check --json # report installed vs available
shipyard update # apply latest stable
shipyard update --to v0.56.2 # pin or roll back to Pulp's minimum
shipyard update --dry-run # plan only
# Wait after handoff/rescue without depending solely on GraphQL
shipyard wait pr <PR> --state green # REST fallback as of v0.56.2
GraphQL quota fallback for PR sweeps¶
The gh pr ... --json and gh pr merge paths can consume or require GitHub's
GraphQL quota. That quota is separate from the REST core quota and can hit
zero while REST still has thousands of calls available.
When a broad PR sweep hits GraphQL exhaustion, switch the sweep to REST instead of waiting:
gh api rate_limit --jq '.resources | {core, graphql}'
gh api repos/OWNER/REPO/pulls/PR
gh api repos/OWNER/REPO/commits/SHA/check-runs?per_page=100
gh api repos/OWNER/REPO/actions/jobs/JOB_ID/logs
For a PR already verified green through REST, merge through the REST endpoint:
head_sha=$(gh api repos/OWNER/REPO/pulls/PR --jq '.head.sha')
gh api repos/OWNER/REPO/pulls/PR/merge \
-X PUT \
-f sha="$head_sha" \
-f merge_method=squash \
-f commit_title='subject (#PR)'
If the merge endpoint returns 405 Base branch was modified, refresh the PR
state and check runs through REST, recompute head_sha, then retry once only
if the refreshed head SHA and green status are still the values you intend to
merge. This is a transport fallback, not a validation bypass: do not merge
around real CI, coverage, sanitizer, or review failures.
Use shipyard rescue when a PR is otherwise ready but blocked by queued,
cancelled, or failed runner contexts caused by a self-hosted-runner wedge. It is
the PR-side recovery path and avoids the old failure mode where cancelling
queued runs left required checks stuck as failure.
Use shipyard runner watch --kill-hung-workers on the runner host itself. It
auto-cancels stale queued runs and kills hung Runner.Worker processes through
Shipyard's safe recovery sequence: snapshot, SIGTERM, grace period, SIGKILL,
child reaping, partial-build quarantine, Listener verification, and optional
wait for GitHub status to flip. Its JSON output uses runner.watch envelopes
with event=auto_kill_worker and phase values of attempt, killed,
failed, or no-pid-found.
Use shipyard update instead of the old ad hoc curl install.sh | sh path
once a machine already has Shipyard installed. Pulp still records the canonical
repo pin in tools/shipyard.toml; shipyard update --check --json is the
machine-local drift check, while shipyard pin bump --to vX.Y.Z is the repo
pin-change workflow.
Required Merge Process (All Agents)¶
Every change to main must go through this workflow — no exceptions:
- Branch — work on
feature/*orfix/*, never directly on main - Ship — run
shipyard prto create and track the PR, validate on macOS + Ubuntu + Windows, and merge on green - GitHub Actions — PR also triggers build+test CI on all 3 platforms (redundant safety net)
The ci skill (.agents/skills/ci/SKILL.md) captures this as the authoritative trigger list — natural-language phrases like "ship this", "push a PR", "we're done", and "run CI" all route through shipyard pr.
Legacy: pulp ci-local¶
tools/local-ci/local_ci.py is the previous CI controller. It remains available as a fallback but is scheduled for removal (see issue #120).
TL;DR¶
pulp ci-local runqueues the currentHEADin a machine-global queue shared by every worktree on that Mac.pulp ci-local run <branch>queues that branch tip's exact commit SHA, not the launching checkout'sHEAD.pulp ci-local run --smokequeues a fast clean install/export preflight instead of a full test run.- The queue serializes jobs, not targets. One CI job runs at a time, but its requested targets (
mac,ubuntu,windows) run in parallel inside that job. - Mac runs locally. Ubuntu and Windows run over SSH against repos you already cloned on those machines.
- Remote targets validate the exact queued git SHA, not "whatever the branch points to later". The runner uploads that SHA as a git bundle before validation, so full-matrix checks do not depend on the host already seeing your latest branch tip.
pulp ci-local statusshows the active runner, pending jobs, SSH/VM reachability, and live per-target state for the running job.pulp ci-local bump <job-id> highmoves a pending job forward.- queueing now prints the submission root, current cwd, config path/source, and per-target host preflight before a job is recorded
- queueing fails fast if you launched from the wrong git root or selected an SSH target that is currently unreachable with no fallback, unless you explicitly override that safety check
- While a job is running,
pulp ci-local statusalso shows live per-target state such asmac=pass, ubuntu=pass, windows=running. - Quiet long-running targets now emit runner heartbeats, so
statuscan showheartbeat=...,idle=..., andliveness=quiet|stuckeven when the underlying toolchain has not printed a new line recently. - If you queue a newer SHA for the same branch, targets, and validation mode, older pending work is superseded automatically instead of sitting behind it forever.
pulp ci-local logs <job-id> --target windowstails the saved per-target log from the machine-global CI state dir, so you do not need ad hoc SSH just to see whether a target is building or testing.pulp ci-local evidence [branch]shows the last-good exact-SHA target evidence already recorded for a branch, so you can keep earlier same-SHA passes instead of rerunning them blindly.pulp ci-local cleanupshows reclaimable local-CI disk usage without deleting anything;--applyis blocked while jobs are running.pulp ci-local cloud workflowslists the GitHub Actions workflows that the local CI control plane knows how to dispatch, plus which runner providers each one supports.pulp ci-local cloud run <workflow> [branch]dispatches a GitHub Actions workflow deliberately when workflow semantics or neutral-host confirmation matter more than the local queue.pulp ci-local cloud statusshows the latest tracked GitHub Actions dispatches that this machine has launched;pulp ci-local statusincludes the same recent cloud summary alongside local queue state.- Persistent local CI hosts now keep a prepared root per
target + validationso a narrow same-SHA rerun can reuse earlier work instead of rematerializing from scratch. - If a runner is interrupted, the queued job keeps its last-known per-target state so you can see what already passed before deciding whether to rerun everything or just the remaining target.
- Jobs submitted through
pulp ci-localare globally queued, and validation itself now takes a per-host lock on macOS/Linux plus a Windows host mutex, so oldvalidate-build.shruns wait instead of colliding. - SSH targets receive a per-job git bundle before validation. That keeps exact-SHA validation working even when the host validates from a stale local mirror instead of GitHub directly.
- Windows SSH jobs execute from short detached worktrees under
C:\pulp-ci, and stale worktree metadata is pruned automatically before reruns. - If a stale runner leaves behind an old Windows validator, the next drain pass now targets that specific remote validator PID for cleanup before starting new work, and
statuskeeps the cleanup result visible. - For Windows SSH validation, choose the configured target whose non-interactive PowerShell context resolves
git,cmake, andctest. Keep those host aliases local to your environment; shared repo docs should describe the selection rule, not your personal machine names. - Reuse is a persistent-host feature for local macOS and SSH-backed/self-hosted hosts. Ephemeral cloud runners should keep the default clean path unless a later policy explicitly opts them in.
- Truly raw ad hoc
ssh,cmake, or custom background processes still bypass coordination until they are stopped or migrated.
Why local instead of cloud¶
Pulp has GitHub Actions workflows for CI, but running them on every branch costs money. Local CI is free and faster for iterative development — you get results in minutes from machines you already own or have running locally. Cloud CI remains available for release branches, public PRs, and the narrow cases where you need workflow-level or neutral-host confirmation.
Cloud orchestration is now available through the same control plane:
pulp ci-local cloud workflowspulp ci-local cloud run <workflow> [branch]pulp ci-local cloud status [dispatch-id|latest]
That cloud surface is intentionally separate from the local queue. run,
check, ship, enqueue, and drain still operate on the exact-SHA
local/SSH queue. cloud run dispatches GitHub Actions explicitly and tracks the
result beside local CI state instead of pretending a hosted workflow is just
another local target.
Namespace is now wired into the deliberate cloud companion path for both
docs-check.yml and build.yml. The normal day-to-day default remains
local-first: macOS runs locally, while deliberate cloud dispatches can route
Linux/Windows through Namespace and keep macOS local unless you opt into a
one-off cloud macOS selector.
How it works¶
When you run pulp ci-local, it:
- Queues a job in a machine-global queue shared by every worktree on that Mac
- Prints the exact queue intent first: submission root, cwd, config path/source, and remote-host preflight
- Runs only one queue drain owner at a time, so separate agents do not stampede the same Mac and VMs
- Validates locally on Mac via
./validate-build.sh --ref <sha> - For each SSH target in
config.json: uploads a per-job git bundle, injects that exact SHA into the configured repo on the host, then validates it there - If an SSH target is unreachable, it tries to start the corresponding UTM VM, waits for it to boot, then retries the SSH connection
- Drains queued work on login or wake if you install the launchd agent
Mac validation always runs. SSH targets are skipped if disabled in config.
GitHub Actions companion¶
Use the cloud subcommands when you want GitHub Actions as the orchestrator,
not when you want another exact-SHA local queue job:
pulp ci-local cloud workflows
pulp ci-local cloud defaults
pulp ci-local cloud history
pulp ci-local cloud compare build
pulp ci-local cloud recommend build
pulp ci-local cloud run build feature/my-branch
pulp ci-local cloud run build feature/my-branch --provider namespace
pulp ci-local cloud run build feature/my-branch --provider namespace --macos-runner-selector-json '"namespace-profile-big-apple"'
pulp ci-local cloud run build feature/my-branch --provider namespace --macos-runner-selector-json '"nscloud-macos-tahoe-arm64-6x14"'
pulp ci-local cloud run docs-check feature/my-branch --provider namespace --wait
pulp ci-local cloud run docs-check feature/my-branch --provider namespace --runner-selector-json '"namespace-profile-big-apple"'
pulp ci-local cloud namespace doctor
pulp ci-local cloud namespace setup
pulp ci-local cloud status
pulp ci-local cloud status latest --refresh
Important constraints in the current phase:
cloud rundispatches by branch name, not by a detached exact SHA- cloud dispatch records are persisted under the same machine-global CI state
directory as local results, but they do not enter
queue.json - local
statusremains fast and local-first; it shows the latest tracked cloud summaries without hitting GitHub unless you explicitly runcloud status --refresh cloud defaultsshows the effective workflow/provider defaults plus where the current selector values came from (local config versus repo-variable fallback)cloud historyshows recent tracked cloud runs with saved timing plus any configured estimated cost line itemscloud compare <workflow>rolls up observed provider medians for a workflow from tracked run historycloud recommend <workflow>suggests a provider from recorded cloud history; it is intentionally conservative and uses observed medians instead of hardcoded guessescloud statusnow reports Namespace runtime/machine-shape truth when the run was launched on Namespace andnsccan see the matching instances- tracked cloud runs now persist queue-delay and elapsed-duration timing so the later comparison view can answer "how long did GitHub-hosted vs Namespace take?" from saved run history instead of rough notes
- estimated cost output is opt-in via local config; every estimate is labeled
estimated; verify provider pricing - if the provider CLI does not expose billing totals, Pulp keeps reporting runtime and machine shape instead of inventing invoice truth
- if a Namespace dispatch dies in
resolve-providerbefore any matrix leg starts, inspect the GitHub run annotations first; provider billing or control-plane failures are a different problem from repo or workflow breakage build.ymlnow acceptsrunner_providerand routes Linux and Windows through the selected provider; macOS is omitted from the cloud build by default so it can stay local-first- Default provider for PR checks is controlled by the GitHub repo variable
PULP_DEFAULT_RUNNER_PROVIDER. Set it tonamespaceto route all PR checks through Namespace runners (faster, parallel). Set togithub-hostedto use GitHub-hosted runners (free tier, queued). Theworkflow_dispatchinput overrides this for manual runs. To change the default: buildalso accepts one-off leg overrides:--linux-runner-selector-json,--windows-runner-selector-json, and--macos-runner-selector-json; that means you can keep the normal Linux/Windows Namespace + macOS local default and still do an explicit one-off macOS Namespace build without changing saved config- those one-off selector overrides can be either:
a Namespace profile label such as
"namespace-profile-generouscorp-macos", or a direct Namespace machine label such as"nscloud-macos-tahoe-arm64-6x14" docs-checkaccepts an explicit--runner-selector-jsonoverride, for example"namespace-profile-default"or["self-hosted","linux"]- if no explicit selector is passed,
docs-checkfalls back togithub_actions.workflows.docs-check.providers.<provider>.runner_selector_jsonin local config when present, then to the repo variablePULP_NAMESPACE_DOCS_CHECK_RUNS_ON_JSONfor the Namespace provider buildcan take Linux/Windows Namespace selectors fromgithub_actions.workflows.build.providers.namespace.linux_runner_selector_jsonand.windows_runner_selector_jsonin local config, and the workflow also supports repo-variable fallbacksPULP_NAMESPACE_BUILD_LINUX_RUNS_ON_JSONandPULP_NAMESPACE_BUILD_WINDOWS_RUNS_ON_JSON- macOS Namespace is an explicit validation path, not part of the default cloud
build: if you want to test macOS on Namespace, provide
--macos-runner-selector-json, or setgithub_actions.workflows.build.providers.namespace.macos_runner_selector_jsonin local config, orPULP_NAMESPACE_BUILD_MACOS_RUNS_ON_JSON - make sure that selector points at a real macOS-capable Namespace profile:
GitHub job names alone do not guarantee the underlying OS, and a Linux-backed
profile can still satisfy the
runs-onlabel while executing the leg on Linux instead of macOS - if you want macOS to stay local-first by default, leave the macOS selector
unset in shared config and repo variables, and pass
--macos-runner-selector-jsononly for one-off validation runs - for the Namespace path, install the
nscCLI and runnsc loginfirst before trying to route work there; that is the recommended operator setup path for this pilot - SSH/VM target topology and Namespace provider setup stay separate:
targets.*still configures local/SSH validation hosts, while Namespace provider routing lives under the GitHub Actions workflow/provider config and thecloud namespacehelper commands
Fast-CI vs full-CI (build.yml)¶
Issue #1589 split the Build and Test workflow into two test
trajectories without forking the YAML:
- Fast-CI runs on
pull_requestevents. The ctest invocation excludes BOTH thevalidationandslowCTest labels, dropping the longest-running tests so PR cycle time stays tight. Examples that carryLABELS slowtoday (defined intest/CMakeLists.txt): cmake-ios-auv3-configure— fresh-cache ~3 min iOS-leg configurecmake-pulp-add-binary-data-encoder/cmake-pulp-install-layoutpulp-test-hot-reload,pulp-test-scripted-ui,pulp-test-scan-cache,pulp-test-scan-blacklist(filesystem-mtime sleep loops)-
pulp-test-sync,pulp-test-sync-race-hammer,pulp-test-events-timer-helpers(race + timer hammers; also covered under sanitizer.yml's TSan lane) -
Full-CI runs on
pushtomain, the nightly schedule, andworkflow_dispatch. Only thevalidationlabel is excluded — everyslow-labelled test runs before code lands on the release lane.
Both paths satisfy the branch-protection-required macos /
advisory linux / advisory windows alias gates because the alias
jobs read each matrix leg's outcome via the GitHub API.
Tagging a new test as slow¶
Add LABELS slow either to a single test's set_tests_properties, or
to a Catch2 binary's catch_discover_tests(... PROPERTIES LABELS slow)
so every discovered test inherits the label:
add_test(NAME my-expensive-cmake-smoke COMMAND ...)
set_tests_properties(my-expensive-cmake-smoke PROPERTIES
LABELS "smoke;slow"
TIMEOUT 600)
add_executable(pulp-test-my-suite test_my_suite.cpp)
target_link_libraries(pulp-test-my-suite PRIVATE pulp::view Catch2::Catch2WithMain)
catch_discover_tests(pulp-test-my-suite PROPERTIES LABELS slow)
Rule of thumb for slow: a test consistently >5 sec on at least one
platform, OR a sleep-bounded smoke (file-mtime, hammer race, message-loop
bound) whose value lies in soak coverage rather than per-PR feedback.
Anything covered by sanitizers.yml or another scheduled lane is a
strong candidate.
Demoting a fast test to slow (or vice versa)¶
ctest --test-dir build -L slow -N lists every test currently tagged
slow. To move a test in or out of the fast-CI surface, add or remove
the slow label in test/CMakeLists.txt (or the appropriate subdir
CMakeLists) and reconfigure. There's no separate registry to keep in
sync.
Switching a job's runner without a code change¶
Every runner-selection decision in .github/workflows/*.yml is driven by
tools/scripts/resolve_runs_on.py. That means the runner for any job
below can be flipped between GitHub-hosted, Namespace, and local
self-hosted by setting a repository variable — no workflow edit, no
full-matrix re-run, no PR. This is the fluidity we want: move a job
mid-incident without touching code.
Precedence (identical for every variable below)¶
For each target the resolver checks, in this order:
- A
workflow_dispatchinput (if present on the workflow) — one-off override. - The target's repository variable (the
PULP_*_RUNS_ON_JSONvalues below). - For the build matrix only:
PULP_DEFAULT_RUNNER_PROVIDER+ the provider's selector var (PULP_NAMESPACE_*orPULP_LOCAL_*). - A hard-coded default label (e.g.
macos-14,ubuntu-24.04,macos-15).
When every variable below is unset, the workflows resolve to exactly the hard-coded defaults they had before this mechanism was lifted into a shared resolver. Nothing changes by default. Setting one variable moves one job. Nothing more.
Coverage is stricter than the build matrix. It reads explicit
workflow_dispatch inputs and PULP_COVERAGE_*_RUNS_ON_JSON, not
PULP_NAMESPACE_BUILD_*. If coverage moves local, use a dedicated ephemeral
label such as pulp-coverage-vm-macos; do not point coverage at pulp-build,
pulp-build-vm, or the warm macOS gate pool.
Global default (build.yml matrix only)¶
| Variable | Effect | Example |
|---|---|---|
PULP_DEFAULT_RUNNER_PROVIDER |
Default provider for Linux and Windows legs of build.yml. One of github-hosted | namespace | local. Falls back to github-hosted when unset. |
gh variable set PULP_DEFAULT_RUNNER_PROVIDER --body "namespace" |
build.yml — Linux / Windows / macOS legs¶
| Variable | Provider | Example |
|---|---|---|
PULP_NAMESPACE_BUILD_LINUX_RUNS_ON_JSON |
Namespace | gh variable set PULP_NAMESPACE_BUILD_LINUX_RUNS_ON_JSON --body '["namespace-profile-generouscorp"]' |
PULP_NAMESPACE_BUILD_WINDOWS_RUNS_ON_JSON |
Namespace | gh variable set PULP_NAMESPACE_BUILD_WINDOWS_RUNS_ON_JSON --body '["namespace-profile-generouscorp-windows"]' |
PULP_NAMESPACE_BUILD_MACOS_RUNS_ON_JSON |
Namespace (optional) | gh variable set PULP_NAMESPACE_BUILD_MACOS_RUNS_ON_JSON --body '"namespace-profile-generouscorp-macos"' |
PULP_LOCAL_MACOS_RUNS_ON_JSON |
Local macOS ARM64 VM pool | gh variable set PULP_LOCAL_MACOS_RUNS_ON_JSON --body '["self-hosted","macOS","ARM64","pulp-build","pulp-build-vm"]' |
PULP_LOCAL_LINUX_RUNS_ON_JSON |
Local Linux ARM64 VM pool | gh variable set PULP_LOCAL_LINUX_RUNS_ON_JSON --body '["self-hosted","Linux","ARM64","pulp-build-linux"]' |
PULP_LOCAL_WINDOWS_RUNS_ON_JSON |
Local Windows ARM64 QEMU pool | gh variable set PULP_LOCAL_WINDOWS_RUNS_ON_JSON --body '["self-hosted","Windows","ARM64","pulp-build-windows"]' |
.tartci/normal-local-fast.toml is the repo-local profile that documents the
intended default policy: PR macOS, Linux, and Windows prefer local ARM64 VMs
where available, while GitHub-hosted x64 Linux/Windows remains the scheduled
compatibility safety net. Use tartci profile plan normal-local-fast --repo
danielraffel/pulp --json from the tartci checkout to see the concrete variable
values before changing repository variables.
Do not put ordered fallback chains directly into GitHub Actions. GitHub receives
one runs-on selector per job; Shipyard/tartci must resolve "Mac Studio, then
M5/blackbook, then GitHub" before dispatch or variable application.
Windows local QEMU is Windows ARM64. An x64 MSVC/Prism smoke can be useful, but
it is not a replacement for the GitHub-hosted windows-latest Intel/x64 signal
until explicitly proven.
Nightly GitHub Intel validation¶
.github/workflows/cross-platform-check.yml is the scheduled Linux/Windows
Intel safety net for this profile. It runs GitHub-hosted ubuntu-latest and
windows-latest, files or updates one deduped issue per broken platform, and
auto-closes the tracker when the platform recovers. Do not add a duplicate
nightly Intel workflow unless this one is deliberately retired.
sanitizers.yml — per-sanitizer target selection¶
Each sanitizer job resolves independently. Setting one variable moves exactly that sanitizer; the others stay on their defaults.
| Variable | Default label when unset | Example (dedicated sanitizer VM label) |
|---|---|---|
PULP_SANITIZER_ASAN_RUNS_ON_JSON |
macos-14 |
gh variable set PULP_SANITIZER_ASAN_RUNS_ON_JSON --body '["self-hosted","macOS","ARM64","pulp-sanitizer-vm-macos"]' |
PULP_SANITIZER_TSAN_RUNS_ON_JSON |
macos-14 |
gh variable set PULP_SANITIZER_TSAN_RUNS_ON_JSON --body '["self-hosted","macOS","ARM64","pulp-sanitizer-vm-macos"]' |
PULP_SANITIZER_UBSAN_RUNS_ON_JSON |
macos-14 |
gh variable set PULP_SANITIZER_UBSAN_RUNS_ON_JSON --body '["self-hosted","macOS","ARM64","pulp-sanitizer-vm-macos"]' |
PULP_SANITIZER_RTSAN_RUNS_ON_JSON |
ubuntu-24.04 |
gh variable set PULP_SANITIZER_RTSAN_RUNS_ON_JSON --body '["self-hosted","linux","x64","sanitizer"]' |
The three macOS sanitizers (ASan/TSan/UBSan) carry a --deny-labels
pulp-build,pulp-build-vm guard in sanitizers.yml's resolver, so a
sanitizer can never be misrouted onto the required-gate pool (the
resolver hard-fails). They no longer read PULP_NAMESPACE_BUILD_MACOS_*
either — the per-sanitizer var is the single switch (#4101).
Capacity finding (#4101) — localize at most one, and only TSan.
macOS allows only two running macOS guests per host (Apple's limit),
and both belong to the required macos build gate. A local sanitizer VM
is a third guest, so localizing is gated on the tartci idle-gate:
the pulp-sanitizer-vm-macos lane shares TART_HOME with the gate (a real
host-wide 2-guest semaphore) and yields its slot whenever the gate has
queued/in-progress work — it can never starve the required check the way
the coverage lane did. Pick TSan: it is the longest sanitizer (scoped
-j1 serial, ~45 min on the 3 vCPU macos-14) and the highest value for a
real-time audio framework, and being single-core-bound it gains most from a
local M-series runner. The other three stay on macos-15: the four run in
parallel on GitHub but would serialize (~4×) on one cap=1 local lane,
which is slower than hosted except during a hosted backlog. Full parallel
local sanitizers would need a third macOS host. Roll out one sanitizer at a
time, each behind a measured go/no-go (gate queue latency + matrix
wall-clock).
One-off overrides via workflow_dispatch¶
sanitizers.yml accepts one *_runner_selector_json input per
sanitizer. They win over the corresponding repo variable for a single
manual run:
gh workflow run sanitizers.yml \
-f tsan_runner_selector_json='["self-hosted","macOS","ARM64","pulp-sanitizer-vm-macos"]'
Prove TSan green via this dispatch (with the pulp-sanitizer-vm-macos
LaunchAgent loaded and the tartci idle-gate present) before setting
PULP_SANITIZER_TSAN_RUNS_ON_JSON. The lane template is
tools/launchd/pulp-tart-runner-sanitizer-macos.plist.template (ships
parked — see its header for the load/quiet-window preconditions).
coverage.yml accepts linux_runner_selector_json,
macos_runner_selector_json, and windows_runner_selector_json inputs. The
macOS Tart coverage proof path is:
gh workflow run coverage.yml \
-f macos_runner_selector_json='["self-hosted","macOS","ARM64","pulp-coverage-vm-macos"]'
Only set PULP_COVERAGE_MACOS_RUNS_ON_JSON to that selector after the proof
run uploads the os-macos Codecov flag. The coverage LaunchAgent uses
--queue-match-labels so existing hosted Coverage jobs do not accidentally
boot a local coverage VM.
build.yml has the equivalent linux_runner_selector_json,
windows_runner_selector_json, and macos_runner_selector_json inputs.
These are the same inputs already documented above; they are listed
here for completeness alongside the repo-variable knobs.
Reverting¶
Unset the variable and the job falls back to the hard-coded default immediately on the next run:
No code change is needed to revert, either.
Registering a self-hosted Mac runner (appendix)¶
Flipping a job to "self-hosted" labels assumes those labels are
advertised by a running GitHub Actions runner somewhere. Register one
on the Mac you want the job to run on:
# 1. From repo Settings -> Actions -> Runners, click "New self-hosted runner"
# to get a short-lived registration token. Then on the Mac:
mkdir -p ~/actions-runner && cd ~/actions-runner
curl -o actions-runner.tar.gz -L https://github.com/actions/runner/releases/latest/download/actions-runner-osx-arm64.tar.gz
tar xzf actions-runner.tar.gz
# 2. Register with labels that match the JSON you set in the repo var.
# Example for the TSan / sanitizer lane:
./config.sh --url https://github.com/danielraffel/pulp \
--token <REGISTRATION_TOKEN> \
--name "$(hostname)-sanitizer" \
--labels "self-hosted,macos,arm64,sanitizer" \
--work _work
# 3. Install as a launchd service so it runs at login and survives reboots.
./svc.sh install
./svc.sh start
./svc.sh status
Operational note. Self-hosted runners execute arbitrary code from any branch that can trigger the workflow. Use them on dedicated hardware / VMs you control, not shared personal machines. Apple Silicon hosts should prefer
arm64labels so jobs don't try to match Intel-only labels.Agents do NOT register runners. Treat these commands as a human ops task documented here for completeness.
Creating a Namespace macOS runner profile¶
Today, nsc can verify login/workspace state and inspect the instances created
by GitHub Actions, but it does not create or edit GitHub Actions runner
profiles from this workflow. Creating a new runner profile is currently a
Namespace dashboard step.
Use this path in Namespace:
GitHub Actions -> Profiles -> New Profile
Recommended fields for the first macOS validation profile:
- Name in the UI:
generouscorp-macos - OS & Architecture:
macOS on Apple Silicon - Resources:
6 vCPU, 14 GB RAM - Base image: a recent Xcode/macOS image appropriate for your build
- Cache toggles: leave enabled unless you have a reason to turn them off
Important selector detail:
- the Namespace UI shows the profile name without the GitHub runner prefix
- the selector you pass to Pulp/GitHub Actions is the prefixed form
- example: UI profile
generouscorp-macosbecomes selector"namespace-profile-generouscorp-macos" - for one-off experiments you can skip profile creation entirely and pass a
direct machine label instead, for example:
"nscloud-macos-tahoe-arm64-6x14"
After creating the profile, validate it with a one-off run:
pulp ci-local cloud run build feature/my-branch \
--provider namespace \
--macos-runner-selector-json '"namespace-profile-generouscorp-macos"'
Or use a direct machine label for an ad hoc run:
pulp ci-local cloud run build feature/my-branch \
--provider namespace \
--macos-runner-selector-json '"nscloud-macos-tahoe-arm64-6x14"'
Then confirm the backing instance shape with:
For a real macOS runner, the matching entry should report:
user_label.nsc.runner-profile-tag = "namespace-profile-generouscorp-macos"shape.os = "macos"shape.machine_arch = "arm64"
If it instead shows linux/amd64, the profile label is valid but the backing
runner is not a real macOS machine yet.
Prerequisites¶
- UTM — free VM manager for macOS (Apple Silicon and Intel)
- SSH key access to your VMs (password auth is not supported)
- The Pulp repo cloned on each VM at the path specified in
config.json
UTM is the simplest option, but any SSH-reachable host works: Proxmox, a cloud VM (Azure/AWS/GCP), or a physical machine on your network. Cloud VMs cost money to run but are otherwise fully supported.
Setup¶
1. Create your config¶
Local CI now prefers a machine-global config at ~/Library/Application Support/Pulp/local-ci/config.json on macOS (or the platform-equivalent state_dir()/config.json) so every worktree on the same machine sees the same host topology. tools/local-ci/config.json remains the fallback if no shared config exists, and PULP_LOCAL_CI_CONFIG still overrides both when you need an explicit one-off config.
Create the initial file from the example, then copy it to the shared state location if you want all worktrees to reuse it:
mkdir -p ~/Library/Application\\ Support/Pulp/local-ci
cp tools/local-ci/config.example.json ~/Library/Application\\ Support/Pulp/local-ci/config.json
Edit the chosen config.json and fill in your SSH hostnames and repo paths. The host field is the primary SSH target. fallback_host, if present, is tried next. The utm_fallback block is optional and is only used if SSH targets are unreachable.
Keep those aliases environment-local. Shared skills and docs should not hardcode your personal hostnames or VM names; they should explain how to choose the right target and where that target is configured.
The optional github_actions.workflows.docs-check.providers.namespace.runner_selector_json
value lets you set the default Namespace runs-on selector that cloud run docs-check
should dispatch when you do not pass --runner-selector-json explicitly.
1b. Optional estimated billing config¶
If you want per-run and billing-period cost estimates in cloud status,
cloud history, and cloud compare, fill in the telemetry.billing block in
your local config.
These numbers are estimates only. Verify provider pricing.
Example:
{
"telemetry": {
"billing": {
"enable_provider_reported_totals": false,
"currency": "USD",
"billing_period_start_day": 1,
"github_hosted_job_os_rates_per_minute": {
"linux": 0.008,
"windows": 0.016,
"macos": 0.08
},
"namespace_profile_tag_rates_per_hour": {
"namespace-profile-generouscorp": 0.50,
"namespace-profile-generouscorp-macos": 1.20
}
}
}
}
Notes:
- GitHub-hosted estimates use per-job OS rates when Pulp can infer the runner OS
- Namespace estimates prefer a profile-tag hourly rate and fall back to a machine-shape rule if you configured one
- if no matching rate exists, the CLI prints
cost: unavailable (...) enable_provider_reported_totalsis off by default; turn it on only if you want Pulp to ask GitHub for repo-wide billing totals when that API is available- provider-reported GitHub totals are shown separately from tracked-run estimates because they are repo-wide current-period figures, not per-run truth
- GitHub can still return
unavailablehere if the account/API path does not support the newer billing endpoints
1a. Recommended Namespace setup¶
If you want to use the Namespace runner-provider path, the easiest setup today is:
That is the recommended operator path for this pilot. Pulp can dispatch the
GitHub workflow without shelling out to nsc, but keeping nsc installed makes
it much easier to verify your Namespace workspace, inspect the account, and
later support thin pulp ci-local cloud namespace ... helper commands without
re-implementing Namespace setup logic inside Pulp.
Once nsc is installed, Pulp's thin helper commands can verify the state for
you:
doctor checks that nsc exists, verifies login state, and prints the current
workspace identity. setup stays deliberately thin: it runs nsc login when
needed and then re-renders the same status.
{
"targets": {
"mac": {
"type": "local",
"enabled": true
},
"ubuntu": {
"type": "ssh",
"host": "ubuntu",
"repo_path": "/home/yourname/Code/pulp-validate",
"utm_fallback": {
"vm_name": "Ubuntu 24.04",
"boot_wait_secs": 30,
"ssh_retry_secs": 60
}
},
"windows": {
"type": "ssh",
"host": "win",
"repo_path": "C:\\Users\\yourname\\pulp-validate",
"cmake_generator": "Visual Studio 17 2022",
"cmake_platform": "x64",
"cmake_generator_instance": "",
"fallback_host": "win2",
"utm_fallback": {
"vm_name": "Windows 11",
"boot_wait_secs": 60,
"ssh_retry_secs": 120
}
}
}
}
SSH host aliases come from ~/.ssh/config. Set them up there rather than putting raw IPs in this file. This makes it easy to prefer a fast local VM as the primary target and keep a slower hardware-backed machine as the fallback when you only need it for edge cases.
Before trusting a Windows SSH target for CI, verify that its non-interactive PowerShell context resolves git, cmake, and ctest. An interactive shell that works is not sufficient proof for the SSH service context the runner actually uses.
If your Windows VM is Windows on ARM, you can either set cmake_platform to "ARM64" explicitly or leave it blank and let the runner infer ARM64 vs x64 from the remote host. If CMake keeps picking the wrong Visual Studio install, set cmake_generator_instance to the exact VS path, for example C:/Program Files/Microsoft Visual Studio/2022/Community. If you leave cmake_generator_instance blank, the runner prefers a full Visual Studio install over BuildTools when both are present. The pinned WebGPU dependency already has a Windows aarch64 prebuilt for this path, so ARM Windows smoke runs can stay on the normal GPU-enabled configuration. This is useful for fast smoke validation on a local UTM VM. Keep an x64 Windows machine for parity runs when you need the authoritative Windows architecture.
2. Set up SSH keys¶
Each VM needs your public key in its authorized_keys. The Linux path is
straightforward; Windows requires extra steps because OpenSSH on Windows uses a
separate file with strict ACLs for admin users.
Find your public key (on your Mac)¶
If your private key is ~/.ssh/id_ed25519, your public key is:
Copy the output — you'll paste it on each VM. If you're running the VM in UTM or another hypervisor and can't copy/paste between host and guest, install the guest tools for your hypervisor first (e.g. SPICE guest tools for UTM/QEMU, VMware Tools, VirtualBox Guest Additions).
Linux (Ubuntu)¶
If ssh-copy-id is available and you can already reach the VM by password:
If you're setting up from scratch on a fresh VM, SSH into it (or open its console) and run:
1. Note the VM's IP address:
Look for the inet line under your active adapter (usually enp0s1 or eth0).
2. Install and enable the SSH server (if not already running):
3. Add your public key:
mkdir -p ~/.ssh && chmod 700 ~/.ssh
echo "ssh-ed25519 AAAA...your-key-here..." >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
4. (Optional) Disable password auth for tighter security:
sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo systemctl restart ssh
Windows¶
On the Windows VM, open PowerShell as Administrator and run:
1. Note the VM's IP address (you'll need it for SSH config later):
Look for the IPv4 Address line under your active adapter.
2. Create the admin authorized_keys file and add your public key:
New-Item -Force -ItemType File -Path "C:\ProgramData\ssh\administrators_authorized_keys"
Add-Content -Path "C:\ProgramData\ssh\administrators_authorized_keys" -Value "ssh-ed25519 AAAA...your-key-here..."
3. Fix the ACL (OpenSSH ignores the file if permissions are wrong):
icacls "C:\ProgramData\ssh\administrators_authorized_keys" /inheritance:r /grant "SYSTEM:(F)" /grant "Administrators:(F)"
4. Make sure sshd is running and set to auto-start:
Why
administrators_authorized_keys? Windows OpenSSH usesC:\ProgramData\ssh\administrators_authorized_keysfor users in the Administrators group, not~/.ssh/authorized_keys. The ACL step is required — without it, sshd silently skips the file and falls back to password auth.
Set up SSH config on your Mac¶
Add entries to ~/.ssh/config so you can type ssh win instead of remembering
IPs and usernames:
Host win
HostName 192.168.64.5
User your-username
IdentityFile ~/.ssh/id_ed25519
IdentitiesOnly yes
ConnectTimeout 5
Host ubuntu
HostName 192.168.64.4
User your-username
IdentityFile ~/.ssh/id_ed25519
IdentitiesOnly yes
ConnectTimeout 5
Replace the HostName values with the actual IPs from ipconfig (Windows) or
ip addr (Linux). The host aliases here (win, ubuntu) are what you'll use
in hosts.local.json for CI targets.
Test passwordless login¶
3. Clone the repo on each VM¶
The runner does a git fetch + checkout on the target, so the repo must already exist at the configured repo_path.
4. (Optional) Install the launchd drain agent¶
To automatically drain the queue on login and every 30 minutes:
cp tools/local-ci/dev.pulp.local-ci.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/dev.pulp.local-ci.plist
Edit the plist first if your repo is at a different path. To remove:
launchctl unload ~/Library/LaunchAgents/dev.pulp.local-ci.plist
rm ~/Library/LaunchAgents/dev.pulp.local-ci.plist
Usage¶
# Enqueue the current HEAD and wait for completion
pulp ci-local run
# Queue even if your current cwd belongs to a different git root than the script checkout
pulp ci-local run --allow-root-mismatch
# Fast preflight: clean configure/build/install + installed-SDK smoke, no tests
pulp ci-local run --smoke
# Fast PR preflight with a comment that is clearly labeled as smoke-only
pulp ci-local check 56 --smoke
# Run Mac-only while iterating locally
pulp ci-local run --targets mac
# Queue background work with explicit priority
pulp ci-local enqueue --priority low
# Bump a pending job to the front of the queue
pulp ci-local bump <job-id> high
# Drain pending jobs if no other runner already owns the queue
pulp ci-local drain
# Show queue, active runner, recent results, live target state, and VM status
pulp ci-local status
# Tail a running or completed target log
pulp ci-local logs <job-id> --target windows
# Show accumulated exact-SHA target evidence for a branch
pulp ci-local evidence feature/my-branch --limit 3
# Show local-CI disk usage and reclaimable artifacts without deleting anything
pulp ci-local cleanup
pulp ci-local cleanup --dry-run
# Delete stale bundles/logs/results once no local CI job is running
pulp ci-local cleanup --apply
# Include prepared build/install caches too; later reruns will rebuild them
pulp ci-local cleanup --apply --include-prepared
pulp ci-local run is the most common command. It enqueues the current HEAD, joins the machine-global queue, and waits until that exact job finishes.
Develop branch workflow¶
For complex, multi-piece features that use a develop/* integration branch, PRs target the develop branch instead of main. The ship command supports this via --base:
# Ship a feature to the develop branch (not main)
pulp ci-local ship feature/pkg-registry --base develop/package-manager
# The develop branch itself ships to main at phase boundaries
pulp ci-local ship develop/package-manager
GitHub Actions CI triggers on PRs to both main and develop/** branches, so CI runs automatically regardless of the target.
If you pass a branch name explicitly, for example pulp ci-local run feature/my-branch, local CI resolves and records that branch tip's exact SHA immediately. This prevents a stale launching checkout from accidentally queuing its own HEAD while you intended to validate a different branch.
Before queueing, local CI now also records:
- the worktree root that is actually being queued
- the current cwd and its git root, if any
- the config path and whether it came from PULP_LOCAL_CI_CONFIG, shared state, or the worktree fallback
- the selected SSH host/transport intent for each remote target
If the current cwd belongs to a different git root than the local_ci.py checkout you are invoking, queueing fails fast by default. Pass --allow-root-mismatch only when that mismatch is intentional.
If a selected SSH target is down and no fallback host or UTM fallback is configured, queueing now fails fast instead of burning time on a doomed job. Pass --allow-unreachable-targets only when you deliberately want to queue past that preflight.
Use --smoke when you want a quicker preflight before a full matrix run. Smoke mode still validates a clean detached worktree and installed SDK export path, but it disables tests, examples, and GPU in that clean build and skips ctest. Queue summaries and PR comments label these jobs as validation=smoke so they are not mistaken for full validation.
When a rerun is narrow and stays on the exact same SHA, local CI can now reuse the prepared root for that target + validation on persistent hosts. Status output calls this out as prepared=reused or prepared=clean so reused proof is never mistaken for a fresh cold path.
While a job is still running, pulp ci-local status reports live per-target state for the active job when available, for example:
Runner: pid=12345 active=[abcd1234ef56] feature/my-branch
Running (1):
[abcd1234ef56] feature/my-branch @ 0123456789ab priority=normal targets=mac,ubuntu,windows
submission: root=/Users/me/Code/pulp-worktree config=/Users/me/Library/Application Support/Pulp/local-ci/config.json (shared-state)
live targets: mac=pass, ubuntu=pass, windows=running
windows: phase=test, output=2026-04-01T01:34:18+00:00, heartbeat=2026-04-01T01:34:33+00:00, idle=15s, liveness=quiet, log=windows.log
37/1263 Test: OSC 4-byte alignment
If a run is interrupted after some targets have finished, the job is requeued but keeps its last known target state:
Pending (1):
[abcd1234ef56] feature/my-branch @ 0123456789ab priority=normal targets=mac,ubuntu,windows
last known targets: mac=pass, ubuntu=pass, windows=running
Results are written to the machine-global state directory:
- macOS:
~/Library/Application Support/Pulp/local-ci/results/ - Linux:
${XDG_STATE_HOME:-~/.local/state}/pulp/local-ci/results/
A non-zero exit means at least one target failed.
If a newer SHA is queued for the same branch, targets, and validation mode, older
pending work is marked superseded and written to the results directory with a
reference to the replacement job. If a runner dies and reconciliation finds a newer
replacement already queued for that same scope, the stale running job is also
superseded instead of being requeued.
Cleanup And Disk Usage¶
pulp ci-local status now includes a local footprint summary so retained CI
state stops being invisible drift:
- bundles
- prepared build/install caches
- logs
- results
- tracked cloud-run records
Use pulp ci-local cleanup to inspect what can be reclaimed. The command is a
dry run by default, and --dry-run is available explicitly when you want that
spelled out in scripts or notes.
What is cleaned automatically after job completion:
- completed-job git bundles once no pending/running job still needs them
- orphaned logs outside retained queue history
- orphaned result files outside retained queue history
What is not cleaned automatically in this first pass:
- prepared build/install state under
prepared/<target>/<mode>
Prepared state is an intentional reuse cache. If you include it in manual cleanup, later reruns will rebuild it from scratch.
Examples:
# Inspect reclaimable space
pulp ci-local cleanup
# Show the same dry-run plan explicitly
pulp ci-local cleanup --dry-run
# Delete stale bundles/logs/results
pulp ci-local cleanup --apply
# Also delete prepared caches
pulp ci-local cleanup --apply --include-prepared
Safety rules:
cleanup --applyis blocked while local CI jobs are running- prepared cleanup is destructive to cached build/install state
- logs/results tied to jobs still present in queue history are retained
If you need immediate manual cleanup outside the CLI, make sure no
pulp ci-local job is active first.
Desktop automation¶
pulp ci-local desktop ... adds a GUI/session automation layer under the same local CI control plane. Use it when an agent needs to launch an app, inspect it, click on it, capture screenshots, or publish a local evidence gallery without logging into the target machine manually.
Current desktop commands:
# Prepare one target and record its contract/receipt
pulp ci-local desktop install mac
pulp ci-local desktop install ubuntu
pulp ci-local desktop install windows
# Health and capability reporting
pulp ci-local desktop doctor mac
pulp ci-local desktop status
pulp ci-local desktop recent mac --limit 3
pulp ci-local desktop proof windows --action inspect --source-mode exact-sha --sha <commit-sha>
# Configure artifact/publish settings
pulp ci-local desktop config show
pulp ci-local desktop config set artifact_root ~/Library/Application\\ Support/Pulp/desktop-automation/runs
pulp ci-local desktop config set publish_mode none
# Run GUI actions
pulp ci-local desktop smoke mac --bundle-id com.apple.TextEdit --label textedit-smoke
pulp ci-local desktop inspect mac --command '/path/to/pulp-ui-preview' --label ui-preview-inspect --pulp-app-automation
pulp ci-local desktop click mac --command '/path/to/pulp-ui-preview' --click-view-id bypass-toggle --capture-ui-snapshot --pulp-app-automation
pulp ci-local desktop inspect windows --command 'notepad.exe' --label notepad-inspect
pulp ci-local desktop click windows --command 'notepad.exe' --click 885,18 --label notepad-maximize
# Run against an exact prepared SHA instead of the live checkout
pulp ci-local desktop inspect mac \
--command './build-desktop-automation/examples/ui-preview/pulp-ui-preview' \
--source-mode exact-sha \
--sha <commit-sha> \
--prepare-command 'cmake -S . -B build-desktop-automation && cmake --build build-desktop-automation --target pulp-ui-preview' \
--pulp-app-automation
# Publish or prune local bundles
pulp ci-local desktop publish mac --limit 5 --label mac-gallery
pulp ci-local desktop cleanup mac --older-than-days 14 --keep-last 10
Ubuntu prerequisite:
sudo apt-get update
sudo apt-get install -y git-lfs xvfb xauth xdotool imagemagick wmctrl x11-utils
git lfs install
Supported Ubuntu/Linux setup tiers:
- baseline deterministic backend:
xvfb+xauth - source/bootstrap prerequisite:
git-lfs - richer interaction/capture lane:
xdotool,imagemagick,x11-utils, andwmctrl
xvfb-run is the supported deterministic backend for Ubuntu/Linux desktop automation. A visible :0 display socket alone is not enough for SSH-driven automation because X11 authorization is often unavailable inside the remote shell. For repeatable CI and agent-driven runs, use the package set above and keep xvfb-run as the documented default.
desktop doctor ubuntu is an aggregate report. If multiple prerequisites are missing, it reports the full missing set plus remediation commands in one run instead of stopping at the first failure.
desktop doctor ubuntu checks the non-interactive SSH environment, not your interactive shell. setup.sh now prepends the common ~/.local/bin path automatically before dependency checks, but if git-lfs still fails over SSH after that, add the real install location to the non-interactive login-shell PATH or install git-lfs system-wide so git lfs version succeeds without extra shell setup.
Exact-SHA source prep on Ubuntu/Linux uses the same non-interactive SSH environment. The controller now treats bundle-based checkout and LFS materialization as separate steps:
- prepend
~/.local/binbefore anygit-lfs-dependent command - fetch and checkout with
GIT_LFS_SKIP_SMUDGE=1 - attach the clone URL as
origin - then let
setup.sh --deps-only --ci/git lfs pullmaterialize the SDK blobs
That split matters on fresh VMs because a bundle checkout alone does not carry an origin remote, and LFS smudge/pull fails if it cannot resolve the repository URL.
Fresh Ubuntu proof checklist:
- Start from a fresh source root,
PULP_HOME, andPULP_PROJECTS_DIR - Run
./setup.sh --deps-only --ci - Build
pulp-cli - Run
pulp create <ProjectName> --manufacturer "<Name>" --no-interactive - Run
pulp buildinside the generated project - Verify actual emitted artifacts, not just configure/test success
Current expected native outputs from that proof are:
- Linux: VST3 target output under
build/VST3,CLAP,LV2, and the standalone binary - macOS:
build/VST3/<Name>.vst3,build/AU/<Name>.component,build/CLAP/<Name>.clap, and the standalone.appbundle - Windows:
build/VST3/Debug/<Name>.dllfor the VST3 target,build/CLAP/Debug/<Name>.clap, and the standalone.exe
If web formats are required, make them explicit in the generated project format list; the default native create proof does not imply web artifact output.
Windows first-time setup checklist:
- Install and enable OpenSSH Server.
- Keep a normal desktop user logged in to the VM. The Windows session-agent runs inside that logged-in session; SSH by itself is not a GUI session.
- Make sure
wingetis available.desktop install windowsuses it to provision required remote tooling such as Git when the VM is still fresh. - Run
pulp ci-local desktop install windowsonce. This bootstraps the scheduled task, installs required remote tooling when possible, and writes the target-side PowerShell agent under%LOCALAPPDATA%\\Pulp\\desktop-automation-agent. - Run
pulp ci-local desktop doctor windowsand make sure SSH, the scheduled-task contract, and the requiredgitcheck are green before attempting live proofs. - For source builds on the Windows VM itself, use
powershell -ExecutionPolicy Bypass -File .\setup.ps1. The wrapper imports the Visual Studio environment and uses a short temporary drive alias so first-time bootstrap does not fail on long nested dependency paths.
The short-path rule is not theoretical. Windows source builds can fail from
long nested checkout roots and then pass once the same source tree is mapped
through a temporary drive alias before the first configure/build. Treat
setup.ps1 or an equivalent short-path wrapper as the supported bootstrap path
for Windows source builds.
Remote tooling policy on Windows:
- required:
git - used by the exact-SHA bundle-sync and prepare flows
desktop install windowswill provision it viawingetwhen possible- optional:
gh - useful for remote GitHub workflows on the target
- not required for smoke/inspect/click proofs
- optional:
gh auth - advisory only; authenticate it only if you intentionally want GitHub CLI workflows on the Windows target
Remote repo bootstrap policy on Windows:
- first-time
desktop install windowsshould not require GitHub credentials on the target VM - the controller prefers a locally uploaded git bundle to materialize
pulp-validate originis still attached when available so later fetches remain truthfulghand stored Git credentials are optional unless you intentionally want GitHub workflows on the Windows machine itself
Useful host-side verification commands:
Get-Service sshd
Set-Service -Name sshd -StartupType Automatic
Start-Service sshd
Get-NetFirewallRule -Name *ssh*
where.exe winget
where.exe git
where.exe gh
Useful first-time remote installs if you want to pre-provision them manually:
winget install --id Git.Git -e --source winget --accept-package-agreements --accept-source-agreements --disable-interactivity
winget install --id GitHub.cli -e --source winget --accept-package-agreements --accept-source-agreements --disable-interactivity
Supported Windows v1 interaction tiers:
- generic window-capture lane:
--commandonly- works for normal desktop apps such as
notepad.exe - supports window screenshot capture and coordinate clicks
- Pulp-owned app automation lane:
- add
--pulp-app-automation - enables
ViewInspectorsnapshots and view-target selectors such as--click-view-id
Artifact bundles are written outside the repo by default:
- macOS:
~/Library/Application Support/Pulp/desktop-automation/runs/ - Linux:
${XDG_STATE_HOME:-~/.local/state}/pulp/desktop-automation/runs/ - Windows:
%LOCALAPPDATA%\\Pulp\\desktop-automation\\runs\\
Each bundle stores:
manifest.jsonstdout.log/stderr.logprepare.logwhen exact-SHA mode runs a fresh prepare stepui-tree.jsonwhen a UI snapshot is availablescreenshots/window.pngscreenshots/before.png/screenshots/diff.pngwhen an interaction captures before/after evidence
The artifact root also maintains rolling summaries for agents and status tooling:
latest-run.json— newest observed run summarylatest-proof.json— newest successful proof summaryruns.jsonl— raw summary stream for recent desktop automation runs- target-scoped copies under
<artifact-root>/<target>/... _published/latest-report.json— newest staged local HTML/JSON gallery summary_published/reports.jsonl— raw summary stream for local published galleries
manifest.json now includes additive source provenance when desktop actions run through the controller:
source.mode(liveorexact-sha)source.branchsource.shasource.prepare_commandsource.prepare_timeout_secssource.prepared_rootsource.launch_cwd
Desktop reporting surfaces are intentionally split:
desktop recent= raw run history, including failed attemptsdesktop proof= successful proof summaries grouped bytarget + action + source.mode + source.shadesktop status= target config pluslatest_run,latest_proof, and the newest local publish summary (latest_publish)
Use desktop proof when you need to answer questions like:
- “What live-host proof do we already have for Ubuntu on this SHA?”
- “Did Windows ever pass this exact-SHA inspect lane?”
- “What is the newest successful proof, even if the newest run failed?”
Exact-SHA desktop source mode¶
desktop smoke, desktop click, and desktop inspect all share a controller-owned source mode:
--source-mode live|exact-sha--branch--sha--prepare-command--prepare-timeout
Behavior:
livelaunches from the target's normal working copy behavior.exact-shaprepares a per-target source root for the requested SHA, launches from that prepared root, and records the prepared-root provenance in the run manifest.- On Windows,
--prepare-commandexecutes inside a generated.cmdscript undercmd.exe. Use double quotes for paths, generator names, and arguments. POSIX-style single-quoted tokens are treated as literal text and are rejected by the controller before the remote prepare step starts. - When
desktop_automation.targets.<target>.optional.webview_driver=true,desktop doctorprobes the configuredwebdriver_urlthrough the WebDriver/statusendpoint and reports whether the driver is actually reachable and ready, not just whether the URL exists in config.
Preparation/cache semantics:
- Prepared roots are keyed by
target + sha + prepare_command. - A repeated identical request may reuse the prepared root instead of rebuilding it.
prepare_commandonly runs when a fresh prepared root is created.
Launch behavior:
- Desktop actions switch their launch
cwdto the prepared root in exact-SHA mode. - Repo-local executable paths in the first command token are rewritten into the prepared root automatically.
- The current exact-SHA workflow is a
--commandlane. Do not assume--bundle-idparticipates in exact-SHA source preparation.
Examples:
# macOS local exact-SHA inspect
pulp ci-local desktop inspect mac \
--command './build-desktop-automation/examples/ui-preview/pulp-ui-preview' \
--source-mode exact-sha \
--sha <commit-sha> \
--prepare-command 'cmake -S . -B build-desktop-automation && cmake --build build-desktop-automation --target pulp-ui-preview' \
--pulp-app-automation
# Ubuntu xvfb exact-SHA click
pulp ci-local desktop click ubuntu \
--command './build-desktop-automation/examples/ui-preview/pulp-ui-preview' \
--click-view-id bypass-toggle \
--capture-ui-snapshot \
--source-mode exact-sha \
--sha <commit-sha> \
--prepare-command 'cmake -S . -B build-desktop-automation && cmake --build build-desktop-automation --target pulp-ui-preview' \
--pulp-app-automation
# Windows session-agent exact-SHA smoke
pulp ci-local desktop smoke windows \
--command '.\\build-desktop-automation\\examples\\ui-preview\\pulp-ui-preview.exe' \
--source-mode exact-sha \
--sha <commit-sha> \
--prepare-command 'cmake -S . -B build-desktop-automation -G \"Visual Studio 17 2022\"; cmake --build build-desktop-automation --target pulp-ui-preview --config Debug' \
--pulp-app-automation
# Windows generic live inspect
pulp ci-local desktop inspect windows \
--command 'notepad.exe' \
--label notepad-inspect
# Windows generic live click with before/after evidence
pulp ci-local desktop click windows \
--command 'notepad.exe' \
--click 885,18 \
--label notepad-maximize
# Query the newest successful Windows proof for one SHA
pulp ci-local desktop proof windows \
--action smoke \
--source-mode exact-sha \
--sha <commit-sha>
Desktop adapter truth¶
macos-local- runs directly on the local logged-in macOS session
- supports bundle launch via
--bundle-id - supports Pulp-owned app automation (
--pulp-app-automation) for direct launch commands, includingViewInspectorsnapshots and view-target clicks linux-xvfb- runs GUI smoke/inspect/click through
xvfb-run - currently supports
--commandonly - currently requires
--pulp-app-automationfor the click/inspect lane windows-session-agent- bootstraps a scheduled task plus target-side PowerShell agent in the logged-in Windows desktop session
- requires a real logged-in desktop user; SSH alone is not enough
- currently supports
--commandonly - supports generic
window-capturesmoke/inspect/click for normal desktop apps - supports coordinate clicks and before/after screenshot diffs without
--pulp-app-automation - supports
ViewInspectorsnapshots and view-target selectors only with--pulp-app-automation - uses the scheduled task plus target-side agent as the honest v1 Windows interaction lane; external UI automation tools are optional future adapters, not the core controller
Proof lookup¶
desktop proof is the first-class proof query surface for desktop automation:
- filters:
target--action--source-mode live|exact-sha|legacy--sha--branch- groups successful proofs by
target/action/source.mode/source.sha - ignores failed runs when computing proof summaries
Example:
desktop status now reports both:
latest_run: the newest run, even if it failedlatest_proof: the newest successful proof summary for that targetlatest_publish: the newest local HTML/JSON gallery summary staged under_published/
Desktop config keys¶
tools/local-ci/config.json accepts a desktop_automation block:
{
"desktop_automation": {
"artifact_root": "",
"publish_mode": "none",
"publish_branch": "dev-artifacts",
"retention_days": 14,
"targets": {
"mac": {
"adapter": "macos-local",
"bootstrap": "launchagent",
"capability_tier": "v2",
"optional": {
"webview_driver": false,
"webdriver_url": "",
"debug_attach": false,
"debugger_command": "lldb",
"video_capture": false,
"frame_stats": false
}
},
"ubuntu": {
"adapter": "linux-xvfb",
"bootstrap": "xvfb-run",
"capability_tier": "v2",
"optional": {
"webview_driver": false,
"webdriver_url": "",
"debug_attach": false,
"debugger_command": "lldb",
"video_capture": false,
"frame_stats": false
}
},
"windows": {
"adapter": "windows-session-agent",
"bootstrap": "scheduled-task",
"capability_tier": "v2",
"task_name": null,
"remote_root": null,
"optional": {
"webview_driver": false,
"webdriver_url": "",
"debug_attach": false,
"debugger_command": "",
"video_capture": false,
"frame_stats": false
}
}
}
}
}
For Windows:
task_nameis optional. If omitted, local CI usesPulpDesktopAutomationAgent-<target>.remote_rootis optional. If omitted, the agent is installed under%LOCALAPPDATA%\Pulp\desktop-automation-agent.optional.webview_driverenables the future WebView/WebDriver capability vocabulary for that target. Pair it withoptional.webdriver_urlonly when the app under test actually exposes a localhost WebDriver endpoint in debug/test mode.optional.debug_attach,optional.video_capture, andoptional.frame_statsare opt-in groundwork flags. They make the target advertise and doctor those optional tiers; they do not magically make the adapter support them unless the required tooling is also present.
Convenience updates through the CLI:
pulp ci-local desktop config set target.mac.webview_driver true
pulp ci-local desktop config set target.mac.webdriver_url http://127.0.0.1:4444
pulp ci-local desktop config set target.mac.debug_attach true
pulp ci-local desktop config set target.mac.debugger_command lldb
pulp ci-local desktop config set target.mac.video_capture true
pulp ci-local desktop config set target.mac.frame_stats true
Recommended host-side remediation when desktop doctor windows reports SSH service reset during handshake:
Get-Service sshd
Set-Service -Name sshd -StartupType Automatic
Start-Service sshd
Get-NetFirewallRule -Name *ssh*
Treat that failure as a Windows host-side OpenSSH issue, not a desktop-agent contract failure.
Desktop publication¶
pulp ci-local desktop publish always stages a local HTML/JSON gallery from recent bundles. In the default publish_mode=none path, that is the whole feature. When publish_mode=branch, the same report is also mirrored to the configured publish branch under desktop-automation/latest/ and desktop-automation/reports/<report-id>/.
index.htmlindex.json- copied screenshots and diffs
- source manifest/log references
_published/latest-report.jsonand_published/reports.jsonlrollups for the newest/known local galleries
Use desktop config set publish_mode ... only when you intentionally want publication behavior. The default should stay none for normal development.
Branch publication notes:
publish_mode=branchpushes the latest local report topublish_branch- the branch stores
desktop-automation/latest/plus immutabledesktop-automation/reports/<report-id>/snapshots - when the repo remote is GitHub, the publish report includes clickable branch/tree/blob URLs for the mirrored artifacts
Evidence Tracking¶
pulp ci-local evidence summarizes the last-good recorded results by exact SHA, target, and validation mode. This is the operator-facing answer to:
- what already passed on this branch?
- which exact SHA has Windows full proof?
- do we really need to rerun macOS again?
The compact evidence section in pulp ci-local status uses the same data so the current branch’s known-good results stay visible during active work.
Working A Failure¶
Do not wait for a whole matrix to finish before reacting. The fastest loop is:
- start a run
- watch
pulp ci-local status - tail
pulp ci-local logs <job-id> --target <name>on the first failing or suspicious target - begin the narrowest local repro or code inspection immediately
- rerun only the truthful scope needed after the fix
In practice, that means:
- one process owns CI monitoring and host state
- one process or agent works the likely fix locally as soon as a failure becomes actionable
- user updates should be sent when a target changes state or the first actionable failure appears, not only when asked
- a target that already failed is enough to start debugging; do not burn time waiting for unrelated targets to finish unless their result changes the fix
- once a failure is actionable, start the fix track in parallel unless it would contend with the same host or invalidate the active run
- do not rerun a target that already passed on the exact same SHA unless that prior result is untrustworthy or the environment changed
- if only one or two targets are stale, rerun only those targets instead of the whole matrix
- once the failure surface is isolated, prefer the minimum sufficient proof instead of a symmetric rerun
- a direct exact-SHA validate on one target counts as valid evidence for that target; keep earlier same-SHA passes for the other targets unless something actually invalidated them
- on persistent hosts, narrow same-SHA reruns should prefer prepared-state reuse instead of paying again for clean worktree/setup/build work
- use
--smokefirst when the risk is install/export/build structure rather than runtime test behavior all targets on one SHAis a goal, not a reason to blindly rerun already-green same-SHA targets- if a broader in-flight job is no longer informative, cut over to the narrower rerun instead of letting the queue drift
Priorities¶
Jobs are ordered by priority first, then FIFO within the same priority.
low— background validationnormal— default interactive workhigh— shipping, PR checks, or work you want to run first
You can set the initial priority with --priority and change a pending job later with:
pulp ci-local status prints the job ids you can bump.
Exact SHAs On Remote Targets¶
Remote targets validate the queued SHA, not the latest branch tip. That keeps queued jobs truthful, and the runner now uploads that exact SHA to SSH targets as a git bundle before validation.
If you queued work with an explicit branch name, the runner first resolves that branch name to a commit SHA and then treats the run exactly like any other exact-SHA validation.
That means this works even for a local-only commit:
pulp ci-local ship still pushes first because it opens and validates a PR, but ordinary local validation no longer depends on the remote host already having your branch tip.
Running Mac-only¶
If you don't have VMs set up, disable the SSH targets in your active CI config:
Mac validation still runs. You get single-platform coverage, which is better than nothing for catching build breaks before pushing.
You can also keep the SSH targets enabled and request Mac-only while iterating:
For contributors¶
You don't need the same VM setup as the original developer. Options:
- Mac-only: Disable all SSH targets. Fast, free, covers the primary development platform.
- UTM VMs: Free. Requires ~40 GB of disk for both VMs. UTM images can be created from ISO or from the UTM gallery.
- Cloud VMs: Works with any SSH-accessible host. Costs money while running — stop them when not in use.
- Physical machines: A spare Linux box or Windows machine on your network works fine.
Local CI config is intentionally gitignored. Keep your host topology local, and prefer the machine-global config path so every worktree uses the same host map by default.
Troubleshooting¶
JSONDecodeError on Shipyard queue file¶
Shipyard's local job queue lives at ~/Library/Application Support/shipyard/queue/queue.json on macOS (~/AppData/Local/shipyard/queue/queue.json on Windows, ${XDG_STATE_HOME:-~/.local/state}/shipyard/queue/queue.json on Linux). On rare crashes Shipyard can truncate this file to zero bytes, which then breaks every subsequent invocation with a JSONDecodeError.
Recovery (run once):
Re-running tools/install-shipyard.sh also performs this reset automatically. Tracked as #528.