
Pull requests

We use pull requests (PRs) to propose changes to the codebase. They are also the best way to get feedback on your work.

Every PR in the ETL repository has an associated staging server created for it. To streamline this process, we have automated it with the etl pr command.

Learn more about how to use the etl pr command

PR work summary

Once you've created a PR, the automation user @owidbot will add a comment to the PR summarizing your work in the PR and providing links to relevant resources. This comment will include the following information:

  • Quick links: Links to the sites and tools that reflect the changes introduced by the PR: the admin site, public site, Wizard, and documentation.
  • Login: Instructions on how to SSH into the staging server.
  • chart-diff: Wizard app showing chart changes (if any) compared to PRODUCTION. @owidbot will flag any chart changes that are still pending review.
  • data-diff: Changes introduced in the data compared to PRODUCTION.

Figure: PR and comment by @owidbot, as of 19th November 2024

Scheduling a PR merge

You can schedule a PR merge by adding the /schedule command at the end of your PR description. This is useful when you want the merge to happen at a specific time, e.g. at night if merging could trigger a long deployment process on the main branch.

You have multiple options to schedule a PR merge:

  • /schedule: The PR will be merged at the next sharp hour (e.g., 13:00, 14:00), based on the current UTC time.
  • /schedule 2024-11-19: The PR will be merged at midnight (00:00 UTC) on the specified date.
  • /schedule 2024-11-19T12:50:00.000Z: The PR will be merged at the next sharp hour immediately following the specified timestamp (e.g., if scheduled for 12:50 UTC, it will merge at 13:00 UTC).

You can find an example here.

Figure: GitHub action comment, as of 19th November 2024
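
The "next sharp hour" rule used by /schedule can be illustrated with a little integer arithmetic. The function below is a hypothetical sketch, not the code the GitHub action runs. It assumes the timestamp is given in Unix seconds (UTC) and that a timestamp falling exactly on the hour moves to the following hour, which the description above does not specify.

```shell
# Hypothetical helper (not part of the ETL tooling): round a Unix timestamp
# in seconds (UTC) up to the next sharp hour.
# Assumption: a timestamp exactly on the hour is pushed to the following hour.
next_sharp_hour() {
    echo $(( ($1 / 3600 + 1) * 3600 ))
}

# 1732020600 is 2024-11-19T12:50:00Z; the result 1732021200 is 13:00:00Z.
next_sharp_hour 1732020600
```

To see when a bare /schedule would likely fire under these assumptions, pass the current time: next_sharp_hour "$(date -u +%s)".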

Working on multiple branches in parallel

If you need several PRs in flight at once (e.g. several agent sessions, one per branch), etl pr --worktree creates the new branch in a separate git worktree so your current working tree is untouched:

etl pr "Update some dataset" data --worktree

The command creates the worktree at ../etl-<branch> and runs uv sync inside it, so the worktree's .venv/ is ready to use by the time the command finishes. To start working there:

cd ../etl-<branch>

If your shell does not activate virtual environments automatically, also run source .venv/bin/activate. Or, even better, set up auto-activation once (see below); then cd alone is enough.

When you're done with the worktree (typically after the PR is merged), clean up:

git worktree remove ../etl-<branch>
git branch -D <branch>
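
If you clean up worktrees often, the two commands can be wrapped in a small shell function. This is a sketch only: cleanup_worktree and the DRY_RUN variable are hypothetical names, not part of the etl CLI, and the function assumes you call it from the original checkout so that ../etl-<branch> resolves to the worktree.

```shell
# Hypothetical helper (not part of the etl CLI): remove a worktree created
# at ../etl-<branch> and delete its local branch.
# Set DRY_RUN=1 to print the commands instead of running them.
cleanup_worktree() {
    cw_branch="$1"
    if [ -z "$cw_branch" ]; then
        echo "usage: cleanup_worktree <branch>" >&2
        return 1
    fi
    # Expands to "echo" when DRY_RUN is set and non-empty, otherwise to nothing.
    cw_run=${DRY_RUN:+echo}
    # Only delete the branch if the worktree was removed successfully.
    $cw_run git worktree remove "../etl-$cw_branch" &&
        $cw_run git branch -D "$cw_branch"
}
```

Running DRY_RUN=1 cleanup_worktree <branch> first shows exactly what would be executed. Note that git branch -D deletes the branch even if it has not been merged, so check that the PR is merged before running it for real.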

Optional: auto-activate the venv when you cd

Add this snippet to your ~/.zshrc so the right .venv/ activates automatically every time you cd into a worktree (or any project folder with a .venv/):

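# Load zsh's hook mechanism so a function can run on every directory change.
autoload -U add-zsh-hook

# Activate the first virtual environment found in the current directory.
load-py-venv() {
    if [ -f .venv/bin/activate ]; then
        source .venv/bin/activate
    elif [ -f env/bin/activate ]; then
        source env/bin/activate
    elif [ -f venv/bin/activate ]; then
        source venv/bin/activate
    # No venv here: deactivate the current one if this looks like another Python project.
    elif [ -n "$VIRTUAL_ENV" ] && { [ -f poetry.toml ] || [ -f requirements.txt ]; }; then
        deactivate
    fi
}

# Run on every cd, and once when the shell starts.
add-zsh-hook chpwd load-py-venv
load-py-venv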

After reloading your shell (source ~/.zshrc or open a new terminal), you can cd between worktrees and the matching venv will activate on its own.

Tip: open each worktree in its own VS Code window (File > New Window). The Claude Code extension is scoped per workspace, so each window gets its own chat.

Sharing the data folder (optional)

--share-data symlinks the new worktree's data/ to the original's, so upstream ETL steps don't get recomputed:

etl pr "Update dataset" data --worktree --share-data

Warning

Never run rm -rf data/ in a shared worktree — the trailing slash makes rm follow the symlink and wipe the original data/. Use git worktree remove ../etl-<branch> to clean up instead.
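
One way to guard against this mistake is to check whether data is a symlink before deleting anything. The function below is a hypothetical sketch, not something the etl CLI provides. It assumes you run it from the root of a worktree and that the data folder is named data, as described above.

```shell
# Hypothetical guard (not part of the etl CLI): delete ./data only when it is
# a real directory, and refuse when it is a symlink to a shared data folder.
remove_local_data() {
    if [ -L data ]; then
        echo "data is a symlink to $(readlink data); refusing to delete it" >&2
        return 1
    fi
    # No trailing slash, and we only get here when data is not a symlink.
    rm -rf data
}
```

In a worktree created with --share-data the function prints a message and returns a non-zero status, leaving the original data/ untouched. Cleaning up with git worktree remove ../etl-<branch> remains the recommended route.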