
How QuantCo automatically keeps hundreds of repositories up-to-date


QuantCos are heavy Git and GitHub users: the shared QuantCo GitHub organization alone contains several hundred public and private repositories. In 2023, QuantCos created an average of three new repositories per week.

While the code in those repositories is as diverse as the solutions we build for our clients, most of the repositories share common traits:

  • They are based on a QuantCo Python project template.
  • They use our Conda-based pre-commit hooks.
  • They use Conda lock files, a version-pinned variant of Conda environment files.

The QuantCo Engineering team has always strived to provide QuantCos with good defaults and automation for their Python projects so that they can focus on creating value for our clients. Initially, we performed many large-scale repository updates with one-off bash scripts that automatically created pull requests against each repository. While this scrappy approach served us surprisingly well, we were looking for a more scalable solution that could keep up with the rapid growth in the number of repositories in our GitHub organization.

Specifically, we were looking for a tool that can:

  • Automatically update the projects to the latest version of the Python project template
  • Automatically update versions of any pre-commit hooks used
  • Automatically update Conda dependency versions in the lock files

For some use cases, such as updating pre-commit hooks, existing tools like pre-commit.ci are available. Using them, however, raises data security concerns and is needlessly expensive. Since we could not find any open-source tool that suits our needs, we built our own: Quant Ranger.

Quant Ranger is a GitHub app (or bot) that we use to automate the above-mentioned updates in all of our repositories. You might have already seen the bot in pull requests of some of our repositories, such as multiregex #43, polarify #70, or glum #788. The bot is written in TypeScript for easy integration with Octokit, the official SDK for interacting with the GitHub API. Its plugin architecture makes it easy to extend the list of automated updates beyond project templates, pre-commit hooks, and lock files. In the source code, each task is implemented as a separate "updater":

src/
  ...
  updaters/
    index.ts
    conda-env-lock.ts
    copier.ts
    pre-commit.ts
    pixi.ts
...
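To illustrate what such a plugin architecture can look like, here is a minimal TypeScript sketch. The `Updater` interface, the `updaters` registry, and `runAll` are hypothetical names, not Quant Ranger's actual code; a real updater would use Octokit to push a branch and open a pull request instead of returning a summary string.

```typescript
// Hypothetical sketch of a plugin-style updater registry; names are illustrative.

// Minimal description of a repository that updaters operate on.
interface Repo {
  owner: string;
  name: string;
}

// Each module under updaters/ would export one object implementing this interface.
interface Updater {
  readonly name: string;
  // Resolves to a short summary of the change, or null if the repository is up-to-date.
  run(repo: Repo): Promise<string | null>;
}

// updaters/index.ts would be the single place where plugins are registered.
const updaters: Updater[] = [
  {
    name: "pre-commit",
    // Placeholder: a real updater would bump hook versions and open a pull request.
    run: async (repo) => `bumped hooks in ${repo.owner}/${repo.name}`,
  },
  {
    name: "copier",
    // Placeholder: pretend the project template is already current.
    run: async () => null,
  },
];

// Runs every registered updater; a failing plugin is recorded instead of aborting the rest.
async function runAll(repo: Repo, plugins: Updater[]): Promise<Map<string, string>> {
  const results = new Map<string, string>();
  for (const plugin of plugins) {
    try {
      const summary = await plugin.run(repo);
      results.set(plugin.name, summary ?? "no changes");
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      results.set(plugin.name, `failed: ${message}`);
    }
  }
  return results;
}
```

With this layout, adding a new kind of update means adding one module under updaters/ and one registry entry, and catching errors per plugin keeps a single broken updater from blocking the others.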

To run updates, Quant Ranger is scheduled as a GitHub Actions workflow that runs weekly or monthly. This has the advantage that we don't need to operate a separate server to keep our repositories up-to-date. In addition, GitHub Actions provides a simple way to monitor the updater's activity and logs without setting up additional monitoring.

Example of an automated pre-commit pull request

Owners of the hundreds of repositories that receive these automated pull requests can configure the update process. Specifically, they can customize the following aspects:

  • Pull request title and label templates can be specified so that pull requests comply with repository-specific policies and processes such as Conventional Commits.
  • The update schedule can be set to monthly, weekly, or never.
  • Users can opt in to having automerge enabled automatically on the bot's pull requests.
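The options above could be modeled roughly as follows. This is a sketch under assumptions: the field names, the `{updater}` placeholder syntax, and the rule that monthly repositories are handled on the first scheduled run of each month are illustrative, not Quant Ranger's actual configuration format.

```typescript
// Hypothetical per-repository settings; field names are illustrative.
type Schedule = "weekly" | "monthly" | "never";

interface RangerConfig {
  titleTemplate: string; // e.g. "chore(deps): update {updater}" for Conventional Commits
  labels: string[];
  schedule: Schedule;
  automerge: boolean;
}

const DEFAULT_CONFIG: RangerConfig = {
  titleTemplate: "Update {updater}",
  labels: [],
  schedule: "weekly",
  automerge: false,
};

// Applies a repository's overrides on top of the defaults.
function resolveConfig(overrides: Partial<RangerConfig>): RangerConfig {
  return { ...DEFAULT_CONFIG, ...overrides };
}

// Replaces {placeholders} with values; unknown placeholders are left untouched.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) => vars[key] ?? match);
}

// Assumes the job runs at the start of every week, so "monthly" repositories
// are only processed on the first run of each month (days 1 to 7).
function isDue(schedule: Schedule, runDate: Date): boolean {
  switch (schedule) {
    case "weekly":
      return true;
    case "monthly":
      return runDate.getUTCDate() <= 7;
    case "never":
      return false;
  }
}
```

Keeping the title as a template rather than a fixed string is what lets each repository match its own commit conventions without any code changes in the bot.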

Every week, the updater scans the QuantCo GitHub organization, identifies repositories that need updates, and creates pull requests against them. Each updater decides for itself whether a particular repository needs an update, e.g. by looking for a file at a well-defined path in the repository: the pre-commit updater looks for a .pre-commit-config.yaml file, the project template updater looks for a .copier-answers.yml file, and so on.
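The file-based detection can be sketched as a pure function over a listing of the organization's repositories. The `RepoListing` shape, the `planUpdates` name, and the decision to skip archived repositories are assumptions; in practice, the paths could be fetched via Octokit, for example with `octokit.rest.repos.getContent`, which responds with a 404 error when a file does not exist.

```typescript
// Marker files named above; other updaters would register their own (assumption).
const MARKER_FILES: Record<string, string> = {
  "pre-commit": ".pre-commit-config.yaml",
  copier: ".copier-answers.yml",
};

// Hypothetical minimal view of a repository in the organization.
interface RepoListing {
  fullName: string;
  archived: boolean;
  paths: string[]; // file paths on the default branch
}

// Maps each repository to the updaters whose marker file it contains.
function planUpdates(repos: RepoListing[]): Map<string, string[]> {
  const plan = new Map<string, string[]>();
  for (const repo of repos) {
    if (repo.archived) continue; // archived repositories are read-only
    const present = new Set(repo.paths);
    const applicable = Object.entries(MARKER_FILES)
      .filter(([, marker]) => present.has(marker))
      .map(([updater]) => updater);
    if (applicable.length > 0) plan.set(repo.fullName, applicable);
  }
  return plan;
}
```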

Triggering Quant Ranger

If you don't want to wait for Quant Ranger to come around at the start of the next week or month, you can also trigger it manually. This can be done by adding a comment to a pull request or creating an issue in the respective repository. Quant Ranger will then update your comment and display a status message with the progress of the update.

Triggering Quant Ranger with a GitHub comment

This is implemented using an AWS Lambda function that listens to the GitHub issue_comment.created event in the QuantCo organization. When triggered, it parses the comment and starts the corresponding Quant Ranger updater workflow.
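A minimal sketch of the Lambda's parsing step, assuming a hypothetical `/quant-ranger <updater>` command syntax, a hypothetical `quant-ranger` repository name, and one hypothetical workflow file per updater. It maps a subset of the `issue_comment` webhook payload to the parameters that Octokit's `actions.createWorkflowDispatch` expects; passing the comment ID along would let the workflow edit the comment with its progress.

```typescript
// Subset of GitHub's issue_comment webhook payload used here.
interface IssueCommentEvent {
  action: string;
  comment: { id: number; body: string };
  repository: { name: string; owner: { login: string } };
}

// Parameters in the shape accepted by Octokit's actions.createWorkflowDispatch.
interface WorkflowDispatch {
  owner: string;
  repo: string;
  workflow_id: string;
  ref: string;
  inputs: Record<string, string>;
}

// Updater names from the updaters/ directory.
const KNOWN_UPDATERS = new Set(["conda-env-lock", "copier", "pre-commit", "pixi"]);

// Hypothetical command syntax: a line of the form "/quant-ranger <updater>".
const COMMAND = /^\/quant-ranger[ \t]+([\w-]+)[ \t]*$/m;

function toDispatch(event: IssueCommentEvent): WorkflowDispatch | null {
  if (event.action !== "created") return null; // ignore edits and deletions
  const match = COMMAND.exec(event.comment.body);
  if (match === null || !KNOWN_UPDATERS.has(match[1])) return null;
  return {
    owner: "QuantCo",
    repo: "quant-ranger", // hypothetical name of the repository hosting the workflows
    workflow_id: `${match[1]}.yml`, // hypothetical: one workflow file per updater
    ref: "main",
    inputs: {
      repository: `${event.repository.owner.login}/${event.repository.name}`,
      comment_id: String(event.comment.id), // lets the workflow report progress in the comment
    },
  };
}
```

For a non-null result, the handler would then call `octokit.rest.actions.createWorkflowDispatch(dispatch)`; the target workflow has to declare matching `workflow_dispatch` inputs, otherwise GitHub rejects the request.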

Running CI in automatic PRs

One of the major pain points we ran into when developing Quant Ranger was ensuring that CI jobs run whenever an automated pull request is opened. By default, pull requests created by a workflow using the built-in GITHUB_TOKEN do not trigger any CI workflows; as explained in the GitHub Actions documentation, this prevents workflows from accidentally triggering each other recursively. There are multiple ways to work around this limitation, as described in the docs of peter-evans/create-pull-request.

Most of them have some drawbacks for our use case:

  • Manually closing and reopening pull requests quickly becomes cumbersome.
  • SSH deploy keys need to be generated per repository and added to the repository secrets.
  • Personal access tokens (PATs) need to either be scoped per repository and refreshed periodically, or belong to a technical user with access to all repositories, which is not ideal from a security perspective.
  • Having a technical user create pull requests from a fork only works for public repositories.

We chose to use a GitHub App to generate a token. The GitHub App is installed in the QuantCo organization and has access to all repositories. The token is only available to the repository that hosts Quant Ranger's code and, enforced through deployment environments, can only be used from its main branch. Since only a few people may merge changes to the main branch, this prevents privilege escalation, in which any single user in the QuantCo organization could gain write access to all repositories via the app's token.
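For context, a GitHub App authenticates in two steps: it signs a short-lived JSON Web Token (JWT) with its private key, then exchanges that JWT for an installation access token via `POST /app/installations/{installation_id}/access_tokens`. The sketch below covers only the first step using Node's built-in `crypto` module; `createAppJwt` is a hypothetical helper, and in practice libraries such as `@octokit/auth-app` or the `actions/create-github-app-token` action take care of both steps.

```typescript
import * as crypto from "node:crypto";

// Encodes UTF-8 text or raw bytes as unpadded base64url, as JWTs require.
function base64url(input: string | Buffer): string {
  const bytes = typeof input === "string" ? Buffer.from(input, "utf8") : input;
  return bytes.toString("base64url");
}

// Hypothetical helper: builds the RS256-signed JWT a GitHub App uses to authenticate
// as itself. GitHub expects the app's ID as `iss` and an `exp` at most 10 minutes ahead.
function createAppJwt(
  appId: string,
  privateKeyPem: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): string {
  const header = { alg: "RS256", typ: "JWT" };
  const payload = {
    iat: nowSeconds - 60, // backdated by a minute to tolerate clock drift
    exp: nowSeconds + 9 * 60, // stays below GitHub's 10-minute limit
    iss: appId,
  };
  const signingInput = `${base64url(JSON.stringify(header))}.${base64url(JSON.stringify(payload))}`;
  const signature = crypto.createSign("RSA-SHA256").update(signingInput).sign(privateKeyPem);
  return `${signingInput}.${base64url(signature)}`;
}
```

In a setup like the one described above, the app's private key would be stored as a secret of the protected deployment environment, so only workflows running on the main branch could mint tokens.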

Prospective Challenges

While Quant Ranger has been running smoothly for a few months now, we are aware of some challenges that we might face in the future:

  • GitHub App installation tokens expire after one hour, so if an update run takes longer than that, the token needs to be refreshed.
  • At some point, we might run into rate limits of the GitHub API. This could be mitigated by distributing the updates over multiple days.
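Both challenges have fairly standard mitigations, sketched below as possible approaches rather than Quant Ranger's implementation. The hypothetical `TokenProvider` caches the installation token and re-fetches it shortly before it expires, so long update runs keep working; the hypothetical `dayBucket` deterministically spreads repositories over several days using an FNV-1a-style hash of the repository name.

```typescript
// Installation token plus its expiry time in epoch milliseconds.
interface CachedToken {
  token: string;
  expiresAt: number;
}

// Hypothetical helper: returns a cached installation token and re-fetches it
// once it is within `refreshMarginMs` of expiring (installation tokens live one hour).
class TokenProvider {
  private cached: CachedToken | null = null;
  private readonly fetchToken: () => Promise<CachedToken>;
  private readonly refreshMarginMs: number;
  private readonly now: () => number;

  constructor(
    fetchToken: () => Promise<CachedToken>,
    refreshMarginMs: number = 5 * 60 * 1000,
    now: () => number = Date.now,
  ) {
    this.fetchToken = fetchToken;
    this.refreshMarginMs = refreshMarginMs;
    this.now = now;
  }

  async get(): Promise<string> {
    if (this.cached === null || this.now() >= this.cached.expiresAt - this.refreshMarginMs) {
      this.cached = await this.fetchToken();
    }
    return this.cached.token;
  }
}

// Assigns a repository to one of `days` buckets via an FNV-1a-style hash over its
// UTF-16 code units; the same name always lands in the same bucket, so a run on
// day d could process only the repositories whose bucket equals d.
function dayBucket(repoFullName: string, days: number): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < repoFullName.length; i++) {
    hash ^= repoFullName.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept as unsigned 32-bit
  }
  return hash % days;
}
```

Hashing the name instead of, say, sorting repositories alphabetically keeps a repository in the same bucket even as new repositories are created.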

This blog post has focused entirely on the automation that keeps our repositories up-to-date. Stay tuned for future blog posts, in which we will discuss lock files for Conda environments and our Python project template in detail.