conda support in pre-commit
Developing code involves several tasks that are simple yet repetitive, such as styling your code (we use black) and checking for common issues. These tasks are easy to automate, and we do so with Git's hook system through the pre-commit utility. On every commit, pre-commit applies our code style conventions with black, checks for common (but subtle) issues with flake8, and lets mypy check that our Python type hints are consistent. These checks catch many issues long before code reaches the CI system and human code review, without putting any additional burden on the developer.
As scientific Python and R developers, we have lots of compiled dependencies and thus prefer to manage our (Python) environments using conda. In addition, much of our development takes place in environments that are not directly connected to the internet. All packages are installed from a local conda mirror, and we try to keep this our sole package mirror. We do not have a local PyPI or CRAN mirror. While this may sound like an impediment, we are very active contributors to conda-forge and can get most packages built and deployed there in less than 4 hours (a massive thanks to the really fast reviewers!).
To be able to use only a single package mirror, all tools must support conda to create environments. Sadly, pre-commit only supported virtualenv and python-venv for creating Python virtual environments, and it had no support for R at all. To work around this problem, we did not use pre-commit's feature of automatically creating the environments the checks run in. Instead, we used the language: system setting to tell pre-commit to use whichever black / mypy / … it finds first on the PATH. While this is a simple solution, we lose the ability to ensure that pre-commit runs the correct version of a check. This is especially problematic for tools such as mypy, where different versions report different issues. The only solution was to make sure that all our code repositories use the same versions of the checks and are all updated simultaneously. Anyone who has ever managed a large codebase knows that this is quite a job.
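To make the version-drift problem with language: system concrete, here is a minimal Python sketch that compares the version of whichever tool is first on the PATH against the version a repository expects. The helper names (extract_version, version_on_path, matches_pin) are hypothetical and not part of pre-commit, and the sketch assumes the tool prints its version when called with --version.

```python
import re
import shutil
import subprocess
from typing import Optional

# Simplified pattern: digits separated by dots, plus an optional
# alphanumeric suffix such as "b0". Not a full version parser.
VERSION_RE = re.compile(r"\d+(?:\.\d+)+[0-9A-Za-z]*")


def extract_version(output: str) -> Optional[str]:
    """Return the first version-like token in `output`, or None."""
    match = VERSION_RE.search(output)
    return match.group(0) if match else None


def version_on_path(tool: str) -> Optional[str]:
    """Return the version of the first `tool` found on the PATH, or None."""
    executable = shutil.which(tool)
    if executable is None:
        return None
    # Assumption: the tool supports a `--version` option.
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    return extract_version(result.stdout + result.stderr)


def matches_pin(tool: str, pinned: str) -> bool:
    """True if the tool on the PATH has exactly the pinned version."""
    return version_on_path(tool) == pinned


if __name__ == "__main__":
    # With language: system, nothing stops this from printing False
    # on one developer machine and True on another.
    print(matches_pin("mypy", "0.761"))
```

A check like this would have to run in every repository and on every machine, which is the bookkeeping that environment management inside pre-commit avoids.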
To get out of this situation, one option was to set up a PyPI mirror. This would enable us to use the language: python facility in pre-commit with the required features, but it would only solve the issue for Python checks. For checks in other languages like R, or for other binaries that are available through conda-forge, we would still be out of luck. In the end, we chose to implement conda as a language type in pre-commit.
Python pre-commit hook with conda
To write a pre-commit hook that is set up using conda, you need a repository that contains the hook specification file .pre-commit-hooks.yaml and the conda environment definition environment.yml. As an example, a pre-commit hook for mypy would have the following environment.yml:
channels:
  - conda-forge
  - defaults
dependencies:
  - mypy=0.761
The corresponding .pre-commit-hooks.yaml is adapted from the language: python hook:
- id: mypy-conda
  name: mypy-conda
  entry: mypy
  language: conda
  'types': [python]
  args: ["--ignore-missing-imports", "--scripts-are-modules"]
  require_serial: true
  additional_dependencies: []
You can then use this hook by adding the following snippet to your .pre-commit-config.yaml:
- repo: https://github.com/Quantco/pre-commit-mirrors-mypy
  rev: '0.761'
  hooks:
    - id: mypy-conda
R pre-commit hook with conda
As conda is a general package management system and not bound to Python, we can also use it to write pre-commit hooks in R. While Python tools are commonly available as executables via entry points, in the R world most things are only callable from within R and not directly from the command line. Thus, we need to include a bit of R code in the hook.
First, we declare the dependencies in the environment.yml:
channels:
  - conda-forge
  - defaults
dependencies:
  - r-base=3.6
Additional R packages are readily available on conda-forge with an r- prefix and the CRAN package name in lowercase. For example, if you wanted to write a pre-commit hook that knits Readme.Rmd into Readme.md on each commit that touches Readme.Rmd, you would also add r-knitr to the environment.yml.
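The naming convention can be sketched as a small Python helper. The function name conda_forge_name is hypothetical, and the sketch only applies the rule stated above (r- prefix plus the lowercased CRAN name); it does not check whether the package actually exists on conda-forge.

```python
def conda_forge_name(cran_package: str) -> str:
    """Return the conda-forge name for a CRAN package.

    Applies the convention described in the text: an "r-" prefix
    followed by the CRAN package name in lowercase.
    """
    name = cran_package.strip()
    if not name:
        raise ValueError("CRAN package name must not be empty")
    return "r-" + name.lower()


if __name__ == "__main__":
    for cran in ["knitr", "Rcpp", "data.table"]:
        print(f"{cran} -> {conda_forge_name(cran)}")
```

The resulting names are what you would list under dependencies in the environment.yml.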
For the scope of this example, we'll limit ourselves to reimplementing the parsable-R hook from lorenzwalthert/precommit. This hook is originally written as language: script, meaning that it is up to the user to have the dependencies pre-installed on the system. For this simple case, any working R installation would be enough. In other cases, you may want to pin a specific R version or package version, at which point proper dependency management courtesy of conda becomes extremely helpful. The hook itself calls a script that loops over all R files and checks whether each one contains valid R code.
To use language: conda, we need to combine the script and the pre-commit configuration in .pre-commit-hooks.yaml:
- id: parsable-R-conda
  name: parsable-R-conda
  description: check if a .R file is parsable
  language: conda
  types: [r]
  entry: |
    Rscript -e 'files <- commandArgs(trailing = TRUE)
      out <- lapply(files, function(path) {
        tryCatch(
          parse(path),
          error = function(x) stop("File ", path, " is not parsable", call. = FALSE)
        )
      })'
You can then use this hook in your project in the usual way:
- repo: https://github.com/some/repo
  rev: '0.1.2'
  hooks:
    - id: parsable-R-conda
This hook will also run successfully if R is not on your PATH, and even if R is not installed at all, because it creates a conda environment that has the necessary dependencies (in this case only base R).
pre-commit mirrors for conda
As the language: conda feature is quite new to pre-commit, we also needed to make sure that hooks are available for conda. For starters, this meant converting the existing Python/virtualenv-based hooks to conda as explained above in the Python pre-commit hook with conda section. At the time of writing, we have converted the following hooks to use conda:
- mypy: https://github.com/Quantco/pre-commit-mirrors-mypy
- flake8: https://github.com/Quantco/pre-commit-mirrors-flake8
- isort: https://github.com/Quantco/pre-commit-mirrors-isort
- pyupgrade: https://github.com/Quantco/pre-commit-mirrors-pyupgrade
- black: https://github.com/Quantco/pre-commit-mirrors-black
These repositories contain the same configurations as their virtualenv equivalents but install all dependencies from conda-forge via conda.
Universal pre-commit config behind firewalls
As mentioned above, we use these hooks in firewalled environments that are decoupled from the internet. While we cannot reach https://github.com in these environments, we can still use the same .pre-commit-config.yaml that we use where we have full internet access. You keep the https://github.com/ URLs in your configuration and let git rewrite the clone URLs on the fly to a local git repository mirror using:
git config --global url."https://github-mirror.local".insteadOf "https://github.com"
This is not actually related to the conda support, but it is worth noting for anyone facing the same situation.
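To illustrate what the insteadOf setting does to a clone URL, here is a small Python sketch that emulates the prefix rewrite. It is not git's implementation, only a model of the documented behaviour (the longest matching prefix wins); the function name rewrite_url and the rules dictionary are hypothetical.

```python
from typing import Mapping


def rewrite_url(url: str, instead_of: Mapping[str, str]) -> str:
    """Emulate git's url.<base>.insteadOf rewriting for one URL.

    `instead_of` maps each original prefix to its replacement base.
    If several prefixes match, the longest one is used.
    """
    matches = [prefix for prefix in instead_of if url.startswith(prefix)]
    if not matches:
        return url  # no rule applies, the URL is used unchanged
    longest = max(matches, key=len)
    return instead_of[longest] + url[len(longest):]


if __name__ == "__main__":
    # Mirrors the git config command shown above.
    rules = {"https://github.com": "https://github-mirror.local"}
    print(rewrite_url("https://github.com/Quantco/pre-commit-mirrors-mypy", rules))
    # prints https://github-mirror.local/Quantco/pre-commit-mirrors-mypy
```

Because the rewrite happens inside git, the .pre-commit-config.yaml itself never needs to mention the mirror.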