Automated License Checking for Conda Projects

Third-party software is a cornerstone of many modern software projects. These third-party packages come with their own license files.

Instead of forcing all our developers to be familiar with their details, we automate the process of checking our projects for license compliance.

This post is inspired by our recent open-sourcing of conda-deny, a simple CLI to check packages for such compliance.

Why care about licenses?

Almost every bit of software you can find on GitHub or other platforms contains an associated license file. Licenses can seem intimidating at first. Their sheer number makes many people uncomfortable.

We want to introduce you to some ways of thinking about licenses and some tools that might make them less frightening.

When you create a piece of software, you hold its copyright. This means that you can control what others can do with your code. To exercise this control, you create a license text. It serves as legal instruction regarding what permissions you grant to others.

When providing software to third parties or selling it as a product, the license compliance of your package becomes important. Not only do you need to choose an appropriate license for your own code, you also need to make sure that the licenses of all the packages your project depends on are compatible with the way you intend to distribute it.

Consider the following example:

You include a package with the Unlicense license in your project. This leaves you free to modify, sell, and distribute your software however you wish.

If, however, you had included a package with the GPL-3.0 license, distributing your software could oblige you to make the source code of your own package available under the same terms. For software you sell, this can be a problem.

Thus, when relying on open-source software, it becomes crucial to check whether your dependencies include licenses like GPL-3.0 (so-called "strong copyleft" licenses) or other problematic license terms.

Nice to know: Licenses in the GPL family are sometimes referred to as "viral" licenses. They "infect" all downstream packages with their own license.

The SPDX License Format

The feeling of being overwhelmed that many people have when first reading about software licenses grows once you consider that, in theory, anyone can create their own license with its own name. This means that licenses like "# @_bfv My cool new license" are, in theory, just as valid as "GPL-3.0".

In addition to the name, everybody can write their own legal instructions in the license text. If you suspect that this will result in absolute chaos, you are right.

Luckily, the SPDX initiative has created a standardized way to represent licenses. Its license list covers the vast majority of licenses in use and assigns each one a standardized identifier.

Nice to know: If you ever see a license identifier that contains whitespace, e.g., BSD 3-Clause, you can be sure that it is not in SPDX format.

This layer of abstraction makes it easier to check for license compliance. Instead of manually reading each license text and checking whether it complies with our policies, we can now just check the SPDX identifier against a whitelist. This whitelist contains SPDX licenses we have already checked and are comfortable with.
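To make the idea concrete, here is a minimal Python sketch of such a whitelist check. It is not how conda-deny is implemented. The package names, the helper names `looks_like_spdx_id` and `check_packages`, and the example whitelist are all hypothetical. The shape check only tests whether a string could be a single SPDX identifier (letters, digits, "." and "-"); it does not verify that the identifier actually appears on the SPDX license list.

```python
import re

# A single SPDX identifier consists of letters, digits, "." and "-" only,
# so any whitespace rules a string out immediately.
_SPDX_ID_SHAPE = re.compile(r"[A-Za-z0-9.-]+")


def looks_like_spdx_id(license_id: str) -> bool:
    """Return True if the string has the shape of a single SPDX identifier."""
    return _SPDX_ID_SHAPE.fullmatch(license_id) is not None


def check_packages(packages: dict[str, str], whitelist: set[str]) -> dict[str, str]:
    """Map each non-compliant package name to the reason it was rejected."""
    problems = {}
    for name, license_id in packages.items():
        if not looks_like_spdx_id(license_id):
            problems[name] = f"not an SPDX identifier: {license_id!r}"
        elif license_id not in whitelist:
            problems[name] = f"license not whitelisted: {license_id}"
    return problems


# Hypothetical package/license pairs, for illustration only.
packages = {
    "pkg-a": "MIT",
    "pkg-b": "GPL-3.0-only",
    "pkg-c": "BSD 3-Clause",
}
whitelist = {"MIT", "BSD-3-Clause"}

for name, reason in check_packages(packages, whitelist).items():
    print(f"{name}: {reason}")
```

Compound license expressions such as "MIT OR Apache-2.0" are deliberately out of scope for this sketch; handling them properly requires an SPDX expression parser.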

conda-forge

QuantCo relies on the conda ecosystem, especially packages from the conda-forge distribution, for most of its dependencies. conda-forge helps to enforce the SPDX license format by requiring all packages to include a license_file field in their recipe file.

License file missing in conda-forge feedstock

It also encourages feedstock maintainers to ensure that the license file is packaged and that the license is declared in SPDX format.

conda-deny

conda-deny builds on the SPDX specification to provide a simple CLI for checking packages for license compliance. It leverages the local availability of license specifiers in pixi.lock files to check their corresponding licenses against a user-provided whitelist.

conda-deny demo

This is inspired by cargo-deny, which offers similar functionality for Rust projects.

Checking pixi projects

The simplest way to use conda-deny is to run it with the --osi flag. With this flag you do not need to provide a whitelist; conda-deny just checks the licenses for OSI approval. For details about the whitelist format, check out the section on Configuration.

# Assuming you have a pixi.lock file in the current directory
conda-deny check --osi

The metadata in pixi's project configuration makes the retrieval of the associated SPDX identifiers straightforward.

Checking non-pixi projects

If your setup doesn't specify where its environments are located, e.g. because you use micromamba or conda as a package manager, we provide the --prefix flag. You can use it to specify the path to the environment you want to check.

micromamba env list
> test-env                          /Users/user/micromamba/envs/test-env  

conda-deny check --prefix /Users/user/micromamba/envs/test-env  
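For intuition about what can be read from such a prefix, here is a small Python sketch. It relies on the fact that a conda environment keeps one JSON metadata record per installed package in its conda-meta directory, and that these records usually carry a license entry. The function name `licenses_in_prefix` and the "<missing>" placeholder are hypothetical; this is an illustration, not a description of how conda-deny works internally.

```python
import json
from pathlib import Path


def licenses_in_prefix(prefix: str) -> dict[str, str]:
    """Return {package name: license string} for a conda environment prefix."""
    licenses = {}
    # Each installed package has one JSON metadata record in conda-meta/.
    for record_path in sorted(Path(prefix, "conda-meta").glob("*.json")):
        record = json.loads(record_path.read_text(encoding="utf-8"))
        # Some records have no license entry; make that visible, not silent.
        licenses[record["name"]] = record.get("license") or "<missing>"
    return licenses
```

Calling `licenses_in_prefix("/Users/user/micromamba/envs/test-env")` would then return one entry per package installed in the environment from the example above.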

Checking recipes

When building a conda package you need to define the environment that the package will be installed in. It can be helpful to check whether this environment complies with your license constraints. A setup along the following lines should work in most cases:

# recipe.yml
tests:
  - script:
      - if: unix
        then: conda-deny check --prefix $CONDA_PREFIX
        else: conda-deny check --prefix %CONDA_PREFIX%
    files:
      source: pixi.toml
    requirements:
      run: [conda-deny]

rattler-build allows you to run arbitrary scripts at test time in the build process. This example uses the conda-deny configuration in pixi.toml to check the environment located at $CONDA_PREFIX.

Configuration

You can configure conda-deny in your pixi.toml or pyproject.toml. Alternatively, you can provide a custom configuration path with the --config flag.

The following configuration options are available:

[tool.conda-deny]
#--------------------------------------------------------
# General setup options:
#--------------------------------------------------------
license-whitelist = "https://raw.githubusercontent.com/QuantCo/conda-deny/main/tests/test_remote_base_configs/conda-deny-license_whitelist.toml" # or ["license_whitelist.toml", "other_license_whitelist.toml"]
platform = "linux-64" # or ["linux-64", "osx-arm64"]
environment = "default" # or ["default", "py39", "py310", "prod"]
lockfile = "environment/pixi.lock" # or ["environment1/pixi.lock", "environment2/pixi.lock"]

#--------------------------------------------------------
# License whitelist directly in configuration file:
#--------------------------------------------------------
safe-licenses = ["MIT", "BSD-3-Clause"]
ignore-packages = [
 { package = "make", version = "0.1.0" },
]

Conclusion

Licenses are an ever-present topic in the world of software engineering. We hope this short post has given you some idea of how you can deal with licenses in your projects and how conda-deny can help you with this task.