Making the conda(-forge) ecosystem ready for cybersecurity regulations
Supply chain security has become increasingly important for securing software environments and has gained the attention of regulators and customers alike. The SolarWinds supply chain attack of 2020, in particular, put the issue into the public spotlight. It has since led to regulations across the globe that mandate minimum requirements for vetting your supply chain. Comparable rules exist in other parts of the world, but to take the EU as an example, there are NIS2, DORA, and the Cyber Resilience Act (CRA). While most of these apply only to a select set of industries, the CRA covers, roughly speaking, every product or service with a commercial component that uses software.
At QuantCo, we rely extensively on the conda ecosystem to supply our dependencies, with conda-forge as their primary source. We are deeply involved in this community: we maintain over 1000 feedstocks, help maintain the automation, and employ core members. Still, the sheer scale means that automation is needed to stay compliant and to keep our build, development, and production environments secure and stable. On paper, one could satisfy most of the requirements by running a set of well-known tools on our code bases and installed environments, but that alone would fall short of our striving for excellence and community engagement.
In this blog post, we want to share our investments in making sure that common tools detect all of our dependencies, and how we want to solve the underlying problems in general, not just for the subset of dependencies we care about. Our aim is not to build these things alone but to engage with the communities involved, and we hope that other parties will join our efforts.
Phantom Dependencies & friends
In 2023, Endor Labs coined the term Phantom Dependencies for dependencies that are not covered by your manifest. In the specific case of wheels on PyPI, these are dependencies that are packaged into a wheel but are not Python code themselves (JavaScript, C++, …). Conda packages don't suffer from missing support for multiple ecosystems, but we still have cases where dependencies are statically linked or vendored into packages (JavaScript, Go, Rust). Furthermore, conda environments are often invisible to standard tools because these only support the major ecosystems. As a result, often only the Python packages of an environment are detected, and the dependencies from all other ecosystems are skipped. The short sketch below illustrates this visibility gap.
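To make the gap concrete, here is a minimal sketch: `importlib.metadata` only sees Python distributions, while the `conda-meta/` directory records every conda package in the environment, regardless of ecosystem. It assumes the script runs inside an activated conda environment.

```python
import importlib.metadata
import os
from pathlib import Path

# Python distributions visible to standard Python tooling.
python_dists = {dist.metadata["Name"] for dist in importlib.metadata.distributions()}

# Every conda package leaves a JSON record in conda-meta/,
# including non-Python packages such as openssl or zlib.
conda_meta = Path(os.environ["CONDA_PREFIX"]) / "conda-meta"
conda_pkgs = {p.stem for p in conda_meta.glob("*.json")}

print(f"Python distributions: {len(python_dists)}")
print(f"conda packages:       {len(conda_pkgs)}")
```

In any non-trivial environment, the second number is noticeably larger than the first; the difference is exactly what a Python-only scanner never sees.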
Luckily, most systems can ingest a software bill of materials (SBOM) as their input. Thus, to add support for conda environments and to cover all Phantom Dependencies within them, we need to ensure we can produce high-quality SBOMs.
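As a rough illustration of how this fits together, you can already ask syft to emit an SBOM for an environment directory today. The exact CLI surface differs between syft versions (recent releases use the `scan` subcommand, older ones used `packages`), so treat this as a sketch rather than a reference invocation.

```python
import json
import os
import subprocess

# Ask syft to scan the active conda environment and emit a CycloneDX SBOM.
prefix = os.environ["CONDA_PREFIX"]
result = subprocess.run(
    ["syft", "scan", f"dir:{prefix}", "-o", "cyclonedx-json"],
    check=True,
    capture_output=True,
    text=True,
)

sbom = json.loads(result.stdout)
print(f"SBOM lists {len(sbom.get('components', []))} components")
```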
The first step is to give visibility into the installed conda packages. We are using syft as our preferred scanning and SBOM generation tool, so the natural step was to add parsing of the information in the conda-meta/ folder to it: syft#4002. This information already gives basic visibility into all dependencies installed in an environment, allowing CVE scanning tools to find vulnerabilities in conda environments that previously went undetected.
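For readers curious what that metadata looks like, the sketch below pulls the fields a scanner cares about straight out of conda-meta/; the key names (name, version, build, channel, depends) are part of conda's on-disk metadata format.

```python
import json
import os
from pathlib import Path

conda_meta = Path(os.environ["CONDA_PREFIX"]) / "conda-meta"

for record_file in sorted(conda_meta.glob("*.json")):
    record = json.loads(record_file.read_text())
    # Each record describes one installed package, whatever its ecosystem.
    print(
        record["name"],
        record["version"],
        record["build"],
        record.get("channel", "<unknown>"),
    )
    # "depends" lists the package's runtime requirements as match specs,
    # e.g. "libzlib >=1.2.13,<2.0a0".
    for dep in record.get("depends", []):
        print("  ->", dep)
```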
Still, this doesn't fully address the problem of Phantom Dependencies in the conda ecosystem. While syft's binary inspection already discovers vendored JavaScript and Go dependencies, Rust binaries and their dependencies remain invisible to it. By default, you cannot infer the composition of a program from its binary. By using cargo-auditable, we can make Rust comparable to Go, i.e., the dependency tree is embedded into the binary itself. To apply this to all binaries that contain Rust on conda-forge, we opened a new conda-forge enhancement proposal (CFEP) to make all Rust packages auditable.
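cargo-auditable embeds the dependency list as zlib-compressed JSON in a dedicated linker section (`.dep-v0` on ELF binaries). Here is a minimal sketch of reading it back, assuming a Linux binary built with `cargo auditable build` and the third-party pyelftools package:

```python
import json
import sys
import zlib

from elftools.elf.elffile import ELFFile  # pip install pyelftools

# cargo-auditable stores its audit data as zlib-compressed JSON
# in a dedicated ".dep-v0" section of the binary.
with open(sys.argv[1], "rb") as f:
    section = ELFFile(f).get_section_by_name(".dep-v0")
    if section is None:
        sys.exit("no audit data: binary was not built with cargo auditable")
    audit = json.loads(zlib.decompress(section.data()))

for package in audit["packages"]:
    print(package["name"], package["version"])
```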
With the combination of conda metadata and (binary) inspection of bundled dependencies, we can now precisely track the contents of our environments. Binary inspection is a costly process, however, so we would prefer that this information be pre-computed in each conda package. We therefore plan to open a conda enhancement proposal (CEP) on how packages can ship SBOMs and where they should be stored in the final environment. Similarly, we will probably write a follow-up CFEP on how these SBOMs should be computed in the conda-forge ecosystem.
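Nothing here is specified yet, but to give a flavour of what such a CEP could enable, here is a deliberately hypothetical sketch: it assumes, purely for illustration, that each package would ship a CycloneDX document under $CONDA_PREFIX/share/sbom/, and merges them into one environment-level component list.

```python
import json
import os
from pathlib import Path

# HYPOTHETICAL layout: the actual location and format would be
# defined by the planned CEP, not by this sketch.
sbom_dir = Path(os.environ["CONDA_PREFIX"]) / "share" / "sbom"

components = []
for sbom_file in sorted(sbom_dir.glob("*.cdx.json")):
    sbom = json.loads(sbom_file.read_text())
    # Collect each package's pre-computed components, including
    # vendored Rust/Go/JavaScript dependencies found at build time.
    components.extend(sbom.get("components", []))

print(f"environment-level SBOM would contain {len(components)} components")
```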
CVE mapping
Now that we can easily list what is in our dependency tree, we also want to leverage that information to ensure that our current set of dependencies doesn't contain any known vulnerabilities. Fortunately, most conda packages map directly to their equivalents in the upstream ecosystem, so the previously mentioned PR to syft brings us quite far. Still, there are cases where conda-forge splits a package into smaller units than exist on, e.g., PyPI, or where a single dependency is provided by multiple, differently named packages.
One way to identify packages across ecosystem boundaries is to attach PURLs (Package URLs) to them. Thus, we look forward to progress on ceps#63, which aims to attach the PURL of the upstream package to the resulting conda package and to define what the PURL of a conda package should look like.
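The purl specification already defines both a conda and a pypi type, so the mapping can be expressed today. The matplotlib-base/matplotlib split below is a real conda-forge example; the helper and the hand-written table are just illustrations of the idea, not actual tooling.

```python
# Illustrative only: a real mapping would come from package metadata
# (see ceps#63), not from a hand-maintained table.
CONDA_TO_UPSTREAM = {
    # conda-forge splits matplotlib into matplotlib-base (the library)
    # and matplotlib (a thin wrapper pulling in the GUI backend);
    # both correspond to "matplotlib" on PyPI.
    "matplotlib-base": "pkg:pypi/matplotlib",
    "matplotlib": "pkg:pypi/matplotlib",
}


def conda_purl(name: str, version: str, channel: str = "conda-forge") -> str:
    # The purl spec defines a "conda" type with qualifiers such as
    # build, channel, and subdir; only the channel is shown here.
    return f"pkg:conda/{name}@{version}?channel={channel}"


print(conda_purl("matplotlib-base", "3.9.2"))
print("maps upstream to:", CONDA_TO_UPSTREAM["matplotlib-base"])
```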
With PURLs, automation will probably take us as far as it can. Still, there will be situations where packages must be matched to CVEs manually, or even cases where a vulnerability exists only in conda-forge. For those, it would be helpful if conda-forge itself became a CVE Numbering Authority (CNA). Being a CNA would allow conda-forge to map existing CVEs to conda packages and to publish its own. We want to work jointly with prefix.dev on these efforts.
Assessing the quality of a dependency
We are not only interested in whether any of our dependencies are currently vulnerable; we would also like to assess their quality to minimise the likelihood of future vulnerabilities and to ensure that the packages are maintained, regularly updated, and written to modern standards. Sadly, this is not as binary as the has-a-vulnerability question in the above sections. Thus, we need to employ different heuristics to assess whether we want to take on a dependency.
For this, we also differentiate between direct and transitive dependencies. We can explicitly choose our direct dependencies, whereas the direct ones freely choose the transitive ones. With our direct dependencies, we therefore build up a trust relationship.
Several tools already exist that run such checks to assess whether a direct dependency is in a good state and follows best practices that matter from a compliance point of view, like having a vulnerability policy or being actively maintained. The most prominent (i.e., the most cited in documents that help maintainers prepare for the mentioned regulations) is the OpenSSF Scorecard. While the Scorecard project itself already scans the 1 million most critical projects, we want to expand on this by computing the scorecard for relevant projects in the conda ecosystem directly and by helping these projects achieve a good score there, e.g. by adding it to pixi.
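Scorecard results for repositories the project already scans are exposed through its public REST API. A small sketch of querying it, using pixi's repository as an example (whether a given repository has results depends on Scorecard's scan list):

```python
import json
import urllib.request

# Fetch the published Scorecard result for a repository that the
# OpenSSF project already scans.
repo = "github.com/prefix-dev/pixi"
url = f"https://api.securityscorecards.dev/projects/{repo}"

with urllib.request.urlopen(url) as response:
    result = json.load(response)

print(f"{repo}: overall score {result['score']}")
for check in result["checks"]:
    print(f"  {check['name']}: {check['score']}")
```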
The OpenSSF Scorecard may be a good heuristic for determining whether a project is on a good path, but in many cases, we want to be more granular in what we check; otherwise, many projects will have an awful score despite being well-maintained. For these, we plan to develop tooling that lets us review them. Projecting this onto our efforts in the conda ecosystem, we plan to develop a tool that checks conda packages post-build, i.e. one that works on packages instead of recipes (as conda-smithy does). The intended functionality will be a mix of automatic and manual audits. We expect the manual part to behave similarly to cargo-vet.
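This tool does not exist yet, so the following is a purely hypothetical sketch of what a cargo-vet-style audit record for a conda package could look like; every field name here is invented for illustration.

```python
from dataclasses import dataclass


# HYPOTHETICAL data model, loosely inspired by cargo-vet's audit
# entries; the real tool's format is still to be designed.
@dataclass
class PackageAudit:
    package: str   # conda package name
    version: str   # audited version
    criteria: str  # e.g. "safe-to-deploy", as in cargo-vet
    auditor: str   # who performed the manual review
    notes: str = ""


audits = [
    PackageAudit(
        package="some-package",
        version="1.2.3",
        criteria="safe-to-deploy",
        auditor="jane@example.com",
        notes="Reviewed build script and vendored sources.",
    ),
]

for audit in audits:
    print(f"{audit.package} {audit.version}: {audit.criteria} ({audit.auditor})")
```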
License Checking
One final piece needed in the conda ecosystem to enable users to comply with the above-mentioned regulations is ensuring that all dependencies carry an acceptable license. For us, this problem is already solved neatly by employing conda-deny to do automated license checking in our projects. While we are happy to maintain our internal allowlist, we will continue our work here to ensure that the license metadata in conda packages is written in the proper format (SPDX notation).
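Checking that metadata is easy to automate. The sketch below runs every license field in conda-meta/ through the third-party license-expression package; the license key may be missing from older package records, which the sketch simply skips.

```python
import json
import os
from pathlib import Path

from license_expression import get_spdx_licensing  # pip install license-expression

licensing = get_spdx_licensing()
conda_meta = Path(os.environ["CONDA_PREFIX"]) / "conda-meta"

for record_file in sorted(conda_meta.glob("*.json")):
    record = json.loads(record_file.read_text())
    license_field = record.get("license")
    if not license_field:
        continue  # older packages may not record a license at all
    try:
        # Raises if the expression is not valid SPDX notation.
        licensing.parse(license_field, validate=True)
    except Exception:
        print(f"{record['name']}: invalid SPDX expression {license_field!r}")
```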