In early 2017, we wrote up our strategy at the time for using Conda in combination with Docker. We wanted to issue a quick update based on what we’ve learned in the year and a half since!
In general, we’ve been pretty happy with Conda. It manages environments very nicely, à la virtualenvwrapper, while also managing Python versions as pyenv does, and it even keeps the versions and the virtualenvs in sync. We did a significant amount of research to find a way to get all of Conda’s niceties from other tools, and we just weren’t able to swing it. Conda’s UX is comparatively really good.
We have, however, had a few problems. One is the issue we wrote about in the previously mentioned 2017 article: Conda’s environment.yml doesn’t play nicely with non-environment contexts like those typically seen in Docker containers. This never quite sat right with us, but the solution we found at the time was satisfactory.
There is one other, newer problem, though, for which we never found a satisfactory solution: Conda installs fail in CI and production environments a non-negligible percentage of the time. In particular, in some of our Jenkins jobs we’ve seen the following:
CondaError: An error occurred when loading cached repodata. Executing `conda clean --index-cache` will remove cached repodata files so they can be downloaded again.
“They must be doing something wrong, I’ve never heard of that,” you’re probably thinking. It’s possible! We haven’t noticed this cropping up during local installs, and we think it might have something to do with CI using the same environment for multiple projects. We weren’t able to prove this, though! Moreover, the original core issue of how we were complicating our deploys stuck in our craw. Also, for the record, we did try naively running `conda clean --index-cache` in CI; no dice.
So here’s our alternate solution to this problem, now live in our data warehouse:
- Continue using Conda to manage Python versions and environments, but remove all dependencies except pip and python from the environment.yml
- Move all of those dependencies into a requirements.txt file
- Our Dockerfiles are generated from templates, so we’re able to insert the Python version from our environment.yml into the Docker image tag
- Our Dockerfiles now install the dependencies in the requirements file against the Docker container’s global Python
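The steps above can be sketched roughly as follows. This is a minimal illustration, not our actual templating code: the environment.yml contents, the `python:` base image, the template text, and the function names are all assumptions made for the example. It also reads the version with a simple regular expression rather than a real YAML parser, which only works for a plain `python=X.Y` pin.

```python
import re
from string import Template

# Hypothetical environment.yml after the change: only python and pip remain.
ENVIRONMENT_YML = """\
name: warehouse
dependencies:
  - python=3.6
  - pip
"""

# Hypothetical Dockerfile template. The base image and paths are assumptions.
DOCKERFILE_TEMPLATE = Template("""\
FROM python:$python_version
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
""")


def python_version_from_environment(text):
    """Return the version from a '- python=X.Y' dependency line."""
    match = re.search(
        r"^\s*-\s*python\s*=+\s*([0-9][0-9.]*)\s*$", text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no pinned python version found in environment.yml")
    return match.group(1)


def render_dockerfile(environment_text):
    """Insert the pinned Python version into the Docker image tag."""
    version = python_version_from_environment(environment_text)
    return DOCKERFILE_TEMPLATE.substitute(python_version=version)


if __name__ == "__main__":
    print(render_dockerfile(ENVIRONMENT_YML))
```

The point of the design is that environment.yml stays the single source of truth for the Python version, while requirements.txt is the single source of truth for packages, so the Docker build never has to invoke Conda at all.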
We find that this makes for a nice compromise. We get the simplicity of pip installs with Docker, while keeping the benefits of Conda’s management of environments and Python versions.
Our workflow hasn’t changed much either, since we were generally already using requirements_test.txt files to manage testing dependencies. It also means that other projects that use Conda to manage all dependencies (and don’t need this Docker support) can continue to live alongside our data warehouse in development.