Python modules management for newbies

TL;DR. Python modules can cause many headaches because packages end up spread across different places in the OS. Golang is cool and installs modules in a single directory. Python solves the problem with its module search path, which you can extend through the $PYTHONPATH variable.

There is a well-explored, common path that most aspiring software developers are forced to follow, especially Python ones. It goes like this.

You search for a tutorial on the internet. You install the needed tools and set up your coding environment. You get to see that sweet Hello world! printed on the console, and then the fun part begins. You get to play with the new technology and you quickly learn all the new stuff. Then you go to sleep, go back to normal life and forget about the new tech for a while. You’re fine.

Then, a week later you go back to the project. You forget to:

  • choose the proper directory
  • activate your virtual environment
  • check out the right git branch
  • pray to the software overlords and sacrifice the loved ones in hopes that everything will work just as you left it

and as a result you get this sweet, sweet error:

Why aren't you there???

You know it, right?

I know it too well. And after writing this article I hope to put an end to the struggle.

How does Golang solve the problem?

This post will be mainly about Python, but recently I picked up Golang and immediately fell in love with the simplicity of its packaging.

In Go, you work in a single workspace, a folder in your (presumably) home directory that is structured like this:

bin/
    hello                 # command executable
src/
    github.com/user/      # directory with code
        ...

Generally speaking, bin stores all your executable binaries and src stores ALL the code that you need for your development. It is a place for both your own projects and needed libraries. So after you run

$ go get github.com/golang/example/hello

the project hello will land comfortably in the directory

$GOPATH/src/github.com/golang/example/hello

Then you can import its parts by simply typing

import (
	"fmt"

	"github.com/golang/example/hello"
)

in your source code. Simple, effective and potentially foolproof.
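The mapping from an import path to a folder is mechanical enough to write down. Here is a minimal sketch in Python (used only because it is the language of the rest of this post) that computes where a GOPATH-style workspace would keep a package. The function name gopath_source_dir is made up for illustration, and the fallback to ~/go assumes the default GOPATH used by Go when the variable is not set.

```python
import os

def gopath_source_dir(import_path, gopath=None):
    """Return the directory where a GOPATH-style workspace keeps `import_path`."""
    if gopath is None:
        # Assumption: fall back to ~/go when GOPATH is not set.
        gopath = os.environ.get("GOPATH") or os.path.join(os.path.expanduser("~"), "go")
    # Import paths always use forward slashes; join them in an OS-appropriate way.
    return os.path.join(gopath, "src", *import_path.split("/"))

if __name__ == "__main__":
    print(gopath_source_dir("github.com/golang/example/hello"))
```

One import path, one directory: that is the whole lookup.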

Why does Python struggle so much?

In my opinion Python has two problems:

  1. The Python community consists mostly of beginners
  2. People coming from a data science background are often forced to use two package managers at once

Hear me out.

A person starting to learn Python without a prior programming background will often think of the packages they import as magical entities that, once installed, just “sit there” on the computer and simply work. All is well until they eventually jump to some other project: they clone the repo and all the package dependencies are suddenly a mess. They go to Stack Overflow, where they are told to run

$ pip install -r requirements.txt

but it doesn’t seem to work, because until now they have only ever worked with

$ conda install

previously. From this point the only reasonable option is to reinstall Python completely :sweat_smile:

Or maybe not?
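Before reaching for the reinstall button, it can help to check which tool put a given package on your machine. The sketch below reads the INSTALLER record that pip, and typically conda, leave in a package's metadata. The helper name installer_of is hypothetical, the record is optional (so None is a normal answer), and importlib.metadata requires Python 3.8 or newer.

```python
from importlib import metadata

def installer_of(distribution_name):
    """Return the tool recorded as the installer of a package, or None if unknown."""
    try:
        dist = metadata.distribution(distribution_name)
    except metadata.PackageNotFoundError:
        # The package is not installed for this interpreter at all.
        return None
    recorded = dist.read_text("INSTALLER")  # None when the file is missing
    return recorded.strip() if recorded else None

if __name__ == "__main__":
    for name in ("pip", "numpy"):
        print(name, "->", installer_of(name))
```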

How does managing modules in Python really work?

First, let’s make sure you are familiar with environment variables. Ever wondered how the system knows which program to run when you type ls? The information about its location has to be stored somewhere. Unix systems use the PATH variable for that. Here is an example of a pretty standard PATH variable

[Screenshot: an example PATH variable]

After you type a command, the shell searches the directories listed in the variable, in order, and executes the first program with a matching name. If it doesn’t find anything, it outputs command not found.
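The same lookup can be reproduced from Python, which makes it easy to poke at. This is a minimal sketch using only the standard library; the helper names path_directories and find_command are made up for this post, and the output depends entirely on your machine.

```python
import os
import shutil

def path_directories():
    """Return the directories listed in PATH, in search order."""
    return [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]

def find_command(name):
    """Return the full path of the first executable called `name` on PATH, or None."""
    return shutil.which(name)

if __name__ == "__main__":
    for directory in path_directories():
        print(directory)
    print("ls resolves to:", find_command("ls"))
    print("made-up command resolves to:", find_command("no-such-command-hopefully"))
```

A None result from find_command corresponds to the shell's command not found.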

In the Python environment, there is a similar variable, called PYTHONPATH. It works much like PATH, but for Python imports. When you import a module, the Python interpreter searches for it in every location on its module search path, sys.path, which is built from the directories in PYTHONPATH plus the interpreter's own default locations. When it doesn’t find the module, you get the dreaded ModuleNotFoundError. Let’s put that to the test.
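To see the search in action, here is a small sketch that prints the search path and then tries two imports, one that should succeed and one that should not. The helper try_import and the module name surely_not_installed_module are invented for the example.

```python
import importlib
import sys

def try_import(module_name):
    """Try to import a module; return (True, location) or (False, error message)."""
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError as error:
        return False, str(error)
    # Built-in modules have no file on disk, hence the fallback value.
    return True, getattr(module, "__file__", "built-in")

if __name__ == "__main__":
    print("Directories Python searches, in order:")
    for entry in sys.path:
        print("  ", entry)
    print(try_import("json"))
    print(try_import("surely_not_installed_module"))
```

If a module you expect is reported missing, the first thing to check is whether the directory it lives in appears in that printed list.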

Currently, I use conda for virtual environment management. Virtual environments in Python allow you to control which versions of given packages you use. How this is achieved was always a mystery to me. Is each module installed separately?

No! Keeping things as simple as possible, each virtual environment changes the list of directories Python searches, and the needed packages are then looked up there. Here is an example.
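A quick way to check which environment you are actually in is to ask the interpreter itself. The sketch below is one possible check, not the only one: run it once in your regular Python and once after activating an environment, then compare the output. The function name describe_interpreter is made up. Note that prefix and base_prefix differ inside a venv, while a conda environment usually ships its own interpreter, so there both values tend to point into the environment's folder.

```python
import sys
import sysconfig

def describe_interpreter():
    """Collect the facts that differ between a base Python and an environment."""
    return {
        "executable": sys.executable,            # which python binary is running
        "prefix": sys.prefix,                    # root of the active environment
        "base_prefix": sys.base_prefix,          # root of the underlying installation
        "site_packages": sysconfig.get_paths()["purelib"],  # where packages get installed
    }

if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key:>14}: {value}")
```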

Regular Python:

[Screenshot: sys.path printed and numpy imported in regular Python]

Python after activating a virtual environment:

[Screenshot: sys.path printed and numpy imported after activating a virtual environment]

First, we check the current search path by importing sys and printing out the contents of sys.path. Then we try to import a popular numerical computing library, numpy.

Numpy is not installed by default in Python, so it cannot be found in the default Python workspace. However, after installing it with the commands

conda activate blog_test
conda install numpy

we can use it. Conda installed the module in one of the directories on the environment's search path.
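You can ask Python where it would load a module from without actually importing it. A minimal sketch, assuming top-level module names only; the helper name locate is invented for illustration, and numpy will show up as None unless it is installed in the environment you run this from.

```python
import importlib.util

def locate(module_name):
    """Return the file a top-level module would be loaded from.

    Returns None when the module is not found (or has no single file,
    as is the case for namespace packages).
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin

if __name__ == "__main__":
    for name in ("json", "numpy"):
        print(name, "->", locate(name))
```

Running it inside and outside the blog_test environment should show numpy resolving to a path inside the environment in one case and to None in the other.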

Summary

Proper understanding of your OS paths is essential for being a great, 10X programmer. I always struggled with this, especially as a Python newbie.

Turns out looking at other languages is sometimes a great way to learn your current one!

Golang has its own problems, especially with version management, but no one can say it is complicated, and I appreciate that a lot.

I hope you have learnt something today. As always, you can comment on the article by writing to me on Twitter or on Hacker News.

To the next one,

Wojtek