Python linting at Venmo

August 28, 2015
cross-posted at the Venmo blog

Quick! What’s wrong with this (contrived) Python 2 code?

import sys

class NotFoundError(Exception):
   pass

def enforce_presence(key, entries):
   """Raise NotFoundError if key is not in entries."""

   for entry in entries:
       if entry == key:
           break
   else:
       NotFoundError

If you said the unused import and missing raise keyword, you’re right! But, if you took longer than a quarter of a second to answer, sorry: you were outperformed by my linting tools.

Don’t feel bad! Linting is designed to detect these problems more quickly and consistently than a human. There are two ways to make use of it: manually or automatically. The former is flexible but not robust, while the latter risks getting in the way. We lint automatically at Venmo; here’s how we strike a balance between flexibility and enforcement.

We use a collection of linting tools. Currently we use flake8, pylint and a custom internal tool. They each address different needs: flake8 quickly catches simple errors (like the unused import), pylint slowly catches complex errors (like the missing raise), and our internal tool catches errors that are only relevant to Venmo. For easy use from the shell, we combine their output with git-lint and a short script. This setup catches a wide variety of errors and can easily accommodate new linters. Here’s what it looks like when run on the code from this post:

$ ./lint example.py
Linting file: example.py FAILURE
line 1, col 1: [F401]: 'sys' imported but unused
line 15, col 8: Warning: [W0104]: Statement seems to have no effect

Linting happens in three places during our workflow: in-editor, pre-commit, and during builds. The first step varies for each of us since we don’t all use the same editor (though vim with syntastic is a common choice). This is the step with the fastest feedback loop: if you don’t currently use linting, start with this.

The second step is implemented with a git pre-commit hook. It lints all the files about to be committed and aborts the commit if there are problems. Sometimes we opt out of this check - maybe we know about the problems and plan to address them later - by using git’s built in --no-verify flag.

Finally, any errors that survive to a pull request will be caught during build linting on Jenkins. It’s similar to the pre-commit check, but runs on all files that have been changed in the feature branch. However, unlike the pre-commit check, our build script uses GitHub Enterprise’s comparison api to find these files. This eliminates the need to download the repository’s history, allowing us to save bandwidth and disk space with a git shallow clone.

No matter when linting is run, we always operate it at the granularity of an entire file. This is necessary to catch problems such as unused imports or dead code; these aren’t localized to only modified lines. It also means that any file that’s been touched recently is free of problems, so it’s rare that we need to fix problems unrelated to our changes.

All of our configuration is checked into git, pinning our desired checks to a specific version of the codebase. Checks that we want to enable are whitelisted, allowing us to safely update our linters without worrying about accidentally enabling new, unwanted checks.

When enabling a new check, we also fix any existing violations. This avoids chilling effects: we don’t want to discourage small changes through fear of cleaning up lots of linting violations. It also incentivizes automated fixes, which saves engineering time compared to distributed manual editing.

Hopefully, sharing our linting workflow helps save you some time as well!

Simon Weber

Python linting at Venmo