Investigate your package dependencies with prdeps and Unix

At $DAYJOB I work on a very large Go application; hundreds and hundreds of packages. Recently I’ve been trying to untangle some code that has inadvertently grown huge trunks of dependencies. I suspect this is what is causing the time taken to link our tests to become the subject of ridicule.

I’ve tried previously to visualise the dependency graph of a package, and found the results unsatisfying. This time around I decided to write something simpler, and was pleased with the results. I present prdeps.

In traditional unix style, prdeps does very little, and expects to be part of a larger text processing pipeline.

% prdeps github.com/pkg/sftp
github.com/pkg/sftp:
  github.com/kr/fs:
  golang.org/x/crypto/ssh:
    golang.org/x/crypto/curve25519:

Because Go packages are not a DAG, but a graph not a tree, but a directed graph, the output of running prdeps on any non trivial package is going to be verbose–be prepared for this.

prdeps, like go list takes a -f flag to modify its output. In this example we alter the output format from the usual indented version (which is presented to the template as .Indent) and disable suppression of the stdlib with the -s flag.

% prdeps -s -f {{.ImportPath}} github.com/pkg/sftp | head -n5
github.com/pkg/sftp
bytes
errors
io
errors

Note that errors appears twice in the first five lines of output. errors actually appears 763 times in the output because almost every package either imports it directly and imports a package which also imports errors.

% prdeps -s -f {{.ImportPath}} github.com/pkg/sftp | grep -c errors
763

Another prdeps feature is to print the import graph from the perspective of a test (-t is for internal tests, -T is for external):

% prdeps -T github.com/pkg/sftp
github.com/pkg/sftp:
  github.com/pkg/sftp:
    github.com/kr/fs:
    golang.org/x/crypto/ssh:
      golang.org/x/crypto/curve25519:
  golang.org/x/crypto/ssh:
    golang.org/x/crypto/curve25519:

Compare this to the previous non test output above. This feature was why I built prdeps as I wanted to track down reason the linker was taking so long to link the tests for some of our packages.

prdeps took about 30 minutes to write, and another hour to address the performance issues from several million lines of output that are produced by a non trivial invocation. I’m sure the same result could be done with the right amount of go list, but the pleasure of being able to write exactly the tool I wanted for the job at hand was reason enough for me.

Unhelpful abstractions

Sandi Metz’s post on abstraction struck a chord with me recently. I was working with a piece of code which looked like this (in pseudo code):

func Start() {
        const filename = "..."
        createOuputFile(filename)
        go run(filename)
}

It turned out that createOutputFile was written in an obscure way which first caused me to look at it more closely. Why the code expected the file to exist before starting wasn’t immediately clear. It may have been because some other goroutine was expecting the file to exist on disk, even if nothing had been written yet (slight race smell), or more likely the necessary information was not available for the job itself to create the file with the correct permissions. This calls for a refactoring!

There is a well known UNIX utility that provides these semantics, touch(1). So, I reasoned, createOutputFile is really touch plus the ability to set file permissions, which the former was hard coded to do implicitly. This was a job for abstraction!

How would you write the signature for TouchFile? Here is what I came up with:

// TouchFile ensures path exists, or creates it using
// the supplied file mode.
func TouchFile(path string, mode os.FileMode) error

This looked pretty reasonable, and was a nice generalisation over the previous function. TouchFile makes setting the file mode on creation, the primary reason why this code existed in the first place, explicit.

However, this is precisely the train of thought that Metz warned of in her post. By generalising this function I had made its API worse.

Specifically, now every caller to this function has to pass in a mode value, even if the file exists, even if they don’t really care and are happy with a default file mode. Worse still, mode is only applied if the file is not already present. Not only had I made this function harder to use in its default use case, I’d added a footgun to the API that someone might call this function expecting it to update the mode of an existing file.

The second clue that I was heading in the wrong direction was the implementation of TouchFile itself:

func TouchFile(path string, mode os.FileMode) error {
        f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE, mode)
        if err != nil {
                return err
        }
        return f.Close()
}

TouchFile just calls os.OpenFile passing in the right flags to get the create-if-missing semantics it wants. You still have to pass in the mode, because os.OpenFile requires a mode, so the utility of TouchFile as a wrapper is undermined by the cognitive overhead of having to remember its quirks.

Coming to my senses, I reverted my change and replaced createOutputFile with a direct call to os.OpenFile.

Whereas someone reading this code and seeing a call to TouchFile may think that the goal is to ensure the file exists, will miss the subtle point that purpose of Start was to ensure that the file exists with the right permission. By making a direct call to the os package in body of the function it becomes explicit to the next reader, who already knows the os package, that the file is being created explicitly is to set the mode.

I realised that I hadn’t made things simpler by adding this abstraction, instead I’d made them more opaque. Sometimes it’s better to be explicit than abstract.