Tag Archives: dependency management

Thinking about $GOPATH

This is a short blog post about my thoughts on using Go in anger through several workplaces, as a developer and an advocate.

What is $GOPATH?

Back when Go was first announced we used Makefiles to compile Go code. These Makefiles referenced some shared logic stored in the Go distribution. This is where $GOROOT comes from.

Back then, if you wrote Go code, you’d probably also used these Makefiles, and while you could check out your source code anywhere, most people would put their own Go code in what today we’d call $GOROOT/src as you must’ve compiled Go from source, so this directory was always going to be present.

Towards the 1.0 release goinstall, then go get, solidified the use of domain names in import paths to provide a globally unique namespace. These tools introduced a new location into which Go code would be fetched. This location was separate from $GOROOT to make clear the distinction between code provided by the Go project, and code written by the developer. By the time Go 1.1 was released in 2013, $GOROOT was removed as a fallback option.

Why does $GOPATH exist?

$GOPATH exists for two main reasons:

  1. In Go, the import declaration references a package via its fully qualified import path. $GOPATH exist so that from any directory inside $GOPATH/src the go tool can compute the absolute import path of the package in question.1
  2. A location to store dependencies fetched by go get.

Having a per user $GOPATH environment variable also means developers could use the go tool from any directory on their system to build, test and install code, but I suspect only a minority utilise this feature.

What’s wrong with $GOPATH?

In my experience, many newcomers to Go are frustrated with the single workspace $GOPATH model. They are confused that $GOPATH doesn’t let them check out the source of a project in a directory of their choice like they are used to with other languages. Additionally, $GOPATH does not let the developer have more than one copy of a project (or its dependencies)  checked out at the same time without having to update $GOPATH constantly.

I think it is important to recognise that these issues are legitimate points of confusion for many newcomers (including those on the Go team) and act as a drag on Go adoption. As we’re on the cusp of a blessed dependency management tool for Go, I think it’s equally important to continue to question the base assumptions that this new tool will build on, namely requiring a $GOPATH.

In my opinion, any Go build tool needs to provide (in addition to actually building and testing code) a way for Go code checked out in an arbitrary location on disk to recover its intended fully qualified import path; the path other code will import it as.

The $GOPATH model answers this question by subtracting the prefix of $GOPATH/src from the path to the directory of the current package; the remainder is the package’s fully qualified import path. This is why if you check out a package outside a $GOPATH workspace, the go tool cannot figure out the packages’ fully qualified import path and everything falls apart.

What are some alternatives to $GOPATH?

I attempted to address both issues with gb, which gives developers the ability to check out a project anywhere you want, but has no solution for libraries, and gb projects were not go gettable. However gb showed that writing a new build tool that did not wrap the go tool meant it was not forced to reorganise the world to fit into the $GOPATH model allowing gb users to include the source of all their dependencies in their project without the pitfalls of the Go 1.6’s vendor/ directory.

Recently, on a suggestion from Bill Kennedy, I built an experimental build tool that recorded the expected import prefix in a manifest file. That prefix, rather than one computed by $GOPATH directory arithmetic, is used to determine the fully qualified import path.

I’m working on a similar tool (unfinished) based on a suggestion from Brad Fitzpatrick that uses the .git directory as a sentinel to determine the root of the project and hopefully infer the full import path from the git remote configuration.

While these experiments are unfinished, both demonstrate that you can avoid the $GOPATH restrictions and retain compatibility with the go get ecosystem. Potentially in the case of Kodos, even avoid a manifest file.

Conclusion

Kang and Kodos use a lot of forked code from gb, which I hope to rectify over the new years’ break. If you are interesting in contributing or better yet, building your own Go tool to explore this problem space, Kang, Kodos, and gb are permissively licensed.


Notes:

  1. This is notably different from the way imports work in scripting languages like Python and Ruby, which use directly scanning and inserting onto a global search path source code directories.

Automatically fetch your project’s dependencies with gb

gb has been in development for just over a year now. Since the announcement in May 2015 the project has received over 1,600 stars, produced 16 releases, and attracted 41 contributors.

Thanks to a committed band of early adopters, gb has grown to be a usable day to day replacement for the go tool. But, there is one area where gb has not lived up to my hopes, and that is dependency management.

gb’s $PROJECT/vendor/ directory was the inspiration for the go tool’s vendor/ directory (although their implementations differ greatly) and has delivered on its goal of reproducible builds for Go projects. However, the success of gb’s project based model, and vendoring code in general, has a few problems. Specifically, wholesale copying (or forking if you prefer) of one code base into another continues to sidestep the issue of adoption of a proper release and versioning culture amongst Go developers.

To be fair, for Go developers using the tools they have access to today–including gb–there is no incentive to release their code. As a Go package author, you get no points for doing proper versioned releases if your build tool just pulls from HEAD anyway. There is similarly limited value in adopting a version numbering policy like SemVer if your tools only memorise the git revision you last copied your code at.

A second problem, equally poorly served by gb or the vendor/ support in the go tool, are developers and projects who cannot, usually for legal reasons, or do not wish to, copy code wholesale into their project. Suggestions of using git submodules have been soundly dismissed as unworkable.

With the release of gb 0.4.3, there is a new way to manage dependencies with gb. This new method does not replace gb vendor or $PROJECT/vendor as the recommended method for achieving reproducible builds, but it does acknowledge that vendoring is not appropriate for all use cases.

To be clear, this new mode of managing dependencies does not supersede or deprecate the existing mechanisms of cloning source code into $PROJECT/vendor. The automatic download feature is optional and is activated by the project author creating a file in their project’s root called, $PROJECT/depfile.

If you have a gb project that is currently vendoring code, or you’re using gb vendor restore to actively avoid cloning code into your project, you can try this feature today, with the following caveats:

  1. Currently only GitHub is supported. This is because the new facility uses the GitHub API to download release tarballs via https. Vanity urls that redirect to GitHub are also not supported yet, but will be supported soon.
  2. The repository must have made a release of its code, and that release must be tagged with a tag containing a valid SemVer 2.0.0 version number. The format of the tag is described in this proposal. If a dependency you want to consume in your gb project has not released their code, then please ask them to do so.

Polishing this feature will be the remainder of the 0.4.x development series. After this work is complete gb vendor will be getting some attention. Ultimately both gb vendor and $PROJECT/depfile do the same thing–one copies the source of your dependencies into your project, the other into your home directory.

Gophers, please tag your releases

What do we want? Version management for Go packages! When do we want it? Yesterday!

What does everyone want? We want our Go build tool of choice to fetch the latest stable version when you start using the package in your project. We want them to grab security updates and bug fixes automatically, but not upgrade to a version where the author deleted a method you were using.

But as it stands, today, in 2016, there is no way for a human, or a tool, to look at an arbitrary git (or mercurial, or bzr, etc) repository of Go code and ask questions like:

  • What versions of this project have been released?
  • What is the latest stable release of this software?
  • If I have version 1.2.3, is there a bugfix or security update that I should apply?

The reason for this is Go projects (repositories of Go packages) don’t have versions, at least not in the way that our friends in other languages use that word. Go projects do not have versions because there is no formalised release process.

But there’s vendor/ right?

Arguing about tools to manage your vendor/ directory, or which markup format a manifest file should be written in is eating the elephant from the wrong end.

Before you can argue about the format of a file that records the version of a package, you have to have some way of actually knowing what that version is. A version number has to be sortable, so you can ask, “is there a newer version available than the one you have on disk?” Ideally the version number should give you a clue to how large the jump between versions is, perhaps even give a clue to backwards or forwards compatibility between two versions.

SemVer is no one’s favourite, yet one format is everyone’s favourite.

I recommended that Go projects adopt SemVer 2.0.0. It’s a sound standard, it is well understood by many, not just Go programmers, and semantic versioning will let people write tools to build a dependency management ecosystem on top of a minimal release process.

Following the lead of the big three Go projects, Docker, Kubernetes, and CoreOS (and GitHub’s on releases page), the format of the tag must be:

v<SemVer>

That is, the letter v followed by a string which is SemVer 2.0.0 compliant. Here are some examples:

git tag -a v1.2.3
git tag -a v0.1.0
git tag -a v1.0.0-rc.1

Here are some incorrect examples:

git tag -a 1.2.3        // missing v prefix
git tag -a v1.0         // 1.0 is not SemVer compliant
git tag -a v2.0.0beta3  // also not SemVer compliant

Of course, if you’re using hg, bzr, or another version control system, please adjust as appropriate. This isn’t just for git or GitHub repos.

What do you get for this?

Imagine if godoc.org could show you the documentation for the version of the package you’re using, not just the latest from HEAD.

Now, imagine if godoc.org could not just show you the documentation, but also serve you a tarball or zip file of the source code of that version. Imagine not having to install mercurial just to go get that one dependency that is still on google code (rest in peace), or bitbucket in hg form.

Establishing a single release process for Go projects and adopting semantic versioning will let your favourite Go package management or vendoring tool provide you things like a real upgrade command. Instead of letting you figure out which revision to switch to, SemVer gives tool writers the ability to do things like upgrade a dependency to the latest patch release of version 1.2.

Build it and they will come

Tagging releases is pointless if people don’t write tools to consume the information. Just like writing tools that can, at the moment, only record git hashes is pointless.

Here’s the deal: If you release your Go projects with the correctly formatted tags, then there are a host of developers who are working dependency management tools for Go packages that want to consume this information.

How can I declare which versions of other packages my project depends on?

If you’ve read this far you are probably wondering how using tagging releases in your own repository is going to help specify the versions of your Go project’s dependencies.

The Go import statement doesn’t contain this version information, all it has is the import path. But whether you’re in the camp that wants to add version information to the import statement, a comment inside the source file, or you would prefer to put that information in a metadata file, everyone needs version information, and that starts with tagging your release of your Go projects.

No version information, no tools, and the situation never improves. It’s that simple.

Visualising dependencies

Juju is a pretty large project. Some of our core packages have large complex dependency graphs and this is undesirable because the packages which import those core packages inherit these dependencies raising the spectre of an inadvertent import loop.

Reducing the coupling between our core packages has been a side project for some time for me. I’ve written several tools to try to help with this, including a recapitulation of the venerable graphviz wrapper.

However none of these tools were particularly useful, in fact the graphical tools I wrote became unworkable visually well before their textual counterparts — at least you can grep the output of go list to verify if a package is part of the import set of not.

Visualising the import graph

I’d long had a tab in my browser open reminding me to find an excuse to play with d3. After a few false starts I came up with tool that took a package import path and produced a graph of the imports.

Graph of math/rand's imports

math/rand tree graph

In this simple example showing math/rand importing its set of five packages, the problem with my naive approach is readily apparent — unsafe is present three times.

This repetition is both correct, each time unsafe is mentioned it is because its parent package directly imports it, and incorrect as unsafe does not appear three times in the final binary.

After sharing some samples on twitter, rf and Russ Cox suggested that if an import was mentioned at several levels of the tree it could be pushed down to the lowest limb without significant loss of information. This is what the same graph looks like with a simple attempt to implement this push down logic.

math/rand pushdown tree

math/rand pushdown tree

This approach, or at least my implementation of it, was successful in removing some duplication. You can see that the import of unsafe by sync has been pruned as sync imports sync/atomic which in turn imports unsafe.

However, there remains the second occurrence of unsafe rooted in an unrelated part of the tree which has not been eliminated. Still, it was better than the original method, so I kept it and moved on to graphing more complex trees.

crypto/rand pushdown tree

crypto/rand pushdown tree

In this example, crypto/rand, though the pushdown transformation has been applied, the number of duplicated imports is overwhelming. What I realised looking at this graph was even though pushdown was pruning single limbs, there are entire forks of the import graph repeated many times. The clusters starting at sync, or io are heavily duplicated.

While it might be possible to enhance pushdown to prune duplicated branches of imports, I decided to abandon this method because this aggressive pruning would ultimately reduce the import grpah to a trunk with individual imports jutting out as singular limbs.

While an interested idea, I felt that it would obscure the information I needed to unclutter the Juju dependency hierarchy.

However, before moving on I wanted to show an example of a radial visualisation which I played with briefly

crypto/rand radial graph

crypto/rand radial graph

Force graphs

Although I had known intuitively that the set of imports of a package are not strictly a tree, it wasn’t clear to me until I started to visualise them what this meant. In practice, the set of imports of a package will fan out, then converge onto a small set of root packages in the standard library. A tree was the wrong method for visualising this.

math/rand force graph

math/rand force graph

While perusing the d3 examples I came across another visualisation which I felt would be useful to apply, the force directed graph. Technical this is a directed acyclic graph, but the visualisation applies a repulsion algorithm that forces nodes in the graph to move away from each other, hopefully forming a useful display. Here is a small example using the math/rand package

Comparing this to the tree examples above the force graph has dealt with the convergence on the unsafe package well. All three imports paths are faithfully represented without pruning.

But, when applied to larger examples, the results are less informative.

crypto/rand force graph

crypto/rand force graph

I’m pretty sure that part of the issue with this visualisation is my lack of ability with d3. With that said, after playing with this approach for a while it was clear to me that the force graph is not the right tool for complex import graphs.

Compared to this example, applying force graph techniques to Go import graphs is unsuccessful because the heavily connected core packages gravitate towards the center of the graph, rather than the edge.

Chord graphs

The third type I investigated is called a chord graph, or at least that is what it is called in the d3 examples. The chord graph focuses on the interrelationship between nodes, rather than the node itself, and has proved to be best, or at least most appealing way, of visualising dependencies so far.

crypto/rand chord graph

crypto/rand chord graph

While initially overwhelming, the chord graph is aided by d3’s ability to disable rendering of part of the graph as you mouse over them. In addition the edges have tool tips for each limb.

crypto/rand chord graph highlighting bufio

crypto/rand chord graph highlighting bufio

In this image i’ve highlighted bufio. All the packages that bufio imports directly are indicated by lines of the same color leading away from bufio. Likewise the packages that import bufio directly are highlighted, in different color and in a different direction, in this example there is only one, crypto/rand itself.

The size of the segments around the circumference of the circle is somewhat arbitrary, indicating the number of packages that each directly import.

For some packages in the standard library, their import graph is small enough to be interpreted directly. Here is an shot of fmt which shows all the packages that are needed to provide fmt.Println("Hello world!").

fmt chord graph

fmt chord graph

Application to large projects

In the examples I’ve shown so far I’ve been careful to always show small packages from the standard library, and this is for a good reason. This is best explained with the following image

github.com/juju/juju/state chord graph

github.com/juju/juju/state chord graph

This is a graph of the code that I spend most of my time in, the data model of Juju.

I’m hesitant to say that chord graphs work at this size, although it is more successful than the tree or force graph attempts. If you are patient, and have a large enough screen, you can use the chord graph method to trace from any package on the circumference to discover how it relates to the package at the root of the graph.

What did I discover ?

I think I’ve made some pretty pictures, but it’s not clear that chord graphs can replace the shell scripts I’ve been using up until this point. As I am neither a graph theorist, nor a visual designer, this isn’t much of a surprise to me.

Next steps

The code is available in the usual place, but at this stage I don’t intend to release or support it; caveat emptor.

Alan Donovan’s work writing tools for semantic analysis of Go programs is fascinating and it would be interesting to graph the use of symbols from one package by another, or the call flow between packages.

Why I think Go package management is important

One of the stated goals of Go was to provide a language which could be built without Makefiles or other kinds of external configuration. This was realised with the release of Go 1, and the go tool which incorporated an earlier tool, goinstall, as the go get subcommand.

go get uses a cute convention of embedding the remote location of the package’s source in the import path of the package itself. In effect, the import path of the package tells go get where to find the package, allowing the dependency tree a package requires to be discovered and fetched automatically.

Compared to Go’s contemporaries, this solution has been phenomenally successful. Authors of Go code are incentivised to make there code go getable, in much the same way peer pressure drives them to go fmt their code. The installation instructions for most of the Go code you find on godoc and go-search has become, in essence, go get $REPO.

So, what is the problem?

Despite the significant improvement over some other language’s built in dependency management solutions, many Go developers are disappointed when the discover the elegance of go get is a double edged sword.

The Go import declaration references an abstract path. If this abstract path represents to a remote DVCS repository, the declaration alone does not contain sufficient information to identify which particular revision should be fetched. In almost all cases go get defaults to head/tip/trunk/master, although there is a provision for fetching from a tag called go1.

The Go Authors recognize that this is an issue, and suggest that if you need this level of control over your dependencies you should consider using alternative tools, suggesting goven as a possible solution.

Over the past two years no fewer than 19 other tools have been announced, so there is clearly a need and a desire to solve the problem. That said, none of them have achieved significant mind share, let alone challenged go get for the title of default.

Making it personal

Here are three small examples from my work life where go get isn’t sufficient for our requirements as we deliver our rewrite of Juju in Go.

Keeping the team on the same page

Juju isn’t just one project, but more than a dozen projects that we have written or rely on. Often landing a fix or feature in Juju means touching one of these upstream libraries and then making a corresponding change in Juju.

Sometimes these changes make it easy for other developers on the team to detect that they don’t have the correct version of a dependency as Juju stops compiling. Other times the effects can be more subtle, a bug fix to a library can cause no observed breakage so there is no signal to others on the team that they are compiling against the wrong version of the package.

To date, we’ve resorted to an email semaphore whenever someone fixes a bug a package, imploring everyone else to run go get -u. You can probably imagine how successful this is, and how much time is being spent chasing bugs that were already fixed.

Reproducible builds

As a maintenance programmer on Juju, I regularly switch between stable, trunk and feature branches. Each of those expects to be compiled against exactly the right version of its dependencies, but go get doesn’t provide any way of capturing this so I may reliably reproduce a build of Juju from the past.

Being a good Debian citizen

As our product eventually ends up inside Debian based distributions we have to deliver a release artifact which relies only on other Debian packages in the archive. We currently do this with a blob of shell scripts and tags on repos that we control to produce a tarball that contains the complete $GOPATH of all the source for Juju and its dependencies for exactly this release.

While it gets the job done, our pragmatism is not winning us any favors with our Debian packaging overlords as our approach makes their security tin foil hats itch. We’ve been lucky so far, but will probably at some point have a security issue in a package that Juju depends on, and because that package has been copied into every release artifact we’ve delivered fixing it will be a very involved process.

What to do?

Recently William Kennedy, Nathan Youngman and I started a Google Group in the hope of harnessing these disparate efforts of the many people who have thought, argued, hit their head against, and worked on this problem.

If you care about this issue, please consider joining the [go-pm] mailing list and contributing to the Goals document. I am particularly interested in capturing the requirements of various consumers of Go packages via user stories.