Let’s talk about logging

This is a post inspired by a thread that Nate Finch started on the Go Forum. This post focuses on Go, but if you can see your way past that, I think the ideas presented here are widely applicable.

Why no love ?

Go’s log package doesn’t have leveled logs, you have to manually add prefixes like debug, info, warn, and error, yourself. Also, Go’s logger type doesn’t have a way to turn these various levels on or off on a per package basis. By way of comparison let’s look at a few replacements from third parties.

glog from Google provides the following levels:

  • Info
  • Warning
  • Error
  • Fatal (which terminates the program)

Looking at another library, loggo, which we developed for Juju, provides the following levels:

  • Trace
  • Debug
  • Info
  • Warning
  • Error
  • Critical

Loggo also provides the ability to adjust the verbosity of logging on a per package basis.

So here are two examples, clearly influenced by other logging libraries in other languages. In fact their linage can be traced back to syslog(3), maybe even earlier. And I think they are wrong.

I want to take a contradictory position. I think that all logging libraries are bad because the offer too many features; a bewildering array of choice that dazzles the programmer right at the point they must be thinking clearly about how to communicate with the reader from the future; the one who will be consuming their logs.

I posit that successful logging packages need far less features, and certainly fewer levels.

Let’s talk about warnings

Let’s start with the easiest one. Nobody needs a warning log level.

Nobody reads warnings, because by definition nothing went wrong. Maybe something might go wrong in the future, but that sounds like someone else’s problem.

Furthermore, if you’re using some kind of leveled logging then why would you set the level at warning ? You’d set the level at info, or error. Setting the level to warning is an admission that you’re probably logging errors at warning level.

Eliminate the warning level, it’s either an informational message, or an error condition.

Let’s talk about fatal

Fatal level is effectively logging the message, then calling os.Exit(1). In principal this means:

  • defer statements in other goroutines don’t run.
  • buffers aren’t flushed.
  • temporary files and directories aren’t removed.

In effect, log.Fatal is a less verbose than, but semantically equivalent to, panic.

It is commonly accepted that libraries should not use panic1, but if calling log.Fatal2 has the same effect, surely this should also be outlawed.

Suggestions that this cleanup problem can be solved by registering shutdown handlers with the logging system introduces tight coupling between your logging system and every place where cleanup operations happen; its also violates the separation of concerns.

Don’t log at fatal level, prefer instead to return an error to the caller. If the error bubbles all the way up to main.main then that is the right place to handle any cleanup actions before exiting.

Let’s talk about error

Error handling and logging are closely related, so on the face of it, logging at error level should be easily justifiable. I disagree.

In Go, if a function or method call returns an error value, realistically you have two options:

  • handle the error.
  • return the error to your caller. You may choose to gift wrap the error, but that is not important to this discussion.

If you choose to handle the error by logging it, by definition it’s not an error any more — you handled it. The act of logging an error handles the error, hence it is no longer appropriate to log it as an error.

Let me try to convince you with this code fragment:

err := somethingHard()
if err != nil {
        log.Error("oops, something was too hard", err)
        return err // what is this, Java ?
}

You should never be logging anything at error level because you should either handle the error, or pass it back to the caller.

To be clear, I am not saying you should not log that a condition occurred

if err := planA(); err != nil {
        log.Infof("could't open the foo file, continuing with plan b: %v", err)
        planB()
}

but in effect log.Info and log.Error have the same purpose.

I am not saying DO NOT LOG ERRORS! Instead the question is, what is the smallest possible logging API ? And when it comes to errors, I believe that an overwhelming proportion of items logged at error level are simple done that way because they are related to an error. They are in fact, just informational, hence we can remove logging at error level from our API.

What’s left ?

We’ve ruled out warnings, argued that nothing should be logged at error level, and shown that only the top level of the application should have some kind of log.Fatal behaviour. What’s left ?

I believe that there are only two things you should log:

  1. Things that developers care about when they are developing or debugging software.
  2. Things that users care about when using your software.

Obviously these are debug and info levels, respectively.

log.Info should simply write that line to the log output. There should not be an option to turn it off as the user should only be told things which are useful for them. If an error that cannot be handled occurs, it should bubble up main.main where the program terminates. The minor inconvenience of having to insert the FATAL prefix in front of the final log message, or writing directly to os.Stderr with fmt.Fprintf is not sufficient justification for a logging package growing a log.Fatal method.

log.Debug, is an entirely different matter. It is for the developer or support engineer to control. During development, debugging statements should be plentiful, without resorting to trace or debug2 (you know who you are) level. The log package should support fine grained control to enable or disable debug, and only debug, statements at the package or possibly even finer scope.

Wrapping up

If this were a twitter poll, I’d ask you to choose between

  • logging is important
  • logging is hard

But the fact is, logging is both. The solution to this problem must be to de-construct and ruthlessly pair down unnecessary distraction.

What do you think ? Is this just crazy enough to work, or just plain crazy ?


Notes

  1. Some libraries may use panic/recover as an internal control flow mechanism, but the overriding mantra is they must not let these control flow operations leak outside the package boundary.
  2. Ironically while it lacks a debug level output, the Go standard log package has both Fatal and Panic functions. In this package the number of functions that cause a program to exit abruptly outnumber those that do not.