A few days ago Fatih posted this question on Twitter.
I’m going to attempt to give my answer. However, to do that I need to apply some simplifications, as my previous attempts to answer it involved a lot of phrases like "a pointer to a pointer" and other unhelpful waffling. Hopefully my simplified answer can be useful in building a mental framework to answer Fatih’s original question.
Restating the question
Fatih’s original tweet showed four different variations of `json.Unmarshal`. I’m going to focus on the last two, which I’ll rewrite a little:
```go
package main

import (
	"encoding/json"
	"fmt"
)

type Result struct {
	Foo string `json:"foo"`
}

func main() {
	content := []byte(`{"foo": "bar"}`)
	var result1, result2 *Result

	err := json.Unmarshal(content, &result1)
	fmt.Println(result1, err) // &{bar} <nil>

	err = json.Unmarshal(content, result2)
	fmt.Println(result2, err) // <nil> json: Unmarshal(nil *main.Result)
}
```
Restated in words, `result1` and `result2` are the same type, `*Result`. Decoding into `result1` works as expected, whereas decoding into `result2` causes the `json` package to complain that the value passed to `Unmarshal` is `nil`. However, both values were declared without an initialiser, so both would have taken on the type’s zero value, `nil`.
Eagle-eyed readers will have spotted that the reason for the difference is that the first invocation is passed `&result1`, while the second is passed `result2`. But this explanation is unsatisfactory, because the documentation for `json.Unmarshal` states:
Unmarshal parses the JSON-encoded data and stores the result in the value pointed to by v. If v is nil or not a pointer, Unmarshal returns an InvalidUnmarshalError.
This is confusing because `result1` and `result2` are pointers. Furthermore, without initialisation, both are `nil`. Now, the documentation is correct (as you’d expect from a package that has been hammered on for a decade), but explaining why takes a little more investigation.
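The documented behaviour is easy to observe directly. The sketch below is a small program, not part of Fatih’s original example, in which a hypothetical `classify` helper calls `json.Unmarshal` and uses `errors.As` to pull out any `*json.InvalidUnmarshalError`, reporting the `Type` field it carries. It assumes the same `Result` type as above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type Result struct {
	Foo string `json:"foo"`
}

// classify is a hypothetical helper: it decodes a fixed document into v and
// reports the type recorded in any InvalidUnmarshalError that comes back.
func classify(v any) string {
	err := json.Unmarshal([]byte(`{"foo": "bar"}`), v)
	var invalid *json.InvalidUnmarshalError
	if errors.As(err, &invalid) {
		return fmt.Sprintf("invalid (type %v): %v", invalid.Type, err)
	}
	return fmt.Sprintf("ok: %v", err)
}

func main() {
	var result *Result
	fmt.Println(classify(result))   // invalid (type *main.Result): json: Unmarshal(nil *main.Result)
	fmt.Println(classify(Result{})) // invalid (type main.Result): json: Unmarshal(non-pointer main.Result)
	fmt.Println(classify(&result))  // ok: <nil>
}
```

Note that the nil `*Result` is reported with the type `*main.Result`: the package can see it was handed a pointer, just one that points nowhere.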
Functions receive a copy of their arguments
Every assignment in Go is a copy; this includes function arguments and return values.
```go
package main

import (
	"fmt"
)

func increment(v int) {
	v++
}

func main() {
	v := 1
	increment(v)
	fmt.Println(v) // 1
}
```
In this example, `increment` is operating on a copy of `main`’s `v`. This is because the `v` declared in `main` and `increment`’s `v` parameter have different addresses in memory. Thus changes to `increment`’s `v` cannot affect the contents of `main`’s `v`.
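One way to convince yourself that the parameter is a distinct variable is to compare addresses. In this sketch `paramAddress` is a hypothetical helper that returns the address of its own parameter. Because two distinct live variables of non-zero size never share an address, the comparison is always false.

```go
package main

import "fmt"

// paramAddress is a hypothetical helper that returns the address of its own
// parameter v. Since v is a copy of the caller's argument, this can never be
// the address of the caller's variable.
func paramAddress(v int) *int {
	return &v
}

func main() {
	v := 1
	p := paramAddress(v)
	fmt.Println(p == &v) // false: the parameter lives at a different address

	*p = 100       // writing through the copy's address...
	fmt.Println(v) // 1: ...leaves main's v untouched
}
```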
```go
package main

import (
	"fmt"
)

func increment(v *int) {
	*v++
}

func main() {
	v := 1
	increment(&v)
	fmt.Println(v) // 2
}
```
If we wanted to write `increment` in a way that it could affect the contents of its caller, we would need to pass a reference, a pointer, to `main.v`. This example demonstrates why `json.Unmarshal` needs a pointer to the value to decode JSON into.
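To make the connection to `json.Unmarshal` concrete, here is a heavily simplified, hypothetical stand-in called `storeInt`. It is not how the `json` package is implemented, but it applies the same rule the documentation states, rejecting anything that is nil or not a pointer, and then uses the `reflect` package to write through the pointer into the caller’s variable.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// storeInt is a hypothetical, much simplified stand-in for json.Unmarshal.
// It stores n in the value pointed to by v. Like Unmarshal, it rejects
// anything that is nil or not a pointer, because only through a pointer
// can it reach the caller's variable.
func storeInt(n int, v any) error {
	rv := reflect.ValueOf(v)
	if rv.Kind() != reflect.Pointer || rv.IsNil() {
		return errors.New("storeInt: need a non-nil pointer")
	}
	elem := rv.Elem() // the caller's variable, settable because we reached it via a pointer
	if elem.Kind() != reflect.Int {
		return fmt.Errorf("storeInt: cannot store into %v", elem.Type())
	}
	elem.SetInt(int64(n))
	return nil
}

func main() {
	v := 1
	err := storeInt(2, v) // v is copied into the interface; main's v is out of reach
	fmt.Println(err, v)   // storeInt: need a non-nil pointer 1

	err = storeInt(2, &v) // the pointer lets storeInt write to main's v
	fmt.Println(err, v)   // <nil> 2
}
```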
Pointers to pointers
Returning to the original question, both `result1` and `result2` are declared as `*Result`, that is, pointers to a `Result` value. We established that you have to pass the address of the caller’s value to `json.Unmarshal`, otherwise it won’t be able to alter the contents of the caller’s value. Why then must we pass the address of `result1`, a `**Result`, a pointer to a pointer to a `Result`, for the operation to succeed?
To explain this, another detour is required. Consider this code:
```go
package main

import (
	"encoding/json"
	"fmt"
)

type Result struct {
	Foo *string `json:"foo"`
}

func main() {
	content := []byte(`{"foo": "bar"}`)
	var result1 *Result
	err := json.Unmarshal(content, &result1)
	// The address of the allocated string will differ from run to run.
	fmt.Printf("%#v %v\n", result1, err) // &main.Result{Foo:(*string)(0xc0000102f0)} <nil>
}
```
In this example `Result` contains a pointer-typed field, `Foo *string`. During JSON decoding, `Unmarshal` allocated a new `string` value, stored the value `bar` in it, then placed the address of the string in `Result.Foo`. This behaviour is quite handy, as it frees the caller from having to initialise `Result.Foo` and makes it easier to detect when a field was not initialised because the JSON did not contain a value. Beyond the convenience this offers for simple examples, it would be prohibitively difficult for the caller to properly initialise all the reference type fields in a structure before decoding unknown JSON. Doing so would require first inspecting the incoming JSON, which itself may be problematic if the input is coming from an `io.Reader` without the ability to rewind the input.
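As a sketch of the detection this enables, the hypothetical `describe` function below decodes into a `Result` with a `Foo *string` field and uses the nil-ness of `Foo` to distinguish a missing key from an empty string. One caveat: a JSON `null` for `foo` also leaves `Foo` as `nil`, so this approach cannot tell an absent key from an explicit `null`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Result struct {
	Foo *string `json:"foo"`
}

// describe is a hypothetical helper reporting whether "foo" was present in
// the input, using the pointer field to tell "absent" from "empty string".
func describe(content string) string {
	var r Result
	if err := json.Unmarshal([]byte(content), &r); err != nil {
		return "error: " + err.Error()
	}
	if r.Foo == nil {
		return "foo absent" // Unmarshal never needed to allocate a string
	}
	return fmt.Sprintf("foo = %q", *r.Foo)
}

func main() {
	fmt.Println(describe(`{}`))             // foo absent
	fmt.Println(describe(`{"foo": ""}`))    // foo = ""
	fmt.Println(describe(`{"foo": "bar"}`)) // foo = "bar"
}
```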
To unmarshal JSON into a pointer, Unmarshal first handles the case of the JSON being the JSON literal null. In that case, Unmarshal sets the pointer to nil. Otherwise, Unmarshal unmarshals the JSON into the value pointed at by the pointer. If the pointer is nil, Unmarshal allocates a new value for it to point to.
`json.Unmarshal`’s handling of pointer fields is clearly documented, and works as you would expect, allocating a new value whenever there is a need to decode into a pointer-shaped field. It is this behaviour that gives us a hint as to what is happening in the original example.
We’ve seen that when `json.Unmarshal` encounters a field which points to `nil`, it will allocate a new value of the correct type and assign its address to the field before proceeding. Not only is this behaviour applied recursively (for example, in the case of a complex structure which contains pointers to other structures), but it also applies to the value passed to `Unmarshal`.
```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	content := []byte(`1`)
	var result *int
	err := json.Unmarshal(content, &result)
	fmt.Println(*result, err) // 1 <nil>
}
```
In this example `result` is not a struct, but a simple `*int` which, lacking an initialiser, defaults to `nil`. After JSON decoding, `result` now points to an `int` with the value `1`.
Putting the pieces together
Now I think I’m ready to take a shot at answering Fatih’s question.
`json.Unmarshal` requires the address of the variable you want to decode into, otherwise it would decode into a temporary copy which would be discarded on return. Normally this is done by declaring a value, then passing its address, or by explicitly initialising the value:
```go
var result1 Result
err := json.Unmarshal(content, &result1) // this is fine

var result2 = new(Result)
err = json.Unmarshal(content, result2) // and this

var result3 = &Result{}
err = json.Unmarshal(content, result3) // this is also fine
```
In all three cases the `*Result` passed to `json.Unmarshal` is not `nil`; it points to initialised memory that `json.Unmarshal` decodes into.
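For completeness, here is a runnable version of the three variants above, wrapped in a hypothetical `decodeAll` helper so that each form is exercised against the same input. It is a sketch assuming the string-typed `Result` from the first example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Result struct {
	Foo string `json:"foo"`
}

// decodeAll is a hypothetical helper that runs the three working variants
// and returns the decoded Foo values in order.
func decodeAll(content []byte) ([]string, error) {
	var result1 Result // a Result value; we pass its address
	if err := json.Unmarshal(content, &result1); err != nil {
		return nil, err
	}
	result2 := new(Result) // a *Result pointing at a zeroed Result
	if err := json.Unmarshal(content, result2); err != nil {
		return nil, err
	}
	result3 := &Result{} // equivalent to new(Result)
	if err := json.Unmarshal(content, result3); err != nil {
		return nil, err
	}
	return []string{result1.Foo, result2.Foo, result3.Foo}, nil
}

func main() {
	foos, err := decodeAll([]byte(`{"foo": "bar"}`))
	fmt.Println(foos, err) // [bar bar bar] <nil>
}
```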
Now consider what happens when `json.Unmarshal` encounters this:
```go
var result4 *Result
err = json.Unmarshal(content, result4) // err json: Unmarshal(nil *main.Result)
```
`result2`, `result3`, and the expression `&result1` point to a `Result`. However `result4`, even though it has the same type as the previous three, does not point to initialised memory; it points to `nil`. Thus, according to the examples we saw previously, before `json.Unmarshal` can decode into it, the memory `result4` points to must be allocated and initialised.
However, because each function receives a copy of its arguments, the caller’s `result4` variable and the copy inside `json.Unmarshal` are distinct. `json.Unmarshal` could allocate a new `Result` value and decode into it, but it cannot alter `result4` to point to this new value because it was not provided with a reference to `result4`, only a copy of its contents. Rather than decode into a value the caller could never see, it returns the `InvalidUnmarshalError` promised by its documentation.
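The whole argument can be reproduced without `encoding/json` at all. In this sketch `allocate` and `allocateInto` are hypothetical functions: the first receives a copy of the caller’s `*Result`, so the pointer it allocates is lost on return; the second receives a `**Result`, the address of the caller’s variable, so it can store the new pointer where the caller will see it. This mirrors, in simplified form, the difference between passing `result4` and `&result4` to `json.Unmarshal`.

```go
package main

import "fmt"

type Result struct {
	Foo string
}

// allocate is hypothetical. It receives a copy of the caller's pointer, so
// assigning a new value to p only changes that copy, which is discarded
// when allocate returns.
func allocate(p *Result) {
	if p == nil {
		p = &Result{Foo: "bar"} // invisible to the caller
	}
}

// allocateInto is hypothetical. It receives the address of the caller's
// pointer, so it can store a newly allocated Result in the caller's variable.
func allocateInto(pp **Result) {
	if *pp == nil {
		*pp = &Result{Foo: "bar"}
	}
}

func main() {
	var result4 *Result

	allocate(result4)
	fmt.Println(result4) // <nil>

	allocateInto(&result4)
	fmt.Println(result4) // &{bar}
}
```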