encoding/json: add Encoder.EncodeToken method #40127

Open
@rogpeppe

Currently there is a way to decode JSON token-by-token, but the JSON package does not support encoding tokens that way. It has been stated that just writing the tokens yourself is straightforward enough to do (citation needed), but it's not actually that easy to do.
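For context, here is a small sketch of the existing decode side. It uses only the current `encoding/json` API; the helper name `tokenTypes` is made up for illustration. It shows that `Decoder.Token` yields delimiters, strings, numbers, booleans and nil, but never the commas and colons, which is exactly the part a hand-written encoder has to put back.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// tokenTypes decodes src token by token and reports the dynamic Go type
// and value of each token. (Illustrative helper, not part of encoding/json.)
func tokenTypes(src string) ([]string, error) {
	dec := json.NewDecoder(strings.NewReader(src))
	var out []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, fmt.Sprintf("%T %v", tok, tok))
	}
}

func main() {
	lines, err := tokenTypes(`{"name": "gopher", "tags": ["a", 1.5, true, null]}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, line := range lines {
		fmt.Println(line)
	}
}
```

Every string and number arrives boxed in a `json.Token` interface value, which is also the source of the garbage discussed further down.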

Here's some code that streams JSON from input to output: https://play.golang.org/p/H6Xl_twRIyC. It's at least 50 lines of code to do the basic streaming functionality, it's easy to get wrong by forgetting to put a colon or a comma in the right place, and if you want indentation or HTML escaping, you have to do it yourself.

I propose that we add an EncodeToken method to json.Encoder:

// EncodeToken encodes a single JSON token to the encoder. The order of
// tokens must result in valid JSON syntax (as returned by the
// Decoder.Token method). That is, delimiters [ ] { } must be properly
// nested and matched, and object keys must be strings (but see the note
// on using Encode below).
//
// EncodeToken returns an error if the token cannot be encoded or is
// misplaced, for example because there's a delimiter mismatch, a
// non-string token is passed where an object key is expected, a string
// contains invalid UTF-8 or a Number isn’t formatted correctly.
//
// Note that it’s OK to mix calls to EncodeToken with calls to Encode.
// In particular, it’s also valid to call Encode when an object key is
// expected - in this case, the value is subject to the same conversion
// rules as for map keys.
func (enc *Encoder) EncodeToken(tok Token) error

There would be no way to produce syntactically invalid output (other than truncating the output by not completing the encoded value). The Encoder would keep track of the current state in a stack internally.
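To make the stack idea concrete, here is a rough sketch of how that state tracking could look. It is not the proposed implementation: `tokenEncoder`, `level`, `encodeTokens`, `goodTokens` and `badTokens` are all hypothetical names, the output goes to a `strings.Builder` rather than the real `Encoder`, and it leaves out indentation, invalid UTF-8 checks, top-level value separation and mixing with `Encode`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// level describes one open array or object.
type level struct {
	open json.Delim // '[' or '{'
	n    int        // tokens written directly inside it so far
}

// tokenEncoder is a hypothetical stand-in for the state the proposed
// Encoder would keep; it is not part of encoding/json.
type tokenEncoder struct {
	out   strings.Builder
	stack []level // innermost open array or object last
}

// EncodeToken writes tok, adding any separator it needs, or returns an
// error without writing anything if tok is misplaced.
func (e *tokenEncoder) EncodeToken(tok json.Token) error {
	var top *level
	if len(e.stack) > 0 {
		top = &e.stack[len(e.stack)-1]
	}
	d, isDelim := tok.(json.Delim)
	if isDelim && (d == ']' || d == '}') {
		want := json.Delim(']')
		if top != nil && top.open == '{' {
			want = '}'
		}
		// Reject a close with nothing open, the wrong kind of close,
		// or a '}' that would leave a key without a value.
		if top == nil || d != want || (d == '}' && top.n%2 == 1) {
			return fmt.Errorf("misplaced delimiter %v", d)
		}
		e.stack = e.stack[:len(e.stack)-1]
		e.out.WriteString(d.String())
		return nil
	}
	inObject := top != nil && top.open == '{'
	if _, isString := tok.(string); inObject && top.n%2 == 0 && !isString {
		return fmt.Errorf("object key must be a string, got %T", tok)
	}
	var text []byte
	switch tok.(type) {
	case json.Delim:
		if d != '[' && d != '{' {
			return fmt.Errorf("invalid delimiter %v", d)
		}
		text = []byte(d.String())
	case nil, bool, string, float64, json.Number:
		var err error
		if text, err = json.Marshal(tok); err != nil {
			return err
		}
	default:
		return fmt.Errorf("unsupported token type %T", tok)
	}
	if top != nil {
		if inObject && top.n%2 == 1 {
			e.out.WriteByte(':') // a value following its key
		} else if top.n > 0 {
			e.out.WriteByte(',') // a later element or key
		}
		top.n++
	}
	e.out.Write(text)
	if isDelim {
		e.stack = append(e.stack, level{open: d})
	}
	return nil
}

// encodeTokens feeds toks to a fresh tokenEncoder and returns what was
// written up to the first error.
func encodeTokens(toks ...json.Token) (string, error) {
	var e tokenEncoder
	for _, tok := range toks {
		if err := e.EncodeToken(tok); err != nil {
			return e.out.String(), err
		}
	}
	return e.out.String(), nil
}

var (
	goodTokens = []json.Token{
		json.Delim('{'), "a", json.Delim('['), 1.5, true, nil,
		json.Delim(']'), json.Delim('}'),
	}
	badTokens = []json.Token{json.Delim('['), json.Delim('}')}
)

func main() {
	fmt.Println(encodeTokens(goodTokens...))
	fmt.Println(encodeTokens(badTokens...))
}
```

In this sketch the well-formed sequence produces `{"a":[1.5,true,null]}`, while the mismatched `[` `}` pair is rejected before the `}` is written, so the caller never has to place a comma or colon by hand.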

An example: if we wanted to stream a large array of JSON objects, we could do:

enc.EncodeToken(json.Delim('['))
for {
    // Obtain the next arrayItem from the source; break when none remain.
    enc.Encode(arrayItem)
}
enc.EncodeToken(json.Delim(']'))

A slightly more comprehensive example is here

The code to stream a set of JSON values would become considerably simpler: https://play.golang.org/p/Wec5wepCYbE

Completeness validation

It might be useful to provide callers with a final check that their encoded value is in fact complete (that is, no final closing tokens have been omitted). To do that, the following method could be provided:

// ClosingToken returns the token necessary to close the current encoding object or array.
// It returns either Delim('}'), Delim(']'), or nil if there is no current object or array.
func (enc *Encoder) ClosingToken() Token

This method makes it straightforward to check that the object is complete (if enc.ClosingToken() == nil), but it could also be used for sanity checks when developing, for better errors, or to implement a Flush method that automatically closes all open objects and arrays:

for enc.ClosingToken() != nil {
    enc.EncodeToken(enc.ClosingToken())
}
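As a sketch of how cheaply this could fall out of the encoder's internal state: assuming the encoder keeps a stack of open delimiters, the closing token is just a function of the top entry. The names `openStack`, `closingToken` and `closeAll` below are hypothetical and stand in for state and methods that would live on `Encoder`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// openStack lists the currently open delimiters, innermost last. It is
// a hypothetical stand-in for state the proposed Encoder would hold.
type openStack []json.Delim

// closingToken mirrors the proposed ClosingToken: the delimiter that
// closes the innermost open object or array, or nil if nothing is open.
func (s openStack) closingToken() json.Token {
	if len(s) == 0 {
		return nil
	}
	if s[len(s)-1] == '{' {
		return json.Delim('}')
	}
	return json.Delim(']')
}

// closeAll returns the delimiters a Flush-style helper would emit.
func closeAll(s openStack) string {
	var b strings.Builder
	for {
		tok := s.closingToken()
		if tok == nil {
			return b.String()
		}
		b.WriteString(tok.(json.Delim).String())
		s = s[:len(s)-1] // the innermost value is now closed
	}
}

func main() {
	// As if `{"items":[[` had been written so far.
	fmt.Println(closeAll(openStack{'{', '[', '['}))
}
```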

Discussion

As with the current Decoder.Token implementation, Encoder.EncodeToken will not be particularly efficient, as it requires passing strings and numbers as strings inside an interface value. I think that this can be considered an orthogonal issue: a solution to the Decoder.Token garbage issue may also be applied to Encoder.EncodeToken. The symmetry between Decoder.Token and Encoder.EncodeToken is a useful property which makes this API easier to explain and use, and shouldn’t be discarded lightly, in my view.

Once Encoder.Encode is invoked, there’s no way to return to streaming mode, for example to encode a large array within a MarshalJSON method. This is a limitation of the json.Marshaler interface, and could potentially be addressed by a fix in that area.

It would almost be possible to implement this without adding a method at all by changing the Encode method to recognize Delim. However, it’s currently valid to pass a Delim value to Encode (it encodes as a number), so this would break backward compatibility. Also, it’s arguably cleaner to have a separation between the low level token API and the higher level Encode API.
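The current behaviour is easy to confirm with the existing API. In this short check (the helper name `encodeDelim` is made up), a `Delim` is marshaled as the integer value of its underlying rune rather than as a bracket:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeDelim marshals json.Delim(r) with the existing encoder and
// returns the resulting JSON text. (Illustrative helper.)
func encodeDelim(r rune) string {
	b, err := json.Marshal(json.Delim(r))
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encodeDelim('[')) // prints 91, the code point of '['
	fmt.Println(encodeDelim('{')) // prints 123, the code point of '{'
}
```

Any program that relies on this today, however unlikely, would change behaviour if Encode started treating Delim as a structural token.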

Related proposals

#33714

Project status: Accepted
