Introduction
Even though the hotlinks proposal (#25444) was accepted, one reason it wasn't implemented is that the detection of identifiers produced many false positives. See #44447#related-proposals for details.
I propose modifying the hotlinks proposal to use an explicit syntax to indicate the user's intent that some string is a reference to a Go identifier. An explicit syntax should drastically reduce false positives.
Proposal
(The following proposal uses brackets, but other delimiters are reasonable to consider. See #45533 (comment) for alternatives.)
I propose modifying https://pkg.go.dev/ to support hot-linking using a syntax similar to the Markdown syntax for reference-style links. For example, [Buffer.Write] in source code would be rendered as Buffer.Write. The reference consists of a left bracket (i.e., [), some text in-between (e.g., Buffer.Write), and a right bracket (i.e., ]). When rendering, the brackets are removed and the text in-between is linked to the referent. Functionally, it operates just like a Markdown inlined reference where the referent is implicitly provided by GoDoc.
The reference must match one of the following:
- [ExportedIdent] where it references a method or field on the current type.
  - E.g., [Write] within the documentation of the bytes.Buffer type refers to the bytes.Buffer.Write method.
- [ExportedIdent] where it references a variable, constant, type, or function in the current package.
  - E.g., [Buffer] in the bytes package would refer to the bytes.Buffer type.
- [ExportedIdent.ExportedIdent] where it references a method or field on a type in the package.
  - E.g., [Buffer.Write] in the bytes package would refer to the bytes.Buffer.Write method.
- [PackageName.ExportedIdent] where it references a variable, constant, type, or function in another package.
  - E.g., [io.EOF] would refer to the io.EOF variable.
- [PackageName.ExportedIdent.ExportedIdent] where it references a method or field on a type in another package.
  - E.g., [io.Reader.Read] would refer to the io.Reader.Read method.
- ["ImportPath"] where it references a package or module.
  - E.g., ["google.golang.org/protobuf/proto"] would refer to the proto package.
- ["ImportPath".ExportedIdent] where it references a variable, constant, type, or function in another package.
  - E.g., ["google.golang.org/protobuf/proto".Message] would refer to the proto.Message type.
- ["ImportPath".ExportedIdent.ExportedIdent] where it references a method or field on a type in another package.
  - E.g., ["google.golang.org/protobuf/proto".MarshalOptions.Deterministic] would refer to the proto.MarshalOptions.Deterministic field.
ExportedIdent is a valid Go identifier that is exported.
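As a rough illustration of the forms listed above, the following sketch checks whether the text between the brackets has one of those shapes. The Ref type and the parseRef function are hypothetical names made up for this example; they are not part of any existing GoDoc or pkgsite code, and the sketch does not validate the quoted import path or resolve the referent.

```go
package main

import (
	"fmt"
	"go/token"
	"strings"
)

// Ref is a hypothetical parsed form of the text between the brackets.
type Ref struct {
	ImportPath string   // set only for the quoted "ImportPath" forms
	Idents     []string // dot-separated identifiers after the optional path
}

// parseRef reports whether text (the content between the brackets) has the
// shape of one of the reference forms, and returns its parts.
func parseRef(text string) (Ref, bool) {
	var r Ref
	if strings.HasPrefix(text, `"`) {
		end := strings.Index(text[1:], `"`)
		if end <= 0 {
			return Ref{}, false // unterminated or empty import path
		}
		r.ImportPath = text[1 : 1+end]
		text = text[end+2:]
		if text == "" {
			return r, true // ["ImportPath"] refers to a package or module
		}
		if !strings.HasPrefix(text, ".") {
			return Ref{}, false
		}
		text = text[1:]
	}
	r.Idents = strings.Split(text, ".")
	maxIdents := 3 // PackageName.ExportedIdent.ExportedIdent
	if r.ImportPath != "" {
		maxIdents = 2 // "ImportPath".ExportedIdent.ExportedIdent
	}
	if len(r.Idents) > maxIdents {
		return Ref{}, false
	}
	for i, id := range r.Idents {
		if !token.IsIdentifier(id) {
			return Ref{}, false
		}
		// Without a quoted path, the first of several identifiers may be
		// a package name, which is usually not exported.
		mayBePackageName := r.ImportPath == "" && i == 0 && len(r.Idents) > 1
		if !mayBePackageName && !token.IsExported(id) {
			return Ref{}, false
		}
	}
	return r, true
}

func main() {
	samples := []string{
		"Write",
		"Buffer.Write",
		"io.Reader.Read",
		`"google.golang.org/protobuf/proto".Message`,
		"os.exit",
	}
	for _, s := range samples {
		r, ok := parseRef(s)
		fmt.Printf("%-45s ok=%v %+v\n", s, ok, r)
	}
}
```

A shape match is only the first step: whether [Buffer.Write] is actually linked still depends on the referent being found, as described in the rest of the proposal.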
The set of allowed PackageNames is determined by what that Go source file imports (e.g., [io.EOF] is only hot-linked if the io package is imported). For example, if the code imports github.com/other/json, then we will link [json.Marshal] to "github.com/other/json".Marshal and not "encoding/json".Marshal.
ImportPath must be a valid import path and either:
- be a standard library package path, or
- be a multi-segment path where the first segment contains a dot and at least one letter (a simple heuristic to detect valid domain names).
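The two conditions above could be checked along these lines. This is only a sketch: the stdlib map is a tiny stand-in for a real list of standard library packages, the validity check is reduced to rejecting empty path segments, and linkableImportPath is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stdlib is a small stand-in for the real set of standard library packages.
var stdlib = map[string]bool{
	"errors":        true,
	"io":            true,
	"os":            true,
	"encoding/json": true,
}

// linkableImportPath applies the heuristic described above: the path is
// either a standard library package, or a multi-segment path whose first
// segment contains a dot and at least one letter.
func linkableImportPath(p string) bool {
	if stdlib[p] {
		return true
	}
	segs := strings.Split(p, "/")
	if len(segs) < 2 {
		return false
	}
	for _, s := range segs {
		if s == "" {
			return false // crude validity check: no empty segments
		}
	}
	first := segs[0]
	hasDot := strings.Contains(first, ".")
	hasLetter := strings.IndexFunc(first, unicode.IsLetter) >= 0
	return hasDot && hasLetter
}

func main() {
	for _, p := range []string{"google.golang.org/protobuf/proto", "encoding/json", "1.2/3", "foo/bar"} {
		fmt.Println(p, linkableImportPath(p))
	}
}
```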
In order to avoid false positives, the left bracket must be preceded by whitespace and the right bracket must be followed by whitespace or punctuation (i.e., period, comma, or semicolon). Hot-linking is not performed on pre-formatted code blocks (i.e., indented paragraphs).
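A minimal sketch of this context rule follows. It assumes that a bracket at the very start or end of a line also counts as being surrounded by whitespace, which the text above does not spell out; findCandidates is a hypothetical helper, and the text it returns would still need to be checked against the reference forms.

```go
package main

import (
	"fmt"
	"strings"
)

func isSpace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\n'
}

// findCandidates returns the text inside each [...] pair in line whose left
// bracket follows whitespace (or starts the line) and whose right bracket is
// followed by whitespace, a period, comma, or semicolon (or ends the line).
func findCandidates(line string) []string {
	var out []string
	for i := 0; i < len(line); i++ {
		if line[i] != '[' {
			continue
		}
		if i > 0 && !isSpace(line[i-1]) {
			continue // e.g., the bracket in map[Key]Foo
		}
		end := strings.IndexByte(line[i+1:], ']')
		if end < 0 {
			break // no closing bracket left on this line
		}
		j := i + 1 + end // index of the matching ']'
		if j+1 < len(line) {
			next := line[j+1]
			if !isSpace(next) && strings.IndexByte(".,;", next) < 0 {
				continue
			}
		}
		out = append(out, line[i+1:j])
		i = j
	}
	return out
}

func main() {
	fmt.Printf("%q\n", findCandidates("This method may print [INFO] messages to [os.Stderr]."))
	fmt.Printf("%q\n", findCandidates("The arguments may be of either the map[Key]Foo or map[Key]Bar types."))
}
```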
Examples
This code snippet:
// FooMethod performs the foo functionality.
// It implements ["github.com/example/project".FooInterface].
// The arguments may be of either the map[Key]Foo or map[Key]Bar types.
// Under certain situations, it returns an error that matches [ErrCondition] according to [errors.Is].
// This method may print [INFO] messages to [os.Stderr].
//
// Example usage:
//	t.FooMethod(map[Key]Value{...})
func (t *MyType) FooMethod(...) error
would render as:
FooMethod performs the foo functionality.
It implements "github.com/example/project".FooInterface.
The arguments may be of either the map[Key]Foo or map[Key]Bar types.
Under certain situations, it returns an error that matches ErrCondition according to errors.Is.
This method may print [INFO] messages to os.Stderr.
Example usage:
	t.FooMethod(map[Key]Value{...})
Observations:
- The brackets for successfully hot-linked references are removed and replaced with an HTML anchor:
  - "github.com/example/project".FooInterface
  - ErrCondition
  - errors.Is
  - os.Stderr
- Supposing there is a local declaration for Key, the [Key] is not hot-linked since it is not surrounded by whitespace. We avoid hot-linking this since removal of the brackets would render poorly:
  - mapKeyFoo
- If the reference does not match anything, then it is left as is:
  - [INFO]
- Indented blocks do not have hot-linking applied:
  - t.FooMethod(map[Key]Value{...})
See CL/309430 for possible changes made to the protobuf module to make use of this feature.
Design and Analysis
(The following analysis uses the latest version (as of 2021-03-21) of all public modules.)
This feature can be broken down into two problems:
- How to identify references in Go documentation (e.g., io.EOF), and
- How to identify what these references refer to (e.g., https://pkg.go.dev/io#EOF).
Identifying references
The pattern [...] occurs ~229k times in Go documentation, and most of those occurrences should not be hot-linked.
First, we restrict the text within brackets to valid identifiers and/or valid import paths. This reduces the number of matches to ~19k.
Second, we restrict the grammar to require leading whitespace and trailing whitespace (or punctuation). We do this since some of the matches:
- are Go code inlined within a paragraph of normal text:
  - E.g., map[Key]bool
  - E.g., [gcs.KeySize]byte
  - While these are technically correct matches, the removal of the brackets would cause the documentation to render strangely (e.g., mapKeyFoo). We could improve this situation in the future, but it's probably better to not link these initially.
- are some pre-existing custom markup (usually Markdown):
  - E.g., [ConsensusCreateTopicTransactionBody](#proto.ConsensusCreateTopicTransactionBody)
  - E.g., [ListClusterAdminCredentials](https://docs.microsoft.com/en-us/rest/api/aks/managedclusters/listclusteradmincredentials)
  - It appears that the most common markup syntax is some flavor of Markdown, but this proposal deliberately does not aim to handle arbitrary Markdown, which can be a future proposal.
This restriction reduces the number of matches to ~6k.
Third, we avoid performing hot-linking within indented paragraphs. We do this because:
- Indented paragraphs are printed as code blocks and should be preserved verbatim if possible. The removal of brackets could break character alignment that the user intended.
- Indented paragraphs often contain code-like snippets, which more often include the bracket character, leading to false positives.
- This proposal aims to be a subset of Markdown, and [...] references are not respected within code blocks.
- Not hot-linking indented paragraphs provides a means to opt out of this feature.
This reduces the number of matches to ~3.7k. The list of results at this point can be found here.
For the remaining results:
- Most of them come from a few modules, each with its own markup syntax:
  - ~1.1k from godot-go/godot-go
  - ~700 from tiezhong2004/iecp5 and forks of it
  - ~200 from modernc.org/sqlite
  - ~200 from yandex-cloud/go-genproto
- Some are generated packages, which don't tend to aim for readable GoDoc packages anyways.
- Some already contain non-standard markup (e.g., HTML). The Go documentation already renders strangely and I don't think this feature is going to make it any worse.
- Some matches seem to do what the user intended:
Identifying referents
Of the ~3.7k results, the types of referents are as follows:
- ~3k are references to a locally defined identifier (e.g., MyType.MyMethod)
- ~500 are references to a locally defined identifier, but scoped within some type (e.g., MyMethod)
- ~60 are references to an identifier in an imported package by package name (e.g., io.EOF)
- ~5 are references to another package by import path (e.g., "google.golang.org/protobuf/proto")
Hot-linking local referents is relatively easy since we can derive the set of explicitly defined identifiers within the package using the *doc.Package we have on hand. These referents will never have false positives, but may have false negatives (since we can't easily know implicit declarations obtained through embedding or type aliases).
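One way to derive such a set from a *doc.Package is sketched below. It uses go/parser and go/doc from the standard library; the localIdents function, the example.com/demo import path, and the sample source are made up for illustration. The sketch collects only what go/doc lists directly (constants, variables, functions, types, and methods), so struct fields are not covered here.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/doc"
	"go/parser"
	"go/token"
)

// localIdents returns the exported identifiers that go/doc reports for a
// single-file package, with methods recorded as "Type.Method".
func localIdents(src string) (map[string]bool, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "src.go", src, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	pkg, err := doc.NewFromFiles(fset, []*ast.File{f}, "example.com/demo")
	if err != nil {
		return nil, err
	}
	ids := make(map[string]bool)
	addValues := func(vals []*doc.Value) {
		for _, v := range vals {
			for _, name := range v.Names {
				ids[name] = true
			}
		}
	}
	addValues(pkg.Consts)
	addValues(pkg.Vars)
	for _, fn := range pkg.Funcs {
		ids[fn.Name] = true
	}
	for _, t := range pkg.Types {
		ids[t.Name] = true
		addValues(t.Consts) // go/doc groups typed values under their type
		addValues(t.Vars)
		for _, fn := range t.Funcs {
			ids[fn.Name] = true // constructors grouped under the type
		}
		for _, m := range t.Methods {
			ids[t.Name+"."+m.Name] = true
		}
	}
	return ids, nil
}

const demoSrc = `package demo

import "errors"

// ErrCondition is returned under certain situations.
var ErrCondition = errors.New("condition")

// MyType is an example type.
type MyType struct{}

// FooMethod performs the foo functionality.
func (t *MyType) FooMethod() error { return nil }

func helper() {}
`

func main() {
	ids, err := localIdents(demoSrc)
	if err != nil {
		panic(err)
	}
	fmt.Println(ids)
}
```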
Hot-linking remote referents is challenging and may lead to false positives. The GoDoc implementation does not assume that it has type information available for other remote packages and we should maintain this property for the implementation. As such, it cannot verify whether some remote reference truly exists and whether some declaration is defined within it.
For references by import path (e.g., ["google.golang.org/protobuf"]), we require that the path be valid and that its first segment contain a dot (since it is always a domain name) and at least one letter. With this heuristic, there were no false positives in the above results.
For references by package name (e.g., [os.Exit]), we determine the set of supported package names based on what packages are imported by the current package. For example, [os.Exit] would not be hot-linked if the os package was not imported. Unfortunately, the package name cannot always be determined given the import statement alone (see #29036). As such, we use a heuristic where the package name is the last path segment if it is a valid identifier. While this may lead to false positives, it is the same heuristic that go/doc uses, and https://pkg.go.dev/ is already subject to this potential false positive. To avoid misidentifying the package name, user code can always use a named import if the real package name differs from the last import path segment.
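The last-path-segment heuristic can be sketched with go/parser as follows. The packageNames function is a hypothetical helper written for this example, and it handles a single source file; named imports take precedence over the guessed name, and blank or dot imports contribute no package name.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"path"
	"strconv"
)

// packageNames maps each package name usable in a reference to its import
// path, guessing the name from the last path segment for unnamed imports.
func packageNames(src string) (map[string]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "src.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	names := make(map[string]string)
	for _, spec := range f.Imports {
		importPath, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		name := path.Base(importPath) // heuristic: last path segment
		if spec.Name != nil {
			name = spec.Name.Name // a named import overrides the guess
		}
		if name == "_" || name == "." || !token.IsIdentifier(name) {
			continue // no usable package name (e.g., "gopkg.in/yaml.v3")
		}
		names[name] = importPath
	}
	return names, nil
}

const demoSrc = `package demo

import (
	"io"

	"github.com/other/json"
	"gopkg.in/yaml.v3"
	pb "google.golang.org/protobuf/proto"
)
`

func main() {
	names, err := packageNames(demoSrc)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

With this mapping, [json.Marshal] would resolve against github.com/other/json, while a reference through yaml would not be linked unless the file used a named import.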
Even if we could identify the package name, we don't know what declarations exist in that package. In theory we could fetch the documentation for that package, but that would incur significant complexity that doesn't currently exist. Furthermore, for cases where a go.mod file is missing or incomplete, we won't even know what version of the remote package to load. Instead, we hot-link package-scoped identifiers according to the following heuristics:
- we don't hot-link standalone package names (e.g., [os] will not be linked to https://pkg.go.dev/os), and
- we require that the identifiers must be exported (e.g., [os.Exit] will be linked to https://pkg.go.dev/os#Exit, while [os.exit] will not be linked to https://pkg.go.dev/os#exit).
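These two heuristics might translate into a link builder along the following lines. The remoteURL function is hypothetical, and the dotted fragment for methods and fields (e.g., #Reader.Read) is an assumption about the anchor format; the text above only shows the single-identifier case.

```go
package main

import (
	"fmt"
	"go/token"
	"strings"
)

// remoteURL builds a pkg.go.dev link for a package-name reference, or
// reports false when the heuristics say the reference should not be linked.
func remoteURL(importPath string, idents []string) (string, bool) {
	if len(idents) == 0 {
		return "", false // standalone package names such as [os] are not linked
	}
	for _, id := range idents {
		if !token.IsExported(id) {
			return "", false // e.g., [os.exit]
		}
	}
	return "https://pkg.go.dev/" + importPath + "#" + strings.Join(idents, "."), true
}

func main() {
	fmt.Println(remoteURL("os", []string{"Exit"}))
	fmt.Println(remoteURL("os", []string{"exit"}))
	fmt.Println(remoteURL("os", nil))
}
```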
Using these heuristics, none of the existing ~60 matches had nonexistent referents.
Summary:
- There is an unwritten goal that Go documentation remain readable without being passed through some type of renderer. The use of brackets adds two characters and does not seem to obstruct reading the documentation directly in source code.
- Compared to the original (x/tools/cmd/godoc: add support for hotlinks #25444) proposal, code will generally need to be modified to opt into using this feature. However, relatively little code will need to be modified to opt out of accidental matches due to the drastically reduced number of false positives.
- Over the years there have been requests for Go documentation to support Markdown. This proposal does not prevent that possibility as this is a subset of Markdown.
Implementation
I can do the work for this since I implemented the original hot-linking proposal, which is actually already merged into the pkgsite code-base, but currently disabled. I suspect that the modifications to the original hot-linking implementation will be relatively minimal.
Note: I originally considered using back-ticks as the marker. Credit goes to @rsc for suggesting the use of brackets to match the Markdown syntax for reference-style links.