Skip to content

proposal: go/doc: support explicit hotlinking syntax  #45533

Closed
@dsnet

Description

@dsnet

Introduction

Even though the hotlinks proposal (#25444) was accepted, one reason it wasn't implemented is because detection of identifiers experienced many false positives. See #44447#related-proposals for details.

I propose modifying the hotlinks proposal to use an explicit syntax to indicate the user's intent that some string is a reference to a Go identifier. An explicit syntax should drastically reduce false positives.

Proposal

(The following proposal uses brackets, but other delimiters are reasonable to consider. See #45533 (comment) for alternatives.)

I propose modifying https://pkg.go.dev/ to support hot-linking using a syntax similar to the Markdown syntax for reference-style links. For example, [Buffer.Write] in source code would rendered as Buffer.Write. The reference comprises of a left bracket (i.e, [), some text in-between (e.g., Buffer.Write), and a right bracket (i.e., ]). When rendering, the brackets are removed and the text in-between is linked to the referent. Functionally, it operates just like a Markdown inlined reference where the referent is implicitly provided by GoDoc.

The reference must match one of the following:

  • [ExportedIdent] where it references a method or field on the current type.
  • [ExportedIdent] where it references a variable, constant, type, or function in the current package.
    • E.g., [Buffer] in the bytes package would refer to the bytes.Buffer type.
  • [ExportedIdent.ExportedIdent] where it references a method or field on a type in the package.
  • [PackageName.ExportedIdent] where it references variable, constant, type, or function in another package.
    • E.g., [io.EOF] would refer to the io.EOF variable.
  • [PackageName.ExportedIdent.ExportedIdent] where it references a method or field on a type in another package.
  • ["ImportPath"] where it references a package or module.
    • E.g., ["google.golang.org/protobuf/proto"] would refer to the proto package.
  • ["ImportPath".ExportedIdent] where it references a method or field in another package.
    • E.g., ["google.golang.org/protobuf/proto".Message] would refer to the proto.Message type.
  • ["ImportPath".ExportedIdent.ExportedIdent] where it references a method or field on a type in another package.

ExportedIdent is a valid Go identifier that is exported.

The set of PackageNames that are allowed is determined by what that Go source file imports (e.g., [io.EOF] is only hot-linked if the io package is imported). For example, if the code imports github.com/other/json, then we will link [json.Marshal] to "github.com/other/json".Marshal and not "encoding/json".Marshal.

ImportPath must be a valid import path and either:

  • be a standard library package path, or
  • be a multi segment path where the first segment contains a dot and at least one letter (simple heuristic to detect valid domain names).

In order to avoid false positives, the left bracket must be preceded by whitespace and the right bracket must be succeeded by whitespace or punctuation (i.e., period, comma, or semi-colon). Hot-linking is not performed on pre-formatted code blocks (i.e., indented paragraphs).

Examples

This code snippet:

// FooMethod performs the foo functionality.
// It implements ["github.com/example/project".FooInterface].
// The arguments may be of either the map[Key]Foo or map[Key]Bar types.
// Under certain situation, it returns an error that matches [ErrCondition] according to [errors.Is].
// This method may print [INFO] messages to [os.Stderr].
//
// Example usage:
//    t.FooMethod(map[Key]Value{...})
func (t *MyType) FooMethod(...) error

would render as:

FooMethod performs the foo functionality.
It implements "github.com/example/project".FooInterface.
The arguments may be of either the map[Key]Foo or map[Key]Bar types.
Under certain situation, it returns an error that matches ErrCondition according to errors.Is.
This method may print [INFO] messages to os.Stderr.

Example usage:

t.FooMethod(map[Key]Value{...})

Observations:

  • The brackets for successfully hot-linked references are removed and replaced with an HTML anchor:
  • Supposing there is a local declaration for Key, the [Key] is not hot-linked since it is not surrounded by whitespace. We avoid hot-linking this since removal of the brackets would render poorly:
  • If the reference does not match anything, then it is left as is:
    • [Info]
  • Indented blocks do not have hot-linking applied:
    • t.FooMethod(map[Key]Value{...})

See CL/309430 for possible changes made to the protobuf module to make use of this feature.

Design and Analysis

(The following analysis uses the latest version (as of 2021-03-21) of all public modules.)

This feature can be broken down into two problems:

  1. How to identify references in Go documentation (e.g., io.EOF), and
  2. How to identify what these references refer to (e.g., https://pkg.go.dev/io#EOF).

Identifying references

The occurence of [...] occurs ~229k times in Go documentation, most of which should not be hot-linked.

First, we restrict the text within brackets to valid identifiers and/or valid import paths. This reduces the number of matches to ~19k.

Second, we restrict the grammar to require leading whitespace and trailing whitespace (or punctuation). We do this since some of the matches:

This restriction reduces the number of matches to ~6k.

Third, we avoid performing hot-linking within indented paragraphs. We do this because:

  • Indented paragraphs are printed as code blocks and should be preserved ad-verbatim if possible. The removal of brackets could break intentional character alignment that the user desired to be present.
  • Indented paragraphs often contain code-like snippets, which more often includes the bracket character, leading to false-positives.
  • This proposal aims to be subset of Markdown and [...] references are not respected within code blocks.
  • Not hot-linking indented paragraphs provides a means to opt-out of this feature.

This reduces the number of matches to ~3.7k. The list of results at this point can be found here.

For the remaining results:

Identifying referents

Of the ~3.7k results, the type of referents are as follows:

  • ~3k are references to locally defined identifier (e.g., MyType.MyMethod)
  • ~500 are references to locally defined identifier, but scoped within some type (e.g., MyMethod)
  • ~60 are references to an identifier in an imported package by package name (e.g., io.EOF)
  • ~5 are references to another package by import path (e.g., "google.golang.org/protobuf/proto")

Hot-linking local referents is relatively easy since we a can derive the set of explicitly defined identifiers within the package using the *doc.Package we have on hand. These referents will never have false positives, but may have false negatives (since we can't easily know implicit declarations obtained through embedding or type aliases).

Hot-linking remote referents is challenging and may lead to false positives. The GoDoc implementation does not assume that it has type information available for other remote packages and we should maintain this property for the implementation. As such, it cannot verify whether some remote reference truly exists and whether some declaration is defined within it.

For references by import path (e.g., ["google.golang.org/protobuf"]), we require that it be valid, that the first path segment must contain a dot since it is always a domain name, and that there be at least one letter. With this heuristic, there were no false positives in the above results.

For references by package name (e.g., [os.Exit]), we determine the set of supported package names based on what packages are imported by the current package. For example, [os.Exit] would not be hot-linked if the os package was not imported. Unfortunately, the package name cannot always be determined given the import statement alone (see #29036). As such, we use a heuristic where the package name is the last path segment if it is a valid identifier. While this may lead to false positives, it is actually the same heuristic that go/doc uses and the https://pkg.go.dev/ is already subject to this potential false positive. To explicitly avoid incorrectly identifying the correct package name, the user code can always use a named import if the real package name is different from the last import path segment.

Even if we could identify the package name, we don't know what declarations exist in that package. In theory we could fetch the documentation for that package, but that would incur significant complexity that doesn't currently exist. Furthermore, for cases where a go.mod file is missing or incomplete, we won't even know what version of the remote package to load. Instead, we hot-link package-scoped identifiers according to the following heuristics:

  1. we don't hot-link standalone package names (e.g., [os] will not be linked to https://pkg.go.dev/os),
  2. we require that the identifiers must be exported (e.g., [os.Exit] will be linked to https://pkg.go.dev/os#Exit, while [os.exit] will not be linked to https://pkg.go.dev/os#exit).

Using these heuristics, none of the existing ~60 matches had non-existing referrents.

Summary:

  • There is an unwritten goal that Go documentation remain readable without being passed through some type of renderer. The use of brackets adds two characters and does not seem to obstruct reading the documentation directly in source code.
  • Compared to the original (x/tools/cmd/godoc: add support for hotlinks #25444) proposal, code will generally need to be modified to opt-into using this feature. However, relatively little code will need to be modified to opt-out of accidental matches due to the drastically reduced amount of false-positives.
  • Over the years there has been requests for Go documentation to support markdown. This proposal does not prevent that possibility as this is a sub-set of Markdown.

Implementation

I can do the work for this since I implemented the original hot-linking proposal, which is actually already merged into the pkgsite code-base, but currently disabled. I suspect that the modifications to the original hot-linking implementation will be relatively minimal.

Note: I originally considered using back-ticks as the marker. Credit goes to @rsc for suggesting the use of brackets to match the Markdown syntax for reference-style links.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions