-
Notifications
You must be signed in to change notification settings - Fork 236
Use .hie
files for the Hyperlinker backend
#977
Use .hie
files for the Hyperlinker backend
#977
Conversation
Some bigger TODO items:
|
There are a few more issues about the interaction between the haddock tokenizer and
|
Good point for infix identifiers! Haddock uses GHC's tokenizer - the problem is that However, I don't think we should let this be blocked by position-altering pragmas. I think that's a more fundamental issue about how GHC conflates positions in files with positions used for error reporting. |
Here's more information on the performance (both time and allocations) for all boot libs: https://docs.google.com/spreadsheets/d/1xmUdUoVqbYtIRYdz5TgWT6rwHsVGjYEgEmMwWClU9-c/edit?usp=sharing. |
@harpocrates Is this for building haddock docs with With an SSD, I see a slowdown of ~15% compiling ghc stage2 with I measured this by building ghc normally, then blowing away Once Hi Haddock lands, haddocks performance should increase significantly as it wouldn't need to invoke ghc at all. |
Yes this is building full docs with --hyperlinked-source.
Performance won’t necessarily get that much better. We’ll still be calling out to GHC - just in a way that GHC can avoid compiling modules if it already has what it needs. I suspect that when folks do build docs with hyperlinked sources, they’ll probably still need compilation for HIE files (unless those are already produced and up to date - maybe for IDE reasons).
Until the performance for generating HIE files gets better, I don’t think we should enable them by default in Cabal either (the way I think we will want to enable `-haddock` by default).
… On Dec 27, 2018, at 10:12 PM, wz1000 ***@***.***> wrote:
@harpocrates <https://github.com/harpocrates> Is this for building haddock docs with --hyperlinked-source?
With an SSD, I see a slowdown of ~15% compiling ghc stage2 with -fwrite-ide-info: https://gist.github.com/wz1000/8678464b4c88be8359d39462ae7f0a90 <https://gist.github.com/wz1000/8678464b4c88be8359d39462ae7f0a90>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#977 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AKRHAdISRD_ijiiRwiaKm-AxG8lqTe8gks5u9bZTgaJpZM4ZQA9N>.
|
It would be useful to know whether the bulk of the slowdown in in the |
Here is the script I ran twice - once for set -e
for run in run1 run2 run3 run4
do
mkdir "profs/$run"
for file in $(find _build -name "*.haddock")
do
echo "Processing $file..."
echo "| Removing $file"
rm $file
echo "| Rebuilding"
./hadrian/build.sh -c --flavour=user $file
echo "| Moving haddock.prof"
mv haddock.prof "profs/$run/$(basename $file | cut -d. -f1)"
done
done The custom userFlavour :: Flavour
userFlavour = defaultFlavour
{ name = "user"
, args = mconcat [ builder (Haddock BuildPackage) ? pure [ "+RTS", "-p", "-RTS" ]
, builder (Ghc CompileHs) ? package haddock ? pure [ "-prof", "-fprof-auto", "-rtsopts" ]
, builder (Ghc LinkHs) ? package haddock ? pure [ "-prof", "-fprof-auto", "-rtsopts" ]
, args defaultFlavour
]
} Here are the .prof files I got: https://owncloud-tng.galois.com/index.php/s/1nt96QTchrWeTNZ |
I meant that I get a 15% slowdown building
|
Given the first lines of pretty much all the new
|
That says 25% of the time is spent in The traversal of the renamed source enables hyperlinking of imports/exports/annotations/foreign declarations and other stuff that is not in the TypecheckedSource. Maybe there is a better way to extract those specifically, and not extract the rest of AST that is already there in the typechecked source. I believe it would be worth testing to see if not including RenamedSource in the hie file generation stuff(and thus not calling |
Isn’t `extract_renamed_stuff` the top-level function for where HIE files get created and written out (so that entry includes not just time spent traversing the renamed source, but also time spent compacting the AST and writing it to a file)? That said, not having to traverse the RenamedSource would probably make the HIE file generation faster.
Also, aren’t HIE files supposed to be consumed elsewhere (like Haskell-ide-engine)? In which case, we can’t just keep changing the format, right?
… On Dec 28, 2018, at 6:17 AM, wz1000 ***@***.***> wrote:
That says 25% of the time is spent in extract_renamed_stuff. However, now I doubt whether we need renamed source at all to generate .hie files. Initially there were some issues with some name uniques not matching due to differences introduced by the typechecker, but I believe I should have solved those with the name_remappping stuff.
The traversal of the renamed source enables hyperlinking of imports/exports/annotations/foreign declarations and other stuff that is not in the TypecheckedSource. Maybe there is a better way to extract those specifically, and not extract the rest of AST that is already there in the typechecked source.
I believe it would be worth testing to see if not including RenamedSource in the hie file generation stuff yields a significant performance improvement.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#977 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AKRHAWnx8vUbECu4xiDV1Pdirm4JOR75ks5u9igLgaJpZM4ZQA9N>.
|
Oh, yes, I misremembered. However, the format will not change if we don't traverse the RenamedSource. All that needs to be changed in flat_asts = concat
[ tasts
- , rasts
- , imps
- , exps
] and we can also avoid setting |
By "format" I don't mean the Haskell types in Would you like to make a GHC patch for this? Then we can post some performance numbers there, as well as check that nobody is relying on the information that is about to be removed from |
There are not that many cost centres defined in the GHC code base so I wouldn't trust the information to precisely point to what is taking up the time. |
Parts of the AST that are unaccounted for without using renamed source:
|
@wz1000 Any updates on potential performance improvements from the GHC side? I'm gonna see if I can optimize the rest of the stuff in |
@harpocrates the approach I suggested earlier(not traversing the renamed AST) will not work. Type signatures can be buried arbitrarily deep in the AST, and to hyperlink those, we need the Renamed Source(since The other items in my checklist above should be possible to extract without a full traversal of the renamed AST. To see the possible gains, you can benchmark the branch I linked above. If the gains are substantial, we might want to add a flag that enables this behaviour at the cost of information for type signatures. |
Move the logic for handling temporary output directories up and make .hie files write to there. Although having shorter lived temp directories is better, they were so short lived that the .hie files were gone by the time the hyperlinker backend was ready to run!
This is working around a quirk of the GHC lexer. It seems to be working well enough though. See the attached test case for more motivation.
Optimized some of the batch HIE transformations to share intermediate results. This sort of thing probably belongs as a GHC-side utility. Also, avoid recomputing a module name based SrcMap. This should be a performance-only change.
Also avoid an unnecessary call to splice URL in those cases.
This is yet another example of why we need to stop using CPP. Clang's CPP inserts a newline after lines that _look_ like (but don't end up actually being) preprocessor directives. See the comment on `needPragHack` for more on this sad state of affairs.
Good news. By limiting (for now) which expressions have their types serialized into the HIE files, performance returns to levels that make this mergeable (2-3% percent faster but allocate 2-3% more) without sacrificing any features. We'll now see if anyone else objects: https://gitlab.haskell.org/ghc/ghc/merge_requests/214. |
@harpocrates 2-3% faster than the previous version with .hie files or 2-3% faster than the old way of generating hyperlinked source? |
@wz1000 Than the old way, otherwise it wouldn't really matter, right? |
Actually, with the latest tweaks to that merge request, both allocations and total time drop by a couple percentage points compared to the old way. I do love changes that are net wins in all dimensions: time, allocations, bug fixes, and features! 😀 I intend to accept the new output and merge this as soon as the GHC MR gets merged. |
This is twofold: 1. ask GHC to treat these as regular tokens 2. we manually update the internal position as well as returning tokens If we don't do 1, we end up with huge chunks of text that get considered as whitespace (because a `LINE`/`COLUMN` pragma caused us to jump to that next place). If we do 1 but not 2, the positions we associate with tokens won't match those from HIE files, and we consequently won't get any links/type annotations.
Fixes some annoyances around LINE/COLUMN pragmas not being recognized properly due to the lexer inserting extra zero length tokens. Also, avoid rendering a zero length token.
Calls out to 'spliceURL' are often useless. Also, given the size of these functions, we should tell GHC to inline them.
Final numbers, in case anyone is interested:
Thanks to everyone for being patient! |
Very nice! Thank you sir!
…On Thu, Jan 31, 2019, 10:39 Alec Theriault ***@***.***> wrote:
Final numbers
<https://docs.google.com/spreadsheets/d/1xmUdUoVqbYtIRYdz5TgWT6rwHsVGjYEgEmMwWClU9-c/edit#gid=847080098>,
in case anyone is interested:
- 9-10% faster on the sum of boot libs running --html and
--hyperlinked-sources
- 6-7% less allocations on the sum of boot libs running --html and
--hyperlinked-sources
Thanks to everyone for being patient!
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#977 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AByiiSW8ok2FMjlSvFyn86R4Vw93CLNDks5vIrm3gaJpZM4ZQA9N>
.
|
This is a large architectural change to the Hyperlinker. * extract link (and now also type) information from `.hie` instead of doing ad-hoc SYB traversals of the `RenamedSource`. Also adds a superb type-on-hover feature (#715). * re-engineer the lexer to avoid needless string conversions. By going directly through GHC's `P` monad and taking bytestring slices, we avoid a ton of allocation and have better handling of position pragmas and CPP. In terms of performance, the Haddock side of things has gotten _much_ more efficient. Unfortunately, much of this is cancelled out by the increased GHC workload for generating `.hie` files. For the full set of boot libs (including `ghc`-the-library) * the sum of total time went down by 9-10% overall * the sum of total allocations went down by 6-7% Haddock is moving towards working entirely over `.hi` and `.hie` files. This change means we no longer need the `RenamedSource` from `TypecheckedModule` (something which is _not_ in `.hi` files). Along the way a bunch of things were fixed: * Cross package (and other) links are now more reliable (#496) * The lexer tries to recover from errors on every line (instead of at CPP boundaries) * `LINE`/`COLUMN` pragmas are taken into account * filter out zero length tokens before rendering * avoid recomputing the `ModuleName`-based `SrcMap` * remove the last use of `Documentation.Haddock.Utf8` (see #998) * restructure temporary folder logic for `.hi`/`.hie` model
This is a large architectural change to the Hyperlinker. * extract link (and now also type) information from `.hie` instead of doing ad-hoc SYB traversals of the `RenamedSource`. Also adds a superb type-on-hover feature (#715). * re-engineer the lexer to avoid needless string conversions. By going directly through GHC's `P` monad and taking bytestring slices, we avoid a ton of allocation and have better handling of position pragmas and CPP. In terms of performance, the Haddock side of things has gotten _much_ more efficient. Unfortunately, much of this is cancelled out by the increased GHC workload for generating `.hie` files. For the full set of boot libs (including `ghc`-the-library) * the sum of total time went down by 9-10% overall * the sum of total allocations went down by 6-7% Haddock is moving towards working entirely over `.hi` and `.hie` files. This change means we no longer need the `RenamedSource` from `TypecheckedModule` (something which is _not_ in `.hi` files). Along the way a bunch of things were fixed: * Cross package (and other) links are now more reliable (#496) * The lexer tries to recover from errors on every line (instead of at CPP boundaries) * `LINE`/`COLUMN` pragmas are taken into account * filter out zero length tokens before rendering * avoid recomputing the `ModuleName`-based `SrcMap` * remove the last use of `Documentation.Haddock.Utf8` (see #998) * restructure temporary folder logic for `.hi`/`.hie` model
This is a large architectural change to the Hyperlinker. * extract link (and now also type) information from `.hie` instead of doing ad-hoc SYB traversals of the `RenamedSource`. Also adds a superb type-on-hover feature (#715). * re-engineer the lexer to avoid needless string conversions. By going directly through GHC's `P` monad and taking bytestring slices, we avoid a ton of allocation and have better handling of position pragmas and CPP. In terms of performance, the Haddock side of things has gotten _much_ more efficient. Unfortunately, much of this is cancelled out by the increased GHC workload for generating `.hie` files. For the full set of boot libs (including `ghc`-the-library) * the sum of total time went down by 9-10% overall * the sum of total allocations went down by 6-7% Haddock is moving towards working entirely over `.hi` and `.hie` files. This change means we no longer need the `RenamedSource` from `TypecheckedModule` (something which is _not_ in `.hi` files). Along the way a bunch of things were fixed: * Cross package (and other) links are now more reliable (#496) * The lexer tries to recover from errors on every line (instead of at CPP boundaries) * `LINE`/`COLUMN` pragmas are taken into account * filter out zero length tokens before rendering * avoid recomputing the `ModuleName`-based `SrcMap` * remove the last use of `Documentation.Haddock.Utf8` (see #998) * restructure temporary folder logic for `.hi`/`.hie` model
This is a large architectural change to the Hyperlinker. * extract link (and now also type) information from `.hie` instead of doing ad-hoc SYB traversals of the `RenamedSource`. Also adds a superb type-on-hover feature (#715). * re-engineer the lexer to avoid needless string conversions. By going directly through GHC's `P` monad and taking bytestring slices, we avoid a ton of allocation and have better handling of position pragmas and CPP. In terms of performance, the Haddock side of things has gotten _much_ more efficient. Unfortunately, much of this is cancelled out by the increased GHC workload for generating `.hie` files. For the full set of boot libs (including `ghc`-the-library) * the sum of total time went down by 9-10% overall * the sum of total allocations went down by 6-7% Haddock is moving towards working entirely over `.hi` and `.hie` files. This change means we no longer need the `RenamedSource` from `TypecheckedModule` (something which is _not_ in `.hi` files). Along the way a bunch of things were fixed: * Cross package (and other) links are now more reliable (#496) * The lexer tries to recover from errors on every line (instead of at CPP boundaries) * `LINE`/`COLUMN` pragmas are taken into account * filter out zero length tokens before rendering * avoid recomputing the `ModuleName`-based `SrcMap` * remove the last use of `Documentation.Haddock.Utf8` (see #998) * restructure temporary folder logic for `.hi`/`.hie` model
Fixes haskell#1481. There were two bugs in this: * We were assuming that we were always getting a relative path to the module in question, while Nix gives us file:// URLs sometimes. This change checks for those and stops prepending `..` to them. * We were not linking to the file under the module. This seems to have been a regression introduced by haskell#977. That is, the URLs were going to something like file:///nix/store/3bwbsy0llxxn1pixx3ll02alln56ivxy-ghc-9.0.2-doc/share/doc/ghc/html/libraries/base-4.15.1.0/src which does not have the appropriate HTML file or fragment for the item in question at the end. There is a remaining instance of the latter bug, but not in the hyperlinker: the source links to items reexported from other modules are also not including the correct file name. e.g. the reexport of Entity in esqueleto, from persistent. NOTE: This needs to get tested with relative-path located modules. It seems correct for Nix based on my testing. Testing strategy: ``` nix-shell '<nixpkgs>' --pure -A haskell.packages.ghc922.aeson mkdir /tmp/aesonbuild && cd /tmp/aesonbuild export out=/tmp/aesonbuild/out genericBuild ln -sf $HOME/co/haddock/haddock-api/resources . ./Setup haddock --with-haddock=$HOME/path/to/haddock/exec --hyperlink-source ```
Fixes #1481. There were two bugs in this: * We were assuming that we were always getting a relative path to the module in question, while Nix gives us file:// URLs sometimes. This change checks for those and stops prepending `..` to them. * We were not linking to the file under the module. This seems to have been a regression introduced by #977. That is, the URLs were going to something like file:///nix/store/3bwbsy0llxxn1pixx3ll02alln56ivxy-ghc-9.0.2-doc/share/doc/ghc/html/libraries/base-4.15.1.0/src which does not have the appropriate HTML file or fragment for the item in question at the end. There is a remaining instance of the latter bug, but not in the hyperlinker: the source links to items reexported from other modules are also not including the correct file name. e.g. the reexport of Entity in esqueleto, from persistent. NOTE: This needs to get tested with relative-path located modules. It seems correct for Nix based on my testing. Testing strategy: ``` nix-shell '<nixpkgs>' --pure -A haskell.packages.ghc922.aeson mkdir /tmp/aesonbuild && cd /tmp/aesonbuild export out=/tmp/aesonbuild/out genericBuild ln -sf $HOME/co/haddock/haddock-api/resources . ./Setup haddock --with-haddock=$HOME/path/to/haddock/exec --hyperlink-source ```
Fixes haskell#1481. There were two bugs in this: * We were assuming that we were always getting a relative path to the module in question, while Nix gives us file:// URLs sometimes. This change checks for those and stops prepending `..` to them. * We were not linking to the file under the module. This seems to have been a regression introduced by haskell#977. That is, the URLs were going to something like file:///nix/store/3bwbsy0llxxn1pixx3ll02alln56ivxy-ghc-9.0.2-doc/share/doc/ghc/html/libraries/base-4.15.1.0/src which does not have the appropriate HTML file or fragment for the item in question at the end. There is a remaining instance of the latter bug, but not in the hyperlinker: the source links to items reexported from other modules are also not including the correct file name. e.g. the reexport of Entity in esqueleto, from persistent. NOTE: This needs to get tested with relative-path located modules. It seems correct for Nix based on my testing. Testing strategy: ``` nix-shell '<nixpkgs>' --pure -A haskell.packages.ghc922.aeson mkdir /tmp/aesonbuild && cd /tmp/aesonbuild export out=/tmp/aesonbuild/out genericBuild ln -sf $HOME/co/haddock/haddock-api/resources . ./Setup haddock --with-haddock=$HOME/path/to/haddock/exec --hyperlink-source ``` (cherry picked from commit ab53ccf)
Fixes haskell/haddock#1481. There were two bugs in this: * We were assuming that we were always getting a relative path to the module in question, while Nix gives us file:// URLs sometimes. This change checks for those and stops prepending `..` to them. * We were not linking to the file under the module. This seems to have been a regression introduced by haskell/haddock#977. That is, the URLs were going to something like file:///nix/store/3bwbsy0llxxn1pixx3ll02alln56ivxy-ghc-9.0.2-doc/share/doc/ghc/html/libraries/base-4.15.1.0/src which does not have the appropriate HTML file or fragment for the item in question at the end. There is a remaining instance of the latter bug, but not in the hyperlinker: the source links to items reexported from other modules are also not including the correct file name. e.g. the reexport of Entity in esqueleto, from persistent. NOTE: This needs to get tested with relative-path located modules. It seems correct for Nix based on my testing. Testing strategy: ``` nix-shell '<nixpkgs>' --pure -A haskell.packages.ghc922.aeson mkdir /tmp/aesonbuild && cd /tmp/aesonbuild export out=/tmp/aesonbuild/out genericBuild ln -sf $HOME/co/haddock/haddock-api/resources . ./Setup haddock --with-haddock=$HOME/path/to/haddock/exec --hyperlink-source ```
Fixes haskell/haddock#1481. There were two bugs in this: * We were assuming that we were always getting a relative path to the module in question, while Nix gives us file:// URLs sometimes. This change checks for those and stops prepending `..` to them. * We were not linking to the file under the module. This seems to have been a regression introduced by haskell/haddock#977. That is, the URLs were going to something like file:///nix/store/3bwbsy0llxxn1pixx3ll02alln56ivxy-ghc-9.0.2-doc/share/doc/ghc/html/libraries/base-4.15.1.0/src which does not have the appropriate HTML file or fragment for the item in question at the end. There is a remaining instance of the latter bug, but not in the hyperlinker: the source links to items reexported from other modules are also not including the correct file name. e.g. the reexport of Entity in esqueleto, from persistent. NOTE: This needs to get tested with relative-path located modules. It seems correct for Nix based on my testing. Testing strategy: ``` nix-shell '<nixpkgs>' --pure -A haskell.packages.ghc922.aeson mkdir /tmp/aesonbuild && cd /tmp/aesonbuild export out=/tmp/aesonbuild/out genericBuild ln -sf $HOME/co/haddock/haddock-api/resources . ./Setup haddock --with-haddock=$HOME/path/to/haddock/exec --hyperlink-source ``` (cherry picked from commit ab53ccf089ea703b767581ac14be0f6c78a7678a)
Now that
.hie
files can be generated by GHC, Haddock should use them in the hyperlinker backend instead of going via theTypecheckedModule
(which will have to go away for Hi Haddock to work).Fixes #496.
Fixes #715.