Skip to content
This repository was archived by the owner on Oct 1, 2019. It is now read-only.

Re-enable AST_COMPARE #17

Merged
merged 4 commits into from
Jan 4, 2018
Merged

Re-enable AST_COMPARE #17

merged 4 commits into from
Jan 4, 2018

Conversation

azz
Copy link
Member

@azz azz commented Jan 3, 2018

Printers can now specify custom clean functions for AST massaging.

Fixes #12

@@ -21,5 +21,4 @@ before_install:
- pyenv install -s 3.6.3
- pyenv global 2.7.11 3.6.3
script:
- yarn lint
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @j-f1 The linting is integrated into the Jest run.

"test": "jest"
"lint": "jest -c jest.eslint.config.js",
"test": "jest",
"prettier": "prettier --plugin=. --parser=python"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useful for testing. yarn prettier test.py or echo 'foo' | yarn prettier

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't need the explicit --parser?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not always, but --debug-check requires it, so it can't hurt.

"test": "jest"
"lint": "jest -c jest.eslint.config.js",
"test": "jest",
"prettier": "prettier --plugin=. --parser=python"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't need the explicit --parser?

const parent = path.getParentNode();
// This is overly cautious, we may want to revisit and normalize more
// cases here.
return /[ex.]/i.test(parent.source) ? parent.source : `${n.n}`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is probably no reason to ever normalize these. consider e.g. PEP 515 section dividers, plus in general you can't represent all python numbers exactly with the JS number type anyway

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to always return parent.source

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably just apply this check for Num then, and don't bother with the special-case number handling

e.g. you can't differentiate 1.0j from 1j in the AST anyway, but those should not be reprinted

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i mean in astexport as well

@@ -305,7 +305,7 @@ function genericPrint(path, options, print) {
const parent = path.getParentNode();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of reaching to the parent here, just drop visit_field_Num_n from astexport, then just print path.source in the handler for Num. As noted above, this is the right thing to do for complex values too.

In general it's probably best not to try to normalize numbers in Python. People do lots of numerical work, and if I'm writing out something like 1.00, it's probably for a good reason – like maybe it's np.array([0.99, 1.00, 1.01]) or something.

Copy link
Member Author

@azz azz Jan 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't quite do that as complex is not JSON serializable. Having the source type information may be useful too, so I'll just coerce them to str on the Python side (assuming that is safe?).

Copy link
Contributor

@taion taion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modulo that one comment on visit_field_Num_n but it sort of doesn't matter

@@ -87,18 +87,18 @@ def visit_field_Num_n(self, val):
if isinstance(val, int):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, okay – this should probably just be

def visit_field_Num_n(val):
    return str(val)

this value won't actually be used in the printer (which as with the JS printer will only look at the raw string value), so anything JSON-serializable that will work for the AST comparison will be fine

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You lose a bit of information in doing this, does it hurt to keep around the source type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think you do, actually, relative to what's in the AST:

In [1]: str(1)
Out[1]: '1'

In [2]: str(1.)
Out[2]: '1.0'

In [3]: str(0xDEADBEEF)
Out[3]: '3735928559'

In [4]: str(1_000_001)
Out[4]: '1000001'

In [5]: str(5j)
Out[5]: '5j'

In [6]: str(3-4j)
Out[6]: '(3-4j)'

there might be some edge cases on float rounding, but you'd already encounter them with the above

let ppastMassaged;
let pperr = null;
try {
const ppast = parse(
prettyprint(source, path, mergedOptions),
mergedOptions
);
ppastMassaged = massageAST(ppast);
ppastMassaged = massageAST(ppast, normalizedOptions);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is definitely a value-add, but it's not an ideal API. for purposes of comparison, given a python AST, you can call

ast.dump(tree, annotate_fields=False)

this will give you a string representation of the AST – not the most useful thing for what we're formatting, but you can then just do string equality to compare.

e.g. for this python 2 example:

In [15]: print(s)

with A() as a, B() as b:
    pass


In [16]: print(rt)
with A() as a:
    with B() as b:
        pass


In [17]: print(ast.dump(ast.parse(s), annotate_fields=False))
Module([With(Call(Name('A', Load()), [], [], None, None), Name('a', Store()), [With(Call(Name('B', Load()), [], [], None, None), Name('b', Store()), [Pass()])])])

In [18]: print(ast.dump(ast.parse(rt), annotate_fields=False))
Module([With(Call(Name('A', Load()), [], [], None, None), Name('a', Store()), [With(Call(Name('B', Load()), [], [], None, None), Name('b', Store()), [Pass()])])])

(the two dumped AST strings above are identical)

the difference being in this case that we'd be delegating everything to the ground truth AST handler, such that if there were a bug with the AST -> JSON conversion, we would know

the sort of essential problem from a correctness perspective is that we're not really dealing with a canonical python AST here

funny enough you can't actually assert abstract equality between ASTs in python without dumping them – at not least until python/cpython#1368 lands

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tl;dr it would be a better upstream API to have a parseAndMassage hook instead of a massageAST hook. this is because there's a built-in way to canonicalize a python AST to a string that can be used for equality comparisons, but the functionality to convert the python AST to JSON is code under test here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, astor.strip_tree will clean up Python ASTs such that equality checks on them in Python are possible, but I guess there's not a good way to pretty-print the diff for test failures.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nested with case is something we wouldn't change in Prettier for JS. e.g. we would never change:

if (foo) {
  if (bar) {
  }
}

to

if (foo && bar) {
}

There's a reason we massage the AST after we have printed it, as the cleanup often removes properties or nodes that are critical information required to print the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that was an illustrative example, showing how this process produces identical dumped ASTs even when the source is quite different in terms of concrete syntax

as it happens, for with, the python 2 AST doesn't distinguish at all between those cases – hence the heuristic in https://github.com/prettier/prettier-python/pull/15/files#diff-ab7972325670f1753462f9b0e6c57900R138

like literally those two things above are identical from the perspective of the parser in python 2 (though not so in python 3)

@azz azz merged commit a58a6ce into prettier:master Jan 4, 2018
@azz azz deleted the clean-ast branch January 4, 2018 07:09
@azz
Copy link
Member Author

azz commented Jan 4, 2018

If you want to use astor.dump_tree instead of/as well as massageAst, feel free to open a PR for that.

@taion
Copy link
Contributor

taion commented Jan 4, 2018

It wouldn't be straightforward to use massageAst, since dump_tree wants a Python AST. I could replace that area of the source comparison entirely. Does that make sense if run_spec gets pulled upstream, though? I guess as an extra check?

@azz
Copy link
Member Author

azz commented Jan 4, 2018

Not sure I'm following, why would it be difficult to send two strings (input and output) to Python to run parse and dump_tree on both, then send back the two trees as strings?

@taion
Copy link
Contributor

taion commented Jan 4, 2018

I mean with massageAstNode – that only gets the AST node and not the original source, no?

@azz
Copy link
Member Author

azz commented Jan 4, 2018

Correct. If we want this to work for both run_spec and --debug-check we'll need to add another hook to use a custom checker rather than the generic massageAst.

@taion
Copy link
Contributor

taion commented Jan 4, 2018

Yes – that's what I meant in prettier/prettier#3610 (comment).

Though it turns out the AST here does include the full original source code (usually many times over), but we should probably remove that behavior: 59dcdbe#r26632592.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants