Skip to content

[Java/C++/C#] Fix a bug in overzealous DTO validation #1073

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 20, 2025

Conversation

ZachBray
Copy link
Contributor

@ZachBray ZachBray commented Jun 17, 2025

Previously, in Java, the generated validation methods would not permit
the null value, even for fields with optional presence. Now, we do allow
these correctly.

In this change, I've also attempted to clean up, centralise, and test
the logic that decides when validation for a particular side of a range
needs to be included. We omit the validation when the native type cannot
represent values outside of the range.

There are still some gaps around validation, e.g., we do not validate
array values.


EDIT:

The changes above exposed some areas where the OTF JSON printing support in Java did not produce valid JSON, particularly around escaping characters (but also in the representation of some floating-point concepts and some delimiters). I've attempted to improve this area; however, I believe there are still some gaps. For example, when dealing with a fixed-size char array, the implementation makes no attempt to understand the character encoding. Therefore, a character set whose encoded values don't map directly onto unicode code points will have issues, e.g., Windows-1252.

ZachBray added 3 commits June 17, 2025 18:49
We had a recent bug report where the generated Java DTOs were not
correctly handling null values for optional fields.

It turns out the property-based tests were not encoding these values.
Now, they do, and they fail with messages like this one:

```
java.lang.IllegalArgumentException: member1: value is out of allowed range: 255

	at uk.co.real_logic.sbe.properties.TestMessageDto.validateMember1(TestMessageDto.java:33)
	at uk.co.real_logic.sbe.properties.TestMessageDto.member1(TestMessageDto.java:55)
	at uk.co.real_logic.sbe.properties.TestMessageDto.decodeWith(TestMessageDto.java:112)
	at uk.co.real_logic.sbe.properties.TestMessageDto.decodeFrom(TestMessageDto.java:135)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at uk.co.real_logic.sbe.properties.DtosPropertyTest.javaDtoEncodeShouldBeTheInverseOfDtoDecode(DtosPropertyTest.java:114)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
```
I spotted a failure in the DTO round-trip test locally where the DTO's
(and flyweight codec's) String-typed accessor would return `""` if the
first character in an array was the `null` character. Therefore, it
would not round-trip the "hidden bytes" after the `null`.

In this commit, I've added a generation mode that disables the
generation of strings with this format. Given the representation of
char[N] in DTOs it would not be sensible to support round-tripping these
"hidden bytes".
Previously, in Java, the generated validation methods would not permit
the null value, even for fields with optional presence. Now, we do allow
these correctly.

In this commit, I've also attempted to clean up, centralise, and test
the logic that decides when validation for a particular side of a range
needs to be included. We omit the validation when the native type cannot
represent values outside of the range.

There are still some gaps around validation, e.g., we do not validate
array values.
@ZachBray ZachBray force-pushed the bugfix/dto-validation branch from 3bb5466 to ddd42b9 Compare June 18, 2025 10:21
@vyazelenko vyazelenko changed the title [Java] Fix a bug in overzealous DTO validation [Java/C++/C#] Fix a bug in overzealous DTO validation Jun 18, 2025
Previously, there were several problems:

- (delimeters) single `char` values were output with single quotes, but this is not
valid JSON.

- (escaping) escaping was not implemented as per the specification in
Section 7 of RFC 8259. For example, special-cased control codes, e.g.,
`\b` were encoded as `\\` followed by `\b` rather than `\\` followed by
`b`. Also, the non-special-cased control characters were not encoded
using JSON's mechanism for doing so, e.g., `\u0020`.

- (numbers) Section 6 of the specification says "Numeric values that
cannot be represented in the grammar below (such as Infinity and NaN)
are not permitted." However, these were encoded as expressions of
numbers with multiple terms, e.g., `-1/0` for positive infinity. While
these are quite logical encodings of such "numbers", it is not valid
JSON. Therefore, I have kept the expressions, but enclosed them within
quotes.

Also, in this commit:

- Replaced custom compilation logic with Agrona's CompilerUtil. Note
that `CompilerUtil#compileOnDisk` is broken. I attempted to use it to
see if I could see classes in IntelliJ rather than just stack frames,
but it fails with an NPE.
@vyazelenko vyazelenko merged commit 092c1af into aeron-io:master Jun 20, 2025
34 checks passed
marciorasf pushed a commit to marciorasf/simple-binary-encoding that referenced this pull request Jun 30, 2025
* [Java] Cover null values in PBTs.

We had a recent bug report where the generated Java DTOs were not
correctly handling null values for optional fields.

It turns out the property-based tests were not encoding these values.
Now, they do, and they fail with messages like this one:

```
java.lang.IllegalArgumentException: member1: value is out of allowed range: 255

	at uk.co.real_logic.sbe.properties.TestMessageDto.validateMember1(TestMessageDto.java:33)
	at uk.co.real_logic.sbe.properties.TestMessageDto.member1(TestMessageDto.java:55)
	at uk.co.real_logic.sbe.properties.TestMessageDto.decodeWith(TestMessageDto.java:112)
	at uk.co.real_logic.sbe.properties.TestMessageDto.decodeFrom(TestMessageDto.java:135)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at uk.co.real_logic.sbe.properties.DtosPropertyTest.javaDtoEncodeShouldBeTheInverseOfDtoDecode(DtosPropertyTest.java:114)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
```

* [Java] Support arbitrary char arrays in PBTs without data after null.

I spotted a failure in the DTO round-trip test locally where the DTO's
(and flyweight codec's) String-typed accessor would return `""` if the
first character in an array was the `null` character. Therefore, it
would not round-trip the "hidden bytes" after the `null`.

In this commit, I've added a generation mode that disables the
generation of strings with this format. Given the representation of
char[N] in DTOs it would not be sensible to support round-tripping these
"hidden bytes".

* [Java,C++,C#] Improve DTO value validation.

Previously, in Java, the generated validation methods would not permit
the null value, even for fields with optional presence. Now, we do allow
these correctly.

In this commit, I've also attempted to clean up, centralise, and test
the logic that decides when validation for a particular side of a range
needs to be included. We omit the validation when the native type cannot
represent values outside of the range.

There are still some gaps around validation, e.g., we do not validate
array values.

* [Java] Fix JSON escaping of string characters, delimeters, etc.

Previously, there were several problems:

- (delimeters) single `char` values were output with single quotes, but this is not
valid JSON.

- (escaping) escaping was not implemented as per the specification in
Section 7 of RFC 8259. For example, special-cased control codes, e.g.,
`\b` were encoded as `\\` followed by `\b` rather than `\\` followed by
`b`. Also, the non-special-cased control characters were not encoded
using JSON's mechanism for doing so, e.g., `\u0020`.

- (numbers) Section 6 of the specification says "Numeric values that
cannot be represented in the grammar below (such as Infinity and NaN)
are not permitted." However, these were encoded as expressions of
numbers with multiple terms, e.g., `-1/0` for positive infinity. While
these are quite logical encodings of such "numbers", it is not valid
JSON. Therefore, I have kept the expressions, but enclosed them within
quotes.

Also, in this commit:

- Replaced custom compilation logic with Agrona's CompilerUtil. Note
that `CompilerUtil#compileOnDisk` is broken. I attempted to use it to
see if I could see classes in IntelliJ rather than just stack frames,
but it fails with an NPE.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants