-
Notifications
You must be signed in to change notification settings - Fork 7.9k
ext/bcmath: Optimize bcdiv
processing
#14660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a mistake in some of the assumptions, which made the code unnecessarily complicated, so I will correct it. |
3e0c1df
to
f77cdf8
Compare
done |
5762267
to
4647323
Compare
ext/bcmath/libbcmath/src/convert.c
Outdated
BC_VECTOR bc_parse_chunk_chars(const char *str) | ||
{ | ||
BC_VECTOR tmp; | ||
memcpy(&tmp, str, sizeof(tmp)); | ||
#if !BC_LITTLE_ENDIAN | ||
tmp = BC_BSWAP(tmp); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I see correctly this part is the same for both sizes, so it can be excluded from the condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is originally Niels' code, and is written this way for readability.
I would like to leave his code untouched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt it's more readable when we repeat the second level if condition in both functions, but it's a subjective opinion. For sure cognitive complexity is lower without nesting if conditions. It's not a big deal though.
It could probably be made a bit less complicated. I'll fix it and rebase it. |
67b9e10
to
e442f59
Compare
done |
7549b7b
to
6714a3b
Compare
ext/bcmath/libbcmath/src/div.c
Outdated
BC_VECTOR div_carry = 0; | ||
|
||
/* | ||
* If calculate the temporary quotient using only one array of n1 and n2, the error will be greater |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this rule have a name? If so, please let me know, as I can simply provide the URL.
First two commits are definitely fine to already land on master. I'll try to have a look at the actual algorithmic changes today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bc_do_div and bc_fast_div logic seem fine.
I couldn't quite follow bc_standard_div and am waiting for clarifications on that. I'll continue when my current remarks/questions have been answered.
I also think this should've been done in multiple steps, e.g. you already did a few optimizations in the base algorithm which complicates things.
What are the similarities and differences / advantages and disadvantages of your algorithm vs the previously implemented algorithm?
ext/bcmath/libbcmath/src/div.c
Outdated
|
||
/* | ||
* If calculate the temporary quotient using only one array of n1 and n2, the error will be greater | ||
* than 1 and the number of additions and backs will increase significantly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what you mean with backs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried changing the wording of the comment. Is this correct in English?
ext/bcmath/libbcmath/src/div.c
Outdated
* If calculate the temporary quotient using only one array of n1 and n2, the error will be greater | ||
* than 1 and the number of additions and backs will increase significantly. | ||
* Therefore, when calculating the quotient, adjust the number of digits for n2 so that it becomes | ||
* BC_VECTOR_SIZE + 1 digit, and adjust the number of digits for n1 by the same amount. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that you add padding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be slightly different from padding. I've edited the comment to be a little more detailed.
ext/bcmath/libbcmath/src/div.c
Outdated
* than 1 and the number of additions and backs will increase significantly. | ||
* Therefore, when calculating the quotient, adjust the number of digits for n2 so that it becomes | ||
* BC_VECTOR_SIZE + 1 digit, and adjust the number of digits for n1 by the same amount. | ||
* By doing so, the error in the quotient will be within 1, so if need to add back, only need to add it once. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unclear what you mean with "add back"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expression has been corrected.
ext/bcmath/libbcmath/src/div.c
Outdated
* | ||
* Details: | ||
* Suppose that when calculating the temporary quotient, only the high-order elements of the BC_VECTOR array are used. | ||
* At this time, n1 and n2 can be considered to be decomposed as follows. (Here, B = 10^b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does b
come from, or is this just "any natural number" ?
EDIT: okay I see, it's any natural number afaict, please clarify this by adding e.g. (any b ∈ ℕ)
in the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, added it.
ext/bcmath/libbcmath/src/div.c
Outdated
BC_VECTOR *n2_vector = n1_vector + n1_arr_size; | ||
BC_VECTOR *quot_vector = n2_vector + n2_arr_size; | ||
|
||
/* Fill with zeros and convert as many vector elements as need */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* Fill with zeros and convert as many vector elements as need */ | |
/* Fill with zeros and convert as many vector elements as needed */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
ext/bcmath/libbcmath/src/div.c
Outdated
* n2 = n2_high * B^k + n2_low | ||
* | ||
* The error E can be expressed by the following formula. | ||
* E = n1_high * B^k / n2_high * B^k - (n1_high * B^k + n1_low) / (n2_high * B^k + n2_low) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose you mean (n1_high * B^k) / (n2_high * B^k)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the definition or the error in this case? I know things like relative error and absolute error in numeric approximations but I'm not sure what you really mean here or where this is going.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parentheses were missing in many places. The result of the expression is correct. Fixed.
ext/bcmath/libbcmath/src/div.c
Outdated
* E = n1_high / n2_high - (n1_high * B^k + n1_low) / (n2_high * B^k + n2_low) | ||
* E = (n1_high * (n2_high * B^k + n2_low) - (n1_high * B^k + n1_low) * n2_high) / n2_high * (n2_high * B^k + n2_low) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From line 97 to line 98 there is a mistake I think. Isn't the last part supposed to be / (n2_high * B^k + n2_low)
instead of * (n2_high * B^k + n2_low)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consequently the steps that follow on this could be wrong too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
I replied and edited the comment.
I should have broken it down a little more, sorry.
Is this about Due to the change in radix of calculation, the method used to calculate the "temporary quotient" is significantly different from the previous implementation. |
Thanks for the changes, I'll have a look again this weekend.
I see. The reason I asked the question was because I didn't fully grasp that you were using the restoring division technique, as I didn't manage to get that far because of the issues with the expressions in the comments. Thanks for clarifying. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to understand most of the derivations but still have some questions about it.
ext/bcmath/libbcmath/src/div.c
Outdated
* Since B = 10^b, the formula becomes: | ||
* MAX_E = (n2_high * B * (B^k - 1)) / (n2_high * (n2_high * B^k + B^k - 1)) | ||
* MAX_E = (10^x * 10^b * ((10^b)^k - 1)) / (10^x * (10^x * (10^b)^k + (10^b)^k - 1)) | ||
* MAX_E = (10^(x + b) * (10^(k * b) - 1)) / (10^(2x + k * b) + 10^(x + k * b) + 10^x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sign error?
* MAX_E = (10^(x + b) * (10^(k * b) - 1)) / (10^(2x + k * b) + 10^(x + k * b) + 10^x) | |
* MAX_E = (10^(x + b) * (10^(k * b) - 1)) / (10^(2x + k * b) + 10^(x + k * b) - 10^x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't immediately see how you go from this line (145) to line 146
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I made a pretty elementary mistake. Fixed the approximation procedure.
ext/bcmath/libbcmath/src/div.c
Outdated
* | ||
* Approximate again and remove +1 in the denominator | ||
* MAX_E < 10^(x + k + 2b) / (10^x * (10^(x + k * b) + 1)) | ||
* MAX_E < 10^(x + k + 2b) / (10^x * 10^(x + k * b)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you go from this line to line 157?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
n1_calc_bottom[j] -= sub_low; | ||
} else { | ||
n1_calc_bottom[j] += BC_VECTOR_BOUNDARY_NUM - sub_low; | ||
borrow++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the borrow never reset? I think I'm missing something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value is always overwritten with borrow = sub / BC_VECTOR_BOUNDARY_NUM
in L209, so it's okay.
ext/bcmath/libbcmath/src/div.c
Outdated
char *qptr = (*quot)->n_value; | ||
char *qend = qptr + quot_len - 1; | ||
|
||
i = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Useless line? The for loop initializes this to zero in the next line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, thx. I fixed it
611fbb7
@nielsdos |
Since the property hook was merged, I merged master just to be sure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is right, or at least I don't see anything wrong and some stress testing doesn't reveal issues.
It would still be good if someone else had a look as well.
Co-authored-by: Gina Peter Banyard <[email protected]>
Co-authored-by: Gina Peter Banyard <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still working through it
ext/bcmath/libbcmath/src/div.c
Outdated
/* | ||
* Used when the divisor is split into 2 or more chunks. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
/* | |
* Used when the divisor is split into 2 or more chunks. | |
*/ | |
/* Used when the divisor is split into 2 or more chunks. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
859a9db
ext/bcmath/libbcmath/src/div.c
Outdated
* If calculate the temporary quotient using only one array of numerator and divisor, the error E can be larger than 1. | ||
* In other words, in the restoring division, the count of additions for restore increases significantly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not exactly sure what you mean by
using only one array for numerator and divisor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other thing, which I would like to know, is how much of an impact in performance do the repeated restores actual cause.
Because if it is negligible, I would just not bother, it would also make the initial implementation easier to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not exactly sure what you mean by
This means that an error will occur if calculations are made using only the high parts.
The other thing, which I would like to know, is how much of an impact in performance do the repeated restores actual cause.
Because if it is negligible, I would just not bother, it would also make the initial implementation easier to review.
I'll try measuring it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* If calculate the temporary quotient using only one array of numerator and divisor, the error E can be larger than 1. | |
* In other words, in the restoring division, the count of additions for restore increases significantly. | |
* Because we use the restoring division we need to perform E restorations, which can be significant if E is large. |
Feels like it conveys the same meaning, just way more succinctly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Girgias
Measurements were made under conditions that would most likely make a difference.
code:
$num1 = '90000000000000000000000000000000000000000000000000000000000000';
$num2 = '109999999999999999999999999999999999999999';
$start = microtime(true);
for ($i = 0; $i < 500; $i++) {
bcdiv($num1, $num2, 0);
}
echo microtime(true) - $start . PHP_EOL;
Multiple restores: 13.932011127472s
This PR: 0.00018405914306641s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the comment
72f7f3a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Girgias Measurements were made under conditions that would most likely make a difference.
code:
$num1 = '90000000000000000000000000000000000000000000000000000000000000'; $num2 = '109999999999999999999999999999999999999999'; $start = microtime(true); for ($i = 0; $i < 500; $i++) { bcdiv($num1, $num2, 0); } echo microtime(true) - $start . PHP_EOL;
Multiple restores: 13.932011127472s This PR: 0.00018405914306641s
Do you have the implementation without a restore in a branch somewhere just so that I can have a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Girgias
Made a PR in my repository :)
SakiTakamachi#10
Co-authored-by: Gina Peter Banyard <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More comments, I think I'm happy with the algorithm somewhat
/* For sub, add the remainder to the high-order digit */ | ||
numerator_vectors[numerator_top_index - i] += div_carry * BC_VECTOR_BOUNDARY_NUM; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "for sub" here is throwing me off, what are you trying to say precisely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numerator_high_part
is only used to calculate the quotient, and numerator_vectors
does not contain div_carry
here yet.
After this, add div_carry
to numerator_vectors
for numerator_vectors - (divisor_vectors * quot_guess)
.
ext/bcmath/libbcmath/src/div.c
Outdated
static void bc_do_div(const char *n1, size_t n1_readable_len, size_t n1_bottom_extension, const char *n2, size_t n2_len, bc_num *quot, size_t quot_len) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to rename n1 and n2 here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed
ffb6e8e
ext/bcmath/libbcmath/src/div.c
Outdated
/* remove n2 trailing zeros */ | ||
while (*n2end == 0 && n2_scale > 0) { | ||
n2end--; | ||
n2_scale--; | ||
} | ||
while (*n2end == 0) { | ||
n2end--; | ||
n2_int_right_zeros++; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having this as a private function that is also use by _bc_rm_leading_zeros()
would make this less strange, especially as it used a couple of times.
Please have this in a separate commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop continuation conditions for _bc_rm_leading_zeros()
are as follows.
*num->n_value == 0 && num->n_len > 1
Meanwhile, the conditions for the loop that removes leading_zeros in here:
*n1ptr == 0 && n1_len > 0
Combining it with _bc_rm_leading_zeros()
may be a bit more complicated, as the conditions are slightly different.
I think the changes made an improvement in the understandability. Looks good. |
Co-authored-by: Gina Peter Banyard <[email protected]>
Co-authored-by: Gina Peter Banyard <[email protected]>
Co-authored-by: Gina Peter Banyard <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More comments, but also I really would like to see what the difference in performance is between doing all the work for having a single restoration, or just having multiple restorations.
ext/bcmath/libbcmath/src/div.c
Outdated
* If calculate the temporary quotient using only one array of numerator and divisor, the error E can be larger than 1. | ||
* In other words, in the restoring division, the count of additions for restore increases significantly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* If calculate the temporary quotient using only one array of numerator and divisor, the error E can be larger than 1. | |
* In other words, in the restoring division, the count of additions for restore increases significantly. | |
* Because we use the restoring division we need to perform E restorations, which can be significant if E is large. |
Feels like it conveys the same meaning, just way more succinctly
* numerator_arr = [0, 5678, 1234] | ||
* divisor_arr = [6789, 2345, 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the split for the divisor favouring the low digits whereas the numerator favours the high digits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the numerator favours the high digits
No, this isn't actually the case. Written in an easy-to-understand manner, it looks like this:
numerator_arr = [0000, 5678, 1234]
Co-authored-by: Gina Peter Banyard <[email protected]>
Co-authored-by: Gina Peter Banyard <[email protected]>
Co-authored-by: Gina Peter Banyard <[email protected]>
Co-authored-by: Gina Peter Banyard <[email protected]>
@Girgias |
Do you have any other concerns about this? |
Changed to use
BC_VECTOR
to calculate faster.Since there is a considerable speed difference, the benchmark was roughly measured.
The division code was changed in the third commit.
benchmark code:
before:
after: