Skip to content

ext/bcmath: Optimize bcdiv processing #14660

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 34 commits into from
Aug 18, 2024

Conversation

SakiTakamachi
Copy link
Member

@SakiTakamachi SakiTakamachi commented Jun 25, 2024

Changed to use BC_VECTOR to calculate faster.
Since there is a considerable speed difference, the benchmark was roughly measured.
The division code was changed in the third commit.

benchmark code:

<?php

// 1
$num1 = '1.23';
$num2 = '2';

$start = microtime(true);
for ($i = 0; $i < 500000; $i++) {
    bcdiv($num1, $num2, 0);
}
echo microtime(true) - $start . PHP_EOL;

// 2
$num1 = '4.123456';
$num2 = '0.001';

$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    bcdiv($num1, $num2, 0);
}
echo microtime(true) - $start . PHP_EOL;

// 3
$num1 = '1.23';
$num2 = '2.12';

$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    bcdiv($num1, $num2, 4);
}
echo microtime(true) - $start . PHP_EOL;

// 4
$num1 = '5.2345678';
$num2 = '2.1234567';

$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    bcdiv($num1, $num2, 0);
}
echo microtime(true) - $start . PHP_EOL;

// 5
$num1 = '5.23456789';
$num2 = '2.12345678';

$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    bcdiv($num1, $num2, 0);
}
echo microtime(true) - $start . PHP_EOL;

// 6
$num1 = '1.234567812345678';
$num2 = '2.123456712345678';

$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    bcdiv($num1, $num2, 50);
}
echo microtime(true) - $start . PHP_EOL;

// 7
$num1 = '1.234567812345678';
$num2 = '2.1';

$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    bcdiv($num1, $num2, 50);
}
echo microtime(true) - $start . PHP_EOL;

// 8
$num1 = str_repeat('1234567890', 300);
$num2 = str_repeat('9876543210', 300);

$start = microtime(true);
for ($i = 0; $i < 5000; $i++) {
    bcdiv($num1, $num2, 50);
}
echo microtime(true) - $start . PHP_EOL;

before:

0.13476610183716 // 1
0.28906989097595 // 2
0.29561901092529 // 3
0.26175594329834 // 4
0.25259518623352 // 5
2.5445368289948 // 6
1.1578629016876 // 7
1.598653793335 // 8

after:

0.080783128738403 // 1
0.15704202651978 // 2
0.15926694869995 // 3
0.16794800758362 // 4
0.17711997032166 // 5
0.32082390785217 // 6
0.28861498832703 // 7
0.048795223236084 // 8

@SakiTakamachi
Copy link
Member Author

There was a mistake in some of the assumptions, which made the code unnecessarily complicated, so I will correct it.

@SakiTakamachi
Copy link
Member Author

done

Comment on lines 73 to 79
BC_VECTOR bc_parse_chunk_chars(const char *str)
{
BC_VECTOR tmp;
memcpy(&tmp, str, sizeof(tmp));
#if !BC_LITTLE_ENDIAN
tmp = BC_BSWAP(tmp);
#endif
Copy link
Contributor

@jorgsowa jorgsowa Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I see correctly this part is the same for both sizes, so it can be excluded from the condition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is originally Niels' code, and is written this way for readability.

I would like to leave his code untouched.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt it's more readable when we repeat the second level if condition in both functions, but it's a subjective opinion. For sure cognitive complexity is lower without nesting if conditions. It's not a big deal though.

@SakiTakamachi
Copy link
Member Author

It could probably be made a bit less complicated.

I'll fix it and rebase it.

@SakiTakamachi SakiTakamachi force-pushed the refactor_bcdiv branch 7 times, most recently from 67b9e10 to e442f59 Compare June 28, 2024 07:53
@SakiTakamachi
Copy link
Member Author

done

@SakiTakamachi SakiTakamachi force-pushed the refactor_bcdiv branch 2 times, most recently from 7549b7b to 6714a3b Compare June 28, 2024 11:51
BC_VECTOR div_carry = 0;

/*
* If calculate the temporary quotient using only one array of n1 and n2, the error will be greater
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this rule have a name? If so, please let me know, as I can simply provide the URL.

@SakiTakamachi
Copy link
Member Author

@Girgias @nielsdos

This is all the changes in this PR, so I would appreciate it if you could check it out when you have time :)

@nielsdos
Copy link
Member

nielsdos commented Jul 4, 2024

First two commits are definitely fine to already land on master. I'll try to have a look at the actual algorithmic changes today.

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bc_do_div and bc_fast_div logic seem fine.
I couldn't quite follow bc_standard_div and am waiting for clarifications on that. I'll continue when my current remarks/questions have been answered.
I also think this should've been done in multiple steps, e.g. you already did a few optimizations in the base algorithm which complicates things.
What are the similarities and differences / advantages and disadvantages of your algorithm vs the previously implemented algorithm?


/*
* If calculate the temporary quotient using only one array of n1 and n2, the error will be greater
* than 1 and the number of additions and backs will increase significantly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what you mean with backs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried changing the wording of the comment. Is this correct in English?

* If calculate the temporary quotient using only one array of n1 and n2, the error will be greater
* than 1 and the number of additions and backs will increase significantly.
* Therefore, when calculating the quotient, adjust the number of digits for n2 so that it becomes
* BC_VECTOR_SIZE + 1 digit, and adjust the number of digits for n1 by the same amount.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that you add padding?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be slightly different from padding. I've edited the comment to be a little more detailed.

* than 1 and the number of additions and backs will increase significantly.
* Therefore, when calculating the quotient, adjust the number of digits for n2 so that it becomes
* BC_VECTOR_SIZE + 1 digit, and adjust the number of digits for n1 by the same amount.
* By doing so, the error in the quotient will be within 1, so if need to add back, only need to add it once.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear what you mean with "add back"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expression has been corrected.

*
* Details:
* Suppose that when calculating the temporary quotient, only the high-order elements of the BC_VECTOR array are used.
* At this time, n1 and n2 can be considered to be decomposed as follows. (Here, B = 10^b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does b come from, or is this just "any natural number" ?
EDIT: okay I see, it's any natural number afaict, please clarify this by adding e.g. (any b ∈ ℕ) in the comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, added it.

BC_VECTOR *n2_vector = n1_vector + n1_arr_size;
BC_VECTOR *quot_vector = n2_vector + n2_arr_size;

/* Fill with zeros and convert as many vector elements as need */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* Fill with zeros and convert as many vector elements as need */
/* Fill with zeros and convert as many vector elements as needed */

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

* n2 = n2_high * B^k + n2_low
*
* The error E can be expressed by the following formula.
* E = n1_high * B^k / n2_high * B^k - (n1_high * B^k + n1_low) / (n2_high * B^k + n2_low)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you mean (n1_high * B^k) / (n2_high * B^k)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the definition or the error in this case? I know things like relative error and absolute error in numeric approximations but I'm not sure what you really mean here or where this is going.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parentheses were missing in many places. The result of the expression is correct. Fixed.

Comment on lines 97 to 98
* E = n1_high / n2_high - (n1_high * B^k + n1_low) / (n2_high * B^k + n2_low)
* E = (n1_high * (n2_high * B^k + n2_low) - (n1_high * B^k + n1_low) * n2_high) / n2_high * (n2_high * B^k + n2_low)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From line 97 to line 98 there is a mistake I think. Isn't the last part supposed to be / (n2_high * B^k + n2_low) instead of * (n2_high * B^k + n2_low) ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consequently the steps that follow on this could be wrong too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@SakiTakamachi
Copy link
Member Author

SakiTakamachi commented Jul 5, 2024

@nielsdos

I couldn't quite follow bc_standard_div and am waiting for clarifications on that. I'll continue when my current remarks/questions have been answered.

I replied and edited the comment.

I also think this should've been done in multiple steps, e.g. you already did a few optimizations in the base algorithm which complicates things.

I should have broken it down a little more, sorry.

What are the similarities and differences / advantages and disadvantages of your algorithm vs the previously implemented algorithm?

Is this about bc_standard_div (or bc_fast_div)?
It is similar to the previous implementation in that it is a using restoring division algorithm. The difference is that the previous implementation calculated one digit at a time, whereas the new implementation calculates BC_VECTOR_SIZE digits at a time.

Due to the change in radix of calculation, the method used to calculate the "temporary quotient" is significantly different from the previous implementation.

@nielsdos
Copy link
Member

nielsdos commented Jul 5, 2024

Thanks for the changes, I'll have a look again this weekend.

What are the similarities and differences / advantages and disadvantages of your algorithm vs the previously implemented algorithm?

Is this about bc_standard_div (or bc_fast_div)? It is similar to the previous implementation in that it is a using restoring division algorithm. The difference is that the previous implementation calculated one digit at a time, whereas the new implementation calculates BC_VECTOR_SIZE digits at a time.

Due to the change in radix of calculation, the method used to calculate the "temporary quotient" is significantly different from the previous implementation.

I see. The reason I asked the question was because I didn't fully grasp that you were using the restoring division technique, as I didn't manage to get that far because of the issues with the expressions in the comments. Thanks for clarifying.

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to understand most of the derivations but still have some questions about it.

* Since B = 10^b, the formula becomes:
* MAX_E = (n2_high * B * (B^k - 1)) / (n2_high * (n2_high * B^k + B^k - 1))
* MAX_E = (10^x * 10^b * ((10^b)^k - 1)) / (10^x * (10^x * (10^b)^k + (10^b)^k - 1))
* MAX_E = (10^(x + b) * (10^(k * b) - 1)) / (10^(2x + k * b) + 10^(x + k * b) + 10^x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign error?

Suggested change
* MAX_E = (10^(x + b) * (10^(k * b) - 1)) / (10^(2x + k * b) + 10^(x + k * b) + 10^x)
* MAX_E = (10^(x + b) * (10^(k * b) - 1)) / (10^(2x + k * b) + 10^(x + k * b) - 10^x)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't immediately see how you go from this line (145) to line 146

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I made a pretty elementary mistake. Fixed the approximation procedure.

fa0b577

*
* Approximate again and remove +1 in the denominator
* MAX_E < 10^(x + k + 2b) / (10^x * (10^(x + k * b) + 1))
* MAX_E < 10^(x + k + 2b) / (10^x * 10^(x + k * b))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you go from this line to line 157?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

n1_calc_bottom[j] -= sub_low;
} else {
n1_calc_bottom[j] += BC_VECTOR_BOUNDARY_NUM - sub_low;
borrow++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the borrow never reset? I think I'm missing something here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value is always overwritten with borrow = sub / BC_VECTOR_BOUNDARY_NUM in L209, so it's okay.

char *qptr = (*quot)->n_value;
char *qend = qptr + quot_len - 1;

i = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useless line? The for loop initializes this to zero in the next line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, thx. I fixed it
611fbb7

@SakiTakamachi
Copy link
Member Author

@nielsdos
Fixed comments and removed unnecessary lines. I apologize for making so many beginner's mistakes.

@SakiTakamachi
Copy link
Member Author

SakiTakamachi commented Jul 16, 2024

Since the property hook was merged, I merged master just to be sure.

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is right, or at least I don't see anything wrong and some stress testing doesn't reveal issues.
It would still be good if someone else had a look as well.

SakiTakamachi and others added 4 commits July 18, 2024 12:12
Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still working through it

Comment on lines 70 to 72
/*
* Used when the divisor is split into 2 or more chunks.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
/*
* Used when the divisor is split into 2 or more chunks.
*/
/* Used when the divisor is split into 2 or more chunks. */

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed
859a9db

Comment on lines 98 to 99
* If calculate the temporary quotient using only one array of numerator and divisor, the error E can be larger than 1.
* In other words, in the restoring division, the count of additions for restore increases significantly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not exactly sure what you mean by

using only one array for numerator and divisor

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other thing, which I would like to know, is how much of an impact in performance do the repeated restores actual cause.

Because if it is negligible, I would just not bother, it would also make the initial implementation easier to review.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not exactly sure what you mean by

This means that an error will occur if calculations are made using only the high parts.

The other thing, which I would like to know, is how much of an impact in performance do the repeated restores actual cause.

Because if it is negligible, I would just not bother, it would also make the initial implementation easier to review.

I'll try measuring it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* If calculate the temporary quotient using only one array of numerator and divisor, the error E can be larger than 1.
* In other words, in the restoring division, the count of additions for restore increases significantly.
* Because we use the restoring division we need to perform E restorations, which can be significant if E is large.

Feels like it conveys the same meaning, just way more succinctly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Girgias
Measurements were made under conditions that would most likely make a difference.

code:

$num1 = '90000000000000000000000000000000000000000000000000000000000000';
$num2 = '109999999999999999999999999999999999999999';

$start = microtime(true);
for ($i = 0; $i < 500; $i++) {
    bcdiv($num1, $num2, 0);
}
echo microtime(true) - $start . PHP_EOL;

Multiple restores: 13.932011127472s
This PR: 0.00018405914306641s

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the comment
72f7f3a

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Girgias Measurements were made under conditions that would most likely make a difference.

code:

$num1 = '90000000000000000000000000000000000000000000000000000000000000';
$num2 = '109999999999999999999999999999999999999999';

$start = microtime(true);
for ($i = 0; $i < 500; $i++) {
    bcdiv($num1, $num2, 0);
}
echo microtime(true) - $start . PHP_EOL;

Multiple restores: 13.932011127472s This PR: 0.00018405914306641s

Do you have the implementation without a restore in a branch somewhere just so that I can have a look?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Girgias
Made a PR in my repository :)
SakiTakamachi#10

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More comments, I think I'm happy with the algorithm somewhat

Comment on lines +198 to +199
/* For sub, add the remainder to the high-order digit */
numerator_vectors[numerator_top_index - i] += div_carry * BC_VECTOR_BOUNDARY_NUM;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "for sub" here is throwing me off, what are you trying to say precisely?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numerator_high_part is only used to calculate the quotient, and numerator_vectors does not contain div_carry here yet.

After this, add div_carry to numerator_vectors for numerator_vectors - (divisor_vectors * quot_guess).

Comment on lines 253 to 254
static void bc_do_div(const char *n1, size_t n1_readable_len, size_t n1_bottom_extension, const char *n2, size_t n2_len, bc_num *quot, size_t quot_len)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to rename n1 and n2 here as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed
ffb6e8e

Comment on lines 351 to 359
/* remove n2 trailing zeros */
while (*n2end == 0 && n2_scale > 0) {
n2end--;
n2_scale--;
}
while (*n2end == 0) {
n2end--;
n2_int_right_zeros++;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having this as a private function that is also use by _bc_rm_leading_zeros() would make this less strange, especially as it used a couple of times.

Please have this in a separate commit

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop continuation conditions for _bc_rm_leading_zeros() are as follows.

*num->n_value == 0 && num->n_len > 1

Meanwhile, the conditions for the loop that removes leading_zeros in here:

*n1ptr == 0 && n1_len > 0

Combining it with _bc_rm_leading_zeros() may be a bit more complicated, as the conditions are slightly different.

@nielsdos
Copy link
Member

I think the changes made an improvement in the understandability. Looks good.

SakiTakamachi and others added 5 commits July 20, 2024 18:15
Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More comments, but also I really would like to see what the difference in performance is between doing all the work for having a single restoration, or just having multiple restorations.

Comment on lines 98 to 99
* If calculate the temporary quotient using only one array of numerator and divisor, the error E can be larger than 1.
* In other words, in the restoring division, the count of additions for restore increases significantly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* If calculate the temporary quotient using only one array of numerator and divisor, the error E can be larger than 1.
* In other words, in the restoring division, the count of additions for restore increases significantly.
* Because we use the restoring division we need to perform E restorations, which can be significant if E is large.

Feels like it conveys the same meaning, just way more succinctly

Comment on lines +112 to +113
* numerator_arr = [0, 5678, 1234]
* divisor_arr = [6789, 2345, 1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the split for the divisor favouring the low digits whereas the numerator favours the high digits?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the numerator favours the high digits

No, this isn't actually the case. Written in an easy-to-understand manner, it looks like this:

numerator_arr = [0000, 5678, 1234]

SakiTakamachi and others added 4 commits July 22, 2024 11:06
Co-authored-by: Gina Peter Banyard <[email protected]>
Co-authored-by: Gina Peter Banyard <[email protected]>
Co-authored-by: Gina Peter Banyard <[email protected]>
Co-authored-by: Gina Peter Banyard <[email protected]>
@SakiTakamachi
Copy link
Member Author

@Girgias
Fixed comments and took measurements for comparison (#14660 (comment)).

@SakiTakamachi
Copy link
Member Author

@Girgias

Do you have any other concerns about this?

@SakiTakamachi SakiTakamachi merged commit 8c704ab into php:master Aug 18, 2024
11 checks passed
@SakiTakamachi SakiTakamachi deleted the refactor_bcdiv branch August 18, 2024 09:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants