Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arb mulhigh #1802

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from
Draft

Arb mulhigh #1802

wants to merge 5 commits into from

Conversation

albinahlback
Copy link
Collaborator

I don't think we should have different precisions on different systems (currently flint_mpn_mulhigh is only available on some x86-64 systems), but this is my very preliminary draft on how I want arb_mul to look like.

Left to fix:

  • Make flint_mpn_mulhigh available on all systems.
  • Implement _arb_mul_special
  • Make sure that the mag is has the correct bound -- arf_mul_rnd_sloppy rounds input in order to be able to perform a n-by-n high multiplication. So one has to account for rounding before the multiplication is done as well as the fact that the multiplication is inexact.
  • Discuss if we should account for the precision for the normal case (i.e. when arf_mul_rnd_sloppy is used) -- should we round the result or not?

@fredrik-johansson
Copy link
Collaborator

Yeah, I wouldn't do this before having a generic C fallback for and fixing the performance issues for flint_mpn_mulhigh / flint_mpn_mulhigh_normalised, so that these functions can be relied upon without conditional directives creeping into higher level code.

It will certainly not work to just take the shorter of the input lengths as the precision in arf_mul_rnd_sloppy. In general, the inputs can be shorter or longer than the precision, or a mixture. A bit of logic is needed for this. It will even be optimal to zero-pad operands in some cases.

Note that it is also necessary to normalise the arf_t output by removing trailing zero limbs.

@fredrik-johansson
Copy link
Collaborator

I think this work would be aided by having a general flint_mpn_mulhigh which computes the top n (+ 1) limbs of an m x p product, m + p >= n (+ 1).

Such a method would be a bit complex, but it would offload complexity from the arf/arb side.

@albinahlback
Copy link
Collaborator Author

albinahlback commented Feb 28, 2024

In general, isn't it true that an arb_t is on the form $x = a 2^{m} \pm b 2^{n}$ with $a, b, m, n \in \mathbb{Z}$ and $|b 2^{n}| > 2^{m}$? Or am I thinking wrong that only calculating an balanced high multiplication for unbalanced multiplicants will not give a so-much-bigger error bound?

@tthsqe12
Copy link
Contributor

mulhi? I agree arbs would like this, but Newton's method likes mulmid more.
I have been experimenting with the function

ulong mpn_mulmid_approx(ulong* x, ulong xlo, ulong xhi, const ulong* a, ulong an, const ulong* b, ulong bn);

which requires space for all an+bn limbs in x but is only responsible for writing [xlo,xhi). If the return is e, then the true middle limbs lie somewhere inclusively between the produced limbs and the produced limbs + e. Usually the return is 0.

@fredrik-johansson
Copy link
Collaborator

mulhi? I agree arbs would like this, but Newton's method likes mulmid more. I have been experimenting with the function

ulong mpn_mulmid_approx(ulong* x, ulong xlo, ulong xhi, const ulong* a, ulong an, const ulong* b, ulong bn);

which requires space for all an+bn limbs in x but is only responsible for writing [xlo,xhi). If the return is e, then the true middle limbs lie somewhere inclusively between the produced limbs and the produced limbs + e. Usually the return is 0.

Yep, that should improve the division and square root code quite a bit.

mulhigh is already doing good things for nfloat though. I think I want to iron out all the basic algorithms there before trying to improve arf/arb.

@albinahlback
Copy link
Collaborator Author

mulhi? I agree arbs would like this, but Newton's method likes mulmid more. I have been experimenting with the function

ulong mpn_mulmid_approx(ulong* x, ulong xlo, ulong xhi, const ulong* a, ulong an, const ulong* b, ulong bn);

which requires space for all an+bn limbs in x but is only responsible for writing [xlo,xhi). If the return is e, then the true middle limbs lie somewhere inclusively between the produced limbs and the produced limbs + e. Usually the return is 0.

Depends on what Newton's method you are talking about. I believe that precomputed inverses and reciprocal square roots via Newton's method really favors mulhi.

@tthsqe12
Copy link
Contributor

mulhi is a special case of mulmid, and some of your mulhis for inversion don't need all of the high bits. It is even in the TODO.md:

nmod_poly

  • Implement fast mulmid and use to improve Newton iteration

fmpzs are just nmod_polys with carries, so the same principles apply.

@fredrik-johansson
Copy link
Collaborator

You can see this in

/* should be mulmid */
where the low m coefficients of the mullow output are never used.

(Of course high/low Newton divison are the same but reversed.)

@tthsqe12
Copy link
Contributor

tthsqe12 commented Jul 16, 2024

For beyond 20M bits, I am seeing a ~30% improvement in arb_inv with this mulmid over what is currently in flint. (Still have to tune it in the medium range.)

@fredrik-johansson
Copy link
Collaborator

For beyond 20M bits, I am seeing a ~30% improvement in arb_inv with this mulmid over what is currently in flint. (Still have to tune it in the medium range.)

That sounds excellent.

Of course, the current arb code is a placeholder implementation using arf arithmetic. For medium precision, having everything in mpn form should be a bit better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants