Modular arithmetic in Montgomery form

Question

I am trying to apply an algorithm that requires significant modular arithmetic, but in such a way as to avoid as many costly mod or division operations as possible.

I found the Montgomery multiplication method and its form of integer residues (via wikipedia and thus the original paper), but I am having difficulty understanding how to apply it practically.

Using the example in the wikipedia writeup, with $n = 17$ and an $R^{-1} = 8$, I can do very simple arithmetic with an implementation of redc that matches the expected value $mod\ n$:

$ (7 * 15) `mod` n
3
$ redc (3 * 4 * r')
3
$ (7 + 15) `mod` n
5
$ redc (3 + 4)
5

However, while the original paper and the wikipedia page say explicitly that most arithmetic operations can be applied over residues as "normally used", I'm not seeing that borne out:

$ (7 * 15 * 15) `mod` n
11
$ redc (3 * 4 * 4 * r')
12

(I know this might be on the edge of this being a stackoverflow question, but I suspect I'm suffering from a formal misunderstanding here, rather than a simple programming error.)

I already read three different sources of this thing, and I haven't yet found one single interesting, worked-out example...Interesting. The Wikipedia article also mentions that using this Montgomery form can make multiplication harder...so I really don't understand what's this method's great advantage. Perhaps some other uses... — DonAntonio, Mar 19 '18 at 14:38
Incidentally, "use an existing library" might be better advice than how to fix your implementation. Also, if you're writing in an interpreted language, you are unlikely to have any hope of outperforming your language's native mod operation. I'd even be somewhat skeptical of your chances in a language that gets compiled to an intermediate bytecode. — , Mar 19 '18 at 15:24
@Hurkyl The implementation is destined for an FPGA, so hopefully that explains why "use a library" isn't really pertinent. — Diego Maü, Mar 19 '18 at 15:28

score 1 · Answer 1 · 2018-03-19T15:17:16.370

$\def\redc{\mathrm{redc}}$ Throughout this post, the parameters $N$, $R$, and $R'$ will be implicit in function calls.

What is the operation?

The arithmetic primitive $\redc$ is meant to perform the operation:

$\redc(x) \equiv x R^{-1} \pmod N$. Additionally, if $x$ is large, then $\redc(x)$ will be smaller than $x$.

It's primary advantage over a more direct modular reduction algorithm is that the only divisions this algorithm does is division by $R$. When $R$ is a power of 2, this can be especially convenient for computer implementation. $R = 2^{32}$ and $R=2^{64}$ are popular choices, since the corresponding division operations are essentially free due to how the processor stores and manipulates integers in registers.

Using it for modular multiplication

Using $\redc$ for reductions can be awkward since you have to account for that extra factor of $R^{-1}$ in your calculations. "Montgomery form" is a clever trick that makes $\redc$ more convenient to use. The relevant point is that you have an algebraic identity

$$ \redc(xR \cdot yR) \equiv xyR \pmod N $$

In an actual program, the trick is that you don't bother keeping around the values $x \bmod N$ and $y \bmod N$ — you only store the values (equivalent to) $xR \bmod N$ and $yR \bmod N$. Every time you multiply two values, you accumulate an extra factor of $R$ in the product, which is cancelled out when you invoke $\redc$, so the bookkeeping is nearly trivial.

Note that when you multiply three values, you have to invoke $\redc$ twice. The reason for this is three-fold.

First of all, you need to invoke redc twice to keep the bookkeeping lined up; if you only invoke it once your result has an extra factor of $R$:

$$\begin{align} \redc(xR \cdot yR \cdot zR) &\equiv xyzR^2 &\pmod N \\ \redc(\redc(xR \cdot yR) \cdot zR) &\equiv xyzR &\pmod N \end{align}$$

The second reason is that larger numbers require more work to multiply, since multiprecision integer operations get more expensive as the numbers get larger. It's faster to insert an extra reduction step in the middle to keep the operand sizes down. (If you're doing single precision calculations, you still need to keep operand sizes down in order to avoid overflow!)

The third reason is the amount of reduction that $\redc$ does. With typical parameters, the product of three numbers will be too large for $\redc$ to fully reduce in one invocation — you need to invoke it twice to get the result to be about the same size as $N$.

(Not) using it for modular addition

You don't want to invoke $\redc$ for additions. Addition doesn't make numbers much larger, so its faster just to test if you've exceeded $N$ and subtract $N$ if you have. (Actually, replacing $N$ with a small multiple like $2N$ or $4N$ is often more convenient)

Furthermore, check the algebra: $$\begin{align} \redc(xR + yR) &\equiv x + y &\pmod N \\ xR + yR &\equiv (x + y)R &\pmod N \end{align}$$ If you were planning on storing the result of addition in Montgomery form, then the latter identity is the one you want to implement!

Consider Barrett reduction as an alternative

A complementary algorithm to $\redc$ is that of Barrett reduction, which is based on essentially the same idea¹ as $\redc$. Barrett reduction has the advantage that there is no extra factors in the result; the output is precisely what you want: $$ \mathrm{BarrettReduction}(x) \equiv x \pmod N $$ Thus, using it efficiently doesn't require any of this business of alternate forms.

Unfortunately, it tends to be slower than $\redc$.

_{1: Montgomery reduction and Barrett reduction are performing the same series of steps, but in different arithmetic contexts. Barrett reduction works in the real numbers, whereas Montgomery reduction works in the $2$-adic numbers.}

Thank you for the comprehensive answer. I'm about to read it properly, but just a note that I am aware of Barrett reduction, but it appears to be impractical for the parameters I need when implemented for FPGAs, per Deschamps & Sutter: http://www.scielo.org.ar/pdf/laar/v37n1/v37n1a17.pdf My fallback if Montgomery (or some other residue system) doesn't work out is probably the reduction $mod\ 2^k-a$ approach described there. — Diego Maü, Mar 19 '18 at 15:33
Okay, so your points re: the $R$ factors make sense, and indeed, ensuring that each multiplication operation is paired with a $redc$ does work. What I'm confused by now though is: how can it be that I can use $redc(xR⋅yR)$ as a multiplicand of $zR$? This appears to be mixing terms in Montgomery form ($zR$) and not (any output of $redc$), which seems to be counter to the ground rules of working with residues in general? — Diego Maü, Mar 19 '18 at 16:51
Re: modular addition, it seems essential insofar as the algorithm in question involves solving a series of polynomials (hundreds of times for a single iteration). My hope was to be able to convert the inputs into Montgomery form once, do the necessary arithmetic, and then convert back to my normal basis at the end. Using Montgomery residues just for the multiplicative portions of the polynomials would be very inefficient, since I'd need to go back to Montgomery form (via $aR\ mod\ N$) on each iteration. — Diego Maü, Mar 19 '18 at 16:57

Modular arithmetic in Montgomery form

1 Answers1

What is the operation?

Using it for modular multiplication

(Not) using it for modular addition

Consider Barrett reduction as an alternative