https://github.com/dalek-cryptography/curve25519-dalek/blob/master/src/backend/serial/u64/scalar.rs 中的montgomery_reduce()中的算法细节和标准的算法细节有所差异,进一步减少了内部算法的位数要求。
1. montgomery_reduction标准算法及举例
INPUT: integers m = (mn-1…m1m0)b with gcd(m,b) = 1, R= bn, m’ = -m-1 mod b, and T = (t2n-1…t1t0)b < mR.
OUTPUT: TR-1 mod m.
1. A ← T. (Notation: A = (a2n-1…a1a0)b)
2. For i from 0 to (n-1) do the following:
2.1 ui ← aim’ mod b.
2.2 A ← A + uimbi.
3. A ← A/bn.
4. if A >= m then A ← A - m.
5. Return(A).
注意,其中的m’与m 的模运算关系是b,而不是R,其中R=bn。
举例:
假设 m = 72639, b = 10, R = 105, 且 T = 7118368.
则有 n = 5,m’ = −m−1 mod 10 = 1, T mod m = 72385, 且 TR−1 mod m = 39796.
对应的上述算法执行细节为:
| i | ui=aim’ mod b | uimbi | A |
|---|---|---|---|
| - | - | - | 7118368 |
| 0 | 8 | 581112 | 7699480 |
| 1 | 8 | 5811120 | 13510600 |
| 2 | 6 | 43583400 | 57094000 |
| 3 | 4 | 290556000 | 347650000 |
| 4 | 5 | 3631950000 | 3979600000 |
A/bn mod m =39796
与预期相符。
2. montgomery_reduction改进算法
在上述算法中,在step2.2中,用于存储数据A的位数会膨胀。
基于以下事实:
m = (mn-1…m1m0)b = mn-1bn-1 + … + m1b1 + m0
T = (t2n-1…t1t0)b = t2n-1b2n-1 + … + t1b1 + t0
A = (a2n-1…a1a0)b = a2n-1b2n-1 + … + a1b1 + a0
而 y*bx mod b = 0, step2.1/2.2/3均可优化!
仍然基于上述算法进行如下改进可知:
INPUT: integers m = (mn-1…m1m0)b with gcd(m,b) = 1, R= bn, m’ = -m-1 mod b, and T = (t2n-1…t1t0)b < mR.
OUTPUT: TR-1 mod m.
1. A ← T. (Notation: A = (a2n-1…a1a0)b)
2. carry ← 0.
3. For i from 0 to (n-1) do the following:
3.1
a
i
=
c
a
r
r
y
+
t
i
+
∑
k
=
0
i
−
1
,
p
=
i
−
k
,
0
=
<
p
<
n
u
k
∗
m
p
\ a_i = carry + t_i + \sum_{k=0}^{i-1, p=i-k, 0 =< p < n}{u_k*m_p}
ai=carry+ti+k=0∑i−1,p=i−k,0=
