@REPLYADDR Terje Mathisen
<terje.mathisen@tmsw.no>
@REPLYTO 2:5075/128 Terje Mathisen
robf...@gmail.com wrote:
> I have many more than 3 cycles for an iteration. An FMA takes
> 8 cycles and there are multiple per iteration.
> However, I should have looked at my micro-code more closely.
> There is indeed no difference in between calculating out to
> 64 bit or 48 bits because of the number of bits reached in
> each iteration.
>
> To get 48 bits an iteration faster would require a much more
> accurate initial approximation which probably is not practical.
> // RSQRT initial approximation 0
> // y = y*(1.5f - xhalf *y*y); // first NR iteration 9.16 bits accurate
What if I told you that you can get up to 1.7 more bits after the first
NR iteration? Use a slightly different magic number in the bit hack, and
also modify the two constants in that first NR step, i.e. not exactly
1.5 and 0.5 but values tuned to give a Chebyshev-style error
distribution over the (0.5 to 2.0) input range.
The result is about 10.8 bits!
> // y = y*(1.5f - xhalf *y*y); // second NR iteration 17.69 bits accurate
~19 bits
> // y = y*(1.5f - xhalf *y*y); // third NR iteration 35 bits accurate
~38 bits
> // y = y*(1.5f - xhalf *y*y); // fourth NR iteration 70 bits accurate
~75 bits
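The way the bit counts grow per iteration can be sanity-checked with a small model. This is only a sketch under an idealized assumption: an unmodified NR step for 1/sqrt(x) turns a relative error e into about 1.5*e*e, and rounding in the arithmetic is ignored, so real results come out somewhat lower. The function names are hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Idealized model of one unmodified NR step for 1/sqrt(x): a relative
 * error e becomes about 1.5*e*e, so b good bits become 2*b - log2(1.5).
 * Rounding in the arithmetic itself is ignored. */
static double bits_after_nr_step(double bits)
{
    return 2.0 * bits - log2(1.5);
}

/* Apply the model for a number of further steps. */
static double bits_after_nr_steps(double bits, int steps)
{
    assert(steps >= 0);
    for (int i = 0; i < steps; i++)
        bits = bits_after_nr_step(bits);
    return bits;
}
```

Starting from 9.16 bits, `bits_after_nr_steps(9.16, 3)` gives roughly 69 bits, close to the 17.69 / 35 / 70 progression in the quoted comments. Feeding in a better starting figure shows how a gain after the first iteration roughly doubles with every later step; treat the model's output as an optimistic bound.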
BTW, I independently came up with the idea of modifying multiple constants
and got more than a bit extra, then somebody tipped me off about a guy
from Poland who had done the full optimization of all three constants at
the same time and gotten half a bit more than me. :-)
Terje
--
-
"almost all programming can be viewed as an exercise in caching"