COMP.ARCH

From : Terje Mathisen 2:5075/128 23 Sep 23 09:03:18

To : Robf 23 Sep 23 10:05:02

Subject : Re: Solving the Floating-Point Conundrum

----------------------------------------------------------------------------------

robf...@gmail.com wrote:
> On Friday, September 22, 2023 at 10:26:38 AM UTC-4, MitchAlsup wrote:
>> One builds FP calculation resources as big as longest container needed at full throughput.
>> In a 64-bit machine, this is one with a 11-bit exponent and a 52-bit fraction.
>> On such a machine, the latency is set by the calculations on this sized number.
>> AND
>> Smaller width numbers do not save any cycles.
>> <
>> So, the only advantage one has with 48-bit, ... numbers is memory footprint.
>> There is NO (nada, zero, zilch) advantage in calculation latency.
>> <
> Does that include complicated calculations too? What about trig
> functions, square root, or other iterative functions? As I have
> implemented reciprocal square root in micro-code it takes longer for
> greater precision. Makes me think there is some benefit to supporting
> varying precisions.
This is easy to verify: look up the latency of both the 32-bit and the
64-bit versions of the function you are interested in!

If they differ by less than 25%, then an intermediate precision really
doesn't make sense as a HW op.

In software you can more easily play tricks like the infamous InvSqrt()
function of Quake III fame, where just 10 bits were sufficient to make
the lighting calculations look OK. Today you can do the same using the
approximate reciprocal square root vector op, and simply skip all the
Newton-Raphson (NR) refinement stages that would normally follow.

Terje

--
-
"almost all programming can be viewed as an exercise in caching"