No. : 35 of 100
From : Stefan Monnier 2:5075/128 25 Sep 23 14:11:54
To : BGB 25 Sep 23 21:17:03
Subject : Re: Solving the Floating-Point Conundrum
----------------------------------------------------------------------------------
@MSGID: <jwvil7yces1.fsf-monnier+comp.arch@gnu.org> beab195e
@REPLY: 2@dont-email.me> 115839a6
@REPLYADDR Stefan Monnier <monnier@iro.umontreal.ca>
@REPLYTO 2:5075/128 Stefan Monnier
@CHRS: CP866 2
@RFC: 1 0
@RFC-Message-ID: <jwvil7yces1.fsf-monnier+comp.arch@gnu.org>
@TZUTC: -0400
@PID: Gnus/5.13 (Gnus v5.13)
@TID: FIDOGATE-5.12-ge4e8b94
> I am now evaluating the possible use of a 48-bit floating-point format, but
> this is (merely) in terms of memory storage (in registers, it will still use
> Binary64).
I suspect this is indeed the only sane way to go about it.
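The quoted post does not say how the 48 bits are laid out. A minimal sketch, assuming (hypothetically) that the 48-bit storage format is simply the top 48 bits of an IEEE 754 binary64 (1 sign bit, 11 exponent bits, 36 fraction bits). Widening to a register is then a 16-bit shift, and narrowing is a round-to-nearest-even on the 16 dropped bits. The names `fp48_from_double` and `fp48_to_double` are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 48-bit storage format: the upper 48 bits of a binary64
   (1 sign, 11 exponent, 36 fraction bits), held in the low 48 bits of a
   uint64_t. Registers keep using ordinary binary64. */

/* Narrow a binary64 to the 48-bit format, rounding to nearest, ties to even. */
static uint64_t fp48_from_double(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    uint64_t hi = bits >> 16;        /* the 48 bits we keep */
    uint64_t low = bits & 0xFFFFu;   /* the 16 fraction bits we drop */
    if (((bits >> 52) & 0x7FFu) == 0x7FFu) {
        /* Inf or NaN: truncate, and force the quiet bit (fraction MSB,
           bit 35 after the shift) so a NaN cannot collapse into Inf. */
        if (bits & 0xFFFFFFFFFFFFFull)
            hi |= 1ull << 35;
        return hi;
    }
    /* A carry out of the fraction bumps the exponent, which is the correct
       rounding result (the largest finite values round up to Inf). */
    if (low > 0x8000u || (low == 0x8000u && (hi & 1u)))
        hi += 1;
    return hi;
}

/* Widen the 48-bit format back to binary64: exact, just re-append zeros. */
static double fp48_to_double(uint64_t x) {
    uint64_t bits = x << 16;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

With this layout only loads and stores change; arithmetic stays plain binary64, as in the quoted plan. The exponent range is unchanged, so narrowing only costs precision (37 significant bits, roughly 11 decimal digits).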
Also, I suspect that such 48-bit floats would only be worthwhile when you
have large vectors/matrices and care about the 33% bandwidth overhead of
storing 64-bit rather than 48-bit values. So maybe the focus should be
on "load 3 chunks, then turn them into 4", since the limiting factor
would presumably be memory bandwidth.
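The 33% figure and the "3 into 4" ratio come from the same arithmetic: binary64 moves 4/3 as many bytes per element as a 48-bit format, and three 64-bit words carry exactly four 48-bit values.

```latex
\frac{64\ \text{bits}}{48\ \text{bits}} = \frac{4}{3} \approx 1.33,
\qquad
3 \times 64\ \text{bits} = 192\ \text{bits} = 4 \times 48\ \text{bits}
```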
E.g. load 3 chunks (C1, C2, and C3) of 256 bits each using a standard SIMD
load, and then add an instruction that turns C1 plus the low half of C2
(384 bits, i.e. eight 48-bit floats) into two 256-bit vectors of 4x64-bit
floats, and another that does the same with the high half of C2 plus C3
(basically the same instruction, except it uses the other half of C2's bits).
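A scalar C emulation of the two proposed instructions, offered only as a sketch. `vec256`, `vec4d`, `widen48_lo` and `widen48_hi` are hypothetical names, and the code assumes a hypothetical 48-bit layout (the top 48 bits of a binary64) stored in memory as packed little-endian 6-byte values.

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[32]; } vec256;  /* stand-in for a 256-bit SIMD register */
typedef struct { double d[4]; } vec4d;     /* 4 x binary64 in a 256-bit register */

/* Expand 48 contiguous bytes (eight little-endian 48-bit floats, assumed to be
   the top 48 bits of binary64) into two vectors of four doubles. */
static void expand8(const uint8_t src[48], vec4d *out_lo, vec4d *out_hi) {
    for (int i = 0; i < 8; i++) {
        uint64_t v = 0;
        for (int k = 0; k < 6; k++)
            v |= (uint64_t)src[6 * i + k] << (8 * k);
        uint64_t bits = v << 16;  /* re-append the 16 dropped fraction bits as zeros */
        double d;
        memcpy(&d, &bits, sizeof d);
        (i < 4 ? out_lo : out_hi)->d[i % 4] = d;
    }
}

/* Hypothetical instruction #1: all of C1 plus the low 128 bits of C2. */
static void widen48_lo(vec256 c1, vec256 c2, vec4d *lo, vec4d *hi) {
    uint8_t tmp[48];
    memcpy(tmp, c1.b, 32);
    memcpy(tmp + 32, c2.b, 16);
    expand8(tmp, lo, hi);
}

/* Hypothetical instruction #2: the high 128 bits of C2 plus all of C3. */
static void widen48_hi(vec256 c2, vec256 c3, vec4d *lo, vec4d *hi) {
    uint8_t tmp[48];
    memcpy(tmp, c2.b + 16, 16);
    memcpy(tmp + 16, c3.b, 32);
    expand8(tmp, lo, hi);
}
```

C2 is loaded once and consumed by both instructions, so three 256-bit loads feed four 256-bit binary64 vectors (16 values), which is the 3-to-4 ratio described above.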
Stefan
--- Gnus/5.13 (Gnus v5.13)
* Origin: A noiseless patient Spider (2:5075/128)