From : BGB 2:5075/128 29 Sep 23 21:58:24

To : MitchAlsup 29 Sep 23 06:01:01

Subject : Re: Misc: Another (possible) way to more MHz...

----------------------------------------------------------------------------------

On 9/29/2023 6:58 PM, MitchAlsup wrote:
> On Friday, September 29, 2023 at 12:07:47 PM UTC-5, BGB wrote:
>> On 9/29/2023 11:02 AM, EricP wrote:
>>>
>>>
>> For running stats from a running full simulation (predates these
>> tweaks, running GLQuake with the HW rasterizer):
>> ~ 0.48 .. 0.54 bundles/clock;
>> ~ 1.10 .. 1.40 instructions/bundle.
> <
> So, about equal to the 1-wide 1st generation RISC machines, which got
> 0.7 I/C {including cache misses, delay slots, interlocks, TLB misses.}
>>
>> Seems to be averaging around 29..32 MIPs at 50MHz (so, ~ 0.604 MIPs/MHz).
>>
> Probably good for a 1-wide, not so good for a 3-wide.

Plain 1-wide operation is still a little worse here...

As noted, in practice, it is only averaging around 1.1 to 1.4
instructions per bundle, which (in general) means it is mostly running
1-wide code with the occasional WEX'ed instruction glued on.
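
As a rough sanity check on the figures quoted above, the two measured
ranges can be multiplied together to get instructions per clock, and then
scaled by the 50 MHz clock. This is only a back-of-envelope sketch: the
function names are made up for illustration, and multiplying the range
endpoints assumes the low and high ends of both ranges can coincide, which
the quoted stats do not actually say.

```python
def ipc(bundles_per_clock, instrs_per_bundle):
    """Instructions per clock = bundles per clock * instructions per bundle."""
    return bundles_per_clock * instrs_per_bundle


def mips(ipc_value, clock_mhz):
    """Millions of instructions per second at a clock given in MHz."""
    return ipc_value * clock_mhz


CLOCK_MHZ = 50.0

# Endpoints of the quoted ranges (assumed, not measured, to coincide).
low = ipc(0.48, 1.10)
high = ipc(0.54, 1.40)

print(f"IPC range:  {low:.3f} .. {high:.3f}")
print(f"MIPS range: {mips(low, CLOCK_MHZ):.1f} .. {mips(high, CLOCK_MHZ):.1f}")
print(f"Reported:   {29 / CLOCK_MHZ:.2f} .. {32 / CLOCK_MHZ:.2f} MIPS/MHz")
```

The product comes out at roughly 0.53 to 0.76 instructions per clock, or
about 26 to 38 MIPS at 50 MHz, which brackets the reported 29..32 MIPS.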

It is generally easier to make more effective use of 2 or 3 wide
operation in ASM, but not so much from my C compiler output (unless the
C code is written in a way that allows the compiler to make more
effective use of it).
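
To illustrate why compiler output tends to land near 1 instruction per
bundle, here is a toy greedy bundler. It is a hypothetical model only: it
is not the actual WEX encoding rules, nor how my compiler schedules code,
and the `Instr` type and register names are invented for the example. The
only rule modeled is that an instruction cannot join a bundle if it reads
or writes a register written earlier in that same bundle.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read


def bundle(instrs: List[Instr], width: int = 3) -> List[List[Instr]]:
    """Greedily pack consecutive instructions into bundles of up to `width`."""
    bundles: List[List[Instr]] = []
    current: List[Instr] = []
    written = set()
    for ins in instrs:
        # Conflict: reads or rewrites a register produced in this bundle.
        conflict = ins.dest in written or any(s in written for s in ins.srcs)
        if current and (len(current) >= width or conflict):
            bundles.append(current)
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        bundles.append(current)
    return bundles


def instrs_per_bundle(bundles: List[List[Instr]]) -> float:
    return sum(len(b) for b in bundles) / len(bundles)


# A pure dependency chain: every instruction needs the previous result.
chain = [Instr("r1", ("r0",)), Instr("r2", ("r1",)),
         Instr("r3", ("r2",)), Instr("r4", ("r3",))]

# Fully independent operations on the same input.
independent = [Instr("r1", ("r0",)), Instr("r2", ("r0",)),
               Instr("r3", ("r0",)), Instr("r4", ("r0",))]

# Mostly serial code with one independent instruction mixed in.
mostly_serial = [Instr("r1", ("r0",)), Instr("r2", ("r1",)),
                 Instr("r3", ("r2",)), Instr("r5", ("r0",)),
                 Instr("r4", ("r3",))]

for name, code in [("chain", chain), ("independent", independent),
                   ("mostly_serial", mostly_serial)]:
    print(name, instrs_per_bundle(bundle(code)))
```

In this toy model the dependency chain gets 1.0 instructions per bundle,
the independent sequence gets 2.0, and the mostly-serial sequence gets
1.25, which is the sort of figure seen above.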

The 3rd lane is infrequently used in practice, so mostly just exists as
an excuse to have a 6R3W register file (mostly useful for 128-bit SIMD
ops and similar).

For a prospective "GPU Mode" profile, I had considered dropping to 2 lanes
with a 6R2W register file, but the savings from dropping the 3rd lane
were fairly small (as-is, the 3rd lane only really does CONV and ALU
ops, and can optionally do integer shift ops as well). I had also
considered making this profile XG2-only.

Couldn't get it cheap enough to be worthwhile, and have since
(eventually) ended up writing a fixed-function rasterizer module instead
(which gets better performance and only needs around 1/6 as many LUTs as
a CPU core).

Granted, a fixed-function module can't run fragment shaders (vertex
shaders could in theory still work, as that part still runs on the CPU;
fragment shaders might also use the CPU, if I ever get around to a GLSL
compiler).

...
