----------------------------------------------------------------------------------
@MSGID: <2023Sep29.095547@mips.complang.tuwien.ac.at> 13549f04
@REPLY: <8734z3auj4.fsf@localhost> 4af9d243
@REPLYADDR Anton Ertl <anton@mips.complang.tuwien.ac.at>
@REPLYTO 2:5075/128 Anton Ertl
@CHRS: CP866 2
@RFC: 1 0
@RFC-Message-ID: <2023Sep29.095547@mips.complang.tuwien.ac.at>
@RFC-References:
<57c5e077-ac71-486c-8afa-edd6802cf6b1n@googlegroups.com> <2023Sep23.123024@mips.complang.tuwien.ac.at> 2@gal.iecc.com>
<09798d75-4962-47b8-8816-d554d201a522n@googlegroups.com> 1@gal.iecc.com> 1@newsreader4.netcologne.de>
<e2196cd2-5268-4b04-b7ad-19b5b2a4cb8fn@googlegroups.com> <8734z3auj4.fsf@localhost>
@TZUTC: 0000
@TID: FIDOGATE-5.12-ge4e8b94
Lynn Wheeler <lynn@garlic.com> writes:
>the other thing about z10->z196 was the claim that at least half of the
>per-processor throughput increase was the introduction of out-of-order
>execution, branch prediction, etc.
Speculative execution and OoO can provide very big speedups, but the
gain depends on the benchmark.  E.g., the fastest in-order CPU on the
LaTeX benchmark is the Cortex-A55, and it is still slower than a
Cortex-A76 (on the same RK3588 SoC) by a factor of 3.6 and slower than
Firestorm (on Apple M1) by a factor of 7.8.
OTOH, I expect that for dense matrix multiplication one can get the
same throughput with in-order cores as with out-of-order cores, as
long as the resources are the same: the inner loop is statically
schedulable, so unrolling and software pipelining can extract the same
parallelism at compile time that an OoO core extracts at run time.
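A minimal sketch of that static-scheduling point (my illustration, not
anything from the post): the inner product below is unrolled into four
independent accumulators, so there is no serial dependence from one
multiply-add to the next within the unrolled body, and a compiler can
schedule it for an in-order core.

```c
#include <stddef.h>

/* Naive n x n matmul with a 4-way accumulator-unrolled inner loop.
   The four partial sums s0..s3 carry no dependence on each other, so
   the multiply-adds of one iteration can overlap in flight on an
   in-order machine without any dynamic scheduling. */
void matmul(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
            size_t k;
            for (k = 0; k + 4 <= n; k += 4) {
                s0 += a[i*n + k]     * b[(k)*n   + j];
                s1 += a[i*n + k + 1] * b[(k+1)*n + j];
                s2 += a[i*n + k + 2] * b[(k+2)*n + j];
                s3 += a[i*n + k + 3] * b[(k+3)*n + j];
            }
            double s = s0 + s1 + s2 + s3;
            for (; k < n; k++)   /* remainder when n is not a multiple of 4 */
                s += a[i*n + k] * b[k*n + j];
            c[i*n + j] = s;
        }
}
```

Real high-performance GEMMs add blocking for the cache hierarchy on
top of this, but the scheduling argument is the same.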
- anton
--
`Anyone trying for "industrial quality" ISA should avoid undefined behavior.`
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
--- xrn 10.11
* Origin: Institut fuer Computersprachen, Technische Universitaet (2:5075/128)
SEEN-BY: 5001/100 5005/49 5015/255 5019/40 5020/715 848 1042 4441 12000
SEEN-BY: 5030/49 1081 5058/104 5075/128
@PATH: 5075/128 5020/1042 4441