From : Theo 2:5075/128 15 Sep 23 11:58:12

To : The Natural Philosopher 15 Sep 23 14:00:02

Subject : Re: Weird code crash

----------------------------------------------------------------------------------

In comp.sys.raspberry-pi The Natural Philosopher <tnp@invalid.invalid> wrote:
> Tell me in what way a corrupted - say - libc file, or a faulty bit of
> memory would show up in the kernel logs?

Well, it could be a cosmic ray.  The Pi doesn't have ECC memory, so it's
possible for a bit to flip in RAM or storage without it noticing.  I don't
know which part of the galaxy you inhabit, but cosmic rays are rare enough
down here that random bit flips like this don't happen often - ballpark once
a year for a server (which has a much greater surface area to absorb them
than a Pi).

It is also possible to be marginal on signal integrity for PCB interconnect,
but that would mostly be a design fault: either they all work or many of
them don't.  Since we don't have a lot of people complaining of the same
problem, we can assume the design is not marginal in that respect.

If computers were that unreliable they would be failing all the time - and
we'd fit ECC to everything.  That they aren't suggests bit-flip corruption
isn't a problem.  In general random bit-flip errors are not a statistically
major source of crashes, unless you're running a hyper-redundant mainframe
and have eliminated all the other sources.

What is a well-known class of bugs is concurrency/timing races and memory
safety violations, which is odds-on what's happening here, especially given
we've already picked up on potentially risky code like failing to check for
NULL from fopen().
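
For illustration, here is a minimal sketch of the kind of fopen() check meant
above.  The helper name read_first_line() and its return convention are made
up for this example and are not taken from the code under discussion; it also
assumes a POSIX system, where fopen() sets errno on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the first line of `path` into `buf`.
 * `len` is the size of `buf` and is assumed to fit in an int.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_first_line(const char *path, char *buf, size_t len)
{
    assert(path != NULL && buf != NULL && len > 0);

    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        /* Without this check, the fgets() below would be handed a NULL
         * stream, which is undefined behaviour and typically a crash. */
        fprintf(stderr, "fopen(%s): %s\n", path, strerror(errno));
        return -1;
    }

    int rc = (fgets(buf, (int)len, fp) != NULL) ? 0 : -1;
    fclose(fp);
    return rc;
}
```

The point is only that the failure path is handled explicitly: a file that is
briefly missing or unreadable then produces a logged error instead of a rare,
hard-to-reproduce crash.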

> And that means thousands of faultless iterations in a day.
>
> So this bug ( if it is a bug) is a one in a million or worse.
>
> I suppose I could make the thing loop ten times a second (or even
> faster) and see if it happens more often..

That would be a useful thing to try.
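
One way to do that, as a rough sketch only: wrap the work in a function and
drive it from a harness that runs it at a much higher rate and reports which
iteration failed.  Everything here is hypothetical (stress_loop(),
demo_step(), the 0-means-success convention); demo_step() is a stand-in that
deliberately fails on its fifth call, and nanosleep() assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Call `step` up to `iterations` times, sleeping `interval_ms` between
 * calls.  `step` returns 0 on success.  Returns the index of the first
 * failing iteration, or `iterations` if every call succeeded. */
long stress_loop(int (*step)(void), long iterations, long interval_ms)
{
    assert(step != NULL && iterations >= 0 && interval_ms >= 0);

    struct timespec delay = {
        .tv_sec  = interval_ms / 1000,
        .tv_nsec = (interval_ms % 1000) * 1000000L
    };

    for (long i = 0; i < iterations; i++) {
        if (step() != 0) {
            fprintf(stderr, "step failed on iteration %ld\n", i);
            return i;
        }
        nanosleep(&delay, NULL);
    }
    return iterations;
}

/* Stand-in for the real work: fails on its 5th call so the harness
 * has something to catch. */
int demo_step(void)
{
    static int calls = 0;
    return (++calls == 5) ? -1 : 0;
}
```

Calling stress_loop(your_step, 1000000, 100) would run the step ten times a
second.  A segfault would still kill the process outright, so this only helps
to make the fault show up sooner, not to survive it.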

> its not as though its chewing up CPU...
>
> The problem I have is that these crashes only recently started
> happening: prior to that the code ran for days. And two things happened,
> a massive brownout, and then a full power cut, and I trod on it.

Most of those things would cause it to fail hard (i.e. not power up), rather
than produce a very rare random fault.

> And I made systemd start it...

It is possible that being run from systemd changes the timing or environment
that provokes the fault in some way, but I doubt it would be the cause of
the fault.

Theo
--- tin/1.8.3-20070201 ("Scotasay") (UNIX) (Linux/5.10.0-22-amd64 (x86_64))
* Origin: University of Cambridge, England (2:5075/128)