----------------------------------------------------------------------------------
@MSGID: 1@dont-email.me> f9a7c029
@REPLY: <wwvpm2jgagf.fsf@LkoBDZeT.terraraq.uk> d6e88f12
@REPLYADDR The Natural Philosopher <tnp@invalid.invalid>
@REPLYTO 2:5075/128 The Natural Philosopher
@CHRS: CP866 2
@RFC: 1 0
@RFC-Message-ID: 1@dont-email.me>
@RFC-References: 1@dont-email.me> 53kqz@news.chiark.greenend.org.uk> <wwv8r99b1ui.fsf@LkoBDZeT.terraraq.uk> 1@dont-email.me> <wwvpm2jgagf.fsf@LkoBDZeT.terraraq.uk>
@TZUTC: 0100
@PID: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.15.1
@TID: FIDOGATE-5.12-ge4e8b94
On 15/09/2023 08:30, Richard Kettlewell wrote:
> The Natural Philosopher <tnp@invalid.invalid> writes:
>> On 14/09/2023 09:23, Richard Kettlewell wrote:
>>> Also:
>>> * I would also have a look at the kernel log; if it's a
>>> kernel-generated signal then there's usually a log message about it.
>>>
>> Nothing in kern.log after the boot process finishes.
>
> Most likely a bug in your program then.
>
>>> * Run the application under valgrind; depending what the issue is, that
>>> will provide a backtrace and perhaps more detailed information. If it
>>> is a memory corruption issue then it may identify where the corruption
>>> happens, rather than the later point where malloc failed a consistency
>>> check (or whatever it is).
>>>
>>> Using valgrind (and/or compiler sanitizer features) is a good idea
>>> even before running into trouble, really.
>>
>> The strange thing is that it failed once after a minute, then I
>> rebooted and it failed after 20 minutes, and it's been running for several
>> days now with no issues at all.
>>
>> I am not sure valgrind would actually help unless it failed.
>
> It's extremely good at identifying memory corruption even in cases where
> that doesn't immediately lead to a crash; that's what it's for. But if
> it doesn't, you leave it running until the crash happens.
>
Well, that is an option, for sure.
> Up to you, of course, whether you use the tools available, or debug with
> one hand tied behind your back.
>
Tell me: in what way would a corrupted libc file, say, or a faulty bit of
memory show up in the kernel logs?
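One likely reason kern.log stays silent: SIGABRT is normally raised by the process itself. glibc's abort(), which is what assert() failures and malloc()'s internal consistency checks call, sends the signal from user space, so unlike a SIGSEGV that the kernel raises on a bad memory access, there is nothing for the kernel to report. A minimal sketch, assuming Linux and glibc (<execinfo.h> is a glibc extension): an SA_SIGINFO handler for SIGABRT and SIGSEGV that prints the signal's si_code (on Linux, a value <= 0 means a process sent it; a positive value means the kernel raised it), writes a backtrace to stderr, then re-raises so the process still dies and can dump core. install_crash_handler() and the put_* helpers are hypothetical names. backtrace() is not formally async-signal-safe, so treat this as best-effort; it is called once at install time so the handler avoids its first-call library load.

```c
#define _GNU_SOURCE
#include <execinfo.h>   /* backtrace(), backtrace_symbols_fd(): glibc extension */
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* write(2) is async-signal-safe; stdio functions such as fprintf() are not. */
static void put_buf(const char *p, size_t n)
{
    ssize_t ignored = write(STDERR_FILENO, p, n);
    (void)ignored;
}

static void put_str(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    put_buf(s, n);
}

/* Print a (possibly negative) integer without snprintf(). */
static void put_long(long v)
{
    char buf[24];
    size_t i = sizeof buf;
    unsigned long u = v < 0 ? 0UL - (unsigned long)v : (unsigned long)v;

    do {
        buf[--i] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);
    if (v < 0)
        buf[--i] = '-';
    put_buf(buf + i, sizeof buf - i);
}

static void crash_handler(int sig, siginfo_t *info, void *ucontext)
{
    void *frames[64];
    int n;

    (void)ucontext;
    put_str("fatal signal ");
    put_long(sig);
    put_str(", si_code ");
    put_long(info->si_code);
    /* Linux: si_code <= 0 means sent by a process (kill(), raise(), abort());
       a positive code such as SEGV_MAPERR means the kernel raised it. */
    put_str(info->si_code <= 0 ? " (sent from user space)\n"
                               : " (raised by the kernel)\n");

    n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);

    /* SA_RESETHAND has restored the default action and SA_NODEFER leaves the
       signal unblocked, so this terminates the process (with a core dump if
       core dumps are enabled). */
    raise(sig);
}

/* Returns 0 on success, -1 if either sigaction() call fails. */
int install_crash_handler(void)
{
    struct sigaction sa;
    void *warmup[1];

    backtrace(warmup, 1);   /* first call may load libgcc_s; do it here, not in the handler */

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = crash_handler;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND | SA_NODEFER;
    if (sigaction(SIGABRT, &sa, NULL) != 0 || sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

Called once at the top of main(), an abort() from a malloc() check would then leave a "si_code -6 (sent from user space)" line plus the call chain on stderr (the journal, when run under systemd), which points at the code path even when the log is otherwise empty.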
The problem is that this thing is looping very frequently.
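    #include <unistd.h>     /* usleep() */

    /* readThermometers() etc. are defined elsewhere in the program. */
    void loop(void)
    {
        while (1)
        {
            readThermometers();
            readZones();
            readOverrides();
            readTimerData();
            setRelayState();
            setRelays();
            usleep(1120000);    /* pause about 1.12 s between passes */
        }
    }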
At a little over 1.12 seconds per pass, the loop runs roughly 77,000
faultless iterations a day (86,400 / 1.12).
So this bug (if it is a bug) is one in a million or worse.
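A side note on the sleep call: POSIX only required usleep() to accept values below 1,000,000 microseconds, and POSIX.1-2008 dropped the function altogether. glibc on Linux does accept 1120000, so this is probably unrelated to the crash, but nanosleep() sidesteps the question. A minimal sketch; sleep_us() is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Sleep for usec microseconds, resuming after signal interruptions.
   Returns 0 on success, -1 on any error other than EINTR. */
int sleep_us(unsigned long usec)
{
    struct timespec req, rem;

    req.tv_sec = (time_t)(usec / 1000000UL);
    req.tv_nsec = (long)(usec % 1000000UL) * 1000L;
    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;
        req = rem;   /* interrupted by a signal: sleep for what is left */
    }
    return 0;
}
```

In the loop, usleep(1120000) would become sleep_us(1120000).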
I suppose I could make the thing loop ten times a second (or even
faster) and see if it happens more often.
It's not as though it's chewing up CPU...
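One way to try the faster loop without editing and recompiling each time is to read the period from an environment variable at startup. A minimal sketch; the variable name LOOP_PERIOD_US and the helper loop_period_us() are hypothetical, and 1,120,000 µs is the value from the loop above.

```c
#define _POSIX_C_SOURCE 200809L   /* POSIX declarations such as setenv() */
#include <errno.h>
#include <stdlib.h>

/* Return the loop period in microseconds from the (hypothetical)
   LOOP_PERIOD_US environment variable, or default_us if it is unset,
   non-numeric, zero, or out of range. */
unsigned long loop_period_us(unsigned long default_us)
{
    const char *s = getenv("LOOP_PERIOD_US");
    char *end;
    unsigned long v;

    if (s == NULL || *s < '0' || *s > '9')   /* also rejects "", "-5", " 5" */
        return default_us;
    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0' || v == 0)
        return default_us;
    return v;
}
```

Reading period = loop_period_us(1120000) once before the while loop and sleeping for period each pass, LOOP_PERIOD_US=100000 would give about 11 passes for every one at the normal rate (ignoring the time the read functions themselves take).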
The problem I have is that these crashes only started happening
recently; before that, the code ran for days. And a few things happened
around then: a massive brownout, then a full power cut, and I trod on it.
And I made systemd start it...
I see it crashed again last night, again with zero errors apart from
SIGABRT...
I will start it manually and cut systemd out.
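Running it by hand is also a chance to capture a core dump: SIGABRT's default action is to terminate and dump core, but interactive shells often start with a soft core-size limit of 0 (`ulimit -c` shows it). A minimal sketch of lifting that limit from inside the program at startup; enable_core_dumps() is a hypothetical helper. Where the core file lands is set by /proc/sys/kernel/core_pattern, and on systems running systemd-coredump, `coredumpctl` lists captured dumps.

```c
#include <sys/resource.h>

/* Raise the soft RLIMIT_CORE to the hard limit so that a fatal SIGABRT or
   SIGSEGV can leave a core file. An unprivileged process may raise its soft
   limit up to, but not beyond, the hard limit; if the hard limit is 0 this
   changes nothing. Returns 0 on success, -1 on error. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Called first thing in main(), a later abort() should leave a core that gdb can load together with the program binary, showing the full stack at the moment of the abort.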
--
The lifetime of any political organisation is about three years before
it's been subverted by the people it tried to warn you about.
Anon.
--- Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.15.1
* Origin: A little, after lunch (2:5075/128)