OVH Community, your new community space.

Monthly unexpected hard reboots (same day)


Kamilleri
24-08-2015, 12:09
Current uptime is 151 days :]
No reboots since then.

Thank you very much, Support Team! <3
I want to believe the cause of those mysterious hard reboots has been fixed.

@xvd
Many thanks for the advice!
I will probably use it if this ever happens again.

Have a productive week, everybody!

xvd
30-03-2015, 22:14
Quote Originally Posted by Kamilleri
Could you share how long the maintenance took and what it was like? Was it a sudden shutdown and engineer intervention, or an email reply with an exact date? I run some production stuff here and want to be prepared if possible.
A few observations:

1. Using the online ticket system (in the control panel) led nowhere: it would typically take 2-3 days to get an answer, and mostly they do not bother reading the ticket. It is more like a robot replying with the next answer from the list. Altogether they asked me 3 times for the results of the hardware tests (run in rescue mode) and 4 times for permission to do the intervention.

2. In the end the only way to convince SYS to do anything was to call them and insist that there was something wrong with the server even though their tests showed otherwise (at that point I was 99% convinced that it was a hardware problem).

3. I called them on Friday morning and was promised that the motherboard would be replaced before the end of the business day. A few hours later I saw another ticket, "Power Supply Replacement", in the control panel. So you will probably not get the exact hour when the intervention will take place. Moreover, you will have to confirm that all data is backed up. The fact that in the end they replaced the power supply rather than the motherboard makes me think that once somebody who knows what they are doing gets their hands on the server, they are quick to determine what is really wrong and fix it ... but getting to that point was an ordeal.

4. I don't know how long the intervention took, but I think it was reasonably quick.

Good luck with your server!

Kamilleri
30-03-2015, 21:14
Quote Originally Posted by xvd
Kamilleri - FYI in my case it seems that replacing the power supply resolved the issue. Fingers crossed.
Wow! Glad to hear that!
I hope your SYS server never goes offline again. Thanks for the reply, and thanks to SYS support for attending to your ticket.

Could you share how long the maintenance took and what it was like? Was it a sudden shutdown and engineer intervention, or an email reply with an exact date? I run some production stuff here and want to be prepared if possible.

xvd
30-03-2015, 17:43
Kamilleri - FYI in my case it seems that replacing the power supply resolved the issue. Fingers crossed.

xvd
26-03-2015, 18:06
Quote Originally Posted by Kamilleri
How often do you experience them?
It is quite random. I was not able to find any pattern. Sometimes it is OK for a few days and sometimes I get into an infinite loop of reboots. When I ping the server I get around 10 replies before there is another reboot. It lasts until somebody from SYS gets the email from the real-time monitoring system. Then they do "something" which stops the loop. They never tell me what; they just send an email with "Server on login, Ping ok, Services started".

I am quite convinced that it is hardware related, especially since I run nearly identical servers which are fine.

Have you tried contacting support? I did, quite unsuccessfully though: according to them there is nothing wrong with the server, and after that they ignore me.

Kamilleri
26-03-2015, 17:32
Quote Originally Posted by xvd
How is your server Kamilleri? Did turning off RTM work for you?
I don't know yet, I'm waiting for the next hard reboot.
But I'm almost sure that RTM wasn't the cause, and I'm beginning to think this is something hardware related (power supply, a broken motherboard/CPU/RAM, or compatibility issues). I've run out of ideas.
There are no signs of a software shutdown or an ACPI command.

How often do you experience them?

xvd
26-03-2015, 17:03
How is your server Kamilleri? Did turning off RTM work for you?

Kamilleri
26-03-2015, 10:22
Quote Originally Posted by Careimages
Just a heads-up: before you disable the RTM process on your server you should switch off monitoring for your server in the SYS control panel, otherwise the OVH software will think the lack of RTM data means the server has stopped responding and do an auto-reboot for you...
Whoa! So that's what "Monitoring" does. I thought it was related to RTM data (CPU/RAM/swap graphs), so I disabled it too. Actually this makes sense now: we were rebooted once before (a long time ago) because we had firewalled ICMP. Being able to opt out of it is nice, but it can lengthen the repair time when the hardware really is unavailable.
Anyway, thanks for your time.

Careimages
26-03-2015, 08:31
Just a heads-up: before you disable the RTM process on your server you should switch off monitoring for your server in the SYS control panel, otherwise the OVH software will think the lack of RTM data means the server has stopped responding and do an auto-reboot for you...

Kamilleri
26-03-2015, 08:04
The only event in syslog before the reboot is... the RTM cron job (OVH's real-time monitoring).
I don't know; I'm trying to disable it.

Kamilleri
26-03-2015, 07:59
OK, this is bogus.
Another reboot happened this morning, without any sign of a software issue in syslog.
It just happened. And that's all.

Very strange and annoying behavior.

Running another memtester session. The previous one was OK, and this worries me:
If it's not a user/OS request, not an HDD failure (SMART is OK), not OOM, not a temperature issue, then... what could it be?
A power supply issue? Some kind of hard-to-reproduce, ghostly self-rebooting hardware issue? That's a bummer.
I'm not ready to pay an installation fee for another server to migrate to.
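
The elimination list above (disk, OOM, temperature, requested shutdown) can be partly automated by scanning the syslog lines that precede a reboot. The following is only a rough Python sketch: the regular expressions are illustrative assumptions modelled on the ATA lines pasted in this thread and on commonly seen kernel messages, and the exact wording differs between kernel versions and distributions. The names `PATTERNS` and `classify` are made up for this example.

```python
import re

# Illustrative patterns only (assumptions); adjust them to the messages
# your own kernel and init system actually write.
PATTERNS = {
    "disk": re.compile(
        r"ata\d+(\.\d+)?: (failed command|exception Emask|hard resetting link)"
    ),
    "oom": re.compile(r"Out of memory|oom-killer", re.IGNORECASE),
    "thermal": re.compile(r"temperature above threshold", re.IGNORECASE),
    "shutdown": re.compile(r"shutting down for system (reboot|halt)", re.IGNORECASE),
}


def classify(lines):
    """Group syslog lines by the suspected cause they hint at."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip())
    return hits


if __name__ == "__main__":
    # Path is an assumption; Debian-based systems usually log to /var/log/syslog.
    with open("/var/log/syslog", errors="replace") as handle:
        for cause, matched in classify(handle).items():
            print(f"{cause}: {len(matched)} matching line(s)")
```

An empty result for every category does not prove a hardware fault, but it is consistent with a reset that the operating system never saw coming.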

Kamilleri
24-03-2015, 22:46
Doing memtester... hmm, looks like RAM is fine...

Code:
root@hv2:~# memtester 16000 1
memtester version 4.2.2 (64-bit)
Copyright (C) 2010 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 16000MB (16777216000 bytes)
got  16000MB (16777216000 bytes), trying mlock ...locked.
Loop 1/1:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : ok
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking Zeroes      : ok
  8-bit Writes        : ok
  16-bit Writes       : ok

Done.

Kamilleri
24-03-2015, 20:54
Quote Originally Posted by alvaroag
Searching around, there's a live memory tester:

https://blog.wpkg.org/2010/11/23/tes...run-memtest86/

Note that you may change the arguments according to your server's free memory.
Whoa! That's something new.
I will definitely try this. Thanks for your time!

alvaroag
24-03-2015, 19:51
Searching around, there's a live memory tester:

https://blog.wpkg.org/2010/11/23/tes...run-memtest86/

Note that you may change the arguments according to your server's free memory.

Kamilleri
24-03-2015, 19:22
Quote Originally Posted by alvaroag
Are you able to run memtest86?
That would be a nice start. Sadly it's kind of a production server, so I need to find a maintenance window for that.
Thanks for the reply. I hope it won't reboot itself again, as I'm almost sure it's not a software issue.

alvaroag
24-03-2015, 14:28
Are you able to run memtest86?

Kamilleri
24-03-2015, 08:14
Syslog before the reboot.
When this happened, there was some iowait (the HDD lagged):

Code:
Mar 24 07:42:01 hv2 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Mar 24 07:42:01 hv2 kernel: ata1.00: failed command: FLUSH CACHE
Mar 24 07:42:01 hv2 kernel: ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 4
Mar 24 07:42:01 hv2 kernel:         res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Mar 24 07:42:01 hv2 kernel: ata1.00: status: { DRDY }
Mar 24 07:42:01 hv2 kernel: ata1: hard resetting link
Mar 24 07:42:01 hv2 kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 24 07:42:01 hv2 kernel: ata1.00: configured for UDMA/133
Mar 24 07:42:01 hv2 kernel: ata1.00: device reported invalid CHS sector 0
Mar 24 07:42:01 hv2 kernel: ata1: EH complete
After reboot:
Code:
Mar 24 08:14:25 hv2 kernel: imklog 5.8.11, log source = /proc/kmsg started.
Mar 24 08:14:25 hv2 rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2656" x-info="http://www.rsyslog.com"] start
Mar 24 08:14:25 hv2 kernel: Initializing cgroup subsys cpuset
Mar 24 08:14:25 hv2 kernel: Initializing cgroup subsys cpu
Mar 24 08:14:25 hv2 kernel: Linux version 2.6.32-34-pve (root@lola) (gcc version 4.7.2 (Debian 4.7.2-5) ) #1 SMP Fri Dec 19 07:42:04 CET 2014
Mar 24 08:14:25 hv2 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-2.6.32-34-pve root=/dev/md1 ro quiet
Mar 24 08:14:25 hv2 kernel: KERNEL supported cpus:
Mar 24 08:14:25 hv2 kernel: Intel GenuineIntel
Mar 24 08:14:25 hv2 kernel: AMD AuthenticAMD
Mar 24 08:14:25 hv2 kernel: Centaur CentaurHauls
Mar 24 08:14:25 hv2 kernel: BIOS-provided physical RAM map:
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 0000000000000000 - 000000000009d800 (usable)
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 000000000009d800 - 00000000000a0000 (reserved)
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 0000000000100000 - 0000000020000000 (usable)
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 0000000020000000 - 0000000020200000 (reserved)
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 0000000020200000 - 0000000040000000 (usable)
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 0000000040000000 - 0000000040200000 (reserved)
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 0000000040200000 - 00000000da852000 (usable)
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 00000000da852000 - 00000000da8a0000 (ACPI NVS)
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 00000000da8a0000 - 00000000da8a8000 (ACPI data)
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 00000000da8a8000 - 00000000dabd6000 (reserved)
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 00000000dabd6000 - 00000000dabe4000 (ACPI NVS)
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 00000000dabe4000 - 00000000dac0c000 (reserved)
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 00000000dac0c000 - 00000000dac4f000 (ACPI NVS)
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 00000000dac4f000 - 00000000dae8a000 (reserved)
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 00000000dae8a000 - 00000000db000000 (usable)
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 00000000db800000 - 00000000dfa00000 (reserved)
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 00000000fed1c000 - 00000000fed20000 (reserved)
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
Mar 24 08:14:25 hv2 kernel: BIOS-e820: 0000000100000000 - 000000081f600000 (usable)
Mar 24 08:14:25 hv2 kernel: DMI 2.7 present.
Mar 24 08:14:25 hv2 kernel: SMBIOS version 2.7 @ 0xF0480
Mar 24 08:14:25 hv2 kernel: DMI: /DH67BL, BIOS BLH6710H.86A.0160.2012.1204.1156 12/04/2012
Mar 24 08:14:25 hv2 kernel: e820 update range: 0000000000000000 - 0000000000001000 (usable) ==> (reserved)
Mar 24 08:14:25 hv2 kernel: e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
Mar 24 08:14:25 hv2 kernel: last_pfn = 0x81f600 max_arch_pfn = 0x400000000
Mar 24 08:14:25 hv2 kernel: MTRR default type: uncachable
Mar 24 08:14:25 hv2 kernel: MTRR fixed ranges enabled:
Mar 24 08:14:25 hv2 kernel: 00000-9FFFF write-back
Mar 24 08:14:25 hv2 kernel: A0000-BFFFF uncachable
Mar 24 08:14:25 hv2 kernel: C0000-CFFFF write-protect
Mar 24 08:14:25 hv2 kernel: D0000-E7FFF uncachable
Mar 24 08:14:25 hv2 kernel: E8000-FFFFF write-protect
Mar 24 08:14:25 hv2 kernel: MTRR variable ranges enabled:
Mar 24 08:14:25 hv2 kernel: 0 base 000000000 mask 800000000 write-back
Mar 24 08:14:25 hv2 kernel: 1 base 800000000 mask FE0000000 write-back
Mar 24 08:14:25 hv2 kernel: 2 base 0DB800000 mask FFF800000 uncachable
Mar 24 08:14:25 hv2 kernel: 3 base 0DC000000 mask FFC000000 uncachable
Mar 24 08:14:25 hv2 kernel: 4 base 0E0000000 mask FE0000000 uncachable
Mar 24 08:14:25 hv2 kernel: 5 base 81F600000 mask FFFE00000 uncachable
Mar 24 08:14:25 hv2 kernel: 6 base 81F800000 mask FFF800000 uncachable
Mar 24 08:14:25 hv2 kernel: 7 disabled
Mar 24 08:14:25 hv2 kernel: 8 disabled
Mar 24 08:14:25 hv2 kernel: 9 disabled
Mar 24 08:14:25 hv2 kernel: x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Mar 24 08:14:25 hv2 kernel: original variable MTRRs
Mar 24 08:14:25 hv2 kernel: reg 0, base: 0GB, range: 32GB, type WB
Mar 24 08:14:25 hv2 kernel: reg 1, base: 32GB, range: 512MB, type WB
Mar 24 08:14:25 hv2 kernel: reg 2, base: 3512MB, range: 8MB, type UC
Mar 24 08:14:25 hv2 kernel: reg 3, base: 3520MB, range: 64MB, type UC
Mar 24 08:14:25 hv2 kernel: reg 4, base: 3584MB, range: 512MB, type UC
Mar 24 08:14:25 hv2 kernel: reg 5, base: 33270MB, range: 2MB, type UC
Mar 24 08:14:25 hv2 kernel: reg 6, base: 33272MB, range: 8MB, type UC
Mar 24 08:14:25 hv2 kernel: total RAM covered: 32686M
Mar 24 08:14:25 hv2 kernel: gran_size: 64K chunk_size: 64K num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 64K chunk_size: 128K num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 64K chunk_size: 256K num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 64K chunk_size: 512K num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 64K chunk_size: 1M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 64K chunk_size: 2M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 64K chunk_size: 4M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 64K chunk_size: 8M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 64K chunk_size: 16M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 64K chunk_size: 32M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 64K chunk_size: 64M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 64K chunk_size: 128M num_reg: 10 lose cover RAM: -8M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 64K chunk_size: 256M num_reg: 10 lose cover RAM: -8M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 64K chunk_size: 512M num_reg: 10 lose cover RAM: -8M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 64K chunk_size: 1G num_reg: 10 lose cover RAM: -512M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 64K chunk_size: 2G num_reg: 10 lose cover RAM: -1536M
Mar 24 08:14:25 hv2 kernel: gran_size: 128K chunk_size: 128K num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 128K chunk_size: 256K num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 128K chunk_size: 512K num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 128K chunk_size: 1M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 128K chunk_size: 2M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 128K chunk_size: 4M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 128K chunk_size: 8M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 128K chunk_size: 16M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 128K chunk_size: 32M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 128K chunk_size: 64M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 128K chunk_size: 128M num_reg: 10 lose cover RAM: -8M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 128K chunk_size: 256M num_reg: 10 lose cover RAM: -8M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 128K chunk_size: 512M num_reg: 10 lose cover RAM: -8M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 128K chunk_size: 1G num_reg: 10 lose cover RAM: -512M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 128K chunk_size: 2G num_reg: 10 lose cover RAM: -1536M
Mar 24 08:14:25 hv2 kernel: gran_size: 256K chunk_size: 256K num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 256K chunk_size: 512K num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 256K chunk_size: 1M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 256K chunk_size: 2M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 256K chunk_size: 4M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 256K chunk_size: 8M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 256K chunk_size: 16M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 256K chunk_size: 32M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 256K chunk_size: 64M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 256K chunk_size: 128M num_reg: 10 lose cover RAM: -8M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 256K chunk_size: 256M num_reg: 10 lose cover RAM: -8M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 256K chunk_size: 512M num_reg: 10 lose cover RAM: -8M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 256K chunk_size: 1G num_reg: 10 lose cover RAM: -512M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 256K chunk_size: 2G num_reg: 10 lose cover RAM: -1536M
Mar 24 08:14:25 hv2 kernel: gran_size: 512K chunk_size: 512K num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 512K chunk_size: 1M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 512K chunk_size: 2M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 512K chunk_size: 4M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 512K chunk_size: 8M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 512K chunk_size: 16M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 512K chunk_size: 32M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 512K chunk_size: 64M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 512K chunk_size: 128M num_reg: 10 lose cover RAM: -8M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 512K chunk_size: 256M num_reg: 10 lose cover RAM: -8M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 512K chunk_size: 512M num_reg: 10 lose cover RAM: -8M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 512K chunk_size: 1G num_reg: 10 lose cover RAM: -512M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 512K chunk_size: 2G num_reg: 10 lose cover RAM: -1536M
Mar 24 08:14:25 hv2 kernel: gran_size: 1M chunk_size: 1M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 1M chunk_size: 2M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 1M chunk_size: 4M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 1M chunk_size: 8M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 1M chunk_size: 16M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 1M chunk_size: 32M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 1M chunk_size: 64M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 1M chunk_size: 128M num_reg: 10 lose cover RAM: -8M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 1M chunk_size: 256M num_reg: 10 lose cover RAM: -8M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 1M chunk_size: 512M num_reg: 10 lose cover RAM: -8M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 1M chunk_size: 1G num_reg: 10 lose cover RAM: -512M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 1M chunk_size: 2G num_reg: 10 lose cover RAM: -1536M
Mar 24 08:14:25 hv2 kernel: gran_size: 2M chunk_size: 2M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 2M chunk_size: 4M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 2M chunk_size: 8M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 2M chunk_size: 16M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 2M chunk_size: 32M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 2M chunk_size: 64M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 2M chunk_size: 128M num_reg: 10 lose cover RAM: -8M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 2M chunk_size: 256M num_reg: 10 lose cover RAM: -8M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 2M chunk_size: 512M num_reg: 10 lose cover RAM: -8M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 2M chunk_size: 1G num_reg: 10 lose cover RAM: -512M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 2M chunk_size: 2G num_reg: 10 lose cover RAM: -1536M
Mar 24 08:14:25 hv2 kernel: gran_size: 4M chunk_size: 4M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 4M chunk_size: 8M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 4M chunk_size: 16M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 4M chunk_size: 32M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 4M chunk_size: 64M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 4M chunk_size: 128M num_reg: 10 lose cover RAM: -6M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 4M chunk_size: 256M num_reg: 10 lose cover RAM: -6M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 4M chunk_size: 512M num_reg: 10 lose cover RAM: -6M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 4M chunk_size: 1G num_reg: 10 lose cover RAM: -510M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 4M chunk_size: 2G num_reg: 10 lose cover RAM: -1534M
Mar 24 08:14:25 hv2 kernel: gran_size: 8M chunk_size: 8M num_reg: 10 lose cover RAM: 502M
Mar 24 08:14:25 hv2 kernel: gran_size: 8M chunk_size: 16M num_reg: 10 lose cover RAM: 246M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 8M chunk_size: 32M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 8M chunk_size: 64M num_reg: 10 lose cover RAM: -10M
Mar 24 08:14:25 hv2 kernel: gran_size: 8M chunk_size: 128M num_reg: 10 lose cover RAM: 6M
Mar 24 08:14:25 hv2 kernel: gran_size: 8M chunk_size: 256M num_reg: 10 lose cover RAM: 6M
Mar 24 08:14:25 hv2 kernel: gran_size: 8M chunk_size: 512M num_reg: 10 lose cover RAM: 6M
Mar 24 08:14:25 hv2 kernel: gran_size: 8M chunk_size: 1G num_reg: 10 lose cover RAM: 6M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 8M chunk_size: 2G num_reg: 10 lose cover RAM: -1018M
Mar 24 08:14:25 hv2 kernel: gran_size: 16M chunk_size: 16M num_reg: 10 lose cover RAM: 254M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 16M chunk_size: 32M num_reg: 10 lose cover RAM: -2M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 16M chunk_size: 64M num_reg: 10 lose cover RAM: -2M
Mar 24 08:14:25 hv2 kernel: gran_size: 16M chunk_size: 128M num_reg: 10 lose cover RAM: 14M
Mar 24 08:14:25 hv2 kernel: gran_size: 16M chunk_size: 256M num_reg: 10 lose cover RAM: 14M
Mar 24 08:14:25 hv2 kernel: gran_size: 16M chunk_size: 512M num_reg: 10 lose cover RAM: 14M
Mar 24 08:14:25 hv2 kernel: gran_size: 16M chunk_size: 1G num_reg: 10 lose cover RAM: 14M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 16M chunk_size: 2G num_reg: 10 lose cover RAM: -1010M
Mar 24 08:14:25 hv2 kernel: gran_size: 32M chunk_size: 32M num_reg: 10 lose cover RAM: 142M
Mar 24 08:14:25 hv2 kernel: gran_size: 32M chunk_size: 64M num_reg: 10 lose cover RAM: 14M
Mar 24 08:14:25 hv2 kernel: gran_size: 32M chunk_size: 128M num_reg: 10 lose cover RAM: 46M
Mar 24 08:14:25 hv2 kernel: gran_size: 32M chunk_size: 256M num_reg: 10 lose cover RAM: 46M
Mar 24 08:14:25 hv2 kernel: gran_size: 32M chunk_size: 512M num_reg: 10 lose cover RAM: 46M
Mar 24 08:14:25 hv2 kernel: gran_size: 32M chunk_size: 1G num_reg: 10 lose cover RAM: 46M
Mar 24 08:14:25 hv2 kernel: *BAD*gran_size: 32M chunk_size: 2G num_reg: 10 lose cover RAM: -978M
Mar 24 08:14:25 hv2 kernel: gran_size: 64M chunk_size: 64M num_reg: 10 lose cover RAM: 110M
Mar 24 08:14:25 hv2 kernel: gran_size: 64M chunk_size: 128M num_reg: 9 lose cover RAM: 110M
Mar 24 08:14:25 hv2 kernel: gran_size: 64M chunk_size: 256M num_reg: 9 lose cover RAM: 110M
Mar 24 08:14:25 hv2 kernel: gran_size: 64M chunk_size: 512M num_reg: 9 lose cover RAM: 110M
Mar 24 08:14:25 hv2 kernel: gran_size: 64M chunk_size: 1G num_reg: 9 lose cover RAM: 110M
Mar 24 08:14:25 hv2 kernel: gran_size: 64M chunk_size: 2G num_reg: 10 lose cover RAM: 110M
Mar 24 08:14:25 hv2 kernel: gran_size: 128M chunk_size: 128M num_reg: 9 lose cover RAM: 174M
Mar 24 08:14:25 hv2 kernel: gran_size: 128M chunk_size: 256M num_reg: 9 lose cover RAM: 174M
Mar 24 08:14:25 hv2 kernel: gran_size: 128M chunk_size: 512M num_reg: 9 lose cover RAM: 174M
Mar 24 08:14:25 hv2 kernel: gran_size: 128M chunk_size: 1G num_reg: 9 lose cover RAM: 174M
Mar 24 08:14:25 hv2 kernel: gran_size: 128M chunk_size: 2G num_reg: 10 lose cover RAM: 174M
Mar 24 08:14:25 hv2 kernel: gran_size: 256M chunk_size: 256M num_reg: 7 lose cover RAM: 430M
Mar 24 08:14:25 hv2 kernel: gran_size: 256M chunk_size: 512M num_reg: 9 lose cover RAM: 430M
Mar 24 08:14:25 hv2 kernel: gran_size: 256M chunk_size: 1G num_reg: 9 lose cover RAM: 430M
Mar 24 08:14:25 hv2 kernel: gran_size: 256M chunk_size: 2G num_reg: 10 lose cover RAM: 430M
Mar 24 08:14:25 hv2 kernel: gran_size: 512M chunk_size: 512M num_reg: 5 lose cover RAM: 942M
Mar 24 08:14:25 hv2 kernel: gran_size: 512M chunk_size: 1G num_reg: 5 lose cover RAM: 942M
Mar 24 08:14:25 hv2 kernel: gran_size: 512M chunk_size: 2G num_reg: 5 lose cover RAM: 942M
Mar 24 08:14:25 hv2 kernel: gran_size: 1G chunk_size: 1G num_reg: 5 lose cover RAM: 942M
Mar 24 08:14:25 hv2 kernel: gran_size: 1G chunk_size: 2G num_reg: 5 lose cover RAM: 942M
Mar 24 08:14:25 hv2 kernel: gran_size: 2G chunk_size: 2G num_reg: 4 lose cover RAM: 1966M
Mar 24 08:14:25 hv2 kernel: mtrr_cleanup: can not find optimal value
Mar 24 08:14:25 hv2 kernel: please specify mtrr_gran_size/mtrr_chunk_size
Then it continued to boot.
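
As a side note on the boot log, the "total RAM covered: 32686M" figure can be reproduced from the "original variable MTRRs" lines. The short Python check below assumes the total is simply the write-back (WB) ranges minus the uncachable (UC) holes, which holds here because every UC range sits inside a WB range and none overlap. The names `REGS` and `covered_mb` are made up for this example.

```python
# (base_mb, size_mb, type) copied from the "original variable MTRRs" log lines
REGS = [
    (0, 32 * 1024, "WB"),      # reg 0: base 0GB, range 32GB
    (32 * 1024, 512, "WB"),    # reg 1: base 32GB, range 512MB
    (3512, 8, "UC"),           # reg 2
    (3520, 64, "UC"),          # reg 3
    (3584, 512, "UC"),         # reg 4
    (33270, 2, "UC"),          # reg 5
    (33272, 8, "UC"),          # reg 6
]


def covered_mb(regs):
    """Write-back megabytes minus uncachable megabytes.

    Only valid under the stated assumption that the UC ranges lie inside
    the WB ranges and do not overlap each other.
    """
    wb = sum(size for _, size, kind in regs if kind == "WB")
    uc = sum(size for _, size, kind in regs if kind == "UC")
    return wb - uc


print(covered_mb(REGS))  # 33280 - 594 = 32686
```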

Kamilleri
24-03-2015, 06:30
Has anyone experienced unexpected cold reboots of a Proxmox/Linux SYS dedicated server, happening on almost the same day every month?

Code:
root@hv2:~# last -x
runlevel (to lvl 2)   2.6.32-34-pve    Tue Mar 24 08:14 - 09:16  (01:01)
reboot   system boot  2.6.32-34-pve    Tue Mar 24 08:14 - 09:16  (01:01)
runlevel (to lvl 2)   2.6.32-34-pve    Mon Mar 23 08:47 - 08:14  (23:26)
reboot   system boot  2.6.32-34-pve    Mon Mar 23 08:47 - 09:16 (1+00:28)
Neither of those was issued by me or my systems,
and the boot log starts without any trace of a reboot command.

I'm trying to find the reason for that.
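
One way to back up the "not issued by me" observation: `last -x` also prints a `shutdown` pseudo-entry whenever the system goes down cleanly, so two `reboot` records with no `shutdown` between them point to a hard reset or power loss. Here is a minimal Python sketch of that check, assuming the default `last -x` column layout. The function name `unclean_boots` is made up for illustration, and `SAMPLE` is the output pasted above.

```python
SAMPLE = """\
runlevel (to lvl 2)   2.6.32-34-pve    Tue Mar 24 08:14 - 09:16  (01:01)
reboot   system boot  2.6.32-34-pve    Tue Mar 24 08:14 - 09:16  (01:01)
runlevel (to lvl 2)   2.6.32-34-pve    Mon Mar 23 08:47 - 08:14  (23:26)
reboot   system boot  2.6.32-34-pve    Mon Mar 23 08:47 - 09:16 (1+00:28)
"""


def unclean_boots(last_output):
    """Return the `reboot` lines that were not preceded by a `shutdown` record."""
    events = []
    for line in last_output.splitlines():
        fields = line.split()
        if fields and fields[0] in ("reboot", "shutdown"):
            events.append((fields[0], line.strip()))

    # `last` prints newest first; walk the records in chronological order.
    events.reverse()

    unclean = []
    previous = None  # the oldest boot has no predecessor, so it is never flagged
    for kind, line in events:
        if kind == "reboot" and previous == "reboot":
            unclean.append(line)
        previous = kind
    return unclean


for line in unclean_boots(SAMPLE):
    print("no clean shutdown before:", line)
```

With the two boots shown above, only the Mar 24 boot is flagged, because the log does not go back far enough to say how the system went down before Mar 23.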