OVH Community, your new community space.

Realtek NIC 8168B crashing


rickyday
05-10-2013, 00:06
Quote Originally Posted by NeddySeagoon
Phixion,

Good luck. I'm in the market for an mSP when OVH gets less like Monty Python's cheese shop. Oh, I don't want your old server as I'll be running Linux on it.
Thanks for mentioning that sketch in another thread as well. I did enjoy watching it on YouTube; it made me chuckle!

Phixion
04-10-2013, 21:55
I have the same server, they just changed the Mobo for me.

So don't worry my gimped server isn't out there ready for some poor bugger to buy :P

NeddySeagoon
04-10-2013, 21:31
Phixion,

Good luck. I'm in the market for an mSP when OVH gets less like Monty Python's cheese shop. Oh, I don't want your old server as I'll be running Linux on it.

Phixion
04-10-2013, 21:23
Thankfully my server now has an Intel motherboard and Intel NIC...

I've just reinstalled and hopefully won't have any more issues!

rickyday
04-10-2013, 21:09
I never thought I would see the day where a Windows based mSP was more stable than a Linux based machine.

Well today is that day.

This is shocking tbh, and I'm amazed OVH didn't pick this up during their testing of these new servers.

Phixion
30-09-2013, 20:59
Quote Originally Posted by Myatu
That's ridiculous. I know this isn't the solution you're looking for, but have you tried setting the port to a lower speed to see how well it holds up then?
No, and no offence, but I'm not really looking for workarounds now, just a fix!

This NIC is broken with Linux, it doesn't work. I need a new one.

Sadly, the technician handling the incident thinks it's because my server has been "attacked" for the last 2 months causing these errors.

It's rather annoying trying to convince them that the NIC is at fault here. I really don't know what else to do other than requesting a new NIC/Mobo over and over again.

Myatu
30-09-2013, 15:58
That's ridiculous. I know this isn't the solution you're looking for, but have you tried setting the port to a lower speed to see how well it holds up then?
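
For anyone who does want to try the lower port speed, here is a rough sketch using ethtool. It assumes the interface is eth0 (as in the syslog excerpts in this thread) and that ethtool is installed; the function name limit_port_speed and the DRY_RUN switch are made up for this example. Forcing the speed with autonegotiation off can drop the link on a remote box, so it only prints the command unless you opt in.

```shell
#!/bin/sh
# Sketch only: cap the NIC at a lower speed to see whether the link holds up.
# Assumptions: interface eth0, ethtool installed, run as root.
# DRY_RUN defaults to 1, so the command is printed rather than executed.

limit_port_speed() {
    iface="${1:-eth0}"
    speed="${2:-100}"
    cmd="ethtool -s $iface speed $speed duplex full autoneg off"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Show what would be run without touching the NIC.
        echo "$cmd"
    else
        $cmd
    fi
}
```

Run `limit_port_speed eth0 100` to see the command first, then `DRY_RUN=0 limit_port_speed eth0 100` once you have KVM or rescue access ready in case the link does not come back.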

Phixion
30-09-2013, 15:46
Well, new ticket open...

cat /var/log/syslog

Sep 30 02:58:17 ns307877 kernel: r8168: eth0: link down
Sep 30 02:58:21 ns307877 kernel: r8168: eth0: link up
Sep 30 03:00:15 ns307877 kernel: r8168: eth0: link down
Sep 30 03:00:19 ns307877 kernel: r8168: eth0: link up

Fun times!
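
If it helps when arguing with support, the flaps can be counted rather than eyeballed. A minimal sketch, assuming the log lines look exactly like the ones above (the Realtek r8168 module format; the in-kernel r8169 driver words its messages differently). The function name count_link_flaps is just for illustration.

```shell
#!/bin/sh
# Sketch: count "link down" events for one interface in syslog-style text.
# Reads the log on stdin; $1 = driver name, $2 = interface.

count_link_flaps() {
    driver="${1:-r8168}"
    iface="${2:-eth0}"
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status clean.
    grep -c "kernel: $driver: $iface: link down" || true
}
```

Usage would be something like `count_link_flaps r8168 eth0 < /var/log/syslog`.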

Phixion
30-09-2013, 00:20
You shouldn't need to order new servers to get a broken one fixed... But I'm sure I don't need to tell you that!

Only reason mine is working is this OVH Kernel, if I decide to use Proxmox or whatever in the future I'm screwed.

Doesn't seem to be working now, random freezes where it shows my network card has gone down. Requesting a hardware change tomorrow.

LinuxGam
28-09-2013, 16:07
Quote Originally Posted by raxxeh
Had another crash today after 12 lucky days after a ram replacement...

Requesting a credit for 2 months now, as this is an issue I cannot resolve with a kernel change.

I use proxmox.


that MTRR problem won't be related to the issue we're having Gam. OVH need to put a PCI nic in for people like us.
I had a new server ordered at RBX but got fed up of waiting so moved it to SBG as my server was crashing 5+ times a day!

However, my new server on 500mb/s connection has slower upload and download speeds than my mSP and certainly not even close to 500mb/s so I am pretty pissed off. Maybe SBG bandwidth is oversubscribed?

raxxeh
28-09-2013, 15:04
Quote Originally Posted by LinuxGam
Don't know if all these lines were causing my problem, as it's early days, but not correctly mapping RAM is never going to be good for an OS.

I have applied the fix in this article

http://my-fuzzy-logic.de/blog/index....-problems.html

It has completely cleaned this up and allocated all the RAM correctly.

Here's hoping that my problems now vanish :-)
Had another crash today after 12 lucky days after a ram replacement...

Requesting a credit for 2 months now, as this is an issue I cannot resolve with a kernel change.

I use proxmox.


that MTRR problem won't be related to the issue we're having Gam. OVH need to put a PCI nic in for people like us.

NeddySeagoon
25-09-2013, 23:11
LinuxGam,

It's time to unplug the mains lead from the server and fit a new server to it.

LinuxGam
25-09-2013, 21:21
Quote Originally Posted by LinuxGam
Could this be causing an error, it shows up in the SysLog on boot many times


Sep 25 20:14:26 ns3362699 kernel: gran_size: 256K chunk_size: 2M num_reg: 10 lose cover RAM: 254M
Sep 25 20:14:26 ns3362699 kernel: *BAD*gran_size: 256K chunk_size: 4M num_reg: 10 lose cover RAM: -2M
Sep 25 20:14:26 ns3362699 kernel: *BAD*gran_size: 256K
Don't know if all these lines were causing my problem, as it's early days, but not correctly mapping RAM is never going to be good for an OS.

I have applied the fix in this article

http://my-fuzzy-logic.de/blog/index....-problems.html

It has completely cleaned this up and allocated all the RAM correctly.

Here's hoping that my problems now vanish :-)

LinuxGam
25-09-2013, 21:08
OK. They are swapping the power supply now. Fingers crossed; at least it's more than just rebooting for the 14th time to fix the issue ;-)

LinuxGam
25-09-2013, 21:03
I'm not convinced my issue is even the network card, as I successfully installed the latest 8168 module. Do they use a stock ProxMox setup or do they adjust the kernel? I did do a dist-upgrade (Proxmox 3.1), but I am sure it was crashing before that point.

LinuxGam
25-09-2013, 20:59
Quote Originally Posted by Phixion
LinuxGam are you using the OVH r8168 Kernel?
ProxMox 3 Install

Phixion
25-09-2013, 20:51
LinuxGam are you using the OVH r8168 Kernel?

LinuxGam
25-09-2013, 20:31
Could this be causing an error, it shows up in the SysLog on boot many times


Sep 25 20:14:26 ns3362699 kernel: gran_size: 256K chunk_size: 2M num_reg: 10 lose cover RAM: 254M
Sep 25 20:14:26 ns3362699 kernel: *BAD*gran_size: 256K chunk_size: 4M num_reg: 10 lose cover RAM: -2M
Sep 25 20:14:26 ns3362699 kernel: *BAD*gran_size: 256K

LinuxGam
25-09-2013, 20:17
I am having a nervous breakdown, Felix... It has kept crashing and the onsite staff just blindly reboot it. No one in support seems to have listened to my suggestion of upgrading to the latest 1401 BIOS, which says it improves system stability.

They have now swapped out the RAM, which is always a good choice, but it has needed rebooting multiple times since because of crashes, even though the RAM was only recently swapped.

I have now paid for this server for over 1 month and have never been able to trust it with public-facing live data, and it's just getting worse...

Felix@OVH
25-09-2013, 19:10
8.037.00

ftp://ftp.ovh.net/made-in-ovh/bzImag...t/r8168.patch:
Code:
#define RTL8168_VERSION "8.037.00" NAPI_SUFFIX

Phixion
25-09-2013, 18:47
Just curious, is the test kernel patched with r8168-8.037.00 or r8168-8.036.00?

In any case, I was running the r8168 kernel for a while and it was rock solid, I've now switched to the mptcp kernel and haven't crashed yet either. Time will tell.

Felix@OVH
25-09-2013, 18:17
Quote Originally Posted by Thelen
Not fine: 1510236 1509749
I just had a quick look at the first one... I see quite a lot of segfaults from your torrent client. Maybe you should try to fix that first to rule it out as the culprit.

It's quite possible that this software and the GRS kernel don't "like" each other, so it might be worth trying the standard "std" kernel instead of GRS. While you are changing kernels, maybe also try the one patched with the latest r8168; you can install it in 4 easy steps:
Code:
# cd /boot
# wget ftp://ftp.ovh.net/made-in-ovh/bzImag...68-std-ipv6-64
# grub2-mkconfig > /boot/grub2/grub.cfg
# reboot
Best regards,
Felix

Felix@OVH
25-09-2013, 18:07
Quote Originally Posted by Phixion
Just noticed there is new version of the Realtek drivers: r8168-8.037.00.tar
Yes, integrated into latest test kernel: ftp://ftp.ovh.net/made-in-ovh/bzImage/latest-test/

Thelen
25-09-2013, 03:35
So one of the 3 boxes that were crashing is fine, but other 2 crash every 2 days or so.

Fine: 1510372
Not fine: 1510236 1509749

Very strange.

LinuxGam
24-09-2013, 22:59
The BIOS update before this one, 1301, improved system compatibility, which sounds good too. If they are not willing to do it then I want a new server that works... I have paid for one month already and never had a fully stable server that I can use for live.

LinuxGam
24-09-2013, 22:54
Well... being as my server is completely unstable... I am more than willing to take that chance! It can always be flashed back to an earlier version if it turns out to be a different problem in the end.

Chances are BIOS updates will fix more issues than they introduce, tested or not.

Let's see what they say :-)

Phixion
24-09-2013, 22:52
Mine isn't the Pro, it's the non-pro version.

But the BIOS version is 1101, I asked for update, which they did, and it was still 1101.

When I asked why they hadn't installed the latest they said they updated with their current BIOS update, they have the newer versions in test machines.

So until they class it as stable I guess they won't install it.

Just noticed there is new version of the Realtek drivers: r8168-8.037.00.tar

LinuxGam
24-09-2013, 22:46
I have just checked the Asus site to see what problems were fixed by BIOS updates, and the latest BIOS update, which has been out less than a month, is specifically labelled as

P8H77-M PRO BIOS 1401
Improve system stability.

So I have re-opened the ticket and asked them to download the latest one from the site and apply it. I assume they are just using the "latest" one they downloaded in the past :P

LinuxGam
24-09-2013, 22:20
Quote Originally Posted by Phixion
Been there, done that :P
Is yours giving anything at all in the logs, like the network card playing up or was it like mine, a pure freeze with no messages?

Also, what motherboard BIOS version are you on now? Mine is now 1101.

Phixion
24-09-2013, 22:15
Been there, done that :P

LinuxGam
24-09-2013, 21:47
Ok. OVH have updated the BIOS to a new version. I will be sure to let people know if this makes any difference. LG

LinuxGam
24-09-2013, 19:25
Ok.. it said that pve-firmware replaces that package, which is already installed and up to date. I am running Proxmox :-)

I don't actually even have anything that suggests it's the network card as no logs when it freezes saying anything broke, just figured reading this thread worth a try. Hopefully OVH can swap out some hardware that could be causing it.

It's a bit worrying ProxMox themselves don't recommend SOFT RAID and that's the default set up I have. I wonder how that affects it?

I have another, bigger server on order with hardware RAID, but as standard they can't give me any date for when they'll have the parts in. So I'm having to pay for and deal with this server, which I can't even use in production, and now have to renew it for another month... broken! As the new server is not ready.... SIGH!!

LinuxGam
24-09-2013, 19:19
Quote Originally Posted by Phixion
I think you have to enable the non-free repository first, then:

sudo apt-get update
sudo apt-get install firmware-realtek

Please let me know how it goes! I am unable to test on my server at the moment.

Also, if you run lspci -v and it still says you are using the r8169 driver, you need to blacklist r8169 and add r8168 to the modules list.
Thanks mate, the driver showed up as r8168 straight away so that's all good. I will try what's mentioned above.

with firmware.

OVH are on the case now with reboots and some problem solving. Seems to be at random :-)

Phixion
24-09-2013, 18:58
I think you have to enable the non-free repository first, then:

sudo apt-get update
sudo apt-get install firmware-realtek

Please let me know how it goes! I am unable to test on my server at the moment.

Also, if you run lspci -v and it still says you are using the r8169 driver, you need to blacklist r8169 and add r8168 to the modules list.
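
A rough sketch of the blacklist step on Debian/Ubuntu, in case it is useful. The file name blacklist-r8169.conf and the function name switch_to_r8168 are my own choices (any .conf file under /etc/modprobe.d should do). The function takes a root directory so you can rehearse against a scratch folder before pointing it at the real system.

```shell
#!/bin/sh
# Sketch: blacklist the in-kernel r8169 driver and load r8168 at boot.
# $1 = filesystem root to write under ("/" for the live system, or a
# scratch directory for a dry rehearsal). File name is an assumption.

switch_to_r8168() {
    [ $# -eq 1 ] || { echo "usage: switch_to_r8168 ROOT" >&2; return 2; }
    root="$1"
    mkdir -p "$root/etc/modprobe.d"
    # Stop modprobe from auto-loading r8169 for this NIC.
    echo "blacklist r8169" > "$root/etc/modprobe.d/blacklist-r8169.conf"
    # Add r8168 to the boot-time module list, but only once.
    touch "$root/etc/modules"
    grep -qx "r8168" "$root/etc/modules" || echo "r8168" >> "$root/etc/modules"
}
```

On the live system you would presumably follow this with `update-initramfs -u` and a reboot, then check lspci -v again.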

LinuxGam
24-09-2013, 17:42
Quote Originally Posted by Phixion
Felix, do you think this would be worth trying?

Installing the r8168 drivers from the Realtek site and then installing this package: http://packages.debian.org/wheezy/firmware-realtek

The OVH kernel with the built-in patch works great, no crashes for me... but I would like the flexibility of using custom kernels.
I installed the latest drivers from the Realtek website last night, but didn't do the firmware as I didn't know how to.

Phixion
24-09-2013, 17:40
Felix, do you think this would be worth trying?

Installing the r8168 drivers from the Realtek site and then installing this package: http://packages.debian.org/wheezy/firmware-realtek

The OVH kernel with the built-in patch works great, no crashes for me... but I would like the flexibility of using custom kernels.

LinuxGam
24-09-2013, 17:29
Quote Originally Posted by Felix@OVH
Do you have a ticket number so I can look at it? (or the server number displayed in Manager interface)
Thanks Felix!

Re-opening of the ticket 1517099 - OVH Monitoring

I have been trying to deal with it, but I am clean out of ideas now, bar hardware.

I did run full rescue hardware checks and they seemed to pass, but I guess that's no guarantee.

Thanks again

Felix@OVH
24-09-2013, 17:22
Do you have a ticket number so I can look at it? (or the server number displayed in Manager interface)

LinuxGam
24-09-2013, 17:16
Just froze again. Absolutely nothing in SysLog at all before crash. Just usual RTM events then next logs I see are the reboot.

Are there any other logs I can look at where info may be available? It certainly seems like a hardware issue and I was planning on moving live sites and db's to it before the end of the month.. but I can't with freezes up to 3 times a day.

Sigh

LinuxGam
23-09-2013, 21:53
Thanks Neddy and Myatu for your help! Everything installed without a problem, so now to see if the stability improves.

@Neddy, I couldn't find any firmware updates with the drivers, but maybe I just am doing something wrong?

LinuxGam
23-09-2013, 21:26
Just noticed something.. my card lists as an 8168 in lspci -v and the module installed on mine is an 8169.... I assume I am downloading the 8168 from their site?

LinuxGam
23-09-2013, 21:10
Quote Originally Posted by NeddySeagoon
Many NICs covered by the r8169 module have firmware updates.
To get these to install, you need to build r8169 as a module and put the firmware in /lib/firmware. It's provided in the linux-firmware package.

If you want to make r8169 built into your kernel, you need to build the firmware in too.

You can tell if the driver is looking for firmware for your card and can't find it, as it stalls the boot process for 60 sec. That's easy to spot in dmesg, provided you have timestamps turned on.

I have no idea what the firmware updates do but that they exist at all is a vendor admission that things could be better.
Thanks I'll look into this

LinuxGam
23-09-2013, 21:09
Quote Originally Posted by Myatu
Have you compiled the new driver/module?
Do I follow your instructions from before and download the latest correct driver for my card? I saw some posts about trying the xxx8 driver.

Myatu
23-09-2013, 20:51
Quote Originally Posted by LinuxGam
Did anyone ever find a working fix for ProxMox
Have you compiled the new driver/module?

NeddySeagoon
23-09-2013, 20:06
Many NICs covered by the r8169 module have firmware updates.
To get these to install, you need to build r8169 as a module and put the firmware in /lib/firmware. It's provided in the linux-firmware package.

If you want to make r8169 built into your kernel, you need to build the firmware in too.

You can tell if the driver is looking for firmware for your card and can't find it, as it stalls the boot process for 60 sec. That's easy to spot in dmesg, provided you have timestamps turned on.

I have no idea what the firmware updates do but that they exist at all is a vendor admission that things could be better.
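
One way to check the firmware side without waiting for a stalled boot: modinfo can list the firmware files a module may ask for, and you can compare that list against /lib/firmware. A small sketch, with missing_firmware being a name I made up for this example.

```shell
#!/bin/sh
# Sketch: print the firmware files a module may request that are not
# present on disk. Reads firmware names (one per line) on stdin;
# $1 = firmware directory, defaulting to /lib/firmware.

missing_firmware() {
    fwdir="${1:-/lib/firmware}"
    while IFS= read -r fw; do
        [ -n "$fw" ] || continue          # skip blank lines
        [ -e "$fwdir/$fw" ] || echo "$fw" # report anything not on disk
    done
}
```

Typical use would be `modinfo -F firmware r8169 | missing_firmware`. An empty result means every file the module lists is present; it does not tell you which one your particular card actually needs.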

LinuxGam
23-09-2013, 14:45
Did anyone ever find a working fix for ProxMox? My server randomly freezes; it passed all the rescue mode hardware tests and I can never see anything in the logs from when it freezes.

It just freezes and needs a hard reboot. However, it does often freeze when backing stuff up by FTP so could well be the network card.

rizuk
17-09-2013, 19:03
I think OVH should replace the motherboards on the SP servers; I'm getting sick of this.

raxxeh
17-09-2013, 14:28
Quote Originally Posted by Felix@OVH
Hi Thelen,

Can you provide me a ticket number so I can see all the details and maybe try another kernel with you?

Felix
I have a question: what are you going to do about people like me who use Proxmox? We cannot simply change kernels without rendering the server useless.

Why aren't you just swapping in low profile nics into these servers at this point?

Felix@OVH
17-09-2013, 10:42
Hi Thelen,

Can you provide me a ticket number so I can see all the details and maybe try another kernel with you?

Felix

Thelen
17-09-2013, 10:35
yea i had a server crash that has been fine for like 45 days.. weird.

and still having crashes for 3 of the 5 servers got at GRA last week, they've replaced the PSU and mobo and ram and cpu on one, nothing on the other 2, though they only just started crashing in the last couple days...

this is very very weird, as i cannot see the same errors as before in the log, it just seems to crash with no warning or messages..

lspci -v
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 09)
Subsystem: ASUSTeK Computer Inc. P8H77-I Motherboard
Flags: bus master, fast devsel, latency 0, IRQ 44
I/O ports at e000 [size=256]
Memory at f0004000 (64-bit, prefetchable) [size=4K]
Memory at f0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
Kernel driver in use: r8169

uname -r
3.10.9-xxxx-grs-ipv6-64

rizuk
17-09-2013, 00:52
had a crash after 6 days!..also ovh graphs not working again 3.11 kernel


03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 09)
Subsystem: ASUSTeK Computer Inc. P8H77-I Motherboard
Flags: bus master, fast devsel, latency 0, IRQ 44
I/O ports at e000 [size=256]
Memory at f0004000 (64-bit, prefetchable) [size=4K]
Memory at f0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
Kernel driver in use: r8168


can you not use a better motherboard ...

rickyday
11-09-2013, 17:16
Quote Originally Posted by Myatu
You'll be in luck if its Felix. He'll probably step out of that 4th dimension he usually hides in, waves a magic wand and presto!
Can't help with the issue, as the NIC works fine with Windows Server 2012, but that is quite some compliment there for Felix, coming from our resident forum nix geek!

rickyday
11-09-2013, 17:10
Quote Originally Posted by geoffreyc
moved my production environment to the machine, we will see how it goes !
Good luck!

Fingers crossed no crashes from now on.

Felix@OVH
11-09-2013, 16:45
Quote Originally Posted by Phixion
So far so good Felix, no crash overnight.
...
Is this fix different than the one in the current Ubuntu 12.04 OVH Kernel?
No crash so far is already quite good news, let's see how it goes on.

I couldn't find the precise version number of the r8168 module you used on Ubuntu in your earlier posts... However the one patched into this r8168-OVH-Kernel is:
#define RTL8168_VERSION "8.036.00"

geoffreyc
11-09-2013, 16:15
FYI, still not had a crash after installing realtek drivers on proxmox v3. I'll report back if I do get a crash. Looking good so far, moved my production environment to the machine, we will see how it goes !

Phixion
11-09-2013, 14:42
So far so good Felix, no crash overnight.

What I am referring to is when issuing lspci -v I used to get:

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 09)
Subsystem: ASUSTeK Computer Inc. Device 8505
Kernel driver in use: r8169


Then I would install the Realtek drivers and get:

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 09)
Subsystem: ASUSTeK Computer Inc. Device 8505
Kernel driver in use: r8168


When I installed Ubuntu 12.04 with OVH Kernel I got the same as I have now using your new kernel, notice the Version numbers of the card have an additional 8411:

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 09)
Subsystem: ASUSTeK Computer Inc. P8H77-I Motherboard
Kernel driver in use: r8168

Is this fix different than the one in the current Ubuntu 12.04 OVH Kernel?

Felix@OVH
11-09-2013, 11:06
Quote Originally Posted by Phixion
I will try this kernel, but I'd really appreciate if you could give directions in getting this working with the default kernel.
If the included r8168 driver should be proven to be "THE" fix for this issue, it will become the default kernel.

The lspci -v output is very similar, if not exactly the same
That is as intended, because lspci -v looks at the PCI devices in /proc and matches them against the data in pci.ids. However, if you look at "lspci -k" it will show you the kernel driver used for a device, in this case r8168:
Code:
# lspci -k|grep -A2 "03:00.0"
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 09)
        Subsystem: ASUSTeK Computer Inc. Device 8505
        Kernel driver in use: r8168
Edit: Can anyone tell me what "grs", "grspax" and "std" mean in the kernel names? I've googled but not found anything worthwhile.
GRS: includes the GRSec patch from https://grsecurity.net/
GRS+PAX: "PaX provides the implementation of non-executable pages and randomization features", part of GRSec
STD: the standard, vanilla kernel as retrieved from kernel.org, without patches
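
Going back to the lspci -k check earlier in this post: if you want just the driver name (for a script or a ticket), something like the following should work. It is only a sketch; driver_in_use is a made-up helper name and 03:00.0 is the slot shown in the outputs in this thread, so yours may differ.

```shell
#!/bin/sh
# Sketch: extract the bound kernel driver from "lspci -k" output on stdin.

driver_in_use() {
    # Print whatever follows "Kernel driver in use: " on the first match.
    sed -n 's/^[[:space:]]*Kernel driver in use: //p' | head -n 1
}
```

For example: `lspci -k -s 03:00.0 | driver_in_use`.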

Phixion
10-09-2013, 23:34
Sorry Felix, I initially missed your message in my ticket.

May I ask why you aren't convinced it's the network card causing this crash? My error log indicates it is (it shows the card stopping and restarting, all within 3 seconds), this forum has numerous people experiencing the same issue, and Google is full of tales of the 8168 network card causing crashes. I'm not questioning your experience here (which is no doubt far greater than mine) but everything points to the NIC.

I will try this kernel, but I'd really appreciate if you could give directions in getting this working with the default kernel. I've tried installing the r8168 drivers, blacklisting r8169... nothing is stable, even when it displays that it's using the r8168 drivers.

uname -r
3.10.11-r8168-grs-ipv6-64

lspci -v
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 09)
Subsystem: ASUSTeK Computer Inc. P8H77-I Motherboard
Flags: bus master, fast devsel, latency 0, IRQ 44
I/O ports at e000 [size=256]
Memory at f0004000 (64-bit, prefetchable) [size=4K]
Memory at f0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
Kernel driver in use: r8168

The lspci -v output is very similar, if not exactly the same as the Ubuntu 12.04LTS OVH Kernel install - which I had a crash with earlier today. Is this new Kernel any different to the one in the Ubuntu 12.04 OVH install?

This episode has pretty much put me off Realtek for life...

Many thanks.

Edit: Can anyone tell me what "grs", "grspax" and "std" mean in the kernel names? I've googled but not found anything worthwhile.

Felix@OVH
10-09-2013, 22:39
I sent you a link to a test kernel that includes the r8168 driver instead of the r8169 driver, which might cause the trouble according to what you wrote.

Can you try it and keep me posted if it's working better?

Best regards,
Felix

Phixion
10-09-2013, 20:04
Another crash today, this time with the OVH Kernel installed as was suggested by support.

Myatu
09-09-2013, 19:51
The following will allow you to see what version the module is:

Code:
modinfo r8168 | grep version
(or: )

Code:
modinfo r8169 | grep version
8.x.x-NAPI is the one by Realtek.
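
Note that plain "grep version" also matches the srcversion line. If you only want the version string, `modinfo -F version r8168` prints just that field. Below is a small sketch that applies the 8.x.x-NAPI rule of thumb from above; the helper name is_realtek_build is made up, and the pattern is only as good as that rule of thumb.

```shell
#!/bin/sh
# Sketch: decide whether a module version string looks like the Realtek
# vendor driver, using the "8.x.x-NAPI" pattern mentioned in this post.

is_realtek_build() {
    case "$1" in
        8.*-NAPI) return 0 ;;  # e.g. 8.036.00-NAPI
        *)        return 1 ;;  # anything else, including an empty string
    esac
}
```

Example: `is_realtek_build "$(modinfo -F version r8168)" && echo "Realtek vendor driver"`.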

geoffreyc
09-09-2013, 18:13
No crashes on proxmox so far ... fingers crossed !

Phixion
09-09-2013, 16:51
They have no motherboards so I doubt that's going to happen anytime soon.

raxxeh
09-09-2013, 15:58
Quote Originally Posted by Phixion
Just phoned OVH and they suggested using the OVH Kernel for now until a fix is pushed through.

Their technical team is aware of the issue.

I am going to use the OVH kernel for now to test.

Doesn't help you guys that need to run modular kernels though.
With any luck i'll get hardware replaced as proxmox can't be used any other way.

Phixion
09-09-2013, 15:12
Just phoned OVH and they suggested using the OVH Kernel for now until a fix is pushed through.

Their technical team is aware of the issue.

I am going to use the OVH kernel for now to test.

Doesn't help you guys that need to run modular kernels though.

geoffreyc
09-09-2013, 14:27
Quote Originally Posted by Myatu
"it" as in the Realtek driver module.

As for compiling, http://forum.ovh.co.uk/showpost.php?p=47569&postcount=9 -- it also explains that you will be kicked out if you don't comment out those lines.
Thanks for the link.
I have followed the guide, and the autorun.sh runs (somewhat). Here is the output:
root@ns3366596:/tmp/r8168-8.036.00# ./autorun.sh

Check old driver and unload it.
rmmod r8169
Build the module and install
expr: syntax error
expr: syntax error
expr: syntax error
Backup r8169.ko
rename r8169.ko to r8169.bak
DEPMOD 2.6.32-23-pve
load module r8168
Updating initramfs. Please wait.
update-initramfs: Generating /boot/initrd.img-2.6.32-23-pve
W: mdadm: /etc/mdadm/mdadm.conf defines no arrays.
W: mdadm: no arrays defined in configuration file.
Completed.

It seems to have a couple of errors?
Though I do have
root@xxxx:~# lspci -s 03:00.0 -v
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 09)
Subsystem: ASUSTeK Computer Inc. P8H77-I Motherboard
Flags: bus master, fast devsel, latency 0, IRQ 34
I/O ports at e000 [size=256]
Memory at f0004000 (64-bit, prefetchable) [size=4K]
Memory at f0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
Kernel driver in use: r8168
After a reboot!

Myatu
09-09-2013, 13:27
"it" as in the Realtek driver module.

As for compiling, http://forum.ovh.co.uk/showpost.php?p=47569&postcount=9 -- it also explains that you will be kicked out if you don't comment out those lines.

geoffreyc
09-09-2013, 13:07
Hmm, I tried to install the Linux drivers from the Realtek website and... got locked out straight after running the autorun.sh script :S Nothing a hard reboot can't fix though, haha. I'm guessing it will be because of this: "Check old driver and unload it." ^^

geoffreyc
09-09-2013, 12:15
Quote Originally Posted by Myatu
Proxmox does use a module for it though, so all you need to do is compile the one provided by Realtek.
Sorry, I don't seem to understand your sentence; what is "it"? I'm not used to messing around with that side of things, so apologies if my questions seem stupid to you :s Cheers!

Myatu
09-09-2013, 12:13
You can't install that kernel with Proxmox or OpenVZ - they do not use stock kernels (especially Proxmox, as it is patched and backported to the gills - for good reason).

Proxmox does use a module for it though, so all you need to do is compile the one provided by Realtek.

geoffreyc
09-09-2013, 12:05
Quote Originally Posted by Phixion
http://linuxg.net/compile-kernel-3-9-on-debian-wheezy/

The bit where it says 'make oldconfig' use 'make menuconfig' instead else you will be forever pressing Enter.

I doubt it will fix it though, installing the latest realtek drivers did help make it more stable but it still crashed and for a server that is unacceptable.

Apparently a technician is looking into my problem at the moment, faster than I expected - I thought I'd have to wait for Monday.
I might be doing it wrong... I installed kernel 3.11 and now it tells me "Directory /proc/vz not found, assuming non-OpenVZ kernel" when I try to create a container. I'm assuming a custom kernel is needed for OpenVZ containers to be created?

Myatu
08-09-2013, 23:42
You'll be in luck if its Felix. He'll probably step out of that 4th dimension he usually hides in, waves a magic wand and presto!

Phixion
08-09-2013, 23:07
http://linuxg.net/compile-kernel-3-9-on-debian-wheezy/

The bit where it says 'make oldconfig' use 'make menuconfig' instead else you will be forever pressing Enter.

I doubt it will fix it though, installing the latest realtek drivers did help make it more stable but it still crashed and for a server that is unacceptable.

Apparently a technician is looking into my problem at the moment, faster than I expected - I thought I'd have to wait for Monday.

geoffreyc
08-09-2013, 22:56
mSP Server running proxmox here, crashes every 10 minutes when under load ... can't use for prod, and have no clue on how to update kernel and such... would anyone have some guidance for me please?

Phixion
08-09-2013, 18:37
Yes, 3.11 also used exactly the same drivers as the default Debian Wheezy Kernel.

In any case, the end user shouldn't be compromising on their choice of Kernel or Distro due to bad hardware... We shouldn't have to spend hours screwing with this stuff to just make it work.

raxxeh
08-09-2013, 18:32
Quote Originally Posted by Phixion
Installing various Kernels, including the latest 3.11 mainline.

Nothing worked, I've had 2 crashes today alone.
ffff****, crashing on 3.11?

this is my goddamn dns machine..... D:

Phixion
08-09-2013, 18:15
raxxeh: I have completely given up on it and have requested OVH to look into it and hopefully change the NIC and/or other hardware that may be causing this issue.

I've tried:

Installing latest realtek drivers
Installing various distros
Installing various Kernels, including the latest 3.11 mainline.

Nothing worked, I've had 2 crashes today alone.

I have put in a ticket and await the reply tomorrow morning.

raxxeh
08-09-2013, 15:26
Alright,

I'm using Proxmox on this new mSP, and I can't change its kernel without breaking ****.

Anyone have any suggestions? Updating driver didn't make a difference.

Or maybe I should just drop reliable virtualization and shift to virtualbox under 3.11, at least I'll know it won't crash.

rizuk
06-09-2013, 16:32
3.11 stable version is out now

btw current system still going strong

up time 7.6 days

Darkimmortal
06-09-2013, 14:24
The realtek 8168 is a real ***** to work with in a server under Linux. In a non-OVH server running Arch Linux, I've had best luck with kernel 3.8 series and this version of r8168:

Code:
filename:       /lib/modules/3.8.7-1-ARCH/extramodules/r8168.ko.gz
version:        8.035.00-NAPI
license:        GPL
description:    RealTek RTL-8168 Gigabit Ethernet driver
author:         Realtek and the Linux r8168 crew 
srcversion:     0B1D58537B8144A161D189A
alias:          pci:v00001186d00004300sv00001186sd00004B10bc*sc*i*
alias:          pci:v000010ECd00008168sv*sd*bc*sc*i*
depends:        
vermagic:       3.8.6-1-ARCH SMP preempt mod_unload modversions 
parm:           eee_enable:int
parm:           speed:force phy operation. Deprecated by ethtool (8). (ushort)
parm:           duplex:force phy operation. Deprecated by ethtool (8). (int)
parm:           autoneg:force phy operation. Deprecated by ethtool (8). (int)
parm:           aspm:Enable ASPM. (int)
parm:           rx_copybreak:Copy breakpoint for copy-only-tiny-frames (int)
parm:           use_dac:Enable PCI DAC. Unsafe on 32 bit PCI slot. (int)
parm:           debug:Debug verbosity level (0=none, ..., 16=all) (int)
Many other kernels, including the LTS kernel, would crash, lose network connectivity or drop packets at random after a few days, whereas this one is currently sitting at nearly 150 days uptime.

Phixion
05-09-2013, 04:09
rizuk, since the 3.11 kernel is now mainline I installed it, but it's not displaying what yours does. Maybe the lack of a BIOS update is the cause here?

Code:
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 09)
        Subsystem: ASUSTeK Computer Inc. Device 8505
        Flags: bus master, fast devsel, latency 0, IRQ 44
        I/O ports at e000 [size=256]
        Memory at f0004000 (64-bit, prefetchable) [size=4K]
        Memory at f0000000 (64-bit, prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 01
        Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
        Capabilities: [d0] Vital Product Data
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
        Kernel driver in use: r8169
My server was solid for a few days with the updated driver but crashed again tonight.

rizuk
31-08-2013, 02:36
True, yeah, it's best not to install it until it's stable, but at the moment I'm running this kernel fine with no problems.
I might even stick with it.

Phixion
31-08-2013, 01:52
Problem is gone in 3.11 because it has built in support for the Realtek network card.

https://www.kernel.org/diff/diffview...ch-3.11-rc7.xz

I won't be installing this until it's labelled as stable.

Phixion
30-08-2013, 14:53
Thelen, I also have:

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 09)

But now my system seems to be using the 8168 driver rather than 8169.

This is my last ditch effort, no crash in ~20 hours...

Rizuk, it looks like it's fixed on yours because it actually has the module built in. I didn't want to install the 3.11 kernel because it's not yet classed as stable.

rizuk
30-08-2013, 12:47
install 3.11 problems be gone

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 09)

3.8 kernel did say

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 09)


on 3.11 kernel

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 09)
Subsystem: ASUSTeK Computer Inc. P8H77-I Motherboard
Flags: bus master, fast devsel, latency 0, IRQ 44
I/O ports at e000 [size=256]
Memory at f0004000 (64-bit, prefetchable) [size=4K]
Memory at f0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
Kernel driver in use: r8168

root@ns2:~# modinfo r8168
filename: /lib/modules/3.11.0-3-generic/updates/dkms/r8168.ko
version: 8.036.00-NAPI
license: GPL
description: RealTek RTL-8168 Gigabit Ethernet driver
author: Realtek and the Linux r8168 crew
srcversion: 0CC5CD0B1343DB41FF464A5
alias: pci:v00001186d00004300sv00001186sd00004B10bc*sc*i*
alias: pci:v000010ECd00008168sv*sd*bc*sc*i*
depends:
vermagic: 3.11.0-3-generic SMP mod_unload modversions
parm: eee_enable:int
parm: speed:force phy operation. Deprecated by ethtool (8). (ushort)
parm: duplex:force phy operation. Deprecated by ethtool (8). (int)
parm: autoneg:force phy operation. Deprecated by ethtool (8). (int)
parm: aspm:Enable ASPM. (int)
parm: rx_copybreak:Copy breakpoint for copy-only-tiny-frames (int)
parm: use_dac:Enable PCI DAC. Unsafe on 32 bit PCI slot. (int)
parm: debug:Debug verbosity level (0=none, ..., 16=all) (int)

I've not had a crash since.
OVH also updated my motherboard BIOS to the latest, 1202.

Thelen
30-08-2013, 09:02
It took about 2 days of crashing and tickets complaining and such. They said:

Dear customer,

The @^@^@^@^@^ shows a writing issue, generally it is
related to a motherboard problem,



Jul 30 23:42:06 SERVER named[5077]: error (connection
refused) resolving 'tracker1.torrentum.pl/A/IN':
5.135.176.153#53
Jul 30 23:42:17 SERVER someprocess:
gethostby*.getanswer: asked for
"bt.home-ix.ru.nyud.net IN
A", got type "DNAME"
Jul 30 23:42:17 SERVER someprocess:
gethostby*.getanswer: asked for
"bt.home-ix.ru.nyud.net IN
A", got type "DNAME"
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ ^@^@^@^@^
They then replaced the mobo, but it crashed again:
Dear Customer,

Since the last intervention didn't fixed the issue, would
you please confirm if we can schedule the server
replacement? each component will be replaced beside the
hard drives.
Such intervention can be scheduled starting from Monday
between 10:00 and 17:00.
then:

Dear customer,

The intervention has been completed.

Here are the details of this operation:
Spare server replacement
Date 2013-08-05 12:54:32, david.dirvang made Spare server
replacement:
spare server done :
- motherboard has been replaced
- Cpu has been replaced
- Ram has been replaced
- Power supply has been replaced
- Sata cable has been replaced
restart the server
boot ok
server on login
ping ok
services started
and it hasn't crashed since. The NIC is:
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 09)
On another server, where they only replaced the mobo, it started with:

I have checked your server, and noticed failure
on the motherboard, hence the log error '^@^@^@^@^@^@^':

Aug 13 14:55:11 SERVER named[4131]: error (connection
refused) resolving 'torrent.quebecxtreme.com/A/IN':
67.215.8.196#53
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ ^@^@^@^@^


@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^ @^@^@^@^@


^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ ^@^@^@^@^


@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^ @^@^@^@^@


^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ ^@^@^@^@^


@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^ @^@^@^@^@


^@^@^@^@^@^@^@^@^@^@^@^@Aug 13 15:14:04 SERVER rsyslogd:
[origin software="rsyslogd"
swVersion="7.2.6" x-pid="3890"
x-info="http://www.rsyslog.com"] start
Aug 13 15:14:04 SERVER kernel: Initializing cgroup subsys
cpuset
Aug 13 15:14:04 SERVER kernel: Linux version
3.8.13-xxxx-grs-ipv6-64 (

I will therefore proceed to replace the motherboard.
Please confirm when to plan the intervention.
and was fine after that. nic:

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168 PCI Express Gigabit Ethernet controller (rev 09)
Which is weird, because all the other servers without a mobo replacement also have that NIC. So we have 1 server with the 8168B and the rest with the plain 8168 :/

Using Fedora and a 3.8 kernel, not sure what driver.

Phixion
30-08-2013, 02:46
It seems that even though I install the 8168 drivers it continues loading the 8169 drivers.

So I have added the 8169 drivers to the blacklist and it is now using 8168... see how this goes.
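
For anyone wanting to do the same: a blacklist entry normally lives in a file under /etc/modprobe.d/. The following is only a minimal sketch, not the exact steps used above. The helper name blacklist_module and the default file name are made up for illustration. It appends the entry only if it is not already there.

```shell
#!/bin/sh
# Hypothetical helper: add "blacklist <module>" to a modprobe config file once.
# The default file name below is an assumption, any *.conf name in
# /etc/modprobe.d/ should do.
blacklist_module() {
    module="$1"
    conf="${2:-/etc/modprobe.d/blacklist-$module.conf}"
    line="blacklist $module"

    # Nothing to do if the exact line is already present.
    if [ -f "$conf" ] && grep -qx "$line" "$conf"; then
        return 0
    fi
    echo "$line" >> "$conf"
}

# Usage (as root):
#   blacklist_module r8169
# On Debian/Ubuntu, refresh the initramfs afterwards so the early boot
# environment sees the blacklist too, then reboot:
#   update-initramfs -u -k "$(uname -r)"
```

After the reboot, "lspci -v" should report r8168 under "Kernel driver in use", provided the r8168 module is actually installed for the running kernel.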

rizuk
30-08-2013, 01:49
mine says

[ 2.296231] r8168 Gigabit Ethernet driver 8.036.00-NAPI loaded
[ 2.296340] r8168 0000:03:00.0: irq 44 for MSI/MSI-X

dan
30-08-2013, 01:42
What is the advantage of installing this firmware?
I got the same message, but it seems it's working fine without it.

Myatu
30-08-2013, 00:56
Quote Originally Posted by Phixion
What confuses me is that the default drivers installed for this NIC were the r8169 drivers.
That's fine. Look in your /var/log/dmesg log, further towards the top (above the lines you've quoted), you'll see something like:

Code:
r8169 Gigabit Ethernet driver 8.036.00-NAPI loaded
r8169 0000:03:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
r8169 0000:03:00.0: setting latency timer to 64
  alloc irq_desc for 33 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
r8169 0000:03:00.0: irq 33 for MSI/MSI-X
r8169 0000:03:00.0: eth0: RTL8168f/8111f at 0xffffc9000185a000, 08:60:6e:6e:57:ed, XID 08000880 IRQ 33
r8169 0000:03:00.0: eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
If you need the firmware, you'd want to try (Debian):

http://packages.debian.org/search?se...rmware-realtek
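
The "(-2)" at the end of the "unable to load firmware patch" message is the kernel's "no such file" error, so the driver asked for a firmware file that is not on disk. As a rough sketch (the function name missing_firmware is made up, and /lib/firmware is assumed to be where firmware files live), this pulls the requested file names out of a log and lists the ones that are absent:

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print every firmware
# file that the driver failed to load with error -2 and that is still missing
# below the firmware directory (assumed default: /lib/firmware).
missing_firmware() {
    fwroot="${1:-/lib/firmware}"
    sed -n 's/.*unable to load firmware patch \([^ ]*\) (-2).*/\1/p' |
    sort -u |
    while read -r fw; do
        [ -e "$fwroot/$fw" ] || echo "$fw"
    done
}

# Usage:
#   dmesg | missing_firmware
#   missing_firmware < /var/log/messages
```

If it prints rtl_nic/rtl8168f-1.fw, installing the firmware package linked above should provide that file. Whether the missing firmware has anything to do with the crashes is a separate question.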

rizuk
30-08-2013, 00:23
Quote Originally Posted by dan
Is your server stable now with this kernel?
Correct, I've not had one crash yet (fingers crossed) on the 3.11 kernel.
It came with the latest NIC driver installed, etc.

this is what it says now

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 09)
Subsystem: ASUSTeK Computer Inc. P8H77-I Motherboard
Flags: bus master, fast devsel, latency 0, IRQ 44
I/O ports at e000 [size=256]
Memory at f0004000 (64-bit, prefetchable) [size=4K]
Memory at f0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
Kernel driver in use: r8168

root@ns2:~# modinfo r8168
filename: /lib/modules/3.11.0-3-generic/updates/dkms/r8168.ko
version: 8.036.00-NAPI
license: GPL
description: RealTek RTL-8168 Gigabit Ethernet driver
author: Realtek and the Linux r8168 crew
srcversion: 0CC5CD0B1343DB41FF464A5
alias: pci:v00001186d00004300sv00001186sd00004B10bc*sc*i*
alias: pci:v000010ECd00008168sv*sd*bc*sc*i*
depends:
vermagic: 3.11.0-3-generic SMP mod_unload modversions
parm: eee_enable:int
parm: speed:force phy operation. Deprecated by ethtool (8). (ushort)
parm: duplex:force phy operation. Deprecated by ethtool (8). (int)
parm: autoneg:force phy operation. Deprecated by ethtool (8). (int)
parm: aspm:Enable ASPM. (int)
parm: rx_copybreak:Copy breakpoint for copy-only-tiny-frames (int)
parm: use_dac:Enable PCI DAC. Unsafe on 32 bit PCI slot. (int)
parm: debug:Debug verbosity level (0=none, ..., 16=all) (int)

Phixion
29-08-2013, 21:43
Well, sadly installing the new driver hasn't fixed it for me.

I've had 2 crashes in 5 minutes, both whilst the NIC was in heavy use.

dan
28-08-2013, 18:07
Quote Originally Posted by rizuk
I have installed the 3.11 kernel; it came with the latest NIC driver and has not crashed since, fingers crossed.
It was 100 percent a NIC problem, because the server only crashed when the NIC was heavily used.
Is your server stable now with this kernel?

Phixion
28-08-2013, 17:13
This was in my /var/log/messages just before the crash.

Code:
Aug 28 10:28:50 kernel: [   10.995666] r8169 0000:03:00.0: eth0: unable to load firmware patch rtl_nic/rtl8168f-1.fw (-2)
Aug 28 10:28:50 kernel: [   11.005955] r8169 0000:03:00.0: eth0: link down
Aug 28 10:28:50 kernel: [   11.005963] r8169 0000:03:00.0: eth0: link down
Aug 28 10:28:50 kernel: [   11.007578] ADDRCONF(NETDEV_UP): eth0: link is not ready
Aug 28 10:28:53 kernel: [   14.083005] r8169 0000:03:00.0: eth0: link up
Aug 28 10:28:53 kernel: [   14.085403] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
What confuses me is that the default drivers installed for this NIC were the r8169 drivers.

lspci gives:

Code:
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 09)
Notice it says it's a RTL8111/8168B.

The drivers I installed were r8168.

Now in my error log it's mentioning the card is an 8169...

Can anyone clear this up for me please?

Phixion
28-08-2013, 17:10
I had my first crash in a few days last night... no idea what caused it.

But I woke up with my server in netboot mode.

rizuk
27-08-2013, 17:26
Quote Originally Posted by raxxeh
Can confirm I've run into this issue.

Using proxmox, so not as simple as just playing with kernel.

OVH you need to fix this.
Hey mate, which kernel do you have currently?

raxxeh
27-08-2013, 16:56
Can confirm I've run into this issue.

Using proxmox, so not as simple as just playing with kernel.

OVH you need to fix this.

rizuk
27-08-2013, 16:42
I have installed the 3.11 kernel; it came with the latest NIC driver and has not crashed since, fingers crossed.
It was 100 percent a NIC problem, because the server only crashed when the NIC was heavily used.

loveorhate
27-08-2013, 15:04
I compiled a new 3.10.9 kernel from source, and compiled
the r8168 driver from Realtek, and I'm testing it now.

Let's see if it crashes overnight again or not.


According to some, there have also been some BIOS issues on mSPs (a new BIOS is available). I wonder
if this could be the case with FS-24Ts too.

loveorhate
27-08-2013, 12:22
I have a FS-24T and it too has the same NIC as MSP

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 09)
Subsystem: ASUSTeK Computer Inc. Device 8505
Flags: bus master, fast devsel, latency 0, IRQ 42
I/O ports at e000 [size=256]
Memory at f0004000 (64-bit, prefetchable) [size=4K]
Memory at f0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
Kernel driver in use: r8169


Kernel is 3.10.9 latest from OVH

Linux nsxxx.ovh.net 3.10.9-xxxx-grs-ipv6-64 #1 SMP Wed Aug 21 11:51:59 CEST 2013 x86_64 GNU/Linux

And it crashed. After the reboot I noticed these:

Aug 27 kernel: r8169 0000:03:00.0 eth0: unable to load firmware patch rtl_nic/rtl8168f-1.fw (-2)
Aug 27 kernel: r8169 0000:03:00.0 eth0: link down
Aug 27 kernel: r8169 0000:03:00.0 eth0: link down

Is there any fix for 3.10 kernel?

dan
26-08-2013, 15:01
I have the same problem with an mSP in RBX.

I installed the realtek driver, but it is still crashing.

HARD Reboot
Date 2013-08-26 09:56:18, $name made HARD Reboot:
Here are the details of the operation performed:
No information on the screen ("black screen"). No response
to keyboard.
I am using debian 6 64bit.
There are no relevant messages in syslog or kern.log.

Can this be a hardware issue?

rizuk
25-08-2013, 02:58
Quote Originally Posted by Felix@OVH
Hi,

Are you sure it's related to the NIC? Which kernel are you all using?

If you don't mind, you could test ftp://ftp.ovh.net/made-in-ovh/bzImage/latest-test/ - it's a kernel that we are preparing to roll out soon.

Felix
can we disable speedstep?

rizuk
25-08-2013, 01:11
i installed it

says driver in use

but it doesn't say Kernel modules: r8168
it just says Kernel driver in use: r8168

rizuk
24-08-2013, 21:46
Is there any way to install this driver on the 3.10 kernel?

Phixion
22-08-2013, 23:27
I'd like to add that you need to install the 'dkms' package to do this, or else you get nothing but errors... It took me 3 days to work that out.

RogEnk
22-08-2013, 22:33
BTW. Thanks to Myatu I was able to update my Realtek drivers under Centos without issue.

It was a tad confusing with 8168/9 numbers being interchanged.

Code:
 lspci -s 03:00.0 -v
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 09)
        Subsystem: ASUSTeK Computer Inc. Device 8505
        Flags: bus master, fast devsel, latency 0, IRQ 34
        I/O ports at e000 [size=256]
        Memory at f0004000 (64-bit, prefetchable) [size=4K]
        Memory at f0000000 (64-bit, prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 01
        Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
        Capabilities: [d0] Vital Product Data
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
        Kernel driver in use: r8169
        Kernel modules: r8168
And
Code:
 modinfo r8168
filename:       /lib/modules/2.6.32-358.14.1.el6.x86_64/kernel/drivers/net/r8168.ko
version:        8.036.00-NAPI
license:        GPL
description:    RealTek RTL-8168 Gigabit Ethernet driver

RogEnk
22-08-2013, 22:26
Quote Originally Posted by Phixion
Trying to follow Myatu's guide and getting nothing but problems... I'm close to giving up on this ****.

Code:
Check old driver and unload it.
rmmod r8169
Build the module and install
make: *** /lib/modules/3.2.0-52-generic/build: No such file or directory.  Stop.
make[1]: *** [clean] Error 2
make: *** [clean] Error 2
WHY? I have build-essential installed, I have automake installed.

WTF IS WRONG? PLEASE HELP!
Are you running the autorun.sh shell script?
And do you have build-essential installed (if Ubuntu/Debian)?
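
Worth noting: the make error in the quoted output says that /lib/modules/3.2.0-52-generic/build does not exist. That path normally points at the kernel headers for the running kernel, so it is worth checking before anything else. A small sketch of such a check (the function name have_kernel_build_dir is made up, and the second argument exists only to make it testable):

```shell
#!/bin/sh
# Hypothetical check: does the build directory that the driver's Makefile
# looks for exist for the given (default: running) kernel release?
have_kernel_build_dir() {
    release="${1:-$(uname -r)}"
    modroot="${2:-/lib/modules}"
    if [ -e "$modroot/$release/build" ]; then
        echo "ok: $modroot/$release/build exists"
    else
        echo "missing: $modroot/$release/build (install the headers for $release)"
        return 1
    fi
}

# Usage:
#   have_kernel_build_dir
#   have_kernel_build_dir 3.2.0-52-generic
```

If it reports "missing", installing the headers package that matches "uname -r" exactly (as in the apt-get line of the guide) is the usual fix. build-essential and automake alone do not provide that directory.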

Phixion
22-08-2013, 15:00
Trying to follow Myatu's guide and getting nothing but problems... I'm close to giving up on this ****.

Code:
Check old driver and unload it.
rmmod r8169
Build the module and install
make: *** /lib/modules/3.2.0-52-generic/build: No such file or directory.  Stop.
make[1]: *** [clean] Error 2
make: *** [clean] Error 2
WHY? I have build-essential installed, I have automake installed.

WTF IS WRONG? PLEASE HELP!

rizuk
21-08-2013, 16:23
The NIC was the problem; I had to install the latest NIC drivers.
Now my server does not crash.

Felix@OVH
21-08-2013, 15:23
Hi,

Are you sure it's related to the NIC? Which kernel are you all using?

If you don't mind, you could test ftp://ftp.ovh.net/made-in-ovh/bzImage/latest-test/ - it's a kernel that we are preparing to roll out soon.

Felix

Phixion
20-08-2013, 14:05
....

Myatu
06-08-2013, 16:33
Here's how to use the driver directly from Realtek, if you want to. But before you do this, keep in mind that the following does not come with any guarantees whatsoever and your mileage may vary - so if you lose network connectivity to your server and don't know how to restore it manually through rescue-pro, then "too bad, reinstall".

To grab info about the current r8169 driver, use:

Code:
modinfo r8169
which will spit out something similar to this:

Code:
modinfo r8169
filename:       /lib/modules/3.2.0-4-amd64/kernel/drivers/net/ethernet/realtek/r8169.ko
firmware:       rtl_nic/rtl8168f-2.fw
firmware:       rtl_nic/rtl8168f-1.fw
firmware:       rtl_nic/rtl8105e-1.fw
firmware:       rtl_nic/rtl8168e-3.fw
firmware:       rtl_nic/rtl8168e-2.fw
firmware:       rtl_nic/rtl8168e-1.fw
firmware:       rtl_nic/rtl8168d-2.fw
firmware:       rtl_nic/rtl8168d-1.fw
version:        2.3LK-NAPI
license:        GPL
description:    RealTek RTL-8169 Gigabit Ethernet driver
author:         Realtek and the Linux r8169 crew 
srcversion:     71126868D831C783047869A
alias:          pci:v00000001d00008168sv*sd00002410bc*sc*i*
alias:          pci:v00001737d00001032sv*sd00000024bc*sc*i*
alias:          pci:v000016ECd00000116sv*sd*bc*sc*i*
alias:          pci:v00001259d0000C107sv*sd*bc*sc*i*
alias:          pci:v00001186d00004302sv*sd*bc*sc*i*
alias:          pci:v00001186d00004300sv*sd*bc*sc*i*
alias:          pci:v000010ECd00008169sv*sd*bc*sc*i*
alias:          pci:v000010ECd00008168sv*sd*bc*sc*i*
alias:          pci:v000010ECd00008167sv*sd*bc*sc*i*
alias:          pci:v000010ECd00008136sv*sd*bc*sc*i*
alias:          pci:v000010ECd00008129sv*sd*bc*sc*i*
depends:        mii
intree:         Y
vermagic:       3.2.0-4-amd64 SMP mod_unload modversions 
parm:           use_dac:Enable PCI DAC. Unsafe on 32 bit PCI slot. (int)
parm:           debug:Debug verbosity level (0=none, ..., 16=all) (int)
You can see the "version" is "2.3LK-NAPI" as well as where it pulls any firmware files from.

To update the driver, you need a modular kernel. The following post will do the trick if you don't want to re-install your OS: http://forum.ovh.co.uk/showthread.php?t=5616

After you have changed your kernel to a modular one (provided you were using OVH's non-modular kernel), grab a copy of the driver from:

http://www.realtek.com.tw/downloads/...etDown=false#2

On Debian/Ubuntu, before you start you'd want to ensure you have basics installed to compile things:

Code:
apt-get install build-essential linux-headers-$(uname -r)
Note:
Quote Originally Posted by Phixion
I'd like to add that you need to install 'dkms' package to do this else you get nothing but errors...
Then untar the file you have downloaded:

Code:
tar vjxf r8168-8.036.00.tar.bz2
cd r8168-8.036.00
At this point, as you are likely to be using the r816x driver already, you have two options:

  1. Edit the "autorun.sh" file so that it does not unload the r816x driver, so you can watch the progress of compilation, or
  2. Allow autorun.sh to unload the driver (which means the server loses its internet connection - thus also your SSH connection) and let it continue to do its thing without you seeing the progress, then reboot automatically


If you are going the "blindly compile and reboot" route, just leave the "autorun.sh" file as-is and use:

Code:
./autorun.sh;  reboot
I would only recommend this if you trust everything will go right. If you don't trust that, then I'd recommend editing autorun.sh as outlined below:

Comment out (using "#") the lines that contain:

Code:
/sbin/rmmod r8169
and

Code:
/sbin/rmmod r8168
These are lines 15 and 21 respectively, so that section will look like this after commenting the lines out:

Code:
...
check=`lsmod | grep r8169`
if [ "$check" != "" ]; then
        echo "rmmod r8169"
#        /sbin/rmmod r8169
fi

check=`lsmod | grep r8168`
if [ "$check" != "" ]; then
        echo "rmmod r8168"
#        /sbin/rmmod r8168
fi
...
Now run:

Code:
./autorun.sh
Provided everything went OK, you will now need to reboot the server. Just issue a "reboot" command. If things did not go as planned, "autorun.sh" will have made a backup of the previous driver, which can be found with:

Code:
ls /lib/modules/$(uname -r)/kernel/drivers/net/ethernet/realtek
It will have the file extension ".bak". Just remove its counterpart with the extension ".ko" and rename the one with the extension ".bak" to ".ko". Then issue "update-initramfs -u -k $(uname -r)" to ensure initramfs has the restored driver too.
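
The manual restore described above could be scripted roughly like this. It is only a sketch of those steps: the function name restore_backups is made up, and it assumes the backup sits next to the driver with the same base name, as described.

```shell
#!/bin/sh
# Hypothetical helper: for every "*.bak" file in the given directory, remove
# the ".ko" file with the same base name and rename the backup to ".ko".
restore_backups() {
    dir="$1"
    for bak in "$dir"/*.bak; do
        [ -e "$bak" ] || continue      # no backups found, nothing to do
        ko="${bak%.bak}.ko"
        rm -f "$ko"
        mv "$bak" "$ko"
        echo "restored $ko"
    done
}

# Usage (as root), followed by the initramfs refresh mentioned above:
#   restore_backups "/lib/modules/$(uname -r)/kernel/drivers/net/ethernet/realtek"
#   update-initramfs -u -k "$(uname -r)"
```

Look at the directory listing first, so you know which files it is going to touch.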

After reboot, your driver should have a different version:

Code:
# modinfo r8168
filename:       /lib/modules/3.2.0-4-amd64/kernel/drivers/net/ethernet/realtek/r8168.ko
version:        8.036.00-NAPI
license:        GPL
description:    RealTek RTL-8168 Gigabit Ethernet driver
author:         Realtek and the Linux r8168 crew 
srcversion:     B921A700CADB0EADC1DE1BF
alias:          pci:v00001186d00004300sv00001186sd00004B10bc*sc*i*
alias:          pci:v000010ECd00008168sv*sd*bc*sc*i*
depends:        
vermagic:       3.2.0-4-amd64 SMP mod_unload modversions 
parm:           eee_enable:int
parm:           speed:force phy operation. Deprecated by ethtool (8). (ushort)
parm:           duplex:force phy operation. Deprecated by ethtool (8). (int)
parm:           autoneg:force phy operation. Deprecated by ethtool (8). (int)
parm:           aspm:Enable ASPM. (int)
parm:           rx_copybreak:Copy breakpoint for copy-only-tiny-frames (int)
parm:           use_dac:Enable PCI DAC. Unsafe on 32 bit PCI slot. (int)
parm:           debug:Debug verbosity level (0=none, ..., 16=all) (int)
As you can see, the version is now "8.036.00-NAPI" as opposed to "2.3LK-NAPI" as shown in the beginning.
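
If you only care about the version line, it can be picked out of the modinfo output with a one-liner. A small sketch (the function name module_version is made up):

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "version:" field from modinfo
# output supplied on stdin. "srcversion:" is a different field and is skipped.
module_version() {
    awk '$1 == "version:" { print $2; exit }'
}

# Usage:
#   modinfo r8168 | module_version
#   modinfo r8169 | module_version
```

"modinfo -F version r8168" should give the same result without the helper. Either way this describes the module file on disk, which is not necessarily the driver that is currently loaded.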

You can also see if this is in fact the driver being used by performing an "lspci" to determine the location of your network card:

Code:
# lspci
00:00.0 Host bridge: Intel Corporation 82945G/GZ/P/PL Memory Controller Hub (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82945G/GZ Integrated Graphics Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 1 (rev 01)
00:1c.2 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 3 (rev 01)
00:1c.3 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 4 (rev 01)
00:1d.0 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #2 (rev 01)
00:1d.3 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation N10/ICH 7 Family USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 IDE interface: Intel Corporation N10/ICH7 Family SATA IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation N10/ICH 7 Family SMBus Controller (rev 01)
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet controller (rev 02)
Which here is the last entry (01:00.0). You can find out more details now, using:

Code:
lspci -s 01:00.0 -v
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet controller (rev 02)
	Subsystem: Intel Corporation Device 0001
	Flags: bus master, fast devsel, latency 0, IRQ 43
	I/O ports at 1000 [size=256]
	Memory at 90100000 (64-bit, non-prefetchable) [size=4K]
	Memory at 90000000 (64-bit, prefetchable) [size=64K]
	[virtual] Expansion ROM at 90020000 [disabled] [size=128K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Endpoint, MSI 01
	Capabilities: [ac] MSI-X: Enable- Count=2 Masked-
	Capabilities: [cc] Vital Product Data
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Virtual Channel
	Capabilities: [160] Device Serial Number 01-00-00-00-00-00-00-00
	Kernel driver in use: r8169
As seen here, it does indeed use the r8169 driver. Use additional "v"s in the command option to generate more output, i.e., "-vvv".
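
An alternative to reading the lspci output is to ask sysfs which driver an interface is bound to. A minimal sketch, assuming the interface is called eth0 (the function name nic_driver is made up, and the second argument exists only to make it testable):

```shell
#!/bin/sh
# Hypothetical helper: print the name of the kernel driver bound to a network
# interface by following the "driver" symlink below /sys/class/net.
nic_driver() {
    iface="$1"
    sysfs="${2:-/sys}"
    link="$sysfs/class/net/$iface/device/driver"
    [ -L "$link" ] || return 1         # unknown interface or no bound driver
    basename "$(readlink "$link")"
}

# Usage:
#   nic_driver eth0
```

This is handy in scripts, for example to confirm after a reboot whether r8168 or r8169 ended up being used.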

JakeMS
06-08-2013, 12:44
I had the driver installed with older, pre-3.9 kernels. However, I run a mainline kernel (on CentOS 6.4; mainline, as my hardware is too "new" for CentOS) and the driver is incompatible with the newer kernels. There was a patch floating around somewhere, but I misplaced it.

The official driver made a "slight" difference: it did not fix the problem completely, it mainly just made it a lot less frequent.

But I do not bother to put it into my newer kernels now, and just restart the network when it falls over. Granted this is not a viable solution for a dedicated server.

Typically, "service network restart" on CentOS brings the device straight back up.

I guess as a bit of a "hacky" workaround you could set a cronjob up to run and ping a remote server, and if it gets no response, restart the network. Not ideal, but you'd be able to sleep knowing if it falls over it will come back up.
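
A rough sketch of that cron idea, in case it helps anyone. Everything here is illustrative: the function name net_watchdog, the ping options and the target address are assumptions, and the restart command is the CentOS one mentioned above.

```shell
#!/bin/sh
# Hypothetical network watchdog: ping a host, restart networking if no reply.
# Both commands can be overridden through the environment.
PING_CMD="${PING_CMD:-ping -c 3 -W 5}"
RESTART_CMD="${RESTART_CMD:-service network restart}"

net_watchdog() {
    host="$1"
    # Any reply within the ping run counts as "network is up".
    if $PING_CMD "$host" >/dev/null 2>&1; then
        return 0
    fi
    echo "no reply from $host, restarting network" >&2
    $RESTART_CMD
}

# Usage: put a call such as
#   net_watchdog 203.0.113.1
# at the end of a script and run it from root's crontab, e.g. every 5 minutes:
#   */5 * * * * /usr/local/sbin/net-watchdog.sh
```

Pick a target that is reliably up (your gateway, for instance), otherwise the watchdog will restart a perfectly healthy network. It also won't help when the whole box locks up rather than just the NIC.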

rizuk
06-08-2013, 12:33
Cheers Jake. So if you're still getting crashes, is there no fix to be found?
Which kernel did you install?

JakeMS
06-08-2013, 08:54
@Thelen:
The command in op looks like "lspci". This command lists all known pci devices. Though may include graphics cards as well (including AGP).

The log in post #4 looks like it is from either dmesg, /var/log/messages, lspci -vv or ethtool.

If you would like to learn a little more about identifying information about a network interface have a read here:
http://www.cyberciti.biz/faq/linux-f...card-is-using/

@OP:
I have the same problem with my desktop PC running:
Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 09)

I think it's the driver. A while ago the recommended solution was to download and compile the official module for the controller.

You can see it's very common:
https://duckduckgo.com/?q=driver+RTL8111%2F8168B
https://bugs.launchpad.net/ubuntu/+s...ux/+bug/240470

Additionally, you can try fixing it yourself by grabbing the driver from here:
http://www.realtek.com.tw/downloads/...&GetDown=false

Personally, even after using the official driver I still had issues, but I find it's not so much about how high the speed is, but about how many active connections there are as well.

Edit: Woops, forgot to reply to op :-P.
Edit #2: Just a note:

The above driver does not work on the 3.9+ kernel series. Additionally, the driver can be put into any kernel (except OVH grs, due to lack of module support), as it will just unload the incorrect module and install the correct one.

Thelen
06-08-2013, 08:39
what command are you running to show that info?

rizuk
06-08-2013, 04:12
Quote Originally Posted by raxxeh
Is this on your mSP?

A friend of mine has had an issue with one of his servers (mSP) crashing repeatedly for the last 3-5 days....

I'll direct him here; although I'm yet to have any issues - but I haven't had any extended periods of high usage either.
Yep mate, it's the mSP server. It seems to happen at speeds over 100 Mbps after about 20 mins ;/

I've seen this old thread: http://forum.ovh.co.uk/showthread.php?t=3831

Maybe it's a fix, but I'd have to change kernels..



03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 09)
Subsystem: ASUSTeK Computer Inc. Device 8505
Flags: bus master, fast devsel, latency 0, IRQ 45
I/O ports at e000 [size=256]
Memory at f0004000 (64-bit, prefetchable) [size=4K]
Memory at f0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
Kernel driver in use: r8169

wrong driver?

raxxeh, does yours say the same?

Thelen
06-08-2013, 02:57
Hmm, sounds indeed like the same problem I had. They said they replaced the mobo, but it was still happening. So then they asked to replace everything (PSU, RAM, CPU etc.), and it hasn't crashed in 12 hrs.

No idea TBH.

BTW where is that CLI log thing from, I'll check mine.

up 14:24, though

raxxeh
06-08-2013, 02:54
Is this on your mSP?

A friend of mine has had an issue with one of his servers (mSP) crashing repeatedly for the last 3-5 days....

I'll direct him here; although I'm yet to have any issues - but I haven't had any extended periods of high usage either.

rizuk
06-08-2013, 02:20
I had this problem last year but never got round to sorting it. Now it's the same on my new server: if I transfer over 100 Mbps for a while, it will crash the server.

I'm sure it's the network driver/OVH kernel.

Has anyone had this problem?

PCI devices

-8086:0158 Intel Corporation Unknown device: 0158
-8086:0151 Intel Corporation Unknown device: 0151
-8086:016a Intel Corporation Unknown device: 016a
-8086:1e31 Intel Corporation Unknown device: 1e31
-8086:1e3a Intel Corporation Unknown device: 1e3a
-8086:1e2d Intel Corporation Unknown device: 1e2d
-8086:1e10 Intel Corporation Unknown device: 1e10
-8086:1e18 Intel Corporation Unknown device: 1e18
-8086:244e Intel Corporation 82801 PCI Bridge
-8086:1e26 Intel Corporation Unknown device: 1e26
-8086:1e4a Intel Corporation Unknown device: 1e4a
-8086:1e02 Intel Corporation Unknown device: 1e02
-8086:1e22 Intel Corporation Unknown device: 1e22
-10ec:8168 Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller
-1b21:1080 Unknown vendor: 1b21 Unknown device: 1080