OVH Community, your new community space.

Persistent/Random Server Crashes? Ethernet controller: Realtek Semiconductor Co., Ltd


rizuk
06-08-2013, 02:53
Sorry for bumping a thread this old, but I have this problem too: my server crashes at anything over 100 Mbps. It has the same wrong driver as the OP's. I don't want to risk doing the kernel update. Is there any other way?



03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 09)
Subsystem: ASUSTeK Computer Inc. Device 8505
Flags: bus master, fast devsel, latency 0, IRQ 45
I/O ports at e000 [size=256]
Memory at f0004000 (64-bit, prefetchable) [size=4K]
Memory at f0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
Kernel driver in use: r8169

Myatu
17-03-2010, 20:16
I had to use Realtek's own driver with the Proxmox distro at one point (see http://forum.ovh.co.uk/showthread.php?t=2993 -- seems to have been resolved with 1.5). You can use that article as well, then simply blacklist the r8169 driver and run depmod and update-initramfs.
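
The blacklist step mentioned above might look roughly like this. It is only a sketch, assuming a Debian-style layout (which Proxmox uses); the helper name blacklist_r8169 and the file name blacklist-r8169.conf are made up for illustration and do not come from the linked article.

```shell
# Hypothetical helper: write a modprobe blacklist entry for r8169.
# The directory is a parameter so it can be tried on a scratch directory first;
# the file name blacklist-r8169.conf is an arbitrary choice.
blacklist_r8169() {
    conf_dir="${1:-/etc/modprobe.d}"
    printf 'blacklist r8169\n' > "$conf_dir/blacklist-r8169.conf"
}

# On the real server (as root), once Realtek's r8168 driver is installed:
#   blacklist_r8169
#   depmod -a              # rebuild the module dependency list
#   update-initramfs -u    # so the blacklist also applies inside the initramfs
```

Only do this once the replacement driver is in place, otherwise the network card may end up with no driver at all after a reboot.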

_Lemon_
17-03-2010, 13:52
Hello all,

I have recently been running into a lot more server crashes and it has had me stumped for a while. That is, until last night, when I found the cause of the problem to be a bad driver loaded for the network interface card:

Code:
[blue ~] lspci | grep realtek -i
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
You can check which module is loaded by running "lspci -v" (there will be a lot of information about other hardware, but it should be in there):

Code:
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
    Subsystem: Intel Corporation Device d613
    Flags: bus master, fast devsel, latency 0, IRQ 27
    I/O ports at e000 [size=256]
    Memory at d0414000 (64-bit, prefetchable) [size=4K]
    Memory at d0410000 (64-bit, prefetchable) [size=16K]
    [virtual] Expansion ROM at 40400000 [disabled] [size=64K]
    Capabilities: [40] Power Management version 3
    Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
    Capabilities: [70] Express Endpoint, MSI 01
    Capabilities: [ac] MSI-X: Enable- Mask- TabSize=4
    Capabilities: [cc] Vital Product Data 
    Capabilities: [100] Advanced Error Reporting 
    Capabilities: [140] Virtual Channel 
    Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
    Kernel driver in use: r8169
The line to look at is the last one, "Kernel driver in use: r8169". "r8169" is the driver causing grief on the system. It appears that this driver doesn't work out of the box. To top this off, none of OVH's kernels appear to have the right module loaded either (I tested from a netboot).
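
To avoid scrolling through the full listing, something like the following could pull out just the driver line for the network card. It is a rough sketch: the helper name nic_drivers is made up, and it assumes the usual "lspci -v" layout where each device starts on an unindented line and its details are indented below it.

```shell
# Hypothetical helper: read `lspci -v` output on stdin and print the
# "Kernel driver in use" value for every Ethernet controller.
nic_drivers() {
    awk '
        /^[^ \t]/ { in_nic = ($0 ~ /Ethernet controller/) }   # unindented line starts a new device
        in_nic && /Kernel driver in use:/ { print $NF }
    '
}

# Usage on the server:
#   lspci -v | nic_drivers
```

If this prints r8169 for an RTL8111/8168B card, the server is in the situation described above.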

What we need to do is grab the latest working driver from Realtek's website and use that one instead. The current solution I have involves recompiling the kernel to support modules and then loading/unloading the module.

Here's how to compile your kernel and load the correct module. I have intentionally tried to make it detailed enough to allow anyone to do this (and yes this is all via SSH):

Code:
wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.32.10.tar.bz2
tar -xjvf "linux-2.6.32.10.tar.bz2" -C /usr/src
rm linux-2.6.32.10.tar.bz2
cd "/usr/src/linux-2.6.32.10/"

make clean
make mrproper
Now you need to get your .config file sorted. Here are three ways to do it (try to use the first if possible):

1) Use the output of /proc/config.gz (I don't believe this is enabled on OVH kernels):

Code:
zcat /proc/config.gz > .config
2) The second possibility is to use OVH's kernel configuration found somewhere on this site. However I can't find these at all.

3) The next possibility is to use Feral's configuration (this will work but may have extra, harmless things enabled such as ext4 support, IPv6, traffic control, file quota, etc):

Code:
wget ftp://scarlet.feralhosting.com/kernel/config/config-2.6.32.9 -O .config
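
Going back to option 1: since /proc/config.gz may well be missing, it can help to test for it before relying on it. This is only a sketch; the helper name copy_running_config is made up, and the optional second argument exists purely so it can be tried against a test file.

```shell
# Hypothetical helper: copy the running kernel's config to a destination
# if the kernel exposes it; return non-zero so the caller can fall back.
copy_running_config() {
    dest="$1"
    src="${2:-/proc/config.gz}"   # second argument only exists to ease testing
    if [ -r "$src" ]; then
        gzip -dc "$src" > "$dest"
    else
        echo "no $src on this kernel, use one of the other options" >&2
        return 1
    fi
}

# Usage from /usr/src/linux-2.6.32.10:
#   copy_running_config .config || echo "fall back to option 2 or 3"
```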
Now that you have the configuration all sorted, we need to (1) enable module support and (2) compile r8169 as a kernel module. To begin, run:

Code:
make menuconfig
This should present you (after briefly compiling a few things) with a blue/grey console to edit the configuration. Use up/down to select the menu item found in the centre of the screen and left/right to select the action on the bottom row (e.g. Select, Exit, Help). Press Enter to go into an item and the space bar to toggle an item (some items have more than one value to toggle).

If you're curious about something, go right to select "Help" and press enter to receive more information -- just try not to get too carried away here!
  1. Go down and highlight "Enable loadable module support" and press space bar. This should put a "[ * ]" symbol next to it.
  2. Go down until "Device Drivers" is selected and press enter (revealing more options).
  3. Go down until "Network device support" is selected and press enter.
  4. Go down until "Ethernet (1000 Mbit)" is selected and press enter. (Even if your server is 100 Mbps you may still have a 1 Gbps network interface card).
  5. Go down until "Realtek 8169 gigabit ethernet support" is selected.
  6. Press space bar (typically just twice) until you see "<M>" before it. This means that it will be built as a module.
  7. Go right and highlight "Exit" and press enter. Keep doing this until asked "Do you wish to save your new kernel configuration?". Select yes and press enter (very important!).
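
Before starting the long compile, the saved .config can be checked to confirm that steps 1 to 7 took effect. A rough sketch: the helper name check_module_config is made up, and it assumes the two menu entries correspond to the symbols CONFIG_MODULES and CONFIG_R8169, which as far as I know are the names used in the 2.6.32 tree.

```shell
# Hypothetical helper: verify a kernel .config has loadable module support
# enabled and the r8169 driver set to build as a module.
check_module_config() {
    cfg="${1:-.config}"
    grep -qx 'CONFIG_MODULES=y' "$cfg" || { echo "module support is off" >&2; return 1; }
    grep -qx 'CONFIG_R8169=m' "$cfg"   || { echo "r8169 is not set to <M>" >&2; return 1; }
    echo "ok"
}

# Usage from /usr/src/linux-2.6.32.10:
#   check_module_config .config
```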


You've now got your configuration, hooray! Now to compile the kernel, which should take a while so let it sit there for 30 minutes or so:

Code:
make
make modules
make modules_install
make install
Now you have to update lilo (by default OVH servers use lilo; if you use grub you probably just have to run "update-grub"). You can do this by editing /etc/lilo.conf:

Code:
nano /etc/lilo.conf
Replace the line starting with "image=/boot/old-kernel" with "image=/boot/vmlinuz-2.6.32.10" (you're specifying the path to the kernel you've just created). Now run this command:

Code:
lilo
It should output something similar to:

Code:
[loquat /usr/src/linux-2.6.32.10] lilo
Added Linux *
If not, diagnose the error and sort it out (you most probably aren't specifying the correct kernel).
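
One quick way to catch a wrong kernel path is to check that every image= entry in lilo.conf points at a file that actually exists. This is only a sketch: the helper name check_lilo_images is made up, and it assumes unquoted paths without spaces in them.

```shell
# Hypothetical helper: list every image= path in a lilo.conf and flag
# the ones that do not exist on disk.
check_lilo_images() {
    conf="${1:-/etc/lilo.conf}"
    status=0
    for img in $(sed -n 's/^[[:space:]]*image[[:space:]]*=[[:space:]]*//p' "$conf"); do
        if [ -e "$img" ]; then
            echo "found:   $img"
        else
            echo "missing: $img"
            status=1
        fi
    done
    return $status
}

# Usage on the server, before rebooting:
#   check_lilo_images /etc/lilo.conf
```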

You will now need to reboot into your new kernel:

Code:
shutdown -r now
Once up you can double check you're using the new kernel like so:

Code:
[celadon ~] uname -a
Linux celadon.feralhosting.com 2.6.32.10 #1 SMP Wed Mar 17 13:25:07 UTC 2010 x86_64 GNU/Linux
Now we get to the good part, replacing that wonky module! You can find the original download site on Realtek's website but unfortunately it's not easy to get a direct download link so I've put it on my FTP. You can find their website here: http://www.realtek.com.tw/downloads/...etDown=false#2

The following will untar and run their installer (it's a small bash script):

Code:
wget ftp://scarlet.feralhosting.com/kernel/r8168-8.017.00.tar.bz2
tar -xvjf r8168-8.017.00.tar.bz2
cd r8168-8.017.00/
./autorun.sh
You will get errors with this if you followed everything by the book. The script attempts to unload the old module, but you didn't enable module unloading in the kernel. This is a good thing as you're currently using the network card! (Unless of course you live in the datacentre...)

Code:
[saffron ~/r8168-8.017.00] ./autorun.sh

Check old driver and unload it.
rmmod r8169
FATAL: Kernel does not have unload support.
Build the module and install
/root/r8168-8.017.00/src/r8168_n.c: In function 'rtl8168_close':
/root/r8168-8.017.00/src/r8168_n.c:8732: warning: unused variable 'ioaddr'
Backup r8169.ko
rename r8169.ko to r8169.bak
Depending module. Please wait.
load module r8168
Completed.
You should then be able to see the module "r8168" loaded when running "lsmod" but wait, "lspci -v" is still listing the faulty "r8169"! Simply reboot to finish it off:

Code:
shutdown -r now
Once the server has restarted (it can take a good few nerve-racking minutes) you should see the end of the "lspci -v" output now read:

Code:
    Kernel driver in use: r8168
    Kernel modules: r8168
...and voila! You now have a decent Realtek driver running.
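
For a final sanity check after the reboot, the "lsmod" listing can be scanned for both module names. A rough sketch; the helper name realtek_module_state is made up, and the messages are just illustrative wording.

```shell
# Hypothetical helper: read `lsmod` output on stdin and report whether
# r8168 is loaded and r8169 is gone.
realtek_module_state() {
    awk '
        $1 == "r8168" { new = 1 }
        $1 == "r8169" { old = 1 }
        END {
            if (new && !old)      print "ok: r8168 loaded, r8169 absent"
            else if (new && old)  print "both loaded: reboot may still be pending"
            else                  print "r8168 not loaded"
            exit !(new && !old)
        }
    '
}

# Usage on the server:
#   lsmod | realtek_module_state
```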

I hope this serves someone else well. I'm not even sure how much OVH support knew about it (thank you Max for pushing me in the right direction).