Udev CPU Usage 100%....!


Myatu
08-08-2009, 18:09
Glad it got solved. Sometimes the most mundane-looking things can cause such havoc!

impstimp
08-08-2009, 11:12
OK, we figured out that the problem was with a CGI toplist. We removed the call to the script from the main index page and udev seems to have gone back to normal.

Thanks for all the help we received in tracking down the problem. Myatu especially, you pointed us in the right direction.

impstimp
08-08-2009, 07:39
Hi,

Yes, I've run strace -p. There's a lot of output...

--- SIGCHLD (Child exited) @ 0 (0) ---
write(6, "\0", 1) = 1
rt_sigreturn(0x6) = 140000635861295
select(8, [3 4 5 7], NULL, NULL, NULL) = 2 (in [4 5])
recvmsg(4, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000001}, msg_iov(1)=[{"remove@/kernel/uids/510\0ACTION=r"..., 2560}], msg_controllen=0, msg_flags=0}, 0) = 93
open("/dev/.udev/uevent_seqnum", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 9
write(9, "1590953\n", 8) = 8
close(9) = 0
unlink("/dev/.udev/failed/kernel@uids@510") = -1 ENOENT (No such file or directory)
rmdir("/dev/.udev/failed") = -1 ENOTEMPTY (Directory not empty)
stat("/dev/.udev/queue", {st_mode=S_IFDIR|0755, st_size=60, ...}) = 0
symlink("/sys/kernel/uids/510", "/dev/.udev/queue/kernel@uids@510") = -1 EEXIST (File exists)
read(5, "\0", 256) = 1
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 23802
unlink("/dev/.udev/failed/kernel@uids@510") = -1 ENOENT (No such file or directory)
rmdir("/dev/.udev/failed") = -1 ENOTEMPTY (Directory not empty)
wait4(-1, 0x7fff780b014c, WNOHANG, NULL) = -1 ECHILD (No child processes)
open("/proc/stat", O_RDONLY) = 9
fstat(9, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f54700a0000
read(9, "cpu 9942374 450 2209292 1284193"..., 1024) = 1024
read(9, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 1024) = 889
close(9) = 0
munmap(0x7f54700a0000, 4096) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f54700947a0) = 23813
select(8, [3 4 5 7], NULL, NULL, NULL) = 1 (in [4])
recvmsg(4, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000001}, msg_iov(1)=[{"add@/kernel/uids/510\0ACTION=add\0"..., 2560}], msg_controllen=0, msg_flags=0}, 0) = 87
open("/dev/.udev/uevent_seqnum", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 9
write(9, "1590954\n", 8) = 8
close(9) = 0
unlink("/dev/.udev/failed/kernel@uids@510") = -1 ENOENT (No such file or directory)
rmdir("/dev/.udev/failed") = -1 ENOTEMPTY (Directory not empty)
stat("/dev/.udev/queue", {st_mode=S_IFDIR|0755, st_size=60, ...}) = 0
symlink("/sys/kernel/uids/510", "/dev/.udev/queue/kernel@uids@510") = -1 EEXIST (File exists)
open("/proc/stat", O_RDONLY) = 9
fstat(9, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f54700a0000
read(9, "cpu 9942386 450 2209297 1284195"..., 1024) = 1024
read(9, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 1024) = 889
close(9) = 0
munmap(0x7f54700a0000, 4096) = 0
select(8, [3 4 5 7], NULL, NULL, NULL) = 1 (in [4])
recvmsg(4, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000001}, msg_iov(1)=[{"remove@/kernel/uids/510\0ACTION=r"..., 2560}], msg_controllen=0, msg_flags=0}, 0) = 93
open("/dev/.udev/uevent_seqnum", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 9
write(9, "1590955\n", 8) = 8
close(9) = 0
unlink("/dev/.udev/failed/kernel@uids@510") = -1 ENOENT (No such file or directory)
rmdir("/dev/.udev/failed") = -1 ENOTEMPTY (Directory not empty)
stat("/dev/.udev/queue", {st_mode=S_IFDIR|0755, st_size=60, ...}) = 0
symlink("/sys/kernel/uids/510", "/dev/.udev/queue/kernel@uids@510") = -1 EEXIST (File exists)
open("/proc/stat", O_RDONLY) = 9
fstat(9, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f54700a0000
read(9, "cpu 9942397 450 2209300 1284197"..., 1024) = 1024
read(9, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 1024) = 889
close(9) = 0
munmap(0x7f54700a0000, 4096) = 0
select(8, [3 4 5 7], NULL, NULL, NULL) = 1 (in [4])
recvmsg(4, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000001}, msg_iov(1)=[{"add@/kernel/uids/510\0ACTION=add\0"..., 2560}], msg_controllen=0, msg_flags=0}, 0) = 87
open("/dev/.udev/uevent_seqnum", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 9
write(9, "1590956\n", 8) = 8
close(9) = 0
unlink("/dev/.udev/failed/kernel@uids@510") = -1 ENOENT (No such file or directory)
rmdir("/dev/.udev/failed") = -1 ENOTEMPTY (Directory not empty)
stat("/dev/.udev/queue", {st_mode=S_IFDIR|0755, st_size=60, ...}) = 0
symlink("/sys/kernel/uids/510", "/dev/.udev/queue/kernel@uids@510") = -1 EEXIST (File exists)
open("/proc/stat", O_RDONLY) = 9
fstat(9, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f54700a0000
read(9, "cpu 9942408 450 2209300 1284199"..., 1024) = 1024
read(9, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 1024) = 889
close(9) = 0
munmap(0x7f54700a0000, 4096) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
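
A trace this long is easier to read once the netlink events are tallied. The sketch below is only an illustration in Python: the names count_uevents, UEVENT_RE and SAMPLE are made up for this example, and the two sample lines are copied from the trace above. It counts the action and device path at the start of each recvmsg() payload, which for this trace would show udevd handling alternating remove and add events for /kernel/uids/510.

```python
import re
from collections import Counter

# Matches the start of a uevent payload as strace prints it, for example:
#   msg_iov(1)=[{"remove@/kernel/uids/510\0ACTION=r"..., 2560}]
# The device path ends at the first backslash (the \0 separator) or quote.
UEVENT_RE = re.compile(r'\[\{"(?P<action>[a-z]+)@(?P<devpath>[^\\"]+)')


def count_uevents(strace_text):
    """Count (action, devpath) pairs found in recvmsg() lines of strace output."""
    counts = Counter()
    for line in strace_text.splitlines():
        if "recvmsg(" not in line:
            continue
        match = UEVENT_RE.search(line)
        if match:
            counts[(match.group("action"), match.group("devpath"))] += 1
    return counts


# Two lines copied from the trace above (raw string, so \0 stays literal).
SAMPLE = r'''
recvmsg(4, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000001}, msg_iov(1)=[{"remove@/kernel/uids/510\0ACTION=r"..., 2560}], msg_controllen=0, msg_flags=0}, 0) = 93
recvmsg(4, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000001}, msg_iov(1)=[{"add@/kernel/uids/510\0ACTION=add\0"..., 2560}], msg_controllen=0, msg_flags=0}, 0) = 87
'''

for (action, devpath), n in count_uevents(SAMPLE).most_common():
    print(f"{n:6d}  {action:8s} {devpath}")
```

On kernels that expose /sys/kernel/uids, an entry for a UID seems to appear and disappear as that user's processes start and exit, so a frequently executed script running as UID 510 would plausibly produce this pattern. Treat that as an assumption rather than a confirmed diagnosis.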

derchris
08-08-2009, 00:42
Can you run a strace against udev?

impstimp
08-08-2009, 00:30
Also, do I really need udev to be running? I'm not adding any more hardware to the server... Or will stopping it cause problems on reboot or with the filesystem?

impstimp
07-08-2009, 22:27
yeah, pretty annoying.

I went into /tmp and ran:
find . -type f -size 0 | xargs rm

to remove all 0-byte files; they are session files being created. The problem is that the main webserver is pretty busy: we're getting around 100-150k uniques a day, so it's a bit hard to just stop the webserver without "irritating" people.

I have noticed the memory slowly increases with udev as well....

I've just annoyed a few people by quickly stopping and starting httpd. udev CPU usage dropped to 2% whilst httpd was off and has now gone back up to 15% since httpd started again...

Myatu
07-08-2009, 22:04
Quote Originally Posted by impstimp
So it's not that /tmp is running out but maybe rather it's got too many 0 byte files ?
I saw it has 1K blocks, so the maximum number of files isn't as high as with 4K blocks (a few thousand with 1K blocks, compared to a few million with 4K). I forgot the exact calculation... But what's causing these 0-byte files is more the question, and in this case it could very well be related to the CPU usage. Sounds like a runaway process (a connection looping on itself? I've seen that before)... Do the filenames give a hint? If you progressively stop services (not system processes), does it stop? It's more a "process of elimination" at the moment, as I'm starting to run out of ideas myself.
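
To answer the "do the filenames give a hint?" question without scrolling through thousands of names, something like the following could help. It is a rough Python sketch, not a tested tool: the function name zero_byte_prefixes is invented here, and the idea of grouping by the part of the name before the first "_", "." or "-" is just an assumption about how temporary files tend to be named (PHP session files, for instance, usually start with "sess_").

```python
import os
import re
from collections import Counter


def zero_byte_prefixes(directory):
    """Count empty regular files in `directory`, grouped by filename prefix.

    The prefix is whatever comes before the first "_", "." or "-" in the
    name. Subdirectories and symlinks are skipped, not descended into.
    """
    counts = Counter()
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            if entry.stat(follow_symlinks=False).st_size != 0:
                continue
            prefix = re.split(r"[_.\-]", entry.name, maxsplit=1)[0]
            counts[prefix or "(no prefix)"] += 1
    return counts


# Example call (the path is an assumption; adjust as needed):
#   for prefix, n in zero_byte_prefixes("/tmp").most_common(10):
#       print(n, prefix)
```

If one prefix dominates, that usually narrows down which service or script is creating the files.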

impstimp
07-08-2009, 21:43
Yes, it's running cPanel, and after what you said I've done some digging as well. I'm not sure whether this error is occurring because the disk is out of space or because there are too many 0-byte files, as I believe the error suggests:

Aug 3 09:42:51 kernel: EXT3-fs warning (device loop0): ext3_dx_add_entry: Directory index full!


df /tmp replies this:

Filesystem     1K-blocks   Used  Available  Use%  Mounted on
/usr/tmpDSK       495844  31814     438430    7%  /tmp


and df -h replies this:

Filesystem    Size  Used  Avail  Use%  Mounted on
/dev/sda1      25G  3.1G    20G   14%  /
/dev/sda2     655G  4.3G   617G    1%  /home
/dev/shm      3.9G     0   3.9G    0%  /dev/shm
/usr/tmpDSK   485M   32M   429M    7%  /tmp


So it's not that /tmp is running out but maybe rather it's got too many 0 byte files ?

I tried cat /dev/null > tmpDSK (after backing it up, of course), but it just filled up again immediately...

I'm completely at a loss as to what to do to fix this problem. udev just keeps sitting at 100% usage, increasing the load on the server unnecessarily.
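
Since df only reports block usage, two other numbers may be worth a look: inode usage on the filesystem and the number of entries in the directory itself, because the ext3_dx_add_entry warning is about a directory's index filling up, not about free space. Below is a small Python sketch of both checks. The function names inode_usage and count_entries are made up for this example, and nothing here is specific to cPanel or to this server.

```python
import os


def inode_usage(path):
    """Return (used, total, percent_used) inodes for the filesystem holding `path`.

    Roughly the same numbers "df -i" would report. Some filesystems report
    zero total inodes, in which case the percentage is given as 0.0.
    """
    st = os.statvfs(path)
    used = st.f_files - st.f_ffree
    percent = 100.0 * used / st.f_files if st.f_files else 0.0
    return used, st.f_files, percent


def count_entries(directory):
    """Count the entries directly inside `directory` (no recursion)."""
    with os.scandir(directory) as entries:
        return sum(1 for _ in entries)


# Example calls (the path is an assumption; adjust as needed):
#   print(inode_usage("/tmp"))
#   print(count_entries("/tmp"))
```

A very large entry count in a single directory, with inode and block usage both low, would point at the directory index limit and not at the disk being full.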

Myatu
07-08-2009, 20:14
Quote Originally Posted by impstimp
/usr/tmpDSK on /tmp type ext3 (rw,noexec,nosuid,loop=/dev/loop0)
/tmp on /var/tmp type none (rw,noexec,nosuid,bind)
Hmm, not sure if that would be related then...

Are you running cPanel by any chance? Anyway, this is a so-called hardening of the "/tmp" directory: it turns the file "/usr/tmpDSK" into a loop-mounted temporary filesystem. Check whether you've run out of space with "df /tmp", or use "df -h" to see all your disks.

impstimp
07-08-2009, 15:54
Thanks for your help with this; all of it has been painfully irritating.

mount -l produces:

/dev/sda1 on / type ext3 (rw,errors=remount-ro) [/]
/dev/proc on /proc type proc (rw)
/dev/sys on /sys type sysfs (rw)
/dev/devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda2 on /home type ext3 (rw,usrquota) [/home]
/dev/shm on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
/usr/tmpDSK on /tmp type ext3 (rw,noexec,nosuid,loop=/dev/loop0)
/tmp on /var/tmp type none (rw,noexec,nosuid,bind)

Myatu
07-08-2009, 15:25
Quote Originally Posted by impstimp
Aug 3 09:42:51 kernel: EXT3-fs warning (device loop0): ext3_dx_add_entry: Directory index full!
I have a feeling that udevd and this error are related. What's mounted on loop0? (mount -l)

Quote Originally Posted by impstimp
dmesg | tail gives me this:

TCP: Treason uncloaked! Peer 84.182.72.121:1335/80 shrinks window 1388369763:1388374119. Repaired.
...
TCP: Treason uncloaked! Peer 84.182.72.121:1422/80 shrinks window 3009774095:3009782807. Repaired.
Usually this is fairly benign, but if you continue to see a high number of these messages repeating from particular IP sources, I'd block the IP or write up some filters... (The "SYN flood" was a hint as to why).
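
To see whether particular sources really do keep repeating, the messages can be tallied per peer address. The following is a minimal Python sketch, assuming the kernel message format shown in the quote above. The names TREASON_RE, peers_by_count and SAMPLE are invented for this example, and the sample lines are copied from the dmesg output in this thread.

```python
import re
from collections import Counter

# Matches lines such as:
#   TCP: Treason uncloaked! Peer 84.182.72.121:1335/80 shrinks window ...
TREASON_RE = re.compile(
    r"Treason uncloaked! Peer (?P<ip>\d{1,3}(?:\.\d{1,3}){3}):\d+/\d+ shrinks window"
)


def peers_by_count(log_text):
    """Return (ip, count) pairs for "Treason uncloaked" messages, most frequent first."""
    counts = Counter(m.group("ip") for m in TREASON_RE.finditer(log_text))
    return counts.most_common()


SAMPLE = """\
TCP: Treason uncloaked! Peer 84.182.72.121:1335/80 shrinks window 1388369763:1388374119. Repaired.
possible SYN flooding on port 80. Sending cookies.
TCP: Treason uncloaked! Peer 84.182.72.121:1422/80 shrinks window 3009774095:3009782807. Repaired.
"""

for ip, n in peers_by_count(SAMPLE):
    print(f"{n:6d}  {ip}")
```

Feeding it the full dmesg output or /var/log/messages instead of SAMPLE would give a per-address count to base any blocking decision on.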

impstimp
07-08-2009, 13:48
Hi,

Yeah, I've checked /var/log/messages, and this is the only thing I can see that repeats occasionally...

Aug 3 09:42:51 kernel: EXT3-fs warning (device loop0): ext3_dx_add_entry: Directory index full!

also on bootup:

Aug 7 14:38:04 smartd[5714]: smartd version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Aug 7 14:38:04 smartd[5714]: Home page is http://smartmontools.sourceforge.net/
Aug 7 14:38:04 smartd[5714]: Opened configuration file /etc/smartd.conf
Aug 7 14:38:04 smartd[5714]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Aug 7 14:38:04 smartd[5714]: Problem creating device name scan list
Aug 7 14:38:04 smartd[5714]: Device: /dev/sda, opened
Aug 7 14:38:04 smartd[5714]: Device /dev/sda, please try adding '-d 3ware,N'
Aug 7 14:38:04 smartd[5714]: Device /dev/sda, you may need to replace /dev/sda with /dev/twaN or /dev/tweN
Aug 7 14:38:04 smartd[5714]: Monitoring 0 ATA and 0 SCSI devices
Aug 7 14:38:04 smartd[5719]: smartd has fork()ed into background mode. New PID=5719.


dmesg | tail gives me this:

TCP: Treason uncloaked! Peer 84.182.72.121:1335/80 shrinks window 1388369763:1388374119. Repaired.
TCP: Treason uncloaked! Peer 84.182.72.121:1335/80 shrinks window 1388520771:1388526579. Repaired.
TCP: Treason uncloaked! Peer 84.182.72.121:1335/80 shrinks window 1389026067:1389033327. Repaired.
TCP: Treason uncloaked! Peer 84.182.72.121:1335/80 shrinks window 1389026067:1389033327. Repaired.
TCP: Treason uncloaked! Peer 84.182.72.121:1335/80 shrinks window 1389223539:1389229034. Repaired.
TCP: Treason uncloaked! Peer 84.182.72.121:1335/80 shrinks window 1389223539:1389229034. Repaired.
possible SYN flooding on port 80. Sending cookies.
TCP: Treason uncloaked! Peer 84.182.72.121:1422/80 shrinks window 3009774095:3009782807. Repaired.
TCP: Treason uncloaked! Peer 84.182.72.121:1422/80 shrinks window 3009774095:3009782807. Repaired.
TCP: Treason uncloaked! Peer 84.182.72.121:1422/80 shrinks window 3009774095:3009782807. Repaired.

This server is used only as the webserver and is the latest EG MAX on OVH.

Inside /etc/udev/rules.d is:

-rw-r--r-- 1 root root 515 Apr 18 01:45 05-udev-early.rules
-rw-r--r-- 1 root root 920 Apr 18 01:47 40-multipath.rules
-rw-r--r-- 1 root root 15647 Apr 18 01:45 50-udev.rules
-rw-r--r-- 1 root root 471 Apr 18 01:45 51-hotplug.rules
-rw-r--r-- 1 root root 143 Nov 13 2008 60-net.rules
-rw-r--r-- 1 root root 452 Jan 21 2009 60-raw.rules
-rw-r--r-- 1 root root 61 Apr 18 01:45 90-dm.rules
-rw-r--r-- 1 root root 82 Jan 21 2009 90-hal.rules
-rw-r--r-- 1 root root 107 Apr 18 01:45 95-pam-console.rules


Thanks for your assistance.

Myatu
07-08-2009, 11:25
Did you check "/var/log/messages" or "dmesg | tail" to see what messages are generated?

Are you using raid - specifically, are you using SOFTraid? And what's in /etc/udev/rules.d?

Udev does a lot, so there's a lot that needs checking too

impstimp
07-08-2009, 09:51
Hi,

We have a problem with the udevd process. Its CPU usage keeps slowly increasing up to 100%, and the only way we can sort it out is by stopping udev and starting it back up again, or rebooting the server... :/

Does anyone have any suggestions as to why this is occurring? Any help would really be appreciated.

Please see the server info below. It is used purely to run PHP and MySQL; other, static content is served from other servers:



top - 23:15:13 up 4 days, 6:10, 1 user, load average: 3.49, 4.46, 4.34
Tasks: 169 total, 2 running, 166 sleeping, 0 stopped, 1 zombie
Cpu(s): 34.6%us, 8.5%sy, 0.0%ni, 53.5%id, 0.0%wa, 0.2%hi, 3.1%si, 0.0%st
Mem: 8144108k total, 7255156k used, 888952k free, 292424k buffers
Swap: 10241428k total, 22076k used, 10219352k free, 1113820k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2447 root      16  -4  103m  91m  392 R 73.9  1.2  50:46.57 udevd
22754 nobody    20   0 1137m 169m 4816 S 18.3  2.1   0:07.88 httpd


uname -a
Linux 2.6.28.4-xxxx-std-ipv4-64 #2 SMP Wed Feb 18 16:34:21 UTC 2009 x86_64 x86_64 x86_64 GNU/Linux


rpm -qa kernel\* | sort
kernel-devel-2.6.18-128.2.1.el5
kernel-devel-2.6.18-128.4.1.el5
kernel-headers-2.6.18-128.4.1.el5



cat /proc/meminfo
MemTotal: 8144108 kB
MemFree: 566104 kB
Buffers: 292476 kB
Cached: 1115160 kB
SwapCached: 1092 kB
Active: 6299112 kB
Inactive: 408352 kB
SwapTotal: 10241428 kB
SwapFree: 10219352 kB
Dirty: 5840 kB
Writeback: 0 kB
AnonPages: 5298596 kB
Mapped: 19772 kB
Slab: 789216 kB
SReclaimable: 570204 kB
SUnreclaim: 219012 kB
PageTables: 37036 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 14313480 kB
Committed_AS: 25257092 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 136652 kB
VmallocChunk: 34359601711 kB
DirectMap4k: 3072 kB
DirectMap2M: 8376320 kB