OVH Community, your new community space.

High server load


Kode
15-11-2012, 20:36
I've redirected all the API users to the old server and the loads are much better on the main server, and barely registering on the API server.

The main server is idling at 1-2 load, which I think is entirely down to serving static content. So the current thought is: instead of getting http://www.hetzner.de/en/hosting/pro...rootserver/ex4 for the API server, get a KS 2G dedicated to serving images, and either a KS 2G for the API server or, if that's not powerful enough, a KS 8G. The current API server expires on Dec 27th, but we won't be doing our yearly payment for the servers until mid Jan, so the plan is to get a KS 2G in mid December and see how it holds up. If it copes, stick with it and buy an extra one for the images; if it doesn't, stick the images on it and get something like the KS 8G for the API.

And if we run out of the 5TB bandwidth, we can always serve the images from the main server (SP 32G).

It depends how much we raise by mid Jan, really; the target of 1440 would allow for the SP 32G + KS 2G + KS 8G, otherwise we'll have to compromise.

What are people's thoughts, would that work?

Kode
15-11-2012, 17:00
The old one was probably only faster because it had nothing else bogging it down, though; I just want to know if the SSD would have any bearing on this. The new one has Plesk, and therefore MPM prefork.

Myatu
15-11-2012, 16:59
Have you used mysqltuner to see if there may be some areas for improvement?

Code:
wget http://mysqltuner.pl/ -O mysqltuner.pl && chmod +x mysqltuner.pl
./mysqltuner.pl

Andy
15-11-2012, 16:56
So the old one had the SSD, the new one doesn't, and the old one was faster? If so, then I'd say your SSD probably was helping a lot. Depending on the DB size, you might consider holding it entirely in memory, as that would give you virtually unlimited speed compared to an SSD or regular disk.

I don't use Linux, so I don't know what tools are available to monitor disk load, but you should monitor this, along with memory and CPU. Graph it if you can and monitor it over time; there are plenty of tools out there for that. Graphs help a hell of a lot in diagnosing issues like this.

E.G. http://www.cacti.net/

Kode
15-11-2012, 16:50
My old server doesn't expire until December, so today I reinstalled it and put a base Debian install on it. I installed MySQL, PHP5, and Nginx, and set up MySQL replication from the main server; it took a bit of messing about, but I got the API working on it.

Then from my main server I ran ab -k -n 10000 -c 50 http://176.31.123.148/webservice/new...7edaa78675ac2/, which took 38.504 seconds, and load only got up to 0.5. I then changed the IP to the main site and ran it again.

Now, the main site was idling around 2-4 load anyway and has a lot more to do, also serving the real API and images, but the load went up to about 125 at its height and the run took 20.8 minutes to complete.

Would the SSD in the old server have contributed to the speedy results or would it have been purely running in memory?

Kode
14-11-2012, 14:10
Are there any MySQL gurus that can help optimise some queries?

I turned slow query logging on, and almost all of the results come from just a few queries.

The seemingly most common: whenever anyone downloads a file through the API, they are actually hitting a PHP file which increments the download count with a query such as:

UPDATE movie_images SET image_downloads = image_downloads+1 WHERE image_id = '12809'

image_id is a primary key and the table is using InnoDB.
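One common way to take the pressure off per-download UPDATEs like this is to batch the increments in the application (or in memcached) and flush them periodically, so a burst of hits on the same image becomes a single UPDATE. A rough sketch of the idea, in Python purely for illustration (the site itself is PHP + MySQL; `DownloadCounter` and its flush threshold are hypothetical names, and the list of SQL strings stands in for executing the statements):

```python
from collections import Counter

class DownloadCounter:
    """Batch per-image download increments in memory and flush them
    as one UPDATE per distinct image, instead of one UPDATE per hit.
    Illustrative sketch only - not the forum poster's actual code."""

    def __init__(self, flush_threshold=100):
        self.pending = Counter()            # image_id -> pending increments
        self.flush_threshold = flush_threshold
        self.flushed = []                   # stands in for executed SQL

    def record_download(self, image_id):
        self.pending[image_id] += 1
        if sum(self.pending.values()) >= self.flush_threshold:
            self.flush()

    def flush(self):
        for image_id, count in self.pending.items():
            # One statement per distinct image, not per download
            self.flushed.append(
                "UPDATE movie_images SET image_downloads = image_downloads + %d "
                "WHERE image_id = %d" % (count, image_id)
            )
        self.pending.clear()
```

The trade-off is that counts lag slightly behind reality until the next flush, which is usually acceptable for a download counter.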

The second is more complex and is actually the result of an API request; the following is an example of a request to the Movie API:

Code:
SELECT *, IFNULL(SUM(likes.vote), 0) AS likecount
FROM movie_images img
LEFT JOIN movie_items item ON item.movie_tmdb_id = img.image_movie_tmdb_id
LEFT JOIN fanart_types ON type_id = image_type
LEFT JOIN fanart_image_like likes
       ON likes.like_image_id = img.image_id AND likes.type = '3'
WHERE (img.image_movie_tmdb_id = 10824 OR item.movie_imdb_id = 10824)
  AND img.image_active = 'y'
GROUP BY img.image_id
ORDER BY likecount DESC;

The results of the query are memcached for 24 hours, but I'm not sure how useful that is.
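How useful the 24-hour memcache is depends on how often the same movie is requested within the window, which is easy to measure by counting hits and misses. A minimal cache-aside sketch, in Python for illustration only (a plain dict stands in for memcached; `QueryCache` is a hypothetical name):

```python
import time

class QueryCache:
    """Minimal cache-aside sketch. The hit/miss counters tell you
    whether the TTL is actually saving query executions.
    Illustrative only - memcached would replace the dict in practice."""

    def __init__(self, ttl=24 * 3600):
        self.ttl = ttl
        self.store = {}                 # key -> (expires_at, value)
        self.hits = self.misses = 0

    def get_or_run(self, key, run_query):
        now = time.time()
        entry = self.store.get(key)
        if entry and entry[0] > now:
            self.hits += 1              # served from cache, no DB work
            return entry[1]
        self.misses += 1
        value = run_query()             # cache miss: run the real query
        self.store[key] = (now + self.ttl, value)
        return value
```

If the hit rate turns out to be low (most movies requested once a day or less), the cache is mostly overhead and the effort is better spent on indexing the query itself.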

The third most common are ones such as:

Code:
SELECT m.artist_name, m.artist_mbid, COUNT(image_id) AS images,
       (SELECT COUNT(image_id)
        FROM music_images c
        LEFT JOIN music_artists ON c.image_mbid = artist_mbid
        WHERE m.artist_mbid = artist_mbid AND image_active = 'y') AS total_images
FROM music_images
LEFT JOIN music_artists m ON image_mbid = m.artist_mbid
WHERE image_active = 'y' AND m.artist_mbid IS NOT NULL
GROUP BY image_mbid
ORDER BY image_url DESC;

This generates a list of artists that have images. On the first run it would be used as above, but on subsequent runs it should include a date, so that it only checks for artists that have been updated since that date; the above query would become:

Code:
SELECT m.artist_name, m.artist_mbid, COUNT(image_id) AS images,
       (SELECT COUNT(image_id)
        FROM music_images c
        LEFT JOIN music_artists ON c.image_mbid = artist_mbid
        WHERE m.artist_mbid = artist_mbid AND image_active = 'y') AS total_images
FROM music_images
LEFT JOIN music_artists m ON image_mbid = m.artist_mbid
WHERE image_active = 'y' AND m.artist_mbid IS NOT NULL
  AND image_approve_date >= '2012-09-13 00:00:00'
GROUP BY image_mbid
ORDER BY image_url DESC;
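Since the first-run and incremental variants differ by only one predicate, they can share a single query builder, with the date passed as a bound parameter rather than spliced into the SQL string. A simplified sketch in Python for illustration (the query is abbreviated, and `artist_list_query` is a hypothetical helper, not code from the site):

```python
# Abbreviated form of the artist-listing query from the thread
BASE_QUERY = (
    "SELECT m.artist_name, m.artist_mbid, COUNT(image_id) AS images "
    "FROM music_images "
    "LEFT JOIN music_artists m ON image_mbid = m.artist_mbid "
    "WHERE image_active = 'y' AND m.artist_mbid IS NOT NULL"
)

def artist_list_query(since=None):
    """Return (sql, params): the full listing on a first run, or the
    incremental listing when a since-date is supplied. Using a %s
    placeholder keeps the date out of the SQL string itself."""
    sql, params = BASE_QUERY, []
    if since is not None:
        sql += " AND image_approve_date >= %s"
        params.append(since)
    sql += " GROUP BY image_mbid"
    return sql, params
```

The same pattern works in PHP with PDO or mysqli prepared statements: one base query, one optional predicate, and the date always bound as a parameter.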

elcct
14-11-2012, 01:34
Then just throw another mKS in. They are cheap.

Kode
13-11-2012, 16:58
I doubt the Kimsufi 5TB limit would last long at the rate we are growing; 4 months ago we did 600GB a month, last month we did 2TB. But we are toying with multiple servers to do different things.

elcct
13-11-2012, 15:58
You could consider getting the SSD version of the server and moving static content to the Kimsufi range if space is an issue?

Andy
13-11-2012, 12:18
It's all about optimisation. Good luck.

With the rate limiting, you'd need to keep track of how many requests people have made in X amount of time and when they made their last one. So, for example, you can limit requests per API user to 100/hr. Once they use up all 100, they get an error which doesn't pull anything from the database except their API usage (thus saving resources). Twitter employs a similar tactic, and others limit you to one request every 5 seconds as well as the hourly limit to avoid hammering.
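The scheme Andy describes is essentially a fixed-window counter: bucket requests by (API key, time window) and reject once the bucket is full. A minimal sketch in Python for illustration (`RateLimiter` is a hypothetical name; in production the counts would live in memcached or similar so all web processes share them, and rejected requests never touch the main database):

```python
import time

class RateLimiter:
    """Fixed-window rate limiter: at most `limit` requests per
    `window` seconds per API key. Illustrative sketch only - a
    memcached counter keyed on (api_key, window start) behaves
    the same way in PHP."""

    def __init__(self, limit=100, window=3600):
        self.limit = limit
        self.window = window
        self.counts = {}    # (api_key, window_start) -> request count

    def allow(self, api_key, now=None):
        now = time.time() if now is None else now
        bucket = (api_key, int(now // self.window))
        count = self.counts.get(bucket, 0)
        if count >= self.limit:
            return False    # over the limit: reject without DB work
        self.counts[bucket] = count + 1
        return True
```

One quirk of fixed windows is that a client can burst 2x the limit across a window boundary; the per-5-seconds minimum interval mentioned above is a cheap way to blunt that.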

Also check your code to make sure you're not using "SELECT *" where it's not needed. Only pull the columns you're using.

Kode
12-11-2012, 12:59
Andy, I've looked at rate limiting on and off a fair few times; I'm not 100% sure how I would properly go about implementing it.

*edit*

Also, some of the issues may be reduced when I get around to putting in a "newart" method, so apps can see which artists/movies/etc. have been updated since their last check and match that against the ones in their database, rather than checking against every single one each time. Currently, if someone has 5,000 movies, every time they want to check for new images (most users check every day) it has to look up all 5,000 entries. With the new method they would check for movies with updates since they checked yesterday, and if they have any of those movies, just update those ones.
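The proposed "newart" flow inverts the lookup: one "what changed since X?" request, then an intersection with the client's local library, instead of one lookup per local movie. A sketch in Python for illustration (`sync_updates` and `fetch_updated_since` are hypothetical names, not part of the actual API):

```python
def sync_updates(local_movie_ids, fetch_updated_since, last_check):
    """Ask the API once for everything updated since last_check and
    intersect with the local library, instead of querying once per
    local movie. fetch_updated_since is a hypothetical API call
    returning the IDs of movies with new images."""
    updated = set(fetch_updated_since(last_check))
    return [mid for mid in local_movie_ids if mid in updated]
```

For a user with 5,000 movies this replaces 5,000 per-movie lookups with a single list request plus a handful of fetches for the movies that actually changed.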

Kode
12-11-2012, 12:58
I'm not 100% sure what I did, but I've managed to get the loads down. I think it may have been an issue with W3 Total Cache minifying the JS files, causing the scripts to go haywire. I also installed APC and set W3 Total Cache to use that, as well as increasing buffer sizes for InnoDB and anything else I could think of. First sub-8 load I've seen since setting it all up, lol (currently 1.18).

Andy
12-11-2012, 12:35
As I said in your last topic, perhaps you should put in rate limiting and make your users cache their own requests. Lots of APIs force you to do this for this exact reason - server load.

Kode
12-11-2012, 12:12
The SSD option on the SP is too small; we have the older SP SSD and don't have a lot of room left on it, hence having to move to SATA drives. The SSD Max model is out of the price range we can afford from donations.

It's probably not what would be considered a huge number of requests; probably between 1,500 and 3,500 hit the API at any given time, depending on the time of day. But that's based on Google Analytics' real-time "At the moment" reporting (which only tracks the API); currently it says "1168 active visitors on site", and earlier it was about 3100.

marks
12-11-2012, 11:59
For applications that require lots of hard disk I/O, we have the SSD and SSD Max models. We're also looking into bigger SSD drives, but so far we don't have more information about that.
Another question, most of the traffic to the site goes to the API,
1) if they were hitting a url and that url didn't exist would that cause load?
2) if the url they hit had a htaccess redirect to another server would that cause load?
3) if they hit a url and that url was resulting in an internal server error (500) would that cause load?
Generally speaking, it doesn't seem that these things could affect the performance a lot, unless there is a huge number of requests. It just seems that your server accesses the drive a lot; there is plenty of room there to optimise your server.

Mark1978
12-11-2012, 10:40
Another reason why OVH removing the SAS15k option was a very bad move

Kode
12-11-2012, 09:22
I have the http://www.ovh.co.uk/dedicated_servers/sp_32g.xml, so it has 2 x 2TB SATA3 drives. I think you are probably right about that being the issue; we had SSD before.

Another question, most of the traffic to the site goes to the API,
1) if they were hitting a url and that url didn't exist would that cause load?
2) if the url they hit had a htaccess redirect to another server would that cause load?
3) if they hit a url and that url was resulting in an internal server error (500) would that cause load?

The reason I'm asking is because we are thinking about getting a server just for the API, so the website sits separate from it. I had turned the redirect off on the API, so 2,000 people would have been hitting the server and getting a 404, but load was still at a constant 6-9, which it shouldn't be just from visitors to the site. I turned the redirect back on yesterday and was still seeing similar load, but when I checked today the API users would have been getting a 500, as there's an issue with the memcache function I'm using (I have commented out the memcache stuff for now); subsequently the load is currently around 25.

DigitalDaz
12-11-2012, 00:48
What hard drives have you got in that thing? It may well turn out to be the weak link.

In fact, in all these new 64GB servers the massive weak link is the SATA hard drives; they just ain't gonna keep up on any sort of virtualisation platform.

Myatu
11-11-2012, 18:59
Your "wa" (Waiting for I/O) is high, so you're encountering a hardware bottleneck (not CPU - but what the CPU communicates with, ie., HDD, RAM, NIC, etc). Probably the cause for the zombie process as well.

P.S.: To clarify, high I/O wait times can also be software-induced, such as by a bad driver.
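The "wa" figure in top is the share of CPU time spent waiting on I/O, and it comes from the aggregate "cpu" line in /proc/stat, whose fields are user, nice, system, idle, iowait, irq, softirq, steal. A small Python sketch of the calculation, for illustration:

```python
def iowait_fraction(cpu_line):
    """Parse the aggregate 'cpu' line from /proc/stat and return the
    fraction of time spent waiting on I/O ('wa' in top). Field order:
    user nice system idle iowait irq softirq steal ..."""
    fields = [int(x) for x in cpu_line.split()[1:]]
    return fields[4] / sum(fields)   # fields[4] is iowait

# Example: numbers proportional to the top output in this thread
# (8.5%us, 0.9%sy, 57.8%id, 32.7%wa):
# iowait_fraction("cpu 85 9 0 578 327 0 1 0") returns 0.327

# On a live box you'd read the first line of /proc/stat:
# with open("/proc/stat") as f:
#     print(iowait_fraction(f.readline()))
```

Note that /proc/stat counters are cumulative since boot, so tools like top and iostat sample the line twice and compute the fraction over the delta; a 32.7%wa snapshot means roughly a third of CPU time in that interval was spent stalled on the disks.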

Kode
11-11-2012, 17:19
OK, so I moved my site onto the new server, but it hasn't made much difference to the load (not that I can see the site yet; it still pings to the old server for me, but other people ping to the new one). It's a bit weird, though.

top - 17:58:39 up 14:42, 2 users, load average: 8.03, 9.23, 8.54
Tasks: 231 total, 3 running, 227 sleeping, 0 stopped, 1 zombie
Cpu(s): 8.5%us, 0.9%sy, 0.0%ni, 57.8%id, 32.7%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 32843776k total, 32060876k used, 782900k free, 659008k buffers
Swap: 1051832k total, 2360k used, 1049472k free, 24923748k cached


Load is high, but the CPU usage isn't particularly high, and the Plesk server monitor says the load is only 20%.

Does anybody have any ideas what I could look at to find the problem? The error.log doesn't really show much.