Topic Summary

Posted by: Dakusan
« on: September 28, 2009, 05:31:02 am »


So I have been having major speed issues with one of our servers. After countless hours of diagnosis, I determined the bottleneck was always I/O (input/output, accessing the hard drive). For example, when running an MD5 hash on a 600MB file, the load would jump up to 31 with 4 logical CPUs, and it would take 5-10 minutes to complete. When performing the same test on a second drive in the same machine, it finished within seconds.
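For anyone wanting to run a similar comparison, this is roughly what the test looked like (the file paths here are just placeholders for a large file on each drive):

  # Time an MD5 hash of a large file on the suspect drive, then check the load
  time md5sum /mnt/suspect-drive/largefile.bin
  uptime

  # Repeat on the secondary drive for comparison
  time md5sum /mnt/second-drive/largefile.bin
  uptime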

Replacing the hard drive itself is a last resort for a live production server, and a friend suggested the drive controller could be the problem. I confirmed that the drive controller for our server was not on-board (it is on its own card), and I attempted to convince the company hosting our server of the problem so they would replace the drive controller. I ran my own test first: an iostat check while doing a read of the main hard drive (cat /dev/sda > /dev/null). This produced steadily worsening results the longer the test went on, and always much worse than our secondary drive. I passed these results on to the hosting company, and they replied that a "badblocks -vv" produced results that showed things looked fine.
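For reference, the iostat check was along these lines (device names are examples for our setup; iostat comes from the sysstat package):

  # In one terminal, watch extended per-device I/O stats every 5 seconds
  iostat -x 5

  # In another terminal, force a sustained sequential read of the suspect drive
  cat /dev/sda > /dev/null

  # Then the same read against the secondary drive for comparison
  cat /dev/sdb > /dev/null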

So I was about to go run his test to confirm his findings, but decided to check the parameters first, as I always like to do before running new Linux commands. Thank Thor I did. The admin had meant to write "badblocks -v" (verbose) and typoed with a double keystroke. The two v's looked like a w due to the font, and had I run a "badblocks -w" (write-mode test), I would have wiped out the entire hard drive.
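To make the difference painfully clear (device name is just an example; always double-check which flag you are about to pass):

  # Read-only scan with verbose output - safe to run against a drive in use
  badblocks -v /dev/sda

  # Write-mode test - overwrites every block with test patterns, DESTROYS ALL DATA
  badblocks -w /dev/sda

  # There is also a non-destructive read-write mode, but it should still never
  # be pointed at a mounted filesystem
  badblocks -n /dev/sda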

Anyways, the test output the same basic results as my iostat test, with throughput very quickly decreasing from a remotely acceptable level to almost nil. Of course, the admin had only taken the best results of the test, ignoring the rest.

I had them swap out the drive controller anyways, and it hasn't fixed things, so a hard drive replacement will probably be needed soon. This kind of problem would be trivial if I had access to the server and could just test the hardware myself, but that is the price to pay for proper security at a server farm.