The outage of 2013-01-16

crfriend
Master Barista
Posts: 14431
Joined: Fri Nov 19, 2004 9:52 pm
Location: New England (U.S.)

Post by crfriend »

The "regulars" here must certainly have noticed that SkirtCafe was down for almost the entire day of 2013-01-16. This was due to a catastrophic hardware failure of the database server and a subsequent comedy of errors and dead parts with the replacement iron.

Here's what the hosting company posted about it, verbatim. All times quoted are Pacific Standard Time, which is UTC - 8 hours.
Hello,

Update - 10:00 PM -- Data migration is still completing. We hope to be able to call this finished very soon!

Update - 6:54 PM -- Data migration to the new hardware is nearly complete. Most customers should be seeing their sites loading, but a few may still not be seeing their content yet. I am very sorry for the delay, and thank you for your continued patience!

Update - 12:45pm PST -- The replacement hardware had some bad disks which needed to be replaced. Now that the drives are all in good health, we are rebuilding the RAID array to finalize the preparation of the server. Once that is complete, we will be able to migrate data back to the new server and bring everything back online. Thanks for your patience!

Update - 10:00am PST -- The replacement server is almost done being set up and once it is ready we will be migrating everything to the new hardware. This process is taking longer than expected, but should be underway shortly. Please check back here for more updates!

Update! We are still in the process of replacing this hardware and getting everything back online. We are actively moving data and taking a different approach so that we can get this server back online as quickly as possible. Please check back here for more updates throughout the morning!

As a reminder...

You are receiving this message because your web hosting services are being affected by hardware maintenance due to issues with your shared MySQL server perch.

Due to issues with the RAID controller, we are replacing the hardware for this server and will be restoring all data from our backup server to the new hardware. Unfortunately, this will cause your databases to be offline until they are restored. We sincerely apologize for the inconvenience and will work to bring them back online as quickly as possible.

Once the 'all clear' is given by us, if you're still having trouble loading your sites or access your server, please let us know the specifics, so we can take a look for you. You may contact us either by replying to this message from your inbox, or by submitting a support ticket in the DreamHost panel (https://panel.dreamhost.com/), under Support/Contact Support.

This notice will also appear on that page in your DreamHost Panel and will be updated throughout the process with the latest details we can provide.

This incident is only related to the shared MySQL server perch, and no other servers or services (such as web or email) are affected.

Thank you for your patience!
The Happy DreamHost Server Fixing Team

The outage seems to have begun at about 22:56 UTC on 2013-01-15 and ended at around 02:00 UTC on 2013-01-17.

Being a computing professional, and having worked on hardware in a field-engineering gig, I feel for the guys in the trenches and machine rooms; it's no fun when you're dealing with a dead system and likely have everybody from your immediate boss to most of the way up the food chain harassing you for current status (and, no, they do not like the very accurate answer of, "It'll be done when it's done."). So this one, at around 24 hours, must've been excruciating.
Retrocomputing -- It's not just a job, it's an adventure!
skirted_in_SF
Member Extraordinaire
Posts: 1081
Joined: Tue Feb 16, 2010 1:56 am
Location: San Francisco, CA USA

Re: The outage of 2013-01-16

Post by skirted_in_SF »

I was on at about this time (8:00 PM PST) yesterday (1/16/13), and I thought something was strange since the board seemed to have forgotten what I had read the day before. Now I understand. :)
Stuart Gallion
No reason to hide my full name 8)
Back in my skirts in San Francisco
skirtingtoday
Member Extraordinaire
Posts: 1518
Joined: Wed Nov 30, 2011 1:28 pm
Location: Edinburgh, Scotland

Re: The outage of 2013-01-16

Post by skirtingtoday »

And there was me thinking it was your problems wrestling with Windows 8... ;)

Good to have the site back up and running again!
"A lie gets halfway around the world before the truth has a chance to get its pants on" - Winston Churchill.
"If you tell a lie big enough and keep repeating it, people will eventually come to believe it" - Joseph Goebbels
crfriend
Master Barista
Posts: 14431
Joined: Fri Nov 19, 2004 9:52 pm
Location: New England (U.S.)

Re: The outage of 2013-01-16

Post by crfriend »

skirtingtoday wrote:And there was me thinking it was your problems wrestling with Windows 8... ;)
That angst has been passed to my wife. I got the base infrastructure working and moved the files from her old laptop (carefully scanned for viruses, &c.) onto her new one, and she seems happy.

Once everything has calmed down, I'll see if I can get the old laptop running Linux in a more or less stable way and turn it into a general-purpose compute server, as it has several times more horsepower than all the classic gear I have combined.
crfriend
Master Barista
Posts: 14431
Joined: Fri Nov 19, 2004 9:52 pm
Location: New England (U.S.)

Re: The outage of 2013-01-16

Post by crfriend »

Just to let everybody know that it didn't go unnoticed: there was another short outage earlier today (2013-01-18) during which the database engine was unreachable. If memory serves (and it may be serving liver-and-onions), this one lasted somewhere between half an hour and 45 minutes. Another migration is in progress to a different set of hardware that will, hopefully, prove up to the task at hand.
Brad
Member Extraordinaire
Posts: 246
Joined: Wed Oct 17, 2012 11:54 pm
Location: Rockland County, New York, USA

Re: The outage of 2013-01-16

Post by Brad »

Now is a good time to show my appreciation to Carl and the others who make this board possible. When I type in Skirt Cafe, I expect it to appear on the screen as if by magic, and I was lost without it. But the infrastructure, both human and electronic, is so invisible to us that it appears not to exist.
Sarongman
Member Extraordinaire
Posts: 1049
Joined: Sat Jul 28, 2007 6:59 am
Location: Australia

Re: The outage of 2013-01-16

Post by Sarongman »

Brad wrote:Now is a good time to show my appreciation to Carl and the others who make this board possible
I second that motion--- all those in favour say aye---motion carried :thumleft: :thumright: :thumleft: :thumright:
It will not always be summer: build barns---Hesiod
crfriend
Master Barista
Posts: 14431
Joined: Fri Nov 19, 2004 9:52 pm
Location: New England (U.S.)

Re: The outage of 2013-01-16

Post by crfriend »

Brad wrote:[... T]he infrastructure, both human and electronic, is so invisible to us that it appears not to exist.
Those are words of the very highest praise, Brad, and I thank you for them, both personally and on behalf of my team.

The sad thing about being very proficient in computing -- and especially infrastructure computing -- is that success, by its very nature, means being invisible; only when something goes wrong does everybody notice. It's like when the lights go out at home and it's actually the utility that's at fault, not a blown bulb or a popped fuse. This is why I refuse to slag off the guys (and likely gals, too) "in the trenches" who deal with the hardware.

Organisationally, the hosting company takes care of the hardware and the OS side of things, I deal with software upgrades and technical tweaks to keep junk to a minimum, and the moderation team (and myself) deal with the "human interaction" layer. At home and at work, I deal with all of those, so I really do feel the pain of others.

As the saying goes, "This too shall pass"; let's just hope it passes quickly, not unlike a bad case of gas.

The main players in this are Bob (who still, very generously, foots the bill), Milfmog, Uncle Al, and myself. However, let's not forget the real focus of this -- the community! Without you -- all of you -- this place would wither and die, and I, for one, think that would be sad.
crfriend
Master Barista
Posts: 14431
Joined: Fri Nov 19, 2004 9:52 pm
Location: New England (U.S.)

Re: The outage of 2013-01-16

Post by crfriend »

Here's the latest news, as of about 06:00 UTC:
The migration to the new server has started and should be complete by tomorrow evening. All databases should be accessible at this time and will we update via email when the transition to the new machine completes.

We will update this post again tomorrow morning unless there is a change in status which necessitates additional notification.
We're online as of this writing (no kidding), so hopefully things will go better this time than last.
ChrisM
Member Extraordinaire
Posts: 468
Joined: Thu Mar 18, 2004 12:49 am
Location: Vancouver, British Columbia, Canada

Re: The outage of 2013-01-16

Post by ChrisM »

Hear Hear - three cheers of appreciation to Carl!

...On invisibility: yes, Carl, I do front-of-house sound (mixer board guy) for live performances, and sound is exactly like that: if it sounds great, the band gets compliments; if it sounds bad, the sound guy gets complaints. Our job is to be invisible, like a window, and a window only gets noticed when it's dirty.

Ah well, c'est la vie.

Chris
skirtyscot
Member Extraordinaire
Posts: 3448
Joined: Thu Aug 04, 2011 10:44 pm
Location: West Kilbride, Ayrshire, Scotland

Re: The outage of 2013-01-16

Post by skirtyscot »

crfriend wrote:The main players in this are Bob (who still, very generously, foots the bill) ...
How much does it cost to run the site? Purely in cash outlay terms, I mean, and valuing your labour at $0 per hour (sorry!)

If it is a hefty sum, have you considered making it possible for members to donate towards the running costs?
Keep on skirting,

Alastair
crfriend
Master Barista
Posts: 14431
Joined: Fri Nov 19, 2004 9:52 pm
Location: New England (U.S.)

Re: The outage of 2013-01-16

Post by crfriend »

skirtyscot wrote:How much does it cost to run the site?
The last time I spoke with Bob, I offered to kick a few quid at the issue and he declined the offer. I suspect it's something below his "noise level" so he doesn't worry about it.

On the matter of my time being worth $0 per hour, I'll note that I donate my time and draw my compensation from seeing the forum continue to run smoothly, the membership gradually grow, and folks become confident enough to challenge societal norms and swap trousers for skirts! As the Moderator of our local Town Meeting says of his $1-per-year salary, "You got the best Moderator you can buy for a dollar." (I think he does it for the same reasons that I do things here.)