Woke up this morning to find BigCloset down.

I didn’t have my load-monitoring script running on the front end because I wasn’t that worried about it. But I am now.

My htop session had died, but the load had spiked to about 21.5 when it did. For a general idea, a load of 1 == 1 loaded CPU thread. We have a capacity of 6 threads, so a load of 6 == the server is at capacity. A load OVER 6 == we’re over capacity, but if things slow down a bit, there is a chance to catch up.
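For anyone who wants to check the same math on their own box, here is a minimal Python sketch of the load-versus-threads rule of thumb above. The function names `load_ratio` and `describe_load` are made up for illustration; this is not the monitoring script mentioned in this post.

```python
import os


def load_ratio(load_average: float, threads: int) -> float:
    """Return load as a multiple of capacity (1.0 == every thread busy)."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return load_average / threads


def describe_load(load_average: float, threads: int) -> str:
    """Classify a load average against the number of CPU threads."""
    ratio = load_ratio(load_average, threads)
    if ratio < 1.0:
        return "under capacity"
    if ratio == 1.0:
        return "at capacity"
    return "over capacity"


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute averages (Unix only).
    one_min, five_min, fifteen_min = os.getloadavg()
    threads = os.cpu_count() or 1
    print(f"1 min load {one_min:.2f} on {threads} threads: "
          f"{describe_load(one_min, threads)}")
```

With the numbers from this post, `load_ratio(21.5, 6)` works out to roughly 3.6 times what six threads can handle.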

As best I can tell, at 3:11am Central time (4:11am Eastern/my time) logrotate ran and HUP’d all the services so that it could properly rotate the log files, and Apache threw this error:

[Sun Feb 28 03:11:10 2016] [notice] seg fault or similar nasty error detected in the parent process

I think the CPU load was spiked way over 21 by that point (my htop session died about 12:53am, when it showed 21.49 for the immediate average and 14.50 for the 15 minute aggregated average load). It was climbing, and I don’t have any confidence that it would have been able to drop far enough in 2 hours for this to not be the issue when logrotate ran.

One front end is NOT going to be enough for BigCloset while using PHP5.6, and we aren’t going to be able to get PHP7/HHVM working for probably a couple of weeks (possibly months) at best. And it won’t be an off-the-shelf solution. At best it would be me tracking down code errors, profiling them, submitting patches to various Drupal code projects, hoping they get accepted to mainstream, and us running custom-patched code till they do. We can either load a 2nd front end, or load more CPU cores onto the one we have.

I will tweak and install my load monitoring script later. It was originally written for a situation just like this, where we were throwing too much at a dying server, so I wrote a script to watch for dead processes and high loads and kill things appropriately. We eventually fixed that by opening our Bridgewater, NJ PoP.
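To give a rough idea of what a watchdog like that involves, here is a small Python sketch. It is an illustration only, not the actual script: the names `is_alive` and `decide`, the "shed_load" action, and the rule of three consecutive high samples are all assumptions made up for this example.

```python
import os


def is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (Unix only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def decide(load_samples, limit, sustained=3):
    """Return 'shed_load' if the last `sustained` samples all exceed `limit`.

    Requiring several high samples in a row (an assumed policy) avoids
    reacting to a single brief spike that the server could recover from.
    """
    recent = load_samples[-sustained:]
    if len(recent) == sustained and all(s > limit for s in recent):
        return "shed_load"
    return "ok"
```

What "shed_load" should actually kill or restart depends entirely on the server, so that part is left out here.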

-Piper

Quick note: we are DEFINITELY spiking usage on the front end. I’ve seen load average spikes over 15.0, which basically means 15 cores’ worth of usage. But it only has 3 real cores and 3 fake cores (6 total threads). It’s been recovering mostly on its own, but I’m afraid that with PHP5.6 we WILL need the 2nd front end.

HHVM and PHP7 do a lot of redundant call optimizations, which is what makes them MUCH leaner process-wise.

-Piper

I have officially downgraded the cloud cluster. It’s no longer using PHP7 or HHVM, since neither was providing a stable environment in which we could properly operate BigCloset/TopShelf.

Belle (front end server) is now operationally running PHP5.6

Belle can be switched over to HHVM for limited testing as needed while we try to track down HHVM issues and fix them in code, but right now the full/operational stack is as follows.

Belle (Front End)

  • Apache 2.2
  • Memcached
  • PHP5.6 (remi stable release)

Ariel (Back End/DB Server)

  • Percona 5.6 (release 76.1 Revision 5759e76)

Things seem to be operating within acceptable limits at the moment. I’ve had things switched over for maybe 10 minutes at this point. The site is still fast in places, a bit slower in others. It is DEFINITELY stressing the front end server more, jumping the load average from consistently under 0.5 to consistently over 2.0.

-Piper

We are currently up and running using HHVM via Cloud in a “cobbled together” way. It is a less than ideal setup so I still need the VM re-imaged.

The method we have cobbled together right now seems to “fail” for HHVM every 8 hours or so in our initial tests, so I’ve written my own set of scripts to monitor the site and take automated action based on results. It’s kinda primitive but works 🙂

-Piper

Wed Feb 24 15:00:01 CST 2016
HTTP Status when killed was: 503
Wed Feb 24 15:37:08 CST 2016
HTTP Status when killed was: 503
Wed Feb 24 23:43:01 CST 2016
HTTP Status when killed was: 503
Wed Feb 24 23:45:01 CST 2016
HTTP Status when killed was: 503
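
For the curious, a monitor that produces log entries like the ones above could look something like this Python sketch. It is not the actual script: the function names, the `http://localhost/` URL, and the rule "treat any 5xx or no answer as a failure" are assumptions for illustration, and the restart step is left out because it depends on how HHVM is being run.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for `url`, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises for 4xx/5xx; the code is still useful
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, etc.


def should_restart(status):
    """Assumed policy: act on server errors (5xx) or no response at all."""
    return status is None or status >= 500


def log_entry(status, when=None):
    """Format a two-line entry in the same shape as the log above."""
    when = when or time.strftime("%a %b %d %H:%M:%S %Z %Y")
    return f"{when}\nHTTP Status when killed was: {status}"


if __name__ == "__main__":
    status = fetch_status("http://localhost/")  # hypothetical URL
    if should_restart(status):
        print(log_entry(status))
        # The actual kill/restart of the service would go here.
```

Run from cron every minute or so, something like this would leave a timestamped trail each time the site stops answering properly.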

TopShelf is closed while we upgrade the software. This shouldn’t take more than…oh, maybe most of a day? Two days? Frankly, we’re not sure. Check back here now and then for progress reports.

Hugs,

Erin, Piper and Cat

We are currently transferring backups of BigCloset to outside servers. Once that is complete, we will order the old drive removed and destroyed, and the new drive installed.

Once it is installed, it may take the data center some time to install the new OS on the drive, and then a bit more time after that for me (Piper) to get things set up just right for BigCloset.

Please hang around, and I will post updates as often as I can.

-Piper, Joyce, Cat and BigCloset Staff.

Hi everyone. While we wait on the data center to do what they need to do with the primary BC server, and I wait to get started doing what I need to do, I have begun work on the VERY EARLY stages of a new project. As the title of this blog says, you may now upload Microsoft Word Office Open XML files (.docx, NOT .doc).

This is VERY EARLY alpha stage development, which means it won’t always grab your lists, tables, headings, italics, images, etc.

While we do plan to support those features as time goes on, at the moment, we just wanted to get the basic stuff working.
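
To show why plain text is the easy part, here is a minimal Python sketch of pulling paragraphs out of a .docx file. A .docx is a zip archive whose main text lives in `word/document.xml`. This is only an illustration and not the code the site uses; the function name `docx_paragraphs` is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the plain text of each paragraph in a .docx file.

    `source` may be a file path or a file-like object. Formatting,
    lists, tables and images are ignored; only the text runs are read.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs
```

Headings, italics and the rest are stored as extra properties around those same runs, which is why they take more work to carry over.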

It’s done the same way as if you were uploading a TXT file or an HTML file to our site. If you have not already done so, please register for an account. Once you’ve logged in, click on Account Information and choose Add Story. The form presented there will allow you to submit your story. Just look for the section labeled “ ”, click the “Browse” button, and then find your document on your HDD.

-Piper