technical
Unscheduled downtime for our own site.
Had a brief problem this morning with one of our internal Xen VPS boxes, this one resulting in our own website being unavailable for approximately 20 minutes. Ultimately it was determined that the Xen DomU that runs our site was unresponsive (and was not coming back online properly after shutting down that single DomU), and we had to reboot the entire VPS machine after applying some Xen updates to be safe.
It's getting there
"It's getting there."
"No idea when it'll be done."
That's been me for the last two weeks, and in reality I'm not sure that I'm much further along than I was 14 days ago. I'm supposed to be in charge of the "Agents" for the new monitoring system: little programs (not PHP code) that will run on a couple of servers around the internet, keep tabs on all of our services/equipment, report their findings back to the main system, and trigger alerts when something breaks.
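To give a rough idea of what one of these agents might do, here is a minimal sketch in Python. It is not the actual agent code. The function names (check_tcp, build_report), the report layout, and the choice of a plain TCP connect as the "is it up" test are all assumptions made for illustration. Sending the report back to the main system is left out entirely.

```python
import socket
import time


def check_tcp(host, port, timeout=5.0):
    """Try to open a TCP connection; return (ok, seconds_taken)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False, time.monotonic() - start


def build_report(agent_name, targets, checker=check_tcp):
    """Run every check and collect the findings into one report dict.

    `targets` is a list of (host, port) pairs. `checker` can be swapped
    out, which keeps the function testable without touching the network.
    """
    results = []
    for host, port in targets:
        ok, latency = checker(host, port)
        results.append(
            {"host": host, "port": port, "ok": ok, "latency": round(latency, 3)}
        )
    return {
        "agent": agent_name,
        "timestamp": int(time.time()),
        "results": results,
        # Hypothetical rule: flag an alert if any single check failed.
        "alert": any(not r["ok"] for r in results),
    }
```

Calling build_report("agent-1", [("www.example.com", 80)]) would return a dict that could be serialized and shipped to the master. A real agent would also need scheduling, retries, and some way to avoid alerting on a single blip.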
The humble beginnings of a Monitoring System
Late yesterday afternoon the website here was updated from our Subversion repository, and included in that update was the start of our new Network Status/Monitoring system. It is nowhere near complete, but it has taken shape enough for us to start tinkering with it on the live site and collecting data.
What we have so far is the beginnings of the "master" for the monitoring system, integrated into our Drupal setup here. What is done so far in the code includes:
The Coming Death of PHP4
As we've previously announced and discussed, we will be dropping support for PHP4 entirely from our servers, as it has been discontinued by the PHP Team and no further fixes/upgrades will be coming for it. We currently estimate this process will be completed across all servers before the end of October, barring any severe security exposures in the current PHP4 release before then that cause us to accelerate its removal.
Aftermath of a Bad Chip, Part 1
As I write this blog entry, I'm sitting here waiting for Srv3 to come back online after having its RAM replaced as part of this afternoon's emergency maintenance.
Unscheduled Maintenance Window for Srv3
We have a previously unscheduled (well, unscheduled prior to today) maintenance window for Srv3 scheduled for Thursday, September 4th at 4:00pm - 6:00pm CDT.
We anticipate that actual downtime for the machine should be less than 30 minutes, but the exact start time will depend on the technicians in the Data Center.
This window is to replace the RAM modules in Srv3, which we hope will rectify the recent problems we have been experiencing with the server.
Migration of ns3.purenrg.com
Tomorrow we will begin the process of replacing ns3.purenrg.com, one of the nameservers used for client domains. The new ns3 has been online for the last week, quietly staying sync'd with ns4.purenrg.com and servicing the requests we have sent its way while testing to ensure everything was up to speed.
Tomorrow we will update the nameserver record for ns3.purenrg.com to point it to the IP address of the new machine (208.43.104.254), and over the weekend all DNS traffic for client domains should drift away from the old server and over to the new one.
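For anyone who wants to see whether the change has reached them yet, a quick check along these lines should do it. This is a sketch only: the helper name points_to is made up for illustration, and the answer depends on whatever your local resolver has cached, which is exactly why the traffic drifts over gradually rather than switching all at once.

```python
import socket

NEW_NS3_IP = "208.43.104.254"  # the new machine's address, as given above


def points_to(hostname, expected_ip, resolver=socket.gethostbyname_ex):
    """Return True if `hostname` currently resolves to `expected_ip`.

    `resolver` defaults to the standard library lookup, which returns
    (canonical_name, alias_list, ip_address_list). It can be replaced
    with a stub for testing.
    """
    try:
        _name, _aliases, addresses = resolver(hostname)
    except OSError:
        # Lookup failed (socket.gaierror and socket.herror are OSErrors).
        return False
    return expected_ip in addresses
```

Running points_to("ns3.purenrg.com", NEW_NS3_IP) returns True once the resolver you are using has picked up the updated record.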
