<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Linux System Admins Blog &#187; Down Time</title>
	<atom:link href="http://linuxsysadminblog.com/category/down-time/feed/" rel="self" type="application/rss+xml" />
	<link>http://linuxsysadminblog.com</link>
	<description>System admins of Promet - an e-commerce, high availability Open Source web shop - share their findings</description>
	<lastBuildDate>Wed, 25 Aug 2010 19:46:52 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>When Open Source kills</title>
		<link>http://linuxsysadminblog.com/2009/05/when-open-source-kills/</link>
		<comments>http://linuxsysadminblog.com/2009/05/when-open-source-kills/#comments</comments>
		<pubDate>Wed, 27 May 2009 16:39:25 +0000</pubDate>
		<dc:creator>max</dc:creator>
				<category><![CDATA[Down Time]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[murder]]></category>

		<guid isPermaLink="false">http://linuxsysadminblog.com/?p=702</guid>
		<description><![CDATA[ReiserFS is a journalling filesystem that is excellent when dealing with small files under 4K in size. When used with tail-packing it is 10-15x faster than ext2/ext3. ReiserFS was first included in Linux kernel 2.4.1 and even used as default filesystem in SUSE Enterprise Linux and others. What many may not know is that Reiser [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/ReiserFS">ReiserFS</a> is a journalling filesystem that is excellent when dealing with small files under 4K in size. When used with <a href="http://en.wikipedia.org/wiki/Tail_packing">tail-packing</a> it is 10-15x faster than ext2/ext3. ReiserFS was first included in Linux kernel 2.4.1 and was even used as the default filesystem in SUSE Enterprise Linux and others. What many may not know is that <a href="http://en.wikipedia.org/wiki/Hans_Reiser">Reiser killed</a>, LITERALLY. The man behind this filesystem has been convicted of second degree murder for killing his wife. While this isn&#8217;t exactly breaking news, it just goes to show you that extroverted geeks have it in them.</p>
]]></content:encoded>
			<wfw:commentRss>http://linuxsysadminblog.com/2009/05/when-open-source-kills/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>us-to-invade-asia-over-google-traffic-snafoo</title>
		<link>http://linuxsysadminblog.com/2009/05/us-to-invade-asia-over-google-traffic-snafoo/</link>
		<comments>http://linuxsysadminblog.com/2009/05/us-to-invade-asia-over-google-traffic-snafoo/#comments</comments>
		<pubDate>Thu, 14 May 2009 21:15:35 +0000</pubDate>
		<dc:creator>andrew</dc:creator>
				<category><![CDATA[Down Time]]></category>

		<guid isPermaLink="false">http://linuxsysadminblog.com/?p=641</guid>
		<description><![CDATA[Google slowdown causes blogger hysteria Official Google Blog: This is your pilot speaking. Now, about that holding pattern&#8230; Ok, so I checked my RSS feeds and went over to the Google blog to see what the hoopla was about after today&#8217;s slowdown. Yes, I had pings from folks asking me if Google was down for [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Google slowdown causes blogger hysteria </strong></p>
<p><a href="http://googleblog.blogspot.com/2009/05/this-is-your-pilot-speaking-now-about.html">Official Google Blog: This is your pilot speaking. Now, about that holding pattern&#8230;</a></p>
<p>Ok, so I checked my RSS feeds and went over to the Google blog to see what the hoopla was about after today&#8217;s slowdown.  Yes, I had pings from folks asking me if Google was down for them or just me, but I really find the backlinks interesting&#8230; from funny to pathetic.  </p>
<p>I think uptime still matters.</p>
<p>See for yourselves (my emphasis): </p>
<p>  Google Outage Caused by Asian “Traffic Jam” | John Paczkowski &#8230;<br />
    If the <strong>Web has does have a single point of failure, you&#8217;d think it was Google</strong> given all the outcry over the the outages suffered by some of the company&#8217;s services Thursday. Something went wrong at the company this morning and whatever &#8230;<br />
    Posted by John Paczkowski at 12:23 </p>
<p>  Google Slow (or Down) for Some<br />
    Thursday, May 14, 2009. Google Slow (or Down) for Some. Some of us are having problems accessing google.com, YouTube, Gmail and others. [This post may update if there's further info.] Update: And it seems to be back up now (18:15 CET). &#8230;<br />
    Posted by Philipp Lenssen at 10:08 </p>
<p>  Google Slow, <strong>Twitterati Hysterical</strong><br />
    UPDATED: Google appears to be having problems across its Gmail, search and even its Blogger platforms, judging by complaints on &#8230;<br />
    Posted by Stacey Higginbotham at 09:48 </p>
<p>  It&#8217;s Down! <strong>The Day Google Stood Still</strong> (Updated) &#8211; ReadWriteWeb<br />
    We have seen our fair share of failures from web based products, but this morning, for a large number of users (at least in the US), it looks &#8230;<br />
    Posted by Frederic Lardinois at 09:32 </p>
<p>  Major Google Outages Today: #GoogleFail Or #AT&#038;T Fail?<br />
    A bunch of Google services have been failing this morning, and we&#8217;ve been trying to figure out why. The hashtag #googlefail on Twitter.<br />
    Posted by Adam Ostrow at 09:25 </p>
<p>  Google Services Go Down For Many<br />
    Currently, many people who use Google&#8217;s services, including web search, Gmail, Google Reader and other products are either down or incredibly slow for some.<br />
    Posted by Barry Schwartz at 09:15 </p>
<p>  La panne de Google: une erreur d&#8217;aiguillage &#8211; Media &#038; Pub &#8211; E24.fr<br />
    Le moteur de recherche a connu de sérieux problèmes techniques entre 17h et 18h ce jeudi. Des milliers d&#8217;internautes ont témoigné sur Twitter des difficultés rencontrées sur Google.<br />
    Posted by at 09:08 </p>
<p>  <strong>Google Stumbles, Internet Breaks A Leg</strong><br />
    Recent Posts. Google Stumbles, Internet Breaks A Leg · Beer Monday: Redhook&#8217;s Slim Chance · Friday Gallimaufry: Migratory Birds · Moms On The Net · Beer Monday: New Glarus Brewing · The Pale Blue Dot · Devo Was Right About Everything &#8230;<br />
    Posted by forbes blogger at 08:35 </p>
<p>  Ajax Girl » Blog Archive » <strong>Google&#8217;s Outage Was Asia&#8217;s Fault</strong><br />
    Written by on May 14th, 2009 in Uncategorized. You can follow any responses to this entry through the RSS 2.0 feed. Responses are currently closed, but you can trackback from your own site. Google finally has an explanation for its &#8230;<br />
    Posted by at 07:34 </p>
<p>  Tech Science | SearchBeat.com Shout-Out Blog<br />
    Top Technology, Computer, Internet and Science News Latest Science News from Around the Web Scientific.<br />
    Posted by keithco at 23:12 </p>
<p>  Tech Central &#8211; Times Online &#8211; WBLG: Problems with Google today &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://linuxsysadminblog.com/2009/05/us-to-invade-asia-over-google-traffic-snafoo/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>dv camera + computer + vlan + dvgrab = cheap video surveillance</title>
		<link>http://linuxsysadminblog.com/2009/05/dv-camera-computer-vlan-dvgrab-cheap-video-surveillance/</link>
		<comments>http://linuxsysadminblog.com/2009/05/dv-camera-computer-vlan-dvgrab-cheap-video-surveillance/#comments</comments>
		<pubDate>Wed, 06 May 2009 18:21:04 +0000</pubDate>
		<dc:creator>max</dc:creator>
				<category><![CDATA[Down Time]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[monitoring]]></category>

		<guid isPermaLink="false">http://linuxsysadminblog.com/?p=586</guid>
		<description><![CDATA[In this day and age of high definition, many are upgrading their video recording gear to the latest hard-drive or flash-based hi-def video cameras. Unlike auto dealerships, consumer electronics retailers don&#8217;t offer trade-in options for your old stuff.  In the green / renewable mindset we can put these no-longer-used video [...]]]></description>
			<content:encoded><![CDATA[<p>In this day and age of high definition, many are upgrading their video recording gear to the latest hard-drive or flash-based hi-def video cameras. Unlike auto dealerships, consumer electronics retailers don&#8217;t offer trade-in options for your old stuff.  In the green / renewable mindset we can put these no-longer-used video cameras to good use as video surveillance devices, perfect for keeping an eye on your own or others&#8217; property.</p>
<p>On the hardware side you need a <a title="DV Camera" href="http://linuxsysadminblog.com/?attachment_id=596" target="_blank">DV camera</a> with a FireWire (IEEE 1394) port, a FireWire-equipped Pentium 4 or equivalent PC or laptop loaded with Fedora 9 or 10, and a FireWire cable to connect the camera to the computer. For software we only need <a href="http://freshmeat.net/projects/dvgrab/">dvgrab</a> and <a href="http://www.videolan.org">VLC</a>.</p>
<p><span id="more-586"></span>Install dvgrab from the Fedora updates repo:<br />
<code>sudo yum install dvgrab</code></p>
<p>Install vlc from rpmfusion repo:<br />
<code>sudo rpm -ivh http://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-stable.noarch.rpm<br />
sudo yum install vlc</code></p>
<p>Set the DV camera audio to 16-bit (the default is 12-bit) to avoid garbled audio.</p>
<p><a rel="attachment wp-att-597" href="http://linuxsysadminblog.com/2009/05/dv-camera-computer-vlan-dvgrab-cheap-video-surveillance/audio16bit/"><img class="alignnone size-full wp-image-597" title="audio16bit" src="http://linuxsysadminblog.com/wp-content/uploads/2009/05/audio16bit.jpg" alt="audio16bit" width="360" height="239" /></a></p>
<p>Turn on the video camera and connect it to the computer; you should see something like this in /var/log/dmesg:<br />
<code>firewire_core: created device fw1: GUID 0800460102721e20, S100</code></p>
<p>To test that we can grab video and audio from the camera and display it in VLC, pipe the output of dvgrab into vlc.</p>
<p><code>sudo dvgrab - -noavc -nostop | vlc - --no-sub-autodetect-file :demux=rawdv</code><br />
After issuing this command you should see a 720&#215;480 video feed with a 16-bit 48000Hz audio stream in vlc on your desktop.</p>
<p><a rel="attachment wp-att-589" href="http://linuxsysadminblog.com/2009/05/dv-camera-computer-vlan-dvgrab-cheap-video-surveillance/vlcwindow1/"><img class="alignnone size-full wp-image-589" title="vlcwindow1" src="http://linuxsysadminblog.com/wp-content/uploads/2009/05/vlcwindow1.png" alt="vlcwindow1" width="385" height="330" /></a></p>
<p>Now we set up vlc as a streaming server so that we can view the video/audio when away. Streaming the full 720&#215;480 video is overkill: the DV camera&#8217;s video quality is good enough that the picture still looks fine when vlc streams it at a lower resolution like 320&#215;240. I also reduce the audio quality to save on bandwidth. Here I used &#8220;cvlc&#8221; (command-line vlc) to avoid opening a vlc window and &#8220;&amp;&#8221; to put the process into the background.</p>
<p><code>sudo dvgrab - -noavc -nostop | cvlc - --no-sub-autodetect-file :demux=rawdv --sout '#transcode{vcodec=mp4v,vb=600,acodec=mp3,ab=56,scale=1,width=320,height=240,channels=2}:duplicate{dst=std{access=http,mux=ts,dst=192.168.1.102:3323}}' &amp; </code></p>
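<p>The <code>--sout</code> chain is the part of this command most likely to need editing, since the address and port differ from machine to machine. As a rough sketch only (the <code>build_sout</code> helper name is made up for illustration and is not part of vlc or dvgrab), the string from the command above could be assembled by a small shell function, with every transcode value copied unchanged from the post:</p>

```shell
# Hypothetical helper (not part of vlc or dvgrab): prints the --sout chain
# used in the post, with the listen address and port as parameters.
build_sout() {
    host="$1"   # address vlc should serve the stream on
    port="$2"   # TCP port for the HTTP stream
    printf '%s' "#transcode{vcodec=mp4v,vb=600,acodec=mp3,ab=56,scale=1,width=320,height=240,channels=2}:duplicate{dst=std{access=http,mux=ts,dst=${host}:${port}}}"
}
```

<p>It would then be used as <code>sudo dvgrab - -noavc -nostop | cvlc - --no-sub-autodetect-file :demux=rawdv --sout "$(build_sout 192.168.1.102 3323)"</code>, which should expand to the same streaming command as before.</p>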
<p>To view the feed locally via vlc, open an HTTP network location at the IP and port you specified in the <code>dst=</code> section of the command above.</p>
<p><a rel="attachment wp-att-590" href="http://linuxsysadminblog.com/2009/05/dv-camera-computer-vlan-dvgrab-cheap-video-surveillance/vlcopen/"><img class="alignnone size-full wp-image-590" title="vlcopen" src="http://linuxsysadminblog.com/wp-content/uploads/2009/05/vlcopen.png" alt="vlcopen" width="367" height="358" /></a></p>
<p>To view your feed from the Internet you will need to either configure vlc to stream on an outside interface or configure port forwarding.</p>
<p><a rel="attachment wp-att-591" href="http://linuxsysadminblog.com/2009/05/dv-camera-computer-vlan-dvgrab-cheap-video-surveillance/ddwrt/"><img class="alignnone size-full wp-image-591" title="ddwrt" src="http://linuxsysadminblog.com/wp-content/uploads/2009/05/ddwrt.png" alt="ddwrt" width="714" height="342" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://linuxsysadminblog.com/2009/05/dv-camera-computer-vlan-dvgrab-cheap-video-surveillance/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Cloud computing scenarios for database servers</title>
		<link>http://linuxsysadminblog.com/2009/02/cloud-computing-scenarios-for-database-servers/</link>
		<comments>http://linuxsysadminblog.com/2009/02/cloud-computing-scenarios-for-database-servers/#comments</comments>
		<pubDate>Tue, 17 Feb 2009 15:09:35 +0000</pubDate>
		<dc:creator>Pim</dc:creator>
				<category><![CDATA[Down Time]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Replication]]></category>
		<category><![CDATA[hosting]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[ec2]]></category>

		<guid isPermaLink="false">http://linuxsysadminblog.com/?p=322</guid>
		<description><![CDATA[We&#8217;ve been investigating the possibilities of using cloud computing for our clients. Amazon EC2 in particular has the potential to be really effective in offering flexible, pay-as-you-go computing. From my own perspective I have been looking at how to use cloud computing in combination with MySQL and I must say that I&#8217;m a bit sceptical [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve been investigating the possibilities of using cloud computing for our clients. Amazon EC2 in particular has the potential to be really effective in offering flexible, pay-as-you-go computing. From my own perspective I have been looking at how to use cloud computing in combination with MySQL and I must say that I&#8217;m a bit sceptical about the effectiveness of cloud computing in replacing the primary database server. First off, there does not seem to be much in the way of performance data for this type of installation. Can a cloud server really offer the I/O performance necessary to replace a dedicated database server? And even if the performance is equal, what is the main advantage? Scaling web sites is done by adding more servers in most cases, but the same approach only works for database servers when clusters are used. So in what other scenarios does cloud computing give us an edge?</p>
<p><span id="more-322"></span><strong>Temporary reporting servers<br />
</strong>Create a one-time copy of an existing production database server to run specific heavy reports. This is ideal for monthly reports since the server only needs to be up and running for several hours per month.</p>
<p><strong>Backup database server<br />
</strong>This is a backup solution where the server is only allocated once there is a problem with the primary server, which makes a lot of sense because the client only pays for the server once it is used. One downside to this scenario is that the server has to be created and loaded with the latest backup, which will result in a decent amount of downtime, but at least all of this can be automated. A bigger problem is the loss of data since the latest backup. For our high availability sites we have a standby database server replicating all changes from the master so we can switch over at a moment&#8217;s notice without losing any data.</p>
<p><strong>Migrations<br />
</strong>Performing a migration or a system upgrade usually brings some downtime. Promoting a standby system to primary avoids most of that downtime but leaves a single point of failure, so it makes sense to create a temporary standby of the standby.</p>
<p><strong>Development branches and testing environments<br />
</strong>For development branches we usually only need an extra database for a short amount of time, although truth be told, those databases are not very large in general so we tend to put them on the same development database server anyway. The same is true for testing and QA. These activities usually occur in cycles, which means that they are very attractive targets for cloud based servers.</p>
<p><strong>Alternative data center<br />
</strong>Yes, it happened to us once that our data center went offline due to a very heavy attack. Instead of finding another data center for such emergencies, it could be useful to have cloud based backup servers defined. However, this requires the extra effort of keeping these instances up to date. Additionally, DNS caching will stop the switch from being instantaneous. A geographical load balancing solution would be the answer to that, but at that point the cost of preparing for this eventuality has to be compared to the loss due to down time.</p>
]]></content:encoded>
			<wfw:commentRss>http://linuxsysadminblog.com/2009/02/cloud-computing-scenarios-for-database-servers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why is there a system change freeze &#8211; especially on black monday and black friday?</title>
		<link>http://linuxsysadminblog.com/2008/12/why-is-there-a-system-change-freeze-especially-on-black-monday-and-black-friday/</link>
		<comments>http://linuxsysadminblog.com/2008/12/why-is-there-a-system-change-freeze-especially-on-black-monday-and-black-friday/#comments</comments>
		<pubDate>Wed, 03 Dec 2008 23:53:34 +0000</pubDate>
		<dc:creator>andrew</dc:creator>
				<category><![CDATA[Down Time]]></category>
		<category><![CDATA[hosting]]></category>
		<category><![CDATA[microsoft]]></category>

		<guid isPermaLink="false">http://linuxsysadminblog.com/?p=149</guid>
		<description><![CDATA[When I started working in systems, one of my first clients was a major bank.  Yes, this was back in the mainframe batch processing days.  They never did any system updates when they ran the month end, quarter end and especially year end. I always thought that they just weren&#8217;t confident in their system folks [...]]]></description>
			<content:encoded><![CDATA[<p>When I started working in systems, one of my first clients was a major bank.  Yes, this was back in the mainframe batch processing days.  They never did any system updates when they ran the month end, quarter end and especially year end.</p>
<p>I always thought that they just weren&#8217;t confident in their system folks and scoffed at this policy as it always made our deadlines shorter.</p>
<p>I think this story convinced me that doing production work on the busiest web days is not a good idea.  Maybe Microsoft should have borrowed a page from the mainframe policy manual &#8211; don&#8217;t do system updates on black monday or black friday as it may cause a system outage.</p>
<p>This story: <a title="http://www.efluxmedia.com/news_Microsoft_Says_Sorry_For_Black_Friday_Cashback_Outage_30408.html" href="http://www.efluxmedia.com/news_Microsoft_Says_Sorry_For_Black_Friday_Cashback_Outage_30408.html" target="_blank">Microsoft Says Sorry For Black Friday Cashback Outage</a></p>
<blockquote><p>For Internet users, Black Friday was supposed to be about buying and cashing back, but Microsoft’s Live Search cashback machine apparently broke down just as customers “barged in” to make some early morning purchases.</p>
<p>According to a blog posting, the unexpected outage occurred due to a significant spike in traffic, which caused the system to go down for several hours. It took quite a while for it to come back to life, but apparently that was related to investigating the issue and rebuilding and deploying the databases and indexes that support Microsoft Live Search Cashback.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://linuxsysadminblog.com/2008/12/why-is-there-a-system-change-freeze-especially-on-black-monday-and-black-friday/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Server and backup woes</title>
		<link>http://linuxsysadminblog.com/2008/11/server-and-backup-woes/</link>
		<comments>http://linuxsysadminblog.com/2008/11/server-and-backup-woes/#comments</comments>
		<pubDate>Mon, 17 Nov 2008 13:31:58 +0000</pubDate>
		<dc:creator>Pim</dc:creator>
				<category><![CDATA[Down Time]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[cpanel]]></category>
		<category><![CDATA[hosting]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[reboot]]></category>
		<category><![CDATA[repository]]></category>
		<category><![CDATA[rm]]></category>
		<category><![CDATA[server]]></category>
		<category><![CDATA[undelete]]></category>

		<guid isPermaLink="false">http://linuxsysadminblog.com/?p=101</guid>
		<description><![CDATA[Looking back it seems like most posts on this blog are helpful tips and not reports of problems we encountered. Not that we don&#8217;t have any problems but we mostly report our solutions instead of the actual problems. Of course now and again a problem comes along that doesn&#8217;t have a solution ready to copy-paste [...]]]></description>
			<content:encoded><![CDATA[<p>Looking back it seems like most posts on this blog are helpful tips and not reports of problems we encountered. Not that we don&#8217;t have any problems but we mostly report our solutions instead of the actual problems. Of course now and again a problem comes along that doesn&#8217;t have a solution ready to copy-paste into a blog post. A week ago a faulty modification in a shell script resulted in the deletion of a good number of files before we caught it. The command below ended up being run with two empty variables:</p>
<p><code>rm -fr ${DIR}/${SUBDIR}</code></p>
<p><em>Hint: add the following alias for all users to prevent this: <code>alias rm='rm --preserve-root'</code></em></p>
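<p>An alias generally only helps in interactive shells, so a script can also check its own variables before calling <code>rm</code>. A minimal sketch, assuming a helper named <code>safe_remove</code> (an illustrative name, not taken from the script in question):</p>

```shell
# Hypothetical guard (names are illustrative): refuse to call rm when
# either path component is empty, so the target can never collapse to "/".
safe_remove() {
    dir="$1"
    subdir="$2"
    if [ -z "$dir" ] || [ -z "$subdir" ]; then
        echo "safe_remove: empty path component, nothing deleted"
        return 1
    fi
    rm -rf -- "${dir}/${subdir}"
}
```

<p>Called as <code>safe_remove "${DIR}" "${SUBDIR}"</code>, it returns 1 and leaves the filesystem untouched when either variable is empty.</p>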
<p>We were lucky in two ways. First off, this was not a production server, just a development and testing server, and secondly the databases and web sites on that server were unaffected. That&#8217;s where the good news ended and Murphy&#8217;s Law kicked in. A couple of days earlier we had found that our backup server had a corrupt filesystem on its RAID array. Since we did not have enough space available on other servers to hold all the backups, we temporarily suspended (you guessed it) the backups of the development and testing server.</p>
<p><span id="more-101"></span></p>
<h3>To undelete or not to undelete</h3>
<p>To get back up and running we immediately closed off access to the server and considered how we could recover the deleted files. Unfortunately, undeleting files on an ext3 file system can only be done under certain circumstances. If the deleted files are still held open by some process, the lsof utility can help, as is documented on some web sites (just Google &#8220;ext3 undelete lsof&#8221;), but for larger scale undeletes the first step is to create an image of the partition in question. That image can then be searched for inode entries, which can be very useful for finding specific files. However, if you want to perform a more general undelete this method is a lot less useful because the file names will not be recovered.</p>
<p>Apart from the limited usefulness of such an image, creating it would have taken several hours, during which development and testing would be at a standstill. We decided not to do this and to take our losses instead. It is important to note what data we were losing at that point. Among the missing directories were some binary directories (/usr/sbin and such) which were easily recoverable by copying them from similarly configured servers. The most important missing data was the version control repository and a custom scripts directory. All the history of changes in the repository was lost but the latest state of the code was easily restored. We copied the latest code from the developer who had last performed a complete update (which is a part of the daily development process) and put that code into the repository again. Since the versions did not match up anymore after that (all code versions were reinitialized) all developers had to retrieve the complete set of code files again and copy their latest versions over it to keep working.</p>
<p>Although this is definitely a loss for us the impact is limited by the fact that we keep copies of all released code. These copies were unaffected on the server in question but are also present on other servers. If need be we can go through that history to track down a change, but the comments are gone and it&#8217;s not a process the developers can do themselves.</p>
<h3>Rebooting the server</h3>
<p>After all this we were left with one task: rebooting that server. Since we did not know exactly what got deleted, this might give us some severe problems. The reboot was scheduled for a quiet night with several system admins present. Unfortunately our hand was forced when a change in the iptables configuration caused a kernel panic. Rebooting the server revealed several more problems, the main one being the privileges on the /tmp directory. This resulted in Apache not being able to write session info there and MySQL not being able to write temporary data either. This was quickly solved of course. Without going into too many details, the final action we took was to update our cPanel. This reinstalled many missing scripts and binaries.</p>
<p>I bet you&#8217;re wondering why we don&#8217;t use off site backups. Well, we do actually. The problem is that this involves copying many gigabytes over a limited line so we made a selection of what needed to be copied and we focused mainly on all our production servers. The main purpose of our off site backups is to recover production servers in case our data center becomes unavailable.</p>
<h3>Conclusions</h3>
<p>It&#8217;s been an annoying experience and it&#8217;s hard to draw positive lessons from Mr. Murphy&#8217;s teachings, but all in all it could have been a lot worse. Production was not down or affected and even the testing and development impact was pretty limited. The main things on our agenda after this are to review our backup strategy for essential locations and to review the use of root privileges on our servers. Although we use non-root users most of the time, there are tasks that are made a lot quicker by changing to root. We all know the danger of this and need to be a lot more aware of it.</p>
]]></content:encoded>
			<wfw:commentRss>http://linuxsysadminblog.com/2008/11/server-and-backup-woes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Amazon Down Time slashdotted</title>
		<link>http://linuxsysadminblog.com/2008/06/down-time/</link>
		<comments>http://linuxsysadminblog.com/2008/06/down-time/#comments</comments>
		<pubDate>Thu, 12 Jun 2008 19:29:26 +0000</pubDate>
		<dc:creator>andrew</dc:creator>
				<category><![CDATA[Down Time]]></category>
		<category><![CDATA[Amazon]]></category>

		<guid isPermaLink="false">http://linuxsysadminblog.com/?p=16</guid>
		<description><![CDATA[Any host or system admin hates the words &#8220;it&#8217;s not working&#8221; or &#8220;it&#8217;s down&#8221;. When I see a big site take a hit or go down there are two reactions that I generally have: Site Down &#8211; those poor bastards (system admins) their life must suck right now Site Down &#8211; SEE!!! It happens to [...]]]></description>
			<content:encoded><![CDATA[<p>Any host or system admin hates the words &#8220;it&#8217;s not working&#8221; or &#8220;it&#8217;s down&#8221;.  When I see a big site take a hit or go down there are two reactions that I generally have:</p>
<ul>
<li>Site Down &#8211; those poor bastards (system admins) their life must suck right now</li>
<li>Site Down &#8211; SEE!!! It happens to them too!  No one is perfect!  No one is immune!</li>
</ul>
<p>Here is a story that elicited these feelings in today&#8217;s Slashdot stories:</p>
<blockquote><p>
<code>+--------------------------------------------------------------------+<br />
| US Amazon.com Website Down For Over 1 Hour                         |<br />
|   from the there-goes-the-bottom-line dept.                        |<br />
|   posted by ScuttleMonkey on Friday June 06, @15:10 (The Internet) |<br />
|   <a href="http://tech.slashdot.org/comments.pl?sid=08/06/06/199211">http://tech.slashdot.org/article.pl?sid=08/06/06/199211  </a>        |<br />
+--------------------------------------------------------------------+</p>
<p>CorporalKlinger writes "CNET News is reporting that Amazon's US website, Amazon.com, has been unreachable since 10:30 AM PDT today. As of posting, visiting www.amazon.com produces an 'Http/1.1 Service Unavailable' message. According to CNET, "Based on last quarter's revenue of $4.13 billion, a full-scale global outage would cost Amazon more than $31,000 per minute on average." Some of Amazon's international websites still appear to be working, and some pages on the US Amazon.com site load if accessed using HTTPS instead of HTTP."</p>
<p>Discuss this story at:<br />
    <a href="http://tech.slashdot.org/comments.pl?sid=08/06/06/199211">http://tech.slashdot.org/comments.pl?sid=08/06/06/199211</a><br />
</code></p>
</blockquote>
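<p>The per-minute figure in the quote is easy to sanity-check. A quick sketch in shell arithmetic, assuming a 91-day quarter (an assumption; the quote does not say how the average was computed) and a shell with 64-bit integers:</p>

```shell
# Hypothetical helper: average revenue per minute, in whole dollars,
# given a quarter's revenue and the number of days in that quarter.
revenue_per_minute() {
    revenue="$1"   # quarterly revenue in dollars
    days="$2"      # days in the quarter (assumed, not from the quote)
    echo $(( revenue / (days * 24 * 60) ))
}

revenue_per_minute 4130000000 91   # prints 31517
```

<p>That lands just above the $31,000 per minute quoted, so the number is consistent with simply spreading the quarter&#8217;s revenue evenly over every minute.</p>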
]]></content:encoded>
			<wfw:commentRss>http://linuxsysadminblog.com/2008/06/down-time/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
