<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Developer Coach</title>
	<atom:link href="http://developercoach.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://developercoach.com</link>
	<description>Linux, Mac OS X, and Windows</description>
	<lastBuildDate>Fri, 29 Jan 2010 21:23:30 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Using rsync and Time Machine for web site backups</title>
		<link>http://developercoach.com/2010/using-rsync-and-time-machine-for-web-site-backups/</link>
		<comments>http://developercoach.com/2010/using-rsync-and-time-machine-for-web-site-backups/#comments</comments>
		<pubDate>Fri, 29 Jan 2010 21:23:30 +0000</pubDate>
		<dc:creator>Timothy Platt</dc:creator>
				<category><![CDATA[Mac OS X]]></category>

		<guid isPermaLink="false">http://developercoach.com/?p=127</guid>
		<description><![CDATA[Jeff Atwood has declared December 14th, 2009 as International Backup Awareness Day, and with good reason, as his blog, Coding Horror, experienced catastrophic data loss.  Long story short, a disk drive failure on an external hosting server resulted in complete data loss and a recent full backup was not available.  A lengthy, manual restore procedure <a href="http://developercoach.com/2010/using-rsync-and-time-machine-for-web-site-backups/" class="more-link">More &#62;</a>]]></description>
			<content:encoded><![CDATA[<p>Jeff Atwood has declared December 14th, 2009 as International Backup Awareness Day, and with good reason, as <a href="http://www.codinghorror.com/blog/archives/001315.html">his blog, Coding Horror, experienced catastrophic data loss</a>.  Long story short, a disk drive failure on an external hosting server resulted in complete data loss and a recent full backup was not available.  A lengthy, manual restore procedure is under way. Lesson learned: Jeff suggests maintaining your own backups of any content hosted on someone else&#8217;s server.  I believe this is an excellent idea, and here I present how I back up this web site, along with several others, and the contents of my home computer.</p>
<h2>Why do your own backups?</h2>
<p>First, we should establish that performing your own backups is desirable and necessary.   Most 3rd party hosting services provide backups.  However, with your own backup scheme there are several advantages:</p>
<ul>
<li>You control the retention policy &#8211; You can decide how many versions to keep and for how long</li>
<li>Fast access to backups &#8211; You can control how accessible your backed up data is and there should be no lengthy retrievals over slow network connections</li>
<li>Review for unexpected file changes &#8211; While you are backing up, why not look for unexpectedly changed files, which can be an indication of a web site hack?</li>
<li>Trust no one &#8211; Be responsible for the data you own.  Whether it was created by you or by others, you have a responsibility to ensure it is protected and safe at all times.</li>
</ul>
<h2>You need 3 backups, not one</h2>
<p>Convenience &#8211; You need a drive attached to the computer at all times such that backups can be automated and convenient.  Any inconvenience or barrier to performing the backup, such as having to plug in a drive, etc. will make it less likely you have a recent copy of the data.</p>
<p>Theft &#8211; If someone breaks in and steals your home computer, they will steal any and all attached backup drives.  You need a copy not attached to the computer, preferably in a fireproof safe.</p>
<p>Fire &#8211; If your house burns down, it is likely any hard drive on site will be destroyed.  You need an off-site copy.</p>
<p>Lastly, RAID (mirrored) is not a backup &#8211; it only protects you from simple hard drive mechanical failures.  It will not protect you from application level data corruption,  accidental updates or deletions, physical theft, fire, malicious users, disgruntled employees or associates, etc.  Therefore we will not discuss it further.</p>
<h2>Cloud Backup vs Local Hard Drives</h2>
<p>A simple way to achieve much of the above would be to use a cloud based online backup service.  I cannot recommend this approach simply because I have no practical experience with it.  It sounds good in theory, but here is what I don&#8217;t like about cloud: 1) I don&#8217;t like regularly recurring expenses; I&#8217;d rather drop $100 on an external USB hard drive here and there, 2) performance, particularly for data retrieval, can be a problem with the various online services, and 3) depending on what kind of data you are working with (multi gigabyte virtual machine images anyone?), upload speeds may be prohibitively slow, if for no other reason than the asymmetric upload/download speeds provided by most ISPs (DSL or cable).  Lastly, depending on how paranoid you are, data privacy may be an issue for you.</p>
<h2>Basic process</h2>
<p>I use <a href="http://1and1.com">1and1.com&#8217;s</a> Linux based web hosting and I run several WordPress sites (including this one) and a VBulletin forum (<a href="http://www.speedbagforum.com">http://speedbagforum.com</a>).  Both the WordPress and VBulletin web sites utilize Apache, MySQL, and PHP.  The hosting company provides SSH access, which grants me a high level of control and flexibility and enabled me to implement a highly (but not completely) automated backup solution for all the aforementioned web sites.</p>
<p>To back up the web sites, including the PHP/HTML files, MySQL databases, and all uploads/attachments, I simply run a bash script on my Mac.   This script starts an SSH session and remotely runs another bash script on the Linux host, which uses mysqldump to back up the databases.  The session is passwordless for convenience (I want to avoid any barriers to performing the backups); instead of passwords I am using private/public key authentication.  The script then invokes rsync to copy the files to my Mac, and finally runs another remote command to remove the SQL backup files.  Since I use Time Machine on my Mac, it picks up the newly copied local files and backs them up to an external USB drive, which I swap monthly, rotating through several drives I own.  One drive goes into the fireproof safe in the house, the other goes to a relative&#8217;s house (also kept in a fireproof safe).</p>
<p>The main bash script could be run automatically via cron.  However, I prefer to run it manually.  Firstly, this allows me to check the web sites and make sure no major corruption has occurred (hacked, etc &#8211; no sense backing up a corrupted web site!).  Secondly, it gives me a chance to review the changed files displayed by rsync and look for anything out of the ordinary (a common virus writer trick is to spread malware via other people&#8217;s web sites).  You should always investigate any unusual or unexpected file changes.</p>
<h2>Details</h2>
<p>The bash script I run under Mac OS X is quite simple:</p>
<pre>#!/bin/bash
# Stop at the first failing step, so the remote dumps are only
# removed after they have been copied down successfully.
set -e

echo "Backup of databases on remote server..."
ssh (user)@(domain.com) ./databasebackup

echo "rsync all web files, including sql backups..."
rsync -avz --delete (user)@(domain.com):~ /Users/tplatt/ForumDownload

echo "Remove sql backups on remote server..."
ssh (user)@(domain.com) 'rm -v *.sql'

echo "Complete."</pre>
<p>(user)@(domain.com) should be changed to your Linux login for your remote host.  If you have not set up key authentication, you will be prompted for passwords when logging in.</p>
<p>First, the databasebackup bash script is executed on the remote host.  This script uses the mysqldump command to dump all the database data into a file.  Note that the file is uniquely named such that it contains the current date and time, which ensures we can retain multiple copies of the database file.</p>
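<pre>#!/bin/bash
# Date plus seconds since the epoch (%s), so every run gets a unique name
now=$(date +%Y%m%d-%s)
echo "Date is $now"
#
filename="dbbackup-spf-$now.sql"
echo "Dumping $filename..."
mysqldump -h(server name) -u(database user) -p(password) (database name) &gt; "$filename"
# Show the last line of the dump; it should read "-- Dump completed on ..."
tail -n 1 "$filename"</pre>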
<p>The actual script that is run repeats the mysqldump command steps for all databases.  (server name) should be the fully qualified name of the MySQL server.  (database user) is the MySQL login and (password) is the corresponding password.  Lastly, (database name) is the name of the database.  The end result of this script is that there will be one or more .sql files in the home directory on the remote host.  The tail command outputs the last line of the backup file, which allows me to visually confirm a successful backup.  If the line doesn&#8217;t indicate &#8220;Dump completed&#8221;, I know the database backup failed.  A partial database backup is pretty much worthless.</p>
<pre>-- Dump completed on 2010-01-27 13:09:26</pre>
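<p>The visual check described above can also be scripted.  The following is only a minimal sketch under stated assumptions: <code>dump_is_complete</code> and <code>check_dumps</code> are hypothetical helper names, and it assumes mysqldump writes its usual comment trailer (the &#8220;Dump completed&#8221; line is not written when comments are turned off).</p>

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the last line of the given dump
# file contains mysqldump's "Dump completed" trailer.
dump_is_complete() {
  tail -n 1 "$1" | grep -q 'Dump completed'
}

# Hypothetical helper: report on every dump file passed as an argument
# and return non-zero if any of them looks incomplete.
check_dumps() {
  check_status=0
  for dump_file in "$@"; do
    if dump_is_complete "$dump_file"; then
      echo "OK: $dump_file"
    else
      echo "INCOMPLETE: $dump_file"
      check_status=1
    fi
  done
  return "$check_status"
}
```

<p>Something like <code>check_dumps *.sql</code> at the end of the remote script would then flag a partial dump without relying on reading the output by eye.</p>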
<p>Once the ./databasebackup script has completed, an rsync command is invoked.  This command copies down all files in the user home directory into a local directory (on my iMac).  The <code>--delete</code> option ensures any files deleted on the server are also removed from the local directory.   So, that being the case, how can I use this to retrieve an accidentally or maliciously deleted file?   The final part of the equation is Time Machine running on the local Mac.  Hourly the local directory is backed up to an external drive, where deleted files are retained, at least until space is needed.  I can typically fit 2 months&#8217; worth of web site backups on the external drive &#8211; your results may vary according to the size of the drive, the size of the backups, and any other data Time Machine is backing up for you.   What if a file is deleted and I don&#8217;t notice for 3 months?  I go to the drive stored in the safe, which will have older backups, or the drive offsite.  When the rsync command executes it displays a list of the files being transferred (the -v verbose option), which gives me a quick glance at any changed files.</p>
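<p>Because the delete option removes local files, it can be useful to preview a run before letting it change anything.  The sketch below relies on rsync&#8217;s <code>-n</code> (dry run) flag; <code>preview_sync</code> is a hypothetical helper name and is not part of the backup script shown earlier.</p>

```shell
#!/bin/bash
# Hypothetical helper: list what rsync would transfer or delete,
# without changing anything (-n is rsync's dry-run flag).
preview_sync() {
  rsync -avzn --delete "$1" "$2"
}
```

<p>For example, <code>preview_sync '(user)@(domain.com):~' /Users/tplatt/ForumDownload</code> would print the files that would be copied and a &#8220;deleting&#8221; line for each local file that would be removed.</p>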
<p>In summary, ssh and rsync enable powerful and easy backups for your hosted web sites.  Combine those with Time Machine for Mac OS X and you have a powerful, simple backup solution that is quite comprehensive.</p>
]]></content:encoded>
			<wfw:commentRss>http://developercoach.com/2010/using-rsync-and-time-machine-for-web-site-backups/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using at for command scheduling under Mac OS X</title>
		<link>http://developercoach.com/2009/using-at-for-command-scheduling-under-mac-os-x/</link>
		<comments>http://developercoach.com/2009/using-at-for-command-scheduling-under-mac-os-x/#comments</comments>
		<pubDate>Sun, 20 Dec 2009 16:00:34 +0000</pubDate>
		<dc:creator>Timothy Platt</dc:creator>
				<category><![CDATA[Mac OS X]]></category>

		<guid isPermaLink="false">http://developercoach.com/?p=131</guid>
		<description><![CDATA[Mac OS X contains the handy at command for scheduling commands to run at a later time.  at is used to schedule the commands, and the atrun utility is used to execute the jobs.  However, by default the atrun utility isn&#8217;t enabled, so any jobs scheduled via at will never run, with no particular warning.  The <a href="http://developercoach.com/2009/using-at-for-command-scheduling-under-mac-os-x/" class="more-link">More &#62;</a>]]></description>
			<content:encoded><![CDATA[<p>Mac OS X contains the handy <strong>at</strong> command for scheduling commands to run at a later time.  <strong>at</strong> is used to schedule the commands, and the <strong>atrun</strong> utility is used to execute the jobs.  However, <em>by default the <strong>atrun</strong> utility isn&#8217;t enabled, so any jobs scheduled via at will never run, with no particular warning</em>.  The reason given in the man page for <strong>atrun</strong> is to &#8220;prevent disk access every 10 minutes&#8221;, which would be detrimental to laptop battery life and to the computer going to sleep.</p>
<p>To enable <strong>atrun</strong>, execute the following:</p>
<pre>sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.atrun.plist
</pre>
<p>Once this is done, you may now use the <strong>at</strong> command.  For example, to schedule a command to run at 11:30 am today, simply type:</p>
<pre>[Mac-Book-Pro]$ at 11:30 am today</pre>
<p>Then enter the desired command(s) and use Ctrl-D (end of file) to end input:</p>
<pre>touch diditrun
pwd &gt; output.txt
job 13 at Sun Dec 20 11:30:00 2009</pre>
<p>The example commands above will simply create a 0 length file (diditrun) and write the working directory of the command execution to output.txt, as simple proof that the job indeed has run.  You can review the queued jobs using <strong>atq</strong> (or <strong>at -l</strong>):</p>
<pre>[Mac-Book-Pro]$ atq
13    Sun Dec 20 11:30:00 2009</pre>
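<p><strong>at</strong> reads the job from standard input, so the same job can also be queued without typing it interactively.  This is a minimal sketch; <code>make_job</code> is a hypothetical helper name, and the file names are the ones used in the example above (names containing spaces would need quoting).</p>

```shell
#!/bin/bash
# Hypothetical helper: print the commands for the job, one per line.
# $1 is the marker file to create, $2 the file that receives pwd's output.
make_job() {
  printf 'touch %s\n' "$1"
  printf 'pwd > %s\n' "$2"
}

# Show the job text; piping it into at (see below) would queue it.
make_job diditrun output.txt
```

<p>Running <code>make_job diditrun output.txt | at 11:30 am today</code> would then queue the job in the same way as the interactive session.</p>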
<p>To review the actual commands to be executed for any particular job (where 13 is the job number listed via atq):</p>
<pre>[Mac-Book-Pro]$ at -c 13
#!/bin/sh
# atrun uid=501 gid=501
# mail tplatt 0
umask 22
(... much output removed ...)
PATH=/Users/tplatt/depot_tools:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin; export PATH
PWD=/Users/tplatt/compression\ testing; export PWD
(... more output removed ...)
cd /Users/tplatt/compression\ testing || {
 echo 'Execution directory inaccessible' &gt;&amp;2
 exit 1
}
OLDPWD=/Users/tplatt; export OLDPWD
touch diditrun
pwd &gt; output.txt</pre>
<p>Notice that the command will be executed in the working directory in which it was scheduled, and with the user credentials of the user who scheduled it.<br />
To remove a job from the queue use <strong>atrm</strong> and the relevant job number:</p>
<pre>atrm 13
</pre>
<p>To disable the atrun command from running, you can again manipulate <a href="http://en.wikipedia.org/wiki/Launchd">launchd</a> settings via launchctl.  You would do this if you were not actively using at and wanted to prevent extraneous disk access and to ensure the computer sleeps properly.</p>
<pre>sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.atrun.plist
</pre>
<p>Lastly, <strong>at</strong> is intended for running commands in the future, but only once, not on a recurring basis.  For commands you wish to run on a recurring basis, use <strong>crontab</strong>.</p>
]]></content:encoded>
			<wfw:commentRss>http://developercoach.com/2009/using-at-for-command-scheduling-under-mac-os-x/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chrome OS: Step by Step build and run using Ubuntu 9.10 and VMware Fusion 3 (Mac OS X)</title>
		<link>http://developercoach.com/2009/chrome-os-step-by-step-build-and-run-using-ubuntu-9-10-and-vmware-fusion-3-mac-os-x/</link>
		<comments>http://developercoach.com/2009/chrome-os-step-by-step-build-and-run-using-ubuntu-9-10-and-vmware-fusion-3-mac-os-x/#comments</comments>
		<pubDate>Sun, 22 Nov 2009 03:20:18 +0000</pubDate>
		<dc:creator>Timothy Platt</dc:creator>
				<category><![CDATA[Chrome OS]]></category>

		<guid isPermaLink="false">http://developercoach.com/?p=70</guid>
		<description><![CDATA[Here&#8217;s a step by step breakdown of how to build Chrome OS under Ubuntu 9.10 (running as a virtual machine with VMware Fusion on Mac OS X) and test the built image, also via VMware Fusion.    At the time of the original post I couldn&#8217;t get chromium (the browser portion) to actually build, as it <a href="http://developercoach.com/2009/chrome-os-step-by-step-build-and-run-using-ubuntu-9-10-and-vmware-fusion-3-mac-os-x/" class="more-link">More &#62;</a>]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a step by step breakdown of how to build Chrome OS under Ubuntu 9.10 (running as a virtual machine with VMware Fusion on Mac OS X) and test the built image, also via VMware Fusion.    At the time of the original post I couldn&#8217;t get chromium (the browser portion) to actually build, as it had compilation errors, so the build required a pre-built browser binary, which was graciously provided by <a href="http://mohamedmansour.com">Mohamed Mansour</a>.  I have now updated the steps to list how to build the browser from source, or use the pre-built binary.</p>
<p>Most, if not all, of this information is available via the <a href="http://www.chromium.org/chromium-os">Chromium OS pages</a>, <a href="http://gdgt.com/google/chrome-os/download/">gdgt</a>, and some other sources.  However, these instructions are meant to be easier to follow for anyone with this specific build environment (using VMware Fusion on Mac OS X).  I like VMs for these purposes: while there may be a performance tax, a VM allows easy snapshots (I can roll back to a snapshot if need be) and helps avoid adding too much clutter on my main machine.</p>
<p>Lastly, the <a href="http://www.chromium.org/chromium-os/discussion-groups">Chromium OS discussion group</a> and the #chromium-os channel on irc.freenode.net are good resources for getting quick answers from knowledgeable people.</p>
<h2>1. Create Ubuntu 9.10 virtual machine under VMware Fusion 3 (Mac OS X)</h2>
<ul>
<li>Download 9.10 .iso file for Ubuntu</li>
<li>In VMware fusion, click File -&gt; New</li>
<li>Continue without disc</li>
<li>Use an operating system installation disc image file</li>
<li>Choose .iso file downloaded previously</li>
<li>Operating System: Linux, Version: Ubuntu</li>
<li>Assign Easy Install password</li>
<li>Click &#8220;Customize Settings&#8221;</li>
<li>Save Settings as desired</li>
<li>Enable sharing, create a folder on the Mac to share with Read/Write access.  In this example, I have created a Desktop folder named &#8220;Ubuntu Share&#8221;</li>
<li>Change networking to bridged (if desired, leaving as NAT should work fine as well)</li>
<li>Increase the disk size as 20 GB is a bit small for a build machine, 100 GB should be fine.  There&#8217;s really no reason to go conservative if you are using a dynamically expanding VMware virtual disk, the complete size will only be used if needed.  Also be sure to check the option to split the VMDK into 2 GB chunks, which makes backup/restore more granular and easier to manage.</li>
<li>Start up Virtual Machine</li>
<li>Installation will start and proceed, eventually it will restart and you will receive a login prompt, login with the user and password specified above</li>
<li>VMware Tools will install automatically, you will be placed at a login prompt, wait while it installs, it will eventually restart into the GUI.  If it doesn&#8217;t finish after quite some time, just power off and restart, it should boot into the GUI</li>
<li>Login when prompted</li>
</ul>
<h2>2. Configure dependencies and tools (Ubuntu 9.10)</h2>
<ul>
<li>Start a terminal session: Applications -&gt; Accessories -&gt; Terminal</li>
<li>(http://www.chromium.org/chromium-os/building-chromium-os)</li>
<li>Install chromium prerequisites: http://src.chromium.org/svn/trunk/src/build/install-build-deps.sh</li>
<li>Save file as install-build-deps.sh.  It will save to ~/Downloads, so cd to that directory.</li>
<li>Run chmod +x install-build-deps.sh</li>
<li>Execute with sudo ./install-build-deps.sh</li>
<li>Select y for debugging symbols</li>
<li>When prompted for REPLACE SYSTEM LINKER ld with gold and back up ld, select y or press enter</li>
<li>If prompted do you want to continue, press y</li>
<li>It will run for a considerable amount of time, when complete you will be returned to a command prompt</li>
<li>Install qemu: sudo apt-get install qemu (if prompted select y)</li>
</ul>
<h2>3. Download / sync source code (Ubuntu 9.10)</h2>
<ul>
<li>Download the chromium depot tools: http://sites.google.com/a/chromium.org/dev/developers/how-tos/install-gclient</li>
<li>cd ~</li>
<li><code>svn co http://src.chromium.org/svn/trunk/tools/depot_tools</code></li>
<li>Add the new dir to your path via:  <code>export PATH=`pwd`/depot_tools:"$PATH"</code></li>
<li>You must run that command in the ~ directory. Use echo $PATH to see the results.  The depot_tools dir in your home directory should be at the front of the path.  You can add this to .profile or .bashrc so that it will occur next session</li>
<li>Install git: sudo apt-get install git-core.  If prompted to continue, choose y</li>
</ul>
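<p>For the .profile or .bashrc entry mentioned above, a variant that does not depend on the current directory may be more convenient.  This is a sketch only: <code>prepend_path</code> is a hypothetical helper name, and it assumes depot_tools was checked out directly into your home directory as in the steps above.</p>

```shell
#!/bin/bash
# Hypothetical helper: put a directory at the front of PATH,
# unless PATH already contains it.
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;               # already present, leave PATH alone
    *) PATH="$1:$PATH" ;;
  esac
}

# Assumes the checkout lives in the home directory.
prepend_path "$HOME/depot_tools"
export PATH
```

<p>Using $HOME rather than `pwd` means the line works no matter where the shell starts, and the duplicate check keeps the path from growing each time the file is sourced.</p>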
<ul>
<li><span style="text-decoration: underline;"><strong>If you wish to build the chromium (browser) from source do the following<br />
</strong></span></li>
<li>mkdir ~/chromium</li>
<li>cd chromium</li>
<li>gclient config http://src.chromium.org/svn/trunk/src http://build.chromium.org/buildbot/continuous/LATEST/REVISION (Get a known good)</li>
<li><code>export GYP_DEFINES="chromeos=1"</code>  (consider adding to .profile or .bashrc)</li>
<li><code>gclient sync --deps="chromeos,unix"</code></li>
<li>(will run for a long time)</li>
<li>(end of steps to retrieve browser source)</li>
</ul>
<ul>
<li>Get the chromium OS repository as follows:</li>
<li>mkdir ~/chromiumos</li>
<li>cd ~/chromiumos</li>
<li>gclient config http://src.chromium.org/git/chromiumos.git</li>
<li>gclient sync</li>
<li>The sync will take a long time, and will appear to stall at several points, just allow it to continue</li>
</ul>
<h2>4. Build (Ubuntu 9.10)</h2>
<ul>
<li>http://sites.google.com/a/chromium.org/dev/chromium-os/building-chromium-os/build-instructions</li>
<li>Make Local Repository as follows:</li>
<li>cd ~/chromiumos/chromiumos.git/src/scripts</li>
<li>./make_local_repo.sh  (may need to enter sudo password)</li>
<li>Create build environment</li>
<li>./make_chroot.sh</li>
</ul>
<ul>
<li><span style="text-decoration: underline;"><strong>If you did not download the chromium browser source, you must incorporate a pre-built browser binary</strong></span>:</li>
<li>Download chromium browser binary: http://mohamedmansour.com/chrome/builds/chrome-linux.zip  or http://build.chromium.org/buildbot/continuous/linux/LATEST/chrome-linux.zip</li>
<li> cp ~/Downloads/chrome-linux.zip ~/chromiumos/chromiumos.git/src/build/x86/local_assets</li>
</ul>
<ul>
<li><span style="text-decoration: underline;"><strong>If you did download the chromium browser source, you will now build it:</strong></span></li>
<li>cd ~/chromiumos/chromiumos.git/src/scripts</li>
<li><code>./build_chrome.sh --chrome_dir ~/chromium</code></li>
<li>(This will take quite some time.  It must complete successfully.  If it doesn&#8217;t you&#8217;ll likely get a blank blue gradient screen after signing in to Chromium OS)</li>
</ul>
<ul>
<li>Now we will build the OS</li>
<li>cd ~/chromiumos/chromiumos.git/src/scripts</li>
<li>./enter_chroot.sh</li>
<li>Create a debug user (called USERNAME)</li>
<li>( cd ../platform/pam_google &amp;&amp; ./enable_localaccount.sh USERNAME)</li>
<li>./set_shared_user_password.sh (set any password)</li>
<li>./build_platform_packages.sh</li>
<li>./build_kernel.sh</li>
<li>./build_image.sh</li>
<li>When complete, the script prints a short blurb on how to convert the image for VMware.  Note that the path will have to be changed (see next instructions)</li>
</ul>
<h2>5. Convert to VMware VMDK (Ubuntu 9.10)</h2>
<ul>
<li>Open another terminal (must do this outside of the chroot)</li>
<li>cd ~/chromiumos/chromiumos.git/src/scripts</li>
<li><code>./image_to_vmware.sh --from=/chromiumos/chromiumos.git/src/build/images/999.999.32609.025303-a1</code>  (NOTE: The last part will change; see the output from ./build_image.sh to confirm the specific build name)</li>
<li>Copy the resulting vmdk file to the VMware shared folder</li>
<li><code>cp ~/chromiumos/chromiumos.git/src/build/images/999.999.32609.025303-a1/ide.vmdk "/mnt/hgfs/Ubuntu Share"</code></li>
<li>NOTE: in the previous command, the &#8220;<strong>hgfs</strong>&#8221; part of the &#8220;/mnt/hgfs&#8221; path stands for <strong>h</strong>ost <strong>g</strong>uest <strong>f</strong>ile <strong>s</strong>ystem, and is how you access shared folders in Linux under VMware.</li>
</ul>
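<p>Since the build name changes with every build, it can be looked up rather than retyped.  The sketch below simply picks the most recently modified directory; <code>latest_image_dir</code> is a hypothetical helper name, and the images location is an assumption taken from the cp command above.</p>

```shell
#!/bin/bash
# Assumed location of the built images, taken from the cp command above.
images_root="$HOME/chromiumos/chromiumos.git/src/build/images"

# Hypothetical helper: print the most recently modified subdirectory
# of the directory given as $1 (prints an empty line if there is none).
latest_image_dir() {
  newest_dir=$(ls -1td "$1"/*/ 2>/dev/null | head -n 1)
  # Strip the trailing slash left behind by the */ glob.
  echo "${newest_dir%/}"
}

latest_image_dir "$images_root"
```

<p>The printed directory is the one to check against the output of ./build_image.sh before using it in the conversion and copy steps.</p>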
<h2>6. Test image using VMware Fusion (Mac OS X)</h2>
<ul>
<li>In VMware Fusion, click File -&gt; New</li>
<li>Continue Without Disc</li>
<li>Use an existing virtual disk</li>
<li>Select the file created, ide.vmdk (will be in ~/Desktop/Ubuntu Share)</li>
<li>Select option: Share this virtual disk with the virtual machine that created it</li>
<li>If prompted to convert, choose &#8220;Don&#8217;t convert&#8221;</li>
<li>Continue</li>
<li>Leave Linux, Ubuntu</li>
<li>Change disk size if desired, the default 20 GB is plenty of space.  Click Finish</li>
<li>Save as Chrome OS 1 (or name of your liking)</li>
<li>Start up the virtual machine</li>
<li>After booting, Chromium OS login screen should appear</li>
</ul>
<p>Log in with your Google account (Gmail e-mail address and password).  If you receive a blank blue screen, try Ctrl-Alt-N or Ctrl-Alt-T for a virtual terminal (must be a debug build for the virtual terminal to work).  If your Gmail account doesn&#8217;t work, try logging in with the USERNAME and password set during the build.</p>
]]></content:encoded>
			<wfw:commentRss>http://developercoach.com/2009/chrome-os-step-by-step-build-and-run-using-ubuntu-9-10-and-vmware-fusion-3-mac-os-x/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>File System Compression in HFS+: Space savings and performance gain?</title>
		<link>http://developercoach.com/2009/file-system-compression-in-hfs-space-savings-and-performance-gain/</link>
		<comments>http://developercoach.com/2009/file-system-compression-in-hfs-space-savings-and-performance-gain/#comments</comments>
		<pubDate>Sat, 21 Nov 2009 02:00:48 +0000</pubDate>
		<dc:creator>Timothy Platt</dc:creator>
				<category><![CDATA[File Systems]]></category>
		<category><![CDATA[Mac OS X]]></category>

		<guid isPermaLink="false">http://developercoach.com/?p=45</guid>
		<description><![CDATA[Many modern operating systems offer compression at the individual file level.  This is most useful when it is transparent, allowing all programs and utilities to take advantage of compression without a need for specific programming.  Contrast this with compressed file or archive formats, such as zip, bzip2, gzip, which aren&#8217;t typically handled directly by applications <a href="http://developercoach.com/2009/file-system-compression-in-hfs-space-savings-and-performance-gain/" class="more-link">More &#62;</a>]]></description>
			<content:encoded><![CDATA[<p>Many modern operating systems offer compression at the individual file level.  This is most useful when it is transparent, allowing all programs and utilities to take advantage of compression without a need for specific programming.  Contrast this with compressed file or archive formats, such as zip, bzip2, gzip, which aren&#8217;t typically handled directly by applications and therefore cannot be described as transparent.  As we have discussed previously, the HFS+ file system includes transparent per file compression as of the 10.6 Snow Leopard release.</p>
<h2>Benefits of compression</h2>
<p>First and foremost, compression saves disk space, and this is the primary benefit.  A secondary benefit is reduced disk I/O, which is the slowest operation on a computer: disk I/O is many orders of magnitude slower than any CPU instruction due to the physical movement involved: seek time, rotational latency, and transfer time (the last two being dictated by the rotational speed of the disk).   Therefore minimizing disk I/O offers great potential for overall performance improvement.  Less disk I/O means that loading a compressed file may actually be quicker than loading the equivalent uncompressed file.  But these space and I/O time savings do not come without cost.  The file must be compressed when it is initially written (and on every subsequent update), and it must be decompressed each and every time it is read. This can involve the CPU intensively, particularly when complex algorithms (very space efficient, but CPU intensive) are utilized.  With compression we are effectively spending CPU cycles to save both disk I/O and disk space.  Compression can function as a performance enhancer only if the CPU time required for compression/decompression plus the time to read the reduced data from the disk is less than the total disk I/O time for the equivalent uncompressed file.  An older single CPU/single core computer may be slower with compression, but multi-core/multi CPU systems are now commonplace and represent the path forward for computer performance.  That is to say, recent computers tend to have &#8220;CPU cycles to spare&#8221;, whereas comparatively speaking traditional disk drive technology has not kept pace.</p>
<h2>HFS+</h2>
<p>Individual file compression is new with the Snow Leopard 10.6 release of Mac OS X.  The feature set is quite limited compared to other file systems such as NTFS.  For instance, compressing files is only possible via the terminal ditto command and there is no integration with the GUI.  Lastly, the current functionality is recommended for read only system files, not for end user data files, per the man entry for ditto.  It is expected that Apple will build upon this functionality in future releases of Mac OS X.  However, the functionality that is provided today helps produce the reduced disk footprint of Snow Leopard and likely results in improved performance as well.</p>
<h2>In the real world</h2>
<p>Let&#8217;s check some of our assumptions and see if we really do gain performance by compression.  We will use <a href="http://forums.macrumors.com/showthread.php?t=780570">a custom compression tool named afsctool provided by brkirch</a> to analyze the compression size savings (afsctool can also compress a file in place, unlike the Mac OS X ditto command.)  We will test a variety of file sizes and will compare compressed to uncompressed read performance.</p>
<p>First we confirm the size of the uncompressed version of a medium sized PDF file.</p>
<pre>[Mac-Book-Pro]$ ./afsctool -v test-medium.pdf
/Users/user1/compression testing/test-medium.pdf:
File is not HFS+ compressed.
File content type: com.adobe.pdf
File data fork size (reported size by Mac OS X Finder): <span style="color: #ff0000;"><strong>5168509 bytes / 5.2 MB (megabytes)</strong></span> / 4.9 MiB (mebibytes)
Number of extended attributes: 0
Total size of extended attribute data: 0 bytes
Approximate overhead of extended attributes: 0 bytes
Approximate total file size (data fork + resource fork + EA + EA overhead + file overhead):
5169400 bytes / 5.2 MB (megabytes) / 4.9 MiB (mebibytes)</pre>
<p>We will then use ditto to compress the file as shown below.</p>
<pre>[Mac-Book-Pro]$ ditto --hfsCompression test-medium.pdf test-medium-compressed.pdf</pre>
<p>If we check size with the ls -al command, you&#8217;ll see that the reported size is the same.</p>
<pre>-rw-r--r--   1 tplatt  tplatt    <strong><span style="color: #ff0000;">5168509</span></strong> Nov 25 21:57 test-medium-compressed.pdf
-rw-r--r--   1 tplatt  tplatt    <strong><span style="color: #ff0000;">5168509</span></strong> Nov 25 21:57 test-medium.pdf</pre>
<p>However, when we confirm actual on disk size with afsctool we will see</p>
<pre>[Mac-Book-Pro]$ ./afsctool -v test-medium-compressed.pdf
/Users/user1/compression testing/test-medium-compressed.pdf:
File is HFS+ compressed.
File content type: com.adobe.pdf
File size (uncompressed data fork; reported size by Mac OS 10.6+ Finder):
 5168509 bytes / 5.2 MB (megabytes) / 4.9 MiB (mebibytes)
File size (compressed data fork - decmpfs xattr; reported size by Mac OS 10.0-10.5 Finder):
 <strong><span style="color: #ff0000;">3735071 bytes / 3.7 MB (megabytes)</span></strong> / 3.6 MiB (mebibytes)
File size (compressed data fork): 3735087 bytes / 3.7 MB (megabytes) / 3.6 MiB (mebibytes)
<span style="color: #ff0000;"><strong>Compression savings: 27.7%</strong></span>
Number of extended attributes: 0
Total size of extended attribute data: 0 bytes
Approximate overhead of extended attributes: 536 bytes
Approximate total file size (compressed data fork + EA + EA overhead + file overhead):
 3736352 bytes / 3.7 MB (megabytes) / 3.6 MiB (mebibytes)</pre>
<p>You can see that a substantial disk space savings of almost 28% was achieved.  The on-disk size is now 3,735,087 bytes, or 3.7 MB.  We should expect this to require fewer blocks to be read from disk (and so less disk head movement), and therefore to result in better performance.  The reduced disk read time should more than offset the CPU overhead of decompressing the file.  To test this, we&#8217;ll first purge the disk cache, and then simply time the output of the file via cat.  Purging the disk cache is an important step; otherwise the file may already be in the disk cache (memory) and will not be read from disk.</p>
<pre>[Mac-Book-Pro]$ purge
[Mac-Book-Pro]$ time cat test-medium.pdf 1&gt;/dev/null
<span style="color: #ff0000;"><strong>real    0m1.238s</strong>
</span>user    0m0.001s
sys    0m0.025s</pre>
<pre>[Mac-Book-Pro]$ purge
[Mac-Book-Pro]$ time cat test-medium-compressed.pdf 1&gt;/dev/null
<span style="color: #ff0000;"><strong>real    0m0.192s
</strong></span>user    0m0.001s
sys    0m0.077s</pre>
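<p>As a quick sanity check, the figures from the two runs above can be turned into a savings percentage and a speedup factor. This is only a minimal sketch in Python; the variable names are mine, and the numbers are copied from the afsctool and time output shown above.</p>

```python
# Values copied from the afsctool and time output above.
uncompressed_bytes = 5168509
compressed_bytes = 3735087
uncompressed_read_s = 1.238   # "real" time for test-medium.pdf
compressed_read_s = 0.192     # "real" time for test-medium-compressed.pdf

# Fraction of the original size that no longer has to be stored or read.
savings_pct = (1 - compressed_bytes / uncompressed_bytes) * 100

# How many times faster the compressed read completed.
speedup = uncompressed_read_s / compressed_read_s

print(f"space savings: {savings_pct:.1f}%")
print(f"read speedup:  {speedup:.1f}x")
```

<p>The savings figure matches what afsctool reports. Keep in mind that the timing comes from a single run of each command, so the speedup is indicative rather than precise.</p>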
<p>You can see that our hypothesis was correct.  Reading the compressed file was substantially quicker, but did require more CPU time.  A 5.2 MB file is quite large and therefore a large amount of transfer time is involved.  Will we see significant savings with other file sizes?  I ran some tests using a variety of files, repeating each test 3 times and averaging the results.  Times are in seconds.</p>
<p><img src="http://developercoach.com/wp-content/uploads/2009/11/CompressionPerformance.png" alt="" /></p>
<p>As you can see, compression provided a significant performance improvement in all cases but one.  The 45 KB text file showed a performance DECREASE of 36%.  Why would this be?  To understand this, you must understand how files are stored on a block device like a hard drive.  <strong>(figure out reason here, experimentation in progress, check back later for details)</strong> Therefore, there is NO advantage to compression, because the extra CPU overhead to decompress only adds to the total read time.  After this revelation, you may wonder why this is the case for the 45 KB text file, but not for the 8 KB executable or the 80 byte text file.  Surely these both involve a lengthy read from disk?  These files are so small that their compressed contents are stored as an extended attribute (the 8 KB executable) and as an inline attribute (the 80 byte text file).  This means that the file contents are retrieved when the file&#8217;s metadata is retrieved, which eliminates a significant amount of head seek time.  (A normal file retrieval requires first accessing the metadata somewhere on the disk, then the actual file contents, which are typically elsewhere; hence a head seek from the metadata position to the file content position is needed.)</p>
<h2>Summary</h2>
<p>Apple has decided to compress the majority of the Snow Leopard system files, and it is clear why: there are both space savings and performance (load time) to be gained, at least with traditional hard disks (see the caveats below).  These system files are generally small and read only.  Compressing everything on the hard disk would likely not be a wise choice, as it would negatively affect performance for frequently updated files (swap files, database files) and would gain little for data that is already compressed (zipped files).  When determining what to compress, the workload and typical uses of the machine must be taken into account.</p>
<h2>Caveats</h2>
<ul>
<li>I cheated a bit on the 404 MB PDF file as the Apple provided ditto command will not compress a file that large.  I used afsctool with the -c and -5 (zlib level 5 compression) parameters to achieve the compression.</li>
<li>These tests were run on a MacBook Pro with a relatively slow 5,400 RPM hard drive.  Using a faster disk (7,200 RPM or 10,000 RPM) could produce different results, as the transfer time component would be reduced.</li>
<li><strong>Read only</strong> access performance was tested, as updates and writes are not supported currently by the operating system.  The additional overhead of compressing frequently updated files could influence performance negatively.</li>
<li>Much of the performance gain is due to the limitations of rotational disk technology.  An SSD (Solid State Drive) would exhibit different characteristics and may not show a performance increase.  Further developments in SSDs &#8211; which show great potential in performance, noise reduction, and power consumption &#8211; will likely drastically shape the future of mass storage, but as of now they remain problematic and very expensive.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://developercoach.com/2009/file-system-compression-in-hfs-space-savings-and-performance-gain/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>HFS+ and File System Fragmentation</title>
		<link>http://developercoach.com/2009/file-system-fragmentation/</link>
		<comments>http://developercoach.com/2009/file-system-fragmentation/#comments</comments>
		<pubDate>Sat, 17 Oct 2009 15:41:48 +0000</pubDate>
		<dc:creator>Timothy Platt</dc:creator>
				<category><![CDATA[Mac OS X]]></category>

		<guid isPermaLink="false">http://www.developercoach.com/?p=3</guid>
		<description><![CDATA[A common question asked by Mac users is: Does my HFS+ file system get fragmented and what should I do about it?  This question is most often asked by those who have experience with the Windows operating system, where defragmentation tools are readily available, visible to the user, and frequently recommended (at least historically).  Apple <a href="http://developercoach.com/2009/file-system-fragmentation/" class="more-link">More &#62;</a>]]></description>
			<content:encoded><![CDATA[<p>A common question asked by Mac users is: Does my HFS+ file system get fragmented and what should I do about it?  This question is most often asked by those who have experience with the Windows operating system, where defragmentation tools are readily available, visible to the user, and frequently recommended (at least historically).  <a href="http://support.apple.com/kb/HT1375">Apple does not generally recommend defragmentation for HFS+,</a> but let&#8217;s dig a little deeper and see why that is.</p>
<h2>What is file system fragmentation?</h2>
<p>Hard drives are block devices, that is, they read and write multiple bytes at a time, in groups called blocks.  The actual physical hard drive has a device block size (512 bytes is typical) while the operating system&#8217;s file system will implement a block size of its own (4kB is typical).  That is to say, when a file is written or read, it is done in discrete groups of 4,096 bytes at a time.  Any file that is larger than this amount must occupy multiple blocks.  For example, a 16kB file would occupy four 4kB blocks on the hard drive.  A 9kB file would occupy three 4kB blocks &#8211; notice the inefficient use of space! (This wasted space is known as internal fragmentation and we won&#8217;t discuss it further here.)  These blocks may or may not be contiguous (adjacent to one another).  If they are not, the disk head must seek to multiple locations on the disk platter in order to retrieve the complete file.  This movement is slow, relatively speaking, so less movement of the disk head is better than more movement from a performance standpoint.  Therefore, optimal file performance (for a single file at least) implies that all the blocks making up the file be contiguous, so they may all be read with a minimum of head movement.</p>
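<p>The block arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the 4,096 byte block size is the typical value mentioned above, not something read from a real volume.</p>

```python
import math

BLOCK_SIZE = 4096  # assumed file system block size in bytes (the typical 4kB)

def blocks_needed(file_size):
    """Number of blocks a file of file_size bytes occupies."""
    return math.ceil(file_size / BLOCK_SIZE)

def wasted_bytes(file_size):
    """Unused bytes in the file's last block (internal fragmentation)."""
    return blocks_needed(file_size) * BLOCK_SIZE - file_size

for size_kb in (16, 9):
    size = size_kb * 1024
    print(size_kb, "kB:", blocks_needed(size), "blocks,",
          wasted_bytes(size), "bytes unused")
```

<p>The 16kB file fills its four blocks exactly, while the 9kB file leaves 3,072 bytes of its third block unused.</p>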
<h2>Where does fragmentation come from?</h2>
<p>Suppose the operating system is writing a file and it requires 4 blocks.  If the hard drive is relatively empty, odds are good the 4 blocks can be written contiguously.  Now suppose at some time in the future the file doubles in size, so that it requires an additional 4 blocks.  The file may no longer occupy a contiguous set of blocks, as there may or may not be additional room in the vicinity of the original blocks.  If there is not, the additional four blocks will be written at some other location, non-adjacent to the original blocks.  Any access of this file will now require an additional seek (for the disk head to travel from the original set of blocks to the second set of blocks).  The file is now fragmented.  This is known as external fragmentation and unless preventative measures are taken, it can grow worse over time.  As a disk is used, the free space tends to get split up.  Consider that as existing files are deleted, free blocks will appear in locations that were used previously, and these blocks need to be reused.  The cycle of using and freeing space over time results in the available space on the hard drive becoming spread out in a random pattern, with fewer and fewer large areas of available blocks.</p>
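<p>The scenario above can be reproduced with a toy model. The sketch below is Python and deliberately naive: the 16 block &#8220;disk&#8221; and the lowest-free-block allocator are assumptions made for illustration, not how HFS+ allocates space.</p>

```python
FREE = None  # marker for an unused block

def allocate(disk, name, count):
    """Naive allocator: claim the lowest-numbered free blocks for file `name`."""
    free = [i for i, owner in enumerate(disk) if owner is FREE]
    if len(free) < count:
        raise RuntimeError("not enough free blocks")
    chosen = free[:count]
    for i in chosen:
        disk[i] = name
    return chosen

def fragments(blocks):
    """Count the contiguous runs in a sorted list of block numbers."""
    runs = 0
    prev = None
    for b in blocks:
        if prev is None or b != prev + 1:
            runs += 1
        prev = b
    return runs

disk = [FREE] * 16
a = allocate(disk, "A", 4)    # file A gets blocks 0-3
b = allocate(disk, "B", 4)    # file B lands right behind it, blocks 4-7
a += allocate(disk, "A", 4)   # A doubles in size and must continue at block 8
print("A occupies", a, "in", fragments(a), "fragments")
```

<p>Because file B was written directly behind file A, the second half of A ends up on the far side of B, and reading A now involves two separate runs of blocks.</p>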
<h2>How the operating system deals with fragmentation</h2>
<p>There are two general approaches to handling external fragmentation: avoiding it in the first place and cleaning it up when it does happen.  These approaches are not mutually exclusive and can be combined, hence Mac OS X implements several of these tricks.   When fragmentation does occur, &#8220;on the fly&#8221; defragmentation can be applied (under fairly specific circumstances).</p>
<h3>Extents</h3>
<p>HFS+ uses an extent based allocation scheme.   An extent (also known as a block run) consists of a starting block number and a count of contiguous blocks.  One or more extents are used to store file contents, however, the algorithm for selecting the extents to use will prefer a single extent (i.e. contiguous storage).  This allocation scheme inherently tends towards contiguous storage, as the system will attempt to store the file in the minimum number of extents that provide the space needed, and an extent by definition represents contiguous storage.  Contrast this approach with a pure block based allocation scheme such as that of the legacy Windows File Allocation Table (FAT) file system, which tends towards allocating single blocks at a time, which can be widely dispersed.  Many modern file systems use an extent based allocation scheme, including the successor to the FAT file system, NTFS.  Another technique employed by HFS+ is that it will try to avoid reusing freed space if possible, i.e. it will ignore the extents freed from deleted files, which are likely to be widely dispersed and therefore highly fragmented.</p>
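<p>To make the extent idea concrete, here is a small Python sketch that collapses a list of block numbers into (starting block, block count) pairs. The function name and the block numbers are made up for the example; the real on-disk extent records hold more than this.</p>

```python
def to_extents(blocks):
    """Collapse a sorted list of block numbers into (start_block, block_count) pairs."""
    extents = []
    for b in blocks:
        if extents and b == extents[-1][0] + extents[-1][1]:
            # Block b directly follows the last extent, so extend that extent.
            start, count = extents[-1]
            extents[-1] = (start, count + 1)
        else:
            # Block b is not adjacent, so a new extent begins here.
            extents.append((b, 1))
    return extents

print(to_extents([100, 101, 102, 103]))            # contiguous file
print(to_extents([100, 101, 102, 103, 200, 201]))  # file that grew elsewhere
```

<p>The contiguous file is described by a single extent, (100, 4), while the file that grew elsewhere needs two, (100, 4) and (200, 2). The number of extents is therefore a direct measure of how fragmented a file is.</p>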
<h3>Delayed allocation</h3>
<p>HFS+ also uses a technique known as delayed allocation.  When an application requests that data be written, the actual write to the disk is delayed as long as possible.  Meanwhile the contents of the file to be written are buffered in memory.  Eventually the file must be written to disk, but because of the delay there is a much improved chance that all the data can be written to a set of contiguous blocks (a single extent).  Contrast this with an approach where bytes are written to storage as soon as possible, which could result in an insufficiently sized extent being selected, necessitating further extents shortly thereafter.  A trade-off of this approach is the increased possibility of lost data if a power outage occurs before the data is written to disk.</p>
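<p>The effect of the delay can be illustrated with another toy model in Python. Everything here is hypothetical (the ToyVolume class, the two strategies, and the file names); it only contrasts allocating space as each write arrives with allocating once after the writes have been buffered.</p>

```python
class ToyVolume:
    """Hypothetical volume that hands out space from a simple moving cursor."""

    def __init__(self):
        self.next_free = 0

    def allocate(self, count):
        """Return a (start_block, block_count) extent of `count` blocks."""
        start = self.next_free
        self.next_free += count
        return (start, count)

def eager(volume, writes):
    """Allocate as each write arrives; writes is a list of (file, blocks)."""
    layout = {}
    for name, blocks in writes:
        layout.setdefault(name, []).append(volume.allocate(blocks))
    return layout

def delayed(volume, writes):
    """Buffer every write first, then allocate once per file at flush time."""
    pending = {}
    for name, blocks in writes:
        pending[name] = pending.get(name, 0) + blocks
    return {name: [volume.allocate(total)] for name, total in pending.items()}

# Two files being written at the same time, two blocks per write.
writes = [("a.log", 2), ("b.log", 2), ("a.log", 2), ("b.log", 2)]
print(eager(ToyVolume(), writes))
print(delayed(ToyVolume(), writes))
```

<p>With eager allocation the two files interleave and each ends up in two extents; with delayed allocation each file receives a single four block extent.</p>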
<h3>On the fly defragmentation</h3>
<p>HFS+ can also detect and correct fragmented files under certain circumstances.  When a file is opened, it is checked by the kernel and if certain conditions are met, the file will be defragmented on the fly.  One of the advantages of using Mac OS X (perhaps not for the casual user, but certainly for a power user with interest in the internal workings) is that a large portion of the source code is readily accessible via the Darwin project.  Here we see a snippet of source representing the actual algorithm for determining when to defragment a file (<a href="http://www.opensource.apple.com/source/xnu/xnu-1456.1.26/bsd/hfs/hfs_vnops.c">this is from the hfs_vnop_open function</a>.) You&#8217;ll notice a number of constraints: the volume must be writable and journaled, the file must be a regular file that is not otherwise in use, it must be no larger than 20 MB, it must have at least 8 extents, the system must have booted at least 3 minutes ago, and the file must not have been modified in the last minute, to prevent thrashing.</p>
<pre style="background-color:#eee;">	<span>/*
	 * On the first (non-busy) open of a fragmented
	 * file attempt to de-frag it (if its less than 20MB).
	 */</span>
	<span>if</span> ((hfsmp-&gt;hfs_flags &amp; HFS_READ_ONLY) ||
	    (hfsmp-&gt;jnl == NULL) ||
#<span>if</span> <span>NAMEDSTREAMS</span>
	    !vnode_isreg(vp) || vnode_isinuse(vp, 0) || vnode_isnamedstream(vp)) {
#<span>else</span>
	    !vnode_isreg(vp) || vnode_isinuse(vp, 0)) {
#<span>endif</span>
		<span>return</span> (0);
	}

	<span>if</span> ((error = hfs_lock(cp, HFS_EXCLUSIVE_LOCK)))
		<span>return</span> (error);
	fp = VTOF(vp);
	<span>if</span> (fp-&gt;ff_blocks &amp;&amp;
	    fp-&gt;ff_extents[7].blockCount != 0 &amp;&amp;
	    fp-&gt;ff_size &lt;= (20 * 1024 * 1024)) {
		<span>int</span> no_mods = 0;
		<span>struct</span> timeval now;
		<span>/*
		 * Wait until system bootup is done (3 min).
		 * And don't relocate a file that's been modified
		 * within the past minute -- this can lead to
		 * system thrashing.
		 */</span>

		<span>if</span> (!past_bootup) {
			microuptime(&amp;tv);
			<span>if</span> (tv.tv_sec &gt; (60*3)) {
				past_bootup = 1;
			}
		}

		microtime(&amp;now);
		<span>if</span> ((now.tv_sec - cp-&gt;c_mtime) &gt; 60) {
			no_mods = 1;
		} 

		<span>if</span> (past_bootup &amp;&amp; no_mods) {
			(<span>void</span>) hfs_relocate(vp, hfsmp-&gt;nextAllocation + 4096,
					vfs_context_ucred(ap-&gt;a_context),
					vfs_context_proc(ap-&gt;a_context));
		}
	}
	hfs_unlock(cp);</pre>
<p>The call to hfs_relocate performs the actual defragmentation, and is also used in the implementation of the next feature.</p>
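<p>For readers who find the kernel source dense, the eligibility test can be restated as a plain predicate. The following Python sketch is my own paraphrase of the snippet above, not Apple&#8217;s code: FileState and should_defragment are hypothetical names, the named stream check is omitted, and the kernel&#8217;s cached past_bootup flag is replaced by an uptime argument.</p>

```python
from dataclasses import dataclass

MAX_SIZE = 20 * 1024 * 1024   # ff_size limit from the snippet (20 MB)
BOOT_DELAY_S = 60 * 3         # wait until system bootup is done (3 min)
QUIET_PERIOD_S = 60           # skip files modified within the past minute

@dataclass
class FileState:
    """Hypothetical stand-in for the kernel state the snippet consults."""
    volume_read_only: bool
    volume_journaled: bool
    is_regular_file: bool
    in_use: bool
    extent_block_counts: list   # block count of each extent, in order
    size_bytes: int
    seconds_since_modified: int

def should_defragment(f, uptime_s):
    """Return True when the conditions in the snippet would allow relocation."""
    if f.volume_read_only or not f.volume_journaled:
        return False
    if not f.is_regular_file or f.in_use:
        return False
    # ff_extents[7].blockCount != 0 means the eighth extent is in use.
    counts = f.extent_block_counts
    has_eight_extents = len(counts) >= 8 and counts[7] != 0
    if not has_eight_extents or f.size_bytes > MAX_SIZE:
        return False
    return uptime_s > BOOT_DELAY_S and f.seconds_since_modified > QUIET_PERIOD_S

# A 5 MB file in eight one-block extents, untouched for an hour.
candidate = FileState(False, True, True, False, [1] * 8, 5 * 1024 * 1024, 3600)
print(should_defragment(candidate, uptime_s=600))   # ten minutes after boot
print(should_defragment(candidate, uptime_s=100))   # still booting
```

<p>The first call reports that the file qualifies; the second does not, because the system has been up for less than three minutes.</p>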
<h3>Adaptive hot file clustering</h3>
<p>Adaptive hot file clustering is another mechanism by which files are defragmented.   This performance enhancing feature of Mac OS X attempts to keep the most frequently used files in the &#8220;hot zone&#8221; of the disk, in other words, the area which can be accessed most quickly.  A &#8220;temperature&#8221; is calculated over a time period and periodically files are moved into or out of the hot zone based on this temperature.  The end result is that small, frequently used files are put in the most advantageous location.  There are certain restrictions to this technique, including file size and maximum number of files &#8211; due to the limited space in the &#8220;hot zone&#8221;.  Finally, as files are moved into the hot zone they are automatically defragmented, if necessary, courtesy of the hfs_relocate function.</p>
<h3>Caveat: free space required</h3>
<p>Obviously, the defragmentation magic described above is subject to contiguous free space being available on the disk.  There can come a time when the disk&#8217;s remaining free space is simply too fragmented for the on the fly defragmentation to operate properly.  When this condition occurs (flagged as HFS_FRAGMENTED_FREESPACE) the hfs_relocate function will no longer work as desired.</p>
<h3>In the real world</h3>
<p><a href="http://osxbook.com/software/hfsdebug/">Amit Singh&#8217;s hfsdebug tool</a> can be used to calculate the actual fragmentation of an HFS+ file system.  Let&#8217;s look at actual values from a MacBook Pro laptop.  You can see that the laptop&#8217;s 149 GiB hard drive is relatively full, at 87% used:</p>
<pre style="background-color: #eeeeee; width: 100%;">[Mac-Book-Pro ~]$ df -h
Filesystem      Size   Used  Avail Capacity  Mounted on
/dev/disk0s2   149Gi  128Gi   21Gi    87%    /</pre>
<p>The hfsdebug command requires superuser privileges, so run it via the sudo command.  When given the -f parameter it outputs a list of all fragmented files, which can be quite voluminous, so here we pipe it into the tail command and retrieve the last 10 lines of output.</p>
<pre style="background-color: #eeeeee; width: 100%;">[Mac-Book-Pro ~]$ sudo ./hfsdebug -f -t 5 | tail -10
# Top 5 Files with the Most Extents on the Volume
rank    extents   blk/extents       cnid path
1          2370          1.44    1415451 Macintosh HD:/Users/User1/Pictures/iPhoto Library/face_blob.db
2          1066          1.36    1415450 Macintosh HD:/Users/User1/Pictures/iPhoto Library/face.db
3           263        139.72    1056058 Macintosh HD:/Users/User2/Library/Caches/com.apple.Safari/Cache.db
4           178         10.66     961880 Macintosh HD:/Users/User2/Library/PubSub/Database/Database.sqlite3
5           131       1940.60    1087232 Macintosh HD:/Users/User2/Downloads/xcode314_2809_developerdvd.dmg

Out of 496715 non-zero data forks total, 496275 (99.911 %) have no fragmentation.
Out of 43688 non-zero resource forks total, 43688 (100.000 %) have no fragmentation.</pre>
<p>You can see that despite the heavy utilization of the disk, there is in fact little fragmentation overall.  At the same time, there are SOME heavily fragmented files, which are listed via the -t parameter of the hfsdebug command.  Some of these (the Safari cache and the Xcode disk image) are too large (&gt; 20 MB, assuming the typical 4 kB block size) to qualify for on the fly defragmentation; the others are database files, which are likely to be in use or recently modified when they are opened.  Lastly, you will see that although this hard disk is quite full, there are still several large (1 GB+) regions of contiguous free space available, as displayed via the -0 option of hfsdebug.</p>
<pre style="background-color:#eee;">[Mac-Book-Pro ~ ]$ sudo ./hfsdebug -0 | grep GB
 300992         0x1d5e8         0x66da7     1.15 GB
 944875         0xd8114        0x1bebfe     3.60 GB
 262145       0x217d835       0x21bd835     1.00 GB
 2115346       0x232a4b9       0x252ebca     8.07 GB</pre>
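<p>The numbers in the two hfsdebug listings can be cross-checked with a little arithmetic. The Python sketch below assumes a 4,096 byte allocation block size, which is typical but is not shown in the output above; the block counts and fork totals are copied from the listings.</p>

```python
BLOCK_SIZE = 4096   # assumed allocation block size in bytes
GIB = 1024 ** 3

# Block counts from the first column of the hfsdebug -0 output above.
free_extent_blocks = [300992, 944875, 262145, 2115346]
for blocks in free_extent_blocks:
    print(f"{blocks:>8} blocks = {blocks * BLOCK_SIZE / GIB:.2f} GiB")

# Data fork totals from the hfsdebug -f summary.
total_forks = 496715
unfragmented_forks = 496275
fragmented_pct = (total_forks - unfragmented_forks) / total_forks * 100
print(f"fragmented data forks: {fragmented_pct:.3f}%")
```

<p>With that block size, the four free regions work out to 1.15, 3.60, 1.00, and 8.07 GiB, matching the last column of the listing, and only about 0.089% of the data forks (440 files) have any fragmentation at all.</p>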
<h3>Summary</h3>
<p>In summary, modern operating systems such as Mac OS X have largely eliminated file system fragmentation as a performance concern for the average (home) computer user through a combination of techniques to avoid fragmentation, and techniques to correct it when it does occur.  My recommendation: if your hard drive is not close to full capacity, you probably have no action to take.  If you do suspect fragmentation is affecting performance, you can easily confirm whether this is the case via the hfsdebug tool.  If you find significant fragmentation (5% or more), you can either obtain a third-party tool, or perform a Time Machine backup, do a clean install of the OS, and then restore your files.  For the average user, a reasonable strategy would be to always perform clean installs when applying major operating system updates (10.5 to 10.6, for example), as this will correct any fragmentation that is present.  Given the ease of use of Time Machine for both the backup and restore process, there are really no daunting technical hurdles for the average Mac user to overcome.  Lastly, if you have less than 15% free space on your hard drive, you should consider upgrading to a larger disk, as the features of HFS+ that keep fragmentation at bay need free space to work properly.</p>
<h3>Recommendations</h3>
<ul>
<li>Try to keep 15% to 20% free space on your hard drive.  Defragmentation can&#8217;t work if there is no space for it to use!</li>
<li>Time Machine backup and clean installs for major operating system upgrades (i.e. 10.5 to 10.6, etc.), which will remove most, if not all, fragmentation.</li>
<li>For the average home user, nothing beyond the above should be needed.</li>
<li>Results for machines used as servers may vary, as the workload is different.  It is best to use hfsdebug to determine whether fragmentation is a problem.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://developercoach.com/2009/file-system-fragmentation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
