Using git with Xcode

About git

Git is a distributed version control system that is easy to use and free.  I will describe the use of git with Xcode on Mac OS X for iPhone development.  Because git is distributed, each client has a full copy of the repository.  This results in great performance in most situations involving the management of source code files.

Installation

A disk image for Mac OS X is available here.  Simply download, open it, and run the .pkg to install.

Xcode Preparation

I’ll be using the command line, even though GUI clients are available for Mac OS X.  Why?  Because the command line is quick and easy, and git isn’t so complicated as to require a GUI.

It’s generally considered poor practice to check build outputs and intermediate files into your source control system.  The binaries tend to bloat the underlying storage and, besides, why store what you can re-create from source later?  Binaries can also complicate the check-in and merge process, as conflicts in them would have to be resolved manually.  This may be an issue for a multi-developer team.

The easy way to keep build outputs out of the repository in Xcode is to change the build output location so that outputs land outside the project directory.  This setting can be found under “Project” -> “Edit Project Settings” in the application menu.

If you have already built previously, you may want to use “Build” -> “Clean All Targets” to remove any outputs in the old location.  I also like to use the rm command on the command line to remove the now empty (and unnecessary) directories.

First Commit

Now, to the command line.  Change directories to the location of the project to commit, then create the initial repository using the init subcommand.  Then add . (all files and directories underneath) and commit.  A text editor will appear; edit the commit message, save, and quit.

[Mac-Book-Pro]$ pwd
/Users/tplatt/development/iphone/research/HelloWorld
[Mac-Book-Pro]$ git init
Initialized empty Git repository in /Users/tplatt/development/iphone/research/HelloWorld/.git/
[Mac-Book-Pro]$ git add .
[Mac-Book-Pro]$ git commit
[master (root-commit) 36395bc] Initial checkin, basic functionality.
16 files changed, 4510 insertions(+), 0 deletions(-)
create mode 100644 Classes/HelloWorldAppDelegate.h
create mode 100644 Classes/HelloWorldAppDelegate.m
(Other output removed for brevity)

Changing and updating

git diff – to check for differences

git log – to see history

git add (filename) – to add new files

git commit -a – to commit changed files (and any added since last commit)
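The commands above fit together into a simple edit/commit cycle.  The sketch below walks through that cycle in a throwaway directory so nothing real is touched.  The file name notes.txt, the commit messages, and the identity settings are made up for illustration, and the -m option is used only to avoid opening an editor.

```shell
#!/bin/bash
# Work in a temporary directory (hypothetical sandbox, not a real project).
repo=$(mktemp -d)
cd "$repo" || exit 1

git init -q
# Set an identity for this repository only, so the commits below succeed
# even on a machine where git has never been configured.
git config user.name "Example User"
git config user.email "user@example.com"

# First commit: a new file must be added before it is tracked.
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Initial checkin"

# Change the tracked file, review the change, then commit with -a.
echo "second line" >> notes.txt
git diff --stat
git commit -q -a -m "Add a second line"

# Show the resulting history, one commit per line.
git log --oneline
```

Note that commit -a only picks up files git already tracks; a brand new file still needs git add first.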

Sharing with others

Create a “magic file” in each .git repository that is OK to export:

touch git-daemon-export-ok

Then run the git daemon

git daemon &

The remote client can then use:

git clone git://(hostname)/Users/tplatt/development/iphone/research/HelloWorld/.git

Other commands

git user’s manual

Posted in Mac OS X

Using rsync and Time Machine for web site backups

Jeff Atwood has declared December 14th, 2009 as International Backup Awareness Day, and with good reason: his blog, Coding Horror, experienced catastrophic data loss.  Long story short, a disk drive failure on an external hosting server resulted in complete data loss, and a recent full backup was not available.  A lengthy, manual restore procedure is under way.  Lesson learned: Jeff suggests maintaining your own backups of any content hosted on someone else’s server.  I believe this is an excellent idea, and here I present how I back up this web site, along with several others and the contents of my home computer.

Why do your own backups?

First, we should establish that performing your own backups is desirable and necessary.   Most 3rd party hosting services provide backups.  However, with your own backup scheme there are several advantages:

  • You control the retention policy – You can decide how many versions to keep and for how long
  • Fast access to backups – You can control how accessible your backed up data is and there should be no lengthy retrievals over slow network connections
  • Review for unexpected file changes – While you are backing up, why not look for unexpectedly changed files, which can be an indication of a web site hack?
  • Trust no-one – Be responsible for the data you own.  Whether it was created by you or by others, you have a responsibility to ensure it is protected and safe at all times.

You need 3 backups, not one

Convenience – You need a drive attached to the computer at all times such that backups can be automated and convenient.  Any inconvenience or barrier to performing the backup, such as having to plug in a drive, etc. will make it less likely you have a recent copy of the data.

Theft – If someone breaks in and steals your home computer, they will steal any and all attached backup drives.  You need a copy not attached to the computer, preferably in a fireproof safe.

Fire – If your house burns down, it is likely any hard drive on site will be destroyed.  You need an off-site copy.

Lastly, RAID (mirrored) is not a backup – it only protects you from simple hard drive mechanical failures.  It will not protect you from application level data corruption,  accidental updates or deletions, physical theft, fire, malicious users, disgruntled employees or associates, etc.  Therefore we will not discuss it further.

Cloud Backup vs Local Hard Drives

A simple way to achieve much of the above would be to use a cloud based online backup service.  I cannot recommend this approach simply because I have no practical experience with it.  It sounds good in theory, but here is what I don’t like about cloud: 1) I don’t like regularly recurring expenses; I’d rather drop $100 on an external USB hard drive here and there.  2) Performance, particularly for data retrieval, can be a problem with the various online services.  3) Depending on what kind of data you are working with (multi gigabyte virtual machine images anyone?), upload speeds may be prohibitively slow, if for no other reason than the typical asymmetric upload/download speeds provided by most ISPs (DSL or cable).  Lastly, depending on how paranoid you are, data privacy may be an issue for you.

Basic process

I use 1and1.com’s Linux based web hosting and I run several WordPress sites (including this one) and a VBulletin forum (http://speedbagforum.com).  Both the WordPress and VBulletin web sites utilize Apache, MySQL, and PHP.  The hosting company provides SSH access, which grants me a high level of control and flexibility and enabled me to implement a highly (but not completely) automated backup solution for all the aforementioned web sites.

To back up the web sites, including the PHP/HTML files, MySQL database, and all uploads/attachments, I simply run a BASH script on my Mac.   This script starts an SSH session, which remotely runs another BASH script on the Linux host that uses mysqldump to back up the databases.  The session is passwordless for convenience: I want to avoid any barriers to performing the backups, so instead of passwords I am using private/public key authentication.  The script then invokes rsync to copy the files to my Mac, and finally invokes another remote command to remove the SQL backup files.  Since I use Time Machine on my Mac, it will copy the newly downloaded local files off to an external USB drive, which I swap monthly, rotating through several drives I own.  One drive goes into the fireproof safe in the house, the other goes to a relative’s house (also kept in a fireproof safe).

The main BASH script could be run automatically via cron.  However, I prefer to run it manually.  Firstly, this allows me to check the web sites and make sure no major corruption has occurred (hacked, etc. – no sense backing up a corrupted web site!).  Secondly, this gives me a chance to review the changed files (displayed by rsync) for anything out of the ordinary (a common virus writer trick is to spread malware via others’ websites).  You should always investigate any unusual or unexpected file changes.

Details

The bash script I run under Mac OS X is quite simple:

#!/bin/bash
echo "Backup of databases on remote server..."
ssh (user)@(domain.com) ./databasebackup

echo "rsync all web files, including sql backups..."
rsync -avz --delete (user)@(domain.com):~ /Users/tplatt/ForumDownload

echo "Remove sql backups on remote server..."
ssh (user)@(domain.com) 'rm -v *.sql'

echo "Complete."

(user)@(domain.com) should be changed to your Linux login for your remote host.  If you have not set up key authentication, you will be prompted for a password at each ssh and rsync step.

First, the databasebackup bash script is executed on the remote host.  This script uses the mysqldump command to dump all the database data into a file.  Note that the file is uniquely named such that it contains the current date and time, which ensures we can retain multiple copies of the database file.

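#!/bin/bash
# Timestamp: date as YYYYMMDD plus seconds since the Unix epoch (%s)
now=$(date +%Y%m%d-%s)
echo "Date is $now"
#
filename="dbbackup-spf-$now.sql"
echo "Dumping $filename..."
# Replace the parenthesized placeholders with your own values
mysqldump -h(server name) -u(database user) -p(password) (database name) > "$filename"
# Show the last line; a complete dump ends with "-- Dump completed on ..."
tail -n 1 "$filename"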

The actual script that is run repeats the mysqldump command steps for all databases.  (server name) should be the MySQL server fully qualified name.  (database user) is the MySQL login and (password) is the corresponding password.  Lastly, (database name) is the name of the database.  The end result of this script is that there will be one or more .sql files in the home directory on the remote host.  The tail command outputs the last line of the backup file, which allows me to visually confirm a successful backup: if the line doesn’t indicate “Dump completed”, I know the database backup failed.  A partial database backup is pretty much worthless.

-- Dump completed on 2010-01-27 13:09:26
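The visual check of the last line could also be automated.  The following is a minimal sketch, not part of my actual script: the function name dump_is_complete is made up, and it assumes the dump files follow the dbbackup-*.sql naming used above and that a complete dump ends with a line containing “Dump completed”, as in the sample output.

```shell
#!/bin/bash
# Succeed only if the last line of the given dump file carries the
# "Dump completed" marker that mysqldump writes at the end of a full dump.
dump_is_complete() {
    tail -n 1 "$1" | grep -q 'Dump completed'
}

# Check every dump file in the current directory (assumed naming scheme).
for f in dbbackup-*.sql; do
    [ -e "$f" ] || continue    # the glob matched nothing: no dumps here
    if dump_is_complete "$f"; then
        echo "OK:     $f"
    else
        echo "FAILED: $f" >&2
    fi
done
```

Run in the remote home directory after the dumps finish, this would flag any partial dump before rsync copies it down.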

Once the ./databasebackup script has completed, an rsync command is invoked.  This command copies down all files in the user home directory into a local directory (on my iMac).  The --delete option ensures any files deleted on the server are also removed from the local directory.   So, that being the case, how can I use this to retrieve an accidentally or maliciously deleted file?   The final part of the equation is Time Machine running on the local Mac.  Hourly, the local directory is backed up to an external drive, where deleted files are retained, at least until space is needed.  I can typically fit 2 months’ worth of web site backups on the external drive – your results may vary according to the size of the drive, the size of the backups, and any other data Time Machine is backing up for you.   What if a file is deleted and I don’t notice for 3 months?  I go to the drive stored in the safe, which will have older backups, or the drive offsite.  When the rsync command executes, it displays a list of the files being transferred (the -v verbose option), which gives me a quick glance at any changed files.
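The rsync listing scrolls by quickly, so a second look at what changed can be handy.  As a complementary sketch (not something my script does), the function below lists local files modified more recently than a marker file.  The name changed_since and the idea of a marker file are assumptions for illustration; keep the marker outside the backup directory, since rsync --delete would otherwise remove it.

```shell
#!/bin/bash
# List regular files under directory $1 that were modified more recently
# than the marker file $2.
changed_since() {
    find "$1" -type f -newer "$2" -print
}

# When called with two arguments, run the check directly, for example:
#   ./changed.sh ~/ForumDownload ~/.last-backup-review   (hypothetical paths)
if [ "$#" -eq 2 ]; then
    changed_since "$1" "$2"
fi
```

After reviewing the list, touch the marker file so the next run only reports changes made since this review.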

In summary, ssh and rsync enable powerful and easy backups for your hosted web sites.  Combine those with Time Machine for Mac OS X and you have a powerful, simple backup solution that is quite comprehensive.

Posted in Mac OS X

Using at for command scheduling under Mac OS X

Mac OS X contains the handy at command for scheduling commands to run at a later time.  at is used to schedule the commands, and the atrun utility is used to execute the jobs.  However, by default the atrun utility isn’t enabled, so any jobs scheduled via at will never run, with no particular warning.  The reason given in the man page for atrun is to “prevent disk access every 10 minutes”, which would be detrimental to laptop battery life and to the computer going to sleep.

To enable atrun, execute the following:

sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.atrun.plist

Once this is done, you may now use the at command.  For example, to schedule a command to run at 11:30 am today, simply type:

[Mac-Book-Pro]$ at 11:30 am today

Then enter the desired command(s) and use Ctrl-D (end of file) to end input:

touch diditrun
pwd > output.txt
job 13 at Sun Dec 20 11:30:00 2009

The example commands above will simply create a 0 length file (diditrun) and write the working directory of the command execution to output.txt, as simple proof that the job indeed has run.  You can review the jobs queued using atq (or at -l):

[Mac-Book-Pro]$ atq
13    Sun Dec 20 11:30:00 2009

To review the actual commands to be executed for any particular job (where 13 is the job number listed via atq):

[Mac-Book-Pro]$ at -c 13
#!/bin/sh
# atrun uid=501 gid=501
# mail tplatt 0
umask 22
(... much output removed ...)
PATH=/Users/tplatt/depot_tools:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin; export PATH
PWD=/Users/tplatt/compression\ testing; export PWD
(... more output removed ...)
cd /Users/tplatt/compression\ testing || {
 echo 'Execution directory inaccessible' >&2
 exit 1
}
OLDPWD=/Users/tplatt; export OLDPWD
touch diditrun
pwd > output.txt

Notice that the command will be executed in the working directory in which it was scheduled, and with the user credentials of the user who scheduled it.
To remove a job from the queue, use atrm and the relevant job number:

atrm 13

To disable the atrun command from running, you can again manipulate launchd settings via launchctl.  You would do this if you were not actively using at and wanted to prevent extraneous disk access and to ensure the computer sleeps properly.

sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.atrun.plist

Lastly, at is intended for running commands in the future, but only once, not on a recurring basis.  For commands you wish to run on a recurring basis, use crontab.

Posted in Mac OS X

Chrome OS: Step by Step build and run using Ubuntu 9.10 and VMware Fusion 3 (Mac OS X)

Here’s a step by step breakdown of how to build Chrome OS under Ubuntu 9.10 (running as a virtual machine with VMware Fusion on Mac OS X) and test the built image, also via VMware Fusion.    At the time of the original post I couldn’t get chromium (the browser portion) to actually build, as it had compilation errors, so the process required a pre-built browser binary, which was graciously provided by Mohamed Mansour.  I have now updated the steps to list how to build the browser from source or use the pre-built binary.

Most, if not all, of this information is available via the Chromium OS pages, gdgt, and some other sources.  However, these instructions are meant to be easier to follow for anyone with this specific build environment (using VMware Fusion on Mac OS X).  I like VMs for these purposes: while there may be a performance tax, a VM allows easy snapshots (I can roll back to a snapshot if need be) and helps avoid adding too much clutter on my main machine.

Lastly, the Chromium OS discussion group and the #chromium-os channel on irc.freenode.net are good resources for getting quick answers from knowledgeable people.

1. Create Ubuntu 9.10 virtual machine under VMware Fusion 3 (Mac OS X)

  • Download 9.10 .iso file for Ubuntu
  • In VMware Fusion, click File -> New
  • Continue without disc
  • Use an operating system installation disc image file
  • Choose .iso file downloaded previously
  • Operating System: Linux, Version: Ubuntu
  • Assign Easy Install password
  • Click “Customize Settings”
  • Save Settings as desired
  • Enable sharing, create a folder on the Mac to share with Read/Write access.  In this example, I have created a Desktop folder named “Ubuntu Share”
  • Change networking to bridged (if desired, leaving as NAT should work fine as well)
  • Increase the disk size as 20 GB is a bit small for a build machine, 100 GB should be fine.  There’s really no reason to go conservative if you are using a dynamically expanding VMware virtual disk, the complete size will only be used if needed.  Also be sure to check the option to split the VMDK into 2 GB chunks, which makes backup/restore more granular and easier to manage.
  • Start up Virtual Machine
  • Installation will start and proceed.  Eventually it will restart and you will receive a login prompt; log in with the user and password specified above
  • VMware Tools will install automatically.  You will be placed at a login prompt; wait while it installs, and it will eventually restart into the GUI.  If it doesn’t finish after quite some time, just power off and restart, and it should boot into the GUI
  • Login when prompted

2. Configure dependencies and tools (Ubuntu 9.10)

  • Start a terminal session: Applications -> Accessories -> Terminal
  • (http://www.chromium.org/chromium-os/building-chromium-os)
  • Install chromium prerequisites: http://src.chromium.org/svn/trunk/src/build/install-build-deps.sh
  • Save file as install-build-deps.sh.  It will save to ~/Downloads, so cd to that directory.
  • Run chmod +x install-build-deps.sh
  • Execute with sudo ./install-build-deps.sh
  • Select y for debugging symbols
  • When prompted for REPLACE SYSTEM LINKER ld with gold and back up ld, select y or press enter
  • If prompted do you want to continue, press y
  • It will run for a considerable amount of time, when complete you will be returned to a command prompt
  • install qemu: sudo apt-get install qemu (if prompted select y)

3. Download / sync source code (Ubuntu 9.10)

  • Download the chromium depot tools: http://sites.google.com/a/chromium.org/dev/developers/how-tos/install-gclient
  • cd ~
  • svn co http://src.chromium.org/svn/trunk/tools/depot_tools
  • Add the new dir to your path via:  export PATH=`pwd`/depot_tools:"$PATH"
  • You must run that command in the ~ directory. Use echo $PATH to see the results.  The depot_tools dir in your home directory should be at the front of the path.  You can add this to .profile or .bashrc so that it will occur next session
  • Install git: sudo apt-get install git-core.  If prompted to continue, choose y
  • If you wish to build the chromium (browser) from source do the following
  • mkdir ~/chromium
  • cd chromium
  • gclient config http://src.chromium.org/svn/trunk/src http://build.chromium.org/buildbot/continuous/LATEST/REVISION (Get a known good)
  • export GYP_DEFINES="chromeos=1"  (consider adding to .profile or .bashrc)
  • gclient sync --deps="chromeos,unix"
  • (will run for a long time)
  • (end of steps to retrieve browser source)
  • Get the chromium OS repository as follows:
  • mkdir ~/chromiumos
  • cd ~/chromiumos
  • gclient config http://src.chromium.org/git/chromiumos.git
  • gclient sync
  • The sync will take a long time, and will appear to stall at several points, just allow it to continue
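The PATH step above is worth making repeatable, since a line in .profile or .bashrc runs on every new session.  The sketch below is one way to do it; the function name path_prepend is made up, and it assumes depot_tools was checked out into your home directory as in the steps above.  It adds the directory to the front of PATH only if it is not already present, so repeated runs do not pile up duplicate entries.

```shell
#!/bin/bash
# Prepend directory $1 to PATH unless PATH already contains it.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;      # otherwise put it at the front
    esac
}

# Assumes depot_tools lives in the home directory, as in the steps above.
path_prepend "$HOME/depot_tools"
export PATH
```

Placing these lines in .bashrc has the same effect as the export command in the list, without depending on the current directory.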

4. Build (Ubuntu 9.10)

  • http://sites.google.com/a/chromium.org/dev/chromium-os/building-chromium-os/build-instructions
  • Make Local Repository as follows:
  • cd ~/chromiumos/chromiumos.git/src/scripts
  • ./make_local_repo.sh  (may need to enter sudo password)
  • Create build environment
  • ./make_chroot.sh
  • If you did not download the chromium browser source, you must incorporate a pre-built browser binary:
  • Download chromium browser binary: http://mohamedmansour.com/chrome/builds/chrome-linux.zip  or http://build.chromium.org/buildbot/continuous/linux/LATEST/chrome-linux.zip
  • cp ~/Downloads/chrome-linux.zip ~/chromiumos/chromiumos.git/src/build/x86/local_assets
  • If you did download the chromium browser source, you will now build it:
  • cd ~/chromiumos/chromiumos.git/src/scripts
  • ./build_chrome.sh --chrome_dir ~/chromium
  • (This will take quite some time.  It must complete successfully.  If it doesn’t you’ll likely get a blank blue gradient screen after signing in to Chromium OS)
  • Now we will build the OS
  • cd ~/chromiumos/chromiumos.git/src/scripts
  • ./enter_chroot.sh
  • Create a debug user (called USERNAME)
  • ( cd ../platform/pam_google && ./enable_localaccount.sh USERNAME)
  • ./set_shared_user_password.sh (set any password)
  • ./build_platform_packages.sh
  • ./build_kernel.sh
  • ./build_image.sh
  • When complete, the script prints a short blurb on how to convert the image for VMware.  Note that the path shown will have to be changed (see next instructions)

5. Convert to VMware VMDK (Ubuntu 9.10)

  • Open another terminal (must do this outside of the chroot)
  • cd ~/chromiumos/chromiumos.git/src/scripts
  • ./image_to_vmware.sh --from=/chromiumos/chromiumos.git/src/build/images/999.999.32609.025303-a1  (NOTE: the last part will change; see the output from ./build_image.sh to confirm the specific build name)
  • Copy the resulting vmdk file to the VMware shared folder
  • cp ~/chromiumos/chromiumos.git/src/build/images/999.999.32609.025303-a1/ide.vmdk "/mnt/hgfs/Ubuntu Share"
  • NOTE: in the previous command, the “hgfs” part of the “/mnt/hgfs” path stands for host guest file system, and is how you access shared folders in Linux under VMware.
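Because the build name changes with every build, it can be convenient to look up the newest image directory rather than copying the name by hand.  This is a rough sketch under the assumption that images land under ~/chromiumos/chromiumos.git/src/build/images as in the steps above; the function name latest_image_dir and the IMAGES_DIR override are made up for illustration.

```shell
#!/bin/bash
# Directory where build_image.sh output is assumed to land (see steps above).
images_dir="${IMAGES_DIR:-$HOME/chromiumos/chromiumos.git/src/build/images}"

# Print the most recently modified subdirectory of directory $1,
# or an empty line if there are no subdirectories.
latest_image_dir() {
    newest=
    for d in "$1"/*/; do
        [ -d "$d" ] || continue
        if [ -z "$newest" ] || [ "$d" -nt "$newest" ]; then
            newest=$d
        fi
    done
    printf '%s\n' "${newest%/}"    # strip the trailing slash
}

latest=$(latest_image_dir "$images_dir")
if [ -n "$latest" ]; then
    echo "Newest image directory: $latest"
else
    echo "No image directories found under $images_dir" >&2
fi
```

The printed path can then be used in the conversion and copy commands in place of the hard-coded build name; still compare it against the build_image.sh output to be sure.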

6. Test image using VMware Fusion (Mac OS X)

  • In VMware Fusion, click File -> New
  • Continue Without Disc
  • Use an existing virtual disk
  • Select the file created, ide.vmdk (will be in ~/Desktop/Ubuntu Share)
  • Select option: Share this virtual disk with the virtual machine that created it
  • If prompted to convert, choose “Don’t convert”
  • Continue
  • Leave Linux, Ubuntu
  • Change disk size if desired, the default 20 GB is plenty of space.  Click Finish
  • Save as Chrome OS 1 (or name of your liking)
  • Start up the virtual machine
  • After booting, Chromium OS login screen should appear

Log in with your Google account (Gmail e-mail address and password).  If you receive a blank blue screen, try Ctrl-Alt-N or Ctrl-Alt-T for a virtual terminal (must be a debug build for the virtual terminal to work).  If your Gmail account doesn’t work, try logging in with the USERNAME and password set during the build.

Posted in Chrome OS

File System Compression in HFS+: Space savings and performance gain?

Many modern operating systems offer compression at the individual file level.  This is most useful when it is transparent, allowing all programs and utilities to take advantage of compression without a need for specific programming.  Contrast this with compressed file or archive formats, such as zip, bzip2, gzip, which aren’t typically handled directly by applications and therefore cannot be described as transparent.  As we have discussed previously, the HFS+ file system includes transparent per file compression as of the 10.6 Snow Leopard release.

Benefits of compression

First and foremost, compression saves disk space, and this is the primary benefit.  A secondary benefit is reduced disk I/O, which is the slowest operation on a computer: disk I/O is many orders of magnitude slower than any CPU instruction due to the physical movement involved: seek time, rotational latency, and transfer time (the last two being dictated by the rotational speed of the disk).   Therefore minimizing disk I/O offers great potential for overall performance improvement.  Less disk I/O means that loading a compressed file may actually be quicker than loading the equivalent uncompressed file.

But these space and I/O time savings do not come without cost.  The file must be compressed when initially written (and for subsequent updates), and it must be decompressed each and every time it is read.  This can involve the CPU intensively, particularly when complex algorithms (very space efficient, but CPU intensive) are utilized.  With compression we are effectively spending CPU cycles to save both disk I/O and disk space.  Compression can function as a performance enhancer only if the CPU cycles required for compression/decompression and the time to read the reduced data on the disk take less time overall than the total disk I/O for an equivalent uncompressed file.  An older single CPU/single core computer may be slower with compression, but multi-core/multi CPU machines are now commonplace and represent the path forward for computer performance.  That is to say, recent computers tend to have “CPU cycles to spare”, whereas comparatively speaking traditional disk drive technology has not kept pace.

HFS+

Individual file compression is new with the Snow Leopard 10.6 release of Mac OS X.  The feature set is quite limited compared to other file systems such as NTFS.  For instance, compressing files is only possible via the terminal ditto command and there is no integration with the GUI.  Lastly, the current functionality is recommended for read only system files, not for end user data files, per the man entry for ditto.  It is expected that Apple will build upon this functionality in future releases of Mac OS X.  However, the functionality that is provided today helps produce the reduced disk footprint of Snow Leopard and likely results in improved performance as well.

In the real world

Let’s check some of our assumptions and see if we really do gain performance by compression.  We will use a custom compression tool named afsctool provided by brkirch to analyze the compression size savings (afsctool can also compress a file in place, unlike the Mac OS X ditto command.)  We will test a variety of file sizes and will compare compressed to uncompressed read performance.

First we confirm the size of the uncompressed version of a medium sized PDF file.

[Mac-Book-Pro]$ ./afsctool -v test-medium.pdf
/Users/user1/compression testing/test-medium.pdf:
File is not HFS+ compressed.
File content type: com.adobe.pdf
File data fork size (reported size by Mac OS X Finder): 5168509 bytes / 5.2 MB (megabytes) /
 4.9 MiB (mebibytes)
Number of extended attributes: 0
Total size of extended attribute data: 0 bytes
Approximate overhead of extended attributes: 0 bytes
Approximate total file size (data fork + resource fork + EA + EA overhead + file overhead):
5169400 bytes / 5.2 MB (megabytes) / 4.9 MiB (mebibytes)

We will then use ditto to compress the file as shown below.

[Mac-Book-Pro]$ ditto --hfsCompression test-medium.pdf test-medium-compressed.pdf

If we check size with the ls -al command, you’ll see that the reported size is the same.

-rw-r--r--   1 tplatt  tplatt    5168509 Nov 25 21:57 test-medium-compressed.pdf
-rw-r--r--   1 tplatt  tplatt    5168509 Nov 25 21:57 test-medium.pdf

However, when we confirm actual on disk size with afsctool we will see

[Mac-Book-Pro]$ ./afsctool -v test-medium-compressed.pdf
/Users/user1/compression testing/test-medium-compressed.pdf:
File is HFS+ compressed.
File content type: com.adobe.pdf
File size (uncompressed data fork; reported size by Mac OS 10.6+ Finder):
 5168509 bytes / 5.2 MB (megabytes) / 4.9 MiB (mebibytes)
File size (compressed data fork - decmpfs xattr; reported size by Mac OS 10.0-10.5 Finder):
 3735071 bytes / 3.7 MB (megabytes)
 / 3.6 MiB (mebibytes)
File size (compressed data fork): 3735087 bytes / 3.7 MB (megabytes) / 3.6 MiB (mebibytes)
Compression savings: 27.7%
Number of extended attributes: 0
Total size of extended attribute data: 0 bytes
Approximate overhead of extended attributes: 536 bytes
Approximate total file size (compressed data fork + EA + EA overhead + file overhead):
 3736352 bytes / 3.7 MB (megabytes) / 3.6 MiB (mebibytes)

You can see a substantial disk space savings of almost 28% was achieved. The on disk size is now 3,735,087 bytes or 3.7 MB. We should expect this will require significantly less disk head movement, and therefore should result in better performance.  The reduced disk read time should more than offset the CPU overhead of having to uncompress the file.  To test this, we’ll  first purge the disk cache, and then simply time the output of the file via cat.  Purging the disk cache is an important step, otherwise the file may be in disk cache (memory) and will not be read from disk.
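The savings figure can be reproduced from the two data fork sizes in the afsctool output above.  This short sketch uses awk for the floating point arithmetic; the variable names are just for illustration.

```shell
#!/bin/bash
uncompressed=5168509   # data fork size of test-medium.pdf, in bytes
compressed=3735087     # compressed data fork size, in bytes

# Savings as a percentage of the uncompressed size, to one decimal place.
savings=$(awk -v u="$uncompressed" -v c="$compressed" \
    'BEGIN { printf "%.1f", (1 - c / u) * 100 }')

echo "Compression savings: ${savings}%"
```

This prints 27.7%, matching the “Compression savings” line reported by afsctool.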

[Mac-Book-Pro]$ purge
[Mac-Book-Pro]$ time cat test-medium.pdf 1>/dev/null
real    0m1.238s
user    0m0.001s
sys    0m0.025s
[Mac-Book-Pro]$ purge
[Mac-Book-Pro]$ time cat test-medium-compressed.pdf 1>/dev/null
real    0m0.192s
user    0m0.001s
sys    0m0.077s

You can see that our hypothesis was correct.  Reading the compressed file was substantially quicker, but did require more CPU time.  A 5.2 MB file is quite large and therefore a large amount of transfer time is involved.  Will we see significant savings with other file sizes?  I ran some testing using a variety of files, repeated each test 3 times and averaged the results.  Times are in seconds.

As you can see, there was a significant performance improvement provided by compression in all cases but one.  The 45 KB text file showed a performance DECREASE of 36%.  Why would this be?  To understand, you must understand how files are stored on a block device like a hard drive.  (figure out reason here, experimentation in progress, check back later for details) Therefore, there is NO advantage to compression for this file, because the extra CPU overhead to uncompress only adds to the total time to read.   After this revelation, you may wonder why this is the case for the 45 KB text file, but not for the 8 KB executable or the 80 byte text file.  Surely these both involve a lengthy read from disk?  These files are so small that they are stored compressed as an extended attribute (the 8 KB exe) and an inline attribute (the 80 byte text).  This means that the file contents are retrieved when the file’s metadata is retrieved, which eliminates a significant amount of head seek time.  (A normal file retrieval requires first accessing the metadata, somewhere on the disk, and then the actual file contents, which are typically elsewhere; hence a head seek from the metadata position to the file content position is needed.)

Summary

Apple has decided to compress the majority of the Snow Leopard system files, and it is clear why: there are both space savings and performance (load time) to be gained (at least with traditional hard disks, see caveats below).  These system files are generally small and read only.   Compressing everything on the hard disk would likely not be a wise choice, as it would negatively affect performance for frequently updated files (swap files, database files) or already compressed data (zipped files).  When determining what to compress, the workload and typical uses of the machine must be taken into account.

Caveats

  • I cheated a bit on the 404 MB PDF file as the Apple provided ditto command will not compress a file that large.  I used afsctool with the -c and -5 (zlib level 5 compression) parameters to achieve the compression.
  • These tests were run on a MacBook Pro with a relatively slow 5,400 RPM hard drive.  Using a faster disk (7,200 RPM or 10,000 RPM) could produce different results as the transfer time component would be reduced.
  • Read only access performance was tested, as updates and writes are not supported currently by the operating system.  The additional overhead of compressing frequently updated files could influence performance negatively.
  • Much of the performance gain is due to rotational disk technology limitations.  An SSD (Solid State Drive) would exhibit different characteristics and may not exhibit a performance increase.  Further developments in SSDs – which show great potential in performance, noise reduction, and power consumption – will likely drastically shape the future of mass storage, but as of now they remain problematic and very expensive.
Posted in File Systems, Mac OS X