Sunday, October 7, 2007

LZMA, streaming API, -tv support

There have been a lot of developmental changes to xar lately, the highlights being:
1) LZMA support, added by Anders Björklund of the RPM5 project. LZMA support depends on the LZMA utils library, which is currently in beta and is under the LGPL. LZO support was also investigated, but the current LZO library is under the GPL.
2) A new streaming API has been added, allowing the caller to extract a file's data in chunks rather than requiring it to exist entirely in memory or on disk. This functionality was added by Charles Srstka, author of Pacifist.
3) -tv support in the xar(1) command line utility now outputs in a format similar to tar, displaying file mode, owner, group, size, and name.
4) A number of other bugs were fixed and minor features added: a --keep-existing option was added, the options k, j, and z were added for compatibility with tar, the various compression modules were cleaned up, and some bugs with malformed archives were resolved.
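
To illustrate the idea behind item 2, here is a minimal sketch of chunked extraction. It is not libxar's streaming API: it uses Python's zlib module as a stand-in, and the names iter_decompressed and chunk_size are made up for the example. The point is only that the caller consumes the data piece by piece instead of holding the whole file.

```python
import io
import zlib

def iter_decompressed(src, chunk_size=4096):
    """Yield decompressed pieces of a zlib-compressed binary stream.

    Hypothetical helper, not part of libxar: it reads at most chunk_size
    compressed bytes at a time and hands back whatever they decompress to.
    """
    decomp = zlib.decompressobj()
    while True:
        block = src.read(chunk_size)
        if not block:
            break
        out = decomp.decompress(block)
        if out:
            yield out
    # Emit anything the decompressor was still buffering.
    tail = decomp.flush()
    if tail:
        yield tail

# Usage: count the extracted bytes without ever joining the chunks.
payload = b"xar " * 10000
compressed = io.BytesIO(zlib.compress(payload))
total = sum(len(chunk) for chunk in iter_decompressed(compressed, chunk_size=64))
```

Note that only the compressed reads are bounded here; a caller that also needs a cap on each decompressed piece could pass a max_length to decompress().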

All of these changes can be found in trunk of xar's subversion repository.

Sunday, June 17, 2007

QuickLook

This weekend a QuickLook module for Mac OS X 10.5 (seeds) was checked into the subversion repository. This brings the list of Mac OS X 10.5 integration utilities up to:

  • Finder Contextual Menu PlugIn

  • Spotlight PlugIn

  • QuickLook PlugIn



There are no official binaries of these available yet, but you're welcome to download and build them yourself. I've also put together a trial .pkg for those with 10.5 seeds here.

Saturday, June 16, 2007

1.5.1, WWDC, RPM, Spotlight

Last week was Apple's developer conference, WWDC, which I was able to attend. Apple sent me a free pass, which was a great opportunity to meet with Apple's Installer team manager and talk to him about Mac OS X 10.5's use of xar for their new package format. 10.5's use of xar has already been discussed, but apparently xar has provided Apple with huge wins for Leopard: a package is now a single file archive, unlike the previous bundles (aka directories full of files); the archive signing brought package signing to 10.5 for free; and the bzip2 compression has provided huge space savings for them on the install media.

Another prominent use of xar lately has been the new rpm5. The new version of RPM will be using xar.

Right before WWDC, xar 1.5 and 1.5.1 were released, with a TON of new bug fixes, including much better EA archival. This release seems to have shaped up to be the best, most reliable one yet.

While at WWDC, I attended the Spotlight and QuickLook sessions, and wrote a Spotlight plugin for xar, checked in to the repository here. The plugin just sets the creator to xar, the file type to xar, and puts the content listing into the text content field. This basically allows searching for filenames within an archive, with Spotlight returning the archive that contains them. A QuickLook plugin is also on the drawing board.

Tuesday, April 17, 2007

Mac OS X ACLs Added

Today support for Mac OS X 10.4 ACLs was added to xar. Mac OS X has a rather bizarre ACL API.
It uses the same function names as the POSIX draft ACLs implemented by Linux and FreeBSD, but the return values and arguments have different semantics.
For example, acl_get_entry() on Linux and FreeBSD returns 1 on success. The same call on Mac OS X returns 0 on success.
Additionally, Mac OS X uses a different ACL system that identifies users by UUID instead of username/uid. Presumably this is because a uid/username is not considered unique on Mac OS X, due to 'mobile' environments. This means instead of using ACL_TYPE_DEFAULT or ACL_TYPE_ACCESS as an argument to acl_get_file(), on Mac OS X we must use ACL_TYPE_EXTENDED. These "extended" ACLs are not formatted the same way by acl_to_text(), which prevents the ACLs from being portable to other systems, but at least the output of acl_to_text() on Mac OS X can be read back by acl_from_text().

Monday, April 16, 2007

EA handling change

The way xar handles extended attributes has been rewritten today. This will likely introduce some bugs, and breaks (at least for now) EA extraction on old archives.
Before, xar would represent EAs as:
<ea>
    <user.foo>
        ...
    </user.foo>
    <user.bar>
        ...
    </user.bar>
</ea>

However, XML allows only a limited character set in element names such as <user.foo>. So, clearly this will not work for EAs whose names contain non-UTF-8 characters, or EAs whose names contain characters reserved in XML, such as <

To solve this, EA representation has changed to:
<ea>
    <name>user.foo</name>
    ...
</ea>
<ea>
    <name>user.bar</name>
    ...
</ea>

This allows us to handle the case of reserved characters and non-UTF-8 characters the same way we handle them for filenames. However, xar used a path-based reference system for properties in the past. So, the string "ea/user.foo" would identify the property that contained all the information about EA user.foo. With the new layout, the root-level identifier for properties has become non-unique, since every EA now lives under an element named <ea>. So, internally xar needed to reference the actual data structure representing the EA instead of just the path to the EA. This change affected almost every source file in libxar. However, this was a pretty glaring bug, and should make xar's EAs actually useful now.
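
The difference between the two layouts can be sketched with a few lines of Python. This is an illustration only, not libxar code: the helpers add_ea and find_ea are hypothetical, the enclosing <file> element is an assumption, and the attribute payload is omitted. It shows that a reserved character is fine as element text but not as an element name, and that looking up an EA now means walking the repeated <ea> elements rather than following a unique path.

```python
import xml.etree.ElementTree as ET

def add_ea(file_elem, name):
    """Append a new-style <ea><name>...</name></ea> entry (hypothetical helper)."""
    ea = ET.SubElement(file_elem, "ea")
    ET.SubElement(ea, "name").text = name
    return ea

def find_ea(file_elem, name):
    """Find an EA by name; "ea" alone no longer identifies a single property."""
    for ea in file_elem.findall("ea"):
        if ea.findtext("name") == name:
            return ea
    return None

# New layout: the name is text content, so "<" is simply escaped on output.
file_elem = ET.Element("file")  # assumed parent element for the example
add_ea(file_elem, "user.foo")
add_ea(file_elem, "user.a<b")
xml_text = ET.tostring(file_elem, encoding="unicode")
found = find_ea(ET.fromstring(xml_text), "user.a<b")

# Old layout: the same name used as an element name is not well-formed XML.
try:
    ET.fromstring("<ea><user.a<b>...</user.a<b></ea>")
    old_style_parses = True
except ET.ParseError:
    old_style_parses = False
```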

Sunday, February 4, 2007

Mach-O parsing is back

The Mach-O parsing was rewritten this weekend to be vastly superior to the previous implementation. This one seems to actually work and has been enabled in the default build. When xar archives a file, it checks the file magic to see if it is a Mach-O file, be it a thin or fat executable, dylib, archive, object file, or any other kind of Mach-O file. The type of the file and its architecture will be noted in the xar TOC, as will any dylibs the file happens to depend on. The list of dylibs is generated by parsing the load commands, and will include any @executable_path strings.
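
As a rough illustration of the file magic check, the first four bytes of a file are enough to tell a fat file from a single-architecture one. This is a sketch, not xar's parser: the magic values are the standard Mach-O constants, while macho_kind and its labels are names invented for the example.

```python
import struct

# Standard Mach-O magic numbers, as seen when the first four bytes of the
# file are read as a big-endian integer.
MACHO_MAGICS = {
    0xFEEDFACE: "32-bit, big-endian",
    0xCEFAEDFE: "32-bit, little-endian",
    0xFEEDFACF: "64-bit, big-endian",
    0xCFFAEDFE: "64-bit, little-endian",
    0xCAFEBABE: "fat",
}

def macho_kind(header):
    """Classify a file by its leading bytes; None means no Mach-O magic."""
    if len(header) < 4:
        return None
    (magic,) = struct.unpack(">I", header[:4])
    return MACHO_MAGICS.get(magic)

# Usage with hand-written headers rather than real binaries.
fat_kind = macho_kind(bytes.fromhex("cafebabe00000002"))
i386_kind = macho_kind(bytes.fromhex("cefaedfe07000000"))
text_kind = macho_kind(b"#!/bin/sh\n")
```

The magic alone is not conclusive: Java class files also begin with 0xCAFEBABE, so a real parser has to sanity-check the rest of the header before trusting the result.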

The layout in the TOC looks something like this:
<contents>
<type>Mach-O Fat File</type>
<ppc>
<type>Mach-O Executable</type>
<library>/usr/lib/libSystem.B.dylib</library>
</ppc>
<i386>
<type>Mach-O Executable</type>
<library>/usr/lib/libSystem.B.dylib</library>
</i386>
</contents>

This would allow things built on top of xar to ensure any dependencies are satisfied prior to extraction, to generate a dependency graph from the table of contents of an archive, or to perform other analysis.
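
A tool built on top of xar could do such a check along these lines. This is a sketch under assumptions: it handles only the fat layout shown above, the helper names libraries_by_arch and missing_libraries are hypothetical, and the set of available libraries is supplied by the caller rather than discovered on the system.

```python
import xml.etree.ElementTree as ET

# The <contents> fragment from the TOC layout shown above.
TOC_CONTENTS = """\
<contents>
<type>Mach-O Fat File</type>
<ppc>
<type>Mach-O Executable</type>
<library>/usr/lib/libSystem.B.dylib</library>
</ppc>
<i386>
<type>Mach-O Executable</type>
<library>/usr/lib/libSystem.B.dylib</library>
</i386>
</contents>
"""

def libraries_by_arch(contents_xml):
    """Map each architecture element to the libraries listed under it."""
    deps = {}
    for child in ET.fromstring(contents_xml):
        if child.tag == "type":
            continue  # the top-level <type> describes the file, not an arch
        deps[child.tag] = [lib.text for lib in child.findall("library")]
    return deps

def missing_libraries(deps, available):
    """Return, per architecture, the libraries not present in `available`."""
    missing = {}
    for arch, libs in deps.items():
        absent = [lib for lib in libs if lib not in available]
        if absent:
            missing[arch] = absent
    return missing

deps = libraries_by_arch(TOC_CONTENTS)
missing = missing_libraries(deps, available={"/usr/lib/libSystem.B.dylib"})
```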

Analysis of the Mach-O file happens as each individual chunk is read in. Due to the way the Mach-O file is constructed, it is possible to generate a valid Mach-O file which cannot be parsed in a single linear scan. However, contriving such a file would not only result in poor execution performance, I am not sure it would even successfully execute on Mac OS X. In such a pathological case, xar will note the type of the file, but will not record load commands and dependent libraries.

Wednesday, January 31, 2007

path archival problem

If you run: xar -cf foo.xar /path/to/file, and extract it again, what do you expect to happen?
In all versions up to current trunk in subversion, xar would non-recursively archive path and path/to, so when you extract the above archive, you'd get path/to/file with all aspects of path and path/to preserved. Unfortunately, when path or path/to is a symlink, this causes problems. If path/to is a relative symlink pointing to path/from, and you extract the archive, it will fail to extract because path/from doesn't exist, and therefore can't create path/to/file.
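
The failure is easy to reproduce without xar at all. The sketch below recreates what extraction would do on a POSIX system, assuming path/to was archived as a relative symlink whose target "from" (that is, path/from) is not in the archive. The function name is made up for the example.

```python
import os
import tempfile

def extract_with_archived_parents(root):
    """Recreate path and path/to as archived, then try to create path/to/file.

    Returns True if the file could be created, False otherwise.
    """
    os.mkdir(os.path.join(root, "path"))
    # path/to is restored as the symlink it was; its target, path/from,
    # was never archived and so does not exist after extraction.
    os.symlink("from", os.path.join(root, "path", "to"))
    try:
        with open(os.path.join(root, "path", "to", "file"), "w") as fh:
            fh.write("data")
        return True
    except OSError:
        # The dangling symlink cannot be traversed, so the file has no parent.
        return False

with tempfile.TemporaryDirectory() as root:
    extracted = extract_with_archived_parents(root)
```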

With tar, path and path/to are not archived at all. The fact that file existed in path/to/ is preserved, but those directories are not archived. This avoids the problem entirely, although might not be what you're thinking of when you type tar cf foo.tar /path/to/file.
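
The two behaviors differ only in which entries end up in the archive. A small sketch, with hypothetical helper names and assuming the leading slash is dropped from stored names as in the examples above:

```python
from pathlib import PurePosixPath

def entries_with_parents(path):
    """Entries under the old xar behavior: each parent directory
    (non-recursively), followed by the file itself."""
    rel = PurePosixPath(path.lstrip("/"))
    parents = [str(p) for p in reversed(rel.parents) if str(p) != "."]
    return parents + [str(rel)]

def entries_tar_style(path):
    """Entries under tar's behavior: only the named file, keeping its path."""
    return [str(PurePosixPath(path.lstrip("/")))]

old_entries = entries_with_parents("/path/to/file")
tar_entries = entries_tar_style("/path/to/file")
```

With the second form, the extractor has to create path and path/to itself as plain directories, which is exactly why a symlink in the original tree can no longer get in the way.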

One alternative to tar's approach is to have a flag to xar that tells it to treat a symlink to a directory as a directory (xar knows what type of file a symlink was pointing to at the time of archival, if anything) and extract it as a directory instead of a symlink. However, this is problematic in the case of xar -cf foo.xar /path/to/dir, where path/to/dir/link points to a directory, but you wanted to preserve the symlink.

It seems tar's approach makes the most sense here, and it's hard to go wrong following the behavior of such a well established and well known tool.