xar blog

Tuesday, April 17, 2007

Mac OS X ACLs Added

Today support for Mac OS X 10.4 ACLs was added to xar. Mac OS X has a rather bizarre ACL API.
It uses the same function names as the POSIX draft ACLs implemented by Linux and FreeBSD, but the return values and arguments have different semantics.
For example, acl_get_entry() on Linux and FreeBSD returns 1 on success. The same call on Mac OS X returns 0 on success.
Additionally, Mac OS X uses a different ACL system that identifies users by UUID instead of username/uid. Presumably this is because a uid/username is not considered unique on Mac OS X, due to 'mobile' environments. This means that instead of passing ACL_TYPE_DEFAULT or ACL_TYPE_ACCESS to acl_get_file(), on Mac OS X we must pass ACL_TYPE_EXTENDED. The text that acl_to_text() produces for these "extended" ACLs is formatted differently, so the ACLs are not portable to other systems, but at least acl_to_text() works on Mac OS X and its output can be read back by acl_from_text().

Monday, April 16, 2007

EA handling change

The way xar handles extended attributes has been rewritten today. This will likely introduce some bugs, and breaks (at least for now) EA extraction on old archives.
Before, xar would represent EAs as:
<ea>
    <user.foo>
        ...
    </user.foo>
    <user.bar>
        ...
    </user.bar>
</ea>

However, XML only allows a limited character set in element names such as <user.foo>. So, clearly this will not work for EAs whose names contain non-UTF8 characters, or characters reserved in XML, such as <.

To solve this, EA representation has changed to:
<ea>
    <name>user.foo</name>
    ...
</ea>
<ea>
    <name>user.bar</name>
    ...
</ea>

This allows us to handle reserved characters and non-UTF8 characters the same way we handle them for filenames. However, xar previously used a path-based reference system for properties: the string "ea/user.foo" would identify the property that contained all the information about EA user.foo. With the new layout, the root-level identifier for properties is no longer unique, since every EA is now an <ea> element. So, internally, xar needed to reference the actual data structure representing the EA instead of just the path to the EA. This change affected almost every source file in libxar. However, the old representation was a pretty glaring bug, and the change should make xar's EAs actually useful now.
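As a rough illustration of why the name-as-content layout is more robust, here is a minimal Python sketch using the standard xml.etree.ElementTree module. It is not xar's code: the helper names build_file and ea_names and the enclosing <file> element are assumptions made for the example, and non-UTF8 names are not covered.

```python
import xml.etree.ElementTree as ET


def build_file(ea_names_in):
    """Build a hypothetical <file> element with one <ea> child per attribute.

    The attribute name is stored as text inside <name>, so the XML
    serializer escapes reserved characters such as < and &.
    """
    file_el = ET.Element("file")  # enclosing element is an assumption
    for name in ea_names_in:
        ea = ET.SubElement(file_el, "ea")
        ET.SubElement(ea, "name").text = name
    return file_el


def ea_names(file_el):
    """Return the EA names found under a parsed <file> element."""
    return [ea.findtext("name") for ea in file_el.findall("ea")]


def old_style_parses(name):
    """Check whether the old layout (<ea><NAME>...</NAME></ea>) is parseable."""
    try:
        ET.fromstring("<ea><%s>...</%s></ea>" % (name, name))
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    names = ["user.foo", "user.a<b&c"]
    xml_text = ET.tostring(build_file(names), encoding="unicode")
    print(xml_text)
    print(ea_names(ET.fromstring(xml_text)))
    print(old_style_parses("user.foo"), old_style_parses("user.a<b&c"))
```

Because the name is ordinary character data, the round trip preserves it, whereas the same name used as an element name does not even parse.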

Sunday, February 4, 2007

Mach-O parsing is back

The Mach-O parsing was rewritten this weekend to be vastly superior to the previous implementation. This one seems to actually work and has been enabled in the default build. When xar archives a file, it checks the file magic to see if it is a Mach-O file, be it a skinny or fat executable, dylib, archive, object file, or any other kind of Mach-O file. The type of the file and its architecture will be noted in the xar TOC, as will any dylibs the file happens to depend on. The list of dylibs is generated by parsing out the load commands, and will include any @executable_path strings.
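The magic check described above can be sketched in a few lines of Python. This is only an illustration, not xar's implementation: the function name classify_magic and the label strings (other than "Mach-O Fat File", which appears in the TOC) are made up for the example. The magic values are the standard ones from the Mach-O headers, read here as a big-endian 32-bit integer.

```python
import struct

# First four bytes of the file, interpreted as a big-endian integer.
MAGIC_LABELS = {
    0xFEEDFACE: "Mach-O 32-bit, big-endian",
    0xCEFAEDFE: "Mach-O 32-bit, little-endian",
    0xFEEDFACF: "Mach-O 64-bit, big-endian",
    0xCFFAEDFE: "Mach-O 64-bit, little-endian",
    # Fat headers are stored big-endian on disk. Note that Java class
    # files share this magic, so a real parser has to look further.
    0xCAFEBABE: "Mach-O Fat File",
}


def classify_magic(header):
    """Return a label for the leading bytes of a file, or None if unknown."""
    if len(header) < 4:
        return None
    (magic,) = struct.unpack(">I", header[:4])
    return MAGIC_LABELS.get(magic)


if __name__ == "__main__":
    print(classify_magic(bytes.fromhex("cafebabe00000002")))
    print(classify_magic(bytes.fromhex("cefaedfe")))
    print(classify_magic(b"#!/bin/sh\n"))
```

Anything that returns None here would simply be archived without Mach-O details.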

The layout in the TOC looks something like this:
<contents>
<type>Mach-O Fat File</type>
<ppc>
<type>Mach-O Executable</type>
<library>/usr/lib/libSystem.B.dylib</library>
</ppc>
<i386>
<type>Mach-O Executable</type>
<library>/usr/lib/libSystem.B.dylib</library>
</i386>
</contents>

This would allow tools built on top of xar to ensure any dependencies are satisfied prior to extraction, to generate a dependency graph from the table of contents of an archive, or to perform other analysis.
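For instance, a tool could pull the per-architecture library list out of a <contents> element like the one above and compare it against what is installed. The following Python sketch assumes the fat-file layout shown in this post. The function names and the idea of passing in a set of available libraries are hypothetical, not part of xar.

```python
import xml.etree.ElementTree as ET

SAMPLE_CONTENTS = """
<contents>
<type>Mach-O Fat File</type>
<ppc>
<type>Mach-O Executable</type>
<library>/usr/lib/libSystem.B.dylib</library>
</ppc>
<i386>
<type>Mach-O Executable</type>
<library>/usr/lib/libSystem.B.dylib</library>
</i386>
</contents>
"""


def libraries_by_arch(contents):
    """Map each architecture element name to the libraries listed under it."""
    deps = {}
    for child in contents:
        libs = [lib.text for lib in child.findall("library")]
        if libs:  # skips children such as <type> that list no libraries
            deps[child.tag] = libs
    return deps


def missing_libraries(contents, available):
    """Return the sorted set of listed libraries not present in `available`."""
    missing = set()
    for libs in libraries_by_arch(contents).values():
        missing.update(lib for lib in libs if lib not in available)
    return sorted(missing)


if __name__ == "__main__":
    contents = ET.fromstring(SAMPLE_CONTENTS)
    print(libraries_by_arch(contents))
    print(missing_libraries(contents, available=set()))
```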

Analysis of the Mach-O file happens as each individual chunk is read in. Due to the way the Mach-O file is constructed, it is possible to generate a valid Mach-O file which cannot be parsed in a single linear scan. However, contriving such a file would result in poor execution performance, and I am not sure it would even successfully execute on Mac OS X. In such a pathological case, xar will note the type of the file, but will not record load commands and dependent libraries.

Wednesday, January 31, 2007

path archival problem

If you run: xar -cf foo.xar /path/to/file, and extract it again, what do you expect to happen?
In all versions up to current trunk in subversion, xar would non-recursively archive path and path/to, so when you extract the above archive, you'd get path/to/file with all aspects of path and path/to preserved. Unfortunately, when path or path/to is a symlink, this causes problems. If path/to is a relative symlink pointing to path/from, and you extract the archive, it will fail to extract because path/from doesn't exist, and therefore can't create path/to/file.

With tar, path and path/to are not archived at all. The fact that file existed in path/to/ is preserved, but those directories are not archived. This avoids the problem entirely, although it might not be what you're thinking of when you type tar cf foo.tar /path/to/file.

One alternative to tar's approach is to have a flag to xar that tells it to treat symlinks to directories as directories (xar knows what type of file a symlink was pointing to at the time of archival, if anything), extracting them as directories instead of symlinks. However, this is problematic in the case of xar -cf foo.xar /path/to/dir, where path/to/dir/link points to a directory, but you wanted to preserve the symlink.

It seems tar's approach makes the most sense here, and it's hard to go wrong following the behavior of such a well-established and well-known tool.
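To make the difference concrete, here is a small Python sketch of which entries end up in the archive under each behavior. It only manipulates path strings and is not how xar or tar are implemented. The function name archived_entries and its include_parents parameter are invented for the example.

```python
from pathlib import PurePosixPath


def archived_entries(path, include_parents):
    """List the archive entries recorded for a single path argument.

    include_parents=True models the old xar behavior (each leading
    directory gets its own entry); False models the tar-like behavior
    (only the named file is recorded, under its full relative path).
    """
    parts = [p for p in PurePosixPath(path).parts if p != "/"]
    full = PurePosixPath(*parts)
    if not include_parents:
        return [str(full)]
    # full.parents runs from nearest to farthest and ends with ".",
    # so reverse it and drop the "." placeholder.
    parents = [str(p) for p in list(full.parents)[::-1] if str(p) != "."]
    return parents + [str(full)]


if __name__ == "__main__":
    print(archived_entries("/path/to/file", include_parents=True))
    print(archived_entries("/path/to/file", include_parents=False))
```

With the parents recorded, a symlink at path/to is restored as a symlink and the extraction of path/to/file depends on its target existing. Without them, the extractor is free to create plain directories.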

Monday, January 29, 2007

chflags support added

Initial chflags(2) support has been added to the repository. This support has not been integrated with the ext2 flags support, although there is some feature overlap. This has been tested on Mac OS X 10.4 and Fedora Core 5 i386. Please feel free to test and let the list know how it goes.

To elaborate on chflags(2) vs ext2 flags, here is what each looks like in the TOC:

ext2 flags:
<attribute>
    <NoDump fstype="ext2"/>
</attribute>

chflags(2):
<flags>
    <UserNoDump/>
</flags>

Clearly nodump has a similar meaning in these two representations. The obvious desire is to have a mapping engine within xar that "just knows" that chflags nodump is the same as ext2's nodump. The same goes for some of the immutable bits.
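Such a mapping engine does not exist yet. As a sketch of the idea only, the Python below maps both TOC representations shown above onto one canonical flag name. The table FLAG_MAP, the canonical name "nodump", and the function canonical_flags are all assumptions made for illustration. Only the NoDump and UserNoDump element names come from the TOC examples.

```python
import xml.etree.ElementTree as ET

# (source, element name) -> canonical flag name. The table could be
# extended with the immutable bits once their element names are settled.
FLAG_MAP = {
    ("ext2", "NoDump"): "nodump",
    ("chflags", "UserNoDump"): "nodump",
}


def canonical_flags(element):
    """Return the canonical flags for an <attribute> or <flags> element."""
    found = set()
    for child in element:
        if element.tag == "attribute":
            # ext2-style entries carry their origin in the fstype attribute
            key = (child.get("fstype"), child.tag)
        else:
            key = ("chflags", child.tag)
        if key in FLAG_MAP:
            found.add(FLAG_MAP[key])
    return found


if __name__ == "__main__":
    ext2 = ET.fromstring('<attribute><NoDump fstype="ext2"/></attribute>')
    bsd = ET.fromstring("<flags><UserNoDump/></flags>")
    print(canonical_flags(ext2) == canonical_flags(bsd))
```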

Sunday, January 28, 2007

API Documentation updated

The API documentation has finished its migration from the web pages on opendarwin to the xar project wiki.
All the documentation should be reasonably up to date on the project wiki.

I've also created a xar-bugs Google group and am sending all of the Issue updates to the group. Feel free to subscribe to avoid missing out on all the juicy bug goodness.

Wednesday, January 10, 2007

xar move to code.google

xar is moving from OpenDarwin to code.google.com hosting and is now located here: http://code.google.com/p/xar/
The intent of this blog is to post xar discussion, features, and design considerations.

Currently, progress is being made on transcribing the API documentation from HTML pages to the wiki at code.google.
The xar code has been imported into the code.google repository, and the 1.4 release tarball and xarball are now available from the project download section.
There are two Google groups for the xar project: one for discussion, and one that receives commit messages from the subversion repository.