Sunday, February 4, 2007

Mach-O parsing is back

The Mach-O parsing was rewritten this weekend, and it is vastly superior to the previous implementation. This one seems to actually work and has been enabled in the default build. When xar archives a file, it checks the file magic to see if it is a Mach-O file, be it a skinny or fat executable, dylib, archive, object file, or any other kind of Mach-O file. The file type and architecture are recorded in the xar TOC, along with any dylibs the file depends on. The list of dylibs is generated by parsing the load commands, and includes any paths that begin with @executable_path.
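To make the magic check and the load command walk a bit more concrete, here is a rough Python sketch. It is not xar's code (xar is written in C), and the function names sniff, thin_layout, and list_dylibs are made up for illustration. The constants and field offsets are the ones from <mach-o/loader.h> and <mach-o/fat.h>. The sketch only handles thin files and only the LC_LOAD_DYLIB command, so weak or re-exported dylibs are ignored.

```python
import struct

# Magic values as they appear when the first four bytes are read big-endian.
MH_MAGIC, MH_CIGAM = 0xFEEDFACE, 0xCEFAEDFE
MH_MAGIC_64, MH_CIGAM_64 = 0xFEEDFACF, 0xCFFAEDFE
FAT_MAGIC = 0xCAFEBABE  # also the Java class-file magic, so not conclusive alone
LC_LOAD_DYLIB = 0xC


def thin_layout(data):
    """Return (byte order, header size) for a thin Mach-O image, else None."""
    if len(data) < 4:
        return None
    magic = struct.unpack(">I", data[:4])[0]
    if magic in (MH_MAGIC, MH_MAGIC_64):
        endian = ">"
    elif magic in (MH_CIGAM, MH_CIGAM_64):
        endian = "<"
    else:
        return None
    return endian, 32 if magic in (MH_MAGIC_64, MH_CIGAM_64) else 28


def sniff(data):
    """Classify the leading bytes as 'archive', 'fat', 'thin', or None."""
    if data[:8] == b"!<arch>\n":
        return "archive"
    if len(data) >= 4 and struct.unpack(">I", data[:4])[0] == FAT_MAGIC:
        return "fat"
    return "thin" if thin_layout(data) else None


def list_dylibs(data):
    """Walk the load commands of a thin image and collect LC_LOAD_DYLIB names."""
    layout = thin_layout(data)
    if layout is None or len(data) < layout[1]:
        return []
    endian, offset = layout
    ncmds = struct.unpack(endian + "I", data[16:20])[0]  # mach_header.ncmds
    libs = []
    for _ in range(ncmds):
        if offset + 8 > len(data):
            break  # truncated input: stop rather than guess
        cmd, cmdsize = struct.unpack(endian + "II", data[offset:offset + 8])
        if cmdsize < 8:
            break  # malformed command size
        if cmd == LC_LOAD_DYLIB and offset + 12 <= len(data):
            # dylib_command.dylib.name is an offset from the start of the command.
            name_off = struct.unpack(endian + "I", data[offset + 8:offset + 12])[0]
            raw = data[offset + name_off:offset + cmdsize]
            libs.append(raw.split(b"\0", 1)[0].decode("utf-8", "replace"))
        offset += cmdsize
    return libs
```

Strings such as @executable_path/../Frameworks/Foo.dylib come back unchanged from list_dylibs, since the name is copied straight out of the load command.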

The layout in the TOC looks something like this:
<contents>
  <type>Mach-O Fat File</type>
  <ppc>
    <type>Mach-O Executable</type>
    <library>/usr/lib/libSystem.B.dylib</library>
  </ppc>
  <i386>
    <type>Mach-O Executable</type>
    <library>/usr/lib/libSystem.B.dylib</library>
  </i386>
</contents>

This would allow tools built on top of xar to ensure that all dependencies are satisfied prior to extraction, to generate a dependency graph from an archive's table of contents, or to perform other analysis.
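As a sketch of what such a tool might do, the following Python reads a <contents> fragment shaped like the fat-file sample in this post and reports libraries that are not present on the local system. The helper names libraries_by_arch and missing_libraries are hypothetical, and the code assumes every child of <contents> other than <type> is an architecture element. How a thin file is laid out in the TOC is not shown here, so the sketch does not try to handle it. Names beginning with @ (such as @executable_path) are skipped because they cannot be resolved without knowing where the executable will be installed.

```python
import os
import xml.etree.ElementTree as ET


def libraries_by_arch(contents_xml):
    """Map each architecture element to the <library> paths listed under it."""
    root = ET.fromstring(contents_xml)
    return {
        arch.tag: [lib.text for lib in arch.findall("library") if lib.text]
        for arch in root
        if arch.tag != "type"  # assumption: non-<type> children are architectures
    }


def missing_libraries(contents_xml, exists=os.path.exists):
    """Return {arch: [paths]} for absolute library paths that `exists` rejects."""
    missing = {}
    for arch, libs in libraries_by_arch(contents_xml).items():
        absent = [p for p in libs if not p.startswith("@") and not exists(p)]
        if absent:
            missing[arch] = absent
    return missing
```

The exists parameter defaults to os.path.exists, and passing a different callable makes it possible to check against some other root or a package database instead.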

Analysis of the Mach-O file happens as each individual chunk is read in. Because of the way Mach-O files are laid out, it is possible to construct a valid Mach-O file that cannot be parsed in a single linear scan. However, such a contrived file would execute poorly, and I am not sure it would even run on Mac OS X. In such a pathological case, xar will note the type of the file but will not record load commands or dependent libraries.
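The chunk-at-a-time idea can be sketched as a small state machine. This is an illustration only, not how xar implements it: ChunkedScanner and its state names are invented, and it covers just the simple case of a thin file, where the load commands sit directly after the header. It buffers input until the header is complete, learns the size of the load command area from sizeofcmds, and stops buffering once that area has arrived. Fat files and any layout that would require moving backward in the stream are deliberately left out.

```python
import struct

# Thin Mach-O magics (read big-endian) -> (byte order, header size in bytes).
THIN_MAGICS = {
    0xFEEDFACE: (">", 28),
    0xCEFAEDFE: ("<", 28),
    0xFEEDFACF: (">", 32),
    0xCFFAEDFE: ("<", 32),
}
FINAL_STATES = ("done", "not_macho")


class ChunkedScanner:
    """Collect only the header and load commands from a forward-only stream."""

    def __init__(self):
        self.buf = bytearray()
        self.state = "magic"  # magic -> header -> commands -> done / not_macho
        self.needed = 4       # bytes required before the next step can run
        self.endian = None
        self.header_size = 0
        self.commands = None  # raw load command bytes once state is "done"

    def feed(self, chunk):
        """Accept the next chunk and return the scanner state afterwards."""
        if self.state in FINAL_STATES:
            return self.state  # nothing more to learn, so stop buffering
        self.buf += chunk
        while self.state not in FINAL_STATES and len(self.buf) >= self.needed:
            self._advance()
        return self.state

    def _advance(self):
        if self.state == "magic":
            magic = struct.unpack(">I", self.buf[:4])[0]
            if magic not in THIN_MAGICS:
                self.state = "not_macho"
                return
            self.endian, self.header_size = THIN_MAGICS[magic]
            self.needed = self.header_size
            self.state = "header"
        elif self.state == "header":
            # mach_header.sizeofcmds lives at byte offset 20.
            sizeofcmds = struct.unpack(self.endian + "I", self.buf[20:24])[0]
            self.needed = self.header_size + sizeofcmds
            self.state = "commands"
        elif self.state == "commands":
            self.commands = bytes(self.buf[self.header_size:self.needed])
            self.state = "done"
```

Once the state is "done", later chunks are ignored by the scanner, so the rest of the file can flow through to the archive without being held in memory.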