NuFX Addendum


NuFX Addendum - By Andy McFadden - Last revised 2022/11/06

This addendum clarifies and extends certain aspects of the NuFX specification. This is not an "official" modification of the original document - it has not been reviewed and approved by the original author - but anyone developing NuFX utilities would do well to follow these recommendations.

Purpose

The NuFX specification defines a very loose structure, and leaves much to the imagination of the implementer.  For example, "If a utility finds a redundancy in a Thread Record, it must decide whether to skip the record or to do something with that particular thread...".  A strict specification would define a standard approach that all applications must follow when dealing with the anomalous condition, to ensure consistent handling of all archives.

This document refines the NuFX specification and brings some of the "fuzzy" areas into sharper focus.  Nothing in this document contravenes the original document.

In the text below, "must" is an imperative that has to be obeyed, and "should" is a recommendation that authors are strongly encouraged to follow.


Clarifications

Pronunciation

What's the correct way to pronounce "NuFX"? One approach is letter-by-letter ("en you eff ecks"), another is minimal-syllable ("new fix"). According to the file type note, it's a bit of both ("new eff ecks").

 

Use of ".SDK" suffix

Originally, only ".SHK" was used to represent a NuFX archive.  Over time, a convention of using ".SDK" to represent archives with a single disk image in them has arisen.  This is very convenient for emulators on systems that rely on the file extension (e.g. Windows), so use of ".SDK" is encouraged.

 

Archives with no records

An archive without records, i.e. nothing but a master header block, serves no purpose. However, it can be useful to have a "create new archive" operation that creates an empty file to be populated later.

Creating: Archives without any records in them may be created.

Opening: If asked to open a record-less archive, the application should recognize that the archive is empty and proceed as if it were a new archive.

Modifying: If all records in an archive are deleted, the archive file should be deleted as well.

 

Records with no threads

A record without threads is pretty pointless. The initial release of the NuFX spec mandated that there be at least one thread attached to each record, but this language was removed from later versions.

Creating: Records without threads must never be created.  All records must have at least one thread.

Extracting: Empty records should be ignored.

 

Records with only a filename thread

GS/ShrinkIt v1.1 has a bug that prevents it from creating an empty data thread when asked to add a zero-byte file.  This results in a record with a filename thread and nothing else.  (If it was the first new record added, it will have an empty comment thread as well.)

GS/ShrinkIt does nothing when asked to extract records without threads.

Creating: Records composed solely of a filename thread must not be created.

Extracting: Records with nothing but a filename thread should be ignored.  For GSHK v1.1 bug compatibility: if a record has a filename thread, and no other threads except "message" threads (i.e. no data threads or control threads), then a zero-byte data fork file should be created.  Otherwise, the record should be ignored.  If the ProDOS storage type field indicates an extended file, a zero-byte resource fork should also be created.
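The bug-compatibility rule above can be sketched as a small decision function. This is only an illustration: the thread-class labels and the function name are hypothetical, and a real implementation would work from the numeric thread class and kind values.

```python
# Hypothetical labels standing in for the numeric thread classes.
FILENAME, MESSAGE = "filename", "message"

EXTENDED_STORAGE_TYPE = 5   # ProDOS extended (forked) file


def zero_byte_forks(thread_classes, storage_type):
    """Return the zero-byte forks to create for GSHK v1.1 compatibility.

    thread_classes: one label per thread in the record.
    An empty list means the rule does not apply (the record is either
    ignored or handled by the normal extraction path).
    """
    if FILENAME not in thread_classes:
        return []                       # no filename thread: rule does not apply
    if any(c not in (FILENAME, MESSAGE) for c in thread_classes):
        return []                       # data or control threads are present
    forks = ["data"]
    if storage_type == EXTENDED_STORAGE_TYPE:
        forks.append("resource")        # extended file: also create a resource fork
    return forks
```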

 

Records with no filename

A record without a filename thread is a curious beast.  Ideally, there wouldn't be any such thing as a filename thread, since it doesn't really make sense to have a record without one.  Expanding the record header to hold a pre-sized buffer would've made many things simpler.

This particular situation occurred with older versions of ShrinkIt (e.g. v1.1) that failed to store a volume name when compressing a DOS 3.3 disk.  There was no filename in the record header, nor one in a filename thread.

The only situation where a record without a filename makes sense is if the record holds nothing but comments or other archive "meta data", such as a "create directory" control thread.

Creating: Records without filenames must not be created, unless the record is intended to contain nothing but archive meta-data.  Deletion of the filename thread should only be done if a new filename thread is being added.  If data threads are added to a record without a filename, then a filename thread must be added as well.

Extracting: If the record contains file data, the application may either prompt the user for a filename to use, or supply a generated one.

 

Records with more than one filename thread

This is an unusual situation that should only arise if an application is buggy.  Every record created by a modern application should have no more than one filename thread.

Creating: Records with multiple filename threads must not be created.

Extracting: Applications must use the first filename thread.  If a buggy application appends an additional filename thread, the extra filename will be ignored.

 

Records with filenames in two places

The old way of storing filenames, used by NuLib and old versions of ShrinkIt, was to put the filename in the record header.  To facilitate renaming, the filename was moved into a thread.  Thus, there are two possible locations where the filename may live, and no guarantee that only one will be used.

Creating: Never put the filename in the record header when creating a new record.  It's okay to leave existing records alone, but if an application has the opportunity to rewrite the record header, the record filename must be removed.

Extracting: The thread filename takes precedence over the record header filename.
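A minimal sketch of the filename precedence rules, combining "first filename thread wins" with "thread beats record header". The function name and argument layout are hypothetical. Falling back to the header name when the first thread filename is empty is an assumption, based on the recommendation elsewhere in this document that a zero-length filename be treated as missing.

```python
def choose_filename(header_name, thread_names):
    """Pick the raw filename bytes for a record, or None if it has none.

    header_name:  filename bytes from the record header (b"" if absent).
    thread_names: filename bytes from each filename thread, in archive order.
    """
    if thread_names and thread_names[0]:
        return thread_names[0]      # first filename thread takes precedence
    if header_name:
        return header_name          # old-style name stored in the record header
    return None                     # caller must prompt for or generate a name
```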

 

Filename character set

Filenames in NuFX archives use the Mac OS Roman character set, which is ASCII plus some symbols and the usual set of Latin language characters (see Unicode definition).  The NuFX filename definition was intended to accommodate files from HFS volumes, which may contain any character except ':'.  Control characters, including NUL ('\0'), were allowed but discouraged.

On modern systems, converting between Mac OS Roman and Unicode is useful and (mostly) straightforward.  Dealing with embedded null bytes is very annoying in C-like languages though.

Creating: Convert Unicode to Mac OS Roman, replacing any untranslatable characters with '?'.  Embedded nulls may be replaced with '?'.

Extracting: Convert Mac OS Roman to Unicode.  If embedded nulls are encountered, they may be replaced with something appropriate for the current system.  Applications should not ignore the problem and truncate the filename; if they do, they must be prepared to handle duplicate or empty filenames.
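As an illustration, the conversions might look like this in Python, which ships a "mac_roman" codec. The function names are hypothetical, and the choice of '_' as the replacement for embedded nulls on extraction is just one possibility.

```python
def to_archive_name(name):
    """Convert a Unicode filename to Mac OS Roman bytes for storage."""
    # Embedded nulls become '?'; errors="replace" also substitutes '?'
    # for any character that has no Mac OS Roman equivalent.
    return name.replace("\0", "?").encode("mac_roman", errors="replace")


def from_archive_name(raw, nul_replacement="_"):
    """Convert stored Mac OS Roman bytes to a Unicode filename."""
    # Replace embedded nulls rather than truncating at the first one.
    return raw.decode("mac_roman").replace("\0", nul_replacement)
```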

 

Filesystem separator characters

Every record header has a "file system separator" character ("fssep") in the "file_sys_info" word.  This is usually something like ':' for GS/OS or '/' for UNIX.  It's necessary to know what the separator is in order to break a pathname down into its individual components.

Not all filesystems support subdirectories, however, which means that not all filenames need to have a separator.  The appropriate separator character for such a filesystem is not defined in the NuFX spec.  Clearly it should be something illegal on the source filesystem, or we could inadvertently see pathnames where they don't exist (e.g. a file called "foo:bar" on DOS 3.3 if the fssep char were set to ':').

The trouble is, DOS 3.3 doesn't actually have any illegal characters, just a field of 30 characters padded with spaces.  Pascal disks are similar.  Since we must define an fssep for every filename, our best choice is to use '\0' (0x00), because it's unlikely to occur, and any program that stores names in C strings will find it awkward to store and scan for '\0'.

This situation also applies to archived disk images, which must be simple filenames.

The application should have some understanding of which filesystems have subdirectories and which don't, which would allow it to disregard the fssep char when it can't be relevant for a record, but it's easier to let the fssep char's usefulness be self-evident.

(NOTE: NufxLib v2.0.3 rejected 0x00 as an fssep character. This was a bug.)

Creating: When adding files directly from filesystems without subdirectories, use 0x00 as the fssep char.

Extracting: An fssep char of 0x00 means the pathname is just the filename.
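A short sketch of how the fssep char can be left to be "self-evident" when breaking a stored name into components. The function name is hypothetical.

```python
def split_path(raw_name, fssep):
    """Split a stored pathname (bytes) into components.

    fssep is the separator byte value from the record header. A value
    of 0x00 means the name is a simple filename with no components.
    """
    if fssep == 0:
        return [raw_name]
    return raw_name.split(bytes([fssep]))
```

With this approach a DOS 3.3 file named "foo:bar" stored with fssep 0x00 stays a single component, while the same bytes stored with fssep ':' split into two.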

 

Disk image pathnames

While files may have multiple path components (e.g. "subdir:subdir2:filename"), it makes no sense for disk images to have them.  The stored filename for a disk is either the disk's ProDOS volume name, or for non-ProDOS disks, a simple label defined by the user.  Since the eventual target is a disk device, specifying a subdirectory path makes no sense.

The issue becomes a little more confusing when storage of disk images used for emulators is considered.  At first glance, it seems useful to be able to store a hierarchy of disk images.  In practice, such images would either be archived as a hierarchy of .PO files, or as an archive of .SDK archives.

Ultimately, the disk volume name is embedded in the disk image itself. The name stored in the archive is purely decorative.

Adding/renaming: Applications must strip any leading path components from disk image "storage names".  (The NuFX specification does explicitly forbid the use of a filesystem separator character in a disk volume name.)

Extracting: Applications extracting directly to a disk must strip leading path components before assigning the ProDOS volume name.  Applications extracting images to a file don't need to do anything unusual.
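Stripping leading path components amounts to keeping whatever follows the last separator. A minimal sketch, with a hypothetical function name:

```python
def disk_storage_name(raw_name, fssep):
    """Return the final path component of a disk image's stored name."""
    if fssep == 0:
        return raw_name                 # no separator defined: already simple
    # rsplit with maxsplit=1 keeps only the text after the last separator.
    return raw_name.rsplit(bytes([fssep]), 1)[-1]
```

A name that ends with the separator yields an empty result, which should then be handled like any other missing filename.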

 

Filename case sensitivity

There isn't a "filename is case-sensitive" flag in NuFX archives.  Since it was designed primarily for ProDOS and HFS filesystems, neither of which is case-sensitive, we should assume that case is not meant to be significant when determining whether two records have the same filename.  This becomes important when adding files (to test for duplicates), extracting files by name, and when attempting to display archive contents as a hierarchical tree.

HFS files will use the Mac OS Roman character set, so a simple ASCII case conversion will be inadequate. An HFS filename comparison routine must be used.

Applications should try to recognize that "foo/bar", "foo/BAR", and "FOO/bar" are the same file, but it's probably not worth "probing" a case-sensitive filesystem like Linux ext2 to guarantee such.

 

Duplicate filenames

There is nothing in the NuFX specification that prevents having more than one file with the same name in an archive.  In practice, this is inconvenient, especially for users with command-line tools.  On the other hand, if the underlying filesystem is case-sensitive, the extracted files may not actually collide, so it may not make sense for all applications to treat this as an iron-clad rule.

When comparing names, be sure to take the filesystem separator character into account.  "foo:bar" could be a simple filename or a partial pathname depending on whether ':' is the separator.  Two names should be considered identical if each distinct path component matches, so "foo/bar" and "foo:bar" are identical if the separators are '/' and ':', respectively.  Comparisons should be case-insensitive.

Adding/renaming: Applications should prevent multiple records from having the same case-insensitive filename.
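One way to implement the comparison is to reduce each name to a key made of case-folded path components, so that the separator itself drops out. The sketch below is an approximation: it uses Python's Unicode casefold() as a stand-in for a true HFS filename comparison routine, which is an assumption and will not match HFS ordering exactly. The function name is hypothetical.

```python
def name_key(raw_name, fssep):
    """Build a comparison key from a stored name and its fssep byte value."""
    text = raw_name.decode("mac_roman")
    parts = [text] if fssep == 0 else text.split(chr(fssep))
    # casefold() approximates case-insensitive matching; it is not the
    # HFS comparison routine.
    return tuple(p.casefold() for p in parts)
```

Two records collide when their keys are equal, so a set of keys is enough to detect duplicates while adding or renaming.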

 

Pre-sized or not pre-sized

The specification declares that filename threads and comments use pre-sized buffers.  It does not define what other members of the message and filename classes are, which makes it difficult to know what to do with a request to create a heretofore undefined thread type.  The NuFX format does not provide any definitive clue as to whether a thread is pre-sized, so such decisions must be based on the thread class and thread kind.

Filename threads and comment threads are pre-sized.  All other threads are not pre-sized (including other members of the "filename" and "message" classes).

 

Proper pre-size for filename threads

ShrinkIt allocates a 32-byte pre-sized buffer for the filename.  If the filename is larger than 32 bytes, the buffer grows to fit the filename exactly.  If renaming files is considered useful, then the buffer should always be slightly larger than is needed to hold the filename.  (Filenames longer than 32 characters are most likely the result of nested directories, so renaming the file itself is inhibited if the buffer length is an exact match.)

Side note: the specification does not specify a minimum or maximum length for a filename. The specification notes that "GS/OS can create 8,000-character filenames", so that seems safe to use as an upper bound. Zero-length filenames cannot be stored in a record header, because a length of zero indicates that the filename is in a thread, so it's reasonable to require that filenames be at least one character long. A zero-length filename should be treated the same as a missing filename.

Creating: If GS/ShrinkIt compatibility is not important, all filenames should have at least 8 bytes of free space in the filename thread.  For GSHK compatibility, the filename thread compThreadEOF must be the greater of 32 and the filename length.

Renaming: It is acceptable to have fewer than 8 bytes of free space remaining after a file is renamed.  However, if the filename itself exceeds the buffer size and the thread must be rebuilt, the 8-byte padding should be added.
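The sizing rules can be summarized in a few lines. This is a sketch with hypothetical function names; it adds exactly 8 bytes in the non-GSHK case, which is the minimum the recommendation asks for, and it always pads a rebuilt thread by 8 bytes.

```python
def filename_buffer_size(name_len, gshk_compatible=True):
    """Pre-sized buffer length (compThreadEOF) for a new filename thread."""
    if gshk_compatible:
        return max(32, name_len)    # GS/ShrinkIt behavior
    return name_len + 8             # leave at least 8 bytes of slack


def buffer_size_after_rename(current_size, new_len):
    """Buffer length to use after renaming a file."""
    if new_len <= current_size:
        return current_size         # new name fits: reuse the buffer as-is
    return new_len + 8              # thread must be rebuilt: add padding
```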

 

Thread ordering

The NuFX specification specifies a general ordering for threads ("blocks must occur in the following fashion"), but doesn't indicate what should be done if they appear out of order. Handling out-of-order threads isn't impossible, but it can be inconvenient.

For example, if an archive is being unpacked as it is received, it is important to know the filename before receiving the data. If the filename thread comes after the data threads, the application has to write the incoming data into a temp file, and then rename it later when the filename thread finally shows up. It would also be nice to be able to display file comments as the file is being downloaded.

Creating: The filename thread must precede all other threads. The recommended ordering for common thread types is:

  • Filename
  • Message(s) (i.e. comments)
  • Data threads (data fork, resource fork, disk image)
  • all other threads

Extracting: If the filename thread does not appear before the first data-class thread, the record may be ignored.
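A sketch of both sides of the ordering rule: sorting threads into the recommended order when creating, and checking the filename position when extracting. The class labels and function names are hypothetical; real code would derive the labels from the thread class and kind fields.

```python
# Rank for each hypothetical class label; anything else sorts last.
RANK = {"filename": 0, "message": 1, "data": 2}


def recommended_order(thread_classes):
    """Return the labels in the recommended order (stable within a class)."""
    return sorted(thread_classes, key=lambda c: RANK.get(c, 3))


def filename_precedes_data(thread_classes):
    """True unless a data-class thread appears before any filename thread."""
    for c in thread_classes:
        if c == "filename":
            return True
        if c == "data":
            return False
    return True     # no data-class threads, so the rule is not triggered
```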

 

Incompatible thread types

There are some combinations of threads that must never appear in a single record.

Creating:

  • If a data fork is present, the record must not contain another data fork or a disk image.
  • If a resource fork is present, the record must not contain another resource fork or a disk image.
  • If a disk image is present, the record must not contain another disk image, a data fork, or a resource fork.
  • If a control-class thread is present, the record must not contain any data-class threads.

Extracting: When incompatible threads are found, they should be ignored in favor of the earlier threads.  For example, if two data forks are found in the same record, only the first one should be extracted.  If a data-class thread is found first, subsequent control-class threads should be ignored, and vice-versa.
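The "earlier thread wins" rule can be expressed as a single pass over the threads with a conflict table built from the list above. The labels and function name are hypothetical. Treating multiple control threads as compatible with each other is an assumption; the rules above do not forbid it.

```python
DATA_FORK, RSRC_FORK, DISK_IMAGE, CONTROL = "data_fork", "rsrc_fork", "disk_image", "control"

# For each kind, the kinds it may not share a record with.
CONFLICTS = {
    DATA_FORK: {DATA_FORK, DISK_IMAGE, CONTROL},
    RSRC_FORK: {RSRC_FORK, DISK_IMAGE, CONTROL},
    DISK_IMAGE: {DISK_IMAGE, DATA_FORK, RSRC_FORK, CONTROL},
    CONTROL: {DATA_FORK, RSRC_FORK, DISK_IMAGE},
}


def usable_threads(kinds):
    """Drop any thread that conflicts with a thread kept earlier."""
    kept = []
    for kind in kinds:
        clash = CONFLICTS.get(kind, set())   # filename/comment never clash
        if any(k in clash for k in kept):
            continue                         # ignore in favor of earlier thread
        kept.append(kind)
    return kept
```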

 

Compressed threads

Some threads are compressed, some aren't.  The specification isn't very specific.

All data-class threads may be compressed.  All other classes of threads must not be compressed.

 

ProDOS storage type

The ProDOS storage type has little meaning on most systems.  However, certain values are significant.

  • For records with only a data fork, the storage type must be one of 0, 1, 2, or 3. The specific choice is not useful to anyone, but a nonzero value (say, 1) should be used.

  • For records with a resource fork, the storage type must be "5" (ProDOS extended file).

  • For records with a disk image thread, the storage type must be equal to the disk block size (typically 512).

  • For records without data-class threads, the storage type must be "0".

Storage type 0x0d, which is used by ProDOS for directories, must not be used.

It is important to update the storage type as threads are added and deleted, so that it always accurately reflects the contents of the record.
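A sketch of recomputing the storage type from the record's current contents, following the rules above. The function and parameter names are hypothetical, and the value 1 for data-fork-only records is simply the nonzero value suggested above.

```python
def storage_type_for(has_data_fork, has_rsrc_fork, has_disk_image, block_size=512):
    """Return the storage type matching the record's data-class threads."""
    if has_disk_image:
        return block_size   # disk images store the block size here
    if has_rsrc_fork:
        return 5            # ProDOS extended file
    if has_data_fork:
        return 1            # any of 0-3 is allowed; nonzero is preferred
    return 0                # no data-class threads
```

Calling this after every thread add or delete keeps the header in step with the record contents.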

The spec seems to claim that HFS volumes have 524 bytes per block (though the assertion was weakened from "would" to "might" in the final version). This refers to the 12 "tag" bytes available on 3.5" floppies, which are accessible from Mac OS but not actually required by HFS.

 

GS/OS option lists and HFS file types

GS/OS was designed to work with a variety of different filesystems. Instead of trying to handle all conceivable file attributes explicitly, GS/OS returns filesystem-specific values in "option lists". These can be provided to the get/set file info calls when copying files around.

Files on HFS volumes have two four-byte values, called file type and creator, that identify the file contents. These are part of the Macintosh Finder info structures, called FInfo and FXInfo. Files copied from HFS to ProDOS may have this data stored in the extended key block of a forked file (see ProDOS technical note #25). This appears as two 18-byte chunks, consisting of a size byte followed by a type byte, and then 16 bytes of FInfo or FXInfo data (which are defined in Inside Macintosh: Macintosh Toolbox Essentials, page 7-47). To expose the data to applications, certain GS/OS calls pass an "option list" with the contents. Most of the fields are uninteresting to anything but the Mac Finder on the system where the files were stored, so for our purposes the option list may be viewed simply as a way to preserve the file type and creator.

Experiments with the GS/OS Exerciser reveal that the option list returned doesn't include the size/type bytes. For an HFS file copied to ProDOS with GS/OS, the GetFileInfo call returns a 32-byte buffer that begins with FInfo. When called on an HFS volume, the option list is 36 bytes, with the last four bytes set to 02 00 00 00. GSHK appears to record these exactly as it receives them, which means the first four bytes hold the HFS file type, and the second four bytes hold the HFS creator, in big-endian byte order. Because most of the fields only have meaning to the Macintosh Finder, the rest of the data is zeroes. Files archived from an HFS volume created by a Macintosh would presumably have nonzero data in more places.

When archiving files from an HFS volume under GS/OS, GSHK records the ProDOS type/auxtype rather than the full HFS file type and creator, because that's what the GS/OS file info query returns. The only way to recover the original Mac Finder types is through the option list.

Sometimes the option list found in a NuFX archive is a little messed up, e.g. the size field says 36 bytes, but there's only space for 18 bytes in the record header.

Side note: the NuFX specification reversed the values of MFS and HFS in the file_sys_id enumeration. In practice, GS/ShrinkIt correctly uses the GS/OS FST definitions: MFS=5, HFS=6.

Opening: Assume the option_size field is correct unless it exceeds attrib_count-2. If it's too large, clip it down to size. If the filesystem type is ProDOS or HFS, the option list is at least 16 bytes long, and the second 4 bytes are nonzero, use the first 4 bytes of the option list data as the file type and the second 4 bytes as the creator. If a secondary test is desired to avoid garbage, the creator value is usually ASCII.

Creating: If a record has HFS type values, generate a filesystem-specific option list (32 bytes for ProDOS or 36 bytes for HFS) and store them there.

Updating: Always output the actual record size. Do not propagate incorrect size values. Retaining option lists for ProDOS and HFS entries is required, since they may have the only copy of the original file type and creator, but only if at least one of the first 8 bytes of the option list is nonzero. Updates to the archive attributes that alter the file/aux type should usually retain the option list, since the purpose may be to improve ProDOS usability without losing the original type information.
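The "Opening" test for a usable option list might be sketched as follows. The function name and parameters are hypothetical. The caller is assumed to have already clipped the option list to the bytes actually present in the record header, and to have checked the filesystem type.

```python
def hfs_type_and_creator(option_data, fs_is_prodos_or_hfs):
    """Extract (file_type, creator) as 4-byte values, or None if unusable.

    option_data: option list bytes, already clipped to what the record
                 header really holds.
    """
    if not fs_is_prodos_or_hfs or len(option_data) < 16:
        return None
    file_type = option_data[0:4]
    creator = option_data[4:8]
    if creator == b"\x00\x00\x00\x00":
        return None                 # second 4 bytes are zero: not HFS types
    return file_type, creator
```

The optional secondary check mentioned above could be added by requiring every creator byte to fall in the printable ASCII range.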

 

ProDOS vs. HFS file types

The initial release of the specification stated that the HFS file type and creator should be stored in the record header. The final version of the specification abdicates responsibility for defining the field, stating simply, "For ProDOS 8 or GS/OS, this field should always be what the operating system returns when asked".

For reference, when an application asks GS/OS to get the information for a file on an HFS volume, it returns a ProDOS file type and aux type (usually BIN), and puts the HFS type and creator into an option list. If this behavior defines the field, then this is how the types should be stored.

However, the vague wording of the specification raises the possibility that a Mac OS-based archiver should store the file type and creator directly in the record header, because that's what "the operating system" returned. The record header does not provide a way to define the source of the type values, so an extraction program attempting to set the file info would need to draw conclusions based on whether the types are small enough values to be valid for ProDOS.

It's worth noting that files on an AppleShare volume have independent ProDOS and HFS file types. When a ProDOS file is written to the AppleShare FST, Mac OS type and creator values are generated according to a scheme documented in the AppleShare FST public ERS document. It's possible that a Mac archiver could store ProDOS file types as HFS file types that are actually ProDOS file types that must be decoded based on a collection of rules.

To avoid ambiguity, we want to follow the GS/OS behavior, regardless of what the host operating system does.

Creating: store the ProDOS file type and aux type in the record header. For files on HFS volumes, put a simple ProDOS type (BIN or TXT) in the record header, and put the file type and creator in an option list.

Extracting: if the file type and aux type do not fit in 8 and 16 bits, respectively, treat them as values from HFS.

 

Disk image size values

For a compressed disk image, the "storage_type" and "extra_type" fields take on different meanings: the storage_type field holds the block size (usually 512), and the extra_type field holds the block count (e.g. 280 for a 140KB disk).

These fields are more important than you might expect, because ShrinkIt doesn't appear to set the thread EOF value for disk images. (A quick test with ShrinkIt v3.4 on a 5.25" DOS disk yielded a thread EOF of zero, while GS/ShrinkIt v1.1 on a 3.5" ProDOS disk generated a mysterious thread EOF of $4a00.) Worse, some older versions of ShrinkIt tended to leave the "storage_type" set to 2. Apparently, ShrinkIt just uses extra_type * 512 as the uncompressed size when trying to figure out what sort of disk it has. An early version of GS/ShrinkIt went one step further: it used a block count of 280 with a block size of 256, resulting in archives that apparently held 70K disk images.

It is simple enough to disregard the thread EOF value, and replace the storage_type when it is absurdly small, but there is a deeper problem. If you delete a 140KB disk image thread and replace it with an 800KB disk image thread, the block count stored in the extra_type no longer accurately reflects the contents of the record. (This linkage between the record header and the thread contents is the reason why this document forbids mixing of disk image threads with any other data-class thread, including other disk images.)

Because the length of the disk image thread can only be determined from the extra_type field, it is important for applications that support changing the file and aux types to prevent such changes in records with disk images.

Creating: Applications must update the record's storage_type and extra_type fields whenever a disk image thread is added. The value (storage_type * extra_type) must be equal to the uncompressed size. The application should reject disk image files that are not a multiple of 512 bytes. For consistency with other applications, the thread EOF field should be zeroed.

Extracting: The application must ignore the thread EOF, and normalize storage_type to 512 if it is less than 16 (0x0f is the largest valid ProDOS storage type). The value (512 * extra_type) should be used as the uncompressed size. If the uncompressed size is zero, the thread may be ignored.
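The creating and extracting rules reduce to a little arithmetic, sketched below with hypothetical function names. For a 140KB disk, 280 blocks * 512 bytes = 143,360 bytes.

```python
def header_fields_for_disk_image(image_len):
    """Return (storage_type, extra_type) for a disk image of image_len bytes."""
    if image_len % 512 != 0:
        raise ValueError("disk image length is not a multiple of 512")
    return 512, image_len // 512


def disk_image_geometry(storage_type, extra_type):
    """Return (block_size, uncompressed_size) when extracting.

    The thread EOF is deliberately not consulted.
    """
    # Values below 16 are ordinary ProDOS storage types left behind by
    # older versions of ShrinkIt, so normalize them to 512.
    block_size = 512 if storage_type < 16 else storage_type
    return block_size, 512 * extra_type
```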

 

Access permissions

NuFX supports four boolean access permission flags (read, write, destroy, rename) and two boolean attributes (backup needed, invisible) in the "access" field.  This matches up with ProDOS capabilities nicely, but very few other operating systems support all six.

Application authors should consider the following approaches:

  1. Preserve all.  All flags in the access field must be preserved.  It is not required that the extracted files obey the original semantics -- an "invisible" file might be visible, and a file with "rename" disabled might still be rename-able -- but when the files are re-added, the permissions must match.

  2. Locked/unlocked.  A file with read enabled, and write, destroy, rename, and invisible disabled, is considered "locked" (access 0x01 or 0x21).  All other files are considered "unlocked".  When a file is extracted and then added to an archive, the locked/unlocked status must be preserved.  Locked files are added with access 0x21, and unlocked files are added with access 0xe3.

It is acceptable for an application to find a middle ground between these two, and preserve more of the flags accurately than approach #2 does, but approach #2 should be considered the minimum acceptable level of support.
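Approach #2 can be sketched with a couple of bit masks. The bit assignments below follow the standard ProDOS access byte layout, which is an assumption here since this document does not list them; they are consistent with the 0x01, 0x21, and 0xe3 values quoted above. The function names are hypothetical.

```python
# Assumed ProDOS access bits.
READ, WRITE, INVISIBLE = 0x01, 0x02, 0x04
BACKUP, RENAME, DESTROY = 0x20, 0x40, 0x80

LOCKED_ACCESS = 0x21
UNLOCKED_ACCESS = 0xE3


def is_locked(access):
    """True if read is enabled and write/destroy/rename/invisible are not."""
    relevant = access & (READ | WRITE | INVISIBLE | RENAME | DESTROY)
    return relevant == READ     # the backup bit is ignored


def access_for_added_file(access):
    """Access value to store when only locked/unlocked is preserved."""
    return LOCKED_ACCESS if is_locked(access) else UNLOCKED_ACCESS
```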

 

Empty directories

Directories do not need to be stored explicitly unless they are empty.  The NuFX specification manages to avoid describing how directories are actually supposed to be stored, saying only: "A Thread Record must exist to inform a utility that a directory is to be created through the use of the proper control_thread value."

What is in a "create directory" control thread?  It appears that the intent was to have the thread contain the pathname that needed to be created.  In theory, you could have several of these things, and create an entire hierarchy from a single record.  Such threads should not be compressed, and their compThreadEOF should always match their threadEOF (i.e. they're not pre-sized).

It's a little tricky to say, "add a control thread whenever you find a directory with nothing in it".  What if the directory has files in it, but you don't have the access permissions necessary to read the files?

Does such a record require a filename?  Probably not.  However, if it doesn't have a filename, ShrinkIt might not display the record, and you'd have no way to manipulate it.  Adding a "record label" is easy and useful.

(I'm strongly tempted to punt on the control threads and just use storage type 0x0d to indicate that a directory should be created.  This is in direct opposition to the NuFX specification, however, so I'm reluctant to do so.)

Creating: Applications not interested in preserving empty directories need do nothing.  Otherwise, the application must add a "create directory" control thread whenever a directory is encountered for which no files are added to the archive.

Extracting: A directory must be created when a control thread is present.  As noted in the NuFX specification, the application must also create any directories listed in the record's pathname that don't yet exist.

 

Message thread format

The specification says that message threads are ASCII text, but doesn't specify an EOL character. For the benefit of Apple II utilities, it's best to use a carriage return (Ctrl+M). The comments are expected to be readable on 8-bit Apple IIs, so plain ASCII rather than Mac OS Roman should be used.

Creating: Convert any EOL markers to CR, and any non-ASCII characters (i.e. bytes with the high bit set) to ASCII.

Extracting: Assume that the comment may be using CR, LF, or CRLF, and convert as needed for display. GS/ShrinkIt used a proportional font, so there is no need to worry about formatting to preserve "ASCII art" in comments.
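A sketch of the two conversions. The function names are hypothetical, the host EOL is assumed to be LF, and replacing non-ASCII characters with '?' is just one simple way to reduce text to ASCII.

```python
def comment_to_archive(text):
    """Convert comment text to CR-terminated plain ASCII bytes."""
    # Normalize CRLF and lone CR to LF first, then turn every LF into CR.
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r")
    return text.encode("ascii", errors="replace")   # non-ASCII becomes '?'


def comment_for_display(raw):
    """Convert stored comment bytes to text with LF line endings."""
    text = raw.decode("ascii", errors="replace")    # high-bit bytes become U+FFFD
    return text.replace("\r\n", "\n").replace("\r", "\n")
```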

 

Message thread maximum length

Comments are rarely used, and when they are they tend to be fairly short. The contents are never compressed, aren't covered by a CRC, and aren't extracted to files, making them a bad way to convey vital information. Adding and editing the comment field was introduced with GS/ShrinkIt, which creates a pre-sized comment on the first entry in each batch. The editor does not expand or reduce the length of the field, which is limited to 1,000 bytes. It does support longer comments created by other programs.

It's convenient to assign a maximum possible length to comments, so that they can be manipulated by code that doesn't need to handle their maximum possible length of 4GB. A cap of 64KB (same as ZIP) seems reasonable as an absolute maximum, considering likely content and what Apple II software can support.

Creating: Limit comments to 64KB. Applications may establish a lower limit, but should allow them to be at least 1000 bytes.

Updating: Truncation of comments longer than 64KB is discouraged but allowed.

 

Master EOF

For the most part, ShrinkIt correctly sets the MasterEOF field in the Master Header block. The field was introduced with version 1 of the header definition. A very old version of ShrinkIt left it set to zero (this is the same version that completely omitted the filename for DOS 3.3 disk images). GS/ShrinkIt appears to initialize it to 48 (the size of the MH block), and if the creation process is interrupted you can end up with a partial archive with a nonzero EOF.

The master EOF is useful as a quick file truncation test, but provides no other value. The record count in the header is more important.

Opening: Don't assume the master EOF is accurate. Walk through the list of records to determine the actual end-of-file before appending new records.

Updating: Applications must write the correct MasterEOF value if an archive is modified.

 


Extensions

Unofficial extensions to the NuFX specification.  Anyone working with NuFX archives should take heed.

New compression formats

Thread formats 0x0000 through 0x0005 are already defined.  The following thread format values have been added:

  • 0x0006 - deflate. The thread contains data conforming to RFC 1951 (deflate 1.3 specification), which is the compression format used by ZIP and gzip. The canonical implementation is "zlib". Visit zlib.net for more details.

  • 0x0007 - bzip2.  The thread contains BWT+Huffman compressed data as output by "libbz2". Visit sourceware.org/bzip2 for more information.

Support for these formats is nonexistent on the Apple II, so they should not be used except in situations where compatibility is unimportant (e.g. collections of disk archives for use with A2 emulators).

I found that "deflate" generally does as well as or better than "bzip2" on Apple II binaries, disk images, and small text files.  Deflate is also faster and uses less memory, and you're more likely to find libz installed on a given system than you are libbz2.  For these reasons, use of deflate should be encouraged in favor of bzip2.
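For illustration, producing and consuming RFC 1951 data with zlib looks roughly like this. The sketch assumes the thread holds a raw deflate stream with no zlib or gzip wrapper (hence the negative window-bits value); that reading follows from the RFC 1951 reference above but should be checked against an existing implementation. The function names are hypothetical.

```python
import zlib


def deflate_thread(data):
    """Compress bytes into a raw RFC 1951 stream."""
    # wbits=-15 selects raw deflate: no zlib header or Adler-32 trailer.
    comp = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=-15)
    return comp.compress(data) + comp.flush()


def inflate_thread(compressed):
    """Expand a raw RFC 1951 stream."""
    return zlib.decompress(compressed, wbits=-15)
```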


NuFX Quirks

This section identifies some quirks in NuFX or ShrinkIt that, while not bugs, are worth noting.

Filename separator character

Originally, the filename was stored in the record header, so it made sense that the filename separator character ("fssep char") should also be there.  When the filenames were moved into threads, the fssep char got left behind.  If a record has two filenames, they'd better have the same fssep char, or interpreting one of them will be impossible.  (This is one of the reasons why it's important to clearly define which filename takes precedence in all circumstances.)

Files with zero or two CRCs

The "threadCRC" field in the thread header block can have one of three meanings: nothing (v0, v1), the CRC of the compressed data (v2), or the CRC of the uncompressed data (v3). Version 2 records weren't generated by anything significant, and can be ignored. (If you actually find an archive with v2 records, it's reasonable to just treat them as v1.)

Version 1 records generally have threads compressed with LZW/1. The LZW/1 compression format includes the 16-bit CRC of the uncompressed data at the start of the thread. Version 3 records generally have threads compressed with LZW/2, which does not include a CRC.

Applications like P8 ShrinkIt and NuLib create v1 records and compress with LZW/1, while GS/ShrinkIt and NuLib2 create v3 records and compress with LZW/2. This means that each compressed thread has exactly one CRC. (Uncompressed data stored by P8 ShrinkIt has no CRC at all.) So what happens if you tell NuLib2 to create a new record with LZW/1, or tell it to add a new LZW/2 thread to an existing v1 record?

In the first case (LZW/1 in a v3 record), you end up with two CRCs; in the second (LZW/2 in a v1 record), you end up with no CRC on your data at all. Unfortunately, the v3 thread CRC is computed with a different initial value, so it is necessary to compute the CRC twice for LZW/1 data, not merely store the same value twice.
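
The effect of the different initial value can be shown with a small Python sketch. It assumes the CRC is the common 16-bit CCITT polynomial (0x1021, most significant bit first), seeded with 0x0000 for the CRC embedded in LZW/1 data and 0xFFFF for the v3 thread CRC. Those seed values are not stated in this document, so treat them as assumptions to confirm against the NuFX specification. The names are invented for the example.

```python
LZW1_SEED = 0x0000       # assumed seed for the CRC stored inside LZW/1 data
V3_THREAD_SEED = 0xFFFF  # assumed seed for the v3 threadCRC field

def crc16(data: bytes, seed: int) -> int:
    """Bitwise CRC-16 with polynomial 0x1021, processed MSB first."""
    crc = seed & 0xFFFF
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def both_crcs(uncompressed: bytes) -> tuple[int, int]:
    """Compute the two values separately; one cannot be copied from the other."""
    return crc16(uncompressed, LZW1_SEED), crc16(uncompressed, V3_THREAD_SEED)
```

Because the two seeds differ, the same bytes yield two unrelated values, which is why a program writing LZW/1 data into a v3 record has to run the calculation twice.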

When replacing a data thread in an existing record, it's tempting to update the record to the latest (v3), but this may come at a cost. For example, if the record has both resource and data forks, and only the data fork is being replaced, it would be necessary to uncompress the resource fork to calculate its uncompressed CRC. Programs that rewrite records should be prepared to output v1 or v3.

Extra data in compressed threads

ShrinkIt adds an extra byte at the end of all LZW compressed data, probably due to an off-by-one bug in the compression code.  It turns out that it's possible to get even more "extra" bytes at the end.

ShrinkIt's LZW/1 algorithm always operates on a 4K buffer, largely because it was originally designed for compressing 5.25" disks with 4K tracks.  On small files, or at the end of a large one, the last bit of data is padded out to 4K and then compressed.  Ordinarily this is barely noticeable, because the compression routines do an RLE (Run-Length Encoding) pass before applying LZW.

However, if both RLE and LZW fail to make the 4K block any smaller, it is stored without compression.  This means the whole 4K, complete with padding, gets written to the archive.  This doesn't cause any problems, but can make you wonder where all the extra bits came from.
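
The arithmetic behind the padding is simple enough to sketch in Python. The example below works out how many bytes a file occupies once its last block is padded to 4K, and shows one way an extractor might discard the padding by truncating the expanded output to the length recorded for the thread. Using the thread's uncompressed length (thread_eof) for that purpose is an assumption of this sketch, may not hold for every kind of thread, and the helper names are hypothetical.

```python
LZW_BLOCK_SIZE = 4096  # the 4K buffer described above

def padded_length(uncompressed_len: int) -> int:
    """Size of the data after the final block is padded out to 4K."""
    blocks = -(-uncompressed_len // LZW_BLOCK_SIZE)  # ceiling division
    return blocks * LZW_BLOCK_SIZE

def padding_bytes(uncompressed_len: int) -> int:
    """Number of filler bytes added to the final block."""
    return padded_length(uncompressed_len) - uncompressed_len

def trim_expanded(expanded: bytes, thread_eof: int) -> bytes:
    """Drop anything past the recorded uncompressed length (assumed to be thread_eof)."""
    return expanded[:thread_eof]
```

For a 5000-byte file, the data spans two blocks (8192 bytes), so 3192 bytes of padding are fed to the compressor along with the real data.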

The SQ compression algorithm, as implemented by Don Elton's SQ3, appears to add an extra 0xff to the end of the compressed data.  It can safely be ignored.

Preserving BXY and SEA wrappers

Preserving BXY wrappers is pretty easy, since the Binary II format is well documented.  Updating block counts and file lengths is all that is required.

Preserving SEA wrappers is a little more obscure, since there is no documentation on the format. A bit of reverse engineering reveals that SEA files are OMF executables with two segments. The first segment holds the extraction code, and is the same for all archives. The second holds the NuFX data, and requires that a few length values in the segment header be adjusted. Also, to be correct, the file must have a $00 byte appended after the NuFX data (it's an OMF "END" opcode).

The archives have a minor bug: an offset field in the header is off by one, so actually loading the segment in GS/OS would likely fail. The segment header has the "skip" flag set, though, so this isn't a problem in practice.

Y2K

The NuFX standard says that the Date/Time format is the same as that returned by the IIgs ReadTimeHex toolbox call.  That call returns the year as (year - 1900), so the year 2000 is stored as "100".  ProDOS 8 clock drivers, on the other hand, return 40-99 for 1940-1999, and 0-39 for 2000-2039.  As a result, archives created with P8 ShrinkIt use 0 for the year 2000 instead of 100.

When creating archives, always use 100 for the year 2000, but also accept the year 0.  However, if you find a Date/Time with zero in all useful fields (second, minute, hour, day, month, year), treat it as an unspecified date rather than midnight of January 1, 2000.
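
These rules can be expressed in a few lines of Python. The sketch below works on already-parsed field values rather than raw bytes, so it makes no claims about the byte layout of the Date/Time structure. Extending the "accept the year 0" rule to cover all of 0-39 is an extrapolation from the ProDOS 8 ranges described above, and the function names are made up for this example.

```python
def decode_year(stored: int) -> int:
    """Map a stored year value to a calendar year, accepting both conventions."""
    if stored < 40:
        # ProDOS 8 style: 0-39 means 2000-2039 (extrapolated from the text).
        return 2000 + stored
    # ReadTimeHex style: value is (year - 1900), so 99 is 1999 and 100 is 2000.
    return 1900 + stored

def encode_year(year: int) -> int:
    """Always write the ReadTimeHex form, e.g. 100 for the year 2000."""
    return year - 1900

def is_unspecified(second: int, minute: int, hour: int,
                   day: int, month: int, year: int) -> bool:
    """All useful fields zero means "no date", not midnight of January 1, 2000."""
    return not any((second, minute, hour, day, month, year))
```

A reader would call is_unspecified first, and only fall through to decode_year when at least one field is nonzero.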


This document is Copyright © 2000-2004 by Andy McFadden.  All Rights Reserved.

The latest version can be found on the NuLib web site at https://www.nulib.com/.