Sunday, 18 March 2012

ZIP File Structure


Zip is a file format used for data compression and archiving. A zip file contains one or more files that have been compressed to reduce file size, or stored as is. The zip file format permits a number of compression algorithms.
The format was originally created in 1989 by Phil Katz and was first implemented in PKWARE's PKZIP utility, as a replacement for the previous ARC compression format by Thom Henderson.
The zip format is now supported by many software utilities other than PKZIP. Microsoft has included built-in zip support (under the name "compressed folders") in versions of Microsoft Windows since 1998. Apple has included built-in zip support in Mac OS X 10.3 (via BOMArchiveHelper, now Archive Utility) and later, along with other compression formats.
Zip files generally use the file extension ".zip" or ".ZIP" and the MIME media type application/zip. Zip is used as a base file format by many programs, usually under a different name.
Zip files are often represented by a document or other object prominently featuring a zipper.


Structure
A zip file is identified by the presence of a central directory which is located at the end of the structure in order to allow the appending of new files. The central directory stores a list of the names of the entries (files or directories) stored in the zip file, along with other metadata about the entry, and an offset into the zip file, pointing to the actual entry data. This allows a file listing of the archive to be performed relatively quickly, as the entire archive does not have to be read to see the list of files. The entries in the zip file also include this information for redundancy.
The order of the file entries in the directory need not coincide with the order of file entries in the archive.
Each entry is introduced by a local header with information about the file, such as the compression method, file size and file name, followed by optional "Extra" data fields and then the possibly compressed, possibly encrypted file data. The "Extra" data fields are the key to the extensibility of the zip format: they are used to support the ZIP64 format, WinZip-compatible AES encryption, file attributes, and higher-resolution NTFS or Unix file timestamps, and other extensions are possible in the same way. The specification requires zip tools to ignore Extra fields they do not recognize.

The zip format
The zip format uses specific 4-byte "signatures" to denote the various structures in the file. Each local file entry is marked by its own signature, the end of central directory record is indicated with a different signature, and each entry in the central directory begins with yet another 4-byte signature.
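Because every multi-byte value is stored little-endian, each of these signatures begins with the ASCII bytes "PK" (Phil Katz's initials) when viewed in a hex dump. The short Python sketch below uses only the standard struct module to show the on-disk bytes of the signature values given in this article's header tables; the SIGNATURES dictionary is simply an illustrative name.

```python
import struct

# Signature values as given in the header tables of this article.
SIGNATURES = {
    "local file header": 0x04034B50,
    "data descriptor": 0x08074B50,
    "central directory file header": 0x02014B50,
    "end of central directory": 0x06054B50,
}

for name, value in SIGNATURES.items():
    # "<I" packs an unsigned 32-bit integer in little-endian byte order.
    on_disk = struct.pack("<I", value)
    print(f"{name:32} 0x{value:08X} -> {on_disk!r}")
```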
There is no BOF or EOF marker in the zip specification. Often the first thing in a zip file is a zip entry, which can be identified easily by its signature, but the specification does not require a zip file to begin with one.
Tools that correctly read zip archives must scan for the end of central directory record signature and then read the central directory records it points to. They must not locate entries by scanning forward from the start of the file, because only the central directory specifies where each file chunk starts (and whether it is still part of the archive). Such scanning could produce false positives, as the format does not forbid other data between chunks, nor does it prevent file data streams from containing the signature bytes.
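A minimal Python sketch of that search follows, assuming the whole archive is already in memory as bytes and ignoring ZIP64 and multi-disk archives; the function name find_eocd is hypothetical. It scans backwards because the end of central directory record can only be followed by its own comment, which is at most 65,535 bytes long, and it rejects any candidate whose stored comment length does not reach exactly to the end of the file.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_MIN_SIZE = 22      # fixed part of the record, without the comment
MAX_COMMENT = 0xFFFF    # the comment length is a 16-bit field

def find_eocd(data: bytes) -> int:
    """Return the offset of the end of central directory record, or -1."""
    earliest = max(0, len(data) - EOCD_MIN_SIZE - MAX_COMMENT)
    pos = len(data) - EOCD_MIN_SIZE
    while pos >= earliest:
        # Look for the last signature that starts at or before `pos`.
        pos = data.rfind(EOCD_SIGNATURE, earliest, pos + 4)
        if pos < 0:
            return -1
        # A genuine record's comment ends exactly at end of file; this
        # filters out signature bytes that merely appear inside a comment.
        (comment_len,) = struct.unpack_from("<H", data, pos + 20)
        if pos + EOCD_MIN_SIZE + comment_len == len(data):
            return pos
        pos -= 1
    return -1
```

Real readers differ in how they treat stray bytes after the comment; this sketch takes the strict view.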
The zip specification also supports spreading archives across multiple filesystem files. Originally intended for storage of large zip files across multiple 1.44 MB floppy disks, this feature is now used for sending zip archives in parts over email, or over other transports or removable media.
The FAT filesystem of DOS has a timestamp resolution of only two seconds; zip file records mimic this. As a result, the built-in timestamp resolution of files in a zip archive is only two seconds, though extra fields can be used to store more accurate timestamps.
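The two-second resolution follows from how the MS-DOS time and date words are packed: hours, minutes and seconds divided by two in the 16-bit time field, and years since 1980, month and day in the 16-bit date field. A minimal Python sketch of the conversion, with illustrative function names:

```python
from datetime import datetime

def to_dos_datetime(dt: datetime) -> tuple[int, int]:
    """Pack a datetime into the 16-bit (time, date) pair used by zip headers.

    Seconds are stored divided by two, so odd seconds are lost.
    """
    if not 1980 <= dt.year <= 2107:      # 7 bits of years counted from 1980
        raise ValueError("DOS dates cover 1980-2107 only")
    time = (dt.hour << 11) | (dt.minute << 5) | (dt.second // 2)
    date = ((dt.year - 1980) << 9) | (dt.month << 5) | dt.day
    return time, date

def from_dos_datetime(time: int, date: int) -> datetime:
    """Unpack a (time, date) pair back into a datetime."""
    return datetime(
        (date >> 9) + 1980, (date >> 5) & 0x0F, date & 0x1F,
        time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2,
    )

t, d = to_dos_datetime(datetime(2012, 3, 18, 10, 30, 45))
print(from_dos_datetime(t, d))  # 2012-03-18 10:30:44
```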
In September 2007, PKWARE released a revision of the zip specification that contains a provision to store file names using UTF-8, finally adding Unicode compatibility to zip.
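The UTF-8 provision is signalled by bit 11 (0x0800) of the general purpose bit flag; without it, names are conventionally read as IBM code page 437. As a sketch, Python's standard zipfile module sets this bit on its own when a name is not plain ASCII, which can be observed like this (UTF8_FLAG is an illustrative constant name):

```python
import io
import zipfile

UTF8_FLAG = 0x0800  # general purpose bit 11: file name and comment are UTF-8

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plain.txt", b"ASCII name")
    zf.writestr("café.txt", b"non-ASCII name")

with zipfile.ZipFile(buf) as zf:
    for info in zf.infolist():
        print(info.filename, bool(info.flag_bits & UTF8_FLAG))
```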
File headers
All multi-byte values in the header are stored in little-endian byte order. All length fields count the length in bytes.
Local file header
Offset  Bytes  Description
0       4      Local file header signature = 0x04034b50 (read as a little-endian number)
4       2      Version needed to extract (minimum)
6       2      General purpose bit flag
8       2      Compression method
10      2      File last modification time
12      2      File last modification date
14      4      CRC-32
18      4      Compressed size
22      4      Uncompressed size
26      2      File name length (n)
28      2      Extra field length (m)
30      n      File name
30+n    m      Extra field
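The table translates directly into a struct format string. The following Python sketch assumes the archive is held in memory as bytes and decodes the fixed 30-byte part of a local file header plus the variable-length name and extra field; read_local_header and its dictionary keys are illustrative names, not part of any standard API.

```python
import struct

# Fixed 30-byte part of the local file header, little-endian ("<").
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIG = 0x04034B50

def read_local_header(data: bytes, offset: int = 0) -> dict:
    """Decode the local file header that starts at `offset`."""
    (sig, version, flags, method, mtime, mdate,
     crc, csize, usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, offset)
    if sig != LOCAL_SIG:
        raise ValueError(f"no local file header at offset {offset}")
    name_start = offset + LOCAL_HEADER.size      # table offset 30
    extra_start = name_start + name_len          # table offset 30+n
    return {
        "version_needed": version,
        "flags": flags,
        "method": method,
        "mod_time": mtime,
        "mod_date": mdate,
        "crc32": crc,
        "compressed_size": csize,
        "uncompressed_size": usize,
        "file_name": data[name_start:extra_start],
        "extra": data[extra_start:extra_start + extra_len],
        "data_offset": extra_start + extra_len,  # file data begins here
    }
```

If bit 3 of the flags is set, the CRC-32 and size fields returned here may be zero, and the real values must be taken from the data descriptor or the central directory.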

The extra field contains a variety of optional data such as OS-specific attributes. It is divided into chunks, each with a 16-bit ID code and a 16-bit length.
The local file header, including its extra field, is immediately followed by the compressed data.
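Splitting the extra field into its chunks takes only a few lines. This Python sketch (parse_extra_field is a hypothetical name) returns every chunk as an (ID, payload) pair, so a caller can act on the IDs it knows, such as 0x0001 for ZIP64, and skip the rest, as the specification requires.

```python
import struct

def parse_extra_field(extra: bytes) -> list[tuple[int, bytes]]:
    """Split a zip "Extra" field into (header ID, payload) pairs.

    Each chunk is a 16-bit ID, a 16-bit payload length, then the payload.
    Trailing bytes too short to hold a chunk header are ignored.
    """
    chunks = []
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        payload = extra[pos + 4:pos + 4 + size]
        if len(payload) != size:
            raise ValueError("truncated extra field chunk")
        chunks.append((header_id, payload))
        pos += 4 + size
    return chunks
```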
If bit 3 (0x08) of the general purpose flags field is set, then the CRC-32 and file sizes are not known when the header is written. The corresponding fields in the local header are filled with zero, and the CRC-32 and sizes are instead appended in a 12-byte structure (optionally preceded by a 4-byte signature) immediately after the compressed data:
Data descriptor
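Offset  Bytes  Description
0       0/4    Optional data descriptor signature = 0x08074b50
0/4     4      CRC-32
4/8     4      Compressed size
8/12    4      Uncompressed size
(Where two values are separated by a slash, the first applies when the optional signature is absent and the second when it is present.)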
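Because the signature is optional, a reader has to guess whether it is present. The Python sketch below (read_data_descriptor is a hypothetical name) uses the simple heuristic of checking the first four bytes; as its docstring notes, a CRC-32 that happens to equal the signature value would fool it.

```python
import struct

DD_SIGNATURE = 0x08074B50

def read_data_descriptor(data: bytes, offset: int) -> tuple[int, int, int, int]:
    """Decode a data descriptor at `offset` (just after the compressed data).

    Returns (crc32, compressed_size, uncompressed_size, descriptor_length).
    If the first 4 bytes equal 0x08074B50 they are treated as the optional
    signature; a CRC-32 with that exact value would be misread, so careful
    readers cross-check against the central directory.
    """
    (first,) = struct.unpack_from("<I", data, offset)
    has_signature = first == DD_SIGNATURE
    start = offset + 4 if has_signature else offset
    crc, csize, usize = struct.unpack_from("<III", data, start)
    return crc, csize, usize, 16 if has_signature else 12
```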




The central directory entry is an expanded form of the local header:
Central directory file header
Offset  Bytes  Description
0       4      Central directory file header signature = 0x02014b50
4       2      Version made by
6       2      Version needed to extract (minimum)
8       2      General purpose bit flag
10      2      Compression method
12      2      File last modification time
14      2      File last modification date
16      4      CRC-32
20      4      Compressed size
24      4      Uncompressed size
28      2      File name length (n)
30      2      Extra field length (m)
32      2      File comment length (k)
34      2      Disk number where file starts
36      2      Internal file attributes
38      4      External file attributes
42      4      Relative offset of local file header. This is the number of bytes between the start of the first disk on which the file occurs and the start of the local file header. It allows software reading the central directory to locate the position of the file inside the ZIP file.
46      n      File name
46+n    m      Extra field
46+n+m  k      File comment
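The central directory header decodes the same way as the local header, with a longer format string. In this Python sketch (read_central_header and its keys are illustrative), each call also returns the offset of the next entry, computed from the three variable-length fields, so successive calls can walk the whole directory.

```python
import struct

CENTRAL_HEADER = struct.Struct("<IHHHHHHIIIHHHHHII")   # fixed 46-byte part
CENTRAL_SIG = 0x02014B50

def read_central_header(data: bytes, offset: int) -> tuple[dict, int]:
    """Decode one central directory entry; return (fields, next_offset)."""
    (sig, made_by, needed, flags, method, mtime, mdate, crc, csize, usize,
     name_len, extra_len, comment_len, disk_start, int_attr, ext_attr,
     local_offset) = CENTRAL_HEADER.unpack_from(data, offset)
    if sig != CENTRAL_SIG:
        raise ValueError(f"no central directory header at offset {offset}")
    name_start = offset + CENTRAL_HEADER.size     # table offset 46
    extra_start = name_start + name_len           # 46+n
    comment_start = extra_start + extra_len       # 46+n+m
    end = comment_start + comment_len
    fields = {
        "version_made_by": made_by,
        "flags": flags,
        "method": method,
        "crc32": crc,
        "compressed_size": csize,
        "uncompressed_size": usize,
        "external_attr": ext_attr,
        "local_header_offset": local_offset,
        "file_name": data[name_start:extra_start],
        "extra": data[extra_start:comment_start],
        "comment": data[comment_start:end],
    }
    return fields, end
```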
After all the central directory entries comes the end of central directory record, which marks the end of the ZIP file:
End of central directory record
Offset  Bytes  Description
0       4      End of central directory signature = 0x06054b50
4       2      Number of this disk
6       2      Disk where central directory starts
8       2      Number of central directory records on this disk
10      2      Total number of central directory records
12      4      Size of central directory (bytes)
16      4      Offset of start of central directory, relative to start of archive
20      2      Comment length (n)
22      n      Comment
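A Python sketch of decoding this record once its position is known; read_eocd and the dictionary keys are illustrative names, and ZIP64 end records are not handled.

```python
import struct

EOCD = struct.Struct("<IHHHHIIH")    # fixed 22-byte part of the record
EOCD_SIG = 0x06054B50

def read_eocd(data: bytes, offset: int) -> dict:
    """Decode the end of central directory record at `offset`."""
    (sig, this_disk, cd_disk, entries_on_disk, entries_total,
     cd_size, cd_offset, comment_len) = EOCD.unpack_from(data, offset)
    if sig != EOCD_SIG:
        raise ValueError(f"no end of central directory record at offset {offset}")
    comment_start = offset + EOCD.size
    return {
        "this_disk": this_disk,
        "cd_start_disk": cd_disk,
        "entries_on_disk": entries_on_disk,
        "entries_total": entries_total,
        "cd_size": cd_size,
        "cd_offset": cd_offset,
        "comment": data[comment_start:comment_start + comment_len],
    }
```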

This ordering allows a zip file to be created in one pass, but it is usually decompressed by first reading the central directory at the end.
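To illustrate reading from the end, this Python sketch lists an archive's entries using only the end of central directory record and the central directory, without touching the local headers. It is deliberately simplified: it assumes the last end of central directory signature in the file belongs to the real record, that the archive starts at offset 0, and that no ZIP64 or multi-disk structures are present. The function name list_entries is hypothetical.

```python
import struct

def list_entries(data: bytes) -> list[tuple[str, int]]:
    """Return (name, uncompressed size) for each central directory entry."""
    eocd = data.rfind(b"PK\x05\x06")
    if eocd < 0:
        raise ValueError("end of central directory record not found")
    # Total records (offset 10), directory size (12), directory offset (16).
    count, _cd_size, cd_offset = struct.unpack_from("<HII", data, eocd + 10)
    entries = []
    pos = cd_offset
    for _ in range(count):
        if data[pos:pos + 4] != b"PK\x01\x02":
            raise ValueError(f"bad central directory header at {pos}")
        (flags,) = struct.unpack_from("<H", data, pos + 8)
        (usize,) = struct.unpack_from("<I", data, pos + 24)
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, pos + 28)
        raw_name = data[pos + 46:pos + 46 + name_len]
        # Bit 11 marks UTF-8 names; otherwise code page 437 is conventional.
        encoding = "utf-8" if flags & 0x0800 else "cp437"
        entries.append((raw_name.decode(encoding), usize))
        pos += 46 + name_len + extra_len + comment_len
    return entries
```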

Compression methods
The .ZIP File Format Specification documents the following compression methods: stored (no compression), Shrunk, Reduced (methods 1-4), Imploded, Tokenizing, Deflated, Deflate64, bzip2, LZMA (EFS), WavPack, PPMd. The most commonly used compression method is DEFLATE, which is described in IETF RFC 1951.
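Method 8 (Deflate) stores a raw RFC 1951 stream, without the zlib header or Adler-32 trailer. In Python this means the standard zlib module can decompress an entry directly when given a negative window size, as in this sketch that reads the compressed bytes straight out of the first local file header of an archive built in memory.

```python
import io
import struct
import zipfile
import zlib

payload = b"zip " * 256
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("repeat.txt", payload)
data = buf.getvalue()

# The first local file header starts at offset 0 in this archive.
(method,) = struct.unpack_from("<H", data, 8)
(csize,) = struct.unpack_from("<I", data, 18)
name_len, extra_len = struct.unpack_from("<HH", data, 26)
start = 30 + name_len + extra_len
raw = data[start:start + csize]

# wbits=-15: raw DEFLATE with a 32 KiB window and no zlib wrapper.
restored = zlib.decompress(raw, wbits=-15)
print(method, len(raw), restored == payload)
```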
Compression methods mentioned, but not documented in detail, in the specification include PKWARE Data Compression Library (DCL) Imploding (old IBM TERSE), IBM TERSE (new), and IBM LZ77 z Architecture (PFS).
Encryption
Zip supports a simple password-based symmetric encryption system which is documented in the zip specification, and known to be seriously flawed. In particular it is vulnerable to known-plaintext attacks which are in some cases made worse by poor implementations of random number generators.
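For reference, the traditional cipher is small enough to sketch in full. In an archive, each encrypted entry's data begins with a 12-byte encryption header (normally random bytes, with the last byte derived from the CRC-32 so a wrong password can usually be detected), and the whole stream is processed by the routine below. The class and function names are illustrative. Because the keystream depends only on the password and the plaintext processed so far, known plaintext is enough to attack the internal keys.

```python
import zlib

def _crc32_step(crc: int, byte: int) -> int:
    # One table-driven CRC-32 update step: zlib.crc32 with its initial
    # and final bit inversion undone.
    return zlib.crc32(bytes([byte]), crc ^ 0xFFFFFFFF) ^ 0xFFFFFFFF

class ZipCryptoKeys:
    """Internal key state of the traditional PKWARE zip cipher."""

    def __init__(self, password: bytes) -> None:
        self.k0, self.k1, self.k2 = 0x12345678, 0x23456789, 0x34567890
        for b in password:
            self.update(b)

    def update(self, plain_byte: int) -> None:
        self.k0 = _crc32_step(self.k0, plain_byte)
        self.k1 = (self.k1 + (self.k0 & 0xFF)) & 0xFFFFFFFF
        self.k1 = (self.k1 * 134775813 + 1) & 0xFFFFFFFF
        self.k2 = _crc32_step(self.k2, self.k1 >> 24)

    def stream_byte(self) -> int:
        temp = (self.k2 | 2) & 0xFFFF
        return ((temp * (temp ^ 1)) >> 8) & 0xFF

def zipcrypto_encrypt(password: bytes, plain: bytes) -> bytes:
    keys = ZipCryptoKeys(password)
    out = bytearray()
    for p in plain:
        out.append(p ^ keys.stream_byte())
        keys.update(p)          # the keys advance on the plaintext byte
    return bytes(out)

def zipcrypto_decrypt(password: bytes, cipher: bytes) -> bytes:
    keys = ZipCryptoKeys(password)
    out = bytearray()
    for c in cipher:
        p = c ^ keys.stream_byte()
        keys.update(p)
        out.append(p)
    return bytes(out)
```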
New features, including new compression and encryption (e.g. AES) methods, have been documented in the .ZIP File Format Specification since version 5.2. A WinZip-developed AES-based standard is also used by 7-Zip, XCeed, and DotNetZip, but some vendors use other formats.[20] PKWARE SecureZIP also supports RC2, RC4, DES and Triple DES encryption methods, digital-certificate-based encryption and authentication (X.509), and archive header encryption.
ZIP64
The original zip format had a 4 GiB limit on various things (uncompressed size of a file, compressed size of a file, and total size of the archive), as well as a limit of 65,535 entries per archive. In version 4.5 of the specification (which is not the same as v4.5 of any particular tool), PKWARE introduced the "ZIP64" format extensions to get around these limitations, raising the limit to 16 EiB (2^64 bytes).
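In ZIP64 archives, a header field that is too small is set to its all-ones placeholder (0xFFFFFFFF for sizes and offsets, 0xFFFF for the disk number), and the real value is stored in the ZIP64 extended information extra field, header ID 0x0001. Values appear there in a fixed order and only for fields that hold the placeholder. A minimal Python sketch, with the hypothetical function name apply_zip64_extra:

```python
import struct

ZIP64_EXTRA_ID = 0x0001      # header ID of the ZIP64 extended information field
MARKER32 = 0xFFFFFFFF        # placeholder in a 32-bit header field
MARKER16 = 0xFFFF            # placeholder in the 16-bit disk number field

def apply_zip64_extra(usize: int, csize: int, local_offset: int,
                      disk_start: int, payload: bytes) -> tuple[int, int, int, int]:
    """Return the real (uncompressed size, compressed size, local header
    offset, disk start) given the header values and the ZIP64 payload."""
    pos = 0

    def take(fmt: str) -> int:
        # Read the next little-endian value from the payload and advance.
        nonlocal pos
        (value,) = struct.unpack_from(fmt, payload, pos)
        pos += struct.calcsize(fmt)
        return value

    # Only placeholder fields have a value in the payload, in this order.
    if usize == MARKER32:
        usize = take("<Q")
    if csize == MARKER32:
        csize = take("<Q")
    if local_offset == MARKER32:
        local_offset = take("<Q")
    if disk_start == MARKER16:
        disk_start = take("<I")
    return usize, csize, local_offset, disk_start
```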
The File Explorer in Windows XP does not support ZIP64, but the Explorer in Windows Vista does. Some libraries, such as DotNetZip and IO::Compress::Zip in Perl, also support ZIP64, and Java's built-in java.util.zip package supports it from Java 7 onward.
Combination with other file formats
The zip file format allows for a comment containing any data to occur at the end of the file after the central directory. Also, because the central directory specifies the offset of each file in the archive with respect to the start, it is possible in practice for the first file entry to start at an offset other than zero.
This allows arbitrary data to occur in the file both before and after the zip archive data, while the archive can still be read by a zip application. A side effect is that it is possible to author a file that is both a working zip archive and another format, provided that the other format tolerates arbitrary data at its end, beginning, or middle. Self-extracting archives (SFX), of the form supported by WinZip and DotNetZip, take advantage of this: they are .exe files that conform to the PKZIP AppNote.txt specification and can be read by compliant zip tools or libraries.
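Python's standard zipfile module tolerates such leading data, which makes the effect easy to demonstrate. In this sketch the prefix is an arbitrary stand-in for an SFX stub or another file format; it merely starts with the GIF magic number and is not a valid image.

```python
import io
import zipfile

# Build an ordinary archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("payload.txt", b"still readable")

# Prepend unrelated bytes in front of the archive.
prefix = b"GIF89a" + b"\x00" * 100
combined = prefix + buf.getvalue()

# The reader locates the central directory from the end of the file.
with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    print(zf.read("payload.txt"))  # b'still readable'
```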
This property of the zip format, and of the JAR format which is a variant of zip, can be exploited to hide harmful Java classes inside a seemingly harmless file, such as a GIF image uploaded to the web. This so-called GIFAR exploit has been demonstrated as an effective attack against web applications such as Facebook.
Limits
The minimum size of a zip file is 22 bytes: an empty archive consists of nothing but the end of central directory record.
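This is easy to confirm with Python's standard zipfile module: closing an archive opened for writing without adding any entries leaves only the 22-byte end of central directory record, the signature followed by zeroed fields.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w"):
    pass                      # no entries are added
empty = buf.getvalue()
print(len(empty), empty)
```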
The maximum size for both the archive file and the individual files inside it is 4,294,967,295 bytes (2^32 − 1 bytes, or 4 GiB) for standard ZIP, and 18,446,744,073,709,551,615 bytes (2^64 − 1 bytes, or 16 EiB) for ZIP64.
Proprietary extensions
When WinZip 9.0 public beta was released in 2003, WinZip introduced its own AES-256 encryption, using a different file format, along with the documentation for the new specification. The encryption standards themselves were not proprietary, but PKWARE had not updated APPNOTE.TXT since 2001 to include its Strong Encryption Specification (SES), which had been used by PKZIP versions 5.0 and 6.0. WinZip technical consultant Kevin Kearney and StuffIt product manager Mathew Covington accused PKWARE of withholding SES, but PKZIP chief technology officer Jim Peterson claimed that certificate-based encryption was still incomplete.
To overcome this shortcoming, contemporary products such as PentaZip implemented strong zip encryption by encrypting zip archives into a different file format.
In another controversial move, PKWARE applied for a patent on 2003-07-16 describing a method for combining zip and strong encryption to create a secure file.
In the end, PKWARE and WinZip agreed to support each other's products. On 2004-01-21, PKWARE announced support for the WinZip-based AES encryption format.[28] A later WinZip beta added support for SES-based zip files. PKWARE eventually released version 5.2 of the .ZIP File Format Specification to the public, which documented SES. The Free Software project 7-Zip also supports AES in zip files (as does its POSIX port p7zip).
Implementation
There are numerous zip tools available, and numerous zip libraries for various programming environments, under both commercial and open-source licenses. For instance, WinZip is a well-known zip tool for Windows, while WinRAR, IZArc, Info-ZIP, 7-Zip, PeaZip and DotNetZip are other tools available on various platforms. Some of these tools also have library or programmatic interfaces.
Open-source development libraries include the GNU gzip project and Info-ZIP. For Java, Java Platform, Standard Edition contains the package "java.util.zip" to handle standard zip files; the Zip64File library specifically supports large files (larger than 4 GB) and treats zip files using random access; and the Apache Ant tool contains a more complete implementation released under the Apache Software License.
For .NET applications, there is a no-cost open-source library called DotNetZip, available in source and binary form under the Microsoft Public License. It supports many zip features, including passwords for traditional zip encryption or WinZip-compatible AES encryption, Unicode, ZIP64, zip comments, spanned archives, and self-extracting archives. The Microsoft .NET 3.5 runtime library includes a class System.IO.Packaging.Package that supports the zip format. It is primarily designed for document formats using the ISO/IEC international standard Open Packaging Conventions.
The Info-ZIP implementations of the zip format add support for Unix filesystem features, such as user and group IDs, file permissions, and symbolic links. The Apache Ant implementation is aware of these to the extent that it can create files with predefined Unix permissions. The Info-ZIP implementations also know how to use the error correction capabilities built into the zip compression format. Some programs (such as IZArc) do not, and will choke on a file that has errors.
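Under the Info-ZIP convention, Unix permission bits are kept in the high 16 bits of the 4-byte external file attributes field of the central directory, with the "version made by" host byte set to 3 (Unix). Python's standard zipfile module exposes both through ZipInfo, so the convention can be sketched as follows; the file name and script contents are arbitrary.

```python
import io
import stat
import zipfile

info = zipfile.ZipInfo("run.sh")
info.create_system = 3                                 # host system 3 = Unix
info.external_attr = (stat.S_IFREG | 0o755) << 16      # mode in the high 16 bits

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(info, b"#!/bin/sh\necho hi\n")

with zipfile.ZipFile(buf) as zf:
    mode = zf.getinfo("run.sh").external_attr >> 16
    print(stat.filemode(mode))                         # -rwxr-xr-x
```

Note that Python's extract methods do not restore these permissions by themselves, so a tool that wants them must apply the mode after extraction.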
The Info-ZIP Windows tools also support NTFS filesystem permissions, and will make an attempt to translate from NTFS permissions to Unix permissions or vice-versa when extracting files. This can result in potentially unintended combinations, e.g. .exe files being created on NTFS volumes with executable permission denied.
Versions of Microsoft Windows have included support for zip compression in Explorer since the Plus! pack was released for Windows 98. Microsoft calls this feature "Compressed Folders". Not all zip features are supported by the Windows Compressed Folders capability. For example, AES encryption, split or spanned archives, and Unicode entry encoding are not known to be readable or writable by the Compressed Folders feature in Windows XP or Windows Vista.
