Zip is a file format used for data compression and
archiving. A zip file contains one or more files that have been compressed, to
reduce file size, or stored as is. The zip file format permits a number of
compression algorithms.
The format was originally created in 1989 by Phil Katz, and
was first implemented in PKWARE's PKZIP utility, as a replacement for the
previous ARC compression format by Thom Henderson.
The zip format is now supported by many software utilities
other than PKZIP. Microsoft has included built-in zip support (under the name
"compressed folders") in versions of Microsoft Windows since 1998.
Apple has included built-in zip support in Mac OS X 10.3 (via BOMArchiveHelper,
now Archive Utility) and later, along with other compression formats.
Zip files generally use the file extensions ".zip"
or ".ZIP" and the MIME media type application/zip. Zip is used as a
base file format by many programs, usually under a different name.
Zip files are often represented by a document or other
object prominently featuring a zipper.
Structure
A zip file is identified by the presence of a central
directory which is located at the end of the structure in order to allow the
appending of new files. The central directory stores a list of the names of the
entries (files or directories) stored in the zip file, along with other
metadata about the entry, and an offset into the zip file, pointing to the
actual entry data. This allows a file listing of the archive to be performed
relatively quickly, as the entire archive does not have to be read to see the
list of files. The entries in the zip file also include this information for
redundancy.
The order of the file entries in the directory need not
coincide with the order of file entries in the archive.
Each entry is introduced by a local header with information
about the file such as the file size and file name, followed by
optional "Extra" data fields, and then the possibly compressed,
possibly encrypted file data. The "Extra" data fields are the key to
the extensibility of the zip format. "Extra" fields are exploited to
support the ZIP64 format, WinZip-compatible AES encryption, file attributes,
and higher-resolution NTFS or Unix file timestamps. Other extensions are possible
via the "Extra" field. Zip tools are required by the specification to
ignore Extra fields they do not recognize.
The zip format uses specific 4-byte "signatures"
to denote the various structures in the file. Each file entry is marked by a
specific signature. The beginning of the central directory is indicated with a
different signature, and each entry in the central directory is marked with yet
another particular 4-byte signature.
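The signatures are easier to recognize as raw bytes. The following is a minimal sketch, assuming the signature values given in the header tables of this article; the names `SIGNATURES` and `identify` are illustrative and not part of any library.

```python
import struct

# Signature values as listed in the header tables of this article.
SIGNATURES = {
    0x04034B50: "local file header",
    0x02014B50: "central directory file header",
    0x06054B50: "end of central directory record",
    0x08074B50: "data descriptor (optional signature)",
}


def identify(four_bytes: bytes) -> str:
    """Name the structure that starts with these 4 bytes (hypothetical helper)."""
    (value,) = struct.unpack("<I", four_bytes)  # signatures are little-endian
    return SIGNATURES.get(value, "unknown")


# Stored little-endian, each signature begins with the ASCII letters "PK".
local_sig = struct.pack("<I", 0x04034B50)
```

Because the values are stored little-endian, `0x04034B50` appears on disk as the bytes `50 4B 03 04`, which is why zip files typically start with the letters "PK".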
There is no BOF or EOF marker in the zip specification.
Often the first thing in a zip file is a zip entry, which can be identified
easily by its signature. However, the zip specification does not require a zip
file to begin with a zip entry.
Tools that correctly read zip archives must locate the zip central directory
by its signatures and use it to find each entry. They must not scan
for entries, because only the directory specifies where a file chunk starts.
Scanning could lead to false positives, as the format does not forbid other data
between chunks, nor uncompressed file data that happens to contain such signatures.
The zip specification also supports spreading archives
across multiple filesystem files. Originally intended for storage of large zip
files across multiple 1.44 MB floppy disks, this feature is now used for
sending zip archives in parts over email, or over other transports or removable
media.
The FAT filesystem of DOS has a timestamp resolution of only
two seconds; zip file records mimic this. As a result, the built-in timestamp
resolution of files in a zip archive is only two seconds, though extra fields
can be used to store more accurate timestamps.
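The two-second resolution follows from how the 16-bit time field is packed. The sketch below assumes the usual MS-DOS bit layout (5 bits of hours, 6 of minutes, 5 of seconds divided by two; 7 bits of years since 1980, 4 of month, 5 of day), which this article does not spell out; the function names are illustrative.

```python
from datetime import datetime


def encode_dos_datetime(dt: datetime):
    """Pack a datetime into (date, time) 16-bit fields, assuming MS-DOS layout."""
    dos_date = ((dt.year - 1980) << 9) | (dt.month << 5) | dt.day
    dos_time = (dt.hour << 11) | (dt.minute << 5) | (dt.second // 2)
    return dos_date, dos_time


def decode_dos_datetime(dos_date: int, dos_time: int) -> datetime:
    """Unpack the 16-bit date and time fields back into a datetime."""
    year = ((dos_date >> 9) & 0x7F) + 1980  # 7 bits, counted from 1980
    month = (dos_date >> 5) & 0x0F          # 4 bits
    day = dos_date & 0x1F                   # 5 bits
    hour = (dos_time >> 11) & 0x1F          # 5 bits
    minute = (dos_time >> 5) & 0x3F         # 6 bits
    second = (dos_time & 0x1F) * 2          # 5 bits, in 2-second units
    return datetime(year, month, day, hour, minute, second)


# An odd second cannot be represented and is rounded down on the way in.
packed = encode_dos_datetime(datetime(2007, 9, 28, 13, 45, 31))
restored = decode_dos_datetime(*packed)
```

Only five bits are available for seconds, so values 0 to 29 are stored and doubled on reading; the odd second in the example is lost in the round trip.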
In September 2007, PKWARE released a revision of the zip
specification that contains a provision to store file names using UTF-8,
finally adding Unicode compatibility to zip.
File headers
All multi-byte values in the header are stored in
little-endian byte order. All length fields count the length in bytes.
Offset | Bytes | Description |
---|---|---|
0 | 4 | Local file header signature = 0x04034b50 (read as a little-endian number) |
4 | 2 | Version needed to extract (minimum) |
6 | 2 | General purpose bit flag |
8 | 2 | Compression method |
10 | 2 | File last modification time |
12 | 2 | File last modification date |
14 | 4 | CRC-32 |
18 | 4 | Compressed size |
22 | 4 | Uncompressed size |
26 | 2 | File name length (n) |
28 | 2 | Extra field length (m) |
30 | n | File name |
30+n | m | Extra field |
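The table above maps directly onto a fixed 30-byte little-endian structure followed by two variable-length fields. The following is a sketch of a parser under that reading; `parse_local_header` and the dictionary keys are illustrative names, and no validation beyond the signature check is attempted.

```python
import struct

# "<" = little-endian, I = 4 bytes, H = 2 bytes; the fixed part is 30 bytes.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")


def parse_local_header(buf: bytes, offset: int = 0) -> dict:
    """Parse the local file header that starts at `offset` in `buf`."""
    (signature, version_needed, flags, method, mod_time, mod_date,
     crc32, compressed_size, uncompressed_size,
     name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, offset)
    if signature != 0x04034B50:
        raise ValueError("not a local file header")
    name_start = offset + LOCAL_HEADER.size  # offset 30 in the table
    extra_start = name_start + name_len      # offset 30+n in the table
    data_start = extra_start + extra_len     # compressed data begins here
    return {
        "version_needed": version_needed,
        "flags": flags,
        "method": method,
        "crc32": crc32,
        "compressed_size": compressed_size,
        "uncompressed_size": uncompressed_size,
        "name": buf[name_start:extra_start],
        "extra": buf[extra_start:data_start],
        "data_offset": data_start,
    }
```

The file name is returned as raw bytes because its encoding depends on the general purpose flags and is not decided by this sketch.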
The extra field contains a variety of optional data such as
OS-specific attributes. It is divided into chunks, each with a 16-bit ID code
and a 16-bit length.
This is immediately followed by the compressed data.
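The chunk layout of the extra field (a 16-bit ID, a 16-bit length, then that many bytes of payload) can be walked with a short loop. This is a sketch only; `iter_extra_fields` is an illustrative name, and the IDs in the comment are given as commonly seen examples rather than a complete list.

```python
import struct


def iter_extra_fields(extra: bytes):
    """Yield (header_id, payload) for each chunk in an extra field."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        payload = extra[pos + 4:pos + 4 + size]
        if len(payload) < size:
            break  # truncated chunk: stop rather than guess
        yield header_id, payload
        pos += 4 + size


# Example IDs: 0x0001 is used for ZIP64 sizes, 0x5455 for extended timestamps.
sample = (struct.pack("<HH", 0x0001, 8) + b"\x00" * 8
          + struct.pack("<HH", 0x5455, 5) + b"\x01abcd")
chunks = list(iter_extra_fields(sample))
```

A reader that does not recognize an ID can skip the chunk by its length, which is what allows unknown extra fields to be ignored.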
If bit 3 (0x08) of the general-purpose flags field is set,
then the CRC-32 and file sizes are not known when the header is written. The
fields in the local header are filled with zero, and the CRC-32 and size are
appended in a 12-byte structure (optionally preceded by a 4-byte signature)
immediately after the compressed data:
Offset | Bytes | Description |
---|---|---|
0 | 0/4 | Optional data descriptor signature = 0x08074b50 |
0/4 | 4 | CRC-32 |
4/8 | 4 | Compressed size |
8/12 | 4 | Uncompressed size |
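Because the signature is optional, a reader has to accept both the 12-byte and the 16-byte form. The sketch below uses a simple heuristic (treat the first word as a signature if it matches) and the illustrative name `parse_data_descriptor`; a CRC-32 that happens to equal the signature value would be misread, so robust readers typically cross-check against the central directory.

```python
import struct

DESCRIPTOR_SIGNATURE = 0x08074B50


def parse_data_descriptor(buf: bytes, offset: int = 0) -> dict:
    """Read CRC-32 and sizes, skipping the optional 4-byte signature."""
    (first_word,) = struct.unpack_from("<I", buf, offset)
    has_signature = first_word == DESCRIPTOR_SIGNATURE  # heuristic, see above
    if has_signature:
        offset += 4
    crc32, compressed_size, uncompressed_size = struct.unpack_from(
        "<III", buf, offset)
    return {
        "has_signature": has_signature,
        "crc32": crc32,
        "compressed_size": compressed_size,
        "uncompressed_size": uncompressed_size,
        "length": 16 if has_signature else 12,
    }
```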
The central directory entry is an expanded form of the local header:
Offset | Bytes | Description |
---|---|---|
0 | 4 | Central directory file header signature = 0x02014b50 |
4 | 2 | Version made by |
6 | 2 | Version needed to extract (minimum) |
8 | 2 | General purpose bit flag |
10 | 2 | Compression method |
12 | 2 | File last modification time |
14 | 2 | File last modification date |
16 | 4 | CRC-32 |
20 | 4 | Compressed size |
24 | 4 | Uncompressed size |
28 | 2 | File name length (n) |
30 | 2 | Extra field length (m) |
32 | 2 | File comment length (k) |
34 | 2 | Disk number where file starts |
36 | 2 | Internal file attributes |
38 | 4 | External file attributes |
42 | 4 | Relative offset of local file header. This is the number of bytes between the start of the first disk on which the file occurs, and the start of the local file header. This allows software reading the central directory to locate the position of the file inside the ZIP file. |
46 | n | File name |
46+n | m | Extra field |
46+n+m | k | File comment |
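A central directory entry can be parsed the same way as a local header, with a 46-byte fixed part followed by the name, extra field and comment. This is a sketch under the table above; `parse_central_entry` and its dictionary keys are illustrative names.

```python
import struct

# Fixed part of a central directory file header: 46 bytes, little-endian.
CENTRAL_HEADER = struct.Struct("<I6H3I5H2I")


def parse_central_entry(buf: bytes, offset: int = 0) -> dict:
    """Parse one central directory entry starting at `offset`."""
    fields = CENTRAL_HEADER.unpack_from(buf, offset)
    if fields[0] != 0x02014B50:
        raise ValueError("not a central directory file header")
    name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
    name_start = offset + CENTRAL_HEADER.size  # offset 46 in the table
    extra_start = name_start + name_len        # offset 46+n
    comment_start = extra_start + extra_len    # offset 46+n+m
    end = comment_start + comment_len
    return {
        "method": fields[4],
        "crc32": fields[7],
        "compressed_size": fields[8],
        "uncompressed_size": fields[9],
        "local_header_offset": fields[16],
        "name": buf[name_start:extra_start],
        "extra": buf[extra_start:comment_start],
        "comment": buf[comment_start:end],
        "next_offset": end,  # where the following entry would begin
    }
```

Calling the function repeatedly with `next_offset` walks the whole directory, and `local_header_offset` then points at the entry's local header.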
After all the central directory entries comes the end of central directory record, which marks the end of the ZIP file:
Offset | Bytes | Description |
---|---|---|
0 | 4 | End of central directory signature = 0x06054b50 |
4 | 2 | Number of this disk |
6 | 2 | Disk where central directory starts |
8 | 2 | Number of central directory records on this disk |
10 | 2 | Total number of central directory records |
12 | 4 | Size of central directory (bytes) |
16 | 4 | Offset of start of central directory, relative to start of archive |
20 | 2 | Comment length (n) |
22 | n | Comment |
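Since the record sits at the end of the file and its comment has a 16-bit length, a reader can find it by searching backwards over at most the last 22 + 65535 bytes. The following sketch does that for an in-memory archive; `find_eocd` is an illustrative name, and the sketch accepts only a record whose comment runs exactly to the end of the data, which is stricter than some tools.

```python
import struct

EOCD = struct.Struct("<IHHHHIIH")  # 22 bytes without the comment
EOCD_SIGNATURE = b"PK\x05\x06"     # 0x06054b50 in little-endian byte order


def find_eocd(data: bytes) -> dict:
    """Locate and parse the end of central directory record in `data`."""
    start = max(0, len(data) - (EOCD.size + 0xFFFF))
    pos = data.rfind(EOCD_SIGNATURE, start)
    while pos != -1:
        if pos + EOCD.size <= len(data):
            (_, disk, cd_disk, entries_here, entries_total,
             cd_size, cd_offset, comment_len) = EOCD.unpack_from(data, pos)
            # Accept only a record whose comment ends exactly at end of data.
            if pos + EOCD.size + comment_len == len(data):
                return {
                    "position": pos,
                    "entries_total": entries_total,
                    "cd_size": cd_size,
                    "cd_offset": cd_offset,
                    "comment": data[pos + EOCD.size:],
                }
        # Keep searching for an earlier occurrence of the signature.
        pos = data.rfind(EOCD_SIGNATURE, start, pos + len(EOCD_SIGNATURE) - 1)
    raise ValueError("end of central directory record not found")
```

The length check guards against the signature bytes appearing inside the comment itself, although a carefully crafted comment can still defeat a check this simple.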
This ordering allows a zip file to be created in one pass,
but it is usually decompressed by first reading the central directory at the
end.
Compression methods
The .ZIP File Format Specification documents the following
compression methods: stored (no compression), Shrunk, Reduced (methods 1-4),
Imploded, Tokenizing, Deflated, Deflate64, bzip2, LZMA (EFS), WavPack, PPMd.
The most commonly used compression method is DEFLATE, which is described in
IETF RFC 1951.
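The compression method is recorded per entry, so a single archive can mix methods. The sketch below uses Python's standard `zipfile` module to write one stored and one deflated entry and then reads back the method numbers (0 for stored, 8 for Deflate); the payload and file names are made up for illustration.

```python
import io
import zipfile

payload = b"zip " * 1000  # highly repetitive, so Deflate shrinks it a lot

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("stored.bin", payload, compress_type=zipfile.ZIP_STORED)
    zf.writestr("deflated.bin", payload, compress_type=zipfile.ZIP_DEFLATED)

with zipfile.ZipFile(buf) as zf:
    # compress_type holds the numeric method from the entry's header.
    methods = {i.filename: i.compress_type for i in zf.infolist()}
    sizes = {i.filename: i.compress_size for i in zf.infolist()}
```

For the stored entry the compressed size equals the original size, while the deflated entry is much smaller for input this repetitive.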
Compression methods mentioned, but not documented in detail
in the specification, include: PKWARE Data Compression Library (DCL) Imploding
(old IBM TERSE), IBM TERSE (new), IBM LZ77 z Architecture (PFS).
Encryption
Zip supports a simple password-based symmetric encryption
system which is documented in the zip specification, and known to be seriously
flawed. In particular it is vulnerable to known-plaintext attacks which are in
some cases made worse by poor implementations of random number generators.
New features including new compression and encryption (e.g.
AES) methods have been documented in the .ZIP File Format Specification since
version 5.2. A WinZip-developed AES-based standard is used also by 7-Zip,
XCeed, and DotNetZip, but some vendors use other formats.[20] PKWARE SecureZIP
also supports RC2, RC4, DES, Triple DES encryption methods, Digital
Certificate-based encryption and authentication (X.509), and archive header
encryption.
ZIP64
The original zip format had a 4 GiB limit on various things
(uncompressed size of a file, compressed size of a file and total size of the
archive), as well as a limit of 65535 entries in a zip archive. In version 4.5
of the specification (which is not the same as v4.5 of any particular tool),
PKWARE introduced the "ZIP64" format extensions to get around these
limitations, raising the limit to 16 EiB (2^64 bytes).
The File Explorer in Windows XP does not support ZIP64, but
the Explorer in Windows Vista does. Likewise, some libraries, such as DotNetZip
and IO::Compress::Zip in Perl, support ZIP64. Java's built-in java.util.zip
has supported ZIP64 since Java 7.
Combination with other file formats
The zip file format allows for a comment containing any data
to occur at the end of the file after the central directory. Also, because the
central directory specifies the offset of each file in the archive with respect
to the start, it is possible in practice for the first file entry to start at
an offset other than zero.
This allows arbitrary data to occur in the file both before
and after the zip archive data, and for the archive to still be read by a zip
application. A side-effect of this is that it is possible to author a file that
is both a working zip archive and another format, provided that the other
format tolerates arbitrary data at its end, beginning, or middle.
Self-extracting archives (SFX), of the form supported by WinZip and DotNetZip, take
advantage of this: they are .exe files that conform to the
PKZIP AppNote.txt specification and can be read by compliant zip tools or
libraries.
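The effect of leading data can be tried with Python's standard `zipfile` module, which in practice compensates for offsets shifted by a prefix; other tools may behave differently. In this sketch the `prefix` bytes are a made-up stand-in for an executable stub or any other foreign data.

```python
import io
import zipfile

# Build an ordinary archive in memory.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("readme.txt", b"still readable")

# Hypothetical stand-in for an executable stub or other leading data.
prefix = b"\x00NOT-A-ZIP-HEADER\x00" * 4
combined = prefix + archive.getvalue()

# The reader starts from the central directory at the end of the data.
with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    names = zf.namelist()
    content = zf.read("readme.txt")
```

The combined data no longer begins with a zip signature, yet the entry is still listed and extracted, because the reader works from the end of the data.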
This property of the zip format, and of the JAR format which
is a variant of zip, can be exploited to hide harmful Java classes inside a
seemingly harmless file, such as a GIF image uploaded to the web. This
so-called GIFAR exploit has been demonstrated as an effective attack against
web applications such as Facebook.
Limits
The minimum size of a zip file is 22 bytes.
The maximum size for both the archive file and the
individual files inside it is 4,294,967,295 bytes (2^32-1 bytes,
or 4 GiB) for standard ZIP, and 18,446,744,073,709,551,615 bytes (2^64-1
bytes, or 16 EiB) for ZIP64.
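These figures can be checked directly. The 22-byte minimum is the size of an end of central directory record with no comment, which on its own forms a valid empty archive. The sketch below builds one and opens it with Python's standard `zipfile` module; the variable names are illustrative.

```python
import io
import struct
import zipfile

# An empty archive: only the end of central directory record, all counts zero.
empty_zip = struct.pack("<IHHHHIIH", 0x06054B50, 0, 0, 0, 0, 0, 0, 0)

with zipfile.ZipFile(io.BytesIO(empty_zip)) as zf:
    entries = zf.namelist()

# Largest values that fit in the 32-bit and 64-bit size fields.
MAX_STANDARD = 2**32 - 1
MAX_ZIP64 = 2**64 - 1
```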
Proprietary extensions
When WinZip 9.0 public beta was released in 2003, WinZip
introduced its own AES-256 encryption, using a different file format, along
with the documentation for the new specification. The encryption standards
themselves were not proprietary, but PKWARE had not updated APPNOTE.TXT to
include Strong Encryption Specification (SES) since 2001, which had been used
by PKZIP versions 5.0 and 6.0. WinZip technical consultant Kevin Kearney and
StuffIt product manager Mathew Covington accused PKWARE of withholding SES, but
PKZIP chief technology officer Jim Peterson claimed that Certificate-based
encryption was still incomplete.
To overcome this shortcoming, contemporary products such as
PentaZip implemented strong zip encryption by encrypting zip archives into a
different file format.
In another controversial move, PKWARE applied for a patent
on 2003-07-16 describing a method for combining zip and strong encryption to
create a secure file.
In the end, PKWARE and WinZip agreed to support each other's
products. On 2004-01-21, PKWARE announced support for the WinZip-based AES
encryption format.[28] A later WinZip beta was able to
support SES-based zip files. PKWARE eventually released version 5.2 of the
.ZIP File Format Specification to the public, which documented SES. The Free
Software project 7-Zip also supports AES in zip files (as does its POSIX port
p7zip).
Implementation
There are numerous zip tools available, and numerous zip
libraries for various programming environments; licenses used include
commercial and open source. For instance, WinZip is a well-known zip tool
running on Windows; WinRAR, IZArc, Info-ZIP, 7-Zip, PeaZip and DotNetZip are
other tools, available on various platforms. Some of those tools have library
or programmatic interfaces.
Some development libraries available under open-source
licenses are the GNU gzip project and Info-ZIP. For Java: Java Platform,
Standard Edition contains the package "java.util.zip" to handle
standard zip files; the Zip64File library specifically supports large files
(larger than 4 GB) and treats zip files using random access; and the Apache Ant
tool contains a more complete implementation released under the Apache Software
License.
For .NET applications, there is a no-cost open-source
library called DotNetZip available in source and binary form under the
Microsoft Public License. It supports many zip features, including passwords for
traditional zip encryption or WinZip-compatible AES encryption, Unicode, ZIP64,
zip comments, spanned archives, and self-extracting archives. The Microsoft
.NET 3.5 runtime library includes a class System.IO.Packaging.Package that
supports the zip format. It is primarily designed for document formats using
the ISO/IEC international standard Open Packaging Conventions.
The Info-ZIP implementations of the zip format add support
for Unix filesystem features, such as user and group IDs, file permissions, and
support for symbolic links. The Apache Ant implementation is aware of these to
the extent that it can create files with predefined Unix permissions. The
Info-ZIP implementations also know how to use the error correction capabilities
built into the zip compression format. Some programs (such as IZArc) do not and
will choke on a file that has errors.
The Info-ZIP Windows tools also support NTFS filesystem
permissions, and will make an attempt to translate from NTFS permissions to
Unix permissions or vice-versa when extracting files. This can result in
potentially unintended combinations, e.g. .exe files being created on NTFS
volumes with executable permission denied.
Versions of Microsoft Windows have included support for zip
compression in Explorer since the Plus! pack was released for Windows 98. Microsoft
calls this feature "Compressed Folders". Not all zip features are
supported by the Windows Compressed Folders capability. For example, AES
Encryption, split or spanned archives, and Unicode entry encoding are not known
to be readable or writable by the Compressed Folders feature in Windows XP or
Windows Vista.