
5.5 Checksum Proposal

Seaman and Pence (1995) have proposed a means of embedding a checksum within a FITS file. The proposal has been presented to the community for discussion as the first step toward seeking endorsement by the IAUFWG. This section summarizes the proposal; more details can be found in the paper itself. The purpose of a checksum is to verify that no error occurred in a data transmission: a checksum is calculated for the data before they are sent and again, in the same way, after they are received, and the two values are compared. All checksums are calculated using ones-complement arithmetic; that is, any carry out of the most significant bit in an addition is added back into the least significant bit (an "end-around carry").
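The end-around carry can be sketched in a few lines of Python. This is an illustrative sketch, not code from the proposal; the function name `ones_add` is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def ones_add(a: int, b: int) -> int:
    """Add two unsigned 32-bit words using ones-complement arithmetic.

    Any carry out of bit 31 is folded back into bit 0 (the end-around
    carry), so the result always fits in 32 bits.
    """
    total = a + b
    # a + b is at most 2 * MASK32, so a single fold is always enough.
    return (total & MASK32) + (total >> 32)


# 0xFFFFFFFF + 1 overflows; the carry wraps around into the low bit.
print(hex(ones_add(0xFFFFFFFF, 1)))           # 0x1
print(hex(ones_add(0x12345678, 0x11111111)))  # 0x23456789 (no carry)
```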

Three keywords are reserved for this proposal:

For purposes of calculating the checksum, the header and data of each HDU are treated as a series of 32-bit words. Checksums are calculated by adding these words, using ones-complement arithmetic, to yield a 32-bit unsigned integer. First, the ones-complement checksum of the data records is calculated. The unsigned integer result is then formatted as a decimal character string and stored as the value of the DATASUM keyword. For example, if the checksum of the data records is the unsigned integer 2503531142, the DATASUM keyword in the header will appear as

DATASUM='2503531142'
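A minimal Python sketch of the DATASUM step follows. It assumes the data records are already in memory as a `bytes` object whose length is a multiple of 4 (FITS records are 2880 bytes) and that words are read big-endian, the byte order FITS uses for binary data; the helper names are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF


def ones_complement_sum(data: bytes) -> int:
    """Ones-complement sum of `data`, read as big-endian 32-bit words."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    words = struct.unpack(f">{len(data) // 4}I", data)
    total = sum(words)
    # Fold the carries back in (end-around carry) until 32 bits remain;
    # this gives the same result as adding the words one at a time.
    while total >> 32:
        total = (total & MASK32) + (total >> 32)
    return total


def datasum_card_value(data: bytes) -> str:
    """Format the data checksum as the quoted decimal string for DATASUM."""
    return f"'{ones_complement_sum(data)}'"


# One 2880-byte record: two nonzero words followed by zero padding.
record = struct.pack(">2I", 0xFFFFFFFF, 2) + bytes(2872)
print("DATASUM=" + datasum_card_value(record))  # DATASUM='2'
```

Summing with Python's unbounded integers and folding once at the end is a convenience of this sketch; it is equivalent to adding the words one at a time with end-around carry.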
The CHECKSUM keyword in the header is then set to an encoded ASCII zero, a string of sixteen '0' characters:
CHECKSUM='0000000000000000'

The ones-complement checksum of the header is then calculated, including the CHECKSUM and DATASUM keywords with the above values, and the already calculated data checksum is added to it, yielding a checksum for the entire HDU. The bit complement of this total, which is its ones-complement additive inverse, is then calculated, so that

original_checksum + bit_complement = 0

This bit complement is then ASCII encoded. As part of the encoding process, an encoded ASCII zero is added:

encoded_complement = bit_complement + ASCII_zero

This encoded complement replaces the original value of the CHECKSUM keyword (the ASCII zero), in effect subtracting the ASCII zero back out:

new_checksum = original_checksum + encoded_complement - ASCII_zero
             = original_checksum + bit_complement
             = 0

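Thus, with the encoded complement in the header, the checksum of the HDU is now zero. (In ones-complement arithmetic this zero is represented with all 32 bits set, hexadecimal FFFFFFFF, because any value plus its bit complement gives all ones.)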
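The arithmetic of the preceding steps can be checked numerically. The Python sketch below is illustrative only: it uses an arbitrary, made-up header sum, treats the ASCII-zero placeholder as the four 32-bit words 0x30303030 (the bytes of '0000000000000000'), and, rather than performing an actual encoding, simply replaces the placeholder's contribution with `bit_complement + ASCII_zero`. In ones-complement arithmetic the resulting zero is the all-ones word 0xFFFFFFFF, the so-called negative zero.

```python
MASK32 = 0xFFFFFFFF


def ones_add(a: int, b: int) -> int:
    """Ones-complement 32-bit addition with end-around carry."""
    total = a + b
    return (total & MASK32) + (total >> 32)


def ones_sub(a: int, b: int) -> int:
    """Ones-complement subtraction: add the bit complement of b."""
    return ones_add(a, ~b & MASK32)


# Ones-complement sum of the four words of '0000000000000000'.
ASCII_ZERO = 0
for _ in range(4):
    ASCII_ZERO = ones_add(ASCII_ZERO, 0x30303030)  # ends at 0xC0C0C0C0

datasum = 2503531142        # data checksum from the DATASUM example
header_sum = 0x1A2B3C4D     # made-up header sum, placeholder included

original_checksum = ones_add(header_sum, datasum)   # whole-HDU sum
bit_complement = ~original_checksum & MASK32
assert ones_add(original_checksum, bit_complement) == MASK32  # negative zero

encoded_complement = ones_add(bit_complement, ASCII_ZERO)
# Replacing the placeholder subtracts ASCII_ZERO and adds the encoded value.
new_checksum = ones_add(ones_sub(original_checksum, ASCII_ZERO),
                        encoded_complement)
print(hex(new_checksum))  # 0xffffffff, i.e. ones-complement zero
```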

Seaman and Pence recommend an algorithm for encoding the checksum complement as the value of the CHECKSUM keyword. The scheme encodes not merely the bits of the complement but its actual 32-bit unsigned integer value: when the encoded string is summed as 32-bit words, it contributes exactly that value plus the ASCII zero. Each of the four bytes of this value is divided by four (integer division), and the quotient is placed in the corresponding byte of each of four successive, word-aligned 32-bit fields; the remainder for each byte is added to the first field. For example, the first byte of the second field contains one-fourth of the first byte of the checksum complement. The sum of the four fields is therefore the checksum complement itself, and byte alignment is preserved: the sum of the first bytes of the four fields is the first byte of the complement, and similarly for the other three bytes.

An ASCII zero (hexadecimal 30) is then added to every byte, producing a 16-character printable ASCII string. To improve readability, byte values that are not alphanumeric are shifted by simultaneously incrementing one byte and decrementing the corresponding byte in another of the four fields, in a systematic way, until both are alphanumeric. This process does not change the sum.

Under the FITS fixed-format rules for character strings, the opening quote is in column 11, so the first character of the string falls in column 12, which is the fourth byte of a four-byte word. To maintain proper byte alignment, the final byte of the encoded string is therefore moved to column 12: it remains the fourth byte of a four-byte word, the next byte is the first byte of the following word, and all the others keep their proper positions. The resulting string is stored as the value of the CHECKSUM keyword. It can easily be decoded by permuting it back to its original alignment, subtracting the hexadecimal 30 offset from each byte, and summing the four parts.
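The encoding and decoding steps described above can be sketched in Python. This is an illustrative sketch, not Seaman and Pence's code: the exclusion list (the punctuation characters lying between the digits and the letters), the pairing of fields used when shifting bytes, and the names `encode_checksum` and `decode_checksum` are assumptions of this sketch, so its output need not match their software character for character.

```python
import struct

OFFSET = 0x30  # ASCII '0', the "ASCII zero" added to every byte
# Assumed exclusion list: ':' through '@' and '[' through '`',
# the punctuation lying between the digits and the letters.
EXCLUDE = set(range(0x3A, 0x41)) | set(range(0x5B, 0x61))


def encode_checksum(value: int) -> str:
    """Encode a 32-bit unsigned value as 16 alphanumeric ASCII characters."""
    asc = [0] * 16
    for i in range(4):                          # byte i of value, MSB first
        byte = (value >> (8 * (3 - i))) & 0xFF
        quotient, remainder = divmod(byte, 4)
        ch = [quotient + OFFSET] * 4            # four quotients plus ASCII zero
        ch[0] += remainder                      # remainder goes to the first
        for j in (0, 2):                        # pairs (0, 1) and (2, 3)
            # Shift a pair out of the punctuation range; the sum is unchanged.
            while ch[j] in EXCLUDE or ch[j + 1] in EXCLUDE:
                ch[j] += 1
                ch[j + 1] -= 1
        for j in range(4):
            asc[4 * j + i] = ch[j]              # byte i of quotient word j
    # Rotate right by one: the last byte moves to the front (FITS column 12).
    return bytes(asc[-1:] + asc[:-1]).decode("ascii")


def decode_checksum(encoded: str) -> int:
    """Undo the rotation, remove the ASCII-zero offset, and sum the words."""
    raw = (encoded[1:] + encoded[0]).encode("ascii")
    parts = bytes(b - OFFSET for b in raw)
    return sum(struct.unpack(">4I", parts)) & 0xFFFFFFFF


complement = ~2503531142 & 0xFFFFFFFF
text = encode_checksum(complement)
print(text, decode_checksum(text) == complement)
print(encode_checksum(0))  # 0000000000000000, the encoded ASCII zero
```

Because each column of the four fields sums to the original byte (at most 255) once the offset is removed, the decoder needs no carry handling. Read as four aligned words, the encoded string sums to the value plus 0xC0C0C0C0, which is exactly the sum of the '0000000000000000' placeholder it replaces.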

This algorithm is not mandated by the proposal; other encodings are possible but must satisfy the following constraints:

The checksum is zero for the entire file because it is zero for each individual HDU. Requiring a zero checksum for each individual HDU, rather than only for the file as a whole, allows HDUs to be removed from or added to the file without altering the file's checksum.
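Why per-HDU zeros keep the file checksum at zero can be seen with a toy calculation. In the Python sketch below, each "HDU" is just a short list of made-up 32-bit words, the last of which is chosen (hypothetically) to bring that HDU's ones-complement sum to the all-ones negative zero; the helper names are illustrative.

```python
from functools import reduce

MASK32 = 0xFFFFFFFF


def ones_add(a: int, b: int) -> int:
    """Ones-complement 32-bit addition with end-around carry."""
    total = a + b
    return (total & MASK32) + (total >> 32)


def ones_sum(words) -> int:
    """Ones-complement sum of a sequence of 32-bit words."""
    return reduce(ones_add, words, 0)


def balanced_hdu(words):
    """Append the bit complement of the sum so the toy HDU sums to all ones."""
    return list(words) + [~ones_sum(words) & MASK32]


hdus = [balanced_hdu([0x12345678, 0x9ABCDEF0]),
        balanced_hdu([0xFFFFFFFF, 0x00000002, 0xDEADBEEF]),
        balanced_hdu([0x0BADF00D])]

for hdu in hdus:
    assert ones_sum(hdu) == MASK32  # each HDU sums to negative zero

# The file is the concatenation of its HDUs; dropping one changes nothing.
print(hex(ones_sum(w for hdu in hdus for w in hdu)))      # 0xffffffff
print(hex(ones_sum(w for hdu in hdus[:2] for w in hdu)))  # 0xffffffff
```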

The following strong recommendations are made regarding the usage of these keywords:

The last two recommendations must be followed to avoid creating HDUs containing checksum keywords whose values are inconsistent with the HDU contents.

Seaman and Pence present a detailed example of the use of the checksum convention and the encoding described above, as well as software in both C and FORTRAN 77 to implement the encoding. The C routines are adapted from the on-line software for the NOAO/IRAF Save the Bits archive; the FORTRAN 77 routines are adapted from the FITSIO package.

