byteOrder not sufficient to implement MIL-STD-2045-47001D
Added by Michael Beckerle about 9 years ago
This standard, which is very similar to many other MIL formats, is publicly available.
DFDL does not provide enough control over binary data formats: dfdl:byteOrder is not sufficient to implement MIL-STD-2045-47001D. (http://tinyurl.com/peqz8ch redirects to one official place to get the standard.)
This format requires that bytes be interpreted with the bits least-significant-bit first.
DFDL only allows describing data where the bytes are placed in big-endian or little-endian order, but it has no control over the bits within a byte.
For example, suppose I have these two bytes, expressed as hex: 1a2b
Interpreting these as bigEndian the value is 0x1a2b.
Interpreting these as littleEndian (byte order), the value is 0x2b1a.
MIL-STD-2045-47001D requires that this be interpreted as the value 0xD458, which is what you get if you use littleEndian byte order together with a little-endian interpretation of the bits within each byte. If you expand 1a2b into bits, you get 00011010 00101011. If you then reverse this bit string, you get 11010100 01011000; converting back to hex gives D458, which is the correct interpretation as far as MIL-STD-2045-47001D is concerned.
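The arithmetic above can be checked with a short Python sketch. The helper name `reverse_bits` is made up for illustration; it is not part of DFDL or of any implementation.

```python
def reverse_bits(value: int, width: int) -> int:
    """Return `value` with its lowest `width` bits in reversed order."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


data = bytes([0x1A, 0x2B])

big_endian = int.from_bytes(data, "big")        # bytes taken in order
little_endian = int.from_bytes(data, "little")  # bytes swapped

# Reverse the whole 16-bit string, as described in the post.
whole_string = reverse_bits(big_endian, 16)

# Equivalent view: swap the bytes, then reverse the bits within each byte.
per_byte = int.from_bytes(
    bytes(reverse_bits(b, 8) for b in reversed(data)), "big"
)

print(hex(big_endian), hex(little_endian), hex(whole_string), hex(per_byte))
```

Both routes give the same 16-bit value, which is why the post can describe the result either as a reversal of the whole bit string or as little-endian byte order combined with little-endian bit order.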
I believe we need a new property. I suggest dfdl:bitOrder with values "leastSignificantBitFirst" and "mostSignificantBitFirst". The current functionality of DFDL is preserved with dfdl:bitOrder="mostSignificantBitFirst".
Replies (13)
RE: byteOrder not sufficient to implement MIL-STD-2045-47001D - Added by Steve Hanson about 9 years ago
Mike, what logical and physical data types does this apply to?
RE: byteOrder not sufficient to implement MIL-STD-2045-47001D - Added by Michael Beckerle about 9 years ago
This format contains strings that use us-ascii-7-bit-packed encoding, signed integers, and single-bit flags. The format in question is a header format. It is also possible for them to contain "payloads" that use 8-bit characters.
However, complex types composed of these simple types all depend on this bit ordering as well.
RE: byteOrder not sufficient to implement MIL-STD-2045-47001D - Added by Steve Hanson about 9 years ago
I'm trying to work out whether bit-order is an orthogonal concept to encoding and byte-order (requiring a new property), or whether what we have is just a different encoding and a different byte-order.
RE: byteOrder not sufficient to implement MIL-STD-2045-47001D - Added by Michael Beckerle about 9 years ago
Bit order is orthogonal to byte order and encoding.
You read exactly one byte of data, and it either represents one number, or another. So byte order is irrelevant, and encoding is irrelevant.
For example, the bits 00010100 represent 0x14 when read MSB first; the same byte read LSB first is 00101000, or 0x28.
If you have two bytes, then when MSB is first, the MSB of the 2nd byte is considered adjacent to the LSB of the first byte. This is also orthogonal to byte order. When LSB is first, the LSB of the 2nd byte is considered adjacent to the MSB of the first byte.
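One way to picture this is as a function from a position in the bit stream to a bit inside a byte. The following Python sketch uses hypothetical helper names (`bit_at`, `stream_bits`) and 0-based stream positions; it is an illustration, not DFDL terminology.

```python
def bit_at(data: bytes, index: int, lsb_first: bool) -> int:
    """Return the bit at 0-based stream position `index` under the given bit order."""
    byte = data[index // 8]
    offset = index % 8
    # LSB first: position 0 of each byte is its least-significant bit.
    # MSB first: position 0 of each byte is its most-significant bit.
    shift = offset if lsb_first else 7 - offset
    return (byte >> shift) & 1


def stream_bits(data: bytes, lsb_first: bool) -> str:
    """Render the whole stream as a string of bits in stream order."""
    return "".join(str(bit_at(data, i, lsb_first)) for i in range(len(data) * 8))


print(stream_bits(b"\x14", lsb_first=False))
print(stream_bits(b"\x14", lsb_first=True))

# Adjacency across a byte boundary: stream positions 7 and 8.
two = bytes([0x80, 0x01])
print(bit_at(two, 7, True), bit_at(two, 8, True))    # MSB of byte 0, LSB of byte 1
print(bit_at(two, 7, False), bit_at(two, 8, False))  # LSB of byte 0, MSB of byte 1
```

Byte order never appears in these functions, which is the sense in which bit order is orthogonal to it.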
RE: byteOrder not sufficient to implement MIL-STD-2045-47001D - Added by Michael Beckerle about 9 years ago
In the spec document for MIL-STD-2045-47001D, Table B-I on page 70 gives a worked example showing the bit positions of each field within a byte.
RE: byteOrder not sufficient to implement MIL-STD-2045-47001D - Added by Michael Beckerle about 9 years ago
An additional consideration.
If a bitOrder property were implemented, what would it mean for one element to have bitOrder="mostSignificantBitFirst" and the adjacent element to have bitOrder="leastSignificantBitFirst"?
If the two elements are separated by a byte boundary, then there is no issue.
But if the boundary of the two elements is not on a byte boundary, then the concept isn't well defined.
So this would have to be a constraint when composing DFDL schema elements together, that the bitOrder can only be switched on an aligned 8-bit boundary.
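A minimal sketch of such a check, in Python. It assumes a purely hypothetical flat model in which each element is a (name, length in bits, bit order) tuple laid out back to back with no alignment padding; a real DFDL processor would work from the schema, not from a list like this.

```python
from typing import List, Tuple

MSBF = "mostSignificantBitFirst"
LSBF = "leastSignificantBitFirst"

# (name, length in bits, bit order): a hypothetical flat layout for illustration
Field = Tuple[str, int, str]


def check_bit_order_changes(fields: List[Field]) -> None:
    """Raise ValueError if the bit order changes at a position that is not byte-aligned."""
    offset = 0
    previous_order = None
    for name, length, order in fields:
        changed = previous_order is not None and order != previous_order
        if changed and offset % 8 != 0:
            raise ValueError(
                f"bit order changes at bit offset {offset} (element {name!r}), "
                "which is not on an 8-bit boundary"
            )
        offset += length
        previous_order = order


# Accepted: the change happens after exactly one byte.
check_bit_order_changes([("flags", 8, MSBF), ("count", 4, LSBF)])

# Rejected: the change happens three bits into a byte.
try:
    check_bit_order_changes([("flag", 3, MSBF), ("count", 5, LSBF)])
except ValueError as err:
    print(err)
```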
RE: byteOrder not sufficient to implement MIL-STD-2045-47001D - Added by Michael Beckerle about 9 years ago
Section 3.1.1 of http://www.gzip.org/zlib/rfc-deflate.html#packing describes the packing of bits into bytes for the deflate algorithm, which also uses LSBit-first ordering.
I mention it to point out that this affects more than just this one family of MIL standards.
RE: byteOrder not sufficient to implement MIL-STD-2045-47001D - Added by Tim Kimber about 9 years ago
Let's assume that an implementation could, in principle, cater for bit-reversed regions by inserting a bit-reversal stage between the byte stream and the DFDL processor. I'm assuming that every bit-reversed region would occupy an integer number of octets. Such an implementation would be quite straightforward because all other processing algorithms would be unaffected.
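As a rough sketch of the stage being assumed here, the following Python reverses the bits of every byte in a region that covers a whole number of octets before the data reaches the parser. The function name and the (start, end) byte-offset interface are assumptions made for illustration only.

```python
# Lookup table: entry b holds byte b with its eight bits reversed.
REVERSED = bytes(int(f"{b:08b}"[::-1], 2) for b in range(256))


def reverse_bits_in_region(data: bytes, start: int, end: int) -> bytes:
    """Return `data` with the bits of each byte in data[start:end] reversed."""
    return data[:start] + data[start:end].translate(REVERSED) + data[end:]


raw = bytes([0x1A, 0x2B, 0xFF])
print(reverse_bits_in_region(raw, 0, 2).hex())
```

Applying the function twice to the same region restores the original bytes, so the same stage could be used when unparsing.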
That makes me wonder whether we should preserve our 'freedom of movement' for the future by making this an enum. I don't think we will get any more requirements for simple bit-shuffling, but I have been asked whether DFDL can operate on base64 data fields. That would be another case of a whole number of octets that need to be decoded into a byte stream. Encrypted regions or zipped regions could be handled in the same way.
I don't think these should be mandatory features of a DFDL implementation. If standardized at all, they would need to be optional features. One option would be to specify that the enum for bit-order can accept implementation-specific enums for handling bottom-layer byte stream pre-processing.
RE: byteOrder not sufficient to implement MIL-STD-2045-47001D - Added by Michael Beckerle about 9 years ago
It is not quite that simple. Sorry if I made it seem that way.
Let's look at 'abc' in 7-bit characters, stored least-significant-bit-first.
The ASCII codes for 'abc' are 0x61, 0x62, and 0x63.
Stored LSB first, those will be (in bits from left to right):
1000011 0100011 1100011
Grouped into bytes:
10000110 10001111 00011 + 000 (3 padding bits to fill out the final byte)
Now, if we simply flip the bits of each byte around:
01100001 11110001 00011000
The data did not magically become normal MSB-first data. The 'a' character is still occupying the 7 least-significant bits of the first byte. Viewed most-significant-bit first, the 'a' character is bits 2 to 8 (1-based indexing). Bit 1 is actually part of the 'b' character. In fact, the only thing flipping the bits around achieved is that the seven bits of the 'a' character are now in the expected place-value order, the way we normally write a number down: 1100001. That is, the numeric value of the field is now most-significant-bit first, but the bits themselves are not first within the byte in MSB-first order. We still have to treat the bits as numbered 1 to 8 starting on the right of each byte, and proceeding left.
So it isn't so simple as just reversing the bits of each byte, as the way the bits are numbered within a byte remains fundamentally different.
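The 'abc' example can be reproduced with a small Python sketch. The function names are hypothetical, and the packing routine is just one straightforward way to fill bytes starting from the least-significant bit; it is not taken from the standard.

```python
from typing import List


def pack_lsb_first(codes: List[int], width: int) -> bytes:
    """Pack `width`-bit codes into bytes, filling each byte from its least-significant bit."""
    acc = 0
    nbits = 0
    for code in codes:
        acc |= (code & ((1 << width) - 1)) << nbits
        nbits += width
    return acc.to_bytes((nbits + 7) // 8, "little")


def unpack_lsb_first(data: bytes, width: int, count: int) -> List[int]:
    """Read `count` codes of `width` bits, numbering bits from the LSB of each byte."""
    acc = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]


def unpack_msb_first(data: bytes, width: int, count: int) -> List[int]:
    """Read `count` codes of `width` bits, numbering bits from the MSB of each byte."""
    acc = int.from_bytes(data, "big")
    total = len(data) * 8
    mask = (1 << width) - 1
    return [(acc >> (total - (i + 1) * width)) & mask for i in range(count)]


packed = pack_lsb_first([0x61, 0x62, 0x63], 7)

# Conventional (MSB on the left) rendering of each byte.
print(" ".join(f"{b:08b}" for b in packed))

print(bytes(unpack_lsb_first(packed, 7, 3)))  # recovers the original characters
print(bytes(unpack_msb_first(packed, 7, 3)))  # field boundaries are in the wrong place
```

The printed bytes match the flipped form shown above (01100001 11110001 00011000). Only the reader that numbers bits from the least-significant end of each byte recovers 'abc'; an MSB-first reader cuts the fields at different bit positions, so per-byte reversal alone does not turn this into ordinary MSB-first data.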
Deferred: byteOrder not sufficient to implement MIL-STD-2045-47001D - Added by Steve Hanson about 9 years ago
Deferring until the Daffodil implementation has had a stab at actually handling this.
Deferred: byteOrder not sufficient to implement MIL-STD-2045-47001D - Added by Michael Beckerle over 8 years ago
An experience document describing a bitOrder property and various other things needed to fix this limitation is here: http://redmine.ogf.org/dmsf_files/13268.
DONE: byteOrder not sufficient to implement MIL-STD-2045-47001D - Added by Steve Hanson about 8 years ago
The above experience document has evolved into this official experience document: http://redmine.ogf.org/dmsf_files/13337.
This describes a new dfdl:bitOrder property which addresses this public comment.
DONE - RE: byteOrder not sufficient to implement MIL-STD-2045-47001D - Added by Steve Hanson about 8 years ago