This is a static archive of the previous Open Grid Forum Redmine content management system saved from host redmine.ogf.org file /boards/15/topics/27?r=377 at Thu, 03 Nov 2022 15:42:36 GMT

Public Comments Archive - Open Grid Forum

Document the endOfParent case where RightFill gets used due to encoding

Added by Steve Hanson about 9 years ago

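When the parent's dfdl:lengthUnits is 'bytes', and the last child element has dfdl:lengthKind 'endOfParent', text representation, and an encoding that is not SBCS (single-byte character set), the element's characters may not exactly fill the remaining parent length. In this case the RightFill region of the grammar accounts for the leftover bytes.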


Replies (6)

RE: Document the endOfParent case where RightFill gets used due to encoding - Added by Michael Beckerle about 9 years ago

This is from my notes on this:

Section 12.3.7.3 doesn't address the character-fragment issue for complex types.

When dfdl:lengthUnits='bytes', and a simple element of text representation is at the end of the complex type, and the dfdl:lengthKind of that element is 'endOfParent', then the simple element extends to the end of the enclosing complex element. Because this length is measured in bytes, when the encoding is not SBCS it is possible for a number of bytes to remain at the end that are too few to hold a character code point. These are considered the RightFill region of the simple element, not the ElementUnused region of the complex element.
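To make the arithmetic concrete, here is a minimal sketch, assuming a fixed-width encoding (for example UTF-32, at 4 bytes per character). The function name and parameters are hypothetical and are not part of DFDL or any DFDL processor.

```python
def right_fill_bytes(parent_length_bytes: int,
                     bytes_used_by_siblings: int,
                     char_width_bytes: int) -> tuple[int, int]:
    """Return (whole_characters, leftover_bytes) for the last child.

    Assumption: the encoding is fixed-width, so every character
    occupies exactly char_width_bytes bytes.
    """
    # Bytes left in the parent's 'box' for the endOfParent child.
    available = parent_length_bytes - bytes_used_by_siblings
    # Whole characters that fit, and the fragment that cannot hold one.
    whole_chars, leftover = divmod(available, char_width_bytes)
    return whole_chars, leftover


# A 100-byte parent whose earlier children used 90 bytes, with a
# 4-byte-per-character encoding: 2 characters fit, 2 bytes remain.
chars, fill = right_fill_bytes(100, 90, 4)
```

The leftover bytes in this sketch correspond to what the note above calls the RightFill region of the simple element.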

Alas - I am unable to reconstruct our reasoning about the partial character issue from the DFDL Workgroup call on 2013-08-28. The only way I could produce it is with 'endOfParent' on a string at the end of a complex type having lengthUnits='bytes'. See last paragraph of section. In this case the scenario is similar to the 12.3.7.1.1 fragment situation, but 'endOfParent' is not a specified length according to our current definitions.

An earlier draft of mine made 'endOfParent' a specified length, but only in the case when the enclosing parent has specified length and the element's lengthUnits equal the parent's length units. But we removed that some time back because it is complex to explain, so currently 'specified length' does not include 'endOfParent' at all. Maybe we can introduce something like "inferred specified length" to describe this situation where 'endOfParent' is the lengthKind, but the enclosing parent is providing the specification of length.

RE: Document the endOfParent case where RightFill gets used due to encoding - Added by Steve Hanson about 9 years ago

Change section 12.3.7.3 to read as follows:

A complex element of specified length defines a 'box' in which its child elements exist. An example of this would be a fixed-length record element with a variable number of child elements. The dfdl:lengthUnits may be 'bytes' or 'characters'; any other value is a schema definition error.

It is possible that the children do not fill the full length of the complex element. An example is a complex element with a specified length of 100 characters, which contains a sequence of child elements that use up fewer than 100 characters of data, perhaps because an optional element is not present. In this case the remaining unused data is called the ElementUnused region in the data syntax grammar of section 9.2. Another example is a complex element with a specified length of 100 bytes, which contains a sequence of child elements the last of which has dfdl:lengthKind 'endOfParent', dfdl:representation 'text' and a multi-byte dfdl:encoding such that the element does not use up all the bytes of data. In this case the remaining unused bytes comprise the child element's RightFill region in the data syntax grammar of section 9.2. In both examples, the unused area is skipped when parsing, and is filled with the dfdl:fillByte on unparsing.
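The second example can be illustrated with a rough Python sketch of the parsing side. This is not how any DFDL processor is implemented. The function name, the sample bytes, and the choice of UTF-16BE are assumptions made for illustration. It relies on CPython's buffered incremental decoders, whose getstate() returns the bytes that were not yet enough to form a character.

```python
import codecs


def parse_end_of_parent_text(box: bytes, offset: int, encoding: str):
    """Decode whole characters from box[offset:] and return
    (text, right_fill), where right_fill holds the trailing bytes
    that could not form a complete character.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    # final=False: an incomplete trailing sequence is buffered, not an error.
    text = decoder.decode(box[offset:], final=False)
    # Assumption: a buffered decoder, whose state exposes pending bytes.
    pending, _ = decoder.getstate()
    return text, pending


# Hypothetical 13-byte parent: 2 bytes used by an earlier child, then
# five UTF-16BE characters (10 bytes), then 1 byte that cannot hold one.
box = b"\x01\x02" + "hello".encode("utf-16-be") + b"\x00"
text, right_fill = parse_end_of_parent_text(box, 2, "utf-16-be")
```

In this sketch `text` holds the five decoded characters and `right_fill` holds the single trailing byte, which a parser would skip.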

Note that a poorly chosen value for dfdl:fillByte may fill the region with data that cannot be decoded in the character set encoding, resulting in a decode error when this data is subsequently parsed again. When dfdl:lengthUnits is 'characters' the value for dfdl:fillByte should be chosen so as to avoid this error.
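The decode-error risk can be checked with a small sketch. The helper name is hypothetical, and UTF-8 is chosen here only as an example encoding; the fill values shown are illustrations, not recommendations from the specification.

```python
def fill_is_decodable(fill_byte: int, count: int, encoding: str) -> bool:
    """Return True if a run of `count` fill bytes decodes cleanly
    in the given encoding, False if decoding raises an error."""
    filled = bytes([fill_byte]) * count
    try:
        filled.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# 0x20 is a space in UTF-8, so a region filled with it decodes cleanly.
space_ok = fill_is_decodable(0x20, 5, "utf-8")
# 0xFF is never valid in UTF-8, so re-parsing such a region would fail.
ff_ok = fill_is_decodable(0xFF, 5, "utf-8")
```

Here `space_ok` is True and `ff_ok` is False, which mirrors the warning above about a poorly chosen dfdl:fillByte.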

