Request clarifications of the Escape Schemes.
Added by Taylor Wise about 9 years ago
- Does any character effectively escape the block start, or is a block start inside the data a syntax error? (That is, is a valid escape block start only one that appears at the beginning of the data?)
- Does a block end have to be followed by the delimiter (optionally preceded by padding), or does the absence of a delimiter mean that it is not a block end?
- Without an escape block start, are escape escape characters still interpreted?
- Do extra escaped characters require escapeKind="escapeCharacter"?
- What is the appropriate behavior for the following:
Assuming escapeBlockStart="Start", escapeBlockEnd="END", escapeEscapeCharacter="!"
,4StartStart1!!23END4!ENDEND, (comma is the delimiter)
Replies (5)
Action 235 RE: Request clarifications of the Escape Schemes. - Added by Michael Beckerle almost 9 years ago
Added action 235 to subject line. Have to put something here in the body or it won't add the comment with just a subject line change (grrrr.)
Resolved - RE: Request clarifications of the Escape Schemes. - Added by Michael Beckerle almost 9 years ago
Per DFDL Workgroup discussion on Nov 12, 2013
Answers:
1. Yes
Any character preceding the block start effectively escapes it, so long as that character is not the pad character being trimmed on the left (that is, when the content is right- or center-justified).
2. Yes - no lookahead. To be clear:
A delimiter need not follow. After a block start, the first block end that is not preceded by an escape escape character always ends the content region. It may be followed by a delimiter if that is what the model expects; however, there is no lookahead for the delimiter or anything else.
For an element with dfdl:lengthKind='delimited', it is a processing error if the block end is not followed by optional padding and a delimiter.
3. No - without a block start, nothing is interpreted as either an escape escape character or a block end.
4. No - for escapeKind="escapeBlock", the presence of any of the extra escaped characters in the data means that the data must be surrounded by the block start and block end when unparsing. The spec states this; see dfdl:generateEscapeBlock.
5. <SMH>Taking the example above: ,4StartStart1!!23END4!ENDEND, (comma is the delimiter).
a) If the leading '4' is not trimmed as a padding character, then the escape block start is not treated as such because it is not at the start of the data, so the infoset contains '4StartStart1!!23END4!ENDEND'; no escaping is applied.
b) If the leading '4' is trimmed as a padding character, then the first 'Start' is treated as the escape block start, and the first unescaped 'END' is treated as the escape block end. The '4' after that 'END' may also be trimmed as a padding character if justification is 'center'. But the '!' that follows it will cause a processing error, because the next character is expected to be the ',' delimiter.
Once you find an unescaped block end, you are done and no longer in an escape block. The only things that may follow it are optional padding (if trimming is allowed on the right) and then a delimiter.
c) If the data were instead ',4StartStart1!!23!END!ENDEND,' and the leading '4' is trimmed as per b), then the first two occurrences of 'END' are escaped by the '!' and the last 'END' is treated as the escape block end. The infoset contains 'Start1!!23ENDEND' (because the spec says the escape escape character is not removed when it does not precede the escape block end **).</SMH>
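Answers 1 to 3 and cases a) to c) can be checked with a minimal Python sketch. It is not DFDL processor code: `parse_field`, `ProcessingError`, and the parameter names are hypothetical, and it assumes a single-character pad, a single delimiter, and no other in-scope terminators. `trim_left` and `trim_right` stand for whether padding is trimmed on that side.

```python
class ProcessingError(Exception):
    """Stand-in for a DFDL processing error (hypothetical name)."""


def parse_field(data: str, pos: int, *, block_start: str = "Start",
                block_end: str = "END", esc_esc: str = "!", pad: str = "4",
                trim_left: bool = False, trim_right: bool = False,
                delim: str = ",") -> tuple[str, int]:
    """Parse one delimited field beginning at `pos`.

    Returns (value, offset of the terminating delimiter). Assumes a
    single-character pad and a single delimiter.
    """
    i = pos
    if trim_left:  # right- or center-justified content
        while data.startswith(pad, i):
            i += len(pad)

    # A block start counts only at the very start of the (trimmed) content.
    # Otherwise the field is literal text: no escape escape character and
    # no block end is recognized.
    if not data.startswith(block_start, i):
        end = data.find(delim, i)
        if end < 0:
            end = len(data)
        value = data[i:end]
        return (value.rstrip(pad) if trim_right else value), end

    i += len(block_start)
    out = []
    while True:
        if i >= len(data):
            raise ProcessingError("escape block has no unescaped block end")
        if data.startswith(esc_esc + block_end, i):
            # Active escape escape character: it is removed, block end kept as data.
            out.append(block_end)
            i += len(esc_esc) + len(block_end)
        elif data.startswith(block_end, i):
            # First unescaped block end: the block is over; no lookahead.
            i += len(block_end)
            break
        else:
            # Ordinary character, including an inactive escape escape character.
            out.append(data[i])
            i += 1

    # Only optional padding and then the delimiter may follow the block end.
    if trim_right:  # left- or center-justified content
        while data.startswith(pad, i):
            i += len(pad)
    if not data.startswith(delim, i):
        raise ProcessingError(
            f"expected {delim!r} at offset {i}, found {data[i:i + 1]!r}")
    return "".join(out), i


if __name__ == "__main__":
    a = ",4StartStart1!!23END4!ENDEND,"
    print(parse_field(a, 1))  # case a: ('4StartStart1!!23END4!ENDEND', 28)
    try:
        parse_field(a, 1, trim_left=True, trim_right=True)  # case b, center
    except ProcessingError as e:
        print("case b:", e)  # expected ',' at offset 21, found '!'
    c = ",4StartStart1!!23!END!ENDEND,"
    print(parse_field(c, 1, trim_left=True))  # case c: ('Start1!!23ENDEND', 28)
```

The scanning loop never looks past the first unescaped block end; what follows is checked separately, which is why case b fails on the '!' instead of treating the later 'ENDEND' as a block end.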
The definition of escapeKind for escapeBlock needs clarification, because it implies one can isolate the data without interpreting the block start and block end. For delimited formats, the block start and block end are integral to identifying the delimiter.
<SMH> Agree. And it's not just delimited formats. The text needs to be processed from start to finish to handle the escape escape character. </SMH>
Need to clarify that the escape escape character never applies to the block start.
Consider expressing this with a small grammar.
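One possible sketch of such a grammar follows. The symbols are our own: $S$ is escapeBlockStart, $E$ is escapeBlockEnd, $X$ is escapeEscapeCharacter, $P$ is the pad character, $D$ is the delimiter, and $\Sigma$ is any character. It assumes a single delimiter and no other in-scope terminators; the leading $P^{*}$ applies only when padding is trimmed on the left, and the trailing $P^{*}$ only when it is trimmed on the right.

$$
\begin{aligned}
\mathit{field} &::= P^{*}\; S\; \mathit{body}\; E\; P^{*}\; D \;\mid\; \mathit{text}\; D \\
\mathit{body}  &::= \bigl(X E \;\mid\; c\bigr)^{*}, \qquad c \in \Sigma,\ \text{where neither } E \text{ nor } X E \text{ begins at } c \\
\mathit{text}  &::= \text{characters up to } D \text{ that do not start with } S \text{ after any trimmed padding}
\end{aligned}
$$

The value of $\mathit{body}$ keeps each $c$ unchanged and turns each $X E$ into $E$, so $X$ is removed only where it is active. Because no $c$ may start an $E$, $\mathit{body}$ always stops at the first unescaped $E$, and $X$ never affects $S$.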
- <SMH>Is this really correct, or should the escape escape character always be removed? </SMH>
Answer: No. It is only removed when it is active, i.e., when it precedes the escape block end; whether it is removed depends on context.
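Answer 4 above concerns unparsing. A minimal Python sketch of that direction follows, assuming dfdl:generateEscapeBlock='whenNeeded'. The function name and parameters are hypothetical, and the "needed" test used here (an in-scope delimiter, an extra escaped character, or a value that itself begins with the block start) is a simplification, not the spec's exact rule.

```python
def unparse_escape_block(value: str, *, block_start: str = "Start",
                         block_end: str = "END", esc_esc: str = "!",
                         delimiters: tuple[str, ...] = (",",),
                         extra_escaped: frozenset[str] = frozenset()) -> str:
    """Hypothetical sketch: escapeKind='escapeBlock', generateEscapeBlock='whenNeeded'.

    The 'needed' test is simplified: an in-scope delimiter, any extra
    escaped character, or a leading block start in the value.
    """
    needed = (
        any(d in value for d in delimiters)
        or any(ch in value for ch in extra_escaped)
        or value.startswith(block_start)
    )
    if not needed:
        # Written as-is: without a block start, a parser interprets neither
        # the escape escape character nor the block end.
        return value
    # Inside the block, precede every block end with the escape escape
    # character so a parser keeps it as data instead of ending the block.
    # Not handled: a value that itself ends with the escape escape character.
    body = value.replace(block_end, esc_esc + block_end)
    return block_start + body + block_end


print(unparse_escape_block("x#ENDy", extra_escaped=frozenset({"#"})))
```

Here the '#' forces a block, and the 'END' inside the value is written as '!END' so it is not read back as the block end.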
Resolved: Request clarifications of the Escape Schemes. - Added by Steve Hanson over 8 years ago
New erratum 4.5 added to experience document 1.
DONE - RE: Request clarifications of the Escape Schemes. - Added by Michael Beckerle over 8 years ago
Change made in draft draft-gwdrp-dfdl-v1.0.4-r05.docx.