padCharacter alternative zeros - internationalization
Added by Michael Beckerle about 9 years ago
Re: padCharacter
The spec says: When parsing, if the pad character is '0' and the SimpleContent region consists entirely of '0' characters, then the last remaining '0' is not trimmed and a single '0' is the result of the trimming. This rule also applies when the pad character is a DFDL character entity equivalent to '0'. This rule does not apply when the pad character is any other character nor when a pad byte is specified.
For internationalization (i18n) reasons, do we need this to be more general than just '0'? Does it need to include other Unicode characters that are zeros?
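To make the quoted rule concrete, here is a minimal Python sketch of the trimming behaviour as I read it. The function name `trim_pad` and the `justification` parameter are hypothetical and chosen for illustration only; this is not how any DFDL processor is known to implement it, and it ignores DFDL character entities and pad bytes.

```python
def trim_pad(text: str, pad_char: str, justification: str = "right") -> str:
    """Trim pad characters from a parsed text region (illustrative sketch only).

    Assumption: `justification` names the side the data is aligned to, so the
    padding is removed from the opposite side ("right" trims on the left).
    """
    if len(pad_char) != 1:
        raise ValueError("pad_char must be a single character")

    if justification == "right":
        trimmed = text.lstrip(pad_char)   # padding sits on the left
    elif justification == "left":
        trimmed = text.rstrip(pad_char)   # padding sits on the right
    elif justification == "center":
        trimmed = text.strip(pad_char)    # padding sits on both sides
    else:
        raise ValueError("justification must be 'left', 'right' or 'center'")

    # Special case from the quoted rule: with pad character '0', a region made
    # up entirely of '0' characters keeps one '0' instead of becoming empty.
    if pad_char == "0" and text != "" and trimmed == "":
        return "0"
    return trimmed


if __name__ == "__main__":
    print(repr(trim_pad("0042", "0")))   # leading zeros removed
    print(repr(trim_pad("0000", "0")))   # all zeros: one '0' is kept
    print(repr(trim_pad("****", "*")))   # any other pad character: nothing is kept
```

Note that the special case tests for the ASCII character '0' only, which is exactly the limitation the question above is about: a region padded with, say, ARABIC-INDIC DIGIT ZERO (U+0660) would be trimmed to an empty string.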
Replies (4)
RE: padCharacter alternative zeros - internationalization - Added by Michael Beckerle almost 9 years ago
Action item 238: determine the set of zero-equivalent characters, and consider whether we can pad/trim with respect to the whole set, require that any one of them be specified, or resolve this in some other way.
Action 238 RE: padCharacter alternative zeros - internationalization - Added by Michael Beckerle almost 9 years ago
Adding Action 238 in subject line.
Resolved - Action 238 RE: padCharacter alternative zeros - internationalization - Added by Michael Beckerle almost 9 years ago
Adding properties for controlling this will not be considered for DFDL v1.0.
It is useful to consider this for a future version of DFDL.
A few observations: In Unicode, each script's set of decimal digits (general category Nd) always occupies 10 consecutive code points, in ascending order from zero to nine.
General support for a different locale per simple type could be added through new properties, but that opens a large number of issues.
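For anyone who picks this up for a future version, the observation about consecutive code points can be checked with Python's standard `unicodedata` module. This is only a sketch: the helper names `zero_equivalents` and `digits_are_consecutive` are made up for illustration, and the set found depends on the Unicode version bundled with the Python interpreter, not on anything in the DFDL specification.

```python
import sys
import unicodedata


def zero_equivalents() -> list[str]:
    """Return every character of category Nd whose decimal value is 0."""
    zeros = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Nd" and unicodedata.decimal(ch, None) == 0:
            zeros.append(ch)
    return zeros


def digits_are_consecutive(zero: str) -> bool:
    """Check that the 10 code points starting at `zero` have decimal values 0..9."""
    base = ord(zero)
    return all(
        unicodedata.decimal(chr(base + i), None) == i
        for i in range(10)
    )


if __name__ == "__main__":
    zeros = zero_equivalents()
    print(f"{len(zeros)} zero-equivalent characters in Unicode "
          f"{unicodedata.unidata_version}")
    for z in zeros:
        # Report each candidate zero and whether digits 1..9 follow it directly.
        print(f"U+{ord(z):04X} {unicodedata.name(z)}: "
              f"consecutive={digits_are_consecutive(z)}")
```

If the property holds, a candidate zero is enough to identify a whole digit set, so a future property would only need to name (or match) one character per set.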
Closed: padCharacter alternative zeros - internationalization - Added by Steve Hanson over 8 years ago
No update to experience documents or specification needed.