This is a static archive of the previous Open Grid Forum Redmine content management system saved from host redmine.ogf.org file /issues/238 at Thu, 03 Nov 2022 23:30:53 GMT document #238: Trimming of text numbers when pad character is '0' - DFDL WG - Open Grid Forum

document #238

Trimming of text numbers when pad character is '0'

Added by Steve Hanson almost 8 years ago. Updated over 5 years ago.

Status: closed
Start date: 11/12/2014
Priority: Normal
Due date:
Assignee: Steve Hanson
% Done: 100%
Category: -
Target version: -
Document Type: Proposed Recommendation

Description

The DFDL spec has a clause that prevents the parser from trimming away a number that is all '0's when the pad character is also '0'. But the wording only covers the case where the entire content region is '0's; there are other cases, such as when signs are present.

History

Updated by Steve Hanson almost 8 years ago

  • Status changed from submitted to accepted
  • % Done changed from 0 to 100

Relevant paragraph in section 13.6 updated to read:

"When parsing, if the pad character is '0' and dfdl:textTrimKind is 'padChar' then the SimpleContent region is trimmed of the '0' characters as defined by the trimming rules. If this trimming results in the next character in the SimpleContent region being a character other than a digit, or in all of the SimpleContent region being trimmed, then the last '0' character is re-instated and not trimmed. This rule also applies when the pad character is a DFDL character entity equivalent to '0'. This rule does not apply when the pad character is any other character nor when a pad byte is specified."

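The wording above can be read as the following minimal sketch. It is an illustration only, not the spec's algorithm: it assumes right-justified text (so the '0' padding is on the left), and the function name `trim_zero_pad` is hypothetical.

```python
ASCII_DIGITS = "0123456789"

def trim_zero_pad(text: str) -> str:
    """Trim leading '0' pad characters, re-instating one '0' if the trim
    leaves nothing or leaves a non-digit as the next character.

    Assumption: padding is on the left (right-justified number).
    """
    trimmed = text.lstrip("0")
    removed_any = len(trimmed) < len(text)
    if removed_any and (trimmed == "" or trimmed[0] not in ASCII_DIGITS):
        # Put the last trimmed '0' back.
        return "0" + trimmed
    return trimmed

print(trim_zero_pad("00000"))  # all zeros: one '0' is kept
print(trim_zero_pad("00042"))  # ordinary padded number
print(trim_zero_pad("000.5"))  # next character is not a digit
```

Under these assumptions the three calls give '0', '42' and '0.5' respectively.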
Updated by Andy Edwards almost 8 years ago

I've spotted a potential problem with this. Consider the situation where the pad character is '0' but the value being represented is something like 'infinity' or 'not a number'. When the infinity representation is something like 'INF', then for an element with a length of (say) 5 characters the data is going to be '00INF'. We will parse this, trim the first two zeroes down to 'INF', then see that there is a non-digit character and put a zero back, leaving us with '0INF', which doesn't make sense.
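The steps described above can be traced in a few lines. This is only a sketch of the scenario, assuming left-side padding and the 'INF' representation used in the example.

```python
data = "00INF"                # 5-character element padded with '0'
trimmed = data.lstrip("0")    # trimming the pad characters leaves 'INF'

# The rule as first worded: a non-digit follows the trimmed zeros,
# so the last '0' is put back.
if trimmed == "" or trimmed[0] not in "0123456789":
    trimmed = "0" + trimmed

print(trimmed)  # 0INF
```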

I believe that this exception/rule only exists to make sure that when trimming, a logical value of zero is parsed as zero. Any other logical value can have all of its zeroes trimmed and still make sense, and any other pad character would leave us with all zeroes, and result in a parsed value of zero anyway.

1 - Does the spec need to be more specific about the characters that trigger us to put a zero back: not just any non-digit, but specifically the representations of decimal points, exponents and signs? Or do we say that Inf and NaN are the only "special" numbers, as they are conceptual numbers and not specific decimals, so they don't trigger us to put a zero back?

2 - Alternatively, could we say that anything that logically represents zero will be parsed as zero (so we ignore trimming and go straight to zero), and anything else is trimmed and parsed normally? This might be easier to explain in words, and it is orthogonal to how it is programmed (i.e. it doesn't describe a specific 'if-this-then-that' algorithm).

Updated by Steve Hanson almost 8 years ago

Revised words:

"When parsing, if the pad character is '0' and dfdl:textTrimKind is 'padChar' then the SimpleContent region is trimmed of the '0' characters as defined by the trimming rules. If at least one '0' character is removed and the trimmed text causes a processing error when parsed, a single '0' character is re-instated and the text is parsed again. This is to handle the case when '0' characters are trimmed away leaving no digits. This rule also applies when the pad character is a DFDL character entity equivalent to '0'. This rule does not apply when the pad character is any other character nor when a pad byte is specified."

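One way to read the revised words is as a trim, parse, and retry sequence. The sketch below is an assumption-laden illustration: Python's built-in `float` stands in for the DFDL text number parser, a `ValueError` stands in for a processing error, padding is assumed to be on the left, and `parse_zero_padded` is a hypothetical name.

```python
def parse_zero_padded(text: str, parse=float):
    """Trim leading '0' pad characters and parse; if that fails and at
    least one '0' was removed, re-instate a single '0' and parse again.

    Assumptions: left-side padding; `parse` raises ValueError on bad text.
    """
    trimmed = text.lstrip("0")
    try:
        return parse(trimmed)
    except ValueError:
        if len(trimmed) == len(text):
            raise  # no '0' was removed, so the retry rule does not apply
        return parse("0" + trimmed)

print(parse_zero_padded("00000"))  # retry path: '' fails, '0' parses
print(parse_zero_padded("00042"))  # parses on the first attempt
print(parse_zero_padded("00INF"))  # 'INF' parses, so no '0' is put back
```

With this reading the '00INF' case no longer ends up as '0INF', because the '0' is only put back when the trimmed text fails to parse.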
Updated by Steve Hanson almost 8 years ago

Erratum 5.7 in DFDL experience document 4

Updated by Michael Beckerle over 5 years ago

  • Status changed from accepted to closed
