occursCountKind when minOccurs = maxOccurs = 1
Added by Steve Hanson about 9 years ago
Spec says that if an element carries minOccurs = maxOccurs = 1 then occursCountKind is not examined. The result is that it is not possible to parse such an element in a way that makes the omission of the element a validation error. This is in contrast to minOccurs = maxOccurs = 2, where I could specify occursCountKind 'parsed' and get a validation error if 2 occurrences are not found. This seems inconsistent.
Replies (9)
RE: occursCountKind when minOccurs = maxOccurs = 1 - Added by Steve Hanson about 9 years ago
This is related to the public comment on recoverable errors. It's allowing parsers to stay on their feet and report errors rather than stopping parsing.
RE: occursCountKind when minOccurs = maxOccurs = 1 - Added by Michael Beckerle about 9 years ago
This seems ok. If I put occursCountKind='implicit' into scope then I get the current behavior for all scalar elements (right?)
If arrays generally do not have 'implicit' behavior - e.g., you want a format with parsed or expression semantics for arrays, but still want hard required scalars (a very common case I expect), then you have to jump through some hoops in constructing your schema because you have one dfdl property that has to frequently have different values. You can't put it into scope and have it just "do the right thing" everywhere unless 'implicit' is the right semantics for your arrays.
Implementation-wise, this suggestion is no big deal in complexity. We would simply tighten the definition of required scalar element to include occursCountKind='implicit'. It is literally less than one line of code.
I am more worried about the required changes to the way schemas must be authored.
RE: occursCountKind when minOccurs = maxOccurs = 1 - Added by Steve Hanson about 9 years ago
The dfdl:format annotations created by IBM's DFDL schema wizards set occursCountKind to either 'implicit' or 'fixed' (the latter for COBOL).
For the industry schemas we have created so far, the only one which does not have 'implicit' is ISO8583, which uses 'expression'. So changing the rules will cause a problem for the latter, and is something IBM would have to manage. Normally I would be against a disruptive change such as this, but the current asymmetry has bothered me for a while.
RE: occursCountKind when minOccurs = maxOccurs = 1 - Added by Steve Hanson about 9 years ago
And yes, if I put occursCountKind='implicit' into scope then I get the current behavior for all elements with minOccurs = maxOccurs = 1.
RE: occursCountKind when minOccurs = maxOccurs = 1 - Added by Steve Hanson almost 9 years ago
Spec used to say that a PoU was one of:
1. Choice branch
2. Element in an unordered sequence 
3. Optional element
4. Array element where minOccurs <> maxOccurs
5. Element in a sequence with a floating element
The latest revision now calls these potential PoUs, changes 4 to be all array elements, and adds a table that says whether an array is an actual PoU depending on occursCountKind:
1. Choice
2. Element in an unordered sequence 
3. Optional element
4. Array element
5. Element in a sequence with a floating element
If you look at the negative of this list, it says that the only thing that is not a potential PoU is a non-floating element with minOccurs = maxOccurs = 1 that does not appear in a choice branch, an unordered sequence, or a sequence with floating elements.
This public comment effectively makes ALL elements potential PoUs.
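The two lists can be compared with a small Python sketch. This is only an illustration of the lists as quoted in this post, not spec text or any implementation's logic. The Element class, its field names, and the definitions used for "optional" (minOccurs = 0, maxOccurs = 1) and "array" (maxOccurs greater than 1 or unbounded) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    # Hypothetical model of an element declaration, for illustration only.
    min_occurs: int
    max_occurs: Optional[int]  # None stands for "unbounded"
    in_choice_branch: bool = False
    in_unordered_sequence: bool = False
    in_sequence_with_floating: bool = False


def is_optional(e: Element) -> bool:
    # Assumption: "optional" means minOccurs = 0 and maxOccurs = 1.
    return e.min_occurs == 0 and e.max_occurs == 1


def is_array(e: Element) -> bool:
    # Assumption: "array" means maxOccurs > 1 or unbounded.
    return e.max_occurs is None or e.max_occurs > 1


def potential_pou_old(e: Element) -> bool:
    # Old list: item 4 only covers arrays where minOccurs <> maxOccurs.
    return (e.in_choice_branch
            or e.in_unordered_sequence
            or is_optional(e)
            or (is_array(e) and e.min_occurs != e.max_occurs)
            or e.in_sequence_with_floating)


def potential_pou_revised(e: Element) -> bool:
    # Revised list: item 4 covers all array elements.
    return (e.in_choice_branch
            or e.in_unordered_sequence
            or is_optional(e)
            or is_array(e)
            or e.in_sequence_with_floating)


required_scalar = Element(min_occurs=1, max_occurs=1)
fixed_array = Element(min_occurs=2, max_occurs=2)
```

Under these assumptions, fixed_array is a potential PoU only under the revised list, and required_scalar is a potential PoU under neither, which is the remaining case the public comment would change.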
RE: occursCountKind when minOccurs = maxOccurs = 1 - Added by Michael Beckerle almost 9 years ago
We should consider if we should fix this inconsistency differently.
I suggest: if occursCountKind='parsed', we should require minOccurs="0".
This is a much much smaller change to DFDL than changing the behavior of scalar elements.
That means things that are occursCountKind='parsed' are always fully variable length, which is consistent with the behavior of 'parsed' as a format. This is just requiring a better match between the logical schema's use of min/maxOccurs and what is actually going on in the data representation.
This notion: that the DFDL schema isn't "what you want out logically", but rather must reflect "how the data really behaves", is something we revisit over and over. To me, wanting minOccurs=maxOccurs=2, but occursCountKind='parsed' is a silly combination, and we just chose to assign some meaning to it arbitrarily. I think this was a mistake. We're letting the DFDL schema diverge too much from having to really describe the data format. To me, if the data is optional, then the logical model should reflect that optionality, and that is done with min/maxOccurs being different. In the case of occursCountKind='parsed', if we truly want it to mean the infoset will contain any number of occurrences, then I have no problem constraining it to minOccurs="0" (and even maxOccurs="unbounded").
In any case, I'd much prefer to change the semantics of occursCountKind 'parsed' by adding a constraint, than change the behavior of scalar elements.
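As a rough sketch, the constraint proposed here could be expressed as a schema-authoring check like the Python below. The function name and the dictionary representation of an element are hypothetical and exist only for this example; they are not part of any DFDL processor.

```python
def violates_parsed_constraint(element: dict) -> bool:
    """Return True if the element breaks the proposed rule that
    occursCountKind 'parsed' requires minOccurs = 0."""
    return element["occursCountKind"] == "parsed" and element["minOccurs"] != 0


# minOccurs = maxOccurs = 2 with 'parsed' would be rejected under the proposal.
fixed_pair = {"minOccurs": 2, "maxOccurs": 2, "occursCountKind": "parsed"}

# A fully variable-length array is accepted (None stands for "unbounded").
variable = {"minOccurs": 0, "maxOccurs": None, "occursCountKind": "parsed"}
```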
RE: occursCountKind when minOccurs = maxOccurs = 1 - Added by Steve Hanson almost 9 years ago
I think you mean that occursCountKind 'parsed' can only be used when minOccurs <> maxOccurs, rather than minOccurs = "0".
We already have a constraint that in an unordered sequence, the only allowable occursCountKind is 'parsed', so we would have to revisit that.
But it doesn't solve my problem. I want the parser to be able to parse what is there without throwing a hard error, and then use validation to check the infoset. That is how XML parsing against a schema works - everything gets parsed then minOccurs/maxOccurs are used to validate.
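A toy Python sketch of the parse-then-validate model described here, with the data reduced to a list of element names. Both functions and the token representation are assumptions made for illustration; real DFDL parsing is far more involved.

```python
from typing import List, Optional


def parse_occurrences(tokens: List[str], name: str) -> List[str]:
    """Collect the leading run of `name` in tokens.
    Never raises when fewer occurrences are present than the schema requires."""
    found = []
    for tok in tokens:
        if tok != name:
            break
        found.append(tok)
    return found


def validate_occurs(count: int, min_occurs: int,
                    max_occurs: Optional[int]) -> List[str]:
    """Check the parsed count against minOccurs/maxOccurs after parsing.
    Returns a list of validation error messages (empty if valid)."""
    errors = []
    if count < min_occurs:
        errors.append(f"expected at least {min_occurs} occurrence(s), found {count}")
    if max_occurs is not None and count > max_occurs:
        errors.append(f"expected at most {max_occurs} occurrence(s), found {count}")
    return errors


# Element "a" has minOccurs = maxOccurs = 1 but is missing from the data.
found = parse_occurrences(["b", "c"], "a")
errors = validate_occurs(len(found), min_occurs=1, max_occurs=1)
```

In this sketch the missing element yields an empty parse result and a single validation error, rather than stopping the parse.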
Resolved - RE: occursCountKind when minOccurs = maxOccurs = 1 - Added by Michael Beckerle almost 9 years ago
Change is too large. Impacts unclear. Published schemas already assuming the current behavior.
So this should stay as is.
Closed: occursCountKind when minOccurs = maxOccurs = 1 - Added by Steve Hanson over 8 years ago
No update to experience documents or specification needed.