This is a static archive of the previous Open Grid Forum Redmine content management system, saved from host redmine.ogf.org, file /boards/17/topics/128?r=231, at Thu, 03 Nov 2022 15:31:03 GMT.

Public Comments Archive - Open Grid Forum

Please support (real) unordered lists

Added by Roger Costello about 9 years ago

The sequenceKind="unordered" property is a strange beast. It means:

The input data can be in any order, but I (the parser) am 
going to reorder the data into the sequence listed here (in
the schema).

We need real unordered lists. That is, the parser must not muck with the data - the order of the data must be preserved.

Please support one or both of these:

1. Support the XML Schema 1.1 <all> element.

2. Support a new property, to be used with sequenceKind="unordered". The property is used to specify whether the parser is allowed to reorder the input data. How about calling it: allowedToDorkWithTheOrder = true/false
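The two behaviours being contrasted here can be sketched in a few lines of Python. This is only an illustration, not how any DFDL processor is implemented: the function `build_infoset`, the flag `preserve_data_order`, and the sample element names are all hypothetical.

```python
from typing import List, Tuple

# (element name, value) pairs in the order they were found in the input data
Item = Tuple[str, str]


def build_infoset(items: List[Item], schema_order: List[str],
                  preserve_data_order: bool) -> List[Item]:
    """Return the children either in data order or regrouped into schema order."""
    if preserve_data_order:
        # Requested behaviour: leave the data exactly as it arrived.
        return list(items)
    # Behaviour described for sequenceKind="unordered": reorder into schema order.
    rank = {name: index for index, name in enumerate(schema_order)}
    # sorted() is stable, so repeated elements keep their relative data order.
    return sorted(items, key=lambda item: rank[item[0]])


data = [("c", "3"), ("a", "1"), ("b", "2")]
schema = ["a", "b", "c"]

reordered = build_infoset(data, schema, preserve_data_order=False)
preserved = build_infoset(data, schema, preserve_data_order=True)
```

With the flag off, the children come back as a, b, c; with it on, they stay as c, a, b.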


Replies (9)

RE: Please support (real) unordered lists - Added by Jonathan Cranford about 9 years ago

I'll expand a bit on Roger's request above.

The changes that XSD 1.1 made to <all> that are most relevant to supporting an unordered list capability are the following, I believe:
  • "The value of maxOccurs may now be greater than 1 on particles in an all group. The elements which match a particular particle need not be adjacent in the input." (from http://www.w3.org/TR/xmlschema11-1/#ch_models)
  • minOccurs can be greater than 1.

Here's the big question, as I see it: Is there anything that would prevent DFDL 1.0 from cherry-picking XSD 1.1 features as we're suggesting? As I understand it, the design goals of DFDL include (1) having an infoset compatible with XSD processors and (2) having DFDL schema files that are compatible with XSD processors. If DFDL 1.0 expands what is allowed in <all> the same as XSD 1.1 does, I don't think that would impact the infoset, but it would impact the schema file; the resulting schema file could only be processed by an XSD 1.1 processor. Would that be an impediment to expanding what's allowed in <all> in DFDL 1.0?

If so, <all> would carry the same restrictions as in XSD 1.0; namely, the particles within <all> would have to have maxOccurs equal to 1 and minOccurs equal to either 0 or 1. I think that would limit the utility of using <all> to represent unordered lists.
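As a rough illustration of the occurrence restrictions as summarised in this post, the Python sketch below checks whether a particle's minOccurs/maxOccurs pair would be acceptable inside an all group. The function name is hypothetical, "unbounded" is left out for brevity, and the rules encode only what is stated above, not the full text of either XSD specification.

```python
def allowed_in_all_group(min_occurs: int, max_occurs: int, xsd_version: str) -> bool:
    """Check an occurrence range against the <all> rules summarised in the thread."""
    if min_occurs < 0 or max_occurs < min_occurs:
        return False  # not a meaningful occurrence range at all
    if xsd_version == "1.0":
        # As stated above: maxOccurs must be 1 and minOccurs must be 0 or 1.
        return max_occurs == 1 and min_occurs in (0, 1)
    if xsd_version == "1.1":
        # As stated above: both values may now be greater than 1.
        return True
    raise ValueError(f"unknown XSD version: {xsd_version}")


optional_single = allowed_in_all_group(0, 1, "1.0")
repeating_in_10 = allowed_in_all_group(2, 5, "1.0")
repeating_in_11 = allowed_in_all_group(2, 5, "1.1")
```

A repeating particle (minOccurs 2, maxOccurs 5) is rejected under the 1.0 rules and accepted under the 1.1 rules, which is the limitation that reduces the usefulness of <all> for unordered lists.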

RE: Please support (real) unordered lists - Added by Steve Hanson about 9 years ago

Let's look at the options:

If DFDL mixes features from XSDL 1.0 and 1.1 then DFDL is no longer based on XSDL 1.0 and is some strange hybrid beast, based on neither one nor the other. That is extremely undesirable. The underlying schema (i.e., minus the DFDL annotations) is meaningless and can't be used by a validating XML parser, is not supported by XML tools, etc.

An alternative is to move entirely to XSDL 1.1. That is not on the cards for DFDL 1.0, but is something that could be considered for a future DFDL 2.0, as XSDL 1.1 provides some other useful features, such as less restrictive UPA rules and its own equivalent of asserts, which could be used for validation. The one downside I see is that tool support for XSDL 1.1 in the industry seems pretty sparse.

One thing that is technically feasible for DFDL 1.0 is to allow the use of xs:all, but that will come with the restrictions on maxOccurs & minOccurs.

Another possibility is for implementations to provide 'position' information in the infoset, allowing a post-parse step to recreate the original order.
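A minimal sketch of that last option, assuming the implementation attaches a position to each infoset item: the `data_position` field and the `restore_data_order` helper are hypothetical names, not part of the DFDL infoset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InfosetItem:
    name: str
    value: str
    data_position: int  # hypothetical: index at which the parser met this item


def restore_data_order(children: List[InfosetItem]) -> List[InfosetItem]:
    """Post-parse step: put schema-ordered children back into data stream order."""
    return sorted(children, key=lambda child: child.data_position)


# Infoset as presented by the parser, in schema order (a, b, c).
schema_ordered = [
    InfosetItem("a", "1", data_position=1),
    InfosetItem("b", "2", data_position=2),
    InfosetItem("c", "3", data_position=0),
]

original_names = [child.name for child in restore_data_order(schema_ordered)]
```

The infoset itself still matches the schema; only the downstream step knows that the data actually arrived as c, a, b.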

RE: Please support (real) unordered lists - Added by Steve Hanson about 9 years ago

Worth noting that the issue of order preservation applies equally to elements with dfdl:floating="yes". Currently the spec says that these floaters will be presented in XSD sequence position, whereas the desire would likely be to have them appear in data stream position.

RE: Please support (real) unordered lists - Added by Steve Hanson about 9 years ago

Meant to comment specifically on the proposal for a new property to allow the infoset to be presented in data stream order.

The spec does not provide such a property because it means the infoset would not match the underlying XSD, and therefore would not validate using an XSDL 1.0 validator. That is something we have been strictly adhering to in DFDL 1.0 so that a DFDL schema with annotations removed gives exactly the same infoset, and validates the same, when presented with the equivalent XML.

If such a property were added, perhaps its use could be mutually exclusive with switching on validation?

RE: Please support (real) unordered lists - Added by Tim Kimber about 9 years ago

DFDL already allows infosets to be created that do not conform to the XSD. The only difference in this case is that non-conformance would be highly likely rather than merely a possibility. So I think I agree - let's provide the property, and point out that it is not compatible with schema validation.

RE: Please support (real) unordered lists - Added by Michael Beckerle almost 9 years ago

Using sequence annotations, DFDL v1.0 supports unordered representations for logically ordered structures like sequences.

If the data is unordered in the sense that elements may appear in any order, but the order in which they appear is significant, then you must use a repeating element to capture that positioning explicitly. The index position within that repeating element array preserves the order.

If several different things can be found at each position, that is modeled using a choice.

This is just the way it is done in DFDL v1.0. Use of xs:all constructs was considered at one time, but rejected as unnecessary given the array-of-element-containing-choice mechanism can model everything that an xs:all group could model.
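To make the array-of-element-containing-choice idea concrete, here is a small Python model of it, offered as a sketch rather than as DFDL: the list plays the role of the repeating element, each entry holds exactly one branch of a choice, and the list index records the position in the data. The `Name` and `Age` branches and the `tag=value` token format are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Name:
    value: str


@dataclass
class Age:
    value: int


# The "choice": each array entry is exactly one of these branches.
Entry = Union[Name, Age]


def parse_records(tokens: List[str]) -> List[Entry]:
    """Parse 'tag=value' tokens into an array whose index preserves data order."""
    entries: List[Entry] = []
    for token in tokens:
        tag, _, raw = token.partition("=")
        if tag == "name":
            entries.append(Name(raw))
        elif tag == "age":
            entries.append(Age(int(raw)))
        else:
            raise ValueError(f"no choice branch matches tag: {tag}")
    return entries


records = parse_records(["age=41", "name=Ada", "age=7"])
```

Nothing is reordered: the result is Age, Name, Age, in the same order as the input, and repeats of the same branch are allowed.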

I have not yet heard an argument that array-of-element-containing-choice is inadequate for describing data formats, only that people familiar with XSD seem to want to use xs:all groups.

I think this issue is about having a decent tutorial telling people how to use DFDL to model many different things. Otherwise of course people try to take whatever experience with XML or XSD they have and leverage it on DFDL schema creation. This case of unordered lists is just a case where experience with XSD leads you to look for xs:all groups support, or other special keywords in DFDL. A tutorial would clear this up.

RE: Please support (real) unordered lists - Added by Michael Beckerle almost 9 years ago

Replying to my own reply here...

Steve Hanson's point earlier in the thread, that floating elements are almost certainly positional in nature, argues that the current DFDL design point, where all sequences are logically ordered even if physically unordered or floating, is not consistent. One can argue that logically ordered is a reasonable semantics for unordered sequences, but a floating element is very likely to be floating specifically for positional reasons.

I think we have a few choices:

  1. add a [dataIndex] (or other name) member to the infoset, so as to preserve the physical-position information. That solves the problem in theory, but pragmatically if people want to use DFDL to convert to/from XML documents that are validatable, it doesn't fix the issue.
  2. change spec for unordered sequences (and sequences with floating elements), and say the infoset is constructed in physical data order, hence, validation will not work for this data.
  3. add a property, dfdl:sequencePreservesDataOrder, where "no" means the current semantics (data are reordered into logical order) and "yes" means the order of children in the data is preserved. With "yes", the infoset may not be valid if the data order isn't the same as the logical order.
  4. add an xs:all group construct allowing children to be logically and physically unordered.
  5. do nothing: anything that requires preservation of physical order would have to be modeled as an array-of-element-containing-choice. Then this problem would have to be addressed in a future revision (DFDL v2.0).

Personally, I'm in favor of the "do nothing" option, but if this really has to be addressed then I think adding the xs:all group would be the next best thing to do. Adding a [dataIndex] member to the infoset is fine with me as well, but as noted above, this doesn't solve the issue. It just gives more information content, allowing users to model data as unordered sequences and restore the physical data order via a downstream transformation.

Action 245 - RE: Please support (real) unordered lists - Added by Michael Beckerle almost 9 years ago

To be further investigated as action 245.

Resolved - RE: Please support (real) unordered lists - Added by Michael Beckerle almost 9 years ago

On DFDL WG Call 2014-01-14, we concluded this is too large a change to consider for DFDL v1.0.
