document #342

Remove BOM processing from DFDL 1.0

Added by Steve Hanson 26 days ago. Updated 26 days ago.

Status:submitted Start date:06/27/2019
Priority:Normal Due date:
Assignee:- % Done:0%

Category:-
Target version:DFDL v1.0
Document Type:Proposed Recommendation

Description

DFDL 1.0 includes some support for Unicode Byte Order Marks (BOMs). When parsing, it recognizes a BOM at the start of a document and preserves it in the infoset. When unparsing, it uses the infoset to output a BOM at the start of the document if required. It does NOT support BOMs that occur elsewhere in the document.
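The parse/unparse behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the spec's algorithm. The names `parse`, `unparse`, `unicode_byte_order_mark` and `body` are hypothetical stand-ins for a DFDL processor and its infoset. The sketch also assumes that the recorded value is simply the name of the detected BOM, and it ignores how the BOM interacts with the `encoding` property.

```python
import codecs

# Known BOMs, longest first, so that UTF-32LE (FF FE 00 00) is checked
# before UTF-16LE (FF FE), which is a prefix of it.
_BOMS = [
    ("UTF-32BE", codecs.BOM_UTF32_BE),
    ("UTF-32LE", codecs.BOM_UTF32_LE),
    ("UTF-8", codecs.BOM_UTF8),
    ("UTF-16BE", codecs.BOM_UTF16_BE),
    ("UTF-16LE", codecs.BOM_UTF16_LE),
]


def parse(data: bytes) -> dict:
    """Hypothetical parse step: strip a leading BOM and record it in the infoset."""
    for name, bom in _BOMS:
        if data.startswith(bom):
            return {"unicode_byte_order_mark": name, "body": data[len(bom):]}
    # No BOM at the start. BOM-like bytes later in the data are left as content,
    # mirroring the fact that only a BOM at the start of the document is handled.
    return {"unicode_byte_order_mark": None, "body": data}


def unparse(infoset: dict) -> bytes:
    """Hypothetical unparse step: re-emit the recorded BOM at the start of the output."""
    name = infoset["unicode_byte_order_mark"]
    prefix = dict(_BOMS)[name] if name is not None else b""
    return prefix + infoset["body"]


# Usage: a UTF-16LE document with a BOM round-trips unchanged.
doc = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
info = parse(doc)
assert info["unicode_byte_order_mark"] == "UTF-16LE"
assert unparse(info) == doc
```

Keeping the BOM in the infoset, rather than discarding it, is what lets unparsing reproduce the original bytes. This is the bookkeeping that the proposal removes from the spec.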

Experience has shown that few of the formats encountered by DFDL implementations use a Unicode encoding, and there have been no use cases in which a BOM was present. Given that neither IBM DFDL nor Daffodil has yet implemented BOM support, the WG has agreed to drop BOM support from the DFDL 1.0 spec.

History

Updated by Steve Hanson 26 days ago

Note that it is possible and practical to create a DFDL schema to model BOMs explicitly, so dropping support does not mean that documents with BOMs become unparsable.

Updated by Steve Hanson 26 days ago

Work required to remove BOM support from the spec:
- 4.1.1. Remove the [unicodeByteOrderMark] enum from the infoset.
- 9.2. Remove unicodeByteOrderMark from the grammar.
- 11. Remove the forward reference to 11.1 from the 'Encoding' property description.
- 11.1. Remove the section and its two footnotes.

Updated by Michael Beckerle 26 days ago

  • Target version set to DFDL v1.0
