This is a static archive of the previous Open Grid Forum Redmine content management system saved from host redmine.ogf.org file /dmsf_files/8019?download= at Thu, 03 Nov 2022 23:16:27 GMT
Data Format Description Language (DFDL) Working Group
Global Grid Forum, Data Area
Chairs:
Martin Westhead, westhead@avaya.com
Mike Beckerle,
Jim Myers, jimmyers@ncsa.uiuc.edu
Email list:
dfdl-discuss@nesc.ac.uk
Web page:
Charter:
Focus/Purpose
XML provides an essential mechanism for transferring data between services in
an application- and platform-neutral format. However, it is not well suited to
large datasets with repetitive structures, such as large arrays or tables.
Furthermore, many legacy systems and valuable data sets exist that do not use
the XML format. The aim of this working group is to define an XML-based
language, the Data Format Description Language (DFDL), for describing the
structure of binary and character-encoded (ASCII/Unicode) files and data
streams so that their format, structure, and metadata can be exposed. This
effort specifically does not aim to create a generic data representation
language. Rather, DFDL endeavors to describe existing formats in an actionable
manner that makes the data in its current format accessible through generic
mechanisms.
The DFDL description would sit in a (logically) separate file from the data
itself. It would provide a hierarchical view that structures and semantically
labels the underlying bits. It would capture:
- how bits are to be interpreted as parts of low-level data types (ints,
  floats, strings)
- how low-level types are assembled into scientifically relevant forms such as
  arrays
- how meaning is assigned to these forms through association with variable
  names and metadata such as units
- how arrays and the overall structure of the binary file are parameterized
  based on array dimensions, flags specifying optional file components, etc.
Further, if the data file contains
highly repetitive structures, such as large arrays or tables, such a
description can be very concise.
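As a rough illustration of the idea above, the following Python sketch keeps a description separate from the data and uses it to interpret raw bytes. The dictionary layout, key names, and field names are assumptions made for this example only; they are not DFDL syntax, which this charter does not define.

```python
import struct

# Hypothetical description, kept separate from the data it describes.
# Keys and field names are illustrative assumptions, not DFDL syntax.
DESCRIPTION = {
    "byte_order": "<",  # little-endian
    "fields": [
        {"name": "rows", "type": "I"},  # 32-bit unsigned int
        {"name": "cols", "type": "I"},  # 32-bit unsigned int
        {
            "name": "temperature",
            "type": "f",                # 32-bit float
            "dims": ["rows", "cols"],   # array size taken from earlier fields
            "units": "degrees centigrade",
        },
    ],
}


def parse(data, description):
    """Interpret raw bytes according to the description."""
    order = description["byte_order"]
    offset = 0
    result = {}
    for field in description["fields"]:
        # The element count is the product of the referenced dimensions.
        count = 1
        for dim in field.get("dims", []):
            count *= result[dim]["value"]
        fmt = f"{order}{count}{field['type']}"
        values = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)
        value = list(values) if "dims" in field else values[0]
        result[field["name"]] = {"value": value, "units": field.get("units")}
    return result


# Example data: a 2 x 3 array preceded by its two dimensions.
raw = struct.pack("<II6f", 2, 3, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
parsed = parse(raw, DESCRIPTION)
```

Note that the description stays the same size however large the array grows, which is the sense in which such a description can be concise for repetitive data.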
The potential benefits to having such a standard include:
- Transparency of physical binary representation: preservation of information,
  independent of low-level format (e.g., bit/byte ordering, block size, etc.).
- XML without explicitly representing the tags: an XML representation of the
  data could be inferred from the description, without actually having to
  materialize that representation. This could allow the user to treat the data
  as if it were XML, thus enabling:
  - XSL conversions,
  - XQuery/XPath, and
  - SAX read/write directly to/from DFDL.
- Data file -> database: DFDL makes the structure explicit.
- Vendor-independent bulk transfer of relations between relational databases:
  that is, it would provide a mechanism for concisely describing binary data
  relations to allow large transfers of data between databases.
- Generic tools for:
  - Browsing
  - Conversion
  - Manipulation
- Annotation of binary files (e.g. these bits represent the hurricane in an
  image)
- Absolute bit preservation in data archiving: the original bits can be kept
  while the data is used with new software (that may not be designed for this
  format), because the format is now explicit.
- Selection and integration of data: referencing (via XPath) means that you
  can select individual data objects or groups and combine them from one or
  more files in any order.
- Basis for a standard transformation language, based on XPath and XSL.
- General semantic labeling: since individual data objects and groups can be
  referenced, metadata labels can be associated with them. Such labels could
  be generic (like physical units, e.g. degrees centigrade) or application
  specific.
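The "XML without explicitly representing the tags" benefit listed above can be sketched as follows: binary data is walked directly and reported to a standard SAX handler, so no XML document needs to exist beforehand. The binary layout and the element names `readings` and `reading` are assumptions chosen for this example; in a DFDL setting they would come from the description rather than being hard-coded.

```python
import io
import struct
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def emit_sax_events(data, handler):
    """Walk a binary buffer and fire SAX events for each value.

    Assumed layout (illustrative only): a little-endian uint32 count
    followed by that many float32 values.
    """
    (count,) = struct.unpack_from("<I", data, 0)
    handler.startDocument()
    handler.startElement("readings", AttributesImpl({"count": str(count)}))
    for i in range(count):
        (value,) = struct.unpack_from("<f", data, 4 + 4 * i)
        handler.startElement("reading", AttributesImpl({}))
        handler.characters(repr(value))
        handler.endElement("reading")
    handler.endElement("readings")
    handler.endDocument()


# Any SAX ContentHandler can consume the events; here XMLGenerator is used
# only to make the inferred XML view visible as text.
raw = struct.pack("<I3f", 3, 1.5, 2.5, 3.5)
out = io.StringIO()
emit_sax_events(raw, XMLGenerator(out, encoding="utf-8"))
xml_text = out.getvalue()
```

A consumer that only needs some of the values could supply its own handler and ignore the rest, without the full XML text ever being produced.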
Goals/Milestones
The goals of the group are as follows:
- To develop a proposal for a standard Data Format Description Language
  (DFDL), consisting of a general structure description language and an
  extensible set of property libraries, for which we will provide a base.
- To work with other groups within the GGF to ensure that the DFDL proposal
  conforms to other emerging Grid standards.
- To foster the development of reference implementations of libraries and
  tools that use the DFDL proposal.
The group aims to be very focused
and to leverage existing implementation work (see references) in the
development of reference implementations. We propose to produce the following
documents:
1. Core DFDL standard: this will describe the syntax and semantics of the core
   language and the extension mechanisms.
2. Basic properties library
3. Extended properties library
(Note: we anticipate that documents 2 and 3 may be split into multiple
sub-documents representing different areas of the standard.)
Milestones:
- GGF15: (1) strawman
- GGF16: (1) draft (2) strawman
- GGF17: (1) revised draft (2) draft (3) strawman
- GGF18: (1) complete (2) draft (3) draft
- GGF19: (2) complete (3) complete
References:
- BinX: http://www.epcc.ed.ac.uk/gridserve/WP5/Binx/
- HDF: http://hdf.ncsa.uiuc.edu/HDF5
- BDF/SAM: http://collaboratory.emsl.pnl.gov/docs/collab/sam
- XDR: http://www.faqs.org/rfcs/rfc1014.html