This is a static archive of the previous Open Grid Forum Redmine content management system saved from host redmine.ogf.org file /dmsf_files/7393?download=11587 at Fri, 04 Nov 2022 18:38:29 GMT
OGSA, WS-RF and CEA
OGSA, WS-RF and the Common Execution Architecture
Guy Rixon (gtr@ast.cam.ac.uk); 2007-09-19
These notes describe some ways that virtual-observatory software
might use an OGSA platform and the associated requirements for that
platform. AstroGrid's Common Execution Architecture is examined as a
possible point of linkage between IVOA and OGSA grids.
CEA as a computational grid
The Common Execution Architecture (CEA) is AstroGrid's pattern for
driving web services in workflows. CEA has these characteristics:
- All calls to web services that actually do some astronomy are treated as steps in a batch job.
- All such job steps execute asynchronously; the caller is not blocked until the work completes.
- All CEA services have the same interface, captured as a standard WSDL-contract.
- The function to be executed in a call to a CEA service is
expressed in a parameter to that call: an XML document using a
vocabulary specific to CEA that specifies the application to be run and
the arguments for this particular run.
- CEA services will only execute functions from a library of applications that is specified by the service provider.
A system based on CEA is a computational grid. It supplies
astronomical processing via a standard interface. A CEA service
is a specialized form of the job-manager pattern. It is less general
than the job managers in generic grid-toolkits, e.g. Globus, because
the jobs that can be run are constrained by the service provider; users
cannot supply arbitrary programmes. The applications available on a CEA service can be made into grid
commodities by standardizing their interfaces and the descriptions of
those interfaces. However, the raw processing-power supportting the service is not available as a commodity.
In principle, a CEA service can be adapted as a facade for a local
batch-queuing system, such as Condor or GridEngine. This has not yet
been done; current CEA services run their applications either as
in-process calls to Java classes or by forking a local sub-process for
each call.
CEA also supports MySpace, AstroGrid's implementation of a data
grid. Inputs and outputs of applications can be defined to be files in
virtual storage; CEA uses temporary, local copies of these files to
hide the data grid from the application code. CEA will support
VOSpace, the IVOA standard for data grids, when that standard
comes into use.
CEA is integrated with the registry of computing resources as
defined by IVOA. The descriptions of applications used to make calls to
CEA services are also used in the registration records of those
services.
AstroGrid's approach differs from that of most other
virtual-observatory projects. The majority of virtual-observatory
services so far published are synchronous (the work completes or fails
during a single HTTP call to the service) and have WSDL contracts that
are specialized to what they do. CEA avoids synchronous processing in
order to support long-running operations. It avoids
application-specific contracts in order to simplify the interface to
the workflow engine.
It is worth noting that the application-description language in CEA
was originally intended to be a small extension to WSDL: this would
have maintained the application-specific contracts for the services. It
turned out to be easier to use standard WSDL, with a common contract
for all CEA services, and to make the extensions as schemata used in
the "types" section of the contract.
IVOA, CEA and WS-ResourceFramework
The IVOA recognizes the need for asynchronous processing, even
though the existing services mostly work synchronously. IVO has a draft
standard for using WS-ResourceFramework to manage an asynchronous
job-step. This standard addresses the initiation, management and
monitoring of job-steps, with particular emphasis on notification of
completion of the work.
CEA pre-dates WS-RF and the IVOA standard for its use; it makes no
use of WS-RF and has its own libraries and interfaces for managing
asynchronous work. The CEA semantics for managing job steps are
essentially identical to those of WS-RF. If IVOA approves the use of
WS-RF (as opposed to alternatives based on
WS-CommonApplicationFramework or WS-Coordination), then it makes sense
to refactor CEA to use WS-RF also.
Thus, we expect to see a new generation of astronomy web-services
that are "grid services" by virtue of using WS-RF; i.e. they use the
same "plumbing" as standard services specified by GGF. Some of these
new services will by synchronous; some will be asynchronous. Some will
have the CEA-standard interface; some will have application-specific
interfaces. None will necessarily comply with OGSA.
IVOA and OGSA
Virtual observatory software could converge with OGSA. This could mean either or both of two things:
- virtual observatory services implement OGSA interfaces;
- virtual observatory programmes are clients of OGSA services.
IVOA has standards (many still in development) for the following areas:
- access to astronomical image archives: Simple Image Access Protocol (SIAP) and Simple Spectral Access Protocol (SSAP);
- access to databases holding astronomical tables ("catalogues", in
the astronomy jargon; typically relational databases): Astronomical
Data Query Language (ADQL) and the SkyNode service that implements it;
- access to distributed data-storage (with no astronomical
specialization of the data stored): VOSpace (virtual-directory service)
and VOStore (file-caching service);
- registration of of resources (data collections; astronomical
algorithms; services); management and propgation of those services: the
registry service and the schemata for resource metadata;
- single-sign-on authentication and accompanying PKI: the "community service" proposal, still at an early stage.
IVOA does not have standards in job-based computing, either for
job-manager services or for workflow systems. However, virtual-observatory projects that are partners in IVOA have internal
standards for these; CEA is AstroGrid's example. It is likely that IVOA
will eventually standardize one or more of these systems.
From the list above, which service could plausibly implement OGSA interfaces?
- The astronomy-specific standards do not overlap with OGSA and will remain separate.
- The IVOA registry is a mature standard and rest of the IVOA
architecture depends upon it. IVOA will not substantially change this
standard to match OGSA. However, OGSA (or GGF outwith OGSA) might adopt
the IVOA standard; the structure of the registry is separated from the
astronomical contents, so any grid could potentially use the
architecture.
- VOSpace and VOStore could
be adapted to match OGSA. Currently, these standards are early drafts
with no detailed specifications or implementations. The specifications
are expected to become concrete during 2005. Once these standards are
adopted by IVOA and implemented in the virtual observatory, then
it will be expensive to change the interface (too many other
services will be bound to the original interface-standard) but it might
then be feasible to add
a GGF interface to the storage services. Adding an interface will only
work if the IVOA and GGF standards are semantically close enough.
- The authentication and user-registration services could very well
be aligned with OGSA, provided that OGSA's security arrangements are
specified early enough. Once the IVOA standard "sets", then it would be
unreasonably expensive to change the protocols. In addition, the
eventual services must support the sociology of the astronomical
community, and the OGSA standards for using PKIs must not compromise
this.
The mood in IVOA is that it is not especially helpful for us to
implement our services as part of the OGSA platform. The general
preference is to make the virtual observatory a client of the OGSA
platform such that the observatory may make use of generic grids. More
specifically, we aim to make our software a client of complete OGSA
platforms supplied by other developers. Because of this bias, most
participants in IVOA do not have detailed lists of requirements for
OGSA; rather, they assume that an OGSA platform can be driven from the
virtual observatory via some gateway that will be written in the
astronomical community. In this mode we have some general requirements:
- The OGSA platform should be complete and self-sufficient. A grid
offering services via OGSA should not require the client application to
provide any extra, OGSA services such as storage servers or
authentication servers.
- The interfaces into the OGSA platform should be as narrow as
possible (following the "hour-glass" paradigm" promoted for the Globus
tool-kit. The set of operations for which the virtual observatory has
to provide client-side adaptors should be kept small, in order that the
adaptors be cheap and easy to maintain.
- The number of services that a client calls in the OGSA platform
should be kept as small as possible. Ideally, each job submitted to an
OGSA platform would have exactly one contact point.
- The OGSA platform should be composable. I.e., it should have a
number of well-defined interfaces that can be composed together in
services. For each such interface, the virtual-observatory software
would generate fixed, resusable client-side stubs and delegates;
therefore, the standard interfaces must have strong version control.
For SOAP services, these standard, OGSA interfaces are logically WSDL
ports: i.e. standard port-types with standard SOAP bindings; c.f. OGSI
where the port-types were standardized but the bindings were not. The
interfaces that a particular platform provides should be obtainable
from that platform as a list of keywords (possibly URIs for the
standard contracts).
- The authentication system in OGSA must be interoperable with the
IVOA system. We think this means that both must be based on WS-Security
and on X.509v3 certificates, and that the PKIs must be in some way
compatible. However, we strongly prefer an authentication system that
is a pure service and which does not call back to the client side to
check credentials. (See the requirement above about not requiring the
client to provide any services to OGSA.)
- Astronomy is data-intensive, so inputs and output of jobs will
often need to be handled as files, not as parts of SOAP messages or as
attachments to messages. Ideally, OGSA should include a "poste
restante" system where the OGSA platform caches the inputs and outputs
and the client initiates the transfers. We prefer this to be based on
GridFTP rather than some more-complex arrangement such as Grid File
System. As an alternative, we might
be able to support a system whereby the OGSA platform pulls inputs from
GridFTP servers inside the virtual observatory and pushes outputs to a
similar set of servers; the details need to be worked out.
It is important to note that the virtual observatory is itself a
grid. Rather than interfacing to OGSA at the service level, it may be
more useful to drive job-managment systems via DRMAA from within
virtual-observatory services.
It is possible that some operations and schemata for service
metadata are common to many services in the OGSA platform. If this is
the case, then, presumably, there will be utilities that drive these
common features: e.g. metadata browsers, service up-time monitors,
utilities to manage WS-Resources. In this case, we would like to be
able to use these utilities on virtual-observatory services. Therefore,
we urge GGF to make these parts of OGSA an overt standard that will be
supported by platform implementors.
It is also conceivable to build an adaptor such that the entire
virtual-observatory platform can be driven as if it were an OGSA
platform. This would be valuable if there arose workflow systems for
OGSA comparable to the Trianna system: i.e. complete systems for
defining a scientific computation that call into OGSA to collect,
process and store the data. It is rather harder to build this kind of
adaptor than to build the client-side adaptor for the virtual
observatory to call OGSA. It is particularly difficult to make the
astronomy-specific parts of the virtual observatory look like the OGSA
platform. To do so requires job-oriented services inside the virtual
observatory and this brings us back to AstroGrid's CEA.
CEA and OGSA
It is conceivable that CEA might be redesigned
such that CEA services follow the OGSA specification for a job-manager
service. It is a feasible change because the CEA-service definition is
semantically close to the expected equivalent in OGSA. The cost of the
change might be partly offset by combining it with the change to use
WS-RF. However, AstroGrid has no current plans to make this change.
Defining the CEA service to match OGSA lets CEA clients call other OGSA
job-managers and lets workflow systems call CEA by treating it as an
OGSA clone. The specializations that make CEA good for the virtual
observatory would appear in three places only:
- in the UI for the grid, where the compute jobs are set up;
- in the job descriptions passed around by the middleware;
- in the implementation of the job-manager services (i.e. in the
way that they connect to the business logic specified in the job
descriptions).
Notably, the specialization to astronomy must not appear in
the middleware that places job steps on job-managers and which manages
the orchestration of job-steps into workflows. It is only worth
converging CEA and OGSA if this separation is maintained; otherwise,
the two systems cannot use each others' parts.
For CEA to adapt easily to OGSA, CEA needs OGSA to meet the general
requirements listed in the previous section plus some special
requirements on the specification for the job-manager service.
- OGSA-standard job-description language
- There must be a job-description language such that each job and
job-step can be defined completely. The part of the grid executing the
job should never need to refer back to the end user in order to
schedule and execute the job (this covers batch jobs and excluded jobs
steered interactively via real-time connections). This removes the need
for callbacks from OGSA to the virtual observatory.
- Workflow descriptions
- The job description in the current CEA describes a workflow as a
job broken into a number of job-steps. A job-step is an atomic
operation that can be sumitted to a job-manager service. The steps are
arranged in a pattern of sequential and parallel operations. We need to
keep this feature if we change CEA to match OGSA. Therefore, we require
the OGSA job-submission language to be a workflow language.
- Extensible job-description language
- Each job description must contain fragments that set up the
job-steps. These fragments are application specific. In the current
CEA, they use XML vocabularies and we need to retain this feature.
Therefore, the job language in OGSA must have extension points where we
can add our own schemata.
- Pluggable data grid
- The
current CEA interfaces directly to the astronomy data grid; OGSA
will not, since the astronomy data grid will not change follow OGSA
standards. We need the data grid to support data-intensive computing.
Where the OGSA job-manager interface is implented by CEA, we can add
the connection to the astronomy data grid as an extra feature; the
location to be read and written in the data grid are then specified in
the application-specific part of the job description. However, this
feature would not work with pure-OGSA clients. Therefore, there may be
scope for an API for pluggable data-grids with very basic features:
essentially, copying whole files between data-grid locations and the
local file-system as seen by the business logic. Some research is
needed to see whether this pluggability is really feasible.
Caveat
We have little input so far from the working groups inside IVOA.
None of the group leaders expressed an urgent need to link the virtual
observatory with GGF-standard grids. Some of the group leaders noted a
lack of introductory material about GGF standards: it is hard to
understand how to use GGF standards and the full implications of
adopting them.
Thus, this document reflects the views of the author rather than
IVOA policy. I have tried to capture the mood and probable direction of
IVOA. The detailed suggestions for OGSA and CEA are my own ideas,
informed by discussions with colleagues in IVOA.
This is a static archive of the previous Open Grid Forum Redmine content management system saved from host redmine.ogf.org file /dmsf_files/7393?download=11587 at Fri, 04 Nov 2022 18:38:29 GMT