Common Practices in the Life Science Grid

Life Science Grid - Research Group
_________________________________________________________

Table of Contents

1. Areas
2. Practices
     2.1. Introduction
     2.2. Data
          2.2.1. Information Integrator
          2.2.2. OGSA-DAI
     2.3. Industry Standards
          2.3.1. HL7
     2.4. Non-Globus
          2.4.1. Ensembl
     2.5. Workflows
          2.5.1. Workflow Survey
          2.5.2. Legacy Applications
3. Summary and Conclusions
4. Groups
5. References

With efforts spread over countless fronts, an individual or single
group within a life science or health care grid is often unaware of
well-tested courses of action. The purpose of this document is to give
other members of the Life Science community involved in grid
activities the benefit of existing experience.

The ultimate goal is for this document to become a GGF Informational
(TODO: Common Practice?) Document. Whether this is done within the
LSG-RG or within a separate working group remains to be seen. Wherever
its home, this document will need to be reviewed and updated regularly
to accurately reflect the "Best Practices" of the grid community.

Naturally, any attempts to describe common grid practices will overlap
to some degree. Therefore, where possible, we point to similar efforts
within the Grid community.

This document does not endorse any of the mentioned solutions,
applications, or frameworks. Nor should it simply be a listing of
projects by life science groups. Rather, it consists of distilled
experiences, applicable to a wide range of users and intended for the
common good.
_________________________________________________________

1. Areas

Each of the common practices gathered in this document can be grouped
under one or more general areas of research. To help organize this
document, those areas are described here. Entries should be assigned
to a single category; additional areas of concern can be listed in the
index.
New areas or refinements of existing classifications can be added at
any time.

Area: Benefits
Description: What, in your experience, are the benefits of having grid
resources in the life/health sciences? Is it worth the cost?

Area: Creating a Grid
Description: The basics of setting up a grid infrastructure. This
includes installing Globus, Unicore, or any other middleware.
Further,...

Area: Data
Description: A large issue :)

Area: Industry Standards
Description: Choosing and using industry standards can make or break a
project. Which standards are a MUST, a SHOULD, or a SHOULD NOT?

Area: Non-Globus
Description: There are many projects that purposefully avoid the
Globus toolkit. Here's why.

Area: Security
Description: Security is the Achilles' heel of many projects. What
have you done to take care of this? Did waiting until late in the
project to start securing it harm you?

Area: Services
Description: Grid services and/or services available on the grid.
Services embody the attempt of groups to provide encapsulated units in
the spirit of a service-oriented architecture (SOA). What services can
be provided easily? Which are already provided? How does an SOA force
a group to think differently?

Area: Workflows
Description: Several groups are working on recording and automating
multi-step analyses of various inputs. Each of these efforts has its
own benefits. Which do you choose? Can you use multiple workflow
platforms simultaneously? Would a scripting environment work better
for you than a visual tool?

Area: Resources
Description: A general listing of beneficial (or perhaps even
to-be-avoided) resources that are available. Groups often begin
bootstrapping their own infrastructure without knowing what already
exists. Let's help one another to stand on the shoulders of giants.
_________________________________________________________

2. Practices

2.1. Introduction
_________________________________________________________

2.2. Data

2.2.1. Information Integrator

TODO (Andy):
_________________________________________________________

2.2.2. OGSA-DAI

TODO: What type of installation do you have? With how many people can
you share the instance? Data sizes? Are there any problems?
_________________________________________________________

2.3. Industry Standards

2.3.1. HL7

TODO (Jill?): What groups are using HL7? Has it been useful?
_________________________________________________________

2.4. Non-Globus

2.4.1. Ensembl

TODO (Chris?): (http://www.ensembl.org) genome annotation pipeline
_________________________________________________________

2.5. Workflows

2.5.1. Workflow Survey

http://www.extreme.indiana.edu/swf-survey/
_________________________________________________________

2.5.2. Legacy Applications

The development of a workflow platform is a considerable undertaking.
Several such projects predate web and grid services and generally used
a closed-world model that is not easily portable to today's SOA. These
standalone applications may have been feature rich, but in the long
run they suffer from missing out on current developments. Such legacy
applications need to be brought into the modern world. (TODO: ibios?)
_________________________________________________________

3. Summary and Conclusions
_________________________________________________________

4. Groups

During the initial phases, this document will be strongly linked to
the groups who have submitted common practices. Therefore, a listing
of the groups who have taken part in this study follows. As best
practices are abstracted from the common practices, this information
can be re-evaluated.

A quick and simple method to initially make workflows available as
services is ...

Note: This should be a subset of the POC listing available from the
LSG-RG.

Group: IBIOS
Contact: http://www.dkfz.de/ibios
_________________________________________________________

5. References

LSG Points of Contact (POC).
https://forge.gridforum.org/projects/lsg-rg/document/LSG-POC/en/3