SAGA Use Case Template:
=======================

Name of use case: HSEP

Contact (name and address):

  Emmanuel Jeannot
  LORIA
  Campus Scientifique - BP 239
  54506 VANDOEUVRE-les-NANCY CEDEX
  France

Authors (if different from contact)

  Gerald Monard and Emmanuel Jeannot

1. General Information:
-----------------------

This section consists of check-boxes to provide some context in which
to evaluate the use case.

1.1 Which best describes your organisation:

  Industry [ ]
  Academic [X]
  Other    [ ]
  Please specify: ...................................

1.2 Application area:

  Astronomy         [ ]
  Particle physics  [ ]
  Bio-informatics   [ ]
  Environmental Sc. [ ]
  Image analysis    [ ]
  Other             [X]
  Please specify: (Quantum) Chemistry

1.3 Which of the following apply to or best describe this use case
    Multiple selections are possible, please prioritize with numbers
    from 1 (low) to 5 (high):

  Database                   [3]
  Remote steering            [ ]
  Visualization              [ ]
  Security                   [ ]
  Resource discovery         [ ]
  Resource scheduling        [ ]
  Workflow                   [1]
  Data movement              [2]
  High Throughput Computing  [4]
  High Performance Computing [5]
  Other                      [ ]
  Please specify: ...................................

1.4 Are you an:

  Application user            [ ]
  Application developer       [ ]
  System administrator        [ ]
  Service developer           [ ]
  Computer science researcher [X]
  Other                       [ ]
  Please specify: ...................................

2. Introduction:
----------------

2.1 Provide a paragraph introduction to your use case. Background to
    the project is another alternative. (E.g. 100 words).

  New developments in the field of theoretical chemistry require the
  computation of numerous Molecular Potential Energy Surfaces (PESs)
  to generate adequate quantum force field parameters. Because
  workstations alone cannot fulfill the requirements of these modern
  chemical advances, we have tackled this problem using several
  up-to-date computer science technologies such as a GridRPC
  middleware (DIET), molecular databases, and script interfacing.

2.2 Is there a URL with more information about the project ?

  ...

3.
Use Case to Motivate Functionality Within a Simple API:
----------------------------------------------------------

Provide a scenario description to explain customers' needs. E.g. "move
a file from A to B," "start a job." Please include figures if
possible. If your use case requires multiple components of
functionality, please provide separate descriptions for each
component, bullet points of 50 words per functionality are acceptable.

  We have decided to use the DIET (Distributed Interactive Engineering
  Toolbox, http://graal.ens-lyon.fr/DIET) middleware to port our
  application to our grid. The DIET environment is based on the
  GridRPC model.

  The application needs to store information about the computations
  that have to be performed and about their results. This information
  must be accessed very often and very efficiently. Therefore, we have
  decided to use MySQL databases for storing it.

  Since the DIET API is in C/C++, some Python scripts are used to
  interface DIET with the databases. These scripts extract requests
  from the databases, and filter and store results in them. Requests
  and results are transferred as XML files.

  On the computing side, Python scripts parse the XML request files
  and call the quantum chemistry (QC) solver with the right arguments.
  Several solvers can be used to compute ab initio or semiempirical
  energies (i.e., quantum chemical energies, or QCEs). Each time a
  compute node registers with the DIET environment, it tells DIET
  which kinds of operations it is able to perform.

4. Customers:
-------------

Describe customers of this use case and their needs. In particular,
where and how the use case occurs "in nature" and for whom it occurs.
E.g. max 40 words

  The customer provides a set of molecular conformations, and the
  system computes the Potential Energy Surface of the whole system.
  The system is then able to fit structural chemistry parameters.

5.
Involved Resources:
----------------------

5.1 List all the resources needed: e.g. what hardware, data, software
    might be involved.

  A PC farm and a PC server linked by a network.

5.2 Are these resources geographically distributed?

  Not yet, but they could be as soon as we are able to get through
  firewalls.

5.3 How many resources are involved in the use case? E.g. how many
    remote tasks are executing at the same time?

  So far we have 8 PCs, but we could have many more. At most one task
  runs on each resource at any given time.

5.4 Describe your codes and tools: what sort of license is available,
    e.g. open or closed source license; what sort of third party tools
    and libraries do you use, and what is their availability; do you
    regularly work from source code, or use pre-compiled applications;
    what languages are your applications developed in (if relevant),
    e.g. Fortran, C, C++, Java, Perl, or Python.

  We use the DIET middleware (open source) and a MySQL database. The
  code is in C; some scripts are in Python.

5.5 What information sources do you require, e.g. certificate
    authorities, or registries.

  None

5.6 Do you use any resources other than traditional compute or data
    resources, e.g. telescopes, microscopes, medical imaging
    instruments.

  None

5.7 How often is your application used on the grid or grid-like
    systems?

  [X] Exclusively
  [ ] Often (say 50-50)
  [ ] Occasionally on the grid, but mostly stand-alone
  [ ] Not at all yet, but the plan is to.

6. Environment:
---------------

Provide a description of the environment your scenario runs in, for
example the languages used, the tool-sets used, and the user
environments (e.g. shell, scripting language, or portal).

  ...

7. How the resources are selected:
----------------------------------

7.1 Which resources are selected by users, which are inherent in the
    application, and which are chosen by system administrators, or by
    other means? E.g. who is specifying the architecture and memory to
    run the remote tasks?
  No resources are selected by the user. Resources are servers that
  register with the environment.

7.2 How are the resources selected? E.g. by OS, by CPU power, by
    memory, don't care, by cost, frequency of availability of
    information, size of datasets?

  We use the GridRPC model but run the application in pull mode
  (desktop grid model).

7.3 Are the resource requirements dynamic or static?

  Static

8. Security Considerations:
---------------------------

8.1 What things are sensitive in this scenario: executable code, data,
    computer hardware? I.e. at what level are security measures used
    to determine access, if any?

  The results stored in the database are the added value of the
  application. Only the administrator can perform critical operations
  on it.

8.2 Do you have any existing security framework, e.g. Kerberos 5,
    Unicore, GSI, SSH, smartcards?

  None

8.3 What are your security needs: authentication, authorisation,
    message protection, data protection, anonymisation, audit trail,
    or others?

  None

8.4 What are the most important issues which would simplify your
    security solution? Simple API, simple deployment, integration with
    commodity technologies.

  None

9. Scalability:
---------------

What are the things which are important to scalability and to what
scale - compute resources, data, networks ?

  Since the task grain is relatively small, we need a sufficient
  amount of data in order to use all the resources.

10. Performance Considerations:
-------------------------------

Explain any relevant performance considerations of the use case.

  ...

11. Grid Technologies currently used:
-------------------------------------

If you are currently using or developing this scenario, which grid
technologies are you using or considering?

  GridRPC for a desktop grid application.

12. What Would You Like an API to Look Like?
--------------------------------------------

Suggest some functions and their prototypes which you would like in an
API which would support your scenario.
  It should enable data management (persistence and redistribution).

13. References:
---------------

List references for further reading.

  ...
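
Illustration for Section 3. The compute-side step described there
(a Python script parses an XML request and calls the QC solver with
the right arguments) might look roughly like the sketch below. The use
case does not specify the XML layout or the solver command lines, so
the element names (request, solver, method, geometry), the executable
names (qc_abinitio, qc_semiempirical) and the --method flag are all
hypothetical and chosen only for illustration. The sketch builds the
argument list but does not launch any solver.

```python
import xml.etree.ElementTree as ET

# Hypothetical request layout; the real schema is not given in the use case.
EXAMPLE_REQUEST = """\
<request id="42">
  <solver>semiempirical</solver>
  <method>AM1</method>
  <geometry>water_conf_001.xyz</geometry>
</request>
"""

# Hypothetical mapping from the kind of computation to a solver executable.
SOLVER_COMMANDS = {
    "abinitio": "qc_abinitio",
    "semiempirical": "qc_semiempirical",
}


def parse_request(xml_text):
    """Extract the fields of one XML request into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "solver": root.findtext("solver"),
        "method": root.findtext("method"),
        "geometry": root.findtext("geometry"),
    }


def build_command(request):
    """Turn a parsed request into an argument list for the solver."""
    try:
        executable = SOLVER_COMMANDS[request["solver"]]
    except KeyError:
        raise ValueError("unsupported solver: %r" % request["solver"])
    return [executable, "--method", request["method"], request["geometry"]]


def format_result(request, energy):
    """Wrap a computed energy in an XML result tied to the request id."""
    result = ET.Element("result", id=request["id"])
    ET.SubElement(result, "energy").text = repr(energy)
    return ET.tostring(result, encoding="unicode")


if __name__ == "__main__":
    req = parse_request(EXAMPLE_REQUEST)
    print(build_command(req))
    print(format_result(req, -76.4))
```

In an actual deployment the argument list would be passed to the
solver process, and the XML result would be sent back for filtering
and storage in the database; both steps are left out here because the
use case gives no details about them.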