-------------------------------------------------------------------- SAGA Use Case Template: ======================= Name of use case: RobGrid Contact (name and address): Stephane Vialle, SUPELEC, 2 rue Edouard Belin, 57070 Metz, France Authors (if different form contact) ................................ 1. General Information: ----------------------- This section consists of check-boxes to provide some context in which to evaluate the use case. 1.1 Which best describes your organisation: Industry [ ] Academic [X] Other [ ] Please specify: ................................... 1.2 Application area: Astronomy [ ] Particle physics [ ] Bio-informatics [ ] Environmental Sc. [ ] Image analysis [ ] Other [X] Please specify: Computing Science & Process control 1.3 Which of the following apply to or best describe this use case Multiple selections are possible, please prioritize with numbers from 1 (low) to 5 (high): Database [ ] Remote steering [ ] Visualization [ ] Security [1] Resource discovery [ ] Resource scheduling [ ] Workflow [ ] Data movement [ ] High Throughput Computing [ ] High Performance Computing [1] Other [ ] Please specify: Remote and redundant control of process (with time constraints) 1.4 Are you an: Application user [ ] Application developer [ ] System administrator [ ] Service developer [ ] Computer science researcher [X] Other [ ] Please specify: ................................... 2. Introduction: ---------------- 2.1 Provide a paragraph introduction to your use case. Background to the project is another alternative. (E.g. 100 words). This research project is about robot control across a Grid, first step toward a Grid solution for generic process control. A Grid can provide a homogeneous environment for computers from different sites. It can choose the most suitable machine for every task and transparently run redundant computations for critical operations, adding fault tolerance to the remote control. We built a Grid between France and Italy and successfully controlled a navigating robot and a robotic arm. Our Grid is based on the GridRPC paradigm, the DIET environment and an IPSEC-based VPN. We turned some modules of robotic applications in Grid services. Finally we developed a high-level API for specializing the GridRPC paradigm for our purposes, and a semantics for easy adding new Grid services. 2.2 Is there a URL with more information about the project ? incoming ... 3. Use Case to Motivate Functionality Within a Simple API: ---------------------------------------------------------- Provide a scenario description to explain customers' needs. E.g. "move a file from A to B," "start a job." Please include figures if possible. If your use case requires multiple components of functionality, please provide separate descriptions for each component, bullet points of 50 words per functionality are acceptable. - Customer uses a laptop+wifi to control a physical process and to be able to walk around during complex experiment. He runs computations on some powerful computing servers in a Grid, and does not care of the choice of the servers. - Customer runs a complex process control (physical process) composed of many modules. Each module is a service and a Grid middleware manage these services. The RobGrid API allows a smart programming of the application, running critical services several times in parallel (redundant computation) to ensure these operations will succeed. - Customer controls several physical processes and use several computers, but avoid to devote computers to particular processes. Any machine can be used to control a process, and a Grid middleware manages the matching of the resources and the process currently under control. - Customer needs to control a process in a hostil environment, with very few computing resources around. He needs to use remote computing resources. The API allows to easily write applications that overlap the "physical times" (time to move some part of the physical processes) with some communication times across Internet, and to access parallel computers when extra CPU power is needed. 4. Customers: ------------- - Customers of this use case can be computing researchers that aim to develop Grid solutions for process control. They can use our Grid and our API to implement their distributed and remote control. - Customers can be "end users": people that need to control some physical processes from a distant location, or to control a lot of processes spread in the world and a lot of computing resources at different locations. 5. Involved Resources: ---------------------- 5.1 List all the resources needed: e.g. what hardware, data, software might be involved. - 2 robots (one mobile robot and a robotic arm) - 1 cluster of 8 machines - 3 4-processor PC - 4 monoprocessor PC - 1 gateway PC for IPSEC VPN management on the main site - 1 firewall PC for IPSEC VPN management on the main site - 1 PC on each other site (3 other sites) - 1 bi-processor PC : grid server for DNS, Corba Name server, ... - DIET & OmniORB - IPSEC VPN - NONO & Koala libraries developed at Supelec (in collaboration with Loria) for robot management - RobGrid API (developed at Supelec) - Some Robotic applications developed at Supelc (in collaboration with ENSAM) 5.2 Are these resources geographically distributed? Yes : - University of Salerno : one grid computing server - 1 PC on Metz network ("wanadoo" provider) - 1 PC on Metz network ("free" provider) - Supelec Metz campus : 2 robots and the rest of the computing resources 5.3 How many resources are involved in the use case? E.g. how many remote tasks are executing at the same time? We have run until 18 services in the same benchmark (during the same application execution) We have run our application on 3 sites (simultaneously) among the 4 availables We have run our 2 robots simultaneously, and 18 services on 5 computing servers, more the firewall, the gateway and the computing server of the grid (DNS, ...), using 3 sites. 5.4 Describe your codes and tools: what sort of license is available, e.g. open or closed source license; what sort of third party tools and libraries do you use, and what is their availablility; do you regularly work from source code, or use pre-compiled applications; what languages are your applications developed in (if relevant), e.g. Fortran, C, C++, Java, Perl, or Python. Usually we use gcc and g++, we write C/C++ code, and all are open source codes. We use different tools to write UML documentation (including Rational Rose). We write and use source code and some pre-compiled versions of DIET. 5.5 What information sources do you require, e.g. certificate authorities, or registries. We have installed our VPN using certificates, and sometimes "secret strings" (when pb happen with certificates). 5.6 Do you use any resources other than traditional compute or data resources, e.g. telescopes, microscopes, medical imaging instruments. YES : 2 robots. One mobile robot: Koala robot of K-Team company, and one robotic arm from "Francaise d'instrumentation". 5.7 How often is your application used on the grid or grid-like systems ? [X] Exclusively ..... BUT: it was just a testbed for the development of our RobGrid API [ ] Often (say 50-50) [ ] Ocassionally on the grid, but mostly stand-alone [ ] Not at all yet, but the plan is to. 6. Environment: --------------- Provide a description of the environment your scenario runs in, for example the languages used, the tool-sets used, and the user environments (e.g. shell, scripting language, or portal). Linux, IPSEC VPN, OmniORB COrba Bus, DIET, Nono lib, Koala lib : robot management RobGrid API : management of robot on the Grid, allows to easily write Grid services for robot control Robotic application : navigation, landmark detection, self-localization, lightness measurement We shut down some services to simulate failures and to test our fault-tolerance mechanisms based on redundant Grid computations,and we measure execution times. 7. How the resources are selected: ---------------------------------- 7.1 Which resources are selected by users, which are inherent in the application, and which are chosen by system administrators, or by other means? E.g. who is specifying the architecture and memory to run the remote tasks? The robot(s) is(are) inerent to the application. For the moment the user choose the machines to use! But in the next version the user should choose only the sites to use for redundant computations, and the Grid middleware should choose the machines to use on these sites. The choice depends on the availability of the machines. Usually we do not look for powerful machine,s but for available machines for redundant computations. Ex: during the day machines at Metz on Internet and on the network of private provider (wanadoo, free, ...) and with limited Bw (128Kb/s, 512Kb/s) are more interesting than machines at the University of Salerno. During the night it is the opposite. We run many services (ex: 18 services, corresponding to 8 different services, and some services run redundantly). So we need many machines, but each machine can be a classical PC or a small multiprocessor machine. 7.2 How are the resources selected? E.g. by OS, by CPU power, by memory, don't care, by cost, frequency of availability of information, size of datasets? Main criteria are the OS and the availability of the machines (have different availability at different moment of the day), and the speed of the network to reach these machines. 7.3 Are the resource requirements dynamic or static? Dynamic: depends on the date (some machines aer needed during the day, and others during the night, for example), depends on the possible failures of the machines and of the network (we choose available" and "reachable" machines). 8. Security Considerations: --------------------------- 8.1 What things are sensitive in this scenario: executable code, data, computer hardware? I.e. at what level are security measures used to determine access, if any? Main sensitive point: if program and/or sensor output values are modified, then the robot can be damaged (physical damage!). So we need security. We have deployed a secure VPN: IPSEC with gateway and firewall, with data encryption, authentication (certificate based), limited login on our Grid ... "Full IPSEC install". 8.2 Do you have any existing security framework, e.g. Kerberos 5, Unicore, GSI, SSH, smartcards? IPSEC, ssh and certificates. 8.3 What are your security needs: authentication, authorisation, message protection, data protection, anonymisation, audit trail, or others? authentication, authorisation, message protection, data protection. 8.4 What are the most important issues which would simplify your security solution? Simple API, simple deployment, integration with commodity technologies. Simple deployment: IPSEC is integrated to Linux, but remains difficult to install on many sites. We succeeded on Salerno and Supelec. We failed on Potenza and Bucarest (UPB). We succeeded on wanadoo and free networks, but each time the machines were ebooted they changed their IP (dynamic IP on these private provider networks) and VPN config had to be updated .... 9. Scalability: --------------- What are the things which are important to scalability and to what scale - compute resources, data, networks ? For scalability experiment we need more robots (more physical processes to control), located on different sites ... and we need robotician/automatician partners to improve robotic applications. We need more sites that accept to deploy IPSEC (sometimes some change have to be done in the site security configuration). Of course we need more PC, but seems not a pb. 10. Performance Considerations: ------------------------------- Explain any relevant performance considerations of the use case. We have time constraints : when the local machines fail or are overloaded, we use remote machines (remote machines take the control automatically) and we have to deal with Internet communication times. We send camera image, infra red sensor output. When application slow down we can observe the robot slow down. We use to overlap communication time with computation times and with mechanical times. Each grid service is a kind of pipeline across Internet. At the end we measure execution time, global execution times on 24h (one benchamrk each 15 minutes, to not burn the robot! we have burnt 2 motors and one robot ...). We can observe that if services are on different machines, when machines fails, not all service fail, a,d so the global slowdown of the application is limited. Many services continue to run on local machines, and remote machines control only some parts of the application. Slow don is limited to these parts. Stategy seems OK. We try to learn what site is a good candidate to host redundant services at 8h, 12h, 16h, 20h, 24h 4h ... 11. Grid Technologies currently used: ------------------------------------- If you are currently using or developing this scenario, which grid technologies are you using or considering? We used to use a VPN based middleware and a GridRPC programming model, then we build our RobGridAPI : kind of distributed and remote object framework (like Java RMI). But in the future we would like to use a GridRPC formalism on a "Globus-like" middleware, to test another technology than VPN. We would like to test a middleware that does not need to have accounts on each site. 12. What Would You Like an API to Look Like? -------------------------------------------- Suggest some functions and their prototypes which you would like in an API which would support your scenario. GridRPC was Ok, but we would like it would be easier to initialize the system, to exchange data like camera images. We are experimenting ProCative (other project), the API seems interesting, bu we pefer C/C++ to deal with drivers of robot devices. RobGrid is RMI-like, so I think a GridRMI, in place of GridRPC could be interesting for our application. Seems more adaped to our use case. 13. References: --------------- List references for further reading. -For an introduction to the robotic application and the first parallelization we experimented : A. Siadat and S. Vialle (alphabetic order). Robot Localization, Using P-Similar Landmarks, Optimized Triangulation and Parallel Programming. Accepted in ISSPIT 2002: 2nd IEEE International Symposium on Signal Processing and Information Technology. Marrakesh, Morocco, December 18-21, 2002. - For an introduction to the RobGrid API: F. Sabatier and A. De Vivo and S. Vialle. Grid Programming for Disributed Remote Robot Control.. 13th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE-2004), Modena, Italy. June 2004. - Aa article about the entire project has been submited to EGC05 ... --------------------------------------------------------------------