GridIR Working Group
June 26, 2003
GGF8, Seattle

Presenters:
Kevin Gamiel (MCNC/CNIDR)
Gregory Newby (Arctic Region Supercomputing Center)
Nassib Nassar (Etymon)

Minutes: Sousan Karimi (MCNC/CNIDR)

Dr. Gregory Newby opened the GIR-WG session by introducing himself, Kevin Gamiel, Nassib Nassar, and Sousan Karimi. He reviewed the GGF Intellectual Property policy and noted that it can be found on the GGF web site. He then went over the meeting's agenda and the status of the requirements document, and invited the attendees to the GridIR demo.

Dr. Newby asked the participants for consensus on announcing the last call for the requirements document, and for their input on the GridIR architecture document. Mr. Gamiel clarified that the last call allows two more weeks for comments on the email list.

Dr. Newby next talked about the background of information retrieval (IR) and the importance of GridIR. He explained that by setting up a Virtual Organization (VO), organizations that did not want to make their information available through the web can now share it with others in the VO using GridIR.

Question: Does GridIR work better with static or dynamic information?
Answer: Dr. Newby: It works for both, but because of the Grid infrastructure it is more suitable for dynamic information.

Question: There were discussions in Chicago about the reason for having three services (Collection Manager, Indexer, and Query Processor); clarification was requested.
Answer: Mr. Gamiel: The requirements do not impose an architecture of three services per se. The intent is to provide the required functionality, and in the current architecture we have chosen to use these three services.

At this point Dr. Newby suggested that the audience read the posted requirements document in order to provide input. He mentioned that the document can be found on the GridIR website and that it will be available through GridForge soon.
There was little response from the attendees to the last call on the requirements document, but there were no objections to announcing the last call on the email list.

Dr. Newby explained that since GGF7 minor changes have been made to the document to clarify the language, and the security description has been enhanced. He said that security is mentioned because we think it is not addressed in some areas, and we need to identify those areas. The basic concern is that the desired function of GridIR is document sharing with the ability to control the level of sharing. Service-level and user-level control is provided by OGSA, so we get it for free, but we also want access-level and in some cases even document-level control, which we do not think will come from GGF.

Question: Do you have anything below the document level in mind?
Answer: In our requirements document we stopped at the document level.

Question: Our concern is mostly with patient data, which has many security and privacy issues. I am not sure that document-level control can address this.
Answer: It depends on how you define your document, and we are trying to use filters to solve some of these problems. However, security, and the level at which it is applied, is a hard problem, and it may be a task for another working group.

At this point Mr. Nassar said, "if you look at XML and the way subdocument is defined, it is very general and is a moving target." Ms. Ramakrishnan then explained that these issues are being discussed in the Security Working group.

Dr. Newby continued his presentation with a quick overview of the architecture. He said there are four main components in the GridIR architecture. The first is the Collection Manager, which acts like a gatekeeper and performs whatever transformation is necessary for the data to be used in VOs. He then moved on to explaining the Indexer and the Query Processor.
He stated that query processing can be done by an Indexer, since an Indexer can respond to a query, but the Query Processor can do more than just search and return the result: it can send one query to different indexers, merge the results, rank them, and so on. He stated that another important job of the Query Processor is to handle persistent queries, which you will see in the demo this afternoon. He then talked about the last component, the Security Service, and explained that it requires a fine level of control over data and methods.

Question: Is there a sub-document-level ACL?
Answer: Dr. Newby: We are not sure what level we should stop at.
Mr. Nassar: It is a moving target.
Ms. Ramakrishnan: It is just a change in policy representation and enforcement points.

At this point Mr. Nassar took over and went over the current architecture of GridIR. He said the presentation was put together by Mr. Jeremiah Morris and that some may have already seen it in the Tokyo session. (You can find this presentation at http://www.gridir.org/publications.html.) Mr. Nassar stated that Mr. Gamiel's group has been implementing these components and that it is, of course, a work in progress. He gave an overview of the architecture and talked about its different components. He said that the notification process is common to the three components, and that a VO could have one or more of these components on a machine. He said that when a Collection Manager is created, it is configured to retrieve data from local or remote points, and that it notifies the Indexer if any of the data has changed.

Mr. Gamiel offered an explanation of notification and stated that we want to support both push and pull methods, for example crawls as well as email lists. He said it could be tightly coupled with databases that know when data has changed; standing queries then need to be processed, and the Query Processor re-executes them when it is notified.
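
The fan-out, merge, and standing-query behavior described above can be illustrated with a minimal Python sketch. This is not the working group's implementation or API. Every class and method name here (Indexer, QueryProcessor, update, register, results) is hypothetical, naive term counting stands in for relevance ranking, and scores are assumed to be comparable across indexers, which a real system cannot take for granted.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    score: float  # assumed comparable across indexers (a simplification)


class Indexer:
    """Hypothetical indexer: stores documents and answers keyword queries."""

    def __init__(self):
        self.docs = {}
        self.listeners = []  # callbacks invoked whenever the index changes

    def update(self, doc_id, text):
        # Stand-in for a Collection Manager pushing changed data.
        self.docs[doc_id] = text.lower().split()
        for notify in self.listeners:
            notify()

    def search(self, query):
        terms = query.lower().split()
        hits = []
        for doc_id, words in self.docs.items():
            score = sum(words.count(t) for t in terms)  # naive term count
            if score > 0:
                hits.append(Hit(doc_id, float(score)))
        return hits


class QueryProcessor:
    """Hypothetical query processor: fans out, merges, ranks, re-executes."""

    def __init__(self, indexers):
        self.indexers = indexers
        self.standing = {}  # standing query -> latest merged results
        for ix in indexers:
            ix.listeners.append(self._on_change)

    def search(self, query):
        merged = {}
        for ix in self.indexers:  # send the same query to every indexer
            for hit in ix.search(query):
                best = merged.get(hit.doc_id)
                if best is None or hit.score > best.score:
                    merged[hit.doc_id] = hit
        # Rank by descending score; break ties by document id.
        return sorted(merged.values(), key=lambda h: (-h.score, h.doc_id))

    def register(self, query):
        # Persistent (standing) query: results are kept up to date.
        self.standing[query] = self.search(query)

    def _on_change(self):
        # Re-execute every standing query when an indexer sends a notification.
        for query in self.standing:
            self.standing[query] = self.search(query)

    def results(self, query):
        return self.standing[query]


a, b = Indexer(), Indexer()
qp = QueryProcessor([a, b])
a.update("a1", "grid information retrieval")
qp.register("grid")
b.update("b1", "grid grid services")  # triggers re-execution
print([h.doc_id for h in qp.results("grid")])  # ['b1', 'a1']
```

The client never polls in this sketch: it registers a query once and reads the refreshed results later, which mirrors the push side of the push and pull methods mentioned above.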
The Query Processor combines all of the data from the indexers and sends it to the client/user. He then talked about the importance of relevance for IR and stated that it is, of course, a fuzzy notion, and databases do not usually have fuzzy notions. Ranking based on relevance does not have a fixed way of representing the response set. He went on to say that queries on different search engines give different results, since there are different documents and different ways to compute relevance, and that there is overlap with DAIS when we formulate queries or specifications.

Question: GridIR assumes these are dumb sources, but databases are smart sources, so should the architecture allow for these?
Answer: Dr. Newby: We are accounting for that to some extent, but maybe we need to look into it more.
Mr. Nassar: Relational databases are somewhat out of scope.
Mr. Gamiel: DAIS may be focusing on this issue. However, the Query Processor and the Indexer both have the same interface, so a user could query either, and chaining of Query Processors is possible. We see the Query Processor becoming a more sophisticated service than it is now, handling the merge and other functions.

Question: Is ranking and ordering of results part of the model? There may be situations where a user wants to change the relevance.
Answer: We are trying to give you a model in which you can solve such problems. The Query Processor can be implemented in such a way.

Question: Do you have an API?
Answer: We need to come up with that in the specification document. We will say everything you can do, but not all of it will necessarily need to be supported. We are trying to grid-enable the information retrieval system.
Dr. Newby: Effective merging of results is the least solved problem right now. GridIR will lay the foundation for more research on this.
Mr. Nassar: The reason we separated the Indexer from the Query Processor is to be able to solve these problems.
Dr.
Newby: We are very excited about the Query Processor and the chaining process, for example creating a set from the result of a query and then being able to query that set.

Mr. Gamiel gave an example of how a user can construct a query, submit it, and come back after a week to get the updated results. Clients can come and go as they want, and this is all secure. A user can also create the Query Processors and Indexers he or she is authorized to create, based on the OGSA model of factories, instances, etc. Users can dynamically create information retrieval systems or parts of them.

Next, Mr. Gamiel requested that everyone subscribe to the mailing list and explained that those who were on the old mailing list hosted by MCNC should now switch to the one offered through GGF. He asked the attendees to send their comments on the requirements document and on the architecture document, even though the latter is a rough draft.

Dr. Newby then presented the conclusion slides. He noted that how security and logging should be handled has not yet been defined, and that we do not want to overload the requirements. He said we would like to have other reference retrieval or parts of it, and asked the attendees to contact the Chairs or others to contribute to the reference implementation of GridIR.

Next, Dr. Newby talked about plans for GGF9. He said that by then the requirements will be out the door, and we will possibly have a final architecture document and the first draft of the specification document. He then asked for volunteers for coordination with PA, DAIS, and FTP. He invited everyone to attend the demo at 2 pm in the Aspen room.