OGSA HPC Profile Minutes – Fri Aug 25 2006 Participants: Marty Humphrey (University of Virginia) Marvin Theimer (MSFT) Glenn Wasson (University of Virginia) Michel Drescher (Fujitsu Europe) Minutes: Marty Humphrey * Summary of Actions: AI-HPCP-0825a: Everyone will re-read Donal's version of the HPC Profile Application in the JSDL section of Gridforge by Friday Sept 1 call. AI-HPCP-0825b: Check with BES doc and/or WG to determine the "final word" on issues of idempotency and subscriptions (and others). AI-HPCP-0825c: Marvin will suggest phrasing re: vectors to the email group for consensus. AI-HPCP-0825d: Marvin will mopdify HPC Profile doc to resolve Chris' comments, restrict to integer the total CPU count, total resource count and individual CPU count. AI-HPCP-0825e: Marty to send email to BES/JSDL/HPCP: "We are looking for prototype and interop participants." * Previous minutes: Approved * Review of Actions AI-HPCP-0818a: Donal Fellows to review HPC Profile Application. DONE. On Mon Aug-21, Donal Fellows uploaded a new version to the JSDL space in Gridforge. The main modification is a more comprehensive security section. Our group discussion in today's telecon on this subject resulted in AI-HPCP-0825a: AI-HPCP-0825a: "Everyone" will re-read it (in the JSDL section of Gridforge). On the Sept 1 call, we will determine (via vote) if we believe that it's ready for ratification via the JSDL group and the large OGF at large. In short, Michel suggested that we keep it in the HPCP gridforge section while it is being actively worked on and transition it to the JSDL section when it's ready for "normative radification". Because it's currently in the JSDL forge section, we will only "pull it back" to HPCP if, upon reading it, we decide (collectively) that it needs modifications (see AI-HPCP-0825a). AI-HPCP-0818b: Marvin Theimer will add paragraph to HPC Basic Profile stating: "all other elements of the JSDL 1.0 MAY appear but MAY also result in a 'not implemented' fault, but only the list here must be supported" DONE. This is discussed in this telecon, below. AI-HPCP-0818c: Marvin Theimer will add first draft of section 4 to HPC Basic Profile that restricts BES operations to singletons DONE. This is discussed in this telecon, below. * Go over changes to HPC Profile that Marvin sent out ----------------------------------------------------- Marvin: Before giving an overview of the modifications, there's an issue/question to BES guys: what about idempotency and subscriptions (and perhaps other issues)? That is, this was either in a previous version of the BES space *OR* explicitly mentioned to be included in a future version (but it's not mentioned in the current BES spec). We understand that this will appear as an option in the BES spec, but this has implications on the HPC Profile (e.g., we might like to restrict this and/or claim that such considerations are out-of-scope). Marty: Yes, we knew this going in -- that JSDL is "easier" in that it's fixed in stone, while BES is a moving target. This makes profiling difficult. AI-HPCP-0825b: Check with BES doc and/or WG to determine the "final word" on this issue. Marvin gives an overview of his modifications, beginning with the proposed text in the Intro paragrph to Section 3 (consensus: working is fine). Marvin described Mods to Section 4: changed the intro, and vectors must be exactly length 1. The group discussed at depth the issue of the treatment of vectors Initially, the group wanted to insert something akin to Michel's proposed test in email 8/25 Glenn: isn't there potential confusion: the client sends a vector of size greater than 1... what exactly does the service do? Does the service have a choice which one to choose? Marvin: BES needs to deal with this case, although there are two different classes of faults. There is a BES fault - "error in inputs (the following inputs -- activity IDs -- were unknown)" and an HPC Profile fault ("the vector is too large") Glenn: suggests that the base profile should say 1 (and return a size of 1) and greater than 1 is a fault; extension: here's how it maps differing size inputs to differing size outputs Marvin: asks for clarification: extension to BES, right? this idea that "I don't return exactly what was specified" is a BES concept Glenn: yes Michel: saying that you'll return a vector of size 1 retricts composability; instead, return a vector of the same size Marvin: Good point -- it may handle things of size greater than one -- if so, then it must return a vector of the same size. But it may fault. It MUST handle a vector of size 1. If it handles a vector of size larger than one, then it would publish an attribute indicating so. Michel suggests wording that the client "SHOULD" try to use a vector of size 1 but the group consensus is (while the intent is good) that this is not the best wording because it could be interpreted to make clients ONLY work in terms of vectors of size 1 -- this is not the intent and might restrict clients. AI-HPCP-0825c: Marvin will suggest phrasing and send it to the email group for consensus. Marty to make this into a tracker. this concluded the discussion of vectors. Marty: What about total CPU count? "non-exact" clearly has value, but it just seems too complicated for interop, for compliance Marvin: In addition, not all schedulers implement a range, so let's make it exact. Marty: Section 3.2.5.8. What is "total resource count"? Michel: this is for "tiling" - e.g., 5x10processes. Marty: Let's make this exact as well. Group consensus is that this is reasonable. Marvin: some of these issues can arise and be resolved during implementations as well Marty: what about Chris' comments? Re: RAM... "total physical memory" is certainly easier. Michel: JSDL had long discussions... it comes down to requirements and capabilities. Marvin: this must be clarified in the HPC profile. in our JSDL section, make these appear as requirements. That is, the phrasing of the text in the JSDL section of the HPC Profile should clearly indicate that they're *requirements* (not capabilities). In the BES section of the HPC profile, make the text clearly reflect that these are capabilities (not requirements). Michel: Section 3 of HPCP is clear. Section 4 needs clarification: if BES exposes this via JSDL terms, then we need to clarify that these are capabilities, which are static values. Marvin: 3.2.5.6 and 3.2.5.7 need to have clarifications changed. Glenn: we need to restrict to integer in the total CPU count, total resource count and individual CPU count Group consensus on this AI-HPCP-0825d: Marvin will resolve Chris' comments by making clarifications that these are requirements. In addition, Marvin will change the document to restrict to integer the total CPU count, total resource count and individual CPU count * Discuss prototyping and interop opportunities and next-steps. --------------------------------------------------------------- The group agreed that the HPC Profile document is reasonably close to completion, so it was suggested that we start developing prototypes and perform interop testing to further disambiguate the document (and start developing a compliance suite) Michel: this document should get into public comments first, so we should wait before creating implementations Marvin: I propose something slightly different... I propose that we put up for public comment and in parallel do implementations... this is primarily due to time contraints. Marty: Public comment is good, of course, but can we do anything to ensure that the vendors are engaged/committed? Marvin: I will send it directly to Sun, Altair, Condor, Globus, ESI, etc.... "please please review this carefully and contribute..." Marty: if we get Marvin's changes in the BES (and "security section"), we will SUSPEND development of the HPCP docment AI-HPCP-0825e: Marty to send email to BES/JSDL/HPCP: "We are looking for prototype and interop participants. We will be start to discuss compliance suite. This will be the agenda for next week's call. first round of interop will be attempted BEFORE ggf18 (exactly what? we're not sure). Two phases: phase 1: quick, phase 2: for interop after GGF18. Discussion fo extensions can be accomodated RIGHT NOW in parallel to this -- to keep the momentum going. Meeting concluded. Next call: Friday Sep 1 2006 7am Pacific time.