Hi, This note is for those folks attending the DRMAA working group meeting that took place on May 13 in California, New York, and Germany... For tomorrow's meeting, I think it might be useful if we focused on the specific API calls. To that end, I've tried to abstract away all the mundane details by converting the API from C code into pseudo-code. I also updated it based on today's discussions. (Note: I only updated section 3.1 in the attached doc.) In particular, rather than discuss overall intent, I think it would be useful to try and phrase issues/questions as suggested changes and additions to the API. For example, instead of saying, "I think there should be an asynchronous callback interface", one would propose a modification to the current API spec, e.g., "I'd like to add the following routine 'drmaa_register_callback(job_id, action, callback_func)', with arguments defined as follows ... that behaves as follows ...". Of course, before we start modifying the API, we should probably finish fleshing it out... See you tomorrow, - bill -----Original Message----- From: owner-sched-wg@gridforum.org [mailto:owner-sched-wg@gridforum.org]On Behalf Of Rajic, Hrabri Sent: Wednesday, May 08, 2002 9:06 AM To: Grid Forum Scheduler Area Mail List (sched-wg@gridforum.org) Subject: DRMAA document sched-drmaa-1.3 Here is the updated document that has sections on job categories and native resources contributed by Andreas. I have incorporated comments from the last DRMAA telecon as well. <> <> Note 1. The text version is produced by converting the MS Word document to text version with line breaks. Note 2. Our goal is DRMAA specification 1.0. I am creating new DRMAA document versions as we make progress. Should we put version 0.6 or so in the document title to reflect the state of the spec completion? -Hrabri sched-drmaa-1_4.doc /* ---------- Initialization & Exit Routines ---------- */ drmaa_init(contact) IN contact /* contact information for DRM system (string) */ Initialize DRMAA API library and create a new DRMAA Session. 'Contact' is an implementation dependent string which may be used to specify which DRM system to use. This routine must be called before any other DRMAA calls, except for drmaa_version(). drmaa_exit( ) Disengage from DRMAA library and let the DRMAA library clean up any objects it created. Once this routine is called, no more DRMAA API calls can be made. This routine ends this DRMAA Session, but does not effect any jobs (e.g., queued and running jobs remain queued and running). drmaa_version(major, minor) OUT major /* major version number (non-negative integer) */ OUT minor /* minor version number (non-negative integer) */ Returns the major and minor version numbers of the DRMAA library; for DRMAA 1.0, 'major' is 1 and 'minor' is 0. /* ---------- Job Template and Job Submission Routines ---------- */ drmaa_allocate_job_template( ) RETURNS /* new job template (opaque handle) */ Allocate a new job template. drmaa_delete_job_template(jt) INOUT jt /* job template (opaque handle) */ Deallocate a job template. This routine has no effect on jobs. drmaa_set_attribute(jt, name, value) INOUT jt /* job template (opaque handle) */ IN name /* attribute name (string) */ IN value /* attribute value (string) */ Adds ('name', 'value') pair to list of attributes in job template 'jt'. The following are reserved attribute names available in all implementations of DRMAA v1.0 (and their respective meanings): remote command to execute, including the input parameters job state at submission (suspended, on hold, active) job environment job working directory wall clock time limit job category The following are reserved attribute names available which are not required to be implemented by a conforming DRMAA v1.0 implementation. For attributes that are implemented, the meanings are required to be as follows: standard input, output, and error streams e-mail to report he job completion and status input/output files to be staged and a parameter denoting shared or distributed file system job name to be used for the job submission ([A-Za-z0-9_]+) start after NOTE: We may break each of the above out into a separate set/get routines for improved type checking (and ease of passing complex parameters). They are listed in the form above as a short-hand. As with all naming issues, it is proposed to postpone this decision. drmaa_get_attribute(jt, name, value) IN jt /* job template (opaque handle) */ IN name /* attribute name (string) */ RETURNS /* attribute value (string) */ If 'name' is an existing attribute name in the job template 'jt', then the value of 'name' is returned; otherwise, NULL is returned. ISSUE: Do we want to allow one to get the attributes associated with a job category? What about just getting a list of available native attributes? drmaa_run_job(job_id, jt, synchronous) OUT job_id /* job identifier (string) */ IN jt /* job template (opaque handle) */ IN synchronous /* block until action completes (boolean) */ Submit a job with attributes defined in the job template 'jt'. The job identifier 'job_id' is a printable, NULL terminated string, identical to that returned by the underlying DRM system. If 'synchronous' is TRUE, this routine will not return until the job has been submitted (or an error occurs). drmaa_run_bulk_job(job_id, jt, njobs, synchronous) OUT job_ids /* job identifiers (array of strings) */ IN jt /* job template (opaque handle) */ IN njobs /* number of jobs (non-negative integer) */ IN synchronous /* block until action completes (boolean) */ Submit a set of 'njobs' jobs, each with attributes defined in the job template 'jt'. The job identifiers 'job_ids' are all printable, NULL terminated strings, identical to those returned by the underlying DRM system. If 'synchronous' is TRUE, this routine will not return until all the jobs have been submitted (or an error occurs). ISSUE: How do you pass different parameters to each job, or otherwise differentiate them? /* ---------- Job Control Routines ---------- */ ISSUE: Do we want to add a timeout argument to the routines that might block? drmaa_getpid_status(job_id, exit_code) IN job_id /* job identifier (string) */ OUT exit_code /* exit code of job (integer) */ Wait for the job identified by 'job_id' to exit. The exit status of the job is returned in 'exit_code'; note that this status is machine dependent. drmaa_control(job_id, action, synchronous) IN job_id /* job identifier (string) */ IN action /* control action (const) */ IN synchronous /* block until action completes (boolean) */ Start, stop, restart, or kill the job identified by 'job_id'. If 'job_id' is DRMAA_JOB_ID_ALL, then this routine acts on all jobs *submitted* during this DRMAA session. The legal values for 'action' and their meanings are: DRMAA_CONTROL_SUSPEND: stop the job, DRMAA_CONTROL_RESUME: (re)start the job, DRMAA_CONTROL_HOLD: put the job on-hold, DRMAA_CONTROL_RELEASE: release the hold on the job, and DRMAA_CONTROL_TERMINATE: kill the job. If 'action' is DRMAA_CONTROL_RESUME and 'synchronous' is TRUE, this routine will not return until the requested action has been completed (or an error occurs); otherwise, 'synchronous' has no effect. drmaa_synchronize(job_ids) IN job_ids /* job identifiers (array of strings) */ Wait until all jobs specified by 'job_ids' have finished execution. If 'job_ids' is DRMAA_JOB_IDS_ALL, then this routine waits for all jobs *submitted* during this DRMAA Session. drmaa_wait(job_id, exit_code, options, rusage) IN job_id /* job identifier (string) */ OUT exit_code /* exit code of job (integer) */ IN options /* options (integer) */ OUT rusage /* resource usage (???) */ Wait for the job identified by 'job_id' to exit. The exit status of the job is returned in 'exit_code'; note that this status is machine dependent. Legal values for 'options', and their meanings are: ???. The resource usage of the job is returned in 'rusage'; note that this value is also machine dependent. drmaa_job_ps( char *job_id, int *remote_ps ); IN job_id /* job identifier (string) */ OUT remote_ps /* program status (constant) */ Get the program status of the job identified by 'job_id'. The possible values returned in 'remote_ps' and their meanings are: DRMAA_PS_UNDETERMINED: process status cannot be determined, DRMAA_PS_QUEUED: job is queued, DRMAA_PS_SYSTEM_SUSPENDED: job is system suspended, DRMAA_PS_USER_SUSPENDED: job is user suspended, DRMAA_PS_RUNNING: job is running, DRMAA_PS_DONE: job finished normally, and DRMAA_PS_FAILED: job finished, but failed. /* ---------- Auxiliary Routines ---------- */ PROPOSAL: Postpone discussions of error handling and tracing until we get everything else in better shape. In C, we would probably want to return 0 on success and an errno on failure, except when we're returning a structure pointer (when we can't return an errno). Of course, in C++, we might want to use real exception handling... drmaa_set_trace_file(file_name) IN file_name /* File name (string) */ Specify a file for tracing. By default, all tracing information is written to stderr. drmaa_trace_text(text) IN text /* Message to be logged in trace file (string) */ Write 'text' into the trace file. drmaa_perror( char *text); IN text /* Error message to be logged in trace file (string) */ Record the error message 'text' in the trace file. char *drmaa_strerror (int error); IN errno /* Error number (integer) */ RETURNS /* Readable text version of error (constant string) */ Get the error message text associated for the error number*/