Lauri Loebel Carpenter
October, 2004
Updated: February, 2005
Updated: March, 2005

Sam Request System User Requirements

For the design of the Oracle tables that will be used to implement the Sam Request System, see the Sam Request System Table Design document.

Goals:

SAM is a data handling system, not a work-flow management system. However, the bookkeeping required for managing the work flow of SAM-related data processing is closely coupled with existing SAM design concepts. Hence, it is desirable to include within SAM the necessary infrastructure and components such that external work-flow handling software can use the SAM tools and concepts to organize the execution of SAM-related jobs.

Conceptually, the user makes a data processing request (e.g., reconstruction of raw data, or generation of Monte Carlo events), which includes all necessary data about how the user wishes the processing to be done. The request is stored in the SAM database. An external work-flow management system is responsible for handling the request, using the appropriate interfaces within SAM to "mark" the request as "being handled" or "has been handled already", etc.

In other words, SAM provides a place for "bookkeeping" the status of a user-generated request, and the interfaces and guidelines for how to utilize this "log book" of requests so that data integrity is maintained across all requests. However, the interface is provided to an external work-flow manager, which is not contained within the scope of SAM-supported tools. It is not SAM's responsibility to synchronize the work-flow of processing jobs; it is SAM's responsibility to provide a tool that is compatible with the SAM design and which can be used by a separate work-flow management package to perform this function.

So why use SAM at all for this purpose?

  1. Much of the information contained within the request is "file metadata", and as such, is best stored within the master "metadata catalog" (SAM). (That is, do not reinvent the wheel by creating a second type of "metadata storage system").
  2. The SAM infrastructure is robust and well-tested, and all nodes participating in SAM data processing must by definition be able to communicate with the SAM database. Hence, no new network configuration or application information is required beyond that which is already available on a SAM-enabled node.
  3. History. We should not remove existing functionality. We are already providing this function for the D0 reconstruction farms, and for Monte Carlo event generation; however, the current implementation is quite fragmented and not well-designed.

Definition of Terms:

  • Processing Request: a request to process an existing SAM dataset using the SAM machinery for job submission, file delivery, etc. (e.g., run one or more projects against the dataset). A processing request must contain the following information:
    • user who is submitting the request
    • work group to which the user belongs and which should be charged for any resources used in processing the request
    • dataset specification, defining the files to be processed
    • application family to be used for each datatier of output file
  • Simulation Request: a request to generate a number of Monte Carlo events with specified parameters, and store the output metadata in SAM. A simulation request must contain the following information:
    • user who is submitting the request
    • work group to which the user belongs and which should be charged for any resources used in processing the request
    • number of events to be generated
    • all parameter values required by the event generator, in a format suitable for storage in the SAM database. (It is up to the work-flow management software, or some other package, to perform any conversion to/from SAM database format to the format required by the external event generation tools).
    • application family to be used for each datatier of output file
  • RequestId: the unique identifier of a user request. The requestId is generated by SAM when the request is created.
  • RequestType: the type of request that has been made (Processing Request, or Simulation Request).
  • RequestStatus: the current state of an existing Request, corresponding to one of the following states:
    • new (new request, still subject to tweaking and modification)
    • pending (new request, finalized and no longer modifiable)
    • approved (approved for handling)
    • hold (not approved for handling)
    • partial (in the process of being handled, but not yet complete)
    • complete (has been completely handled)
    • terminated (cannot be completed for some reason)
  • RequestData: the base data associated with every request, including who created the request, when the request was created, what type of request it is, and the request status
  • RequestParameters: the metadata associated with an event-generation request (that is, the parameters to be passed to the event generator and subsequent processors)
  • RequestHandlingDetail: information provided to SAM by an external work-flow manager describing the particular details of how a request is being handled. There may be many requestHandlingDetails associated with a particular request (as in the case where several different sites produce some portion of the requested events, or where several projects need to be run in order to complete the processing of a particular dataset, etc.).
  • RequestDetailStatus: the current state of an existing RequestHandlingDetail, corresponding to one of the following states:
    • assigned (a requestHandler has been assigned to this request)
    • running (the requestHandler has begun handling the request)
    • complete (the requestHandler has completed its portion of the request)
    • terminated (the requestHandler cannot complete its portion of the request for some reason)
  • ArchivedRequest: a request that has been marked as being "archived", meaning that it is removed from the list of actively considered requests. This should help performance for systems with many requests over time; as the requests are handled and completed, they can be archived so that they do not contribute to lists of requests that are "open", "completed", "in process", etc.
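
As an illustration only, the terms defined above could be modeled roughly as follows. This is a minimal Python sketch, not the SAM implementation: the class names, field names, and the layout of the parameters dictionary (category or data tier, then parameter name) are assumptions made for the example. Only the status values are taken from the lists above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class RequestType(Enum):
    PROCESSING = "processing"
    SIMULATION = "simulation"


class RequestStatus(Enum):
    NEW = "new"                # still subject to modification
    PENDING = "pending"        # finalized, no longer modifiable
    APPROVED = "approved"
    HOLD = "hold"
    PARTIAL = "partial"
    COMPLETE = "complete"
    TERMINATED = "terminated"


class RequestDetailStatus(Enum):
    ASSIGNED = "assigned"
    RUNNING = "running"
    COMPLETE = "complete"
    TERMINATED = "terminated"


@dataclass
class RequestHandlingDetail:
    detail_id: int
    handler: str  # hypothetical field: name of the site or handler doing the work
    status: RequestDetailStatus = RequestDetailStatus.ASSIGNED


@dataclass
class Request:
    # RequestData: who made the request, what type it is, and its status.
    request_id: int
    request_type: RequestType
    user: str
    work_group: str
    status: RequestStatus = RequestStatus.NEW
    # RequestParameters, keyed by category or data tier, then by name (assumed layout).
    parameters: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # One request may have many handling details.
    handling_details: List[RequestHandlingDetail] = field(default_factory=list)
    archived: bool = False

    def is_modifiable(self) -> bool:
        """Only a request in the 'new' state may still be modified."""
        return self.status is RequestStatus.NEW
```

Keeping the status values in enumerations means that a misspelled status is rejected when it is read, rather than being silently stored.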

User Requirements:

  1. System must be able to handle both types of requests (processing and event-generation).
  2. All information about a request must be obtainable from the unique RequestId associated with that request, including the requestData and requestParameters.
  3. Application family information will be maintained in the RequestParameter dictionary for both types of requests.
  4. All information about the handling of a request must be obtainable from the requestHandlingDetailIds associated with the requestId.
  5. Users must be able to create a new request by "cloning" an existing request and "tweaking" it; the original requestId being cloned should be stored as part of the new request information.
  6. Users must be able to update the requestParameters, or completely delete a request, until the request has been set to the 'pending' status and is ready to be submitted to a requestHandler. Once a request is eligible for submission to a requestHandler, the requestParameters must be "frozen", that is, no longer open to modification or removal from the system.
  7. A request may be "archived", that is, removed from active consideration. Requests that are "archived" will not show up in any displays of request information, unless specifically requested.
  8. When a file is stored in SAM, if the metadata for that file includes the specification of a requestId, then the following rules apply to the metadata for that file:
    • the metadata for the dataTier of the file being declared must include the metadata for that dataTier from the original request, with all parameterValues being equal. The metadata dictionary for this dataTier may be a superset of the original paramRequestDictionary for this dataTier, but it must include all of the original params for this dataTier.
    • the paramRequestDictionary for this file will be merged with the dataFileParamsDictionary for the file, so that all requestParams become dataFileParams when the file is declared (including the requestParams for dataTiers that do not match the dataTier of this particular file).
    • the initial implementation will not allow the dataFileParams to modify any paramValue specified in the requestParams (except for category 'global'); however, it will be implemented such that it is easily changed if there is a use case where dataFileParams should override requestParams.
    • the dataFileParams may include more information than the requestParams contains, but may not override any of the stated requestParams (unless such a use case is found and the code modified).
  9. The work-flow manager is responsible for maintaining the requestStatus and any associated requestDetailStatus values. SAM will provide the interfaces and documentation; SAM will also perform basic "sanity" checks to assist in ensuring data integrity, such as making sure that a requestStatus is not set to "complete" unless there exists an associated requestHandlingDetail with status "complete", etc.
  10. SAM interfaces to the Request System will contain the following functionality:
    • create a new request from scratch
    • create a new request by cloning and modifying an existing request, keeping track of the original request that was cloned
    • modify an existing request, including any associated requestParameters (as long as the request is not eligible to be passed to any requestHandler)
    • modify the status of a request after it has been passed to a requestHandler
    • add a requestHandler record to an existing request
    • modify a requestHandler associated with an existing request, including the requestHandlerStatus and other associated information
    • obtain full information about an existing request
    • obtain a list of requestIds based on filtering criteria including user who made the request, workGroup for which the request was made, and requestStatus (along with the history of status, that is, who changed the status and when)
    • archive a request so that, while it is still present in the database for consistency of data, it will not be "visible" unless explicitly requested.
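
The parameter-merging rules of requirement 8 and the "sanity" check of requirement 9 can be sketched as below. This is a hedged Python illustration under stated assumptions, not the SAM code: the function names, the ParamConflict exception, and the dictionary layout (category or data tier, then parameter name, then value) are all hypothetical. The sketch reads the first rule strictly, so a file whose own data tier omits a request parameter is rejected rather than silently filled in.

```python
from typing import Dict, List

# Assumed layout: category (a data tier name, or 'global') -> param name -> value.
Params = Dict[str, Dict[str, str]]


class ParamConflict(ValueError):
    """File metadata omits or contradicts a parameter of the original request."""


def merge_request_params(request_params: Params, file_params: Params,
                         file_tier: str) -> Params:
    """Return the dataFileParams for a file declared against a request."""
    # The file's own data tier must carry every request parameter for that
    # tier, with equal values (it may carry additional parameters).
    own_tier = file_params.get(file_tier, {})
    for name, value in request_params.get(file_tier, {}).items():
        if name not in own_tier:
            raise ParamConflict(f"missing {file_tier}.{name}")
        if own_tier[name] != value:
            raise ParamConflict(f"{file_tier}.{name} differs from the request")

    # All request parameters become file parameters, including those for
    # data tiers other than the tier of this file.
    merged: Params = {cat: dict(vals) for cat, vals in request_params.items()}

    # File parameters may add information, but may not override a request
    # parameter unless it is in category 'global'.
    for category, values in file_params.items():
        target = merged.setdefault(category, {})
        for name, value in values.items():
            if name in target and target[name] != value and category != "global":
                raise ParamConflict(f"{category}.{name} may not be overridden")
            target[name] = value
    return merged


def can_mark_complete(detail_statuses: List[str]) -> bool:
    """A request may be set to 'complete' only if at least one of its
    requestHandlingDetails has status 'complete'."""
    return "complete" in detail_statuses
```

If a use case appears where dataFileParams should override requestParams, only the override test in the second loop would need to change, for example by consulting a configuration flag.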

Document Update History:
March 2005:
  • add a statusHistory requirement in order to be able to determine who authorized a request to be eligible for processing, etc.
February 2005:
  • metadataParams may augment, but not override, requestParams (until/unless a use case is found where overriding is necessary, and then it must be turned on by a configuration parameter)
  • application data will be specified per-dataTier in the requestParams for both simulation and processing requests
  • changes to the particular status values per comments from affected parties.