Lauri Loebel Carpenter October, 2004 Updated: February, 2005 Updated: March, 2005
Sam Request System User Requirements
For the design of the Oracle tables that will be used to
implement the Sam Request System, see the
Sam Request System Table Design
document.
Goals:
SAM is a data handling system, not a work-flow management system.
However, the bookkeeping required for managing the work flow of
SAM-related data processing is closely coupled with existing
SAM design concepts. Hence, it is desirable to include within
SAM the necessary infrastructure and components such that external
work-flow handling software can use the SAM tools and
concepts to organize the execution of SAM-related jobs.
Conceptually, the user makes a data processing request (e.g.,
reconstruction of raw data, or generation of Monte Carlo events),
which includes all necessary data about how the user wishes the
processing to be done. The request is stored in the SAM
database. An external work-flow management system is responsible for
handling the request, using the appropriate interfaces within
SAM to "mark" the request as "being handled" or "has been
handled already", etc.
In other words, SAM provides a
place for "bookkeeping" the status of a user-generated request,
and the interfaces and guidelines for how to
utilize this "log book" of requests so that data integrity
is maintained across all requests. However, the interface is
provided to an external work-flow manager, which is not
contained within the scope of SAM-supported tools. It is not
SAM's responsibility to synchronize the work-flow of
processing jobs; it is SAM's responsibility to provide a tool
that is compatible with the SAM design and which can be used
by a separate work-flow management package to perform this
function.
So why use SAM at all for this purpose?
- Much of the information contained within the request
is "file metadata", and as such, is best stored within
the master "metadata catalog" (SAM). (That is, do not
reinvent the wheel by creating a second type of "metadata
storage system").
- The SAM infrastructure is robust and well-tested, and
all nodes participating in SAM data processing must by definition
be able to communicate with the SAM database. Hence, no
new network configuration or application information is required beyond
that which is already available on a SAM-enabled node.
- History. We should not remove existing functionality. We
are already providing this function for the D0 reconstruction farms,
and for Monte Carlo event generation; however, the current
implementation is quite fragmented and not well designed.
Definition of Terms:
- Processing Request: a request to process
an existing SAM dataset using the SAM machinery for job submission, file delivery,
etc. (e.g., run one or more projects against the dataset). A processing
request must contain the following information:
- user who is submitting the request
- work group to which the user belongs and which should be charged
for any resources used in processing the request
- dataset specification, defining the files to be
processed
- application family to be used for each
datatier of output file
- Simulation Request: a request to
generate a number of Monte Carlo events with specified
parameters, and store the output metadata in SAM. A simulation
request must contain the following information:
- user who is submitting the request
- work group to which the user belongs and which should be charged
for any resources used in processing the request
- number of events to be generated
- all parameter values required by the event generator, in a format
suitable for storage in the SAM database. (It is up to the
work-flow management software, or some other package,
to perform any conversion between the SAM database format
and the format required by the external event generation tools.)
- application family to be used for each
datatier of output file
- RequestID: the unique identifier of
a user request. The requestId is generated by SAM when the request
is created.
- RequestType: the type of request
that has been made (Processing Request, or Simulation
Request).
- RequestStatus: the current state of an
existing Request, corresponding to one of the following
states:
- new (new request, still subject to tweaking and modification)
- pending (new request, finalized and no longer modifiable)
- approved (approved for handling)
- hold (not approved for handling)
- partial (in the process of being handled, but not yet complete)
- complete (has been completely handled)
- terminated (cannot be completed for some reason)
- RequestData: the base data associated with
every request, including who created the request, when the request
was created, what type of request it is, and the request status
- RequestParameters: the metadata associated
with an event-generation request (that is, the parameters to
be passed to the event generator and subsequent processors)
- RequestHandlingDetail: information provided to SAM
by an external work-flow manager describing the particular details
of how a request is being handled. There may be many requestHandlingDetails
associated with a particular request (as in the case where several different
sites produce some portion of the requested events, or where several projects
need to be run in order to complete the processing of a particular dataset,
etc.).
- RequestDetailStatus: the current state of an
existing RequestHandlingDetail, corresponding to one of the following states:
- assigned (a requestHandler has been assigned to this request)
- running (the requestHandler has begun handling the request)
- complete (the requestHandler has completed its portion of the request)
- terminated (the requestHandler cannot complete its portion of the request for some
reason)
- ArchivedRequest: a request that has been marked as
being "archived", meaning that it is removed from the list of actively
considered requests. This should help performance for systems with many
requests over time; as the requests are handled and completed, they can
be archived so that they do not contribute to lists of requests that are
"open", "completed", "in process", etc.
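The terms defined above can be sketched as a small set of data structures. This is only an illustrative Python model, not the Oracle table design or any actual SAM interface: the class and field names (such as `handler` and `parameters`) are hypothetical, and the layout of the parameters as a dictionary per data tier is an assumption. Only the status values and the listed pieces of request information come from the definitions above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Dict, List


class RequestType(Enum):
    PROCESSING = "processing"
    SIMULATION = "simulation"


class RequestStatus(Enum):
    NEW = "new"                # still subject to modification
    PENDING = "pending"        # finalized, no longer modifiable
    APPROVED = "approved"      # approved for handling
    HOLD = "hold"              # not approved for handling
    PARTIAL = "partial"        # being handled, not yet complete
    COMPLETE = "complete"      # completely handled
    TERMINATED = "terminated"  # cannot be completed


class RequestDetailStatus(Enum):
    ASSIGNED = "assigned"
    RUNNING = "running"
    COMPLETE = "complete"
    TERMINATED = "terminated"


@dataclass
class RequestHandlingDetail:
    detail_id: int
    handler: str  # hypothetical: name of the site or project doing the work
    status: RequestDetailStatus = RequestDetailStatus.ASSIGNED


@dataclass
class Request:
    # RequestData: who, when, what type, and current status
    request_id: int
    request_type: RequestType
    user: str
    work_group: str
    created: datetime
    status: RequestStatus = RequestStatus.NEW
    # RequestParameters, assumed here to be grouped per data tier
    parameters: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # A request may have many handling details (several sites or projects)
    handling_details: List[RequestHandlingDetail] = field(default_factory=list)
    archived: bool = False
```

A new request in this sketch starts in the "new" status with no handling details; an external work-flow manager would append `RequestHandlingDetail` entries as it assigns work.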
User Requirements:
- System must be able to handle both types of requests
(processing and event-generation).
- All information about a request must be obtainable
from the unique RequestId associated with that request,
including the requestData and requestParameters.
- Application family information will be maintained in the
RequestParameter dictionary for both types of requests.
- All information about the handling of a request must be obtainable
from the requestHandlingDetailIds associated with the requestId.
- Users must be able to create a new request by "cloning"
an existing request and "tweaking" it; the original requestId
being cloned should be stored as part of the new request information.
- Users must be able to update the requestParameters, or completely delete a request,
as long as the request is still in the 'new' status
and has not yet been made eligible for submission to a requestHandler.
Once a request is eligible for submission to a requestHandler, the requestParameters must be
"frozen", that is, no longer open to modification or removal from the
system.
- A request may be "archived", that is, removed from active consideration.
Requests that are "archived" will not show up in any displays of
request information, unless specifically requested.
- When a file is stored in SAM, if the metadata for that file
includes the specification of a requestId, then the following rules
apply:
- the metadata for the dataTier of the file being declared
must include the metadata for that dataTier from the original
request, with all parameterValues being equal. The metadata
dictionary for this dataTier may be a superset of the original
paramRequestDictionary for this dataTier, but it must include
all of the original params for this dataTier.
- the paramRequestDictionary for this file will be merged
with the dataFileParamsDictionary for the file, so that
all requestParams become dataFileParams when the file
is declared (including the requestParams for dataTiers that
do not match the dataTier of this particular file).
- the initial implementation will not allow the dataFileParams
to modify any paramValue specified in the requestParams (except
for category 'global'); however,
it will be implemented such that it is easily changed if
there is a use case where dataFileParams should override
requestParams.
- the dataFileParams may include more information than
the requestParams contains, but may not override any
of the stated requestParams (unless such a use case is
found and the code modified).
- The work-flow manager is responsible for maintaining the
requestStatus and any associated requestDetailStatus values. SAM
will provide the interfaces and documentation; SAM will also perform
basic "sanity" checks to assist in ensuring data integrity, such as
making sure that a requestStatus is not set to "complete" unless
there exists an associated requestHandlingDetail with status "complete",
etc.
- SAM interfaces to the Request System will contain the following
functionality:
- create a new request from scratch
- create a new request by cloning and modifying an existing request,
keeping track of the original request that was cloned
- modify an existing request, including any associated requestParameters
(as long as the request is not eligible to be passed to any requestHandler)
- modify the status of a request after it has been passed to a requestHandler
- add a requestHandler record to an existing request
- modify a requestHandler associated with an existing request, including the
requestHandlerStatus and other associated information
- obtain full information about an existing request
- obtain a list of requestIds based on filtering criteria including
user who made the request, workGroup for which the request was made, and
requestStatus (along with the history of status, that is, who changed
the status and when)
- archive a request so that, while it is still present in the database
for consistency of data, it will not be "visible" unless explicitly
requested.
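Two of the requirements above lend themselves to a short sketch: the rule for merging requestParams into dataFileParams when a file is declared, and the "sanity" check on setting a requestStatus to "complete". The Python below is a minimal illustration under stated assumptions, not the SAM implementation. The function names are hypothetical, and the parameter dictionaries are assumed to be keyed by a (category, name) pair, where the category is either a dataTier name or 'global'; the actual dictionary layout is not specified in this document.

```python
from typing import Dict, Iterable, Tuple

ParamKey = Tuple[str, str]    # (category, name); this layout is an assumption
Params = Dict[ParamKey, str]

GLOBAL = "global"


def merge_request_params(request_params: Params,
                         file_params: Params,
                         file_tier: str) -> Params:
    """Return the dataFileParams for a file declared against a request."""
    for key, value in request_params.items():
        category, name = key
        # The file metadata must include every request param for its own tier.
        if category == file_tier and key not in file_params:
            raise ValueError(f"missing request param {name!r} for tier {file_tier!r}")
        # File params may not override request params, except category 'global'.
        if key in file_params and file_params[key] != value and category != GLOBAL:
            raise ValueError(f"file metadata overrides request param {key!r}")
    # All request params (including those for other tiers) become file params;
    # the file may add extra params on top.
    merged = dict(request_params)
    merged.update(file_params)
    return merged


def can_mark_complete(detail_statuses: Iterable[str]) -> bool:
    """A request may be set to 'complete' only if at least one associated
    requestHandlingDetail has status 'complete'."""
    return any(status == "complete" for status in detail_statuses)
```

Raising an error on a conflicting value mirrors the initial "augment, but not override" behavior; if a use case for overriding is found, that single check is the place where it could be relaxed.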
Document Update History:
March 2005:
- add a statusHistory requirement in order to be able to determine
who authorized a request to be eligible for processing, etc.
Feb 2005:
- metadataParams may augment, but not override, requestParams (until/unless
a use case is found where overriding is necessary, and then it must
be turned on by a configuration parameter)
- application data will be specified per-dataTier in the requestParams
for both simulation and processing requests
- changes to the particular status values per comments from affected parties.