Details about the currently installed Queuing System

A Job Management System consists of three modules:

The task of the Queue Manager module is to receive job execution requests from users and to store those jobs in queues. The Queue Manager contacts the Scheduler module and passes the job execution requests on to it.

The Scheduler is responsible for deciding how jobs are executed, i.e. when and where they run. To do so, the Scheduler uses three sets of data: job information, resource information and scheduling policies. It receives job information from the Queue Manager, and information on resource status and load from the Resource Manager module.

The Resource Manager is responsible for collecting data about client status and for starting and monitoring job execution. The Resource Manager module consists of two components: a server and nodes. The server collects information from the nodes and sends them the jobs they are supposed to execute. The nodes (clients) are responsible for preparing the environment for job execution and for monitoring client status.
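The interaction between the three modules can be illustrated with a small Python sketch. It is a toy model only: the class and method names (`QueueManager.submit`, `Scheduler.cycle`, `ResourceManager.start`) are hypothetical and do not correspond to any PBS or Maui API, and the scheduling policy is assumed to be plain first come, first served.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    cpus: int  # number of CPUs the job requests


@dataclass
class ResourceManager:
    """Tracks free CPUs per node and starts jobs on nodes (toy model)."""
    free_cpus: dict  # node name -> free CPU count
    running: list = field(default_factory=list)

    def status(self):
        # Resource status and load, as reported to the Scheduler.
        return dict(self.free_cpus)

    def start(self, job, node):
        self.free_cpus[node] -= job.cpus
        self.running.append((job.name, node))


class QueueManager:
    """Receives job requests from users and stores them in a queue."""

    def __init__(self):
        self.queue = deque()

    def submit(self, job):
        self.queue.append(job)


class Scheduler:
    """Decides when and where jobs run (assumed policy: strict FIFO)."""

    def __init__(self, queue_manager, resource_manager):
        self.qm = queue_manager
        self.rm = resource_manager

    def cycle(self):
        # Start queued jobs in order until the head of the queue does not fit.
        while self.qm.queue:
            job = self.qm.queue[0]
            node = next((n for n, free in self.rm.status().items()
                         if free >= job.cpus), None)
            if node is None:
                break  # strict FIFO: wait until resources are released
            self.qm.queue.popleft()
            self.rm.start(job, node)


rm = ResourceManager(free_cpus={"node1": 2, "node2": 4})
qm = QueueManager()
scheduler = Scheduler(qm, rm)
for job in (Job("a", 2), Job("b", 4), Job("c", 1)):
    qm.submit(job)
scheduler.cycle()
print(rm.running)
```

In this run, jobs `a` and `b` are started on `node1` and `node2`, while job `c` stays in the queue because no CPUs are left.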

Current Set-up

Currently, OpenPBS (aka TORQUE) version 1.2.0p2 and Maui Cluster Scheduler version 3.2.6p11 are installed and running on our cluster.

As of November 3rd, 2006, Torque 2.1.6 and Maui 3.2.6p13 are available.

-- [OpenPBS], [PBSPro] and [Torque] --

Portable Batch System (PBS) is an old and well-tested job management system, developed for NASA’s needs. It was later acquired by Veridian, which was in turn acquired by Altair. Since then, PBS has been split into OpenPBS and PBSPro. OpenPBS is an open source system intended for academic use. Veridian has focused further development on PBSPro, so the future of OpenPBS is uncertain. Cluster Resources Inc. has continued OpenPBS development under the name Torque (Tera-scale Open-source Resource and QUEue manager). Hereinafter, the term PBS is used when all three systems are meant.

Advantages of PBS are:

PBS contains the tm interface, which allows parallel libraries to start parallel processes using PBS services. In that way, PBS has total control over parallel processes and can collect data on resource usage. Currently, the LAM/MPI and MPICH (through the additional mpiexec module) parallel libraries are supported.

PBS allows simple creation of global queues with specific properties. Using queues, it is easy to implement multiple job scheduling policies.

The OpenPBS disadvantages are the following:

As opposed to OpenPBS, PBSPro comes with support and additional functionalities. Some of the additional functionalities are: CPU harvesting, an enhanced Scheduler module, and integration with Globus services.

The services available from Torque:

-- Maui --

Current version installed on the cluster: Maui version 3.2.6p11

Maui is an open source system intended for job scheduling. Maui is used primarily as the job scheduler within other batch systems, but it can also be used independently. Maui can be integrated with OpenPBS, PBSPro, Torque, SGE, LSF and LoadLeveler.

Maui has the following functionalities: job scheduling based on reservations and the definition of very complex job scheduling policies. Maui allows scheduling according to the following principles: fair share, backfilling, fixed resource reservations, and priorities assigned from job characteristics (e.g. the owner or group of owners, or job size in terms of space or time). In addition, Maui contains a set of tools for retrieving job execution history and various job and resource usage statistics.
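To make the idea of priorities based on job characteristics and fair share more concrete, here is a minimal Python sketch. It is not Maui's actual priority formula: the function `priority`, its parameters and the weights are hypothetical, chosen only to show how waiting time and the deviation from a fair-share target could be combined into a single ranking.

```python
def priority(wait_minutes, fairshare_target, fairshare_usage,
             wait_weight=1.0, fairshare_weight=100.0):
    """Toy priority: longer waits and under-used shares rank higher.

    All names and weights are hypothetical, not Maui parameters.
    fairshare_target and fairshare_usage are fractions of the cluster (0..1).
    """
    # Positive when the owner has used less than the target share.
    fairshare_term = fairshare_target - fairshare_usage
    return wait_weight * wait_minutes + fairshare_weight * fairshare_term


jobs = [
    # (job name, minutes waiting, owner's target share, owner's recent usage)
    ("alice-1", 30, 0.50, 0.80),
    ("bob-1", 10, 0.50, 0.20),
    ("carol-1", 45, 0.25, 0.25),
]

# Highest priority first.
ranked = sorted(jobs, key=lambda j: priority(j[1], j[2], j[3]), reverse=True)
for name, *_ in ranked:
    print(name)
```

With these made-up numbers the order is `carol-1`, `bob-1`, `alice-1`: the job of the user who exceeded the target share drops to the end even though it has waited longer than `bob-1`.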

Maui’s disadvantage is that the efficiency of job scheduling depends on how well users describe their requests. For example, backfilling will have little effect if users don’t specify job duration. This request definition problem is not specific to Maui; it affects all job management systems that schedule on the basis of reservations.
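The dependence of backfilling on declared job duration can be shown with a simplified Python sketch. It assumes a single pool of idle CPUs and one reservation for a blocked top-priority job; the function `backfill_candidates` and its rule that a job without a declared duration is treated as unbounded are illustrative assumptions, not a description of Maui's algorithm.

```python
def backfill_candidates(free_cpus, reservation_start, waiting):
    """Return names of waiting jobs that can start now without delaying
    the reservation held for the blocked top-priority job.

    waiting: list of (name, cpus, walltime_hours or None).
    Toy rule: a job with no declared walltime is treated as unbounded.
    """
    started = []
    for name, cpus, walltime in waiting:
        fits_now = cpus <= free_cpus
        ends_in_time = walltime is not None and walltime <= reservation_start
        if fits_now and ends_in_time:
            started.append(name)
            free_cpus -= cpus
    return started


# 4 CPUs are idle for the next 3 hours, until a large reserved job starts.
waiting = [
    ("short", 2, 2.0),     # declared 2 h: fits into the gap
    ("unknown", 1, None),  # no duration declared: cannot be placed safely
    ("long", 2, 5.0),      # declared 5 h: would delay the reservation
    ("tiny", 2, 1.0),      # declared 1 h: fits into the remaining CPUs
]
print(backfill_candidates(free_cpus=4, reservation_start=3.0, waiting=waiting))
```

Only `short` and `tiny` are started. The job `unknown` would need just one CPU, but without a declared duration the sketch cannot prove that it finishes before the reservation begins, so the idle CPUs stay unused for it.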

Commands and Description:

Alternatives

(at least the ones that I am aware of)

-- [SGE] --

Sun Grid Engine (SGE) is Sun's open source job management system.

Advantages of SGE:

Disadvantages:

-- Moab --

Moab Workload Manager is a commercial product from Cluster Resources that aims to extend the functionality of job management systems. Its basic functionalities are: definition of advanced job scheduling policies, integration of multiple job management systems and other resource management systems (e.g. a data warehouse management system), interfaces for integration with other cluster middleware (e.g. accounting and cluster monitoring systems), an intuitive interface in the form of a web portal (Moab Access Portal), and support for integration with Grid systems. The functionalities related to job scheduling are identical to those offered by Maui.

More details about the installed Queuing System. (last edited 2007-05-11 13:33:40 by MathiasVandenbogaert)