Platform LSF Version 6.0 - Running Jobs with Platform LSF - Working with Jobs

Learn more about Platform products at http://www.platform.com




Working with Jobs




Submitting Jobs (bsub)


bsub command

You submit a job with the bsub command. If you do not specify any options, the job is submitted to the default queue configured by the LSF administrator (usually queue normal).

For example, if you submit the job my_job without specifying a queue, the job goes to the default queue.

% bsub my_job
Job <1234> is submitted to default queue <normal>

In the above example, 1234 is the job ID assigned to this job, and normal is the name of the default job queue.
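A script that submits jobs often needs the job ID for later commands. The following is a minimal shell sketch that parses the confirmation line with sed. The function name job_id_from_bsub is hypothetical, and the sketch assumes the message keeps the "Job <ID> is submitted" form shown above. It works on a sample string and does not call bsub.

```shell
# Extract the numeric job ID from a bsub confirmation line.
# Assumes the "Job <ID> is submitted ..." format shown in this section.
job_id_from_bsub() {
    printf '%s\n' "$1" | sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Sample line copied from the example above (no LSF command is run):
job_id_from_bsub 'Job <1234> is submitted to default queue <normal>'
```

If the line does not match the expected form, the function prints nothing, so a caller can test for an empty result.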

Your job remains pending until all conditions for its execution are met. Each queue has execution conditions that apply to all jobs in the queue, and you can specify additional conditions when you submit the job.

You can also specify an execution host or a range of hosts, a queue, and start and termination times, as well as a wide range of other job options. See the bsub command in the Platform LSF Reference for more details on bsub options.

Submitting a job to a specific queue (bsub -q)

Job queues represent different job scheduling and control policies. All jobs submitted to the same queue share the same scheduling and control policy. Each job queue can use a configured subset of server hosts in the cluster; the default is to use all server hosts.

System administrators can configure job queues to control resource access by different users and types of application. Users select the job queue that best fits each job.

The default queue is normally suitable to run most jobs, but the default queue may assign your jobs a very low priority, or restrict execution conditions to minimize interference with other jobs. If automatic queue selection is not satisfactory, choose the most suitable queue for each job.

The factors affecting which queue to choose are user access restrictions, size of job, resource limits of the queue, scheduling priority of the queue, active time windows of the queue, hosts used by the queue, scheduling load conditions, and the queue description displayed by the bqueues -l command.

Viewing available queues

To see available queues, use the bqueues command.

Use bqueues -u user_name to specify a user or user group so that bqueues displays only the queues that accept jobs from these users.

The bqueues -m host_name option allows users to specify a host name or host group name so that bqueues displays only the queues that use these hosts to run jobs.

You can submit jobs to a queue as long as its STATUS is Open. However, jobs are not dispatched unless the queue is Active.
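The two conditions above can be sketched as a pair of shell checks. This sketch assumes that the STATUS value is written as two colon-separated words such as Open:Active; the text above names only the states Open and Active, so that exact layout is an assumption. The function names are hypothetical.

```shell
# $1: STATUS value of a queue, assumed to look like "Open:Active".

# A queue accepts new submissions while it is Open.
can_submit() {
    case "$1" in
        Open:*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Jobs are dispatched only while the queue is Active.
can_dispatch() {
    case "$1" in
        *:Active) return 0 ;;
        *)        return 1 ;;
    esac
}

if can_submit "Open:Active" && can_dispatch "Open:Active"; then
    echo "jobs can be submitted and will be considered for dispatch"
fi
```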

Submitting a job

The following examples are based on the queues defined in the default configuration. Your LSF administrator may have configured different queues.

To run a job during off hours because the job generates very high load to both the file server and the network, you can submit it to the night queue:

% bsub -q night my_job

If you have an urgent job to run, you may want to submit it to the priority queue:

% bsub -q priority my_job

If you want to use hosts owned by others and you do not want to bother the owners, you may want to run your low priority jobs on the idle queue so that as soon as the owner comes back, your jobs get suspended:

% bsub -q idle my_job

If you are running small jobs and do not want to wait too long to get the results, you can submit jobs to the short queue to be dispatched with higher priority:

% bsub -q short my_job


Make sure your jobs are short enough that they are not killed for exceeding the CPU time limit of the queue (check the resource limits of the queue, if any).

If your job requires a specific execution environment, you may need to submit it to a queue that has a particular job starter defined. LSF administrators are able to specify a queue-level job starter as part of the queue definition; ask them for the name of the queue and configuration details.

See Administering Platform LSF for information on queue-level job starters.

Submitting a job associated to a project (bsub -P)

Use the bsub -P project_name option to associate a project name with a job.

On systems running IRIX 6, before the submitted job begins execution, a new array session is created and the project ID corresponding to the project name is assigned to the session.

Submitting a job associated to a user group (bsub -G)

You can use the bsub -G user_group option to submit a job and associate it with a specified user group. This option is only useful with fairshare scheduling.

For more details on fairshare scheduling, see Administering Platform LSF.

You can specify any user group to which you belong as long as it does not contain any subgroups. You must be a direct member of the specified user group.

User groups in non-leaf nodes cannot be specified, because doing so would make it ambiguous which shares apply to the user.

For example, to submit the job myjob and associate it with the user group special:

% bsub -G special myjob

Submitting a job with a job name (bsub -J)

Use bsub -J job_name to submit a job and assign a job name to it.

You can later use the job name to identify the job. The job name need not be unique.

For example, to submit a job and assign the name my_job:

% bsub -J my_job sleep 100

You can also assign a job name to a job array. See Administering Platform LSF for more information about job arrays.

Submitting a job to a service class (bsub -sla)

Use bsub -sla service_class_name to submit a job to a service class for SLA-driven scheduling.

You submit jobs to a service class as you would to a queue, except that a service class is a higher level scheduling policy that makes use of other, lower level LSF policies like queues and host partitions to satisfy the service-level goal that the service class expresses.

For example:

% bsub -W 15 -sla Kyuquot sleep 100

submits the UNIX command sleep together with its argument 100 as a job to the service class named Kyuquot.

The service class name where the job is to run is configured in lsb.serviceclasses. If the SLA does not exist or the user is not a member of the service class, the job is rejected.

Outside of the configured time windows, the SLA is not active, and LSF schedules jobs without enforcing any service-level goals. Jobs will flow through queues following queue priorities even if they are submitted with -sla.


You should submit your jobs with a run time limit (-W option) or the queue should specify a run time limit (RUNLIMIT in the queue definition in lsb.queues). If you do not specify a run time limit, LSF automatically adjusts the optimum number of running jobs according to the observed run time of finished jobs.

See Administering Platform LSF for more information about service classes and goal-oriented SLA driven scheduling.

Submitting a job under a job group (bsub -g)

Use bsub -g to submit a job into a job group. The job group does not have to exist before submitting the job. For example:

% bsub -g /risk_group/portfolio1/current myjob
Job <105> is submitted to default queue.

This command submits myjob to the job group /risk_group/portfolio1/current.

If group /risk_group/portfolio1/current exists, job 105 is attached to the job group.

If group /risk_group/portfolio1/current does not exist, LSF checks its parent groups recursively. If no groups in the hierarchy exist, all three job groups are created with the specified hierarchy and the job is attached to the lowest-level group.
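To picture the hierarchy check, the following shell sketch lists every job group path that would have to exist, or be created, for a given path. job_group_ancestors is a hypothetical helper for illustration only, not an LSF command.

```shell
# Print each level of a job group path, from the top-level group down.
job_group_ancestors() {
    path=${1#/}          # drop the leading slash
    prefix=
    old_ifs=$IFS
    IFS=/
    for part in $path; do
        prefix="$prefix/$part"
        printf '%s\n' "$prefix"
    done
    IFS=$old_ifs
}

job_group_ancestors /risk_group/portfolio1/current
```

For the path used in this section, the sketch prints three lines, one for each of the three job groups mentioned above.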

See Administering Platform LSF for more information about job groups.



Modifying a Submitted Job (bmod)



Modifying Pending Jobs (bmod)

If your submitted jobs are pending (bjobs shows the job in PEND state), use the bmod command to modify job submission parameters. You can also modify entire job arrays or individual elements of a job array.

See the bmod command in the Platform LSF Reference for more details.

Replacing the job command-line

To replace the job command line, use the bmod -Z "new_command" option. The following example replaces the command line of job 101 with "myjob file":

% bmod -Z "myjob file" 101

Changing a job parameter

To change a specific job parameter, use bmod with the bsub option used to specify the parameter. The specified options replace the submitted options. The following example changes the start time of job 101 to 2:00 a.m.:

% bmod -b 2:00 101

Resetting to default submitted value

To reset an option to its default submitted value (undo a bmod), append the n character to the option name, and do not include an option value. The following example resets the start time for job 101 back to its default value:

% bmod -bn 101
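The naming rule for reset options is mechanical, so it can be sketched as a small helper that only prints the command it would run. bmod_reset_cmd is a hypothetical name, and the helper never calls bmod itself.

```shell
# Print (do not run) the bmod command that resets one option to its
# default submitted value.
# $1: option name without the dash, for example b, W, or sla
# $2: job ID
bmod_reset_cmd() {
    printf 'bmod -%sn %s\n' "$1" "$2"
}

bmod_reset_cmd b 101
bmod_reset_cmd sla 2307
```

The two calls print `bmod -bn 101` and `bmod -slan 2307`, matching the option forms used in this chapter.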

Resource reservation can be modified after a job has been started to ensure proper reservation and optimal resource utilization.

Modifying a job submitted to a service class

Use the -sla option of bmod to modify the service class a job is attached to, or to attach a submitted job to a service class. Use bmod -slan to detach a job from a service class. For example:

% bmod -sla Kyuquot 2307

Attaches job 2307 to the service class Kyuquot.

% bmod -slan 2307

Detaches job 2307 from the service class Kyuquot.

You cannot:

  • Use -sla with other bmod options
  • Move individual job array elements from one service class to another (only entire job arrays can be moved)
  • Modify the service class of jobs already attached to a job group

See Administering Platform LSF for more information about submitting jobs to service classes for SLA-driven scheduling.

Modifying a job submitted to a job group

Use the -g option of bmod and specify a job group path to move a job or a job array from one job group to another. For example:

% bmod -g /risk_group/portfolio2/monthly 105

moves job 105 to job group /risk_group/portfolio2/monthly.

Like bsub -g, if the job group does not exist, LSF creates it.

bmod -g cannot be combined with other bmod options. It can operate on finished, running, and pending jobs.

You can modify your own job groups and job groups that other users create under your job groups. The LSF administrator can modify job groups of all users.

You cannot move job array elements from one job group to another, only entire job arrays. A job array can only belong to one job group at a time. You cannot modify the job group of a job attached to a service class.

bhist -l shows job group modification information:

% bhist -l 105

Job <105>, User <user1>, Project <default>, Job Group </risk_group>, Command 
<myjob>
                     
Wed May 14 15:24:07: Submitted from host <hostA>, to Queue <normal>, CWD
<$HOME/lsf51/5.1/sparc-sol7-64/bin>;
Wed May 14 15:24:10: Parameters of Job are changed:
                         Job group changes to: /risk_group/portfolio2/monthly;
Wed May 14 15:24:17: Dispatched to <hostA>;
Wed May 14 15:24:17: Starting (Pid 8602);
...

See Administering Platform LSF for more information about job groups.



Modifying Running Jobs

Modifying resource reservation

A job is usually submitted with a resource reservation for the maximum amount required. Use bmod -R to modify the resource reservation for a running job. This command is usually used to decrease the reservation, allowing other jobs access to the resource.

The following example sets the resource reservation for job 101 to 25 MB of memory and 50 MB of swap space:

% bmod -R "rusage[mem=25:swp=50]" 101

By default, you can modify resource reservation for running jobs. Set LSB_MOD_ALL_JOBS in lsf.conf to modify additional job options.

See Reserving Resources for Jobs for more details.

Modifying other job options

If LSB_MOD_ALL_JOBS is specified in lsf.conf, the job owner or the LSF administrator can use the bmod command to modify the following job options for running jobs:

  • CPU limit (-c [hour:]minute[/host_name | /host_model] | -cn)
  • Memory limit (-M mem_limit | -Mn)
  • Run limit (-W run_limit[/host_name | /host_model] | -Wn)
  • Standard output file name (-o output_file | -on)
  • Standard error file name (-e error_file | -en)
  • Rerunnable jobs (-r | -rn)

In addition to resource reservation, these are the only bmod options that are valid for running jobs. You cannot make any other modifications after a job has been dispatched.

An error message is issued and the modification fails if these options are used on running jobs in combination with other bmod options.
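The rules in this section can be summarized as a small check. The sketch below is an illustration under stated assumptions: can_modify_running is a hypothetical function, the second argument stands in for whether LSB_MOD_ALL_JOBS is set in lsf.conf, and only the options listed above are recognized.

```shell
# $1: a single bmod option, for example -R, -W, or -Wn
# $2: "y" if LSB_MOD_ALL_JOBS is set in lsf.conf, anything else otherwise
# Returns 0 if the option may be used on a running job.
can_modify_running() {
    case "$1" in
        -R)
            # Resource reservation can be modified by default.
            return 0 ;;
        -c|-cn|-M|-Mn|-W|-Wn|-o|-on|-e|-en|-r|-rn)
            # These options require LSB_MOD_ALL_JOBS.
            [ "$2" = "y" ] ;;
        *)
            # No other modification is valid after dispatch.
            return 1 ;;
    esac
}

if can_modify_running -W y; then
    echo "run limit can be changed on a running job"
fi
```

The sketch does not model the additional requirements for CPU and memory limits (LSB_JOB_CPULIMIT and LSB_JOB_MEMLIMIT) or the queue-level ceilings described in this chapter.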

Modifying resource limits for running jobs

The new resource limits cannot exceed the resource limits defined in the queue.

To modify the CPU limit of running jobs, LSB_JOB_CPULIMIT=Y must be defined in lsf.conf.

To modify the memory limit of running jobs, LSB_JOB_MEMLIMIT=Y must be defined in lsf.conf.

Limitations

Modifying remote running jobs in a MultiCluster environment is not supported.

To modify the name of the job error file for a running job, you must use bsub -e or bmod -e to specify an error file before the job starts running.

For more information

See Administering Platform LSF for more information about job output files, using job-level resource limits, and submitting rerunnable jobs.



Controlling Jobs

LSF controls jobs dispatched to a host to enforce scheduling policies, or in response to user requests. The LSF system performs the following actions on a job:

  • Suspend by sending a SIGSTOP signal
  • Resume by sending a SIGCONT signal
  • Terminate by sending a SIGKILL signal

On Windows, equivalent functions have been implemented to perform the same tasks.



Killing Jobs (bkill)

The bkill command cancels pending batch jobs and sends signals to running jobs. By default, on UNIX, bkill sends the SIGKILL signal to running jobs.

Before SIGKILL is sent, SIGINT and SIGTERM are sent to give the job a chance to catch the signals and clean up. The signals are forwarded from mbatchd to sbatchd, which waits for the job to exit before reporting the status. Because of these delays, for a short period of time after entering the bkill command, bjobs may still report that the job is running.
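Because SIGINT and SIGTERM arrive before SIGKILL, a job script can use them to clean up. The following is a minimal sketch of such a script. The scratch directory and the cleanup function are hypothetical names, and the exit status chosen in the handler is arbitrary.

```shell
#!/bin/sh
# Hypothetical job script: the names below are illustrative only.
scratch=$(mktemp -d)

cleanup() {
    # Remove temporary files created by the job.
    rm -rf "$scratch"
}

# SIGINT and SIGTERM can be caught; SIGKILL cannot, so any cleanup
# has to happen in this handler before SIGKILL arrives.
trap 'cleanup; exit 1' INT TERM

# ... the real work of the job would run here, writing under "$scratch",
# and would call cleanup itself when it finishes normally ...
```

Keep the handler short, since the job is killed if it is still running when SIGKILL is sent.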

On Windows, job control messages replace the SIGINT and SIGTERM signals, and termination is implemented by the TerminateProcess() system call.

Example

To kill job 3421:

bkill 3421
Job <3421> is being terminated

Forcing removal of a job from LSF

If a job cannot be killed in the operating system, use bkill -r to force the removal of the job from LSF.

The bkill -r command removes a job from the system without waiting for the job to terminate in the operating system. This sends the same series of signals as bkill without -r, except that the job is removed from the system immediately, the job is marked as EXIT, and job resources that LSF monitors are released as soon as LSF receives the first signal.



Suspending and Resuming Jobs (bstop and bresume)

The bstop and bresume commands allow you to suspend or resume a job.

A job can also be suspended by its owner or the LSF administrator with the bstop command. These jobs are considered user-suspended and are displayed by bjobs as USUSP.

When the user restarts the job with the bresume command, the job is not started immediately to prevent overloading. Instead, the job is changed from USUSP to SSUSP (suspended by the system). The SSUSP job is resumed when host load levels are within the scheduling thresholds for that job, similarly to jobs suspended due to high load.

If a user suspends a high priority job from a non-preemptive queue, the load may become low enough for LSF to start a lower priority job in its place. The load created by the low priority job can prevent the high priority job from resuming.

This can be avoided by configuring preemptive queues. See Administering Platform LSF for information about configuring queues.

Suspending a job

bstop command

To suspend a job, use the bstop command. Suspending a job causes your job to go into USUSP state if the job is already started, or to go into PSUSP state if your job is pending.

By default, jobs that are suspended by the administrator can only be resumed by the administrator or root; users do not have permission to resume a job suspended by another user or the administrator. Administrators can resume jobs suspended by users or administrators. Administrators can also enable users to resume their own jobs that have been stopped by an administrator.

UNIX

bstop sends the following signals to the job:

  • SIGTSTP for parallel or interactive jobs

    SIGTSTP is caught by the master process and passed to all the slave processes running on other hosts.

  • SIGSTOP for sequential jobs

    SIGSTOP cannot be caught by user programs. The SIGSTOP signal can be configured with the LSB_SIGSTOP parameter in lsf.conf.

Example

To suspend job 3421, enter:

bstop 3421
Job <3421> is being stopped

Resuming a job

bresume command

To resume a job, use the bresume command.

Resuming a user-suspended job does not put your job into RUN state immediately. If your job was running before the suspension, bresume first puts your job into SSUSP state and then waits for sbatchd to schedule it according to the load conditions.

For example, to resume job 3421, enter:

bresume 3421
Job <3421> is being resumed
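The state changes described for bstop and bresume can be sketched as a lookup. next_state is a hypothetical function. The transition from PSUSP back to PEND is an assumption that this section does not state, and every combination not listed is simply left unchanged in this sketch.

```shell
# $1: command (bstop or bresume); $2: current job state
# Prints the state the job moves to first.
next_state() {
    case "$1:$2" in
        bstop:RUN)     echo USUSP ;;  # started job is user-suspended
        bstop:PEND)    echo PSUSP ;;  # pending job is suspended while pending
        bresume:USUSP) echo SSUSP ;;  # resumed later, when load allows
        bresume:PSUSP) echo PEND ;;   # assumption, not stated in this section
        *)             echo "$2" ;;   # not modeled: leave the state as is
    esac
}

next_state bstop RUN
next_state bresume USUSP
```

The second call prints SSUSP rather than RUN, which reflects that a resumed job waits for acceptable load conditions before it runs again.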

You cannot resume jobs suspended by another user; you can only resume your own jobs. If your job was suspended by the administrator, you cannot resume it; the administrator or root must resume the job for you.

ENABLE_USER_RESUME parameter (lsb.params)

If ENABLE_USER_RESUME=Y in lsb.params, you can resume your own jobs that have been suspended by the administrator.



Changing Job Order Within Queues (bbot and btop)

By default, LSF dispatches jobs in a queue in the order of arrival (that is, first-come-first-served), subject to availability of suitable server hosts.

Use the btop and bbot commands to change the position of pending jobs, or of pending job array elements, to affect the order in which jobs are considered for dispatch. Users can only change the relative position of their own jobs, and LSF administrators can change the position of any user's jobs.

Moving a job to the bottom of a queue

Use bbot to move jobs relative to your last job in the queue.

If invoked by a regular user, bbot moves the selected job after the last job with the same priority submitted by the user to the queue.

If invoked by the LSF administrator, bbot moves the selected job after the last job with the same priority submitted to the queue.

Moving a job to the top of a queue

Use btop to move jobs relative to your first job in the queue.

If invoked by a regular user, btop moves the selected job before the first job with the same priority submitted by the user to the queue.

If invoked by the LSF administrator, btop moves the selected job before the first job with the same priority submitted to the queue.

Example

In the following example, job 5311 is moved to the top of the queue. Since job 5308 is already running, job 5311 is placed in the queue after job 5308.


Note that user1's job is still in the same position on the queue. user2 cannot use btop to get extra jobs at the top of the queue; when one of his jobs moves up the queue, the rest of his jobs move down.

% bjobs -u all
JOBID USER  STAT  QUEUE    FROM_HOST  EXEC_HOST  JOB_NAME   SUBMIT_TIME
5308  user2 RUN   normal   hostA      hostD      /s500     Oct 23 10:16
5309  user2 PEND  night    hostA                 /s200     Oct 23 11:04
5310  user1 PEND  night    hostB                 /myjob    Oct 23 13:45
5311  user2 PEND  night    hostA                 /s700     Oct 23 18:17
% btop 5311
Job <5311> has been moved to position 1 from top.
% bjobs -u all
JOBID USER  STAT  QUEUE    FROM_HOST  EXEC_HOST  JOB_NAME   SUBMIT_TIME
5308  user2 RUN   normal   hostA      hostD      /s500     Oct 23 10:16
5311  user2 PEND  night    hostA                 /s700     Oct 23 18:17
5310  user1 PEND  night    hostB                 /myjob    Oct 23 13:45
5309  user2 PEND  night    hostA                 /s200     Oct 23 11:04
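The reordering in this example can be reproduced with a short simulation. This is only a sketch of the behavior the example suggests for a regular user: btop_sim is a hypothetical function, all jobs are assumed to have the same priority, and the administrator case is not modeled.

```shell
# Input on stdin: one "job_id user" pair per line, in queue order.
# $1: job ID to move to the top among its owner's pending jobs.
btop_sim() {
    awk -v target="$1" '
        { id[NR] = $1; user[NR] = $2; if ($1 == target) owner = $2 }
        END {
            # Owner jobs in their new order: the target first, then the rest.
            n = 0
            order[++n] = target
            for (i = 1; i <= NR; i++)
                if (user[i] == owner && id[i] != target) order[++n] = id[i]
            # Positions held by other users stay where they are.
            k = 0
            for (i = 1; i <= NR; i++) {
                if (user[i] == owner) print order[++k], user[i]
                else print id[i], user[i]
            }
        }'
}

# Pending jobs from the example, in their original order:
printf '5309 user2\n5310 user1\n5311 user2\n' | btop_sim 5311
```

The output lists 5311, 5310, and 5309 in that order: the job belonging to user1 keeps its position, and the two jobs belonging to user2 trade places.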



Controlling Jobs in Job Groups

Stopping (bstop)

Use the -g option of bstop and specify a job group path to suspend jobs in a job group:

% bstop -g /risk_group 106
Job <106> is being stopped

Use job ID 0 (zero) to suspend all jobs in a job group:

% bstop -g /risk_group/consolidate 0
Job <107> is being stopped
Job <108> is being stopped
Job <109> is being stopped

Resuming (bresume)

Use the -g option of bresume and specify a job group path to resume suspended jobs in a job group:

% bresume -g /risk_group 106
Job <106> is being resumed

Use job ID 0 (zero) to resume all jobs in a job group:

% bresume -g /risk_group 0
Job <109> is being resumed
Job <110> is being resumed
Job <112> is being resumed

Terminating (bkill)

Use the -g option of bkill and specify a job group path to terminate jobs in a job group. For example:

% bkill -g /risk_group 106
Job <106> is being terminated

Use job ID 0 (zero) to terminate all jobs in a job group:

% bkill -g /risk_group 0
Job <1413> is being terminated
Job <1414> is being terminated
Job <1415> is being terminated
Job <1416> is being terminated

bkill only kills jobs in the job group you specify. It does not kill jobs in lower level job groups in the path. For example, jobs are attached to job groups /risk_group and /risk_group/consolidate:

% bsub -g /risk_group  myjob
Job <115> is submitted to default queue <normal>.
% bsub -g /risk_group/consolidate myjob2
Job <116> is submitted to default queue <normal>.

The following bkill command only kills jobs in /risk_group, not the subgroup /risk_group/consolidate:

% bkill -g /risk_group 0
Job <115> is being terminated
% bkill -g /risk_group/consolidate 0
Job <116> is being terminated

Deleting (bgdel)

Use the bgdel command to remove a job group. The job group cannot contain any jobs. For example:

% bgdel /risk_group
Job group /risk_group is deleted.

deletes the job group /risk_group and all its subgroups.

For more information

See Administering Platform LSF for more information about using job groups.



Submitting a Job to Specific Hosts

To indicate that a job must run on one of the specified hosts, use the bsub -m "hostA hostB ..." option.

By specifying a single host, you can force your job to wait until that host is available and then run on that host.

For example:

% bsub -q idle -m "hostA hostD hostB" myjob

This command submits myjob to the idle queue and tells LSF to choose one host from hostA, hostD and hostB to run the job. All other batch scheduling conditions still apply, so the selected host must be eligible to run the job.

Resources and bsub -m

If you have applications that need specific resources, it is more flexible to create a new Boolean resource and configure that resource for the appropriate hosts in the cluster.

This must be done by the LSF administrator. If you specify a host list using the -m option of bsub, you must change the host list every time you add a new host that supports the desired resources. By using a Boolean resource, the LSF administrator can add, move or remove resources without forcing users to learn about changes to resource configuration.



Submitting a Job and Indicating Host Preference

When several hosts can satisfy the resource requirements of a job, the hosts are ordered by load. However, in certain situations it may be desirable to override this behavior to give preference to specific hosts, even if they are more heavily loaded.

For example, you may have licensed software which runs on different groups of hosts, but you prefer it to run on a particular host group because the jobs will finish faster, thereby freeing the software license to be used by other jobs.

Another situation arises in clusters consisting of dedicated batch servers and desktop machines which can also run jobs when no user is logged in. You may prefer to run on the batch servers and only use the desktop machines if no server is available.

To see a list of available hosts, use the bhosts command.


Submitting a job with host preference

bsub -m

The bsub -m option allows you to indicate preference by using + with an optional preference level after the host name. The keyword others can be used to refer to all the hosts that are not explicitly listed. You must specify others with at least one host name or host group name.

For example:

% bsub -m "hostD+ others" -R "solaris && mem > 10" myjob

In this example, LSF selects all solaris hosts that have more than 10 MB of memory available. If hostD meets these criteria, it is picked over any other host that meets the same criteria. If hostD does not meet the criteria, the least loaded host among the others is selected. All the other hosts are considered as a group and are ordered by load.

Queues and host preference

A queue can also define host preferences for jobs. Host preferences specified by bsub -m override the queue specification.

In the queue definition in lsb.queues, use the HOSTS parameter to list the hosts or host groups to which the queue can dispatch jobs.

Use the not operator (~) to exclude hosts or host groups from the list of hosts to which the queue can dispatch jobs. This is useful if you have a large cluster, but only want to exclude a few hosts from the queue definition.

See the Platform LSF Reference for information about the lsb.queues file.

Submitting a job with different levels of host preference

You can indicate different levels of preference by specifying a number after the plus sign (+). The larger the number, the higher the preference for that host or host group. You can also specify the + with the keyword others.

For example:

bsub -m "groupA+2 groupB+1 groupC" myjob

In this example, LSF gives first preference to hosts in groupA, second preference to hosts in groupB and last preference to those in groupC. Ordering within a group is still determined by load.
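The ordering by preference level can be sketched in shell. rank_hosts is a hypothetical helper that sorts the entries of a -m string from highest to lowest preference. Treating a bare + as level 1 and an entry without + as level 0 is an assumption made for this sketch, and load-based ordering within a level is not modeled.

```shell
# $1: a host preference string such as "groupA+2 groupB+1 groupC"
# Prints the host or group names, highest preference first.
rank_hosts() {
    for entry in $1; do
        case "$entry" in
            *+)  name=${entry%+};  level=1 ;;            # bare "+": assumed level 1
            *+*) name=${entry%+*}; level=${entry##*+} ;; # explicit level
            *)   name=$entry;      level=0 ;;            # no preference given
        esac
        printf '%s %s\n' "$level" "$name"
    done | sort -s -k1,1nr | awk '{ print $2 }'
}

rank_hosts "groupC groupB+1 groupA+2"
```

For this input the sketch prints groupA, groupB, and groupC, one per line, which matches the order of preference described above.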

You can use the bmgroup command to display configured host groups.

Submitting a job with resource requirements

To submit a job which will run on Solaris 7 or Solaris 8:

% bsub -R "sol7 || sol8" myjob

When you submit a job, you can also exclude a host by specifying a resource requirement that uses the hname resource:

% bsub -R "hname!=hostb && type==sgi6" myjob



Using LSF with Non-Shared File Space

LSF is usually used in networks with shared file space. When shared file space is not available, use the bsub -f command to have LSF copy needed files to the execution host before running the job, and copy result files back to the submission host after the job completes.

LSF attempts to run the job in the directory where the bsub command was invoked. If the execution directory is under the user's home directory, sbatchd looks for the path relative to the user's home directory. This handles some common configurations, such as cross-mounting user home directories with the /net automount option.

If the directory is not available on the execution host, the job is run in /tmp. Any files created by the batch job, including the standard output and error files created by the -o and -e options to bsub, are left on the execution host.

LSF provides support for moving user data from the submission host to the execution host before executing a batch job, and from the execution host back to the submitting host after the job completes. The file operations are specified with the -f option to bsub.

LSF uses the lsrcp command to transfer files. lsrcp contacts RES on the remote host to perform file transfer. If RES is not available, the UNIX rcp command is used.

See Administering Platform LSF for more information about file transfer in LSF.

bsub -f

The -f "[local_file operator [remote_file]]" option to the bsub command copies a file between the submission host and the execution host. To specify multiple files, repeat the -f option.

local_file

File name on the submission host

remote_file

File name on the execution host

The files local_file and remote_file can be absolute or relative file path names. You must specify at least one file name. When the file remote_file is not specified, it is assumed to be the same as local_file. Including local_file without the operator results in a syntax error.

operator

Operation to perform on the file. The operator must be surrounded by white space.

Valid values for operator are:

>

local_file on the submission host is copied to remote_file on the execution host before job execution. remote_file is overwritten if it exists.

<

remote_file on the execution host is copied to local_file on the submission host after the job completes. local_file is overwritten if it exists.

<<

remote_file is appended to local_file after the job completes. local_file is created if it does not exist.

><, <>

Equivalent to performing the > and then the < operation. The file local_file is copied to remote_file before the job executes, and remote_file is copied back, overwriting local_file, after the job completes. <> is the same as ><.

If the submission and execution hosts have different directory structures, you must ensure that the directory where remote_file and local_file will be placed exists. LSF tries to change the directory to the same path name as the directory where the bsub command was run. If this directory does not exist, the job is run in your home directory on the execution host.

You should specify remote_file as a file name with no path when running in non-shared file systems; this places the file in the job's current working directory on the execution host. This way the job will work correctly even if the directory where the bsub command is run does not exist on the execution host. Be careful not to overwrite an existing file in your home directory.
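As an illustration of the -f syntax, the following shell sketch builds the quoted argument for one file operation and rejects operators that are not listed above. file_spec is a hypothetical helper, and the file names in the example are made up. Nothing is submitted to LSF.

```shell
# $1: local_file, $2: operator, $3: remote_file (optional)
# Prints the string to pass to bsub -f, or fails for an unknown operator.
file_spec() {
    case "$2" in
        '>'|'<'|'<<'|'><'|'<>') ;;
        *) echo "invalid operator: $2" >&2; return 1 ;;
    esac
    if [ -n "$3" ]; then
        printf '%s %s %s\n' "$1" "$2" "$3"
    else
        # remote_file defaults to the same name as local_file
        printf '%s %s\n' "$1" "$2"
    fi
}

# Print a command line that copies an input file out and a result file back.
in_spec=$(file_spec data.in '>' data.in)
out_spec=$(file_spec result.out '<' result.out)
printf 'bsub -f "%s" -f "%s" myjob\n' "$in_spec" "$out_spec"
```

The sketch keeps the operator surrounded by white space, as the syntax requires, and uses file names without a path, as recommended for non-shared file systems.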



Reserving Resources for Jobs

About resource reservation

When a job is dispatched, the system assumes that the resources that the job consumes will be reflected in the load information. However, many jobs do not consume the resources they require when they first start. Instead, they will typically use the resources over a period of time.

For example, a job requiring 100 MB of swap is dispatched to a host having 150 MB of available swap. The job starts off initially allocating 5 MB and gradually increases the amount consumed to 100 MB over a period of 30 minutes. During this period, another job requiring more than 50 MB of swap should not be started on the same host to avoid over-committing the resource.

You can reserve resources to prevent overcommitment by LSF. Resource reservation requirements can be specified as part of the resource requirements when submitting a job, or can be configured into the queue level resource requirements.

Viewing host-level resource information

Use bhosts -l to view the amount of resources reserved on each host. Use bhosts -s to view information about shared resources.

Viewing queue-level resource information

To see the resource usage configured at the queue level, use bqueues -l.

How resource reservation works

When deciding whether to schedule a job on a host, LSF considers the reserved resources of jobs that have previously started on that host. For each load index, the amount reserved by all jobs on that host is summed and subtracted from (or, if the index is increasing, added to) the current value of the resource as reported by the LIM to get the amount available for scheduling new jobs:

available amount = current value - reserved amount for all jobs
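The calculation can be worked through with the swap example from earlier in this section. The sketch below handles only the subtracting case; available_amount is a hypothetical function name.

```shell
# $1: current value of the resource as reported for the host
# remaining arguments: amounts reserved by jobs already started there
available_amount() {
    current=$1
    shift
    reserved=0
    for r in "$@"; do
        reserved=$((reserved + r))
    done
    echo $((current - reserved))
}

# A host reports 150 MB of swap and one job has reserved 100 MB:
available_amount 150 100
```

The result is 50, so a second job that requires more than 50 MB of swap would not be started on that host while the reservation is in place.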

Using the rusage string

To specify resource reservation at the job level, use bsub -R and include the resource usage section in the resource requirement (rusage) string.

For example:

% bsub -R "rusage[tmp=30:duration=30:decay=1]" myjob

will reserve 30 MB of temp space for the job. As the job runs, the amount reserved will decrease at approximately 1 MB/minute such that the reserved amount is 0 after 30 minutes.
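The decay in this example can be approximated with integer arithmetic. The sketch assumes the reservation falls linearly from the full amount to zero over the duration, which is consistent with the figures above but is not stated as the exact rule. reserved_at is a hypothetical function name.

```shell
# $1: reserved amount in MB, $2: duration in minutes, $3: elapsed minutes
# Prints the amount still reserved, assuming a linear decay to zero.
reserved_at() {
    amount=$1
    duration=$2
    elapsed=$3
    if [ "$elapsed" -ge "$duration" ]; then
        echo 0
    else
        echo $(( amount * (duration - elapsed) / duration ))
    fi
}

reserved_at 30 30 10
```

With tmp=30 and duration=30, the reserved amount after 10 minutes is 20 MB, which corresponds to a decrease of about 1 MB per minute.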



Submitting a Job with Start or Termination Times

By default, LSF dispatches jobs as soon as possible, and then allows them to finish, although resource limits might terminate the job before it finishes.

You can specify a time of day at which to start or terminate a job.

Submitting a job with a start time

If you do not want to start your job immediately when you submit it, use bsub -b to specify a start time. LSF will not dispatch the job before this time. For example:

% bsub -b 5:00 myjob

This example submits a job that remains pending until after the local time on the master host reaches 5 a.m.

Submitting a job with a termination time

Use bsub -t to submit a job and specify a time after which the job should be terminated. For example:

% bsub -b 11:12:5:40 -t 11:12:20:30 myjob

The job called myjob is submitted to the default queue and will start after November 12 at 05:40 a.m. If the job is still running on November 12 at 8:30 p.m., it will be killed.
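To make the time format easier to read, the following shell sketch splits a time string into its fields. It handles only the two forms used in this section, hour:minute and month:day:hour:minute. describe_time is a hypothetical function name, and any other number of fields is rejected rather than guessed at.

```shell
# $1: a time string such as 5:00 or 11:12:20:30
describe_time() {
    old_ifs=$IFS
    IFS=:
    set -- $1            # split the argument on colons
    IFS=$old_ifs
    case $# in
        2) printf 'hour=%s minute=%s\n' "$1" "$2" ;;
        4) printf 'month=%s day=%s hour=%s minute=%s\n' "$1" "$2" "$3" "$4" ;;
        *) echo "unsupported time format" >&2; return 1 ;;
    esac
}

describe_time 11:12:20:30
```

For the termination time in the example, the sketch prints month=11 day=12 hour=20 minute=30, that is, November 12 at 8:30 p.m.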



      Date Modified: November 21, 2003
Platform Computing: www.platform.com

Platform Support: support@platform.com
Platform Information Development: doc@platform.com

Copyright © 1994-2003 Platform Computing Corporation. All rights reserved.
