Platform LSF Version 6.0 - Running Jobs with Platform LSF - Viewing Information About Jobs




Viewing Information About Jobs


Use the bjobs and bhist commands to view information about jobs:

  • bjobs reports the status of jobs; its options let you display specific information.
  • bhist reports the history of one or more jobs in the system.

You can also find jobs on specific queues or hosts, find jobs submitted by specific projects, and check the status of specific jobs using their job IDs or names.



Viewing Job Information (bjobs)

The bjobs command has options to display the status of jobs in the LSF system. For more details on these or other bjobs options, see the bjobs command in the Platform LSF Reference.

Unfinished current jobs

The bjobs command reports the status of LSF jobs.

When no options are specified, bjobs displays information about jobs in the PEND, RUN, USUSP, PSUSP, and SSUSP states for the current user.

For example:

bjobs 
JOBID USER   STAT   QUEUE     FROM_HOST EXEC_HOST JOB_NAME    SUBMIT_TIME
3926  user1  RUN    priority  hostf     hostc     verilog     Oct 22 13:51
605   user1  SSUSP  idle      hostq     hostc     Test4       Oct 17 18:07
1480  user1  PEND   priority  hostd               generator   Oct 19 18:13
7678  user1  PEND   priority  hostd               verilog     Oct 28 13:08
7679  user1  PEND   priority  hosta               coreHunter  Oct 28 13:12
7680  user1  PEND   priority  hostb               myjob       Oct 28 13:17
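The listing above is fixed-width, and a PEND job leaves its EXEC_HOST column empty, so splitting rows on whitespace misaligns the remaining fields. A minimal parsing sketch using the header's column offsets (illustrative only, not an official LSF tool; the sample rows are copied from the output above):

```python
# Parse fixed-width bjobs output by the header's column offsets, so an
# empty EXEC_HOST field (pending jobs) does not shift later columns.
SAMPLE = """\
JOBID USER   STAT   QUEUE     FROM_HOST EXEC_HOST JOB_NAME    SUBMIT_TIME
3926  user1  RUN    priority  hostf     hostc     verilog     Oct 22 13:51
1480  user1  PEND   priority  hostd               generator   Oct 19 18:13
"""

def parse_bjobs(text):
    lines = text.splitlines()
    header, rows = lines[0], lines[1:]
    fields = header.split()
    # Find where each field name starts in the header line.
    starts, pos = [], 0
    for f in fields:
        pos = header.index(f, pos)
        starts.append(pos)
        pos += len(f)
    ends = starts[1:] + [None]
    return [
        {f: (row[s:e] if e else row[s:]).strip()
         for f, s, e in zip(fields, starts, ends)}
        for row in rows
    ]

jobs = parse_bjobs(SAMPLE)
print(jobs[0]["EXEC_HOST"])  # hostc
print(jobs[1]["EXEC_HOST"])  # empty string: the job is still pending
```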

All jobs

bjobs -a displays the same information as bjobs, plus information about recently finished jobs (PEND, RUN, USUSP, PSUSP, SSUSP, DONE, and EXIT statuses).

All your jobs that are still in the system and jobs that have recently finished are displayed.

Running jobs

bjobs -r displays information only for running jobs (RUN state).



Viewing Job Pend and Suspend Reasons (bjobs -p)

When you submit a job, it may be held in the queue before it starts running and it may be suspended while running. You can find out why jobs are pending or in suspension with the bjobs -p option.

You can combine bjobs options to tailor the output. For more details on these or other bjobs options, see the bjobs command in the Platform LSF Reference.

Pending jobs and reasons

bjobs -p displays information for pending jobs (PEND state) and their reasons. There can be more than one reason why the job is pending.

For example:

bjobs -p
JOBID USER   STAT   QUEUE    FROM_HOST    JOB_NAME    SUBMIT_TIME
7678  user1  PEND   priority  hostD       verilog      Oct 28 13:08
Queue's resource requirements not satisfied:3 hosts;
Unable to reach slave lsbatch server: 1 host;
Not enough job slots: 1 host;

The pending reasons also mention the number of hosts for each condition.

You can view reasons why a job is pending or in suspension for all users by combining the -p and -u all options.

Viewing pending and suspend reasons with host names

To get specific host names along with pending reasons, use the -p and -l options with the bjobs command.

For example:

bjobs -lp
Job Id <7678>, User <user1>, Project <default>, Status <PEND>, Queue <priority>
, Command <verilog>
Mon Oct 28 13:08:11: Submitted from host <hostD>,CWD <$HOME>, Requested 
Resources <type==any && swp>35>;

PENDING REASONS:
Queue's resource requirements not satisfied: hostb, hostk, hostv;
Unable to reach slave lsbatch server: hostH;
Not enough job slots: hostF;

SCHEDULING PARAMETERS:
            r15s   r1m  r15m   ut  pg    io   ls    it    tmp    swp    mem
loadSched   -      0.7  1.0    -   4.0   -    -     -     -      -      -
loadStop    -      1.5  2.5    -   8.0   -    -     -     -      -      -

Viewing suspend reasons only

The -s option of bjobs displays reasons for suspended jobs only. For example:

bjobs -s
JOBID USER  STAT  QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
605   user1 SSUSP idle  hosta     hostc     Test4    Oct 17 18:07
The host load exceeded the following threshold(s):
Paging rate: pg;
Idle time: it;



Viewing Detailed Job Information (bjobs -l)

The -l option of bjobs displays detailed information about job status and parameters, such as the job's current working directory, parameters specified when the job was submitted, and the time when the job started running. For more details on bjobs options, see the bjobs command in the Platform LSF Reference.

bjobs -l with a job ID displays all the information about a job, including:

  • Submission parameters
  • Execution environment
  • Resource usage

For example:

bjobs -l 7678
Job Id <7678>, User <user1>, Project <default>, Status <PEND>, Queue <priority>
, Command <verilog>
Mon Oct 28 13:08:11: Submitted from host <hostD>,CWD <$HOME>, 
Requested Resources <type==any && swp>35>;
PENDING REASONS:
Queue's resource requirements not satisfied:3 hosts;
Unable to reach slave lsbatch server: 1 host;
Not enough job slots: 1 host;

SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut   pg    io   ls  it  tmp  swp  mem
loadSched  -      0.7  1.0    -    4.0   -    -   -   -    -    -
loadStop   -      1.5  2.5    -    8.0   -    -   -   -    -    -



Viewing Job Resource Usage (bjobs -l)

LSF monitors the resources jobs consume while they are running. The -l option of the bjobs command displays the current resource usage of the job.

For more details on bjobs options, see the bjobs command in the Platform LSF Reference.

Job-level information

Job-level information includes:

  • Total CPU time consumed by all processes of a job
  • Total resident memory usage in KB of all currently running processes of a job
  • Total virtual memory usage in KB of all currently running processes of a job
  • Currently active process group ID of a job
  • Currently active processes of a job

Update interval

The job-level resource usage information is updated at most once every SBD_SLEEP_TIME seconds. See the Platform LSF Reference for the value of SBD_SLEEP_TIME.

The update is done only if the value for the CPU time, resident memory usage, or virtual memory usage has changed by more than 10 percent from the previous update or if a new process or process group has been created.
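The 10-percent rule can be sketched as follows (a hypothetical helper for illustration; the field names and the decision logic are assumptions based on the description above, not LSF source code):

```python
# Sketch of the documented update rule: refresh job-level usage only if
# CPU time, resident memory, or virtual memory changed by more than 10%
# since the last update, or a new process (group) appeared.
def should_update(prev, curr, threshold=0.10):
    for key in ("cpu_time", "rss_kb", "vsz_kb"):
        old, new = prev[key], curr[key]
        if old == 0 or abs(new - old) / old > threshold:
            return True
    # A new process or process group always triggers an update.
    return not set(curr["pids"]) <= set(prev["pids"])

prev = {"cpu_time": 100.0, "rss_kb": 1000, "vsz_kb": 2000, "pids": {8920}}
same = {"cpu_time": 105.0, "rss_kb": 1050, "vsz_kb": 2100, "pids": {8920}}
grew = {"cpu_time": 120.0, "rss_kb": 1050, "vsz_kb": 2100, "pids": {8920}}
print(should_update(prev, same))  # False: all changes are within 10%
print(should_update(prev, grew))  # True: CPU time grew by 20%
```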

Viewing job resource usage

To view resource usage for a specific job, specify bjobs -l with the job ID:

bjobs -l 1531
Job Id <1531>, User <user1>, Project <default>, Status <RUN>, Queue 
<priority> Command <example 200>
Fri Dec 27 13:04:14   Submitted from host <hostA>, CWD <$HOME>, 
SpecifiedHosts <hostD>;
Fri Dec 27 13:04:19: Started on <hostD>, Execution Home </home/user1>, Executio
n CWD </home/user1>;
Fri Dec 27 13:05:00: Resource usage collected.
The CPU time used is 2 seconds.
MEM: 147 Kbytes; SWAP: 201 Kbytes PGID: 8920;  PIDs: 8920 8921 8922 

SCHEDULING PARAMETERS:
          r15s   r1m   r15m   ut    pg    io    ls    it    tmp   swp   mem
loadSched -      -     -      -     -     -     -     -     -     -     -
loadStop  -      -     -      -     -     -     -     -     -     -     -



Viewing Job History (bhist)

Sometimes you want to know what has happened to your job since it was submitted. The bhist command displays a summary of the pending, suspended and running time of jobs for the user who invoked the command. Use bhist -u all to display a summary for all users in the cluster.

For more details on bhist options, see the bhist command in the Platform LSF Reference.


Viewing detailed job history

The -l option of bhist displays the time information and a complete history of scheduling events for each job.

% bhist -l 1531
JobId <1531>, User <user1>, Project <default>, Command <example 200>
Fri Dec 27 13:04:14: Submitted from host <hostA> to Queue <priority>, 
CWD <$HOME>, Specified Hosts <hostD>;
Fri Dec 27 13:04:19: Dispatched to <hostD>;
Fri Dec 27 13:04:19: Starting (Pid 8920);
Fri Dec 27 13:04:20: Running with execution home </home/user1>, Execution CWD 
</home/user1>, Execution Pid <8920>;
Fri Dec 27 13:05:49: Suspended by the user or administrator;
Fri Dec 27 13:05:56: Suspended: Waiting for re-scheduling after being resumed 
by user;
Fri Dec 27 13:05:57: Running;
Fri Dec 27 13:07:52: Done successfully. The CPU time used is 28.3 seconds.

Summary of time in seconds spent in various states by Sat Dec 27 13:07:52 1997
PEND  PSUSP  RUN  USUSP  SSUSP  UNKWN  TOTAL
5     0      205  7      1      0      218
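As a quick arithmetic check, the per-state seconds in the summary add up to TOTAL, which equals the span from submission (13:04:14) to completion (13:07:52):

```python
# Consistency check of the bhist summary above: seconds spent in each
# state should sum to TOTAL, the submission-to-completion span.
states = {"PEND": 5, "PSUSP": 0, "RUN": 205, "USUSP": 7, "SSUSP": 1, "UNKWN": 0}
total = sum(states.values())
span = (13 * 3600 + 7 * 60 + 52) - (13 * 3600 + 4 * 60 + 14)
print(total, span)  # both are 218
```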

Viewing history of jobs not listed in active event log

LSF periodically backs up and prunes the job history log. By default, bhist displays job history only from the current event log file. Use bhist -n num_logfiles to display the history of jobs that completed some time ago and are no longer listed in the active event log.

bhist -n num_logfiles

The -n num_logfiles option tells the bhist command to search through the specified number of log files instead of only searching the current log file.

Log files are searched in reverse time order. For example, the command bhist -n 3 searches the current event log file and then the two most recent backup files.

Examples

  • bhist -n 1 searches the current event log file, lsb.events
  • bhist -n 2 searches lsb.events and lsb.events.1
  • bhist -n 3 searches lsb.events, lsb.events.1, and lsb.events.2
  • bhist -n 0 searches all event log files in LSB_SHAREDIR
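The search order can be sketched as follows (illustrative only; it assumes the backup naming shown above, with lsb.events.1 the most recent backup):

```python
# Generate the list of event log files bhist -n would search, in
# reverse time order: the current log first, then backups.
def bhist_search_order(n, available_backups=10):
    if n == 0:                      # 0 means: search every event log file
        n = available_backups + 1
    files = ["lsb.events"]
    files += ["lsb.events.%d" % i for i in range(1, n)]
    return files

print(bhist_search_order(1))  # ['lsb.events']
print(bhist_search_order(3))  # ['lsb.events', 'lsb.events.1', 'lsb.events.2']
```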

Viewing chronological history of jobs

By default, the bhist command displays information from the job event history file, lsb.events, on a per job basis.

bhist -t

The -t option of bhist can be used to display the events chronologically instead of grouping all events for each job.

bhist -T

The -T option allows you to select only those events within a given time range.

For example, the following displays all events which occurred between 14:00 and 14:30 on a given day:

% bhist -t -T 14:00,14:30
Wed Oct 22 14:01:25: Job <1574> done successfully;
Wed Oct 22 14:03:09: Job <1575> submitted from host to Queue , CWD , User , 
Project , Command , Requested Resources ;
Wed Oct 22 14:03:18: Job <1575> dispatched to ;
Wed Oct 22 14:03:18: Job <1575> starting (Pid 210);
Wed Oct 22 14:03:18: Job <1575> running with execution home , Execution CWD , 
Execution Pid <210>;
Wed Oct 22 14:05:06: Job <1577> submitted from host  to Queue, CWD , User , 
Project , Command , Requested Resources ;
Wed Oct 22 14:05:11: Job <1577> dispatched to ;
Wed Oct 22 14:05:11: Job <1577> starting (Pid 429);
Wed Oct 22 14:05:12: Job <1577> running with execution home, Execution CWD , 
Execution Pid <429>;
Wed Oct 22 14:08:26: Job <1578> submitted from host to Queue, CWD , User , 
Project , Command;
Wed Oct 22 14:10:55: Job <1577> done successfully;
Wed Oct 22 14:16:55: Job <1578> exited;
Wed Oct 22 14:17:04: Job <1575> done successfully;
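The time-window selection that -T performs can be sketched as follows (illustrative only; the event data is hypothetical and the filtering is a simplification of what bhist does internally):

```python
# Filter chronological event lines to a time-of-day window, mimicking
# what bhist -t -T 14:00,14:30 selects from the event log.
from datetime import time

EVENTS = [
    ("13:59:50", "Job <1570> running"),
    ("14:01:25", "Job <1574> done successfully"),
    ("14:17:04", "Job <1575> done successfully"),
    ("14:31:00", "Job <1580> submitted"),
]

def in_window(stamp, start, end):
    h, m, s = map(int, stamp.split(":"))
    return start <= time(h, m, s) <= end

selected = [msg for stamp, msg in EVENTS
            if in_window(stamp, time(14, 0), time(14, 30))]
print(selected)  # only the two events inside 14:00-14:30
```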



Viewing Job Output (bpeek)

The output from a job is normally not available until the job is finished. However, LSF provides the bpeek command for you to look at the output the job has produced so far.

By default, bpeek shows the output from the most recently submitted job. You can also select the job by queue or execution host, or specify the job ID or job name on the command line.

For more details on bpeek options, see the bpeek command in the Platform LSF Reference.

Viewing output of a running job

Only the job owner can use bpeek to see job output. The bpeek command will not work on a job running under a different user account.

To save time, you can use this command to check if your job is behaving as you expected and kill the job if it is running away or producing unusable results.

For example:

bpeek 1234
<< output from stdout >>
Starting phase 1
Phase 1 done
Calculating new parameters
...



Viewing Information about SLAs and Service Classes

Monitoring the progress of an SLA (bsla)

Use bsla to display the properties of service classes configured in lsb.serviceclasses and dynamic state information for each service class.

Examples

  • One velocity goal of service class Tofino is active and on time. The other configured velocity goal is inactive.
    % bsla
    SERVICE CLASS NAME: Tofino
     -- day and night velocity
    PRIORITY: 20
    
    GOAL: VELOCITY
    ACTIVE WINDOW: (9:00-17:00)
    STATUS: Active:On time
    VELOCITY:  10
    CURRENT VELOCITY:  10 
    
    GOAL:  VELOCITY 
    ACTIVE WINDOW: (17:30-8:30) 
    STATUS:  Inactive
    VELOCITY:  30
    CURRENT VELOCITY:  0 
    
       NJOBS   PEND    RUN     SSUSP   USUSP   FINISH
         360    300     10         2       0       48
    
  • The deadline goal of service class Uclulet is not being met, and bsla displays status Active:Delayed:
    % bsla
    SERVICE CLASS NAME:  Uclulet
     -- working hours
    PRIORITY: 20
    
    GOAL:  DEADLINE 
    ACTIVE WINDOW: (8:30-16:00) 
    DEADLINE:  (Tue Jun 24 16:00)
    ESTIMATED FINISH TIME:  (Wed Jun 25 14:30)
    OPTIMUM NUMBER OF RUNNING JOBS:  2
    STATUS:  Active:Delayed
    
       NJOBS   PEND    RUN     SSUSP   USUSP   FINISH
          13      0      0         0       0       13
    
  • The configured velocity goal of the service class Kyuquot is active and on time. The configured deadline goal of the service class is inactive.
    % bsla Kyuquot 
    SERVICE CLASS NAME:  Kyuquot 
     -- Daytime/Nighttime SLA
    PRIORITY:  23
    USER_GROUP:  user1 user2
    
    GOAL:  VELOCITY 
    ACTIVE WINDOW: (9:00-17:30) 
    STATUS:  Active:On time
    VELOCITY:  8 
    CURRENT VELOCITY:  0 
    
    GOAL:  DEADLINE 
    ACTIVE WINDOW: (17:30-9:00) 
    STATUS:  Inactive
    
       NJOBS   PEND    RUN     SSUSP   USUSP   FINISH
          0      0       0        0       0       0
    
  • The throughput goal of service class Inuvik is always active. bsla displays:
    • Status as active and on time
    • An optimum number of 5 running jobs to meet the goal
    • Actual throughput of 10 jobs per hour based on the last CLEAN_PERIOD
      % bsla Inuvik
      SERVICE CLASS NAME:  Inuvik
       -- constant throughput
      PRIORITY:  20
      
      GOAL:  THROUGHPUT 
      ACTIVE WINDOW: Always Open 
      STATUS:  Active:On time
      SLA THROUGHPUT:  10.00 JOBs/CLEAN_PERIOD
      THROUGHPUT:  6 
      OPTIMUM NUMBER OF RUNNING JOBS:  5
      
         NJOBS   PEND    RUN     SSUSP   USUSP   FINISH
          110     95       5        0       0      10
      

Tracking historical behavior of an SLA (bacct)

Use bacct to display historical performance of a service class. For example, service classes Inuvik and Tuktoyaktuk configure throughput goals.

% bsla
SERVICE CLASS NAME:  Inuvik
 -- throughput 6 
PRIORITY:  20

GOAL:  THROUGHPUT 
ACTIVE WINDOW: Always Open 
STATUS:  Active:On time
SLA THROUGHPUT:  10.00 JOBs/CLEAN_PERIOD
THROUGHPUT:  6 
OPTIMUM NUMBER OF RUNNING JOBS:  5

   NJOBS   PEND    RUN     SSUSP   USUSP   FINISH
    111     94       5        0       0      12
--------------------------------------------------------------
SERVICE CLASS NAME:  Tuktoyaktuk
 -- throughput 3
PRIORITY:  15

GOAL:  THROUGHPUT 
ACTIVE WINDOW: Always Open 
STATUS:  Active:On time
SLA THROUGHPUT:  4.00 JOBs/CLEAN_PERIOD
THROUGHPUT:  3 
OPTIMUM NUMBER OF RUNNING JOBS:  4

   NJOBS   PEND    RUN     SSUSP   USUSP   FINISH
    104     96       4        0       0       4

These two service classes have the following historical performance. For SLA Inuvik, bacct shows a total throughput of 8.94 jobs per hour over a period of 20.58 hours:

% bacct -sla Inuvik

Accounting information about jobs that are: 
  - submitted by users user1, 
  - accounted on all projects.
  - completed normally or exited
  - executed on all hosts.
  - submitted to all queues.
  - accounted on service classes Inuvik, 
------------------------------------------------------------------------------

SUMMARY:      ( time unit: second ) 
 Total number of done jobs:     183      Total number of exited jobs:     1
 Total CPU time consumed:      40.0      Average CPU time consumed:     0.2
 Maximum CPU time of a job:     0.3      Minimum CPU time of a job:     0.1
 Total wait time in queues: 1947454.0
 Average wait time in queue:10584.0
 Maximum wait time in queue:18912.0      Minimum wait time in queue:    7.0
 Average turnaround time:     12268 (seconds/job)
 Maximum turnaround time:     22079      Minimum turnaround time:      1713
 Average hog factor of a job:  0.00 ( cpu time / turnaround time )
 Maximum hog factor of a job:  0.00      Minimum hog factor of a job:  0.00
 Total throughput:             8.94 (jobs/hour)  during   20.58 hours
 Beginning time:       Oct 11 20:23      Ending time:          Oct 12 16:58
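The reported figure is simply finished jobs (done plus exited) divided by the measurement period in hours:

```python
# Arithmetic behind the bacct throughput line above: 183 done jobs plus
# 1 exited job over a 20.58-hour period.
done, exited, hours = 183, 1, 20.58
throughput = (done + exited) / hours
print(round(throughput, 2))  # 8.94 jobs/hour
```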

For SLA Tuktoyaktuk, bacct shows a total throughput of 4.36 jobs per hour over a period of 19.95 hours:

% bacct -sla Tuktoyaktuk


Accounting information about jobs that are: 
  - submitted by users user1, 
  - accounted on all projects.
  - completed normally or exited
  - executed on all hosts.
  - submitted to all queues.
  - accounted on service classes Tuktoyaktuk, 
------------------------------------------------------------------------------

SUMMARY:      ( time unit: second ) 
 Total number of done jobs:      87      Total number of exited jobs:     0
 Total CPU time consumed:      18.0      Average CPU time consumed:     0.2
 Maximum CPU time of a job:     0.3      Minimum CPU time of a job:     0.1
 Total wait time in queues: 2371955.0
 Average wait time in queue:27263.8
 Maximum wait time in queue:39125.0      Minimum wait time in queue:    7.0
 Average turnaround time:     30596 (seconds/job)
 Maximum turnaround time:     44778      Minimum turnaround time:      3355
 Average hog factor of a job:  0.00 ( cpu time / turnaround time )
 Maximum hog factor of a job:  0.00      Minimum hog factor of a job:  0.00
 Total throughput:             4.36 (jobs/hour)  during   19.95 hours
 Beginning time:       Oct 11 20:50      Ending time:          Oct 12 16:47

Because the run times are not uniform, both service classes actually achieve higher throughput than configured.

For more information

See Administering Platform LSF for more information about service classes and goal-oriented SLA driven scheduling.



Viewing Jobs in Job Groups

Viewing job group information (bjgroup)

Use the bjgroup command to see information about jobs in specific job groups.

% bjgroup
GROUP_NAME         NJOBS   PEND    RUN    SSUSP  USUSP  FINISH
/fund1_grp          5       4       0      1      0      0
/fund2_grp          11      2       5      0      0      4
/bond_grp           2       2       0      0      0      0
/risk_grp           2       1       1      0      0      0
/admi_grp           4       4       0      0      0      0

Viewing jobs by job group (bjobs)

Use the -g option of bjobs and specify a job group path to view jobs attached to the specified group.

% bjobs -g /risk_group
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
113     user1   PEND  normal     hostA                   myjob     Jun 17 16:15
111     user2   RUN   normal     hostA       hostA       myjob     Jun 14 15:13
110     user1   RUN   normal     hostB       hostA       myjob     Jun 12 05:03
104     user3   RUN   normal     hostA       hostC       myjob     Jun 11 13:18

bjobs -l displays the full path to the group to which a job is attached:

% bjobs -l -g /risk_group

Job <101>, User <user1>, Project <default>, Job Group 
</risk_group>, Status <RUN>, Queue <normal>, Command <myjob>
Tue Jun 17 16:21:49: Submitted from host <hostA>, CWD </home/user1>;
Tue Jun 17 16:22:01: Started on <hostA>;
...

For more information

See Administering Platform LSF for more information about using job groups.



Viewing Information about Resource Allocation Limits

Your job may be pending because some configured resource allocation limit has been reached. Use the blimits command to show the dynamic counters of resource allocation limits configured in Limit sections in lsb.resources. blimits displays the current resource usage to show what limits may be blocking your job.

blimits command

The blimits command displays:

  • Configured limit policy name
  • Users (-u option)
  • Queues (-q option)
  • Hosts (-m option)
  • Project names (-p option)

Resources that have no configured limits or no limit usage are indicated by a dash (-). Limits are displayed in a USED/LIMIT format. For example, if a limit of 10 slots is configured and 3 slots are in use, then blimits displays the limit for SLOTS as 3/10.

If limits for MEM, SWP, or TMP are configured as percentages, both the limit and the amount used are displayed in MB. For example, if lshosts displays maximum memory (maxmem) of 249 MB and MEM is limited to 10% of available memory, the memory limit is 25 MB. If 10 MB are used, blimits displays the limit for MEM as 10/25 (10 MB USED from a 25 MB LIMIT).
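The conversion can be sketched as follows (mem_limit_display is a hypothetical helper for illustration, using the 249 MB maxmem and 10% MEM limit from the example above):

```python
# Convert a percentage memory limit to the MB-based USED/LIMIT string
# that blimits displays: 10% of 249 MB maxmem rounds to a 25 MB limit.
def mem_limit_display(used_mb, maxmem_mb, percent):
    limit_mb = round(maxmem_mb * percent / 100.0)
    return "%d/%d" % (used_mb, limit_mb)

print(mem_limit_display(10, 249, 10))  # 10/25
```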

Configured limits and resource usage for builtin resources (slots, mem, tmp, and swp load indices) are displayed as INTERNAL RESOURCE LIMITS separately from custom external resources, which are shown as EXTERNAL RESOURCE LIMITS.

Limits are displayed for both the vertical tabular format and the horizontal format for Limit sections. Since a vertical format Limit section has no name, blimits displays NONAMEnnn under the NAME column for these limits, where the unnamed limits are numbered in the order the vertical-format Limit sections appear in the lsb.resources file.

If a resource consumer is configured as all, the limit usage for that consumer is indicated by a dash (-).

PER_HOST slot limits are not displayed. The bhosts command displays these as MXJ limits.

In MultiCluster, blimits returns the information about all limits in the local cluster.

Examples

For the following limit definitions:

Begin Limit
NAME = limit1
USERS = user1
PER_QUEUE = all
PER_HOST = hostA hostC
TMP = 30%
SWP = 50%
MEM = 10%
End Limit

Begin Limit
NAME = limit_ext1
PER_HOST = all
RESOURCE = ([user1_num, 30] [hc_num, 20])
End Limit

blimits displays the following:

% blimits
 
INTERNAL RESOURCE LIMITS:

NAME     USERS     QUEUES     HOSTS   PROJECTS   SLOTS    MEM      TMP      SWP
limit1   user1         q2     hostA         -       -   10/25        -   10/258
limit1   user1         q3     hostA         -       -       -   30/2953       -
limit1   user1         q4     hostC         -       -       -    40/590       -

EXTERNAL RESOURCE LIMITS:

NAME        USERS   QUEUES   HOSTS   PROJECTS    user1_num    hc_num     HC_num
limit_ext1      -        -   hostA          -           -       1/20          -
limit_ext1      -        -   hostC          -         1/30      1/20          -
  • In limit policy limit1, user1 submitting jobs to q2, q3, or q4 on hostA or hostC is limited to 30% tmp space, 50% swap space, and 10% available memory. No limits have been reached, so the jobs from user1 should run. For example, on hostA for jobs from q2, 10 MB of memory are used from a 25 MB limit and 10 MB of swap space are used from a 258 MB limit.
  • In limit policy limit_ext1, external resource user1_num is limited to 30 per host and external resource hc_num is limited to 20 per host. Again, no limits have been reached, so the jobs requesting those resources should run.





      Date Modified: November 21, 2003
Platform Computing: www.platform.com

Platform Support: support@platform.com
Platform Information Development: doc@platform.com

Copyright © 1994-2003 Platform Computing Corporation. All rights reserved.
