This is the eighth blog post in a series of articles about using the CCEB cluster. An overview of the series is available here. This post focuses on normal sessions and submitting batch jobs.
Initializing a normal session and submitting a batch job is an advanced topic, but you’ll still be requesting appropriate system parameters using the LSF bsub command. We will again focus on the bsub command for a normal session and the common advanced options used to request more cores for parallelization, enforce memory constraints on your session, and request certain machines. We will also set up the R script so that it can be submitted efficiently as a batch job.
A note on vocabulary. A batch job is a job that can run without end-user interaction, or that can be scheduled to run as resources permit. Batch processing suits frequently used programs that can be executed with minimal human interaction. A batch job is run, or submitted, through a normal session on the cluster.
Note: The vocabulary of batch jobs can become quite confusing as the name itself is an oxymoron. In this blog post I will use the word job to refer to the entire batch job submission. The iterations that are run within the batch job are referred to as tasks. Also, normal and batch are used interchangeably here.
Unlike an interactive session, a normal session is both requested and submitted through your bsub command. Once the job is submitted, tasks will either run or error. These processes are not interactive or dynamic. After the submission, you have to wait for the individual tasks in the batch to finish, either successfully or with an exit code (error). In the next blog post, I will cover ways to check in on a submitted batch job through the normal session.
A normal session is no different than physically waiting in a queue. You submit your job to run. The master host assesses your job request based on the availability of machines, memory, cores, etc. and your requested specifications. The job and its tasks are then sent to run, left to pend, or killed. The benefit of this approach is that the cluster can self-manage its resources and the submitted jobs. It is possible that some of your tasks will run while others pend.
Code that is complex, takes a long time to run, or requires a lot of memory should always be submitted to the normal queue. The system maintainers prefer that we submit jobs as normal rather than interactive as often as possible, to allow the master host to self-regulate efficiently. As a user it is more cumbersome to debug a batch job, so spend time checking your code thoroughly beforehand in an interactive session.
Like anything on the cluster, you need to ssh onto the submission host. From the submission host you can use the bsub command to submit a normal session or batch job.
A normal session is more annoying to set up than an interactive session. You will first need to put your code on the cluster. Recall that I use Fetch, a file transfer client that is free for students and academics, to do this. Then you’ll need to submit the bsub command.
With a normal session and batch jobs it is common to put the portions of the bsub that do not change into a shell script. A shell script, or .sh file, is a place to store commonly used bsub options so you don’t have to type them out every time you run the bsub command. This may be confusing now but we will go through an example.
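As a rough illustration of the idea, a shell script along the following lines could hold the options that rarely change. This is only a sketch and not the script used in this series: the file name submit_normal.sh, the function build_bsub_cmd, and the variable names are hypothetical, and the queue and memory values are simply borrowed from the examples later in this post. It prints the bsub command rather than running it, so you can check it before submitting.

```shell
#!/bin/bash
# submit_normal.sh (hypothetical name): keeps rarely changed bsub options in one place.

# Options that stay the same across submissions (illustrative values).
QUEUE="cceb_normal"
MEM_RESERVE_MB=500   # passed to -R "rusage[mem=...]"
MEM_LIMIT_MB=1000    # passed to -M

# Print the full bsub command for a job name, task count, output file, and R script.
build_bsub_cmd() {
  local name="$1"
  local ntasks="$2"
  local outfile="$3"
  local rscript="$4"
  printf 'bsub -q %s -J "%s[1-%s]" -R "rusage[mem=%s]" -M %s -o %s Rscript %s\n' \
    "$QUEUE" "$name" "$ntasks" "$MEM_RESERVE_MB" "$MEM_LIMIT_MB" "$outfile" "$rscript"
}

# Dry run: show the command instead of submitting it (placeholder paths).
build_bsub_cmd sims 100 /path/to/output.txt /path/to/script.R
```

Only the job name, number of tasks, and paths change between submissions; once the printed command looks right it can be pasted at the prompt on the submission host.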
In my blog post Using the CCEB High Performance Computing Cluster: Interactive Session Basics I discussed adding the module load R/#.#.# command to your .bash_profile so that it runs automatically when you log on. When submitting a normal session batch job you would normally have to include module load R/#.#.# somewhere in your submission so that R is loaded. If you put this in your .bash_profile, though, you do not need to include module load in your bsub. The code below assumes you have added module load R/#.#.# to your .bash_profile. If you haven’t done this, I describe how you can add it here: Using the CCEB High Performance Computing Cluster: Interactive Session Basics. If you don’t want to add this line, then you’ll need to incorporate module load into your bsub or shell script.
bsub Options
Many of the normal submission options are the same as for an interactive session, but we’ll cover them in the context of a normal job here. To continue a bsub across multiple lines you can end each line with \. The backslash only continues the command onto the next line, so the options must still be separated by spaces. I’ve used \ below since this code gets very long.
The most basic bsub to submit a normal job is as follows:
bsub -q cceb_normal \
-J "jobname[1-n]" \
-o /project/taki3/amv/cluster/normal_session_examples/example_output.txt \
Rscript /project/taki3/amv/cluster/normal_session_examples/normal_example.R
-q cceb_normal - submits to the normal queue rather than the interactive queue (-q cceb_interactive).
-J "jobname[1-n]" - names the job. I’ve used jobname but you can specify whatever you’d like for this. For example, sims, tapas, alisjob. The index 1-n sets how many tasks or iterations are to be run.
-o - sends the output to example_output.txt, and since the full path is specified this file is saved in /project/taki3/amv/cluster/normal_session_examples.
Rscript - runs the R script found in /project/taki3/amv/cluster/normal_session_examples/normal_example.R.
This bsub does not specify how many tasks to run simultaneously, so the cluster will decide how many run at once based on how free the machines are. In a normal/batch session, PMACS limits us to a total of 50 cores running simultaneously, so keep that in mind when submitting jobs. The host won’t give you more than that even if you request more.
Some common additions include:
-n # - to request multiple cores for each task.
bsub -q cceb_normal \
-J "jobname[1-n]" \
-n 8 \
-o /project/taki3/amv/cluster/normal_session_examples/example_output.txt \
Rscript /project/taki3/amv/cluster/normal_session_examples/normal_example.R
The number of cores requested here should match mc.cores inside the R function.
-n # cannot exceed 50 under PMACS usage restrictions. If you’re on another cluster or queue this may be possible. For example, when I use the taki queue I have access to all the cores in that queue.
-R "span[hosts=1]" - to request that all of the cores you are requesting for a single task be on the same host. This applies to each task individually, not across the whole batch job. For example, task 1 may run on silver01 while task 2 may run on silver02.
bsub -q cceb_normal \
-J "jobname[1-n]" \
-R "span[hosts=1]" \
-n 8 \
-o /project/taki3/amv/cluster/normal_session_examples/example_output.txt \
Rscript /project/taki3/amv/cluster/normal_session_examples/normal_example.R
-R "rusage[mem=####]" - to hold your task until the requested amount of memory is free. Note, the memory is in megabytes.
bsub -q cceb_normal \
-J "jobname[1-n]" \
-R "span[hosts=1] rusage[mem=5000]" \
-n 8 \
-o /project/taki3/amv/cluster/normal_session_examples/example_output.txt \
Rscript /project/taki3/amv/cluster/normal_session_examples/normal_example.R
bsub -q cceb_normal \
-J "jobname[1-n]" \
-R "rusage[mem=5000]" \
-n 8 \
-o /project/taki3/amv/cluster/normal_session_examples/example_output.txt \
Rscript /project/taki3/amv/cluster/normal_session_examples/normal_example.R
-m "machinename" - to request a specific host by name.
bsub -q cceb_normal \
-J "jobname[1-n]" \
-m "silver02" \
-R "span[hosts=1] rusage[mem=5000]" \
-n 8 \
-o /project/taki3/amv/cluster/normal_session_examples/example_output.txt \
Rscript /project/taki3/amv/cluster/normal_session_examples/normal_example.R
bsub -q cceb_normal \
-J "jobname[1-n]" \
-m "silver02" \
-R "rusage[mem=5000]" \
-n 8 \
-o /project/taki3/amv/cluster/normal_session_examples/example_output.txt \
Rscript /project/taki3/amv/cluster/normal_session_examples/normal_example.R
It is not advisable to request specific machines on the CCEB cluster. This overrides the master host’s allocation and can result in core and memory issues. Additionally, if cluster maintenance is scheduled on that machine it could kill all of your jobs and tasks.
-M #### - to kill the job if it exceeds the #### memory amount.
bsub -q cceb_normal \
-J "jobname[1-n]" \
-m "silver02" \
-R "span[hosts=1] rusage[mem=5000]" \
-M 10000 \
-n 8 \
-o /project/taki3/amv/cluster/normal_session_examples/example_output.txt \
Rscript /project/taki3/amv/cluster/normal_session_examples/normal_example.R
bsub -q cceb_normal \
-J "jobname[1-n]" \
-m "silver02" \
-R "rusage[mem=5000]" \
-M 10000 \
-n 8 \
-o /project/taki3/amv/cluster/normal_session_examples/example_output.txt \
Rscript /project/taki3/amv/cluster/normal_session_examples/normal_example.R
Again, memory is in megabytes.
-J "jobname[1-n]%N" - this option names the job, sets the total number n of iterations or tasks to run, and also sets the number of iterations or tasks (%N) that are allowed to run simultaneously.
bsub -q cceb_normal \
-J "jobname[1-n]%5" \
-m "silver02" \
-R "span[hosts=1] rusage[mem=5000]" \
-M 10000 \
-n 8 \
-o /project/taki3/amv/cluster/normal_session_examples/example_output.txt \
Rscript /project/taki3/amv/cluster/normal_session_examples/normal_example.R
bsub -q cceb_normal \
-J "jobname[1-n]%5" \
-m "silver02" \
-R "rusage[mem=5000]" \
-M 10000 \
-n 8 \
-o /project/taki3/amv/cluster/normal_session_examples/example_output.txt \
Rscript /project/taki3/amv/cluster/normal_session_examples/normal_example.R
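Because each task claims -n cores and up to %N tasks can run at once, the cores in use at any moment are roughly the product of the two. The sketch below is one way to check that product against the 50-core limit mentioned earlier before submitting. The function name check_core_budget and the CORE_LIMIT variable are made up for illustration, and the limit should be adjusted if your queue allows something different.

```shell
#!/bin/bash
# Limit stated in this post for the cceb_normal queue; adjust for other queues.
CORE_LIMIT=50

# Succeeds when (cores per task) x (simultaneous tasks) fits within the limit.
check_core_budget() {
  local cores_per_task="$1"   # the value given to -n
  local max_tasks="$2"        # the value given after % in -J
  local total=$(( cores_per_task * max_tasks ))
  if [ "$total" -le "$CORE_LIMIT" ]; then
    echo "OK: $total of $CORE_LIMIT cores"
  else
    echo "Too many: $total exceeds $CORE_LIMIT cores"
    return 1
  fi
}

check_core_budget 8 5           # -n 8 with %5, as in the example above
check_core_budget 8 7 || true   # -n 8 with %7 would ask for 56 cores
```

With -n 8 and %5 the job can occupy 40 cores at once, which fits; raising the limit to %7 would not.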
-n # * %N cannot exceed 50.
Each simultaneously running task uses -n # cores. Do your math to be sure you don’t use more than 50.
1. ssh onto the cluster.
2. Use bsub to submit the batch job and initiate a normal session.
3. Use bjobs to check on the job. After you submit a job on the cluster you can check which iterations are running or pending using bjobs. You’ll type bjobs from the command line on either the submission host or an execution host. For example, you can run bjobs after sshing onto the cluster, or after sshing and bsubing into an interactive session.
bjobs Example
bjobs is an LSF command that checks the status of your active jobs. By default, it displays information about your own pending, running, and suspended jobs. After I submit a normal job I run bjobs to see if the tasks are running, pending, or erroring.
This command will be used throughout the examples to show at a surface level whether the job and tasks ran. To thoroughly check job status you need to look at the output file.
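When a batch job has many tasks, the bjobs listing gets long, and a quick tally of statuses can be easier to read. The sketch below assumes the default bjobs layout, in which the status (RUN, PEND, and so on) is the third column after a single header line. The helper name count_job_states is hypothetical, and the listing fed to it here is invented sample text so that the snippet runs without LSF.

```shell
#!/bin/bash
# Tally the STAT column (third field) of a bjobs-style listing read from stdin.
count_job_states() {
  awk 'NR > 1 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Invented sample listing for illustration; on the cluster use: bjobs | count_job_states
count_job_states <<'EOF'
JOBID   USER  STAT  QUEUE        FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
1000001 user1 RUN   cceb_normal  scisub     amber01    norm1[1]  Jul 25 11:23
1000001 user1 RUN   cceb_normal  scisub     amber02    norm1[2]  Jul 25 11:23
1000001 user1 PEND  cceb_normal  scisub                norm1[3]  Jul 25 11:23
EOF
```

For the sample listing this prints PEND 1 and RUN 2 on separate lines.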
Below I provide 3 examples of running code in a normal session. These are the most common job types you will run when using a normal session. Each example executes the same analysis but uses the cluster in a different way. The code therefore differs slightly between examples, but the results should all be the same provided the same seed is specified across the methods. I’ve adapted the example code provided in the Advanced Topics for Interactive Sessions blog post.
You can download the code and sample output files used below here.
This example runs the code provided below as 100 separate tasks in a normal session. After sshing onto scisub run the following:
This code is provided in the zipped file above at ‘normal_session_examples/normal_example1/normal_example1.R’. Change the paths in the code to match your own setup, then use a transfer client to put this file on the cluster. Note that the file paths should point to where you put the script and where you would like to save output on the cluster.
I’ve reproduced the code below to explain some characteristics that are specific to a batch job.
library(lme4)
library(parallel)
normal_example1 <- function(i, wd = '/project/taki3/amv/cluster/') {
  message('Number of cores detected ', Sys.getenv('LSB_DJOB_NUMPROC'))
  message(paste0('Sampling data for iter ', i))
  # Randomly sample without replacement iris data
  iris_sample = dplyr::sample_n(tbl = iris, size = 75, replace = FALSE)
  message(paste0('Calculating model for iter ', i))
  # Fit linear model
  model = lm(Petal.Width ~ Petal.Length + Sepal.Width + Sepal.Length, data = iris_sample)
  # If the directory doesn't exist for job create it
  if(dir.exists(paste0(wd, 'linear_models')) == FALSE){
    dir.create(paste0(wd, 'linear_models'))
  }
  message(paste0('Saving model for iter ', i))
  # Save each iteration of results
  saveRDS(object = model, file = paste0(wd, "linear_models/model_norm1_session_", i, ".rds"))
  message(paste0('Completed iter ', i))
  # Return the fitted model
  return(model)
}
# Set a master seed
set.seed(23)
# Get the index of the task being run from the LSF job array
i = as.numeric(Sys.getenv("LSB_JOBINDEX"))
# If i is 0 or NA set it to 1
# Useful for when you're de-bugging code in an interactive session where i is 0 or NA
if(is.na(i) || i == 0){
  i = 1
}
# Create a vector of seeds for array job
iter = 100
seed_vec = sample(1:100000, iter, replace=F)
# Set the seed specific to this task
set.seed(seed_vec[i])
# Run the example function
normal_example1(i = i, wd = '/project/taki3/amv/cluster/')
Notes on the code above related to a batch job:
The i = as.numeric(Sys.getenv("LSB_JOBINDEX")) line assigns the index of the running task to an object i. For example, if task 17 is running then within that task i = 17. This lets you index and set a unique seed in every task or iteration.
The code below is useful if you are debugging in an interactive session.
# Useful for when you're de-bugging code in an interactive session
if(is.na(i) || i == 0){
  i = 1
}
In an interactive session i = as.numeric(Sys.getenv("LSB_JOBINDEX")) will return i = 0 or i = NA. This is not ideal when you’re debugging and running everything together. This step says that if i is 0 or NA it is re-assigned to 1, so that the code runs without error.
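The same fallback can be written on the shell side, for instance in a wrapper script that wants to know which task it is running before it calls Rscript. This is a sketch under that assumption: the function name resolve_task_index is hypothetical, and it relies only on LSB_JOBINDEX being an ordinary environment variable that is unset, empty, or 0 outside a job array.

```shell
#!/bin/bash
# Print the task index, falling back to 1 when LSB_JOBINDEX is unset, empty, or 0
# (for example when testing in an interactive session rather than a job array).
resolve_task_index() {
  local idx="${LSB_JOBINDEX:-0}"
  if [ "$idx" -eq 0 ]; then
    idx=1
  fi
  echo "$idx"
}

echo "Running task $(resolve_task_index)"
```

Inside task 17 of a job array this reports 17; in an interactive session it reports 1, mirroring the R check.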
The iter = 100 must match the number of tasks assigned in the bsub. I set iter = 100 so that I can sample 100 unique random seeds, one for each iteration or task. This ensures the sample we obtain in the function is reproducible and not the same across iterations.
The seed_vec = sample(1:100000, iter, replace=F) line draws 100 random numbers from 1 to 100000 without replacement. Since the master seed was set to 23, these 100 numbers will be the same every time you submit the job.
Finally, set the seed based on the task being run using set.seed(seed_vec[i]).
To be sure that this function is ready to submit as a batch job and has no bugs, run a single iteration of the normal_example1 function. You can simply run everything after the comment # Set a master seed in an interactive session.
bsub
After you have your R code ready to execute and saved on the cluster, ssh onto the submission host and run the following bsub command.
bsub -q cceb_normal \
-J "norm1[1-100]%20" \
-R "rusage[mem=500]" \
-M 1000 \
-o /project/taki3/amv/cluster/normal_session_examples/normal_example1/example1_output.txt \
Rscript /project/taki3/amv/cluster/normal_session_examples/normal_example1/normal_example1.R
This command runs the normal_example1.R code as a batch job. 100 iterations or tasks were submitted to the queue with the name norm1. I have limited this example to 20 tasks (%20) running simultaneously, or in parallel, simply for example purposes. A task will not start until 500 megabytes are free, and it will error out if it exceeds 1000 megabytes. The output from both the LSF normal session and anything output from R is written to /project/taki3/amv/cluster/normal_session_examples/normal_example1/example1_output.txt.
The image below shows my ssh onto the cluster, submitting the code with a bsub, and then checking the jobs running with bjobs.
This example runs the code provided below as 100 separate tasks in a normal session. In each task the code is also set to run in parallel using parallel::mclapply(). This is a toy example meant simply to show bsub and parallel::mclapply together; it does not really represent the situation in which you would run a batch job with parallelization inside each task. Here parallel::mclapply won’t lead to speed gains, since each task hands a single iteration to 2 cores. This style of batch job, with parallelization within a task, is useful if a portion of your code takes a lot of time and is repetitive in nature. You can parallelize that slow portion to distribute the work and reduce the overall time.
I’ve used this type of parallelization when running 1000 simulations. I submit the 1000 simulations as a batch job with 1000 tasks. In each task, I have to generate data for 100 subjects. This portion takes a long time to run, so I use parallel::mclapply() over the 100 subjects to generate the data in parallel. That situation is a better fit for this type of parallelization.
After sshing onto scisub run the following:
library(lme4)
library(parallel)
normal_example2 <- function(i, wd = '/project/taki3/amv/cluster/') {
  message('Number of cores detected ', Sys.getenv('LSB_DJOB_NUMPROC'))
  message(paste0('Sampling data for iter ', i))
  # Randomly sample without replacement iris data
  iris_sample = dplyr::sample_n(tbl = iris, size = 75, replace = FALSE)
  message(paste0('Calculating model for iter ', i))
  # Fit linear model
  model = lm(Petal.Width ~ Petal.Length + Sepal.Width + Sepal.Length, data = iris_sample)
  # If the directory doesn't exist for job create it
  if(dir.exists(paste0(wd, 'linear_models')) == FALSE){
    dir.create(paste0(wd, 'linear_models'))
  }
  message(paste0('Saving model for iter ', i))
  # Save each iteration of results
  saveRDS(object = model, file = paste0(wd, "linear_models/model_norm1_session_", i, ".rds"))
  message(paste0('Completed iter ', i))
  # Return the fitted model
  return(model)
}
# Set a master seed
set.seed(23)
# Get the index of the task being run from the LSF job array
i = as.numeric(Sys.getenv("LSB_JOBINDEX"))
# If i is 0 or NA set it to 1
# Useful for when you're de-bugging code in an interactive session where i is 0 or NA
if(is.na(i) || i == 0){
  i = 1
}
# Create a vector of seeds for array job
iter = 100
seed_vec = sample(1:100000, iter, replace=F)
# Set the seed specific to this task
set.seed(seed_vec[i])
# Run the example function using the cores LSF assigned to this task
parallel::mclapply(i, normal_example2, wd = '/project/taki3/amv/cluster/', mc.cores = as.numeric(Sys.getenv('LSB_DJOB_NUMPROC')))
bsub
bsub -q cceb_normal \
-n 2 \
-J "norm2[1-100]%20" \
-R "rusage[mem=500]" \
-M 1000 \
-o /project/taki3/amv/cluster/normal_session_examples/normal_example2/example2_output.txt \
Rscript /project/taki3/amv/cluster/normal_session_examples/normal_example2/normal_example2.R
This bsub is exactly the same as above except that I’ve changed the name to norm2, pointed the output and code to normal_example2, and added -n 2 to assign each task 2 cores.
The image below shows my ssh onto the cluster, submitting the code with a bsub, and then checking the jobs running with bjobs. Notice that next to the running jobs there is 2*execution host. This indicates that each task is using 2 cores on that execution machine. This occurs because I requested 2 cores for each task with -n 2 in my bsub. The parallel::mclapply call then uses these 2 cores to run the ith task within the R code. Again, this is not how you would apply this in practice, since only a single task is being parallelized over 2 cores. In practice, the i input should be a vector to be distributed and parallelized over (i.e. i = 1:n).
This example runs the code provided below as 1 task in a batch job. Within the task the code is also set to run in parallel using parallel::mclapply(). This is a useful example if you have code that you’d rather run in parallel using parallel::mclapply but that should still be submitted as a batch job, so that the cluster can allocate resources and regulate the job. The parallel::mclapply used in this code is a classic example of parallelizing within a single task, since i = 1:100 in the code.
After sshing onto scisub run the following:
library(lme4)
library(parallel)
normal_example3 <- function(i, wd = '/project/taki3/amv/cluster/') {
  message('Number of cores detected ', Sys.getenv('LSB_DJOB_NUMPROC'))
  message(paste0('Sampling data for iter ', i))
  # Randomly sample without replacement iris data
  iris_sample = dplyr::sample_n(tbl = iris, size = 75, replace = FALSE)
  message(paste0('Calculating model for iter ', i))
  # Fit linear model
  model = lm(Petal.Width ~ Petal.Length + Sepal.Width + Sepal.Length, data = iris_sample)
  # If the directory doesn't exist for job create it
  if(dir.exists(paste0(wd, 'linear_models')) == FALSE){
    dir.create(paste0(wd, 'linear_models'))
  }
  message(paste0('Saving model for iter ', i))
  # Save each iteration of results
  saveRDS(object = model, file = paste0(wd, "linear_models/model_norm1_session_", i, ".rds"))
  message(paste0('Completed iter ', i))
  # Return the fitted model
  return(model)
}
# Set a master seed
set.seed(23)
# Run the example function 100 times using the cores LSF assigned to this task
parallel::mclapply(1:100, normal_example3, wd = '/project/taki3/amv/cluster/', mc.cores = as.numeric(Sys.getenv('LSB_DJOB_NUMPROC')))
bsub
bsub -q cceb_normal \
-n 50 \
-J "norm3[1]%1" \
-R "rusage[mem=500]" \
-M 1000 \
-o /project/taki3/amv/cluster/normal_session_examples/normal_example3/example3_output.txt \
Rscript /project/taki3/amv/cluster/normal_session_examples/normal_example3/normal_example3.R
This bsub is exactly the same as above except that I’ve changed the name to norm3, am running a single task in the batch job, pointed the output and code to normal_example3, and added -n 50 to assign the task 50 cores.
The image below shows my ssh onto the cluster, submitting the code with a bsub, and then checking the jobs running with bjobs. Notice that next to the running jobs there is cores*execution host. This indicates that the task is using the specified number of cores on that execution machine. This occurs because I requested 50 cores for the task with -n 50 in my bsub. The parallel::mclapply call then uses these cores to run the task within the R code. In this example, I did not use -R "span[hosts=1]", so more than one machine is selected to execute. If I had specified this option, you would see 50*one machine name and all 50 cores would be from the same machine. Because parallel::mclapply forks its workers on the machine where R is running, adding -R "span[hosts=1]" is worth considering so that all of the requested cores are actually reachable.
Example 1 is the most common way to parallelize code and submit a batch job. It simply runs one set of code over a number of different tasks.
Sometimes you may split the work within each task so that it runs in parallel inside your batch job, if you need more cores for each task. Example 2 shows how to set up the bsub if you want to call parallel::mclapply inside a batch task, so that things run in parallel across the tasks and in parallel within a task.
For simulations where you are changing parameters, Example 1 still works. You could create a function with lots of parameters set in your R code. You can also use as.numeric(Sys.getenv("LSB_JOBINDEX")) to index the parameter settings and seeds, so that everything is sent as a single batch job.
I use Example 3 over Example 1 or 2 when I need to submit lots of tasks (i.e. more than 5000), since this could overwhelm the cluster, or when I really need a large number of cores working on a single common task. This is common in big data, imaging, and genetics but less common in other statistical fields with “small” data.
The output files from each job are included in the zipped folder provided above. These are simple text files that report the output from the job. Here you will see whether a job ran or errored, and you can determine whether an error occurred because of a malformed bsub or a bug in the R code. I always use the message function in R to print messages from within the wrapper function, so I gain some insight into where an error occurs in the code. It is also useful for determining how far along the code is.
I’ve copied and pasted sample output from a single task of Example 1 (example1_output.txt). The output contains very useful information about the tasks and the full batch job. All the output from each task is dumped into the output file so you can check that everything ran without problems.
Sender: LSF System <jszostek@amber01>
Subject: Job 1724878[2]: <norm1[1-100]%20> in cluster <PMACS-SCC> Done
Job <norm1[1-100]%20> was submitted from host <scisub> by user <alval> in cluster <PMACS-SCC> at Thu Jul 25 11:23:52 2019
Job was executed on host(s) <amber01>, in queue <cceb_normal>, as user <alval> in cluster <PMACS-SCC> at Thu Jul 25 11:23:52 2019
</home/alval> was used as the home directory.
</home/alval> was used as the working directory.
Started at Thu Jul 25 11:23:52 2019
Terminated at Thu Jul 25 11:23:58 2019
Results reported at Thu Jul 25 11:23:58 2019
Your job looked like:
------------------------------------------------------------
# LSBATCH: User input
Rscript /project/taki3/amv/cluster/normal_session_examples/normal_example1/normal_example1.R
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 1.58 sec.
Max Memory : 146 MB
Average Memory : 146.00 MB
Total Requested Memory : 500.00 MB
Delta Memory : 354.00 MB
Max Swap : -
Max Processes : 3
Max Threads : 4
Run time : 5 sec.
Turnaround time : 6 sec.
The output (if any) follows:
Loading required package: Matrix
Number of cores detected 1
Sampling data for iter 2
Calculating model for iter 2
Saving model for iter 2
Completed iter 2
Call:
lm(formula = Petal.Width ~ Petal.Length + Sepal.Width + Sepal.Length,
data = iris_sample)
Coefficients:
(Intercept) Petal.Length Sepal.Width Sepal.Length
-0.4357 0.5462 0.2673 -0.2078
Let’s break down the output piece by piece below.
Sender: LSF System <jszostek@amber01>
Subject: Job 1724878[2]: <norm1[1-100]%20> in cluster <PMACS-SCC> Done
This portion of the output tells you that the task was submitted to execution host amber01. For whatever reason, it is always sent from jszostek@amber01. We see this is the report for batch job ID 1724878, specifically the 2nd task [2] (i.e. i = 2 in the code if you need to debug). The job was named norm1 and tasks [1-100] were submitted, allowing 20 tasks to run simultaneously. The Done statement at the end reports that the task ran successfully. If it did not run successfully you would see an exit status (Exited) there instead. If you’re running thousands of tasks, a simple search of the file for exit will tell you whether everything ran or some tasks failed.
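With thousands of tasks, that search can be scripted. The sketch below pulls the job ID and task index out of every Subject line that ends in Exited. It assumes failed tasks use the same Subject layout as the Done line shown above with the final word changed to Exited, which you should confirm against your own output file. The function name failed_tasks is hypothetical and the second sample line is invented for illustration.

```shell
#!/bin/bash
# List "jobid[task]" for every Subject line ending in Exited (LSF output on stdin).
failed_tasks() {
  grep -E '^Subject: Job [0-9]+\[[0-9]+\]: .* Exited[[:space:]]*$' |
    sed -E 's/^Subject: Job ([0-9]+\[[0-9]+\]):.*/\1/'
}

# First line is from the sample output above; the second is an invented failure.
failed_tasks <<'EOF'
Subject: Job 1724878[2]: <norm1[1-100]%20> in cluster <PMACS-SCC> Done
Subject: Job 1724878[3]: <norm1[1-100]%20> in cluster <PMACS-SCC> Exited
EOF
```

On the cluster the same function could be fed the real file, for example failed_tasks < example1_output.txt, and the indices it prints correspond to the value of i in the R code.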
Job <norm1[1-100]%20> was submitted from host <scisub> by user <alval> in cluster <PMACS-SCC> at Thu Jul 25 11:23:52 2019
Job was executed on host(s) <amber01>, in queue <cceb_normal>, as user <alval> in cluster <PMACS-SCC> at Thu Jul 25 11:23:52 2019
</home/alval> was used as the home directory.
</home/alval> was used as the working directory.
Started at Thu Jul 25 11:23:52 2019
Terminated at Thu Jul 25 11:23:58 2019
Results reported at Thu Jul 25 11:23:58 2019
This portion of the output is summary information. The output repeats the job information norm1[1-100]%20 and adds that it was submitted through the scisub host by user alval, with a time and date. I have access to 2 hosts through the PennSIVE working group, the scisub host as well as the takim host, so this portion is useful if I check my code after a few days and need to know which host I submitted it from. The output reports again that the code was executed on the execution host amber01 through the cceb_normal queue. Again, this is useful if you have access to more than one queue. It also reports the home directory and the working directory from which I submitted the bsub. This working directory is of course different from the one specified in my actual R code. Lastly, the output reports the time the task started and ended.
------------------------------------------------------------
# LSBATCH: User input
Rscript /project/taki3/amv/cluster/normal_session_examples/normal_example1/normal_example1.R
------------------------------------------------------------
Successfully completed.
This portion of the output simply states which R script was run for the task and notes that the task completed successfully. If this task had failed due to a code error or a bsub error, there would be a note here about exiting rather than success.
Resource usage summary:
CPU time : 1.58 sec.
Max Memory : 146 MB
Average Memory : 146.00 MB
Total Requested Memory : 500.00 MB
Delta Memory : 354.00 MB
Max Swap : -
Max Processes : 3
Max Threads : 4
Run time : 5 sec.
Turnaround time : 6 sec.
This portion of the output is where you can gain information for submitting your bsub with memory constraints. The memory used is reported in Max Memory and Average Memory. You can run the job once without any memory constraints and then use the information here to add educated constraints to your bsub. This portion of the output also reports the Run time in seconds. This will help you estimate how much time a single task will take and, from that, the full batch job.
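As a rough worked example of that calculation, the sketch below estimates the run time of a whole batch from the per-task run time and finds the largest Max Memory value in an output file. The helper names max_memory_mb and estimate_batch_seconds are hypothetical, and the time estimate assumes tasks run in full waves of %N with no time spent pending, so treat it as a lower bound.

```shell
#!/bin/bash
# Largest "Max Memory" value (in MB) found in LSF output read from stdin.
max_memory_mb() {
  awk '/Max Memory/ { v = $(NF - 1) + 0; if (v > max) max = v } END { print max + 0 }'
}

# Rough batch run time: tasks run in waves of at most `concurrent` at a time.
estimate_batch_seconds() {
  local tasks="$1"
  local concurrent="$2"
  local secs_per_task="$3"
  local waves=$(( (tasks + concurrent - 1) / concurrent ))   # ceiling division
  echo $(( waves * secs_per_task ))
}

# Values from Example 1: 100 tasks, 20 at a time, about 5 seconds each.
estimate_batch_seconds 100 20 5              # prints 25

# The Max Memory line from the sample output above.
echo "Max Memory : 146 MB" | max_memory_mb   # prints 146
```

On the cluster, max_memory_mb < example1_output.txt would report the largest value across all tasks in the file, which is a reasonable starting point for choosing the rusage and -M values.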
The output (if any) follows:
Loading required package: Matrix
Number of cores detected 1
Sampling data for iter 2
Calculating model for iter 2
Saving model for iter 2
Completed iter 2
Call:
lm(formula = Petal.Width ~ Petal.Length + Sepal.Width + Sepal.Length,
data = iris_sample)
Coefficients:
(Intercept) Petal.Length Sepal.Width Sepal.Length
-0.4357 0.5462 0.2673 -0.2078
The remaining portion of the output consists of messages reported from R. There are a few messages from loading packages (i.e. Loading required package: Matrix) and then the messages that I printed throughout the function. Again, these are useful for debugging and for seeing how your code progresses.
The output from tasks is appended to the output file specified in the bsub. If you run a batch job, then change some code and re-run it, the file will contain output from both batch jobs. While working, I like to save the output from the previous run in case I need it for reference, but remove it from the output file before running the new batch job so that the file contains only the new run. In the R code I’m submitting I keep a commented-out copy of the bsub I use for that code. Above this, I have two bash commands that copy the output file to a new name and then delete the original. In this way, I retain the previous output under a new name, and the output file written by the new batch job contains only information from that single job.
You’ll see this in the example code provided. An example from normal_example1.R is provided below.
cp /project/taki3/amv/cluster/normal_session_examples/normal_example1/example1_output.txt /project/taki3/amv/cluster/normal_session_examples/normal_example1/example1_output_prev.txt
rm /project/taki3/amv/cluster/normal_session_examples/normal_example1/example1_output.txt
bsub -q cceb_normal \
-J "norm1[1-100]%20" \
-R "rusage[mem=500]" \
-M 1000 \
-o /project/taki3/amv/cluster/normal_session_examples/normal_example1/example1_output.txt \
Rscript /project/taki3/amv/cluster/normal_session_examples/normal_example1/normal_example1.R
This post covered a lot of detail related to batch jobs. This is an advanced topic in using the cluster, and if you are planning to run a batch job and don’t feel you fully understand it, you should reach out to the PMACS system folks, your advisor, and/or other students. We covered some basic bsub options that are useful for most batch job submissions. The possibilities with batch jobs are endless in terms of how you structure the jobs and use the resources, but three common examples were covered here with R scripts, output, and the bsub. Lastly, we went through the output returned from an example and how to use that output to understand how your job went.
In the next post, I’ll be covering a few LSF commands that will help you quality control running jobs.