Running Analyses with SLURM
pm package
2026-03-05
slurm-integration.RmdThe pm package provides seamless integration with SLURM
(Simple Linux Utility for Resource Management) for running
computationally intensive analyses on HPC clusters. This vignette
demonstrates how to use the run_in_slurm() method to submit
R functions as SLURM batch jobs.
Overview
The SLURM integration in pm allows you to:
- Submit R functions as non-blocking SLURM batch jobs
- Check job status and completion
- Retrieve results automatically when jobs complete
- Configure SLURM resources (time, memory, CPUs, modules, etc.)
- Handle job failures and timeouts gracefully
Basic Usage
Simple Function Execution
The simplest use case is running a function that returns a result:
library(pm)
# Create or load a project
pm <- pm_project("path/to/your/project")
analysis <- pm$create_analysis("my_analysis")
# Define a function to run in SLURM
compute_result <- function() {
# Your computation here
result <- sum(1:1000)
return(result)
}
# Submit the job (non-blocking)
slurm_run <- analysis$run_in_slurm(compute_result)
# Check if job is done
slurm_run$is_done()
# Get results when ready
results <- slurm_run$get_results()Passing Arguments to Functions
You can pass arguments to your function using ...:
# Function that takes arguments
process_data <- function(data, multiplier) {
return(data * multiplier)
}
# Submit with arguments
slurm_run <- analysis$run_in_slurm(
process_data,
data = 42,
multiplier = 2
)
# Get results
results <- slurm_run$get_results()Using Positional Arguments
Positional arguments are also supported:
add_numbers <- function(x, y) {
return(x + y)
}
slurm_run <- analysis$run_in_slurm(add_numbers, 10, 20)
results <- slurm_run$get_results(timeout = 100) # Returns 30SLURM Configuration
Basic Resource Configuration
Configure SLURM resources using the config
parameter:
slurm_run <- analysis$run_in_slurm(
my_function,
config = list(
job_name = "my_analysis_job",
time_limit = "02:00:00", # 2 hours
memory = "8G", # 8 GB RAM
cpus = 4 # 4 CPUs
)
)Available Configuration Options
-
job_name: Name for the SLURM job (default: based on result_id) -
time_limit: Time limit in format “HH:MM:SS” (default: “01:00:00”) -
memory: Memory requirement, e.g., “8G”, “16G” (default: “32G”) -
cpus: Number of CPUs (default: 1) -
modules: Character vector of modules to load (default: NULL) -
slurm_flags: Additional SLURM flags as a string (default: ““)
Job Management
Blocking Result Retrieval with Timeout
You can wait for results with a timeout:
# Wait up to 5 minutes for results
results <- slurm_run$get_results(timeout = 300)
# If timeout is exceeded, an error is raised
tryCatch({
results <- slurm_run$get_results(timeout = 60)
}, error = function(e) {
cat("Timeout exceeded:", conditionMessage(e), "\n")
})Result Management
Error Handling
Handling Job Failures
When a job fails, get_results() will raise an error with
details:
tryCatch({
results <- slurm_run$get_results()
}, error = function(e) {
cat("Job failed:", conditionMessage(e), "\n")
# Error message includes job ID and log file location
})Checking Log Files
Log files are automatically created in the analysis’s
logs/ directory:
# Log file paths
cat("Output log:", slurm_run$log_path, "\n")
cat("Error log:", slurm_run$error_log_path, "\n")
# Read log files if needed
if (file.exists(slurm_run$log_path)) {
log_content <- readLines(slurm_run$log_path)
cat("Output log content:\n")
cat(paste(log_content, collapse = "\n"), "\n")
}Complete Example
Here’s a complete example combining all features:
library(pm)
# Setup
pm <- pm_project("path/to/project")
analysis <- pm$create_analysis("slurm_analysis")
# Define a computationally intensive function
heavy_computation <- function(n, seed = 42) {
set.seed(seed)
# Simulate heavy computation
result <- list(
data = rnorm(n),
summary = list(
mean = mean(rnorm(n)),
sd = sd(rnorm(n)),
n = n
),
timestamp = Sys.time()
)
return(result)
}
# Submit with configuration
slurm_run <- analysis$run_in_slurm(
heavy_computation,
n = 1000000,
seed = 123,
result_id = "large_simulation",
config = list(
job_name = "large_sim",
time_limit = "04:00:00",
memory = "16G",
cpus = 8,
modules = c("R/4.3.0")
)
)
# Monitor job
cat("Job submitted with ID:", slurm_run$job_id, "\n")
cat("Log file:", slurm_run$log_path, "\n")
# Wait for completion (with timeout)
results <- slurm_run$get_results(timeout = 3600) # 1 hour timeout
# Use results
cat("Computation completed!\n")
cat("Mean:", results$summary$mean, "\n")
cat("SD:", results$summary$sd, "\n")Best Practices
- Always check job status before retrieving results to avoid errors
- Use appropriate timeouts based on expected job duration
- Specify sufficient resources in config to avoid job failures
- Use custom result IDs for better organization of intermediate results
- Check log files when jobs fail to diagnose issues
- Cancel long-running jobs if they’re no longer needed
Controlling workspace images with store_image
When you call run_in_slurm(), the analysis can save an R
workspace image that is loaded on the SLURM worker before your function
runs. By default, the entire calling environment is saved, which can be
wasteful if you have large intermediate objects in memory that the job
does not actually need.
You can use the store_image config option to save only
specific objects. This keeps the job submission lightweight while still
making required objects available on the SLURM node:
library(pm)
pm <- pm_project("path/to/project")
analysis <- pm$create_analysis("slurm_analysis")
# Suppose your current R session has some large objects
big_raw_data <- readRDS("big-raw-data.rds") # hundreds of MB
precomputed_lookup <- readRDS("lookup-small.rds") # small, needed on the worker
heavy_computation <- function(n) {
# 'precomputed_lookup' will be available on the SLURM worker
sample(precomputed_lookup, n, replace = TRUE)
}
slurm_run <- analysis$run_in_slurm(
heavy_computation,
n = 100000,
config = list(
time_limit = "02:00:00",
memory = "16G",
# Only save and load selected objects instead of the full workspace
store_image = c("precomputed_lookup")
)
)In this example, big_raw_data stays in your interactive
session but is not serialized and sent to the SLURM job, while
precomputed_lookup is saved and restored for use in
heavy_computation(). This pattern is especially useful when
your workspace contains large simulation results or raw data that are
not needed by the job itself.
Technical Details
How It Works
- Job Submission: The function and its arguments are serialized to RDS files in a temporary directory
- SLURM Script: A generic SLURM batch script is created with environment variables for configuration
- R Script Execution: SLURM runs a generic R script that loads the function and arguments, executes them, and saves results
-
Result Storage: Results are saved as RDS files in
the analysis’s
intermediate/directory -
Status Checking: Job status is checked using
squeueand log file analysis
File Structure
When you submit a SLURM job, the following files are created:
- Temporary directory (invisible to user): Contains function, arguments, and SLURM scripts
-
Log files:
logs/slurm-{job_id}.outandlogs/slurm-{job_id}.err -
Result file:
intermediate/{result_id}.rds
SLURM Availability
The package checks for SLURM availability before submission:
# Check if SLURM is available
if (pm::is_slurm_available()) {
cat("SLURM is available\n")
} else {
cat("SLURM is not available on this system\n")
}If SLURM is not available, run_in_slurm() will raise an
error.
Troubleshooting
Job Never Completes
If a job appears to hang:
- Check the job status with
squeue -j {job_id} - Check log files for errors
- Verify SLURM configuration (time, memory, etc.)
- Consider canceling and resubmitting with different resources
See Also
-
?PMAnalysisfor more information on the analysis object -
?PMSlurmRunfor details on job management methods -
?is_slurm_availablefor checking SLURM availability