Working with File Formats: Read and Write Operations
Alon Alexander
2026-01-21
Introduction
The pm package reads and writes several file formats through the PMData
class and its read() and write() methods. This vignette shows how to
create output paths with get_output_path(), write data to them, and read
the data back with get_artifact().
Supported File Types
The package supports several file formats organized by type:
Table Formats
- CSV (.csv) - Comma-separated values
- TSV (.tsv) - Tab-separated values
- Parquet (.parquet, .pqt) - Apache Parquet format (requires the arrow package)
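To make the grouping by type concrete, here is a minimal sketch of classifying a path by its extension. It is an illustration, not pm's code: format_types and type_of_path() are hypothetical names, and the object extensions (.rds, .rdata) are taken from the object examples later in this vignette rather than from the list above.

```r
# Hypothetical lookup for illustration only; this is not pm's implementation.
# Maps a lower-cased file extension to the category it belongs to.
format_types <- c(
  csv = "table", tsv = "table", parquet = "table", pqt = "table",
  rds = "object", rdata = "object"
)

# Return the category ("table" or "object") implied by a path's extension
type_of_path <- function(path) {
  ext <- tolower(tools::file_ext(path))
  type <- unname(format_types[ext])
  if (is.na(type)) stop("Unsupported file extension: '", ext, "'")
  type
}

type_of_path("results.parquet")    # returns "table"
type_of_path("model_rdata.RData")  # returns "object"
```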
Using get_output_path() with Type Specification
The get_output_path() method allows you to specify a
type, and the package will automatically choose an appropriate file
extension:
# Create a temporary project for demonstration
project_dir <- file.path(tempdir(), "file_formats_demo")
pm <- pm_create_project(project_dir)
analysis <- pm$create_analysis("demo")
Writing Tables
When you specify type = "table", the default format is Parquet. To
write CSV or TSV instead, include the extension in the ID:
# Create sample data
sample_data <- data.frame(
id = 1:10,
value = rnorm(10),
category = letters[1:10]
)
# Write as table (defaults to .parquet)
parquet_output <- analysis$get_output_path("table_parquet", type = "table")
parquet_output$write(sample_data)
cat("Written to:", basename(parquet_output$path), "\n")
#> Written to: table_parquet.parquet
# Write as CSV explicitly (no need for type when extension is provided)
csv_output <- analysis$get_output_path("table_csv.csv")
csv_output$write(sample_data)
cat("Written to:", basename(csv_output$path), "\n")
#> Written to: table_csv.csv
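Both calls above follow one naming rule: an ID without an extension gets a default chosen from type, and an ID with an extension keeps it. A rough base R sketch of that rule, including the type/extension check exercised in the TSV example that follows. resolve_output_name() is hypothetical, and the .rdata default for objects is inferred from the output shown under Writing Objects, not from pm's documentation.

```r
# Hypothetical helper for illustration only; this is not pm's implementation.
# Default extension per type (the .rdata default is inferred from this vignette's output)
default_ext <- c(table = "parquet", object = "rdata")
# Extensions accepted for each type
allowed_ext <- list(
  table  = c("csv", "tsv", "parquet", "pqt"),
  object = c("rds", "rdata")
)

resolve_output_name <- function(id, type = NULL) {
  if (!is.null(type)) type <- match.arg(type, names(default_ext))
  ext <- tolower(tools::file_ext(id))
  if (ext == "") {
    # No extension in the id, so the type decides the format
    if (is.null(type)) stop("Give either a file extension or a type")
    return(paste0(id, ".", default_ext[[type]]))
  }
  # Extension present: if a type was also given, the two must agree
  if (!is.null(type) && !(ext %in% allowed_ext[[type]])) {
    stop("Extension '.", ext, "' does not match type '", type, "'")
  }
  id
}

resolve_output_name("table_parquet", type = "table")  # returns "table_parquet.parquet"
resolve_output_name("model", type = "object")         # returns "model.rdata"
resolve_output_name("table_tsv.tsv", type = "table")  # returns "table_tsv.tsv"
```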
# Write as TSV (when both type and extension are given, they are checked to match)
tsv_output <- analysis$get_output_path("table_tsv.tsv", type = "table")
tsv_output$write(sample_data)
cat("Written to:", basename(tsv_output$path), "\n")
#> Written to: table_tsv.tsv
Writing Objects
For R objects, use type = "object":
# Create a complex object
my_model <- list(
coefficients = c(intercept = 1.5, slope = 2.3),
data = sample_data,
metadata = list(created = Sys.Date(), version = "1.0")
)
# Write a single object (with type = "object" and no extension, the default is .rdata, as the output shows)
rds_output <- analysis$get_output_path("model", type = "object")
rds_output$write(my_model)
cat("Written to:", basename(rds_output$path), "\n")
#> Written to: model.rdata
# Write as RData (can contain multiple objects)
rdata_output <- analysis$get_output_path("model_rdata.RData", type = "object")
rdata_output$write(my_model, sample_data, metadata = list(version = "1.0"))
cat("Written to:", basename(rdata_output$path), "\n")
#> Written to: model_rdata.RData
Reading Files with get_artifact()
The get_artifact() method finds files by their ID
(filename without extension), and read() automatically
detects the format:
# Read the Parquet table (automatically detects Parquet format)
parquet_data <- analysis$get_artifact("table_parquet")$read()
head(parquet_data)
#> id value category
#> 1 1 -1.400043517 a
#> 2 2 0.255317055 b
#> 3 3 -2.437263611 c
#> 4 4 -0.005571287 d
#> 5 5 0.621552721 e
#> 6 6 1.148411606 f
# Read the CSV (automatically detects CSV format)
csv_data <- analysis$get_artifact("table_csv")$read()
head(csv_data)
#> id value category
#> 1 1 -1.400043517 a
#> 2 2 0.255317055 b
#> 3 3 -2.437263611 c
#> 4 4 -0.005571287 d
#> 5 5 0.621552721 e
#> 6 6 1.148411606 f
# Read the TSV (automatically detects TSV format)
tsv_data <- analysis$get_artifact("table_tsv")$read()
head(tsv_data)
#> id value category
#> 1 1 -1.400043517 a
#> 2 2 0.255317055 b
#> 3 3 -2.437263611 c
#> 4 4 -0.005571287 d
#> 5 5 0.621552721 e
#> 6 6 1.148411606 f
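The three table reads above are the same call; only the ID changes. One way extension-based dispatch could work is sketched below, assuming the reader is chosen purely from the file extension (this vignette does not say how pm detects formats). read_by_extension() is hypothetical, and the base R readers plus arrow::read_parquet() stand in for whatever pm uses internally.

```r
# Hypothetical reader for illustration only; this is not pm's read().
# Picks a reader from the file extension.
read_by_extension <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = utils::read.csv(path),
    tsv = utils::read.delim(path),
    parquet = ,  # falls through to the pqt branch
    pqt = {
      if (!requireNamespace("arrow", quietly = TRUE)) {
        stop("Reading Parquet requires the 'arrow' package")
      }
      as.data.frame(arrow::read_parquet(path))
    },
    rds = readRDS(path),
    rdata = {
      env <- new.env()
      load(path, envir = env)  # restores every saved object by name
      env
    },
    stop("Unsupported file extension: '", ext, "'")
  )
}

# Usage: write one data frame as CSV and TSV, then read both with the same call
toy <- data.frame(id = 1:3, category = c("a", "b", "c"))
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")
utils::write.csv(toy, csv_path, row.names = FALSE)
utils::write.table(toy, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)
identical(read_by_extension(csv_path), read_by_extension(tsv_path))
```

Keeping the format decision inside one function is what lets calling code stay unchanged when a file moves from CSV to Parquet.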
# Read the object back (model.rdata is an RData file, so read() returns an environment)
model <- analysis$get_artifact("model")$read()
str(model, max.level = 2)
#> <environment: 0x10fc11828>
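The str() output is an environment because model was written with the .rdata extension rather than as .rds. The base R functions below show the difference between the two formats: readRDS() returns the stored value, while load() restores named objects into an environment. Which object name pm uses inside a single-object .rdata file is not shown in this vignette, so my_model here is simply the name this sketch saves.

```r
# Base R only. An .rds file holds one unnamed value; readRDS() returns it.
my_model <- list(coefficients = c(intercept = 1.5, slope = 2.3))

rds_path <- tempfile(fileext = ".rds")
saveRDS(my_model, rds_path)
restored <- readRDS(rds_path)  # the list itself

# An .RData file holds named objects; load() puts them into an environment
rdata_path <- tempfile(fileext = ".RData")
save(my_model, file = rdata_path)
env <- new.env()
loaded_names <- load(rdata_path, envir = env)  # names of the restored objects
print(loaded_names)                # "my_model"
identical(env$my_model, restored)  # TRUE
```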
# Read RData (returns an environment)
rdata_env <- analysis$get_artifact("model_rdata")$read()
cat("Objects in RData file:", paste(names(rdata_env), collapse = ", "), "\n")
#> Objects in RData file: metadata, my_model, sample_data
The Power of Type-Agnostic Reading
One of the key benefits is that you don’t need to know the file format when reading:
# Get artifact by ID - no need to know if it's .csv, .parquet, or .tsv
artifact <- analysis$get_artifact("table_csv")
# Read it - format is automatically detected
data <- artifact$read()
# Works regardless of the actual file format!
cat("Successfully read", nrow(data), "rows\n")
#> Successfully read 10 rows
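Looking up a file by ID amounts to matching file names with the extension stripped. A hypothetical sketch (find_by_id() is not part of pm), assuming a single folder is searched and that two files sharing an ID, such as x.csv and x.parquet, should be reported as ambiguous; this vignette does not say how pm handles that case.

```r
# Hypothetical lookup for illustration only; this is not pm's get_artifact().
# Finds the file whose name, minus its extension, equals the requested id.
find_by_id <- function(dir, id) {
  files <- list.files(dir)
  hits <- files[tools::file_path_sans_ext(files) == id]
  if (length(hits) == 0) stop("No artifact with id '", id, "' in ", dir)
  if (length(hits) > 1) {
    stop("Ambiguous id '", id, "': ", paste(hits, collapse = ", "))
  }
  file.path(dir, hits)
}

# Usage: the caller only knows the id; the extension is discovered
outputs <- file.path(tempdir(), "find_by_id_demo")
dir.create(outputs, showWarnings = FALSE)
utils::write.csv(data.frame(x = 1:3), file.path(outputs, "table_csv.csv"), row.names = FALSE)
basename(find_by_id(outputs, "table_csv"))  # returns "table_csv.csv"
```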
# Same works for Parquet or TSV - just change the ID
parquet_artifact <- analysis$get_artifact("table_parquet")
parquet_data <- parquet_artifact$read()
cat("Parquet file also read successfully:", nrow(parquet_data), "rows\n")
#> Parquet file also read successfully: 10 rows
This means:
- You can change file formats (e.g., from CSV to Parquet) without changing your reading code
- You don’t need to remember file extensions; just use the ID
- The package handles format detection automatically
Working with Intermediate Files
You can also use intermediate = TRUE to save files in
the intermediate folder:
# Save intermediate result
intermediate_output <- analysis$get_output_path(
"temp_calculation",
type = "table",
intermediate = TRUE
)
intermediate_output$write(sample_data)
cat("Intermediate file saved to:", basename(intermediate_output$path), "\n")
#> Intermediate file saved to: temp_calculation.parquet
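Because get_artifact() only searches outputs/ (see the note in the next chunk), intermediate files live in a separate search root. A hypothetical sketch of listing IDs per folder, assuming both folders sit directly under the analysis directory; list_ids() is a made-up stand-in for list_outputs(), not its implementation.

```r
# Hypothetical layout for illustration only: assumes "outputs" and
# "intermediate" subfolders directly under the analysis directory.
list_ids <- function(analysis_dir, intermediate = FALSE) {
  sub <- if (intermediate) "intermediate" else "outputs"
  files <- list.files(file.path(analysis_dir, sub))
  tools::file_path_sans_ext(files)  # IDs are file names without extensions
}

# Usage: a file saved to intermediate/ is invisible when listing outputs/
analysis_dir <- file.path(tempdir(), "list_ids_demo")
dir.create(file.path(analysis_dir, "outputs"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(analysis_dir, "intermediate"), recursive = TRUE, showWarnings = FALSE)
saveRDS(1:3, file.path(analysis_dir, "intermediate", "temp_calculation.rds"))
list_ids(analysis_dir)                       # character(0): nothing in outputs/
list_ids(analysis_dir, intermediate = TRUE)  # "temp_calculation"
```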
# Note: get_artifact() only searches outputs/, not intermediate/
# To access intermediate files, use get_intermediate_artifact() or list_outputs(intermediate = TRUE)
intermediates <- analysis$list_outputs(intermediate = TRUE)
cat("Intermediate files:", paste(sapply(intermediates, function(x) x$id), collapse = ", "), "\n")
#> Intermediate files: temp_calculation
Complete Workflow Example
Here’s a complete example showing the workflow:
# 1. Create output path with type specification
output <- analysis$get_output_path("analysis_results", type = "table")
# 2. Write data (format determined by extension/type)
processed_data <- data.frame(
x = 1:5,
y = 2:6,
z = 3:7
)
output$write(processed_data)
# 3. Later, retrieve and read (no need to know format,
# but can pass parameters for specific needs)
artifact <- analysis$get_artifact("analysis_results")
reloaded_data <- artifact$read(as_tibble = TRUE)
# 4. Verify it worked
identical(processed_data, as.data.frame(reloaded_data))
#> [1] TRUE
Summary
- Use get_output_path(id, type = "table") to get paths with appropriate extensions
- Use get_output_path(id, type = "object") for R objects
- The write() method automatically handles different formats
- The read() method automatically detects file formats
- You don’t need to know file extensions when using get_artifact() and read()
- This makes your code format-agnostic and easier to maintain
For more details on project workflows, see the Getting Started vignette.