DataScienceOperations.jl
A Julia companion package for the DSO CLI, providing project and stage management, YAML parameter loading, and configuration utilities.
DataScienceOperations.DataScienceOperations - Module
module DataScienceOperations
DataScienceOperations.jl is a Julia companion package for the DSO CLI, providing utilities for project and stage management, configuration handling, and robust parameter loading from YAML files. It is designed to integrate seamlessly with DSO-based workflows, offering ergonomic access to project roots, stage directories, and configuration parameters.
Main features:
- Project root and stage path resolution
- Flexible parameter loading and access via DsoParams
- Utility functions for path and environment management
- Integration with the DSO command-line interface
Intended for users who need to manage complex project structures and configurations in Julia, especially in conjunction with the DSO CLI.
DSO is a command-line helper for building reproducible data analysis projects on top of dvc. To learn more about dso, please refer to the dso documentation. DataScienceOperations.jl is the Julia companion package for dso. Its purpose is to provide access to the files and configuration organized in a dso project.
Installation
pkg> add https://github.com/SMLMS/DataScienceOperations.jl
Usage
Flags & Options
- here([rel_path]): Returns the project root or a subpath.
- stage_here([rel_path]): Returns the absolute path to the current stage or a subpath.
- set_stage(stage): Sets the current stage directory.
- read_params([stage_path]; return_list=false): Loads parameters from YAML for a stage.
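To make the idea of project-root resolution concrete, here is a minimal sketch using only Julia's standard library. It is not the package's implementation: the helper names `find_project_root` and `root_path` are hypothetical, and the marker name (default `.git`) is an assumption about how a project root might be recognized.

```julia
# Hypothetical helper, not part of DataScienceOperations.jl:
# walk upward from `start` until a directory containing `marker` is found.
function find_project_root(start::AbstractString = pwd(); marker::AbstractString = ".git")
    dir = abspath(start)
    while true
        ispath(joinpath(dir, marker)) && return dir
        parent = dirname(dir)
        # dirname of the filesystem root is the root itself, so stop there.
        parent == dir && error("no project root found above $start")
        dir = parent
    end
end

# Resolve a path relative to the project root, similar in spirit to here([rel_path]).
function root_path(rel_path::AbstractString = "";
                   start::AbstractString = pwd(), marker::AbstractString = ".git")
    root = find_project_root(start; marker = marker)
    return isempty(rel_path) ? root : normpath(joinpath(root, rel_path))
end
```

Because the search starts from the working directory and moves upward, the result is the same from the project root and from any of its subdirectories.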
Examples
DSO.jl provides convenient access to stage parameters from Julia scripts or notebooks. Using read_params, the params.yaml file of the specified stage is compiled and loaded into a dictionary. The path must be specified relative to the project root. This ensures that the correct stage is found irrespective of the current working directory, as long as it is the project root or any subdirectory thereof. Only parameters that are declared as params, dep, or output in dvc.yaml are loaded, so that you do not forget to keep dvc.yaml up to date.
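The "declared parameters only" rule described above can be pictured with plain dictionaries. This is an illustrative sketch, not the package's code: `keep_declared` is a hypothetical helper, the values are made up, and the real package reads the compiled params.yaml and dvc.yaml rather than hard-coded data.

```julia
# Illustrative values only; a real stage would load these from params.yaml.
compiled = Dict(
    "thresholds"  => Dict("fc" => 2.0),
    "samplesheet" => "data/samples.csv",
    "unused"      => 42,
)

# Keys as they might be declared for the stage in dvc.yaml (assumption).
declared = ["thresholds", "samplesheet"]

# Hypothetical helper: keep only the top-level entries whose key is declared.
keep_declared(params::AbstractDict, declared) =
    filter(p -> first(p) in declared, params)

loaded = keep_declared(compiled, declared)
```

In this sketch the "unused" entry is dropped because it is not declared, which is the behavior that nudges you to keep dvc.yaml in sync with the parameters a script actually uses.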
using DataScienceOperations
params_obj = read_params("subfolder/my_stage")
# Access parameters
params_obj.thresholds
params_obj.samplesheet
# Get parameter keys
get_keys(params_obj)
By default, DSO compiles paths in configuration files to paths relative to each stage (see configuration). From Julia, you can use stage_here to resolve paths relative to the current stage, independent of your current working directory. This works because read_params has stored the path of the current stage in a configuration object that persists in the current Julia session. stage_here can use this information to resolve relative paths.
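The session-persistent stage described above can be sketched with a global `Ref`. This is only an assumption about how such state could be kept, not the package's internals; `CURRENT_STAGE`, `remember_stage!`, and `resolve_in_stage` are hypothetical names.

```julia
# Hypothetical session state: holds the absolute path of the current stage, if any.
const CURRENT_STAGE = Ref{Union{Nothing,String}}(nothing)

# Store the stage directory as an absolute path so later lookups
# do not depend on the working directory.
remember_stage!(stage_dir::AbstractString) = (CURRENT_STAGE[] = abspath(stage_dir))

# Resolve a path relative to the remembered stage.
function resolve_in_stage(rel_path::AbstractString = "")
    stage = CURRENT_STAGE[]
    stage === nothing && error("no stage set; call remember_stage! first")
    return isempty(rel_path) ? stage : normpath(joinpath(stage, rel_path))
end
```

Since the stored path is absolute, changing the working directory after the stage has been remembered does not affect what `resolve_in_stage` returns. The package's own functions are used as follows: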
# Get project root
root = here()
# Set stage
set_stage("analysis")
# Get stage path
stage_path = stage_here()
To create a stage from within Julia, call create with the path of the stage relative to the project root and a description.
create(
    "stage",
    dir = "path/to/dir",
    name = "AwesomeProject",
    description = "Some amazing analysis"
)
Requirements
- Julia ≥ 1.10.10
- Dependencies: Dates, FilePathsBase, YAML, TOML
Testing
To verify the installation run:
pkg> test
Further reading
More information about the DSO project as well as an R-companion can be found here:
Documentation
Check out the Docs for the full API reference.
License
MIT © Sebastian Malkusch