DataScienceOperations.jl

A Julia companion package for the DSO CLI, providing project and stage management, YAML parameter loading, and configuration utilities.

DataScienceOperations.DataScienceOperations - Module
module DataScienceOperations

DataScienceOperations.jl is a Julia companion package for the DSO CLI, providing utilities for project and stage management, configuration handling, and robust parameter loading from YAML files. It is designed to integrate seamlessly with DSO-based workflows, offering ergonomic access to project roots, stage directories, and configuration parameters.

Main features:

  • Project root and stage path resolution
  • Flexible parameter loading and access via DsoParams
  • Utility functions for path and environment management
  • Integration with the DSO command-line interface

Intended for users who need to manage complex project structures and configurations in Julia, especially in conjunction with the DSO CLI.

DSO is a command-line helper for building reproducible data analysis projects on top of dvc. To learn more about dso, please refer to the dso documentation. DataScienceOperations.jl is the Julia companion package for dso. Its purpose is to provide access to the files and configuration organized in a dso project.

Installation

pkg> add https://github.com/SMLMS/DataScienceOperations.jl

🛠️ Usage

Flags & Options

  • here([rel_path]): Returns the project root or a subpath.
  • stage_here([rel_path]): Returns the absolute path to the current stage or a subpath.
  • set_stage(stage): Sets the current stage directory.
  • read_params([stage_path]; return_list=false): Loads parameters from YAML for a stage.
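Root resolution of the kind here() performs is commonly done by walking up the directory tree until a marker is found. The following is a minimal sketch of that idea, not the package's implementation: the names find_root_sketch and here_sketch are hypothetical, and the default marker (".git") is an assumption chosen for illustration.

```julia
# Hypothetical sketch; NOT the implementation used by DataScienceOperations.jl.
# Walk upward from `start` until a directory containing `marker` is found.
function find_root_sketch(start::AbstractString; marker::AbstractString = ".git")
    dir = abspath(start)
    while true
        ispath(joinpath(dir, marker)) && return dir
        parent = dirname(dir)
        # At the filesystem root, dirname returns its argument unchanged.
        parent == dir && error("no directory containing $marker above $start")
        dir = parent
    end
end

# Mimics the call shape of here([rel_path]): the root itself, or a subpath of it.
function here_sketch(rel_path::AbstractString = "";
                     start::AbstractString = pwd(),
                     marker::AbstractString = ".git")
    root = find_root_sketch(start; marker = marker)
    return isempty(rel_path) ? root : joinpath(root, rel_path)
end
```

Because the search starts from the working directory and moves upward, the result is the same from the project root and from any subdirectory of it.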

Examples

DataScienceOperations.jl provides convenient access to stage parameters from Julia scripts or notebooks. With read_params, the params.yaml file of the specified stage is compiled and loaded into a dictionary. The path must be specified relative to the project root. This ensures that the correct stage is found irrespective of the current working directory, as long as it is the project root or any subdirectory thereof. Only parameters that are declared as params, dep, or output in dvc.yaml are loaded, to ensure that one does not forget to keep the dvc.yaml updated.

using DataScienceOperations

# Load the parameters of a stage (path relative to the project root)
params_obj = read_params("subfolder/my_stage")

# Access parameters
params_obj.thresholds
params_obj.samplesheet

# Get parameter keys
get_keys(params_obj)
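Dot access such as params_obj.thresholds can be provided in Julia by forwarding property lookups to a wrapped dictionary. The sketch below illustrates that general technique only; ParamsSketch and keys_sketch are hypothetical names, the parameter values are invented, and nothing here is taken from the package's own DsoParams type.

```julia
# Hypothetical stand-in for a DsoParams-like container; not the package's type.
struct ParamsSketch
    data::Dict{String,Any}
end

# Forward `obj.name` to the wrapped dictionary. Nested dictionaries are wrapped
# again so that chained access such as `obj.thresholds.fc` also works.
function Base.getproperty(p::ParamsSketch, name::Symbol)
    data = getfield(p, :data)   # getfield bypasses this getproperty overload
    key = String(name)
    haskey(data, key) || throw(KeyError(key))
    value = data[key]
    return value isa Dict{String,Any} ? ParamsSketch(value) : value
end

# Report the stored keys as properties (used by tab completion in the REPL).
Base.propertynames(p::ParamsSketch) = Tuple(Symbol(k) for k in keys(getfield(p, :data)))

# Analogue of the get_keys call above; returns the top-level keys, sorted.
keys_sketch(p::ParamsSketch) = sort!(collect(keys(getfield(p, :data))))

# Invented example values standing in for a loaded params.yaml.
params = ParamsSketch(Dict{String,Any}(
    "samplesheet" => "data/samples.csv",
    "thresholds" => Dict{String,Any}("fc" => 2.0, "p_value" => 0.05),
))
```

With this in place, params.samplesheet returns the stored string and params.thresholds.fc reaches into the nested mapping, while an unknown name raises a KeyError instead of silently returning nothing.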

By default, DSO compiles paths in configuration files to paths relative to each stage (see configuration). From Julia, you can use stage_here to resolve paths relative to the current stage, independent of your current working directory. This works because read_params stores the path of the current stage in a configuration object that persists for the current Julia session; stage_here uses this information to resolve relative paths.

# Get project root
root = here()

# Set stage
set_stage("analysis")

# Get stage path
stage_path = stage_here()
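The session-wide behaviour described above can be pictured as a single piece of state that is written once and read by later calls. This is a sketch under assumptions, not the package's code: CURRENT_STAGE, set_stage_sketch, and stage_here_sketch are hypothetical names, and the project root is passed in explicitly to keep the example self-contained.

```julia
# Hypothetical sketch of session-wide stage state; not the package's code.
const CURRENT_STAGE = Ref{Union{Nothing,String}}(nothing)

# Remember the stage directory (project root joined with the stage path)
# for the rest of the Julia session.
function set_stage_sketch(project_root::AbstractString, stage::AbstractString)
    CURRENT_STAGE[] = normpath(joinpath(project_root, stage))
    return CURRENT_STAGE[]
end

# Resolve `rel_path` against the remembered stage; pwd() is never consulted.
function stage_here_sketch(rel_path::AbstractString = "")
    stage = CURRENT_STAGE[]
    stage === nothing && error("no stage set; call set_stage_sketch first")
    return isempty(rel_path) ? stage : joinpath(stage, rel_path)
end
```

Since the stage path is stored rather than recomputed from the working directory, changing directories with cd between calls does not affect what stage_here_sketch returns.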

A stage can be created from within Julia using create, supplying the relative path of the stage from the project root and a description.

create(
    "stage",
    dir="path/to/dir",
    name="AwesomeProject",
    description = "Some amazing analysis"
)

Requirements

  • Julia ≥ 1.10.10
  • Dependencies: Dates, FilePathsBase, YAML, TOML

🧪 Testing

To verify the installation, run:

pkg> test

๐Ÿ™ Further reading

More information about the DSO project as well as an R-companion can be found here:

📚 Documentation

Check out the Docs for the full API reference.

โš–๏ธ License

MIT © Sebastian Malkusch