sakura.utils.data_splitter.DataSplitter

class sakura.utils.data_splitter.DataSplitter

Bases: object

Class for creating dataset splits using label-based grouping.

Methods

auto_random_k_bin_labelling

Obtain a label vector in which included points are labelled 1 to k and points that are not included are labelled 0.
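The method's signature is not shown on this page, so the following is only a standalone NumPy sketch of the labelling scheme described above, not the DataSplitter API. The function name, its parameters, and the choice of near-equal bin sizes are assumptions made for illustration.

```python
import numpy as np

def random_k_bin_labels(n_points, k, included=None, seed=None):
    """Hypothetical helper: label included points 1..k at random, others 0."""
    rng = np.random.default_rng(seed)
    labels = np.zeros(n_points, dtype=int)
    if included is None:
        included = np.arange(n_points)
    included = np.asarray(included)  # assumed to be integer indices
    # Cycle through 1..k so bin sizes differ by at most one (an assumption),
    # then shuffle so that bin membership is random.
    bins = np.arange(len(included)) % k + 1
    labels[included] = rng.permutation(bins)
    return labels

# Points 0..5 are included and spread over 3 bins; points 6..9 stay 0.
labels = random_k_bin_labels(10, k=3, included=[0, 1, 2, 3, 4, 5], seed=0)
```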

auto_random_k_fold_cv_split

Obtain 2*k split codes from a random K-fold split: for each fold, one train code and one test code, in which the points belonging to the train or test set, respectively, are labelled 1.
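As a rough illustration of what 2*k split codes could look like, the sketch below assigns every point to a random fold and emits one (train, test) pair of 0/1 vectors per fold. It is a standalone NumPy example; the function name, parameters, and return layout are hypothetical and may differ from what DataSplitter actually returns.

```python
import numpy as np

def random_k_fold_split_codes(n_points, k, seed=None):
    """Hypothetical helper: return k (train_code, test_code) pairs of 0/1 vectors."""
    rng = np.random.default_rng(seed)
    # Random fold id in 1..k for every point, with near-equal fold sizes.
    fold_ids = rng.permutation(np.arange(n_points) % k + 1)
    codes = []
    for fold in range(1, k + 1):
        test_code = (fold_ids == fold).astype(int)   # 1 marks test points
        train_code = (fold_ids != fold).astype(int)  # 1 marks train points
        codes.append((train_code, test_code))
    return codes

codes = random_k_fold_split_codes(9, k=3, seed=0)
```

Each point appears in the test code of exactly one fold and in the train code of the other k - 1 folds.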

auto_random_stratified_k_bin_labelling

(to be implemented)

get_incremental_select_unselect_split

Obtain a split code from a label vector: points labelled 1 to k are considered selected (1); all other points are not selected (0).
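The rule above can be sketched with a single NumPy comparison. This is an illustration under the assumption that labels are non-negative integers and that k is passed as a threshold; the function name and arguments are hypothetical, not the method's real signature.

```python
import numpy as np

def incremental_select_code(labels, k):
    """Hypothetical helper: 1 where the label is in 1..k, 0 elsewhere."""
    labels = np.asarray(labels)
    return ((labels >= 1) & (labels <= k)).astype(int)

labels = np.array([0, 1, 2, 3, 0, 2])
code = incremental_select_code(labels, k=2)
# code is [0, 1, 1, 0, 0, 1]: labels 1 and 2 are selected, 0 and 3 are not.
```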

get_incremental_train_test_split

Obtain two split codes from a label vector: points labelled 1 to k are train (1 in the first vector), the remaining selected (non-zero) points are test (1 in the second vector), and unselected points stay 0 in both vectors.
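A minimal sketch of this train/test rule, assuming non-negative integer labels and a threshold k of at least 1. The helper name and its arguments are hypothetical; the real method may take different parameters.

```python
import numpy as np

def incremental_train_test_codes(labels, k):
    """Hypothetical helper: return (train_code, test_code) as 0/1 vectors."""
    labels = np.asarray(labels)
    train_code = ((labels >= 1) & (labels <= k)).astype(int)  # labels 1..k
    test_code = (labels > k).astype(int)  # remaining non-zero labels
    return train_code, test_code

labels = np.array([0, 1, 2, 3, 0, 2])
train_code, test_code = incremental_train_test_codes(labels, k=2)
# train_code is [0, 1, 1, 0, 0, 1]
# test_code  is [0, 0, 0, 1, 0, 0]
```

Points labelled 0 are 0 in both vectors, so they take part in neither set.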

get_k_fold_cv_split

Obtain cross-validation folds directly from labels 1 to k; points labelled 0 are treated as not selected.
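One way to read this description: fold i uses the points labelled i as test and every other non-zero point as train. The sketch below follows that reading in plain NumPy. The function name, the inference of k from the largest label, and the return layout are assumptions for illustration only.

```python
import numpy as np

def k_fold_codes_from_labels(labels):
    """Hypothetical helper: one (train_code, test_code) pair per label 1..k."""
    labels = np.asarray(labels)
    k = int(labels.max())   # assumes labels are integers in 0..k
    selected = labels > 0   # label 0 means "not selected"
    folds = []
    for fold in range(1, k + 1):
        test_code = (labels == fold).astype(int)
        train_code = (selected & (labels != fold)).astype(int)
        folds.append((train_code, test_code))
    return folds

labels = np.array([0, 1, 2, 3, 0, 2])
folds = k_fold_codes_from_labels(labels)
# Fold 1: train_code is [0, 0, 1, 1, 0, 1], test_code is [0, 1, 0, 0, 0, 0]
```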