sakura.utils.data_splitter.DataSplitter.get_incremental_train_test_split

DataSplitter.get_incremental_train_test_split(base: ndarray, k: int) → dict

Obtain two split codes from a label vector. Points labelled 1 to k (inclusive) are treated as train (1 in the first vector). The remaining selected (non-zero) points are treated as test (1 in the second vector). Unselected points (label 0) remain unchanged (0 in both vectors).
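The rule above can be sketched in NumPy as follows. This is a hypothetical re-implementation for illustration, not the library's source: the helper name `incremental_split_sketch` and the integer 0/1 output dtype are assumptions.

```python
import numpy as np


def incremental_split_sketch(base: np.ndarray, k: int) -> dict:
    """Hypothetical re-implementation of the documented split rule.

    Labels 1..k -> train, other non-zero labels -> test, 0 -> neither.
    """
    base = np.asarray(base)
    # Points whose label lies in 1..k are marked 1 in the train vector
    train = ((base >= 1) & (base <= k)).astype(int)
    # Every other selected (non-zero) point is marked 1 in the test vector
    test = ((base != 0) & (train == 0)).astype(int)
    # Unselected points (label 0) stay 0 in both vectors
    return {"train": train, "test": test}


codes = incremental_split_sketch(np.array([0, 1, 3, 2, 0, 5]), k=2)
print(codes["train"])  # [0 1 0 1 0 0]
print(codes["test"])   # [0 0 1 0 0 1]
```

Each point is marked in at most one vector, so the train and test codes never overlap.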

Useful when planning to increase the number of supervised points incrementally (e.g. selecting 30% of the cells whose labels are known with certainty).
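A minimal sketch of such an incremental schedule, assuming `base` encodes the order in which selected cells join supervision (ranks 1..n, with 0 for unselected cells). The helper `split_codes` is a hypothetical stand-in that mirrors the documented rule; it is not the library call. With 10 selected cells, `k=3` puts 30% of them in train.

```python
import numpy as np


def split_codes(base: np.ndarray, k: int) -> dict:
    # Hypothetical stand-in for the documented rule: 1..k -> train, other non-zero -> test
    train = ((base >= 1) & (base <= k)).astype(int)
    test = ((base != 0) & (train == 0)).astype(int)
    return {"train": train, "test": test}


rng = np.random.default_rng(0)
n_cells = 20
# Assumption: base encodes the order in which selected cells join supervision.
# Here 10 of the 20 cells are selected and ranked 1..10; the rest stay 0.
selected = rng.choice(n_cells, size=10, replace=False)
base = np.zeros(n_cells, dtype=int)
base[selected] = np.arange(1, 11)

# Grow supervision by raising k; the test set shrinks as the train set grows
for k in (3, 6, 10):
    codes = split_codes(base, k)
    print(k, codes["train"].sum(), codes["test"].sum())
# 3 3 7
# 6 6 4
# 10 10 0
```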

Parameters:

base (np.ndarray[base.dtype, np.integer]) – The predefined integer label vector to split; 0 marks unselected points
k (int) – Selection threshold; points labelled 1 to k are assigned to train

Returns:

A dictionary with “train” and “test” split codes

Return type:

dict[str, np.ndarray]