sakura.utils.data_transformations.ToKBins

class sakura.utils.data_transformations.ToKBins

Callable class that discretizes continuous data into intervals using scikit-learn's KBinsDiscretizer with a configurable binning strategy.

Useful for preprocessing continuous phenotypes into categorical representations.

Parameters:

sample (array-like) – Input data of shape (n_samples, n_features) containing continuous features
n_bins (int or array-like of shape (n_features,), optional) – Number of bins to produce, given either as a single value applied to every feature or as one value per feature; defaults to 2 for all features
encode (Literal['ordinal', 'onehot', 'onehot-dense'], optional) – Method used to encode the transformed result
strategy – Strategy used to define the widths of the bins (see Note)

Returns:

Transformed K-bins discretized data

Return type:

numpy.ndarray or scipy.sparse matrix
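
The following is a minimal sketch of the discretization that the parameters above describe. Because this page does not show how ToKBins is constructed or called, the sketch calls sklearn's KBinsDiscretizer directly; the sample values are invented for illustration, and the assumption that ToKBins forwards n_bins, encode, and strategy unchanged is not confirmed by this page.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical continuous phenotype matrix: 6 samples, 2 features.
sample = np.array([
    [0.1, 10.0],
    [0.4, 12.0],
    [0.5, 15.0],
    [0.9, 21.0],
    [1.5, 22.0],
    [2.0, 30.0],
])

# Ordinal encoding: each feature is replaced by its bin index (0 or 1 here).
ordinal = KBinsDiscretizer(n_bins=2, encode="ordinal", strategy="uniform")
binned = ordinal.fit_transform(sample)

# One-hot encoding: one output column per (feature, bin) pair,
# returned as a scipy.sparse matrix.
onehot = KBinsDiscretizer(n_bins=2, encode="onehot", strategy="uniform")
binned_onehot = onehot.fit_transform(sample)

print(binned)
print(binned_onehot.shape, sparse.issparse(binned_onehot))
```

With encode='ordinal' or 'onehot-dense' the result is a dense numpy.ndarray, while encode='onehot' yields a sparse matrix, which matches the two return types listed above.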

Note

strategy: The ‘kmeans’ strategy may produce irregular bin widths, depending on the data distribution.
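
As a rough illustration of this note, the sketch below compares the bin edges that sklearn's KBinsDiscretizer learns under the 'uniform' and 'kmeans' strategies on a small, unevenly spread feature. The data values are made up for the example, and the sketch uses KBinsDiscretizer directly rather than ToKBins, whose call signature is not shown on this page.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical single feature: a small low cluster and a larger high cluster.
values = np.array(
    [0, 1, 2, 3, 10, 11, 12, 13, 14, 15, 16], dtype=float
).reshape(-1, 1)

uniform = KBinsDiscretizer(n_bins=2, encode="ordinal", strategy="uniform")
kmeans = KBinsDiscretizer(n_bins=2, encode="ordinal", strategy="kmeans")
uniform.fit(values)
kmeans.fit(values)

# bin_edges_ holds one array of edges per feature; np.diff gives bin widths.
uniform_widths = np.diff(uniform.bin_edges_[0])
kmeans_widths = np.diff(kmeans.bin_edges_[0])

print("uniform edges:", uniform.bin_edges_[0], "widths:", uniform_widths)
print("kmeans edges: ", kmeans.bin_edges_[0], "widths:", kmeans_widths)
```

The 'uniform' strategy splits the observed range into equal-width bins, whereas 'kmeans' places each inner edge midway between neighboring cluster centers, so its bin widths follow where the data are concentrated.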

Options:

Encoding method for transformed bins: 'ordinal', 'onehot', or 'onehot-dense'

Binning strategy: the values accepted by sklearn KBinsDiscretizer are 'uniform', 'quantile', and 'kmeans'

Methods