sakura.utils.data_transformations.ToKBins
- class sakura.utils.data_transformations.ToKBins
Bases:
object
Callable class that discretizes continuous data into intervals using sklearn's KBinsDiscretizer with a configurable binning strategy.
Useful for preprocessing continuous phenotypes into categorical representations.
- Parameters:
sample (array-like) – Input data of shape (n_samples, n_features) containing continuous features
n_bins (int or array-like of shape (n_features,), optional) – Number of bins to produce, given either as a single value applied to all features or as one value per feature; defaults to 2 for all features
encode (Literal['ordinal', 'onehot', 'onehot-dense'], optional) – Method used to encode the transformed result
strategy – Strategy used to define the widths of the bins
- Returns:
Transformed K-bins discretized data
- Return type:
numpy.ndarray or scipy.sparse matrix
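The return type depends on the encode argument. The following is a minimal sketch of that behaviour using sklearn's KBinsDiscretizer directly, since the exact call signature of ToKBins is not shown here. The helper name discretize and the sample values are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical continuous phenotype: 4 samples, 1 feature.
X = np.array([[0.1], [0.4], [0.6], [0.9]])


def discretize(sample, n_bins=2, encode="ordinal", strategy="uniform"):
    """Illustrative stand-in for the wrapper: fit and transform in one step."""
    est = KBinsDiscretizer(n_bins=n_bins, encode=encode, strategy=strategy)
    return est.fit_transform(sample)


ordinal = discretize(X, encode="ordinal")      # dense array of bin indices
onehot = discretize(X, encode="onehot")        # scipy.sparse matrix
dense = discretize(X, encode="onehot-dense")   # dense one-hot array

print(type(ordinal).__name__, ordinal.shape)
print(sparse.issparse(onehot), onehot.shape)
print(type(dense).__name__, dense.shape)
```

With 'ordinal' the output keeps the input shape, while both one-hot encodings widen each feature to one column per bin.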
Note
strategy: The ‘kmeans’ strategy may produce irregular bin widths, depending on the data distribution.
Options:
- Encoding method for transformed bins:
‘ordinal’: Integer representation (0 to n_bins-1)
‘onehot’: Sparse matrix one-hot encoding
‘onehot-dense’: Dense array one-hot encoding
- Binning strategy:
‘quantile’: Equal-frequency bins
‘uniform’: Equal-width bins
‘kmeans’: Clustering-based bin edges
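To make the difference between the binning strategies concrete, here is a small sketch that applies each one to the same skewed feature. It calls sklearn's KBinsDiscretizer directly rather than ToKBins, and the data values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical skewed feature: five small values and one large outlier.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [100.0]])

results = {}
for strategy in ("uniform", "quantile", "kmeans"):
    est = KBinsDiscretizer(n_bins=2, encode="ordinal", strategy=strategy)
    codes = est.fit_transform(X).ravel().astype(int)
    # bin_edges_ holds one array of edges per feature.
    results[strategy] = (est.bin_edges_[0], codes)
    print(strategy, est.bin_edges_[0], codes)
```

On data like this, equal-width bins isolate the outlier in its own bin, whereas equal-frequency bins split the samples evenly regardless of how far apart the values are.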
Methods