sakura.utils.data_transformations.ToKBins

class sakura.utils.data_transformations.ToKBins

Bases: object

Callable class that discretizes continuous data into intervals using scikit-learn's KBinsDiscretizer with a configurable binning strategy.

Useful for preprocessing continuous phenotypes into categorical representations.

Parameters:
  • sample (array-like) – Input data of shape (n_samples, n_features) containing continuous features

  • n_bins (int or array-like of shape (n_features,), optional) – Number of bins to produce, given either as a single value applied to all features or as one value per feature; defaults to 2 for all features

  • encode (Literal['ordinal', 'onehot', 'onehot-dense'], optional) – Method used to encode the transformed result

  • strategy (Literal['uniform', 'quantile', 'kmeans']) – Strategy used to define the widths of the bins

Returns:

Transformed K-bins discretized data

Return type:

numpy.ndarray or scipy.sparse matrix

Note

The ‘kmeans’ strategy may produce irregular bin widths, depending on the data distribution.
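
One way to see this is to inspect the fitted bin edges. The sketch below uses scikit-learn's KBinsDiscretizer directly rather than ToKBins, whose call signature is not shown here; the toy array X and the variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Toy single-feature data with one outlying value (illustrative assumption).
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [10.0]])

# Fit a two-bin discretizer whose edges come from 1D k-means clustering.
kmeans_binner = KBinsDiscretizer(n_bins=2, encode="ordinal", strategy="kmeans")
kmeans_binner.fit(X)

# bin_edges_ holds one array of edges per feature.
edges = kmeans_binner.bin_edges_[0]
widths = np.diff(edges)

print("bin edges:", edges)
print("bin widths:", widths)
```

With skewed data like this, the interior edge follows the cluster centers, so the two bins generally do not have equal widths.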

Options:

Encoding method for transformed bins:
  • ‘ordinal’: Integer representation (0 to n_bins-1)

  • ‘onehot’: Sparse matrix one-hot encoding

  • ‘onehot-dense’: Dense array one-hot encoding

Binning strategy:
  • ‘quantile’: Equal-frequency bins

  • ‘uniform’: Equal-width bins

  • ‘kmeans’: Clustering-based bin edges
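
The options above correspond to the encode and strategy arguments of scikit-learn's KBinsDiscretizer. As a minimal sketch, the example below calls that class directly, since the exact way ToKBins is invoked is not documented in this section; the toy array X and all variable names are assumptions made for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import KBinsDiscretizer

# Toy single-feature data (illustrative assumption).
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [10.0]])

# 'uniform': equal-width bins, here split at the midpoint of the range.
uniform = KBinsDiscretizer(n_bins=2, encode="ordinal", strategy="uniform")
uniform_codes = uniform.fit_transform(X)

# 'quantile': equal-frequency bins, here split at the median.
quantile = KBinsDiscretizer(n_bins=2, encode="ordinal", strategy="quantile")
quantile_codes = quantile.fit_transform(X)

# 'onehot-dense' returns a dense array with one column per bin.
dense = KBinsDiscretizer(n_bins=2, encode="onehot-dense", strategy="uniform")
dense_out = dense.fit_transform(X)

# 'onehot' returns a SciPy sparse matrix with the same layout.
onehot = KBinsDiscretizer(n_bins=2, encode="onehot", strategy="uniform")
sparse_out = onehot.fit_transform(X)

print(uniform_codes.ravel())   # ordinal codes under equal-width bins
print(quantile_codes.ravel())  # ordinal codes under equal-frequency bins
print(dense_out.shape, sparse.issparse(sparse_out))
```

On this data the uniform strategy places only the outlying value in the upper bin, while the quantile strategy splits the samples evenly, which is the practical difference between equal-width and equal-frequency binning.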

Methods