scallops.features.normalize.normalize_features

scallops.features.normalize.normalize_features(data, reference_query=None, by=None, normalize='zscore', n_neighbors=100, neighbors_metric='minkowski', robust=False, mad_scale='normal', max_value=None, centering=True, scaling=True, batch_size=None, centroid_column_names=('Nuclei_AreaShape_Center_Y', 'Nuclei_AreaShape_Center_X'))

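Normalize features using a z-score, optionally computed from reference observations and stratified by columns in data.obs.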

Parameters:
  • data (AnnData) – Annotated data matrix.

  • reference_query (str | None) – Query to extract reference observations (e.g. "gene_symbol=='NTC'").

  • by (Sequence[str] | str | None) – Column(s) in data.obs to stratify by.

  • normalize (Literal['zscore', 'local-zscore', 'nn-zscore']) – Normalization method to use. local-zscore uses nearest neighbors by location; nn-zscore uses nearest neighbors under neighbors_metric.

  • n_neighbors (int | None) – Number of neighbors to use when normalize is local-zscore or nn-zscore.

  • neighbors_metric (str) – Nearest neighbor metric to use when normalize is nn-zscore.

  • robust (bool) – Use robust statistics.

  • mad_scale (float | str) – Numerical scale factor by which the median absolute deviation is divided. The string "normal" is also accepted and sets the scale to the standard normal quantile function (inverse CDF) evaluated at 0.75, approximately 0.6745.

  • centering (bool) – Whether to center the data before scaling.

  • max_value (float | None) – Truncate values to this value after scaling.

  • scaling (bool) – Whether to scale the data by dividing by the standard deviation.

  • batch_size (int | None) – Batch size to use for local scaling to conserve memory.

  • centroid_column_names (tuple[str, str]) – Columns for y and x centroids to use for local zscore.
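
The way reference_query, robust, mad_scale, centering, scaling and max_value interact can be illustrated with a small NumPy sketch. This is not the library's implementation: the helper name zscore_against_reference, the use of a population standard deviation (ddof=0), and treating max_value as an upper bound only are assumptions made for illustration.

```python
from statistics import NormalDist

import numpy as np


def zscore_against_reference(x, is_reference, robust=False, mad_scale="normal",
                             centering=True, scaling=True, max_value=None):
    """Hypothetical helper: z-score the columns of x using reference rows only."""
    x = np.asarray(x, dtype=float)
    ref = x[np.asarray(is_reference, dtype=bool)]
    if robust:
        # Robust statistics: median for location, scaled MAD for spread.
        center = np.median(ref, axis=0)
        # "normal" maps to the standard normal quantile at 0.75 (about 0.6745).
        factor = NormalDist().inv_cdf(0.75) if mad_scale == "normal" else float(mad_scale)
        spread = np.median(np.abs(ref - center), axis=0) / factor
    else:
        center = ref.mean(axis=0)
        spread = ref.std(axis=0)  # assumption: population std (ddof=0)
    out = x - center if centering else x.copy()
    if scaling:
        out = out / spread
    if max_value is not None:
        # Assumption: only values above max_value are truncated.
        out = np.minimum(out, max_value)
    return out
```

With mad_scale="normal", the scaled MAD is a consistent estimate of the standard deviation for normally distributed data, so robust and non-robust z-scores end up on comparable scales.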

Returns:

Normalized data

Return type:

AnnData
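
As a rough illustration of what a local z-score might compute, the sketch below standardizes each observation against its spatially nearest reference observations, using centroid coordinates such as those named by centroid_column_names. The helper local_zscore, the Euclidean distance, and the brute-force neighbor search are assumptions for illustration, not the library's method.

```python
import numpy as np


def local_zscore(values, centroids, is_reference, n_neighbors=100):
    """Hypothetical helper: z-score each row against its nearest reference rows."""
    values = np.asarray(values, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    ref_mask = np.asarray(is_reference, dtype=bool)
    ref_values = values[ref_mask]
    ref_xy = centroids[ref_mask]
    # Never ask for more neighbors than there are reference rows.
    k = min(n_neighbors, len(ref_xy))
    # Pairwise squared Euclidean distances, shape (n_obs, n_ref).
    d2 = ((centroids[:, None, :] - ref_xy[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest reference rows per observation (in no particular order).
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    neighborhood = ref_values[nearest]  # shape (n_obs, k, n_features)
    return (values - neighborhood.mean(axis=1)) / neighborhood.std(axis=1)
```

The full distance matrix in this sketch needs memory proportional to the number of observations times the number of reference observations. Processing observations in chunks would bound that cost, which is presumably the role of batch_size.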