scallops.visualize.distribution.cdf_plot

scallops.visualize.distribution.cdf_plot(df, feature, targets, groupby_column='gene_symbol_0', hue=None, reference_group='NTC', line_width=None, col=None, col_order=None, reference_color='grey', shade=None, height=8, include_n=True)

A utility function to create Cumulative Distribution Function (CDF) plots comparing a set of conditions against a reference condition. The CDF plots are shaded to highlight the differences between the reference and target groups.
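For readers unfamiliar with the underlying quantity, the following is a minimal sketch of an empirical CDF and of the gap between a target and a reference curve, using only the standard library. It is not the implementation used by cdf_plot. The helper name `ecdf` and the sample values are hypothetical and chosen purely for illustration.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function mapping x to the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")

    def cdf(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return cdf


# Hypothetical feature values for a reference group and one target group
reference = [4.1, 5.0, 5.2, 5.9, 6.3]
target = [5.5, 7.5, 8.0, 8.4, 9.1]

ref_cdf = ecdf(reference)
tgt_cdf = ecdf(target)

# Evaluate both step functions at every observed value; the vertical gap
# between them is the kind of difference a shaded region would highlight
grid = sorted(set(reference) | set(target))
gaps = [ref_cdf(x) - tgt_cdf(x) for x in grid]
max_gap = max(abs(g) for g in gaps)
```

A target whose values are shifted upward relative to the reference has a CDF that lies to the right of the reference curve, so the gap is positive over the range where the two distributions differ.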

Parameters:
  • df (DataFrame) – DataFrame containing the experimental data.

  • feature (str) – Column representing the feature for CDF plotting.

  • targets (Sequence[str] | None) – Target values in groupby_column to plot.

  • groupby_column (str) – Column used for grouping the data.

  • reference_group (str | None) – Reference group for comparison.

  • line_width (int | None) – Width of the CDF plot lines.

  • col (str | None) – Variable that defines subsets to plot in different columns.

  • col_order (Sequence[Any] | None) – Order in which the categorical levels of col are laid out across columns.

  • hue (str | None) – Variable used to color CDF plots (useful for showing individual guides).

  • reference_color (str) – Color for the reference group.

  • shade (bool | None) – Shade the area between target and reference.

  • height (int) – Figure height.

  • include_n (bool) – Include the sample size per target in legend.

Returns:

The axes containing the CDF plots.

Example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scallops.visualize.distribution import cdf_plot

# 1. Generate synthetic data
np.random.seed(42)
data = []
groups = {
    "NTC": {"loc": 5, "scale": 1.5, "size": 500},
    "GeneA": {"loc": 8, "scale": 1.5, "size": 500},
    "GeneB": {"loc": 5, "scale": 2.5, "size": 500},
    "GeneC": {"loc": 3, "scale": 1.0, "size": 400},
}
for group, params in groups.items():
    values = np.random.normal(**params)
    df_group = pd.DataFrame({"gene": group, "intensity": values})
    data.append(df_group)
synthetic_df = pd.concat(data, ignore_index=True)

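# 2. Call the cdf_plot function.
#    `targets` has no default in the signature, so it is passed explicitly.
axes = cdf_plot(
    df=synthetic_df,
    feature="intensity",
    targets=["GeneA", "GeneB", "GeneC"],
    groupby_column="gene",
    reference_group="NTC",
)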
plt.show()
Return type:

Sequence[Axes]