scallops.visualize.distribution.cdf_plot

scallops.visualize.distribution.cdf_plot(df, feature, targets, groupby_column='gene_symbol_0', hue=None, reference_group='NTC', line_width=None, col=None, col_order=None, reference_color='grey', shade=None, height=8, include_n=True)

A utility function that creates Cumulative Distribution Function (CDF) plots comparing a set of target conditions against a reference condition. The area between each target CDF and the reference CDF can be shaded (see shade) to highlight where the target and reference distributions differ.
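Each curve in such a plot is an empirical CDF: for every value x, the fraction of observations in the group that are less than or equal to x. The following is a minimal NumPy sketch of that idea and of one way to quantify the shaded region (the area between a target ECDF and the reference ECDF). It is an illustration only, not the scallops implementation; the helper names `ecdf`, `ecdf_at`, and `area_between_ecdfs` are hypothetical.

```python
import numpy as np


def ecdf(values):
    """Return the sorted values and the empirical CDF evaluated at each one."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


def ecdf_at(values, grid):
    """Evaluate the empirical CDF of `values` at every point of `grid`."""
    x = np.sort(np.asarray(values, dtype=float))
    # Count of observations <= each grid point, divided by the sample size.
    return np.searchsorted(x, grid, side="right") / x.size


def area_between_ecdfs(target, reference):
    """Integrate |F_target(x) - F_reference(x)| over x.

    Both ECDFs are step functions that only change at observed values, so the
    difference is constant on each interval between consecutive pooled points.
    """
    grid = np.sort(np.concatenate([np.asarray(target, dtype=float),
                                   np.asarray(reference, dtype=float)]))
    diff = np.abs(ecdf_at(target, grid) - ecdf_at(reference, grid))
    # diff[i] holds on [grid[i], grid[i + 1]); beyond the last point both CDFs are 1.
    return float(np.sum(diff[:-1] * np.diff(grid)))
```

Because both ECDFs change only at observed values, summing over the pooled sample points gives the exact area; for two one-dimensional samples this quantity equals the first Wasserstein distance between them.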

Parameters:

df (DataFrame) – DataFrame containing the experimental data.
feature (str) – Name of the column whose values are plotted as CDFs.
targets (Sequence[str] | None) – Values of groupby_column to plot against the reference group.
groupby_column (str) – Column used to group the data.
hue (str | None) – Variable used to color the CDF curves (useful for showing individual guides).
reference_group (str | None) – Value of groupby_column used as the reference for comparison.
line_width (int | None) – Width of the CDF lines.
col (str | None) – Variable that defines subsets to plot in separate columns.
col_order (Sequence[Any] | None) – Order of the categorical levels of col across the columns.
reference_color (str) – Color of the reference group's CDF.
shade (bool | None) – Whether to shade the area between each target CDF and the reference CDF.
height (int) – Figure height.
include_n (bool) – Whether to include the sample size of each target in the legend.
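The following pandas sketch shows how targets, reference_group, groupby_column, and include_n plausibly interact: rows are restricted to the target and reference groups, and a per-group count can be appended to each legend label. This is an assumption about the semantics, not the scallops code; the helpers `select_groups` and `legend_labels`, the treatment of `targets=None` as "all groups", and the "group (n=count)" label format are all hypothetical.

```python
import pandas as pd


def select_groups(df, groupby_column, targets, reference_group):
    """Keep rows whose group is a target or the reference (assumed semantics)."""
    if targets is None:
        # Hypothetical: treat targets=None as "plot every group".
        return df
    keep = list(targets)
    if reference_group is not None:
        keep.append(reference_group)
    return df[df[groupby_column].isin(keep)]


def legend_labels(df, groupby_column):
    """Build hypothetical 'group (n=count)' legend labels, one per group."""
    counts = df[groupby_column].value_counts()
    return {group: f"{group} (n={n})" for group, n in counts.items()}
```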

Returns:

The Matplotlib axes on which the CDF plots are drawn.

Example:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scallops.visualize.distribution import cdf_plot

# 1. Generate synthetic data
np.random.seed(42)
data = []
groups = {
    "NTC": {"loc": 5, "scale": 1.5, "size": 500},
    "GeneA": {"loc": 8, "scale": 1.5, "size": 500},
    "GeneB": {"loc": 5, "scale": 2.5, "size": 500},
    "GeneC": {"loc": 3, "scale": 1.0, "size": 400},
}
for group, params in groups.items():
    values = np.random.normal(**params)
    df_group = pd.DataFrame({"gene": group, "intensity": values})
    data.append(df_group)
synthetic_df = pd.concat(data, ignore_index=True)

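# 2. Compare each target gene against the NTC reference.
#    targets has no default in the signature, so it must be passed explicitly.
axes = cdf_plot(
    df=synthetic_df,
    feature="intensity",
    targets=["GeneA", "GeneB", "GeneC"],
    groupby_column="gene",
    reference_group="NTC",
)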
plt.show()

Return type:

Sequence[Axes]