scallops.visualize.distribution.cdf_plot
- scallops.visualize.distribution.cdf_plot(df, feature, targets, groupby_column='gene_symbol_0', hue=None, reference_group='NTC', line_width=None, col=None, col_order=None, reference_color='grey', shade=None, height=8, include_n=True, palette=None)
A utility function to create Cumulative Distribution Function (CDF) plots comparing a set of conditions against a reference condition. The CDF plots are shaded to highlight the differences between the reference and target groups.
- Parameters:
df (DataFrame) – DataFrame containing the experimental data.
feature (str) – Column representing the feature for CDF plotting.
targets (Sequence[str] | None) – Target values in groupby_column to plot.
groupby_column (str) – Column used for grouping the data.
reference_group (str | None) – Reference group for comparison.
line_width (int | None) – Width of the CDF plot lines.
col (str | None) – Variable that defines subsets to plot on different columns.
col_order (Sequence[Any] | None) – Order of the columns for the categorical levels of col.
hue (str | None) – Variable used to color CDF plots (useful for showing individual guides).
reference_color (str) – Color for the reference group.
shade (bool | None) – Shade the area between target and reference.
height (int) – Figure height.
include_n (bool) – Include the sample size per target in legend.
palette (list[tuple[int, int, int]] | None) – Colors to use when mapping the hue semantic.
- Returns:
The axes.
- Example:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from scallops.visualize.distribution import cdf_plot

# 1. Generate synthetic data: one normal distribution per group
np.random.seed(42)
groups = {
    "NTC": {"loc": 5, "scale": 1.5, "size": 500},
    "GeneA": {"loc": 8, "scale": 1.5, "size": 500},
    "GeneB": {"loc": 5, "scale": 2.5, "size": 500},
    "GeneC": {"loc": 3, "scale": 1.0, "size": 400},
}
data = []
for group, params in groups.items():
    values = np.random.normal(**params)
    data.append(pd.DataFrame({"gene": group, "intensity": values}))
synthetic_df = pd.concat(data, ignore_index=True)

# 2. Call cdf_plot; targets has no default in the signature, so pass it explicitly
axes = cdf_plot(
    df=synthetic_df,
    feature="intensity",
    targets=["GeneA", "GeneB", "GeneC"],
    groupby_column="gene",
    reference_group="NTC",
)
plt.show()
- Return type:
Sequence[Axes]
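To make the description above more concrete, the following is a minimal, library-independent sketch of what an empirical CDF is and how the area between a reference curve and a target curve (the region that shading would highlight) can be estimated. It is not the implementation used by cdf_plot: the helper names ecdf and area_between, the sample values, and the evaluation grid are all assumptions chosen for illustration.

```python
from bisect import bisect_right


def ecdf(sample, grid):
    """Return the empirical CDF of `sample` evaluated at each point of `grid`."""
    ordered = sorted(sample)
    n = len(ordered)
    # bisect_right counts how many observations are <= x
    return [bisect_right(ordered, x) / n for x in grid]


def area_between(reference, target, grid):
    """Trapezoidal estimate of the area between two ECDFs over `grid`."""
    ref_cdf = ecdf(reference, grid)
    tgt_cdf = ecdf(target, grid)
    gaps = [abs(r - t) for r, t in zip(ref_cdf, tgt_cdf)]
    return sum(
        (grid[i + 1] - grid[i]) * (gaps[i] + gaps[i + 1]) / 2
        for i in range(len(grid) - 1)
    )


# Hypothetical values standing in for a reference group (e.g. NTC) and a target
reference = [1.0, 2.0, 3.0, 4.0]
target = [3.0, 4.0, 5.0, 6.0]
grid = [0.5 * i for i in range(15)]  # 0.0, 0.5, ..., 7.0

ref_curve = ecdf(reference, grid)
shaded = area_between(reference, target, grid)
```

Because an ECDF is a step function, the trapezoidal estimate is only approximate on a coarse grid; a finer grid gives a closer value. A larger area indicates a target distribution that sits further from the reference.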