scallops.visualize.distribution.volcano_plot
- scallops.visualize.distribution.volcano_plot(df, effect_size_col='∆ AUC', ycol='-log10FDR', fdr_col='FDR', title=None, highlight=None, highlight_col=None, star=None, top_n=None, bottom_n=None, ax=None, vbar_std=None, hbar_value=None, magnitude_std=None, legend=None, xlim=None, ylim=None, logy=False, alpha=0.05, time_lim=5.0, **kwargs)
Generate a volcano plot to visualize differential expression.
- Parameters:
df (DataFrame) – The input DataFrame containing the data for the volcano plot.
effect_size_col (str) – The column name representing the effect size (X-axis values).
ycol (str) – The column name representing the y-axis values.
fdr_col (str) – The column name representing the false discovery rate (FDR).
title (str | None) – The title of the volcano plot.
highlight (dict[str, Sequence[str]] | Literal['all', 'up', 'down'] | str | None) – A dictionary mapping column names to the values to highlight in the plot. Alternatively, ‘all’ (highlight all up and down points), ‘up’ (highlight only up points), or ‘down’ (highlight only down points). Any other string is interpreted as a DataFrame query. Default is None (do not highlight points).
highlight_col (str | None) – Name of a column in df to draw the annotations from. It is required if highlight is ‘all’, ‘up’ or ‘down’.
star (Sequence[str] | None) – Groups (values from highlight_col) to star, regardless of significance.
top_n (int | None) – The number of top genes to highlight.
bottom_n (int | None) – The number of bottom genes to highlight.
ax (Axes | None) – Matplotlib Axes to plot on. If None, a new figure and axes are created.
vbar_std (int | float | None) – The standard deviation multiplier for vertical bars indicating significance. If None, vertical bars will not be plotted.
hbar_value (int | float | None) – The horizontal line value for indicating significance. If None, no horizontal line will be plotted.
magnitude_std (float | None) – Magnitude of the standard deviation for indicating significance.
legend (tuple[str, str] | None) – A tuple specifying legend labels for up-regulated and down-regulated genes. If None, no legend will be displayed.
ylim (tuple[float, float] | None) – Lower and upper limits of the y-axis. Default is None (use the range of the data).
logy (bool) – If True, apply a logarithmic scale to the y-axis.
alpha (float) – Significance level threshold for highlighting points.
time_lim (float) – Time limit, in seconds, for text adjustment.
kwargs (dict) – Additional keyword arguments to pass to Matplotlib subplots().
xlim (tuple[float, float] | None) – Lower and upper limits of the x-axis. Default is None (use the range of the data).
- Returns:
Matplotlib Axes containing the volcano plot.
- Raises:
ValueError – If the specified columns (effect_size_col or ycol) are not present in the DataFrame.
- Example:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from scallops.visualize.distribution import volcano_plot

# Generate sample data
np.random.seed(42)

# Create a DataFrame with 200 genes
num_genes = 200
data = {
    "∆ AUC": np.random.uniform(-0.5, 0.5, num_genes),
    "FDR-BH pval": np.random.uniform(0.05, 1, num_genes),
}

# Mark two genes as down-regulated and two as up-regulated
data["∆ AUC"][:2] = np.random.uniform(-2, -1, 2)  # Down-regulated
data["FDR-BH pval"][:2] = np.random.uniform(0, 0.01, 2)
data["∆ AUC"][-2:] = np.random.uniform(1, 2, 2)  # Up-regulated
data["FDR-BH pval"][-2:] = np.random.uniform(0, 0.01, 2)

df = pd.DataFrame(data)
# Put the FDR on a -log10 scale so the column name matches its contents
df["-log10FDR"] = -np.log10(df["FDR-BH pval"])

# Generate volcano plot
ax = volcano_plot(
    df,
    effect_size_col="∆ AUC",
    ycol="-log10FDR",
    fdr_col="FDR-BH pval",
    vbar_std=2,  # Standard deviation multiplier for the vertical bars
    hbar_value=-np.log10(0.05),  # Horizontal line at FDR = 0.05 on the -log10 scale
    legend=("Down-regulated", "Up-regulated"),
)
plt.show()
- Return type:
Axes
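The relationship between fdr_col, ycol, alpha and hbar_value can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: it assumes the caller supplies ycol as -log10 of the FDR, that hbar_value is given in y-axis units, and that a point counts as significant when its FDR is at or below alpha. The gene names, FDR values and the neg_log10 helper are hypothetical.

```python
import math


def neg_log10(p, floor=1e-300):
    """Return -log10(p), clamping p so that p == 0 does not raise."""
    return -math.log10(max(p, floor))


# Hypothetical FDR values for four groups
fdr = {"geneA": 0.001, "geneB": 0.05, "geneC": 0.2, "geneD": 1.0}
alpha = 0.05  # documented default of the alpha parameter

# Values that would go in the ycol column (default name '-log10FDR')
y_values = {name: neg_log10(q) for name, q in fdr.items()}

# A horizontal line at FDR == alpha, expressed on the -log10 axis
threshold = neg_log10(alpha)

# Assumption: points on or above the line (FDR <= alpha) are significant
significant = sorted(name for name, y in y_values.items() if y >= threshold)
print(significant, round(threshold, 3))
```

With a pandas DataFrame, the same column would typically be built as df["-log10FDR"] = -np.log10(df["FDR"]), and threshold could then be passed as hbar_value.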