scallops.visualize.distribution.volcano_plot

scallops.visualize.distribution.volcano_plot(df, effect_size_col='∆ AUC', ycol='-log10FDR', fdr_col='FDR', title=None, highlight=None, highlight_col=None, star=None, top_n=None, bottom_n=None, ax=None, vbar_std=None, hbar_value=None, magnitude_std=None, legend=None, xlim=None, ylim=None, logy=False, alpha=0.05, time_lim=5.0, **kwargs)

Generate a volcano plot to visualize differential expression.

Parameters:
  • df (DataFrame) – The input DataFrame containing the data for the volcano plot.

  • effect_size_col (str) – The column name representing the effect size (X-axis values).

  • ycol (str) – The column name representing the y-axis values.

  • fdr_col (str) – The column name representing the false discovery rate (FDR).

  • title (str | None) – The title of the volcano plot.

  • highlight (dict[str, Sequence[str]] | Literal['all', 'up', 'down'] | str | None) – A dictionary mapping column names to the values to highlight in the plot. Alternatively, ‘all’ (highlight all up and down points), ‘up’ (highlight only up points), or ‘down’ (highlight only down points). Any other string is treated as a DataFrame query. Default is None (do not highlight points).

  • highlight_col (str | None) – Name of a column in df to draw the annotations from. It is required if highlight is ‘all’, ‘up’ or ‘down’.

  • star (Sequence[str] | None) – A set of groups (taken from highlight_col) to mark with a star, regardless of significance.

  • top_n (int | None) – The number of top genes to highlight.

  • bottom_n (int | None) – The number of bottom genes to highlight.

  • ax (Axes) – Matplotlib Axes to plot on. If None, a new figure and axes will be created.

  • vbar_std (int | float | None) – The standard deviation multiplier for vertical bars indicating significance. If None, vertical bars will not be plotted.

  • hbar_value (int | float | None) – The horizontal line value for indicating significance. If None, no horizontal line will be plotted.

  • magnitude_std (float | None) – Magnitude of the standard deviation for indicating significance.

  • legend (tuple[str, str] | None) – A tuple specifying legend labels for up-regulated and down-regulated genes. If None, no legend will be displayed.

  • ylim (tuple[float, float] | None) – Lower and upper limits of the y-axis. Default is None (use the range of the data).

  • logy (bool) – If True, apply a logarithmic scale to the y-axis.

  • alpha (float) – Significance level threshold for highlighting points.

  • time_lim (float) – Time limit, in seconds, for text adjustment.

  • kwargs (dict) – Additional keyword arguments to pass to Matplotlib subplots().

  • xlim (tuple[float, float] | None) – Lower and upper limits of the x-axis. Default is None (use the range of the data).
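The fdr_col, ycol, alpha, and hbar_value parameters are easiest to reason about together. The following is a minimal sketch, not part of scallops, of how a −log10 y-axis relates to a significance level. It assumes that ycol holds −log10 of the FDR and that hbar_value is given in y-axis units, which the parameter descriptions suggest but do not state. The helper name neg_log10 is hypothetical.

```python
import math


def neg_log10(fdr_values):
    """Map FDR values in (0, 1] to the -log10 scale (hypothetical helper)."""
    return [-math.log10(v) for v in fdr_values]


alpha = 0.05                      # significance level, as in the alpha parameter
fdr = [0.001, 0.05, 0.5, 1.0]     # made-up FDR values for illustration

y = neg_log10(fdr)                # values that would go in the y-axis column
threshold = -math.log10(alpha)    # candidate hbar_value on the same scale

# A point lies above the horizontal line exactly when its FDR is below alpha.
significant = [v < alpha for v in fdr]
above_line = [yi > threshold for yi in y]
print(significant == above_line)
```

Because the transform is monotonically decreasing, smaller FDR values land higher on the plot, so a line at −log10(alpha) separates the same points as the test FDR < alpha.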

Returns:

Matplotlib Axes containing the volcano plot.

Raises:

ValueError – If the specified columns (effect_size_col or ycol) are not present in the DataFrame.
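To catch this error before plotting, the column names can be checked up front. The sketch below is a plain-Python illustration and makes no claim about how scallops performs its own check. The helper missing_columns is hypothetical, and the list of names stands in for something like list(df.columns).

```python
def missing_columns(available, required):
    """Return the required column names absent from `available`, in order."""
    present = set(available)
    return [name for name in required if name not in present]


# Stand-in for list(df.columns); the names are illustrative only.
columns = ["∆ AUC", "FDR-BH pval"]
required = ["∆ AUC", "-log10FDR"]

missing = missing_columns(columns, required)
if missing:
    print(f"Missing columns: {missing}")
```

Here the y-axis column has not been computed yet, so the check reports it as missing before any plotting call is made.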

Example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scallops.visualize import volcano_plot

# Generate sample data
np.random.seed(42)

# Creating a DataFrame with 200 genes
num_genes = 200
data = {
    "∆ AUC": np.random.uniform(-0.5, 0.5, num_genes),
    "FDR-BH pval": np.random.uniform(0.05, 1, num_genes),
}

# Marking two genes as down-regulated and two as up-regulated
data["∆ AUC"][:2] = np.random.uniform(-2, -1, 2)  # Down-regulated
data["FDR-BH pval"][:2] = np.random.uniform(0, 0.01, 2)
data["∆ AUC"][-2:] = np.random.uniform(1, 2, 2)  # Up-regulated
data["FDR-BH pval"][-2:] = np.random.uniform(0, 0.01, 2)

df = pd.DataFrame(data)
df["-log10FDR"] = -np.log10(df["FDR-BH pval"])  # y-axis values on the -log10 scale
# Generate volcano plot
volcano_plot(
    df,
    effect_size_col="∆ AUC",
    ycol="-log10FDR",
    fdr_col="FDR-BH pval",
    vbar_std=2,  # Adjust the standard deviation multiplier for vertical bars
    hbar_value=-np.log10(0.05),  # Horizontal line at FDR = 0.05 on the -log10 scale
    legend=("Down-regulated", "Up-regulated"),
)
Return type:

Axes