Command Line
SCALLOPS provides a powerful command-line interface (CLI) to automate various workflows and tasks related to Optical Pooled Screens (OPS). Each command supports multiple arguments and options, allowing users to customize their workflows.
scallops dialout
The scallops dialout command is designed for the analysis and reporting of pooled dialout library sequencing data. It allows users to run the dialout pipeline to analyze sequencing data, generate counts, align reads to a reference genome, and produce detailed reports. It provides two subcommands:
- analysis:
The analysis subcommand is responsible for running the main dialout library sequencing data analysis pipeline. It processes sequencing reads from FASTQ files, performs alignment to a reference genome using BWA, computes Hamming distances between sequences, and generates output files such as mapped counts, dropout data, and unaligned sequences. This subcommand is essential for the initial data processing of a pooled dialout experiment.
- report:
The report subcommand generates a comprehensive report based on the analysis pipeline’s output. It creates visual summaries, scatter plots, and statistics from the dialout analysis data, helping researchers interpret sequencing results. The report can include analysis of guide RNA sequences, mismatches, and dropout rates, providing valuable insights into the sequencing performance and accuracy.
usage: scallops dialout [-h] {analysis,report} ...
Sub-commands
analysis
Pooled dialout library sequencing data analysis
scallops dialout analysis [-h] --fastq FASTQ --fasta FASTA --design-csv
DESIGN_CSV [-o OUTPUT]
[--design-spacer-col DESIGN_SPACER_COL]
[--design-query DESIGN_QUERY]
[--save-unaligned-fasta]
required arguments
- --fastq
Path to FASTQ directory
- --fasta
FASTA file (template for mapping, including NNNs). May include multiple templates, but the positions of the NNNs must match. Example: TCTTGTGGAAAGGACGAAACACCGNNNNNNNNNNNNNNNNNNNNGTTTTAGAGCTAGAAATAGCAAGTTAAAATA
- --design-csv
Design CSV file
optional arguments
- -o, --output
Path to output directory
Default:
'dialout-analysis'
- --design-spacer-col
Spacer column in design CSV
Default:
'spacer_20mer'
- --design-query
Expression to filter rows of design CSV
- --save-unaligned-fasta
Whether to save unaligned reads to FASTA file
Default:
False
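The role of the NNN positions in the template and of the Hamming distance can be illustrated with a minimal sketch. It is not the SCALLOPS implementation: the function names, the short template, and the read below are hypothetical, and the sketch assumes a single contiguous run of N characters and a read already aligned to the template.

```python
def n_span(template: str) -> tuple[int, int]:
    """Return (start, end) of the run of N characters in a template."""
    start = template.index("N")
    end = len(template) - template[::-1].index("N")
    return start, end


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical, shortened template: 4 fixed bases, 6 variable bases, 4 fixed bases.
template = "ACCG" + "N" * 6 + "GTTT"
start, end = n_span(template)

# Hypothetical read that aligns to the template without gaps.
read = "ACCGTTAGCAGTTT"
spacer = read[start:end]

# Compare the extracted spacer with a designed spacer.
distance = hamming(spacer, "TTAGGA")
```

A distance of zero would mean the read matches a designed spacer exactly; small nonzero distances point to sequencing or synthesis errors.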
report
Pooled dialout library sequencing report
scallops dialout report [-h] --analysis-dir ANALYSIS_DIR -o OUTPUT
--design-csv DESIGN_CSV
[--design-spacer-col DESIGN_SPACER_COL]
[--design-query DESIGN_QUERY]
[--min-total-reads MIN_TOTAL_READS]
[--min-sample-reads MIN_SAMPLE_READS]
[--sample-names SAMPLE_NAMES]
required arguments
- --analysis-dir
Path to analysis output directory
- -o, --output
Path to output PDF file or a directory where individual PNG images will be created
- --design-csv
Design CSV file
optional arguments
- --design-spacer-col
Spacer column in design CSV
Default:
'spacer_20mer'
- --design-query
Expression to filter rows of design CSV
- --min-total-reads
Minimum total reads across all samples to include guide in pairwise sample plot
Default:
10
- --min-sample-reads
Minimum reads to include guide in individual sample guide rank plot
Default:
2
- --sample-names
Path to CSV file containing the columns index and sample
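The effect of the two read thresholds can be sketched as follows. The counts and function names are hypothetical, and treating the thresholds as inclusive (>=) is an assumption rather than documented behavior.

```python
# Hypothetical read counts: guide -> reads per sample.
counts = {
    "guide_a": {"s1": 12, "s2": 0},
    "guide_b": {"s1": 3, "s2": 4},
    "guide_c": {"s1": 1, "s2": 40},
}


def guides_for_pairwise_plot(counts, min_total_reads=10):
    """Keep guides whose reads summed over all samples reach the threshold."""
    return [
        guide
        for guide, per_sample in counts.items()
        if sum(per_sample.values()) >= min_total_reads
    ]


def guides_for_rank_plot(counts, sample, min_sample_reads=2):
    """Keep guides whose reads in one sample reach the threshold."""
    return [
        guide
        for guide, per_sample in counts.items()
        if per_sample.get(sample, 0) >= min_sample_reads
    ]


pairwise = guides_for_pairwise_plot(counts)
ranked_s1 = guides_for_rank_plot(counts, "s1")
```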
scallops extract-crops
The scallops extract-crops command extracts image crops from labeled images.
usage: scallops extract-crops [-h] -i IMAGES [IMAGES ...] -o OUTPUT --labels
LABELS [--image-pattern IMAGE_PATTERN]
[--merge MERGE] [--label-name LABEL_NAME]
[--crop-size CROP_SIZE]
[--percentile-min PERCENTILE_MIN]
[--percentile-max PERCENTILE_MAX]
[--local-percentile-normalize]
[--local-percentile-overlap LOCAL_PERCENTILE_OVERLAP]
[--label-filter LABEL_FILTER] [--chunks CHUNKS]
[--output-format {tiff,npy}]
[--gaussian-sigma GAUSSIAN_SIGMA]
[-g [GROUPBY ...]] [-s [SUBSET ...]] [--force]
[--client CLIENT] [--dask-cluster DASK_CLUSTER]
[--verbose] [--no-version]
required arguments
- -i, --images
Paths to input images or CSV/Parquet with image column containing full image path and additional columns containing metadata such as plate, well, t, c, or z. Note that image pattern is ignored when CSV/Parquet is used.
- -o, --output
Path to the output directory where the results will be saved.
- --labels
Path to zarr directory containing labels
- --merge
Path to directory containing output from merge
optional arguments
- --image-pattern
Pattern to extract metadata from file names.
- --label-name
Name of labels to use. For example nuclei or cell
Default:
'cell'
- --crop-size
Image crop size
Default:
224
- --percentile-min
Percentile min for normalization
Default:
0.1
- --percentile-max
Percentile max for normalization
Default:
99.9
- --local-percentile-normalize
Perform percentile normalization locally
Default:
False
- --local-percentile-overlap
Overlap for local normalization
- --label-filter
Expression to filter labels (e.g. barcode_Q_mean_0/barcode_Q_mean > 0.5) or path to Parquet file containing labels to include.
- --chunks
Chunk size for local normalization
- --output-format
Possible choices: tiff, npy
Output image format
Default:
'tiff'
- --gaussian-sigma
Apply gaussian-smoothed mask to isolate target mask
- -g, --groupby
Keys to group images.
- -s, --subset
Subset of images to include.
- --force
Overwrite existing output
Default:
False
- --client
URL of the Dask scheduler. Use ‘none’ to disable distributed execution.
- --dask-cluster
JSON URL or inline JSON containing dask cluster parameters.
- --verbose
Run in verbose mode. Useful for debugging.
Default:
False
- --no-version
Do not store command line arguments and scallops version in output metadata.
Default:
False
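Percentile normalization, as controlled by --percentile-min and --percentile-max, can be sketched in plain Python. This is an illustration under assumptions, not the SCALLOPS code: the helper names are hypothetical, pixels are given as a flat list, and the output is clipped to the range 0 to 1.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def percentile_normalize(pixels, percentile_min=0.1, percentile_max=99.9):
    """Rescale pixels so the two percentiles map to 0 and 1, then clip."""
    ordered = sorted(pixels)
    low = percentile(ordered, percentile_min)
    high = percentile(ordered, percentile_max)
    if high == low:
        # Constant image: nothing to rescale.
        return [0.0 for _ in pixels]
    return [min(max((p - low) / (high - low), 0.0), 1.0) for p in pixels]


# Hypothetical pixel values 0..100; normalize between the 10th and 90th percentiles.
normalized = percentile_normalize(list(range(101)), 10, 90)
```

Using percentiles instead of the raw minimum and maximum keeps a few very bright or very dark pixels from dominating the intensity range of a crop.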
scallops find-objects
The scallops find-objects command finds objects in label images output from segmentation.
Find objects in a labeled array and output Parquet file with label as index.
usage: scallops find-objects [-h] --labels LABELS -o OUTPUT --label-pattern
LABEL_PATTERN [--label-suffix [LABEL_SUFFIX ...]]
[-s [SUBSET ...]] [--force] [--client CLIENT]
[--dask-cluster DASK_CLUSTER] [--verbose]
[--no-version]
required arguments
- --labels
Path to zarr directory containing labels
- -o, --output
Path to the output directory where the results will be saved.
- --label-pattern
Format string to extract metadata from labels (e.g. {well})
optional arguments
- --label-suffix
Label suffixes to include (e.g. nuclei, cell, cytosol)
Default:
['cell', 'cytosol', 'nuclei']
- -s, --subset
Subset of labels to include.
- --force
Overwrite existing output
Default:
False
- --client
URL of the Dask scheduler. Use ‘none’ to disable distributed execution.
- --dask-cluster
JSON URL or inline JSON containing dask cluster parameters.
- --verbose
Run in verbose mode. Useful for debugging.
Default:
False
- --no-version
Do not store command line arguments and scallops version in output metadata.
Default:
False
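A format string such as {well} names the metadata fields to read from each label name. The sketch below shows one way such a pattern could be turned into a regular expression. It is an assumption about the general technique, not the parser SCALLOPS uses, and the function names and sample names are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Convert a format string like '{plate}-{well}' into a compiled regex."""
    # re.split with a capture group alternates literal text and field names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)  # literal text between fields
        else:
            regex += f"(?P<{part}>.+?)"  # named field, matched non-greedily
    return re.compile(regex)


def extract_metadata(pattern: str, name: str):
    """Return a dict of field values, or None if the name does not match."""
    match = pattern_to_regex(pattern).fullmatch(name)
    return match.groupdict() if match else None


well_only = extract_metadata("{well}", "A1")
plate_and_well = extract_metadata("{plate}-{well}", "P1-A01")
```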
scallops features
The scallops features command is used to compute various features from labeled images, producing output in Parquet format. Each feature is indexed by the label in the corresponding image. This command allows users to extract multiple types of features, including geometric, texture, and intensity-based features, for each region of interest, such as nuclei, cells, or cytosol.
Key Features:
Multi-region Feature Computation: Compute features for different regions in the images, such as nuclei, cells, and cytosol. The feature extraction process is customizable, allowing users to define which features to compute for each region.
Customizable Feature Sets: Users can specify which features to compute, with available shortcuts to quickly select groups of features (See Shortcuts).
Stacked Image Support: SCALLOPS supports processing both primary and stacked images, allowing users to compute features across multiple channels by stacking different image types together.
Scalable via Dask: The computation process leverages Dask for distributed and parallelized processing, enabling SCALLOPS to handle large image datasets efficiently.
Compute features and output Parquet files with label as index.
usage: scallops features [-h] -i IMAGES [IMAGES ...] -o OUTPUT --labels LABELS
[--image-pattern IMAGE_PATTERN]
[--features-nuclei FEATURES_NUCLEI [FEATURES_NUCLEI ...]]
[--features-cell FEATURES_CELL [FEATURES_CELL ...]]
[--features-cytosol FEATURES_CYTOSOL [FEATURES_CYTOSOL ...]]
[--objects OBJECTS]
[--stack-images [STACK_IMAGES ...]]
[--label-filter LABEL_FILTER]
[--stack-image-pattern STACK_IMAGE_PATTERN]
[--nuclei-min-area NUCLEI_MIN_AREA]
[--nuclei-max-area NUCLEI_MAX_AREA]
[--cell-min-area CELL_MIN_AREA]
[--cell-max-area CELL_MAX_AREA]
[--cytosol-min-area CYTOSOL_MIN_AREA]
[--cytosol-max-area CYTOSOL_MAX_AREA]
[--channel-rename CHANNEL_RENAME]
[--features-plot [FEATURES_PLOT ...]]
[-g [GROUPBY ...]] [-s [SUBSET ...]] [--force]
[--client CLIENT] [--dask-cluster DASK_CLUSTER]
[--verbose] [--no-version]
required arguments
- -i, --images
Paths to input images or CSV/Parquet with image column containing full image path and additional columns containing metadata such as plate, well, t, c, or z. Note that image pattern is ignored when CSV/Parquet is used.
- -o, --output
Path to the output directory where the results will be saved.
- --labels
Path to zarr directory containing labels
optional arguments
- --image-pattern
Pattern to extract metadata from file names.
- --features-nuclei
A space-separated list of features to extract (e.g., ‘area intensity_0 corr_0_1’). Channels are 0-indexed. Use shortcuts for efficiency:
For specific channels: ‘intensity_0,1,2’
For all channels (wildcard): ‘intensity_*’
For all channel pairs: ‘colocalization_*_*’
- --features-cell
A space-separated list of features to extract (e.g., ‘area intensity_0 corr_0_1’). Channels are 0-indexed. Use shortcuts for efficiency:
For specific channels: ‘intensity_0,1,2’
For all channels (wildcard): ‘intensity_*’
For all channel pairs: ‘colocalization_*_*’
- --features-cytosol
A space-separated list of features to extract (e.g., ‘area intensity_0 corr_0_1’). Channels are 0-indexed. Use shortcuts for efficiency:
For specific channels: ‘intensity_0,1,2’
For all channels (wildcard): ‘intensity_*’
For all channel pairs: ‘colocalization_*_*’
- --objects
Path to directory containing output from find-objects
- --stack-images
Path to additional images to stack with images. Add s prefix to refer to stack image channel index (e.g. corr_0_s0).
- --label-filter
Path to Parquet containing labels to include.
- --stack-image-pattern
Format string to extract metadata from the image file name.
- --nuclei-min-area
Remove nuclei with area < nuclei-min-area
Default:
2
- --nuclei-max-area
Remove nuclei with area > nuclei-max-area
- --cell-min-area
Remove cells with area < cell-min-area
Default:
2
- --cell-max-area
Remove cells with area > cell-max-area
- --cytosol-min-area
Remove cytosolic labels with area < cytosol-min-area
Default:
2
- --cytosol-max-area
Remove cytosolic labels with area > cytosol-max-area
- --channel-rename
Inline JSON mapping channel index (0-based) to channel name for feature readability. Example: '{"0":"A", "2":"B"}'
- --features-plot
Optional feature names to plot
- -g, --groupby
Keys to group images.
- -s, --subset
Subset of images to include.
- --force
Overwrite existing output
Default:
False
- --client
URL of the Dask scheduler. Use ‘none’ to disable distributed execution.
- --dask-cluster
JSON URL or inline JSON containing dask cluster parameters.
- --verbose
Run in verbose mode. Useful for debugging.
Default:
False
- --no-version
Do not store command line arguments and scallops version in output metadata.
Default:
False
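When the command is launched from a script, building the argument list in Python avoids quoting problems with the inline JSON expected by --channel-rename. The sketch below only uses options listed above. All paths and channel names are hypothetical, and the feature names are taken from the help text example.

```python
import json
import shlex

# Hypothetical mapping of 0-based channel index to a readable name.
channel_names = {"0": "DAPI", "2": "GFP"}

command = [
    "scallops", "features",
    "-i", "images/",              # hypothetical input directory
    "-o", "features-out",         # hypothetical output directory
    "--labels", "labels.zarr",    # hypothetical labels path
    "--features-nuclei", "area", "intensity_0",
    "--channel-rename", json.dumps(channel_names),
]

# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(command))
```

The list can also be passed directly to subprocess.run(command, check=True), in which case no shell quoting is involved at all.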
Available Features
CellProfiler Features
- intensity
Measures several intensity features for identified objects. Parameters:
c: Channel index.
- granularity
Outputs spectra of size measurements of the textures in the image. Parameters:
c: Channel index.
- intensity-distribution
Measures radial intensity features for identified objects. Parameters:
c: Channel index.
bins: Number of bins to measure the distribution (default 4)
- intensity-distribution-zernike
Measures zernike intensity features for identified objects. Parameters:
c: Channel index.
moment: Maximum zernike moment (default 9)
- haralick
Measures the degree and nature of textures within objects to quantify their roughness and smoothness. Parameters:
c: Channel index.
scale: Number of pixels included in gray-level co-occurrence matrix (default 3).
- sizeshape
Measures several area and shape features of identified objects.
- neighbors
Calculates how many neighbors each object has and records various properties about the neighbors’ relationships, including the percentage of an object’s edge pixels that touch a neighbor.
- colocalization
Measures the colocalization and correlation between intensities in different channels on a pixel-by-pixel basis within identified objects. Parameters:
c1: First channel index.
c2: Second channel index.
Other Features
- pftas
Parameter-free threshold adjacency statistics. Outputs 54 features. Reference: Fast automated cell phenotype image classification. Parameters:
c: Channel index.
- correlation-pearson-box
Pearson correlation coefficient between two channels in the label bounding box. Typically used to measure nuclei alignment quality of ISS and phenotype images. Parameters:
c1: First channel index.
c2: Second channel index.
- intersects-boundary
Determines whether a label intersects a stitch boundary. Parameters:
c: Channel index.
- spots
Counts the number of spots in a FISH image. Parameters:
c: Channel index.
min peak_distance: Minimum number of pixels separating peaks (default 3).
radius: Radius of the disk footprint used for non-maximum suppression in peak_local_max (default 3).
Shortcuts
Use * for all channels. Example: intensity_*, colocalization_*_*.
Include a comma separated list of channel indices (0-based) to include. Example: intensity_0,1,2,6.
Specify a range of channel indices using start:stop:step. Example: colocalization_0_1:10:2.
Notes
Feature names are case insensitive (intensity == Intensity) and hyphens in feature names are ignored (intensitydistribution == intensity-distribution)
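How the shortcuts and the name rules above expand into concrete feature names can be sketched as follows. This is an illustration, not the SCALLOPS parser. The function names are hypothetical, and two points are assumptions: the stop index of a start:stop:step range is treated as exclusive (Python slice semantics), and a channel pair is emitted only once regardless of order.

```python
def normalize_name(name: str) -> str:
    """Feature names are case insensitive and hyphens are ignored."""
    return name.replace("-", "").lower()


def expand_channels(spec: str, n_channels: int) -> list[int]:
    """Expand '*', '0,1,2' or 'start:stop:step' into channel indices."""
    if spec == "*":
        return list(range(n_channels))
    if ":" in spec:
        bounds = [int(p) if p else None for p in spec.split(":")]
        return list(range(n_channels))[slice(*bounds)]
    return [int(c) for c in spec.split(",")]


def expand_feature(feature: str, n_channels: int) -> list[str]:
    """Expand one feature specification into fully qualified feature names."""
    name, *specs = feature.split("_")
    name = normalize_name(name)
    if not specs:
        return [name]
    if len(specs) == 1:
        return [f"{name}_{c}" for c in expand_channels(specs[0], n_channels)]
    expanded, seen = [], set()
    for a in expand_channels(specs[0], n_channels):
        for b in expand_channels(specs[1], n_channels):
            pair = frozenset((a, b))
            if a != b and pair not in seen:
                seen.add(pair)
                expanded.append(f"{name}_{a}_{b}")
    return expanded


all_intensity = expand_feature("Intensity_*", 3)
all_pairs = expand_feature("colocalization_*_*", 3)
ranged = expand_feature("colocalization_0_1:10:2", 6)
```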
scallops illum-corr
The scallops illum-corr command performs illumination correction on images, a crucial preprocessing step in biomedical image analysis. Uneven illumination can introduce artifacts that affect the accuracy of downstream analysis tasks like segmentation and feature extraction. SCALLOPS provides an aggregation method for illumination correction with three aggregators: mean, median, and min.
Key Features:
This method computes illumination correction by aggregating images using the mean, median, or min, followed by an optional median filter and rescaling. It offers a simple and effective approach for addressing illumination variations. The output can be saved as Zarr or TIFF images.
This method is designed to improve image uniformity, thereby enhancing the reliability of image analysis workflows, particularly in the context of high-throughput biomedical imaging data.
Calculate illumination correction by aggregating images by mean, median or min, followed by median filter and rescaling. Outputs flat-field TIFF or Zarr image.
usage: scallops illum-corr agg [-h] -i IMAGES [IMAGES ...] -o OUTPUT
[-g [GROUPBY ...]] [-s [SUBSET ...]]
[--image-pattern IMAGE_PATTERN]
[--smooth SMOOTH]
[--agg-method {mean,median,min}] [--no-rescale]
[--output-image-format {tiff,zarr}]
[--z-index Z_INDEX] [--channel CHANNEL]
[--force] [--verbose]
[--expected-images EXPECTED_IMAGES]
[--no-version] [--client CLIENT]
[--dask-cluster DASK_CLUSTER]
required arguments
- -i, --images
Paths to input images or CSV/Parquet with image column containing full image path and additional columns containing metadata such as plate, well, t, c, or z. Note that image pattern is ignored when CSV/Parquet is used.
- -o, --output
Path to output Zarr image or TIFF directory
optional arguments
- -g, --groupby
Keys to group images.
- -s, --subset
Subset of images to include.
- --image-pattern
Pattern to extract metadata from file names.
- --smooth
The radius of the disk-shaped footprint for median filter. Default is sqrt((image_width * image_height) / (PI * 20))
- --agg-method
Possible choices: mean, median, min
Method to aggregate images
Default:
'mean'
- --no-rescale
Do not use 2nd percentile for robust minimum
Default:
False
- --output-image-format
Possible choices: tiff, zarr
Output image format
Default:
'tiff'
- --z-index
Either max, focus, z-index (0-based), or a path to a Parquet file containing columns key and z_index. Focus selects the best z-index using the slope of the image log-log power spectrum.
Default:
'max'
- --channel
Channel index (0-based) to select best focus z index
Default:
0
- --force
Overwrite existing output
Default:
False
- --verbose
Run in verbose mode. Useful for debugging.
Default:
False
- --expected-images
Validate that the specified number of images are provided.
- --no-version
Do not store command line arguments and scallops version in output metadata.
Default:
False
- --client
URL of the Dask scheduler. Use ‘none’ to disable distributed execution.
- --dask-cluster
JSON URL or inline JSON containing dask cluster parameters.
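The default --smooth radius and the per-pixel aggregation can be illustrated with a small sketch. It is not the SCALLOPS implementation: the function names are hypothetical, images are represented as flat lists of pixels, and the median filter and rescaling steps are left out.

```python
import math
import statistics


def default_smooth_radius(image_width: int, image_height: int) -> float:
    """Default median filter radius: sqrt((width * height) / (PI * 20))."""
    return math.sqrt((image_width * image_height) / (math.pi * 20))


def aggregate(images, method="mean"):
    """Aggregate equally sized images (flat pixel lists) pixel by pixel."""
    reducers = {
        "mean": statistics.fmean,
        "median": statistics.median,
        "min": min,
    }
    reduce_pixel = reducers[method]
    # zip(*images) yields, for each pixel position, its values across images.
    return [reduce_pixel(values) for values in zip(*images)]


# Three hypothetical two-pixel images.
images = [[1, 2], [3, 4], [5, 9]]
mean_image = aggregate(images, "mean")
median_image = aggregate(images, "median")
radius = default_smooth_radius(2048, 2048)
```

With the default formula, a disk of that radius covers one twentieth of the image area, so the smoothing scale grows with image size.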
scallops pooled-sbs
The scallops pooled-sbs command is designed for processing images from pooled in-situ sequencing (SBS) experiments. This pipeline includes spot detection, read calling, and merging of single-cell sequencing (SCS) data with phenotype data.
Key Features:
Spot Detection: The spot detection subcommand identifies candidate peaks in the image data, which correspond to sequencing spots. The results can be saved in multiple formats, including the raw data, filtered images, and detected peaks.
Reads Processing: The reads subcommand processes the detected spots to assign sequencing reads to specific labels, such as nuclei or cells. It also includes options for crosstalk correction between channels and outputs corrected and uncorrected base intensities.
Merging Data: The merge subcommand joins in-situ barcodes with phenotype data, allowing for a combined view of sequencing and phenotype information.
SBS image processing pipeline.
usage: scallops pooled-sbs [-h] {spot-detect,reads,merge} ...
Sub-commands
spot-detect
Run pooled in-situ sequencing spot detection. Outputs table of all candidate peaks and max, the input images with LoG and maximum filters applied. Optionally also outputs the std image, which contains the standard deviation over cycles, followed by the mean across channels to identify spot locations and the LoG image, which contains the LoG filtered image.
scallops pooled-sbs spot-detect [-h] -i IMAGES [IMAGES ...] -o OUTPUT -c
CHANNELS [CHANNELS ...]
[--image-pattern IMAGE_PATTERN]
[--max-filter-width MAX_FILTER_WIDTH]
[--sigma-log SIGMA_LOG [SIGMA_LOG ...]]
[--peak-neighborhood-size PEAK_NEIGHBORHOOD_SIZE]
[--cycles CYCLES [CYCLES ...]]
[--chunks CHUNKS] [--z-index Z_INDEX]
[-g [GROUPBY ...]] [-s [SUBSET ...]] [--force]
[--client CLIENT]
[--dask-cluster DASK_CLUSTER]
[--save [{log,std} ...]]
[--expected-cycles EXPECTED_CYCLES]
[--verbose] [--no-version]
required arguments
- -i, --images
Paths to input images or CSV/Parquet with image column containing full image path and additional columns containing metadata such as plate, well, t, c, or z. Note that image pattern is ignored when CSV/Parquet is used.
- -o, --output
Path to output Zarr containing peaks, max, and optionally std and log
- -c, --channel
Channel indices (0-based) to use for spot detection
optional arguments
- --image-pattern
Pattern to extract metadata from file names.
- --max-filter-width
Neighborhood size for max filtering on Laplacian-of-Gaussian filtered SBS data, dilating sequencing channels to compensate for single-pixel alignment error
Default:
3
- --sigma-log
Size of gaussian kernel used in Laplacian-of-Gaussian filter
Default:
1
- --peak-neighborhood-size
Neighborhood size for peak detection
Default:
5
- --cycles
Optional subset of cycle indices (0-based) to include.
- --chunks
Chunk size to use to perform parallel spot detection. If not specified, image chunk size is used
- --z-index
Either max or a z-index (0-based)
Default:
max
- -g, --groupby
Keys to group images.
- -s, --subset
Subset of images to include.
- --force
Overwrite existing output
Default:
False
- --client
URL of the Dask scheduler. Use ‘none’ to disable distributed execution.
- --dask-cluster
JSON URL or inline JSON containing dask cluster parameters.
- --save
Possible choices: log, std
Additional outputs to save
- --expected-cycles
Validate that the specified number of cycles are provided.
- --verbose
Run in verbose mode. Useful for debugging.
Default:
False
- --no-version
Do not store command line arguments and scallops version in output metadata.
Default:
False
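The std image described for spot-detect (standard deviation over cycles, followed by the mean across channels) can be sketched for a single pixel. The function name and intensity values are hypothetical, and using the population standard deviation is an assumption. The real pipeline works on whole LoG-filtered images rather than one pixel at a time.

```python
import statistics


def spot_score(intensities):
    """Score one pixel from intensities[cycle][channel].

    Takes the standard deviation over cycles for each channel,
    then the mean of those values across channels.
    """
    n_channels = len(intensities[0])
    per_channel_std = [
        statistics.pstdev([cycle[c] for cycle in intensities])
        for c in range(n_channels)
    ]
    return statistics.fmean(per_channel_std)


# A sequencing spot changes its brightest channel from cycle to cycle.
spot_pixel = [[0, 10], [10, 0]]
# Background stays roughly constant over cycles.
background_pixel = [[3, 3], [3, 3]]

spot = spot_score(spot_pixel)
background = spot_score(background_pixel)
```

Pixels that belong to sequencing spots score high because their channel intensities vary between cycles, while static background scores near zero.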
reads
Run pooled in-situ sequencing read calling. Outputs reads, barcodes to labels assignments, crosstalk matrix, and table with corrected and uncorrected base intensities.
scallops pooled-sbs reads [-h] --spots SPOTS --labels LABELS --label-name
LABEL_NAME --barcodes BARCODES -o OUTPUT
[--read-quality-filter READ_QUALITY_FILTER]
[--min-area MIN_AREA] [--max-area MAX_AREA]
[--mismatches N_MISMATCHES]
[--expand-labels-distance EXPAND_LABELS_DISTANCE]
[--threshold-peaks THRESHOLD_PEAKS]
[--threshold-peaks-crosstalk THRESHOLD_PEAKS_CROSSTALK]
[--crosstalk-correction-method {li_and_speed,median,none}]
[--crosstalk-correction-by-t]
[--crosstalk-nreads CROSSTALK_NREADS] [--all-labels]
[-s [SUBSET ...]] [--bases BASES]
[--barcode-col BARCODE_COL] [--save-bases] [--force]
[--verbose] [--no-version] [--client CLIENT]
[--dask-cluster DASK_CLUSTER]
required arguments
- --spots
Zarr output from scallops pooled-sbs spot-detect
- --labels
Zarr output from scallops segment containing labels
- --label-name
Name of labels to use. For example nuclei or cell
- --barcodes
Path to the barcode CSV file containing a column named ‘barcode’.
- -o, --output
Path to the output directory where the results will be saved.
optional arguments
- --read-quality-filter
Filter reads before assigning reads to labels
- --min-area
Filter labels with area < min-area
- --max-area
Filter labels with area > max-area
- --mismatches
Correct reads <= mismatches from closest match in barcodes
- --expand-labels-distance
Expand labels by expand-labels-distance when matching reads to labels.
- --threshold-peaks
Filter reads before assigning reads to labels. Use auto to automatically determine threshold.
Default:
'auto'
- --threshold-peaks-crosstalk
Threshold for peaks for identifying sequencing reads used in crosstalk correction. Use auto to automatically determine threshold.
Default:
'auto'
- --crosstalk-correction-method
Possible choices: li_and_speed, median, none
Method to correct channel crosstalk
Default:
'median'
- --crosstalk-correction-by-t
Correct crosstalk separately for each cycle
Default:
False
- --crosstalk-nreads
Number of reads to sample to compute crosstalk correction. Use -1 to include all reads.
Default:
500000
- --all-labels
Call reads both in and outside labels.
Default:
False
- -s, --subset
Subset of images to include.
- --bases
ISS bases
Default:
'GTAC'
- --barcode-col
Barcode column in barcodes CSV
Default:
'barcode'
- --save-bases
Save individual base intensities
Default:
False
- --force
Overwrite existing output
Default:
False
- --verbose
Run in verbose mode. Useful for debugging.
Default:
False
- --no-version
Do not store command line arguments and scallops version in output metadata.
Default:
False
- --client
URL of the Dask scheduler. Use ‘none’ to disable distributed execution.
- --dask-cluster
JSON URL or inline JSON containing dask cluster parameters.
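Read calling and the --mismatches correction can be sketched as follows. This is a simplified illustration, not the SCALLOPS algorithm. The function names and data are hypothetical, the channel order is assumed to match the order of --bases, and a read with two equally close barcodes is treated as ambiguous and left uncorrected.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def call_read(intensities, bases="GTAC"):
    """Call one base per cycle from intensities[cycle][channel]."""
    return "".join(
        bases[max(range(len(bases)), key=cycle.__getitem__)]
        for cycle in intensities
    )


def correct_read(read, barcodes, mismatches=1):
    """Return the closest barcode within the mismatch budget, else None."""
    distances = sorted((hamming(read, barcode), barcode) for barcode in barcodes)
    best_distance, best_barcode = distances[0]
    if best_distance > mismatches:
        return None
    if len(distances) > 1 and distances[1][0] == best_distance:
        return None  # two barcodes are equally close
    return best_barcode


# Hypothetical corrected intensities for three cycles and four channels.
read = call_read([[9, 1, 0, 0], [0, 0, 8, 1], [1, 7, 0, 0]])
corrected = correct_read(read, ["GAA", "CCC"], mismatches=1)
```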
merge
Join in-situ barcodes with phenotype data and output as Parquet.
scallops pooled-sbs merge [-h] --sbs SBS --phenotype PHENOTYPE [PHENOTYPE ...]
[--join-sbs {inner,outer}]
[--join-phenotype {inner,outer}]
[--phenotype-suffix [PHENOTYPE_SUFFIX ...]]
[--format {parquet,zarr}] --barcodes BARCODES -o
OUTPUT [-s [SUBSET ...]] [--barcode-col BARCODE_COL]
[--force] [--client CLIENT]
[--dask-cluster DASK_CLUSTER] [--verbose]
[--no-version]
required arguments
- --sbs
Directory containing SBS parquet files.
- --phenotype
Directories with phenotype parquet files.
optional arguments
- --join-sbs
Possible choices: inner, outer
SBS join type.
Default:
'outer'
- --join-phenotype
Possible choices: inner, outer
Phenotype join type.
Default:
'outer'
- --phenotype-suffix
Suffix for phenotype columns.
- --format
Possible choices: parquet, zarr
Output file format.
Default:
'parquet'
- --barcodes
Path to the barcode CSV file containing a column named ‘barcode’.
- -o, --output
Path to the output directory where the results will be saved.
- -s, --subset
Subset of images to include.
- --barcode-col
Barcode column in barcodes CSV
Default:
'barcode'
- --force
Overwrite existing output
Default:
False
- --client
URL of the Dask scheduler. Use ‘none’ to disable distributed execution.
Default:
'none'
- --dask-cluster
JSON URL or inline JSON containing dask cluster parameters.
- --verbose
Run in verbose mode. Useful for debugging.
Default:
False
- --no-version
Do not store command line arguments and scallops version in output metadata.
Default:
False
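The difference between the inner and outer join types can be shown on two small tables keyed by label. The sketch uses plain dictionaries with hypothetical records and a hypothetical join function. It illustrates join semantics in general and does not reproduce how SCALLOPS merges its Parquet files.

```python
def join(left, right, how="outer"):
    """Join two dicts of label -> record on their labels."""
    if how == "inner":
        labels = [label for label in left if label in right]
    elif how == "outer":
        labels = list(left) + [label for label in right if label not in left]
    else:
        raise ValueError(f"unknown join type: {how}")
    return {
        label: {**left.get(label, {}), **right.get(label, {})}
        for label in labels
    }


# Hypothetical SBS and phenotype records keyed by cell label.
sbs = {1: {"barcode": "GAT"}, 2: {"barcode": "CCA"}}
phenotype = {2: {"area": 50}, 3: {"area": 70}}

inner = join(sbs, phenotype, how="inner")
outer = join(sbs, phenotype, how="outer")
```

An inner join keeps only labels present in both tables, while an outer join keeps every label and leaves the missing side empty.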
scallops norm-features
The scallops norm-features command is used to normalize features.
scallops rank-features
The scallops rank-features command is used to compute significance from the output of scallops norm-features.
usage: scallops rank-features [-h] -i INPUT [INPUT ...] [--output OUTPUT]
[--features [FEATURES ...]]
[--rank-method {welch_t,student_t,mannwhitney}]
[--label-filter LABEL_FILTER]
[--iqr-multiplier IQR_MULTIPLIER]
[--perturbation PERTURBATION] --reference
REFERENCE [--by [BY ...]]
[--min-labels MIN_LABELS] [--metadata METADATA]
[--join [JOIN ...]] [--client CLIENT]
[--dask-cluster DASK_CLUSTER] [--force]
[--no-version]
required arguments
- -i, --input
Path to normalized file(s)
- --output
Path to Parquet file containing ranked features.
optional arguments
- --features
Features to include. If not specified, all features are used.
- --rank-method
Possible choices: welch_t, student_t, mannwhitney
Method to rank features
Default:
'welch_t'
- --label-filter
Expression to filter labels (e.g. barcode_Q_mean_0/barcode_Q_mean > 0.5)
- --iqr-multiplier
Include values between Q25 - multiplier * IQR and Q75 + multiplier * IQR
- --perturbation
Field name to group perturbations
Default:
'gene_symbol'
- --reference
Reference value in perturbation to compare against.
- --by
Stratify by groups when ranking.
- --min-labels
Require at least min-labels to include perturbation
Default:
10
- --metadata
Path to CSV or Parquet file containing metadata to join with merged data.
- --join
Field(s) to join on
- --client
URL of the Dask scheduler. Use ‘none’ to disable distributed execution.
- --dask-cluster
JSON URL or inline JSON containing dask cluster parameters.
- --force
Overwrite existing output
Default:
False
- --no-version
Do not store command line arguments and scallops version in output metadata.
Default:
False
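The IQR filter and the welch_t ranking method can be sketched with the standard library. This is an illustration under assumptions, not the SCALLOPS code: the function names and values are hypothetical, the bounds follow the conventional form (lower = Q25 - multiplier * IQR, upper = Q75 + multiplier * IQR), the multiplier default of 1.5 is chosen only for the example, and quartiles use the "inclusive" method of statistics.quantiles.

```python
import math
import statistics


def iqr_bounds(values, multiplier=1.5):
    """Return (lower, upper) bounds for IQR-based outlier filtering."""
    q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    return q25 - multiplier * iqr, q75 + multiplier * iqr


def welch_t(sample, reference):
    """Welch's t statistic: mean difference over unequal-variance standard error."""
    mean_difference = statistics.fmean(sample) - statistics.fmean(reference)
    standard_error = math.sqrt(
        statistics.variance(sample) / len(sample)
        + statistics.variance(reference) / len(reference)
    )
    return mean_difference / standard_error


# Hypothetical feature values for one perturbation and for the reference.
perturbed = [2, 4, 6]
reference = [1, 2, 3]

lower, upper = iqr_bounds(list(range(1, 10)))
t_statistic = welch_t(perturbed, reference)
```

Welch's statistic does not assume equal variances in the two groups, which is the main difference from the student_t choice.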
scallops registration
The scallops registration command provides functionality for performing image registration. This includes image alignment using ITK, cross-correlation-based registration, and applying precomputed transformations to images or labels.
Key Features:
ITK Registration: The elastix subcommand performs registration of moving images to fixed images or across timepoints using ITK. It supports the use of pre-configured ITK parameters and outputs transformed images and/or labels in Zarr format.
Cross-Correlation Registration: The cross-correlation subcommand registers images by aligning them within and across timepoints, using cross-correlation and specified channels for registration.
Transform Application: The transformix subcommand applies previously computed ITK transformations to images or labels, storing the results in a Zarr output directory.
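The idea behind cross-correlation registration can be shown in one dimension: try each candidate shift and keep the one with the highest correlation score. The sketch below is a toy illustration with a hypothetical function name and hypothetical signals. Real implementations work on 2D images and typically use normalized or FFT-based correlation, and nothing here reflects the SCALLOPS internals.

```python
def best_shift(fixed, moving, max_shift):
    """Return the shift s that maximizes sum(fixed[i] * moving[i - s])."""

    def score(shift):
        return sum(
            fixed[i] * moving[i - shift]
            for i in range(len(fixed))
            if 0 <= i - shift < len(moving)
        )

    return max(range(-max_shift, max_shift + 1), key=score)


# The same peak appears at index 3 in the fixed signal and index 1 in the moving one.
fixed = [0, 0, 1, 5, 1, 0, 0, 0]
moving = [1, 5, 1, 0, 0, 0, 0, 0]

shift = best_shift(fixed, moving, max_shift=3)
```

A positive result means the moving signal has to be shifted toward higher indices to line up with the fixed signal.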
Image registration
usage: scallops registration [-h] {elastix,transformix,cross-correlation} ...
Sub-commands
elastix
Register moving image to fixed image using ITK. If no fixed image is provided, registers moving image to a specified timepoint. Outputs are stored in Zarr format.
scallops registration elastix [-h] --moving MOVING [MOVING ...]
[--fixed FIXED [FIXED ...]]
[--moving-label [MOVING_LABEL ...]]
[--itk-parameters ITK_PARAMETERS [ITK_PARAMETERS ...]]
[--moving-image-pattern MOVING_IMAGE_PATTERN]
[--fixed-image-pattern FIXED_IMAGE_PATTERN]
[--moving-image-spacing MOVING_IMAGE_SPACING]
[--fixed-image-spacing FIXED_IMAGE_SPACING]
[--moving-channel MOVING_CHANNEL]
[--fixed-channel FIXED_CHANNEL]
[--transform-output TRANSFORM_OUTPUT_DIR]
[--label-output LABEL_OUTPUT_DIR]
[--moving-output MOVING_OUTPUT_DIR]
[--time TIME] [--unroll-channels]
[--sort SORT [SORT ...]] [--no-landmarks]
[--landmark-min-score LANDMARK_MIN_SCORE]
[--landmark-step-size LANDMARK_STEP_SIZE]
[--landmark-image-chunk-size LANDMARK_IMAGE_CHUNK_SIZE]
[--landmark-template-padding LANDMARK_TEMPLATE_PADDING [LANDMARK_TEMPLATE_PADDING ...]]
[--landmark-initialization {com,none} [{com,none} ...]]
[--landmark-com-min-quantile LANDMARK_COM_MIN_QUANTILE]
[--landmark-com-max-quantile LANDMARK_COM_MAX_QUANTILE]
[--landmark-min-count LANDMARK_MIN_COUNT]
[--output-aligned-channels-only]
[--itk-channels [ITK_CHANNELS ...]]
[--z-index Z_INDEX] [-g [GROUPBY ...]]
[-s [SUBSET ...]] [--force] [--client CLIENT]
[--dask-cluster DASK_CLUSTER] [--verbose]
[--no-version]
required arguments
- --moving
Paths to directories containing nd2, tiff, zarr, or other Bio-Formats images
optional arguments
- --fixed
Paths to directories containing nd2, tiff, zarr, or other Bio-Formats images
- --moving-label
Path to Zarr directories containing labels to transform
- --itk-parameters
Paths to files containing ITK parameters or predefined parameter maps
Default:
['affine', 'nl-100']
- --moving-image-pattern
Format string to extract metadata from the moving image file name
- --fixed-image-pattern
Format string to extract metadata from the fixed image file name
- --moving-image-spacing
Physical size y, x if image metadata does not contain this information
- --fixed-image-spacing
Physical size y, x if image metadata does not contain this information
- --moving-channel
Moving channel index (0-based) to use for alignment
Default:
0
- --fixed-channel
Fixed channel index (0-based) to use for alignment
Default:
0
- --transform-output
Path to output directory for transformations
Default:
'transforms'
- --label-output
Path to save transformed moving labels
- --moving-output
Path to save transformed moving image
- --time, -t
Time index (0-based) or value for alignment across timepoints
Default:
'0'
- --unroll-channels
Unroll channels (drop ‘t’ dimension) in output image
Default:
False
- --sort
Custom sort order. Example: 20231012_20x_6W_IF 20231010_20x_6W_FISH
- --no-landmarks
Do not use landmarks to find corresponding regions between moving and fixed images to initialize the registration
Default:
False
- --landmark-min-score
Minimum score to include matching region for landmark estimation
Default:
0.6
- --landmark-step-size
Grid step size for landmark estimation in physical units
Default:
1000
- --landmark-image-chunk-size
Image chunk size in physical units
Default:
200
- --landmark-template-padding
Template padding in physical units. Values are tried until landmark-min-count landmarks are found.
Default:
[750, 1000, 1250, 2250]
- --landmark-initialization
Possible choices: com, none
Initial alignment method for landmark estimation: com (center of mass) or none
Default:
['com', 'none']
- --landmark-com-min-quantile
Include values >= specified quantile for center of mass computation.
Default:
0.25
- --landmark-com-max-quantile
Include values <= specified quantile for center of mass computation.
Default:
0.75
- --landmark-min-count
Ensure landmark-min-count landmarks are found.
Default:
100
- --output-aligned-channels-only
Whether to output aligned channels only
Default:
False
- --itk-channels
Paths to files containing ITK parameters or predefined parameter maps for registering across channels
- --z-index
Either max or a z-index (0-based)
Default:
max
- -g, --groupby
Keys to group images.
- -s, --subset
Subset of images to include.
- --force
Overwrite existing output
Default:
False
- --client
URL of the Dask scheduler. Use ‘none’ to disable distributed execution.
Default:
'none'
- --dask-cluster
JSON URL or inline JSON containing dask cluster parameters.
- --verbose
Run in verbose mode. Useful for debugging.
Default:
False
- --no-version
Do not store command line arguments and scallops version in output metadata.
Default:
False
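The interplay between --landmark-template-padding and --landmark-min-count described above can be pictured as a simple retry loop. The following is a minimal sketch of that documented behavior only (padding values are tried in order until enough landmarks are found). The callback find_landmarks is a hypothetical stand-in for the real landmark estimation, and the fallback used when no padding reaches the minimum count is an assumption.

```python
from typing import Callable, Sequence, Tuple


def pick_template_padding(
    paddings: Sequence[float],
    min_count: int,
    find_landmarks: Callable[[float], int],
) -> Tuple[float, int]:
    """Try each padding in order; stop at the first one yielding enough landmarks.

    `find_landmarks` is a hypothetical callback returning the number of
    landmarks found for a given template padding (in physical units).
    """
    if not paddings:
        raise ValueError("at least one padding value is required")
    best = (paddings[0], -1)
    for padding in paddings:
        count = find_landmarks(padding)
        if count >= min_count:
            return padding, count
        if count > best[1]:
            best = (padding, count)
    # Assumption: if no padding reaches min_count, report the best attempt.
    return best


# Toy landmark counts per padding, made up for illustration only.
toy_counts = {750: 40, 1000: 85, 1250: 130, 2250: 160}
chosen = pick_template_padding([750, 1000, 1250, 2250], 100, toy_counts.__getitem__)
print(chosen)  # (1250, 130)
```

With the default padding list and a minimum count of 100, the toy data stops at the third value, so the largest padding is never evaluated.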
transformix
Transform moving image to fixed image using previously computed ITK transformations
scallops registration transformix [-h] --transform TRANSFORM_DIR --output
OUTPUT --images IMAGES
[--image-spacing IMAGE_SPACING]
[--type {images,labels}] [--force]
[--client CLIENT]
[--dask-cluster DASK_CLUSTER]
required arguments
- --transform
Path to directory containing transformations
- --output
Path to output Zarr directory
- --images
Path to Zarr directory to transform
optional arguments
- --image-spacing
Physical size y, x if metadata does not contain this information
- --type
Possible choices: images, labels
Whether to transform images or labels
Default:
'labels'
- --force
Overwrite existing output
Default:
False
- --client
URL of the Dask scheduler. Use ‘none’ to disable distributed execution.
Default:
'none'
- --dask-cluster
JSON URL or inline JSON containing dask cluster parameters.
cross-correlation
Register image across and within cycles
scallops registration cross-correlation [-h] -i IMAGES [IMAGES ...] -o OUTPUT
[--image-pattern IMAGE_PATTERN]
[--across-t-channel ACROSS_T_CHANNEL]
[--within-t-channel [WITHIN_T_CHANNEL ...]]
[--within-t-filter-min REGISTRATION_FILTER_MIN]
[--within-t-filter-max REGISTRATION_FILTER_MAX]
[-g [GROUPBY ...]] [-s [SUBSET ...]]
[--force] [--client CLIENT]
[--dask-cluster DASK_CLUSTER]
required arguments
- -i, --images
Paths to input images, or to a CSV/Parquet file with an image column containing the full image path and additional columns containing metadata such as plate, well, t, c, or z. Note that the image pattern is ignored when CSV/Parquet is used.
- -o, --output
Zarr output directory
optional arguments
- --image-pattern
Pattern to extract metadata from file names.
- --across-t-channel
Channel index (0-based) to use to register across cycles
- --within-t-channel
Channel indices (0-based) to use to register within cycles
- --within-t-filter-min
Replace data outside of specified percentile range [p1, p2] with uniform noise when aligning within t
Default:
0
- --within-t-filter-max
Replace data outside of specified percentile range [p1, p2] with uniform noise when aligning within t
Default:
90
- -g, --groupby
Keys to group images.
- -s, --subset
Subset of images to include.
- --force
Overwrite existing output
Default:
False
- --client
URL of the Dask scheduler. Use ‘none’ to disable distributed execution.
Default:
'none'
- --dask-cluster
JSON URL or inline JSON containing dask cluster parameters.
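To make the --within-t-filter-min and --within-t-filter-max options more concrete, here is a minimal pure-Python sketch of replacing data outside a percentile range with uniform noise. It illustrates the idea only and is not the SCALLOPS implementation. The range the noise is drawn from (between the two percentile values) and the helper names are assumptions.

```python
import math
import random


def percentile(sorted_values, p):
    """Linearly interpolated percentile (0-100) of an already sorted list."""
    k = (len(sorted_values) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def replace_outside_percentiles(pixels, p_min=0.0, p_max=90.0, seed=0):
    """Replace values outside the [p_min, p_max] percentile range with noise.

    Assumption: the noise is drawn uniformly between the two percentile
    values; the real implementation may choose a different range.
    """
    ordered = sorted(pixels)
    low, high = percentile(ordered, p_min), percentile(ordered, p_max)
    rng = random.Random(seed)
    return [v if low <= v <= high else rng.uniform(low, high) for v in pixels]


pixels = [3, 5, 4, 6, 5, 4, 5, 3, 4, 900]  # one bright outlier at the end
filtered = replace_outside_percentiles(pixels)
print(filtered)
```

With the defaults (0 and 90), only the bright outlier is replaced, which keeps a few saturated pixels from dominating the cross-correlation.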
Predefined Registration parameters
Scallops provides a set of predefined parameters for registration. Options that end in wsireg were adopted from WSIreg.
Available Options
rigid
Description: Rigid registration using 1 resolution and a small step size.
Transformations: Translation, Rotation
affine
Description: Affine registration using 1 resolution and a small step size.
Transformations: Translation, Rotation, Scaling, Shearing
nl-100
Description: Non-linear registration using B-splines using 1 resolution and a final grid spacing of 100 microns.
Transformations: Non-linear (B-spline)
rigid-wsireg
Description: Rigid registration using 10 resolutions.
Transformations: Translation, Rotation
affine-wsireg
Description: Affine registration using 10 resolutions.
Transformations: Translation, Rotation, Scaling, Shearing
similarity-wsireg
Description: Similarity registration using 10 resolutions.
Transformations: Translation, Rotation, Uniform Scaling
nl-wsireg
Description: Non-linear registration using B-splines using 10 resolutions and a final grid spacing of 100 microns.
Transformations: Non-linear (B-spline)
nl2-wsireg
Description: Non-linear registration using B-splines using 10 resolutions and a final grid spacing of 75 microns.
Transformations: Non-linear (B-spline)
nl3-wsireg
Description: Non-linear registration using B-splines using 1 resolution and a final grid spacing of 200 microns.
Transformations: Non-linear (B-spline)
fi_correction-wsireg
Description: Rigid registration using 4 resolutions.
Transformations: Translation, Rotation
These parameters use mutual information as the image similarity measure; advanced mean squares and advanced normalized correlation versions of these options are available with the suffixes ams and anc respectively. For example, rigid-anc.
Note that parameters can be composed in any manner. For example rigid affine nl-100.
To use custom registration parameters, pass a set of JSON files to the --itk-parameters argument. Please refer to the Elastix manual for more information.
scallops segment
The scallops segment command provides a command-line interface (CLI) for performing nuclei and cell segmentation. It supports various segmentation algorithms and outputs segmented labels in Zarr format.
Key Features:
Nuclei Segmentation: The nuclei subcommand performs nuclei segmentation using methods such as Stardist and Cellpose, with optional filtering based on area.
Cell Segmentation: The cell subcommand performs cell segmentation using methods like Watershed, Cellpose, and Propagation. It also supports various cytoplasmic channels, thresholding, and post-segmentation filtering.
Segmentation
usage: scallops segment [-h] {nuclei,cell} ...
Sub-commands
nuclei
Nuclei segmentation. Outputs a Zarr image containing nuclei labels.
scallops segment nuclei [-h] -i IMAGES [IMAGES ...] -o OUTPUT
[--method {cellpose,stardist}]
[--image-pattern IMAGE_PATTERN] [-g [GROUPBY ...]]
[--dapi-channel DAPI_CHANNEL] [--min-area MIN_AREA]
[--max-area MAX_AREA] [--chunks CHUNKS]
[--chunk-overlap CHUNK_OVERLAP] [--z-index Z_INDEX]
[--no-version] [--stardist-clip]
[--stardist-pmin STARDIST_PMIN]
[--stardist-pmax STARDIST_PMAX] [-s [SUBSET ...]]
[--force] [--client CLIENT]
[--dask-cluster DASK_CLUSTER] [--verbose]
required arguments
- -i, --images
Paths to input images, or to a CSV/Parquet file with an image column containing the full image path and additional columns containing metadata such as plate, well, t, c, or z. Note that the image pattern is ignored when CSV/Parquet is used.
- -o, --output
Path to output zarr image directory
optional arguments
- --method
Possible choices: cellpose, stardist
Nuclei segmentation algorithm
Default:
'stardist'
- --image-pattern
Pattern to extract metadata from file names.
- -g, --groupby
Keys to group images.
- --dapi-channel
Channel index (0-based) where DAPI is found
Default:
0
- --min-area
Filter labels with area < min-area
- --max-area
Filter labels with area > max-area
- --chunks
Chunk size to use to perform segmentation in chunks
- --chunk-overlap
Chunk size overlap to use to perform segmentation using overlapping chunks
- --z-index
Either max or a z-index (0-based)
Default:
max
- --no-version
Do not store command line arguments and scallops version in output metadata.
Default:
False
- --stardist-clip
Whether to clip normalized image values to between 0 and 1
Default:
False
- --stardist-pmin
Minimum percentile for image normalization. Default is 3.
- --stardist-pmax
Maximum percentile for image normalization. Default is 99.8.
- -s, --subset
Subset of images to include.
- --force
Overwrite existing output
Default:
False
- --client
URL of the Dask scheduler. Use ‘none’ to disable distributed execution.
Default:
'none'
- --dask-cluster
JSON URL or inline JSON containing dask cluster parameters.
- --verbose
Run in verbose mode. Useful for debugging.
Default:
False
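The --stardist-pmin, --stardist-pmax and --stardist-clip options describe a percentile-based intensity normalization. The sketch below shows one common form of it in pure Python, using the documented defaults (3 and 99.8). It is an illustration under assumptions rather than the code SCALLOPS runs: the small eps term and the linear percentile interpolation are choices made for this example.

```python
import math


def percentile(sorted_values, p):
    """Linearly interpolated percentile (0-100) of an already sorted list."""
    k = (len(sorted_values) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def normalize_percentile(pixels, pmin=3.0, pmax=99.8, clip=False, eps=1e-20):
    """Scale pixels so the pmin percentile maps to 0 and the pmax percentile to 1."""
    ordered = sorted(pixels)
    low, high = percentile(ordered, pmin), percentile(ordered, pmax)
    scaled = [(v - low) / (high - low + eps) for v in pixels]
    if clip:
        # Equivalent of --stardist-clip: force values into [0, 1].
        scaled = [min(1.0, max(0.0, v)) for v in scaled]
    return scaled


pixels = list(range(0, 1001))  # toy intensities 0..1000
unclipped = normalize_percentile(pixels)
clipped = normalize_percentile(pixels, clip=True)
print(min(unclipped), max(unclipped))
print(min(clipped), max(clipped))
```

Without clipping, values below the lower percentile become negative and values above the upper percentile exceed 1. With clipping, they are pinned to 0 and 1.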
cell
Cell segmentation. Outputs a Zarr image containing cell labels.
scallops segment cell [-h] -i IMAGES [IMAGES ...] -o OUTPUT
[--nuclei-label NUCLEI_LABEL]
[--method {cellpose,propagation,watershed,watershed-intensity}]
[--threshold THRESHOLD]
[--threshold-correction-factor THRESHOLD_CORRECTION_FACTOR]
[--cyto-channel [CYTO_CHANNEL ...]]
[--image-pattern IMAGE_PATTERN] [-g [GROUPBY ...]]
[--dapi-channel DAPI_CHANNEL] [--min-area MIN_AREA]
[--max-area MAX_AREA] [--chunks CHUNKS]
[--chunk-overlap CHUNK_OVERLAP] [--z-index Z_INDEX]
[--no-version] [--nuclei-min-area NUCLEI_MIN_AREA]
[--nuclei-max-area NUCLEI_MAX_AREA] [--rolling-ball]
[--sigma CELL_SEGMENTATION_SIGMA]
[--closing-radius CLOSING_RADIUS]
[--time CELL_SEGMENTATION_T] [--shrink-nuclei]
[-s [SUBSET ...]] [--force] [--client CLIENT]
[--dask-cluster DASK_CLUSTER] [--verbose]
required arguments
- -i, --images
Paths to input images, or to a CSV/Parquet file with an image column containing the full image path and additional columns containing metadata such as plate, well, t, c, or z. Note that the image pattern is ignored when CSV/Parquet is used.
- -o, --output
Path to output zarr image directory
optional arguments
- --nuclei-label
Path to zarr directory containing nuclei labels for watershed or propagation segmentation
- --method
Possible choices: cellpose, propagation, watershed, watershed-intensity
Cell segmentation algorithm. Note that only watershed and propagation will output cells that match nuclei
Default:
'propagation'
- --threshold
Threshold for watershed or propagation methods. Either Li, Otsu, Local, or manually determined value
Default:
'Li'
- --threshold-correction-factor
Factor to adjust the computed threshold by if threshold is not a manually determined value
Default:
1
- --cyto-channel
Channel index (0-based) to infer cell segmentation from. Default is all non-DAPI channels. If more than one channel specified, use minimum across time (cycles) then mean over channels, or if only one time point is present, use mean over channels.
- --image-pattern
Pattern to extract metadata from file names.
- -g, --groupby
Keys to group images.
- --dapi-channel
Channel index (0-based) where DAPI is found
Default:
0
- --min-area
Filter labels with area < min-area
- --max-area
Filter labels with area > max-area
- --chunks
Chunk size to use to perform segmentation in chunks
- --chunk-overlap
Chunk size overlap to use to perform segmentation using overlapping chunks
- --z-index
Either max or a z-index (0-based)
Default:
max
- --no-version
Do not store command line arguments and scallops version in output metadata.
Default:
False
- --nuclei-min-area
Filter nuclei labels with area < nuclei-min-area
- --nuclei-max-area
Filter nuclei labels with area > nuclei-max-area
- --rolling-ball
Apply rolling ball subtraction to cell mask prior to computing threshold
Default:
False
- --sigma
Size of gaussian kernel used to smooth the cell mask prior to computing threshold
- --closing-radius
Disk radius to use for binary closing cell labels post segmentation
- --time
Time indices (0-based) to include when computing cell segmentation mask. Defaults to all time points.
- --shrink-nuclei
Shrink nuclei prior to subtraction of nuclei from cells to identify the cytosol.
Default:
False
- -s, --subset
Subset of images to include.
- --force
Overwrite existing output
Default:
False
- --client
URL of the Dask scheduler. Use ‘none’ to disable distributed execution.
Default:
'none'
- --dask-cluster
JSON URL or inline JSON containing dask cluster parameters.
- --verbose
Run in verbose mode. Useful for debugging.
Default:
False
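The --cyto-channel description above states how several channels are collapsed into one image for cell segmentation: minimum across time points, then mean over channels. A minimal sketch of that reduction on nested lists follows. The (t, c, pixel) layout, the function name and the toy values are assumptions made for illustration.

```python
from statistics import mean


def cell_mask_image(stack, cyto_channels):
    """Collapse a (t, c, pixel) stack into one value per pixel.

    Follows the documented rule: minimum across time points for each selected
    channel, then mean over channels. With a single time point the minimum is
    a no-op, so only the channel mean remains.
    """
    n_pixels = len(stack[0][0])
    per_channel_min = [
        [min(stack[t][c][i] for t in range(len(stack))) for i in range(n_pixels)]
        for c in cyto_channels
    ]
    return [mean(channel[i] for channel in per_channel_min) for i in range(n_pixels)]


# Toy stack: 2 time points x 3 channels x 4 pixels (channel 0 plays the DAPI role).
stack = [
    [[9, 9, 9, 9], [4, 8, 2, 6], [6, 2, 4, 8]],
    [[9, 9, 9, 9], [2, 10, 4, 6], [8, 4, 2, 10]],
]
print(cell_mask_image(stack, cyto_channels=[1, 2]))      # [4, 5, 2, 7]
print(cell_mask_image(stack[:1], cyto_channels=[1, 2]))  # [5, 5, 3, 7]
```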
scallops stitch
The scallops stitch command provides a command-line interface (CLI) for performing stitching of microscopy images.
Key Features:
Performance: Utilize dask for parallel processing.
Cross-correlation: Use both phase and no normalization in cross-correlation computations, automatically choosing the one that gives the best result.
Stage Position Handling: Read stage positions directly from Bioformats-supported images, such as .nd2 files, or from a CSV file.
Comprehensive Output: Outputs stitched image in OME-ZARR format, stitched positions in Parquet format, PDF report, tile boundary mask, and tile source labels in OME-ZARR format.
Z-index: Option to specify specific Z index or perform maximum Z projection.
Blending: Enable or disable image blending during stitching. When not blending, use tile closest to well center in overlapping regions.
Crop: Crop image tiles to remove edge effects.
Radial Correction: Automatically determine K for radial distortion and apply radial correction.
Stitching Evaluation: Compute error in overlapping regions after stitching.
Stitch microscopy images
usage: scallops stitch [-h] -i IMAGES [IMAGES ...] --report-output
REPORT_OUTPUT [--image-output IMAGE_OUTPUT]
[--channel-name [CHANNEL_NAME ...]]
[--image-pattern IMAGE_PATTERN] [-g [GROUPBY ...]]
[-s [SUBSET ...]] [-c ALIGN_CHANNEL]
[--radial-correction-k RADIAL_CORRECTION_K]
[--stitch-alpha STITCH_ALPHA]
[--max-shift MAX_SHIFT [MAX_SHIFT ...]]
[--no-save-image] [--no-save-labels] [--no-evaluate]
[--ffp FFP] [--dfp DFP] [--blend {linear,none}]
[--output-channels [OUTPUT_CHANNELS ...]]
[--crop-y CROP_Y] [--crop-x CROP_X]
[--stage-positions STAGE_POSITIONS]
[--image-spacing IMAGE_SPACING]
[--min-overlap-fraction MIN_OVERLAP_FRACTION]
[--random-seed RANDOM_SEED]
[--cross-correlation-upsample CROSS_CORRELATION_UPSAMPLE]
[--rename RENAME] [--flip-y-axis {1,0}]
[--flip-x-axis {1,0}] [--swap-axes {1,0}]
[--z-index Z_INDEX] [--force] [--client CLIENT]
[--dask-cluster DASK_CLUSTER]
[--expected-images EXPECTED_IMAGES] [--no-version]
required arguments
- -i, --images
Paths to input images, or to a CSV/Parquet file with an image column containing the full image path and additional columns containing metadata such as plate, well, t, c, or z. Note that the image pattern is ignored when CSV/Parquet is used.
- --report-output
Output directory for stitched positions and QC report.
optional arguments
- --image-output
Output zarr directory for stitched images and masks.
- --channel-name
Channel names to save in output image. If specified, must equal the number of channels in input tiles.
- --image-pattern
Pattern to extract metadata from file names.
- -g, --groupby
Keys to group images.
- -s, --subset
Subset of images to include.
- -c, --align-channel
Channel index (0-based) to use for alignment.
Default:
0
- --radial-correction-k
K to correct for radial distortion. Use auto to automatically determine k and none to disable auto determination.
Default:
'auto'
- --stitch-alpha
Significance level for alignment error quantification.
Default:
0.001
- --max-shift
Maximum allowed per-tile shift in microns
Default:
[50, 100, 150]
- --no-save-image
Do not save stitched image.
Default:
False
- --no-save-labels
Do not save tile boundary label mask or tile source labels.
Default:
False
- --no-evaluate
Do not evaluate stitching quality.
Default:
False
- --ffp
Path for flat-field correction profile image.
- --dfp
Path for dark-field correction profile image.
- --blend
Possible choices: linear, none
Blending method for stitched images
Default:
'none'
- --output-channels
Output channels to save in stitched image.
- --crop-y
Crop tiles by crop pixels along the y dimension when aligning tiles. Set automatically when radial correction is enabled.
- --crop-x
Crop tiles by crop pixels along the x dimension when aligning tiles. Set automatically when radial correction is enabled.
- --stage-positions
Optional CSV or Parquet file containing stage positions. Use when image metadata is missing stage positions. Expected columns name, y, and x, where name is the full image path.
- --image-spacing
Physical size y, x if image metadata does not contain this information
- --min-overlap-fraction
Minimum tile overlap fraction to include edge in graph. Determined automatically if not provided.
- --random-seed
Random seed for reproducibility.
Default:
239753
- --cross-correlation-upsample
Upsampling factor for registration precision.
Default:
1
- --rename
CSV file mapping old image IDs to new IDs for output file names.
- --flip-y-axis
Possible choices: 1, 0
Whether to flip tile y axis. Determined automatically if not provided.
- --flip-x-axis
Possible choices: 1, 0
Whether to flip tile x axis. Determined automatically if not provided.
- --swap-axes
Possible choices: 1, 0
Whether to swap tile y and x axes. Determined automatically if not provided.
- --z-index
Either max, focus, z-index (0-based), or a path to a Parquet file containing columns key and z_index. Focus selects the best z-index using the slope of the image log-log power spectrum.
Default:
'max'
- --force
Overwrite existing output
Default:
False
- --client
URL of the Dask scheduler. Use ‘none’ to disable distributed execution.
Default:
'none'
- --dask-cluster
JSON URL or inline JSON containing dask cluster parameters.
- --expected-images
Validate that the specified number of images are provided.
- --no-version
Do not store command line arguments and scallops version in output metadata.
Default:
False
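When image metadata lacks stage positions, the --stage-positions option above expects a table with columns name, y, and x, where name is the full image path. A minimal sketch of writing such a CSV with the standard library is shown below. The tile paths and coordinates are hypothetical, and the units of y and x are not specified above, so treat the values as placeholders.

```python
import csv
import tempfile
from pathlib import Path


def write_stage_positions(rows, path):
    """Write (image_path, y, x) tuples to a CSV with columns name, y, x."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["name", "y", "x"])
        writer.writerows(rows)


# Hypothetical tile paths and positions, for illustration only.
tiles = [
    ("/data/plate1/A1-102.nd2", 0.0, 0.0),
    ("/data/plate1/A1-103.nd2", 0.0, 1200.0),
]
csv_path = Path(tempfile.gettempdir()) / "stage_positions.csv"
write_stage_positions(tiles, csv_path)
print(csv_path.read_text())
```

The resulting file could then be passed to --stage-positions.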
stitch-preview
The stitch-preview command provides a quick preview of stitched multi-tile microscopy images, allowing users to visualize the result before performing full stitching. It uses the stage positions for stitching and saves the resulting image.
Key Features:
Tile Positioning: Stitching uses stage positions and has options to display tile numbers and bounds.
Downsampling: Enables downsampling of the image resolution to improve performance and reduce memory requirements.
Channel Selection: Allows users to specify the channel to display from multi-channel images.
Log Transformation: Optionally apply log transformation to pixel intensities for better visualization of dim images.
Create a multi-tile image using image stage positions.
usage: scallops stitch-preview [-h] -i IMAGES [IMAGES ...] -o OUTPUT
[--image-pattern IMAGE_PATTERN]
[-g [GROUPBY ...]] [-s [SUBSET ...]] [-n] [-b]
[--no-tiles] [-c CHANNEL] [-d DOWNSAMPLE] [-l]
[--stage-positions STAGE_POSITIONS]
[--z-index Z_INDEX]
required arguments
- -i, --images
Paths to input images, or to a CSV/Parquet file with an image column containing the full image path and additional columns containing metadata such as plate, well, t, c, or z. Note that the image pattern is ignored when CSV/Parquet is used.
- -o, --output
Output directory.
optional arguments
- --image-pattern
Pattern to extract metadata from file names.
- -g, --groupby
Keys to group images.
- -s, --subset
Subset of images to include.
- -n, --numbers
Display tile numbers.
Default:
False
- -b, --bounds
Display tile bounds.
Default:
False
- --no-tiles
Do not display image tiles.
Default:
False
- -c, --channel
Channel index (0-based) to display.
Default:
0
- -d, --downsample
Downsample image resolution.
Default:
20
- -l, --log
Log-transform pixel intensities to help visualize dim images.
Default:
False
- --stage-positions
Optional CSV file containing stage positions. Use when image metadata is missing stage positions. Expected columns name, y, and x, where name is the full image path.
- --z-index
Either max, focus, z-index (0-based), or a path to a Parquet file containing columns key and z_index. Focus selects the best z-index using the slope of the image log-log power spectrum.
Default:
'max'
Outputs explained
The Scallops command line produces a series of default and optional outputs, which are described below.
In-Situ sequencing pipeline (pooled-sbs)
Let’s first touch briefly on the pipeline (for more information, check the CLI documentation). Say that we want to run the in-situ sequencing pipeline from the command line using Stardist (the default) for nuclei segmentation, followed by watershed cell segmentation with the threshold defined by Li’s method. Assume that you would like to use the test files and that your current working directory is the Scallops root directory. Then:
scallops pooled-sbs pipeline scallops/tests/data/experimentC/input --barcodes scallops/tests/data/experimentC/barcodes.csv --pheno=scallops/tests/data/experimentC/10X_c0-DAPI-p65ab
This will generate the following directories:
bases cells combined images.zarr phenos reads
In bases you will find Parquet dataframes with the base information
extracted for each group. Since by default we group by tile and well,
and this example has only one well and two tiles, you’ll find:
./bases
├── A1-102.parquet
└── A1-103.parquet
1 directory, 2 files
Both files contain the read id, cycle, channel, intensity, cell id, coordinates y and x, well and tile information:
| y | x | read | peak | cell | t | c | intensity | corrected_intensity | well | tile |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 705 | 0 | 364.59483811395535 | 0 | 1 | G | 538.2688477073593 | -48.44409230154324 | A1 | 102 |
| 5 | 705 | 0 | 364.59483811395535 | 0 | 1 | T | 2706.2792184281157 | 3510.5365471687214 | A1 | 102 |
Likewise, in the reads directory, you’ll find the dataframes containing the reads info:
./reads
├── A1-102.parquet
└── A1-103.parquet
1 directory, 2 files
These contain the identified reads information, such as quantiles and peaks:
| y | x | read | peak | cell | barcode | Q_0 | Q_1 | Q_2 | Q_3 | Q_4 | Q_5 | Q_6 | Q_7 | Q_8 | Q_min | well | tile |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 705 | 0 | 364.59483811395535 | 0 | TATTCTTCC | 0.8225319160232156 | 0.213521539841818 | 0.5315020019263625 | 1.0 | 0.1871177283289207 | 0.6452085822773499 | 1.0 | 0.008381143001541913 | 0.1944106438921418 | 0.008381143001541913 | A1 | 102 |
| 5 | 756 | 1 | 1162.3744344193526 | 0 | AAGCCAATT | 1.0 | 0.8525545481276788 | 0.41514790479643904 | 0.4095041269847144 | 1.0 | 0.5086887399634992 | 0.9443351500907629 | 0.5966046145349035 | 1.0 | 0.4095041269847144 | A1 | 102 |
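In the example rows of the reads table, Q_min coincides with the smallest of the per-cycle values Q_0 to Q_8, one per barcode base. The short check below reproduces that observation for the first row. It is an inference from the example data, not a documented definition of the column.

```python
# Values copied from the first example row of the reads table above.
row = {
    "barcode": "TATTCTTCC",
    "Q_0": 0.8225319160232156,
    "Q_1": 0.213521539841818,
    "Q_2": 0.5315020019263625,
    "Q_3": 1.0,
    "Q_4": 0.1871177283289207,
    "Q_5": 0.6452085822773499,
    "Q_6": 1.0,
    "Q_7": 0.008381143001541913,
    "Q_8": 0.1944106438921418,
    "Q_min": 0.008381143001541913,
}


def min_quality(read):
    """Smallest per-cycle value, assuming one Q_<i> column per barcode base."""
    n_cycles = len(read["barcode"])
    return min(read[f"Q_{i}"] for i in range(n_cycles))


print(min_quality(row) == row["Q_min"])  # True
```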
Then, the cells directory includes parquet files with cell information:
./cells
├── A1-102.parquet
└── A1-103.parquet
1 directory, 2 files
These hold the barcode counts and their corresponding peaks and sequences:
| peak | cell | cell_barcode_0 | cell_barcode_count_0 | cell_barcode_1 | cell_barcode_count_1 | barcode_count | well | tile |
|---|---|---|---|---|---|---|---|---|
| 438.1286071891493 | 36 | GACCAATGG | 4 | ACCGGTTTA | 1.0 | 5 | A1 | 102 |
| 398.4562017293166 | 17 | CTTCGCACT | 2 | | 0.0 | 2 | A1 | 102 |
Finally, we have the combined directory with all the information combined:
./combined/
├── A1-102.parquet
└── A1-103.parquet
1 directory, 2 files
| well | tile | cell | peak | cell_barcode_0 | cell_barcode_count_0 | cell_barcode_1 | cell_barcode_count_1 | barcode_count | cells_x | cells_y | cells_area | nuclei_max_1 | nuclei_mean_1 | nuclei_corr_0_1 | nuclei_y | nuclei_median_1 | nuclei_max_0 | nuclei_area | nuclei_x | nuclei_median_0 | nuclei_mean_0 | sgRNA | gene_symbol | duplicate_prefix | sgRNA_1 | gene_symbol_1 | duplicate_prefix_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A1 | 102 | 17 | 398.4562017293166 | CTTCGCACT | 2.0 | | 0.0 | 2.0 | 821.2 | 8.631578947368421 | 95.0 | 2819 | 2280.957746478873 | 0.87922644251275 | 7.788732394366197 | 2278.0 | 1646 | 71.0 | 821.7605633802817 | 1237.0 | 1239.338028169014 | CTTCGACACTGATGATCTGC | ATXN3L | False | | | |
| A1 | 102 | 19 | 147.69660807883213 | GCTGCAGTC | 1.0 | CAAATCCCA | 1.0 | 2.0 | 890.7610062893082 | 10.754716981132075 | 159.0 | 2122 | 1663.0069444444443 | 0.7376070465901201 | 10.11111111111111 | 1682.0 | 1792 | 144.0 | 891.2569444444445 | 1411.0 | 1363.8402777777778 | GCTGCAAGTCTCCCACCGGA | SMAD1 | False | CAAATCCCCAACTCATCTCG | RNF24 | False |
I have purposely left the zarr directory for last.
Zarr output
Note that you can choose to generate TIFF instead of Zarr images (see documentation for more information) and can also control which images are saved.
We follow the OME-ZARR format. According to Open Microscopy:
> OME-Zarr is an implementation of the OME-NGFF specification using the Zarr format. Arrays MUST be defined and stored in a hierarchical organization as defined by the version 2 of the Zarr specification. OME-NGFF metadata MUST be stored as attributes in the corresponding Zarr groups.
In short, it is a hierarchical way of storing images that is well suited
to cloud computing. Going back to our example above, our default
images.zarr contains:
./images.zarr/
├── .zgroup
├── A1-102
│   ├── .zattrs
│   ├── .zgroup
│   ├── 0
│   └── labels
├── A1-102-phenotype
│   ├── .zattrs
│   ├── .zgroup
│   └── 0
├── A1-103
│   ├── .zattrs
│   ├── .zgroup
│   ├── 0
│   └── labels
└── A1-103-phenotype
    ├── .zattrs
    ├── .zgroup
    └── 0
These groups contain the images and labels of each grouping. Notice
that there are also hidden metadata files called .zattrs and
.zgroup, which contain metadata about each group level and its attributes
(i.e. how the inner structure is organized). Let’s zoom in on only
one of the groupings:
/Users/hleaploj/Playground/testzarr/default/images.zarr
├── .zgroup
├── A1-102
│ ├── .zattrs
│ ├── .zgroup
│ ├── 0
│ │ ├── .zarray
│ │ ├── 0
│ │ │ ├── 0
│ │ │ │ └── 0
│ │ │ │ ├── 0
│ │ │ │ │ ├── 0
│ │ │ │ │ ├── 1
│ │ │ │ │ ├── 2
│ │ │ │ │ └── 3
│ │ │ │ ├── 1
│ │ │ │ │ ├── 0
│ │ │ │ │ ├── 1
│ │ │ │ │ ├── 2
│ │ │ │ │ └── 3
│ │ │ │ ├── 2
│ │ │ │ │ ├── 0
│ │ │ │ │ ├── 1
│ │ │ │ │ ├── 2
│ │ │ │ │ └── 3
│ │ │ │ └── 3
│ │ │ │ ├── 0
│ │ │ │ ├── 1
│ │ │ │ ├── 2
│ │ │ │ └── 3
│ │ │ ├── 1
│ │ │ │ └── 0
. . . . .
. . . . .
. . . . .
│ │ └── 2
│ │ └── 0
│ │ ├── 0
│ │ │ ├── 0
│ │ │ ├── 1
│ │ │ ├── 2
│ │ │ └── 3
│ │ ├── 1
. . . .
. . . .
. . . .
│ │ └── 3
│ │ ├── 0
│ │ ├── 1
│ │ ├── 2
│ │ └── 3
│ └── labels
│ ├── .zattrs
│ ├── .zgroup
│ ├── cell
│ │ ├── .zattrs
│ │ ├── .zgroup
│ │ └── 0
│ │ ├── .zarray
│ │ ├── 0
│ │ │ ├── 0
│ │ │ └── 1
│ │ ├── 1
. . . .
. . . .
. . . .
│ │ └── 3
│ │ ├── 0
│ │ └── 1
│ ├── cytosol
│ │ ├── .zattrs
│ │ ├── .zgroup
│ │ └── 0
│ │ ├── .zarray
│ │ ├── 0
│ │ │ ├── 0
│ │ │ └── 1
│ │ ├── 1
. . . .
. . . .
. . . .
│ │ └── 3
│ │ ├── 0
│ │ └── 1
│ ├── iss-spots
│ │ ├── .zattrs
│ │ ├── .zgroup
│ │ └── 0
│ │ ├── .zarray
│ │ ├── 0
│ │ │ ├── 0
│ │ │ ├── 1
│ │ │ ├── 2
│ │ │ └── 3
│ │ ├── 1
. . . .
. . . .
. . . .
│ │ └── 3
│ │ ├── 0
│ │ ├── 1
│ │ ├── 2
│ │ └── 3
│ └── nuclei
│ ├── .zattrs
│ ├── .zgroup
│ └── 0
│ ├── .zarray
│ ├── 0
│ │ ├── 0
│ │ └── 1
│ ├── 1
. . .
. . .
. . .
│ └── 3
│ ├── 0
│ └── 1
└── A1-102-phenotype
├── .zattrs
├── .zgroup
└── 0
├── .zarray
├── 0
│ ├── 0
│ │ ├── 0
│ │ └── 1
│ └── 1
│ ├── 0
│ └── 1
└── 1
├── 0
│ ├── 0
│ └── 1
└── 1
├── 0
└── 1
Here we see an extra hidden file, .zarray, which marks where the
actual image array starts (as opposed to groups). In the above case we can see
that the image of well A1 and tile 102 has a root called A1-102,
with three groups (0, 1 and 2). According to the official format:
> Each multiscale level is stored as a separate Zarr array, which is a folder containing chunk files which compose the array. The name of the array is arbitrary with the ordering defined by the “multiscales” metadata, but is often a sequence starting at 0.
The chunked data are stored therein:
> Chunks are stored with the nested directory layout. All but the last chunk element are stored as directories. The terminal chunk is a file. Together the directory and file names provide the “chunk coordinate” (t, c, z, y, x), where the maximum coordinate will be dimension_size / chunk_size.
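The nested chunk layout quoted above can be sketched in a few lines: each element index along (t, c, z, y, x) is divided by the chunk size for that dimension, and the resulting chunk coordinates form the directory path. The array and chunk shapes below are hypothetical values chosen for illustration. The real ones are recorded in the .zarray file.

```python
import math


def chunk_path(index, chunk_shape):
    """Map a (t, c, z, y, x) element index to its nested chunk path."""
    coords = [i // size for i, size in zip(index, chunk_shape)]
    return "/".join(str(c) for c in coords)


def chunk_grid(array_shape, chunk_shape):
    """Number of chunks along each dimension (size / chunk size, rounded up)."""
    return [math.ceil(n / size) for n, size in zip(array_shape, chunk_shape)]


# Hypothetical shapes for illustration; real values live in .zarray.
shape = (3, 4, 1, 4096, 4096)
chunks = (1, 1, 1, 1024, 1024)
print(chunk_grid(shape, chunks))                 # [3, 4, 1, 4, 4]
print(chunk_path((2, 0, 0, 3000, 100), chunks))  # 2/0/0/2/0
```

Chunk coordinates are 0-based, so with four chunks along y the directory names run from 0 to 3, as in the tree above.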
We then see the labels group. Here we store all the segmentation
information, including nuclei, cells and cytosol. All of these follow the
OME-ZARR schema:
> All labels will be listed in .zattrs. Each dimension of the label (t, c, z, y, x) should be either the same as the corresponding dimension of the image, or 1 if that dimension of the label is irrelevant.
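The dimension rule quoted above is easy to check programmatically. The following is a minimal sketch of such a check. The function name and the example shapes are hypothetical.

```python
def label_shape_is_valid(image_shape, label_shape):
    """Each label dimension must equal the image dimension or be 1."""
    if len(image_shape) != len(label_shape):
        return False
    return all(lab == img or lab == 1 for img, lab in zip(image_shape, label_shape))


image_shape = (3, 4, 1, 4096, 4096)  # hypothetical (t, c, z, y, x)
print(label_shape_is_valid(image_shape, (1, 1, 1, 4096, 4096)))  # True
print(label_shape_is_valid(image_shape, (1, 1, 1, 2048, 4096)))  # False
```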
Storing intermediate outputs
The CLI can store intermediate outputs through the --save option, which
lists the outputs to save (see the CLI documentation):
> --save Outputs to save. Choose from cell-labels,nuclei-labels,cytosol-labels,spot-labels,aligned,cell-mask,max,log,std,peaks,phenotype-aligned,bases,reads,cells,phenotype,combined,crosstalk
> Default: cell-labels,nuclei-labels,cytosol-labels,spot-labels,aligned,phenotype-aligned,bases,reads,cells,phenotype,combined
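To see which outputs are only written on request, the two lists quoted above can be compared directly. The sketch below also validates a comma-separated value before it is handed to --save. The comma-separated form is taken from the quoted help text, while the helper name parse_save is hypothetical.

```python
CHOICES = (
    "cell-labels,nuclei-labels,cytosol-labels,spot-labels,aligned,cell-mask,"
    "max,log,std,peaks,phenotype-aligned,bases,reads,cells,phenotype,"
    "combined,crosstalk"
).split(",")
DEFAULT = (
    "cell-labels,nuclei-labels,cytosol-labels,spot-labels,aligned,"
    "phenotype-aligned,bases,reads,cells,phenotype,combined"
).split(",")


def parse_save(value):
    """Split a comma-separated --save value and reject unknown names."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = sorted(set(names) - set(CHOICES))
    if unknown:
        raise ValueError(f"unknown outputs: {', '.join(unknown)}")
    return names


# Outputs that are available but not saved by default.
extras = [name for name in CHOICES if name not in DEFAULT]
print(extras)  # ['cell-mask', 'max', 'log', 'std', 'peaks', 'crosstalk']
print(parse_save("aligned, max,log"))  # ['aligned', 'max', 'log']
```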
If you use them all, you’ll find (just showing first 3 levels):
./images.zarr
├── .zgroup
├── A1-102
│ ├── .zattrs
│ ├── .zgroup
│ ├── 0
│ │ ├── .zarray
│ │ ├── 0
│ │ ├── 1
│ │ └── 2
│ └── labels
│ ├── .zattrs
│ ├── .zgroup
│ ├── cell
│ ├── cytosol
│ ├── iss-spots
│ └── nuclei
├── A1-102-cell-mask
│ ├── .zattrs
│ ├── .zgroup
│ └── 0
│ ├── .zarray
│ ├── 0
│ ├── 1
│ ├── 2
│ └── 3
├── A1-102-log
│ ├── .zattrs
│ ├── .zgroup
│ └── 0
│ ├── .zarray
│ ├── 0
│ ├── 1
│ └── 2
├── A1-102-max
│ ├── .zattrs
│ ├── .zgroup
│ └── 0
│ ├── .zarray
│ ├── 0
│ ├── 1
│ └── 2
├── A1-102-peaks
│ ├── .zattrs
│ ├── .zgroup
│ └── 0
│ ├── .zarray
│ ├── 0
│ ├── 1
│ ├── 2
│ └── 3
├── A1-102-phenotype
│ ├── .zattrs
│ ├── .zgroup
│ └── 0
│ ├── .zarray
│ ├── 0
│ └── 1
├── A1-102-std
│ ├── .zattrs
│ ├── .zgroup
│ └── 0
│ ├── .zarray
│ ├── 0
│ ├── 1
│ ├── 2
│ └── 3
├── A1-103
│ ├── .zattrs
│ ├── .zgroup
│ ├── 0
│ │ ├── .zarray
│ │ ├── 0
│ │ ├── 1
│ │ └── 2
│ └── labels
│ ├── .zattrs
│ ├── .zgroup
│ ├── cell
│ ├── cytosol
│ ├── iss-spots
│ └── nuclei
├── A1-103-cell-mask
│ ├── .zattrs
│ ├── .zgroup
│ └── 0
│ ├── .zarray
│ ├── 0
│ ├── 1
│ ├── 2
│ └── 3
├── A1-103-log
│ ├── .zattrs
│ ├── .zgroup
│ └── 0
│ ├── .zarray
│ ├── 0
│ ├── 1
│ └── 2
├── A1-103-max
│ ├── .zattrs
│ ├── .zgroup
│ └── 0
│ ├── .zarray
│ ├── 0
│ ├── 1
│ └── 2
├── A1-103-peaks
│ ├── .zattrs
│ ├── .zgroup
│ └── 0
│ ├── .zarray
│ ├── 0
│ ├── 1
│ ├── 2
│ └── 3
├── A1-103-phenotype
│ ├── .zattrs
│ ├── .zgroup
│ └── 0
│ ├── .zarray
│ ├── 0
│ └── 1
└── A1-103-std
├── .zattrs
├── .zgroup
└── 0
├── .zarray
├── 0
├── 1
├── 2
└── 3
Illumination correction (illum-corr)
When using
BaSiCPy for
illumination correction, the output will include a directory of models,
model, with one subdirectory per channel:
model/
├── c0
│   ├── profiles.npy
│   └── settings.json
├── c1
│   ├── profiles.npy
│   └── settings.json
└── c2
    ├── profiles.npy
    └── settings.json
The profiles.npy files contain the models stored in NumPy binary format,
while the JSON files contain the settings for the correction.
Additionally, one or two TIFF files are generated with the flatfield and, optionally, the darkfield.
If --plot-fit is used, a multipage PDF is generated
after the model is trained.
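A quick way to confirm that a model directory has the layout shown above is to check every channel subdirectory for the two expected files. This is a minimal sketch using only the standard library. The helper name check_model_dir is hypothetical, and the demo builds a throwaway directory rather than reading real output.

```python
import tempfile
from pathlib import Path

EXPECTED_FILES = ("profiles.npy", "settings.json")


def check_model_dir(model_dir):
    """Return {channel directory name: [missing files]} for each subdirectory."""
    report = {}
    for channel_dir in sorted(p for p in Path(model_dir).iterdir() if p.is_dir()):
        report[channel_dir.name] = [
            name for name in EXPECTED_FILES if not (channel_dir / name).is_file()
        ]
    return report


# Demo on a temporary directory where c1 deliberately lacks profiles.npy.
with tempfile.TemporaryDirectory() as tmp:
    for channel in ("c0", "c1"):
        (Path(tmp) / channel).mkdir()
        (Path(tmp) / channel / "settings.json").write_text("{}")
    (Path(tmp) / "c0" / "profiles.npy").write_bytes(b"")
    report = check_model_dir(tmp)
print(report)  # {'c0': [], 'c1': ['profiles.npy']}
```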
Dialout analysis (dialout)
Outputs: Reads per spacer_20mer per pool. Example:
| sequence | count_T2-A06 | count_fraction_T2-A06 | count_T2-A04 | count_fraction_T2-A04 | count_T2-A05 | count_fraction_T2-A05 | ID | gene_id | gene_symbol | dialout | mismatches | closest_match |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ATTCACAGTGCTGGTCCCAA | 1138.0 | 0.0017697628704371 | 592.0 | 0.0015467540373676 | 1035.0 | 0.001774706573273 | ENSG00000170142_61 | ENSG00000170142 | UBE2E1 | 5.0 | 0 | |
| GGAGTCCTCGGAGAGCAGGA | 1125.0 | 0.001749545895643 | 477.0 | 0.0012462866145682 | 849.0 | 0.0014557737977863 | ENSG00000263001_74 | ENSG00000263001 | GTF2I | 5.0 | 0 | |
| TATGCTTGTAAACACCTTGG | 1067.0 | 0.0016593470850232 | 607.0 | 0.0015859454403415 | 857.0 | 0.0014694913365169 | ENSG00000165699_523 | ENSG00000165699 | TSC1 | 5.0 | 0 | |
| AAACTCCCTCATCCGCCCGA | 496.0 | 0.0007713553459901 | 257.0 | 0.0006714793709518 | 354.0 | 0.0006070010888296 | | | | | 1 | AAACTCCCTCAGCCGCCCGA |
| GTTGCCCTCGAGGTCAATGT | 496.0 | 0.0007713553459901 | 378.0 | 0.0009876233549408 | 500.0 | 0.0008573461706633 | ENSG00000130725_52 | ENSG00000130725 | UBE2M | 5.0 | 0 | |
Summarized stats per pool. Example:
| index | n_mapped | fraction_mapped | average_read_count | skew_ratio | drop_out_ratio | n_drop_outs |
|---|---|---|---|---|---|---|
| T2-A06 | 643024 | 0.6599747925724300 | 392.9778761061950 | 3.115920398009950 | 0.0 | 0 |
| T2-A04 | 382737 | 0.66117268637843 | 235.98227474150700 | 3.187183811129850 | 0.0014749262536873200 | 1 |
| T2-A05 | 583195 | 0.6565736737818610 | 354.9631268436580 | 3.104017611447440 | 0.0 | 0 |
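The two example tables appear to be linked: for the first sequence, each count_fraction value matches the read count divided by n_mapped for the same pool. The check below reproduces that relationship from the example numbers. It is an observation about the sample data rather than a documented definition, so treat it as a sanity check only.

```python
# Values copied from the example tables above (first sequence row and n_mapped).
n_mapped = {"T2-A06": 643024, "T2-A04": 382737, "T2-A05": 583195}
first_row = {
    "T2-A06": (1138.0, 0.0017697628704371),
    "T2-A04": (592.0, 0.0015467540373676),
    "T2-A05": (1035.0, 0.001774706573273),
}


def count_fraction(count, total_mapped):
    """Fraction of a pool's mapped reads assigned to one sequence."""
    return count / total_mapped


matches = {}
for pool, (count, reported) in first_row.items():
    computed = count_fraction(count, n_mapped[pool])
    matches[pool] = abs(computed - reported) < 1e-9
    print(pool, computed, matches[pool])
```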