Command Line

SCALLOPS provides a powerful command-line interface (CLI) to automate various workflows and tasks related to Optical Pooled Screens (OPS). Each command supports multiple arguments and options, allowing users to customize their workflows.

scallops dialout

The scallops dialout command is designed for the analysis and reporting of pooled dialout library sequencing data. It allows users to run the dialout pipeline to analyze sequencing data, generate counts, align reads to a reference genome, and produce detailed reports. It provides two subcommands:

analysis:

The analysis subcommand is responsible for running the main dialout library sequencing data analysis pipeline. It processes sequencing reads from FASTQ files, performs alignment to a reference genome using BWA, computes Hamming distances between sequences, and generates output files such as mapped counts, dropout data, and unaligned sequences. This subcommand is essential for the initial data processing of a pooled dialout experiment.

report:

The report subcommand generates a comprehensive report based on the analysis pipeline’s output. It creates visual summaries, scatter plots, and statistics from the dialout analysis data, helping researchers interpret sequencing results. The report can include analysis of guide RNA sequences, mismatches, and dropout rates, providing valuable insights into the sequencing performance and accuracy.

usage: scallops dialout [-h] {analysis,report} ...

Sub-commands

analysis

Pooled dialout library sequencing data analysis

scallops dialout analysis [-h] --fastq FASTQ --fasta FASTA --design-csv
                          DESIGN_CSV [-o OUTPUT]
                          [--design-spacer-col DESIGN_SPACER_COL]
                          [--design-query DESIGN_QUERY]
                          [--save-unaligned-fasta]
required arguments
--fastq

Path to FASTQ directory

--fasta

FASTA file (template for mapping, including NNNs). May include multiple templates, but the positions of the NNNs must match across templates. Example: TCTTGTGGAAAGGACGAAACACCGNNNNNNNNNNNNNNNNNNNNGTTTTAGAGCTAGAAATAGCAAGTTAAAATA
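
To illustrate what the NNN positions in a template mean, the following minimal Python sketch turns a template into a regular expression and pulls the variable region out of a read. It is an illustration only: the dialout pipeline aligns reads with BWA, whereas this sketch requires an exact match of the flanking sequence, and the names template_to_regex and extract_spacer are hypothetical.

```python
import re
from typing import Optional


def template_to_regex(template: str) -> "re.Pattern":
    """Compile a template into a regex; each run of N becomes a capture group."""
    parts = []
    for run in re.finditer(r"N+|[^N]+", template.upper()):
        text = run.group()
        if text[0] == "N":
            # Variable region: capture exactly as many bases as there are Ns.
            parts.append("([ACGT]{%d})" % len(text))
        else:
            # Constant flank: must match exactly in this simplified sketch.
            parts.append(re.escape(text))
    return re.compile("".join(parts))


def extract_spacer(read: str, pattern: "re.Pattern") -> Optional[str]:
    """Return the first captured variable region, or None if the flanks are absent."""
    match = pattern.search(read.upper())
    return match.group(1) if match else None


# Shortened, made-up template and read for illustration.
template = "CACCG" + "N" * 20 + "GTTTT"
pattern = template_to_regex(template)
read = "AACACCG" + "ACGTACGTACGTACGTACGT" + "GTTTTAG"
spacer = extract_spacer(read, pattern)
print(spacer)
```

Because every template shares the same N positions, the same capture group index can be used regardless of which template a read matches.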

--design-csv

Design CSV file

optional arguments
-o, --output

Path to output directory

Default: 'dialout-analysis'

--design-spacer-col

Spacer column in design CSV

Default: 'spacer_20mer'

--design-query

Expression to filter rows of design CSV

--save-unaligned-fasta

Whether to save unaligned reads to FASTA file

Default: False

report

Pooled dialout library sequencing report

scallops dialout report [-h] --analysis-dir ANALYSIS_DIR -o OUTPUT
                        --design-csv DESIGN_CSV
                        [--design-spacer-col DESIGN_SPACER_COL]
                        [--design-query DESIGN_QUERY]
                        [--min-total-reads MIN_TOTAL_READS]
                        [--min-sample-reads MIN_SAMPLE_READS]
                        [--sample-names SAMPLE_NAMES]
required arguments
--analysis-dir

Path to analysis output directory

-o, --output

Path to output PDF file or a directory where individual PNG images will be created

--design-csv

Design CSV file

optional arguments
--design-spacer-col

Spacer column in design CSV

Default: 'spacer_20mer'

--design-query

Expression to filter rows of design CSV

--min-total-reads

Minimum total reads across all samples to include guide in pairwise sample plot

Default: 10

--min-sample-reads

Minimum reads to include guide in individual sample guide rank plot

Default: 2

--sample-names

Path to CSV file containing the columns index and sample

scallops extract-crops

The scallops extract-crops command extracts image crops from labeled images.

usage: scallops extract-crops [-h] -i IMAGES [IMAGES ...] -o OUTPUT --labels
                              LABELS [--image-pattern IMAGE_PATTERN]
                              [--merge MERGE] [--label-name LABEL_NAME]
                              [--crop-size CROP_SIZE]
                              [--percentile-min PERCENTILE_MIN]
                              [--percentile-max PERCENTILE_MAX]
                              [--local-percentile-normalize]
                              [--local-percentile-overlap LOCAL_PERCENTILE_OVERLAP]
                              [--label-filter LABEL_FILTER] [--chunks CHUNKS]
                              [--output-format {tiff,npy}]
                              [--gaussian-sigma GAUSSIAN_SIGMA]
                              [-g [GROUPBY ...]] [-s [SUBSET ...]] [--force]
                              [--client CLIENT] [--dask-cluster DASK_CLUSTER]
                              [--verbose] [--no-version]

required arguments

-i, --images

Paths to input images or CSV/Parquet with image column containing full image path and additional columns containing metadata such as plate, well, t, c, or z. Note that image pattern is ignored when CSV/Parquet is used.

-o, --output

Path to the output directory where the results will be saved.

--labels

Path to zarr directory containing labels

optional arguments

--merge

Path to directory containing output from merge

--image-pattern

Pattern to extract metadata from file names.

--label-name

Name of labels to use. For example nuclei or cell

Default: 'cell'

--crop-size

Image crop size

Default: 224

--percentile-min

Percentile min for normalization

Default: 0.1

--percentile-max

Percentile max for normalization

Default: 99.9
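
As a rough illustration of what percentile normalization with --percentile-min and --percentile-max does, the sketch below rescales a flattened crop so that the lower percentile maps to 0 and the upper percentile maps to 1, clipping values outside that range. The output range of [0, 1], the linear interpolation of percentiles, and the function names are assumptions made for this example, not a description of the SCALLOPS implementation.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in 0..100) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("cannot compute a percentile of an empty list")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def percentile_normalize(pixels, p_min=0.1, p_max=99.9):
    """Map the p_min percentile to 0 and the p_max percentile to 1, with clipping."""
    ordered = sorted(pixels)
    low = percentile(ordered, p_min)
    high = percentile(ordered, p_max)
    if high == low:
        # Constant image: nothing to stretch.
        return [0.0 for _ in pixels]
    return [min(1.0, max(0.0, (v - low) / (high - low))) for v in pixels]


crop = list(range(1001))  # made-up flattened pixel intensities 0..1000
normalized = percentile_normalize(crop)
print(min(normalized), max(normalized))
```

Using percentiles instead of the raw minimum and maximum keeps a few very dark or very bright pixels from dominating the scaling.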

--local-percentile-normalize

Perform percentile normalization locally

Default: False

--local-percentile-overlap

Overlap for local normalization

--label-filter

Expression to filter labels (e.g. barcode_Q_mean_0/barcode_Q_mean > 0.5) or path to Parquet file containing labels to include.

--chunks

Chunk size for local normalization

--output-format

Possible choices: tiff, npy

Output image format

Default: 'tiff'

--gaussian-sigma

Apply gaussian-smoothed mask to isolate target mask

-g, --groupby

Keys to group images.

-s, --subset

Subset of images to include.

--force

Overwrite existing output

Default: False

--client

URL of the Dask scheduler. Use ‘none’ to disable distributed execution.

--dask-cluster

JSON URL or inline JSON containing dask cluster parameters.

--verbose

Run in verbose mode. Useful for debugging.

Default: False

--no-version

Do not store command line arguments and scallops version in output metadata.

Default: False

scallops find-objects

The scallops find-objects command finds objects in label images output from segmentation.

Find objects in a labeled array and output Parquet file with label as index.

usage: scallops find-objects [-h] --labels LABELS -o OUTPUT --label-pattern
                             LABEL_PATTERN [--label-suffix [LABEL_SUFFIX ...]]
                             [-s [SUBSET ...]] [--force] [--client CLIENT]
                             [--dask-cluster DASK_CLUSTER] [--verbose]
                             [--no-version]

required arguments

--labels

Path to zarr directory containing labels

-o, --output

Path to the output directory where the results will be saved.

--label-pattern

Format string to extract metadata from labels (e.g. {well})
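
The idea of a format string such as {well} can be sketched in a few lines of Python: each {name} placeholder becomes a named group that captures the matching part of a label name. This is a simplified illustration with hypothetical function names; the exact matching rules used by SCALLOPS may differ.

```python
import re
from typing import Dict, Optional


def pattern_to_regex(pattern: str) -> "re.Pattern":
    """Turn a format string such as '{plate}-{well}' into a regex with named groups."""
    regex = ""
    pos = 0
    for field in re.finditer(r"\{(\w+)\}", pattern):
        # Literal text between placeholders must match exactly.
        regex += re.escape(pattern[pos:field.start()])
        # Each placeholder captures as little text as possible.
        regex += "(?P<%s>.+?)" % field.group(1)
        pos = field.end()
    regex += re.escape(pattern[pos:])
    return re.compile(regex)


def extract_metadata(name: str, pattern: str) -> Optional[Dict[str, str]]:
    """Return the captured fields, or None when the name does not fit the pattern."""
    match = pattern_to_regex(pattern).fullmatch(name)
    return match.groupdict() if match else None


print(extract_metadata("A1", "{well}"))
print(extract_metadata("P1-A01", "{plate}-{well}"))
```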

optional arguments

--label-suffix

Label suffixes to include (e.g. nuclei, cell, cytosol)

Default: ['cell', 'cytosol', 'nuclei']

-s, --subset

Subset of labels to include.

--force

Overwrite existing output

Default: False

--client

URL of the Dask scheduler. Use ‘none’ to disable distributed execution.

--dask-cluster

JSON URL or inline JSON containing dask cluster parameters.

--verbose

Run in verbose mode. Useful for debugging.

Default: False

--no-version

Do not store command line arguments and scallops version in output metadata.

Default: False

scallops features

The scallops features command is used to compute various features from labeled images, producing output in Parquet format. Each feature is indexed by the label in the corresponding image. This command allows users to extract multiple types of features, including geometric, texture, and intensity-based features, for each region of interest, such as nuclei, cells, or cytosol.

Key Features:

  • Multi-region Feature Computation: Compute features for different regions in the images, such as nuclei, cells, and cytosol. The feature extraction process is customizable, allowing users to define which features to compute for each region.

  • Customizable Feature Sets: Users can specify which features to compute, with available shortcuts to quickly select groups of features (See Shortcuts).

  • Stacked Image Support: SCALLOPS supports processing both primary and stacked images, allowing users to compute features across multiple channels by stacking different image types together.

  • Scalable via Dask: The computation process leverages Dask for distributed and parallelized processing, enabling SCALLOPS to handle large image datasets efficiently.

Compute features and output Parquet files with label as index.

usage: scallops features [-h] -i IMAGES [IMAGES ...] -o OUTPUT --labels LABELS
                         [--image-pattern IMAGE_PATTERN]
                         [--features-nuclei FEATURES_NUCLEI [FEATURES_NUCLEI ...]]
                         [--features-cell FEATURES_CELL [FEATURES_CELL ...]]
                         [--features-cytosol FEATURES_CYTOSOL [FEATURES_CYTOSOL ...]]
                         [--objects OBJECTS]
                         [--stack-images [STACK_IMAGES ...]]
                         [--label-filter LABEL_FILTER]
                         [--stack-image-pattern STACK_IMAGE_PATTERN]
                         [--nuclei-min-area NUCLEI_MIN_AREA]
                         [--nuclei-max-area NUCLEI_MAX_AREA]
                         [--cell-min-area CELL_MIN_AREA]
                         [--cell-max-area CELL_MAX_AREA]
                         [--cytosol-min-area CYTOSOL_MIN_AREA]
                         [--cytosol-max-area CYTOSOL_MAX_AREA]
                         [--channel-rename CHANNEL_RENAME]
                         [--features-plot [FEATURES_PLOT ...]]
                         [-g [GROUPBY ...]] [-s [SUBSET ...]] [--force]
                         [--client CLIENT] [--dask-cluster DASK_CLUSTER]
                         [--verbose] [--no-version]

required arguments

-i, --images

Paths to input images or CSV/Parquet with image column containing full image path and additional columns containing metadata such as plate, well, t, c, or z. Note that image pattern is ignored when CSV/Parquet is used.

-o, --output

Path to the output directory where the results will be saved.

--labels

Path to zarr directory containing labels

optional arguments

--image-pattern

Pattern to extract metadata from file names.

--features-nuclei

A space-separated list of features to extract (e.g., ‘area intensity_0 corr_0_1’). Channels are 0-indexed. Use shortcuts for efficiency:

  • For specific channels: ‘intensity_0,1,2’

  • For all channels (wildcard): ‘intensity_*’

  • For all channel pairs: ‘colocalization_*_*’

--features-cell

A space-separated list of features to extract (e.g., ‘area intensity_0 corr_0_1’). Channels are 0-indexed. Use shortcuts for efficiency:

  • For specific channels: ‘intensity_0,1,2’

  • For all channels (wildcard): ‘intensity_*’

  • For all channel pairs: ‘colocalization_*_*’

--features-cytosol

A space-separated list of features to extract (e.g., ‘area intensity_0 corr_0_1’). Channels are 0-indexed. Use shortcuts for efficiency:

  • For specific channels: ‘intensity_0,1,2’

  • For all channels (wildcard): ‘intensity_*’

  • For all channel pairs: ‘colocalization_*_*’

--objects

Path to directory containing output from find-objects

--stack-images

Path to additional images to stack with images. Add s prefix to refer to stack image channel index (e.g. corr_0_s0).

--label-filter

Path to Parquet containing labels to include.

--stack-image-pattern

Format string to extract metadata from the image file name.

--nuclei-min-area

Remove nuclei with area < nuclei-min-area

Default: 2

--nuclei-max-area

Remove nuclei with area > nuclei-max-area

--cell-min-area

Remove cells with area < cell-min-area

Default: 2

--cell-max-area

Remove cells with area > cell-max-area

--cytosol-min-area

Remove cytosolic labels with area < cytosol-min-area

Default: 2

--cytosol-max-area

Remove cytosolic labels with area > cytosol-max-area

--channel-rename

Inline JSON mapping channel index (0-based) to channel name for feature readability. Example: '{"0":"A", "2":"B"}'

--features-plot

Optional feature names to plot

-g, --groupby

Keys to group images.

-s, --subset

Subset of images to include.

--force

Overwrite existing output

Default: False

--client

URL of the Dask scheduler. Use ‘none’ to disable distributed execution.

--dask-cluster

JSON URL or inline JSON containing dask cluster parameters.

--verbose

Run in verbose mode. Useful for debugging.

Default: False

--no-version

Do not store command line arguments and scallops version in output metadata.

Default: False

Available Features

CellProfiler Features

  • intensity

    Measures several intensity features for identified objects. Parameters:

    1. c: Channel index.

  • granularity

    Outputs spectra of size measurements of the textures in the image. Parameters:

    1. c: Channel index.

  • intensity-distribution

    Measures radial intensity features for identified objects. Parameters:

    1. c: Channel index.

    2. bins: Number of bins to measure the distribution (default 4)

  • intensity-distribution-zernike

    Measures zernike intensity features for identified objects. Parameters:

    1. c: Channel index.

    2. moment: Maximum zernike moment (default 9)

  • haralick

    Measures the degree and nature of textures within objects to quantify their roughness and smoothness. Parameters:

    1. c: Channel index.

    2. scale: Number of pixels included in gray-level co-occurrence matrix (default 3).

  • sizeshape

    Measures several area and shape features of identified objects.

  • neighbors

    Calculates how many neighbors each object has and records various properties about the neighbors’ relationships, including the percentage of an object’s edge pixels that touch a neighbor.

  • colocalization

    Measures the colocalization and correlation between intensities in different channels on a pixel-by-pixel basis within identified objects. Parameters:

    1. c1: First channel index.

    2. c2: Second channel index.

Other Features

  • pftas

    Parameter-free threshold adjacency statistics. Outputs 54 features. Reference: Fast automated cell phenotype image classification. Parameters:

    1. c: Channel index.

  • correlation-pearson-box

    Pearson correlation coefficient between two channels in the label bounding box. Typically used to measure nuclei alignment quality of ISS and phenotype images. Parameters:

    1. c1: First channel index.

    2. c2: Second channel index.

  • intersects-boundary

    Determines whether a label intersects a stitch boundary. Parameters:

    1. c: Channel index.

  • spots

    Counts the number of spots in a FISH image. Parameters:

    1. c: Channel index.

    2. min peak_distance: Minimum number of pixels separating peaks (default 3).

    3. radius: Radius of the disk footprint used for non-maximum suppression in peak_local_max (default 3).

Shortcuts

Use * for all channels. Example: intensity_*, colocalization_*_*.

Include a comma separated list of channel indices (0-based) to include. Example: intensity_0,1,2,6.

Specify a range of channel indices using start:stop:step. Example: colocalization_0_1:10:2.

Notes

Feature names are case insensitive (intensity == Intensity) and hyphens in feature names are ignored (intensitydistribution == intensity-distribution)
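
The shortcut rules and notes above can be made concrete with a small Python sketch that expands a feature specification into explicit channel combinations. It is a hedged illustration, not the SCALLOPS parser: the function names are hypothetical, ranges are clipped to the available channels, and pairs that repeat the same channel (such as 0 with 0) are skipped by assumption.

```python
from itertools import product


def expand_slot(slot, n_channels):
    """Expand one channel slot: '*', '0,1,2', 'start:stop:step', or a single index."""
    if slot == "*":
        return list(range(n_channels))
    channels = []
    for item in slot.split(","):
        if ":" in item:
            bounds = [int(x) if x else None for x in item.split(":")]
            # Assumption: a range is clipped to the channels that exist.
            channels.extend(range(n_channels)[slice(*bounds)])
        else:
            channels.append(int(item))
    return channels


def expand_feature(spec, n_channels):
    """Expand a spec into (feature_name, channel_tuple) pairs."""
    # Names are case insensitive and hyphens are ignored.
    name, *slots = spec.lower().replace("-", "").split("_")
    expanded = [expand_slot(slot, n_channels) for slot in slots]
    result = []
    for combo in product(*expanded):
        if len(set(combo)) == len(combo):  # assumption: skip pairs such as (0, 0)
            result.append((name, combo))
    return result


print(expand_feature("intensity_0,1,2", 4))
print(expand_feature("colocalization_0_1:10:2", 4))
print(expand_feature("Intensity-Distribution_*", 2))
```

A specification without channel slots, such as sizeshape, expands to a single entry with an empty channel tuple.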

scallops illum-corr

The scallops illum-corr command performs illumination correction on images, a crucial preprocessing step in biomedical image analysis. Uneven illumination can introduce artifacts that affect the accuracy of downstream analysis tasks like segmentation and feature extraction. SCALLOPS provides an aggregation method for illumination correction with three aggregators: mean, median, and min.

Key Features:

  • This method computes illumination correction by aggregating images using mean, median, or min, followed by an optional median filter and rescaling. It offers a simple and effective approach for addressing illumination variations. The output can be saved as Zarr or TIFF images.

  • This method is designed to improve image uniformity, thereby enhancing the reliability of image analysis workflows, particularly in the context of high-throughput biomedical imaging data.

Calculate illumination correction by aggregating images by mean, median or min, followed by median filter and rescaling. Outputs flat-field TIFF or Zarr image.
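
The aggregation step can be sketched with plain Python lists standing in for images. The sketch aggregates a stack pixel by pixel and then rescales by a robust minimum taken at the 2nd percentile. It omits the median filter, and the rescaling convention (divide by the robust minimum and clip values below 1) is an assumption made for illustration rather than a statement of what SCALLOPS does internally.

```python
from statistics import mean, median

AGGREGATORS = {"mean": mean, "median": median, "min": min}


def aggregate_images(images, method="mean"):
    """Aggregate same-sized 2D images pixel by pixel."""
    agg = AGGREGATORS[method]
    height, width = len(images[0]), len(images[0][0])
    return [
        [agg([img[y][x] for img in images]) for x in range(width)]
        for y in range(height)
    ]


def rescale(flatfield, robust_percentile=2.0):
    """Divide by a robust minimum so the darkest region maps to 1 (assumed convention)."""
    values = sorted(v for row in flatfield for v in row)
    robust_min = values[int((len(values) - 1) * robust_percentile / 100.0)]
    if robust_min <= 0:
        raise ValueError("robust minimum must be positive to rescale")
    return [[max(v / robust_min, 1.0) for v in row] for row in flatfield]


# Three made-up 2x2 images from the same channel.
images = [
    [[10, 20], [30, 40]],
    [[12, 22], [28, 44]],
    [[14, 18], [32, 36]],
]
flat = rescale(aggregate_images(images, "mean"))
print(flat)
```

Dividing a raw image by such a flat-field brightens regions that were systematically dim across the stack.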

usage: scallops illum-corr agg [-h] -i IMAGES [IMAGES ...] -o OUTPUT
                               [-g [GROUPBY ...]] [-s [SUBSET ...]]
                               [--image-pattern IMAGE_PATTERN]
                               [--smooth SMOOTH]
                               [--agg-method {mean,median,min}] [--no-rescale]
                               [--output-image-format {tiff,zarr}]
                               [--z-index Z_INDEX] [--channel CHANNEL]
                               [--force] [--verbose]
                               [--expected-images EXPECTED_IMAGES]
                               [--no-version] [--client CLIENT]
                               [--dask-cluster DASK_CLUSTER]

required arguments

-i, --images

Paths to input images or CSV/Parquet with image column containing full image path and additional columns containing metadata such as plate, well, t, c, or z. Note that image pattern is ignored when CSV/Parquet is used.

-o, --output

Path to output Zarr image or TIFF directory

optional arguments

-g, --groupby

Keys to group images.

-s, --subset

Subset of images to include.

--image-pattern

Pattern to extract metadata from file names.

--smooth

The radius of the disk-shaped footprint for median filter. Default is sqrt((image_width * image_height) / (PI * 20)).
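
Reading the default as sqrt((image_width * image_height) / (PI * 20)), the radius is that of a disk whose area is one twentieth of the image area. The short Python sketch below evaluates the formula for a hypothetical 2048 x 2048 image; the function name is made up for this example.

```python
import math


def default_smooth_radius(image_width: int, image_height: int) -> float:
    """Radius r such that PI * r**2 equals 1/20 of the image area."""
    return math.sqrt((image_width * image_height) / (math.pi * 20))


radius = default_smooth_radius(2048, 2048)
print(round(radius, 1))
```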

--agg-method

Possible choices: mean, median, min

Method to aggregate images

Default: 'mean'

--no-rescale

Do not use 2nd percentile for robust minimum

Default: False

--output-image-format

Possible choices: tiff, zarr

Output image format

Default: 'tiff'

--z-index

Either max, focus, z-index (0-based), or a path to a Parquet file containing columns key and z_index. Focus selects the best z-index using the slope of the image log-log power spectrum.

Default: 'max'

--channel

Channel index (0-based) to select best focus z index

Default: 0

--force

Overwrite existing output

Default: False

--verbose

Run in verbose mode. Useful for debugging.

Default: False

--expected-images

Validate that the specified number of images are provided.

--no-version

Do not store command line arguments and scallops version in output metadata.

Default: False

--client

URL of the Dask scheduler. Use ‘none’ to disable distributed execution.

--dask-cluster

JSON URL or inline JSON containing dask cluster parameters.

scallops pooled-sbs

The scallops pooled-sbs command is designed for processing images from pooled in-situ sequencing-by-synthesis (SBS) experiments. This pipeline includes spot detection, read calling, and merging of in-situ barcode data with phenotype data.

Key Features:

  • Spot Detection: The spot detection subcommand identifies candidate peaks in the image data, which correspond to sequencing spots. The results can be saved in multiple formats, including the raw data, filtered images, and detected peaks.

  • Reads Processing: The reads subcommand processes the detected spots to assign sequencing reads to specific labels, such as nuclei or cells. It also includes options for crosstalk correction between channels and outputs corrected and uncorrected base intensities.

  • Merging Data: The merge subcommand joins in-situ barcodes with phenotype data, allowing for a combined view of sequencing and phenotype information.

SBS image processing pipeline.

usage: scallops pooled-sbs [-h] {spot-detect,reads,merge} ...

Sub-commands

spot-detect

Run pooled in-situ sequencing spot detection. Outputs a table of all candidate peaks and the max image (the input images with LoG and maximum filters applied). Optionally also outputs the std image (the standard deviation over cycles followed by the mean across channels, used to identify spot locations) and the log image (the LoG-filtered image).

scallops pooled-sbs spot-detect [-h] -i IMAGES [IMAGES ...] -o OUTPUT -c
                                CHANNELS [CHANNELS ...]
                                [--image-pattern IMAGE_PATTERN]
                                [--max-filter-width MAX_FILTER_WIDTH]
                                [--sigma-log SIGMA_LOG [SIGMA_LOG ...]]
                                [--peak-neighborhood-size PEAK_NEIGHBORHOOD_SIZE]
                                [--cycles CYCLES [CYCLES ...]]
                                [--chunks CHUNKS] [--z-index Z_INDEX]
                                [-g [GROUPBY ...]] [-s [SUBSET ...]] [--force]
                                [--client CLIENT]
                                [--dask-cluster DASK_CLUSTER]
                                [--save [{log,std} ...]]
                                [--expected-cycles EXPECTED_CYCLES]
                                [--verbose] [--no-version]
required arguments
-i, --images

Paths to input images or CSV/Parquet with image column containing full image path and additional columns containing metadata such as plate, well, t, c, or z. Note that image pattern is ignored when CSV/Parquet is used.

-o, --output

Path to output Zarr containing peaks, max, and optionally std and log

-c, --channel

Channel indices (0-based) to use for spot detection

optional arguments
--image-pattern

Pattern to extract metadata from file names.

--max-filter-width

Neighborhood size for max filtering on Laplacian-of-Gaussian filtered SBS data, dilating sequencing channels to compensate for single-pixel alignment error

Default: 3

--sigma-log

Size of gaussian kernel used in Laplacian-of-Gaussian filter

Default: 1

--peak-neighborhood-size

Neighborhood size for peak detection

Default: 5
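
To show what a peak neighborhood size means, the sketch below marks a pixel as a candidate peak when it equals the maximum of the square window centered on it. This is a simplified stand-in written with plain lists; the function name and the rule that peaks must be positive are assumptions, and SCALLOPS may handle borders and plateaus differently.

```python
def find_peaks(image, neighborhood_size=5):
    """Return (y, x, value) for positive pixels equal to their neighborhood maximum."""
    half = neighborhood_size // 2
    height, width = len(image), len(image[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            value = image[y][x]
            # Window is truncated at the image border.
            window = [
                image[yy][xx]
                for yy in range(max(0, y - half), min(height, y + half + 1))
                for xx in range(max(0, x - half), min(width, x + half + 1))
            ]
            if value > 0 and value == max(window):
                peaks.append((y, x, value))
    return peaks


# Made-up 7x7 image with three bright pixels.
image = [[0] * 7 for _ in range(7)]
image[1][1] = 5  # suppressed: a brighter pixel lies within its neighborhood
image[2][2] = 9
image[5][5] = 4
peaks = find_peaks(image, neighborhood_size=5)
print(peaks)
```

A larger neighborhood suppresses more nearby candidates, so closely spaced spots are more likely to collapse into one peak.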

--cycles

Optional subset of cycle indices (0-based) to include.

--chunks

Chunk size to use to perform parallel spot detection. If not specified, image chunk size is used

--z-index

Either max or a z-index (0-based)

Default: 'max'

-g, --groupby

Keys to group images.

-s, --subset

Subset of images to include.

--force

Overwrite existing output

Default: False

--client

URL of the Dask scheduler. Use ‘none’ to disable distributed execution.

--dask-cluster

JSON URL or inline JSON containing dask cluster parameters.

--save

Possible choices: log, std

Additional outputs to save

--expected-cycles

Validate that the specified number of cycles are provided.

--verbose

Run in verbose mode. Useful for debugging.

Default: False

--no-version

Do not store command line arguments and scallops version in output metadata.

Default: False

reads

Run pooled in-situ sequencing read calling. Outputs reads, barcodes to labels assignments, crosstalk matrix, and table with corrected and uncorrected base intensities.

scallops pooled-sbs reads [-h] --spots SPOTS --labels LABELS --label-name
                          LABEL_NAME --barcodes BARCODES -o OUTPUT
                          [--read-quality-filter READ_QUALITY_FILTER]
                          [--min-area MIN_AREA] [--max-area MAX_AREA]
                          [--mismatches N_MISMATCHES]
                          [--expand-labels-distance EXPAND_LABELS_DISTANCE]
                          [--threshold-peaks THRESHOLD_PEAKS]
                          [--threshold-peaks-crosstalk THRESHOLD_PEAKS_CROSSTALK]
                          [--crosstalk-correction-method {li_and_speed,median,none}]
                          [--crosstalk-correction-by-t]
                          [--crosstalk-nreads CROSSTALK_NREADS] [--all-labels]
                          [-s [SUBSET ...]] [--bases BASES]
                          [--barcode-col BARCODE_COL] [--save-bases] [--force]
                          [--verbose] [--no-version] [--client CLIENT]
                          [--dask-cluster DASK_CLUSTER]
required arguments
--spots

Zarr output from scallops pooled-sbs spot-detect

--labels

Zarr output from scallops segment containing labels

--label-name

Name of labels to use. For example nuclei or cell

--barcodes

Path to the barcode CSV file containing a column named ‘barcode’.

-o, --output

Path to the output directory where the results will be saved.

optional arguments
--read-quality-filter

Filter reads before assigning reads to labels

--min-area

Filter labels with area < min-area

--max-area

Filter labels with area > max-area

--mismatches

Correct reads <= mismatches from closest match in barcodes
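
Mismatch correction can be pictured as a nearest-neighbor lookup under Hamming distance. The sketch below returns the closest barcode when it is within the allowed number of mismatches. Treating ties between equally close barcodes as uncorrectable is an assumption of this example, and the function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def correct_read(read, barcodes, max_mismatches=1):
    """Return the unique closest barcode within max_mismatches, else None."""
    distances = sorted((hamming(read, bc), bc) for bc in barcodes)
    if not distances:
        return None
    best_distance, best_barcode = distances[0]
    if best_distance > max_mismatches:
        return None
    if len(distances) > 1 and distances[1][0] == best_distance:
        return None  # assumption: ambiguous reads are left uncorrected
    return best_barcode


barcodes = ["ACGT", "TTTT", "GGCC"]  # made-up barcode library
print(correct_read("ACGA", barcodes, max_mismatches=1))
print(correct_read("AGCT", barcodes, max_mismatches=1))
```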

--expand-labels-distance

Expand labels by expand-labels-distance when matching reads to labels.

--threshold-peaks

Filter reads before assigning reads to labels. Use auto to automatically determine threshold.

Default: 'auto'

--threshold-peaks-crosstalk

Threshold for peaks for identifying sequencing reads used in crosstalk correction. Use auto to automatically determine threshold.

Default: 'auto'

--crosstalk-correction-method

Possible choices: li_and_speed, median, none

Method to correct channel crosstalk

Default: 'median'

--crosstalk-correction-by-t

Correct crosstalk separately for each cycle

Default: False

--crosstalk-nreads

Number of reads to sample to compute crosstalk correction. Use -1 to include all reads.

Default: 500000

--all-labels

Call reads both in and outside labels.

Default: False

-s, --subset

Subset of images to include.

--bases

ISS bases

Default: 'GTAC'
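
A simplified view of base calling helps explain the --bases string: for each cycle, the channel with the highest intensity is taken as the called base. The sketch below assumes that the order of letters in the bases string matches the order of the sequencing channels, which is an assumption of this example, and it ignores crosstalk correction and quality scores. The function name is hypothetical.

```python
def call_read(intensities, bases="GTAC"):
    """Call one base per cycle as the channel with the highest intensity."""
    read = []
    for cycle in intensities:
        if len(cycle) != len(bases):
            raise ValueError("each cycle needs one intensity per base")
        best_channel = max(range(len(cycle)), key=cycle.__getitem__)
        read.append(bases[best_channel])
    return "".join(read)


# Made-up intensities for one spot: rows are cycles, columns are channels.
spot = [
    [900, 120, 80, 60],
    [100, 110, 950, 90],
    [70, 880, 60, 50],
    [40, 30, 20, 700],
]
print(call_read(spot))
```

Passing a different bases string changes only the letter assigned to each channel, not which channel wins in each cycle.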

--barcode-col

Barcode column in barcodes CSV

Default: 'barcode'

--save-bases

Save individual base intensities

Default: False

--force

Overwrite existing output

Default: False

--verbose

Run in verbose mode. Useful for debugging.

Default: False

--no-version

Do not store command line arguments and scallops version in output metadata.

Default: False

--client

URL of the Dask scheduler. Use ‘none’ to disable distributed execution.

--dask-cluster

JSON URL or inline JSON containing dask cluster parameters.

merge

Join in-situ barcodes with phenotype data and output as Parquet.

scallops pooled-sbs merge [-h] --sbs SBS --phenotype PHENOTYPE [PHENOTYPE ...]
                          [--join-sbs {inner,outer}]
                          [--join-phenotype {inner,outer}]
                          [--phenotype-suffix [PHENOTYPE_SUFFIX ...]]
                          [--format {parquet,zarr}] --barcodes BARCODES -o
                          OUTPUT [-s [SUBSET ...]] [--barcode-col BARCODE_COL]
                          [--force] [--client CLIENT]
                          [--dask-cluster DASK_CLUSTER] [--verbose]
                          [--no-version]
required arguments
--sbs

Directory containing SBS parquet files.

--phenotype

Directories with phenotype parquet files.

optional arguments
--join-sbs

Possible choices: inner, outer

SBS join type.

Default: 'outer'

--join-phenotype

Possible choices: inner, outer

Phenotype join type.

Default: 'outer'
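
The difference between the inner and outer join types can be illustrated with two small tables keyed by label. This sketch uses plain dictionaries and a hypothetical join_tables function; it does not show how the two join options interact in the merge subcommand, and a real table would hold nulls where this sketch simply omits missing columns.

```python
def join_tables(left, right, how="outer"):
    """Join two {label: row_dict} tables on their label keys."""
    if how == "inner":
        labels = sorted(left.keys() & right.keys())  # labels present in both
    elif how == "outer":
        labels = sorted(left.keys() | right.keys())  # labels present in either
    else:
        raise ValueError("how must be 'inner' or 'outer'")
    return {
        label: {**left.get(label, {}), **right.get(label, {})}
        for label in labels
    }


# Made-up rows: label 1 has only a barcode, label 3 has only a phenotype.
sbs = {1: {"barcode": "ACGT"}, 2: {"barcode": "TTTT"}}
phenotype = {2: {"area": 150}, 3: {"area": 90}}
print(join_tables(sbs, phenotype, how="inner"))
print(join_tables(sbs, phenotype, how="outer"))
```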

--phenotype-suffix

Suffix for phenotype columns.

--format

Possible choices: parquet, zarr

Output file format.

Default: 'parquet'

--barcodes

Path to the barcode CSV file containing a column named ‘barcode’.

-o, --output

Path to the output directory where the results will be saved.

-s, --subset

Subset of images to include.

--barcode-col

Barcode column in barcodes CSV

Default: 'barcode'

--force

Overwrite existing output

Default: False

--client

URL of the Dask scheduler. Use ‘none’ to disable distributed execution.

Default: 'none'

--dask-cluster

JSON URL or inline JSON containing dask cluster parameters.

--verbose

Run in verbose mode. Useful for debugging.

Default: False

--no-version

Do not store command line arguments and scallops version in output metadata.

Default: False

scallops norm-features

The scallops norm-features command is used to normalize features.

scallops rank-features

The scallops rank-features command is used to compute significance from the output of scallops norm-features.

usage: scallops rank-features [-h] -i INPUT [INPUT ...] [--output OUTPUT]
                              [--features [FEATURES ...]]
                              [--rank-method {welch_t,student_t,mannwhitney}]
                              [--label-filter LABEL_FILTER]
                              [--iqr-multiplier IQR_MULTIPLIER]
                              [--perturbation PERTURBATION] --reference
                              REFERENCE [--by [BY ...]]
                              [--min-labels MIN_LABELS] [--metadata METADATA]
                              [--join [JOIN ...]] [--client CLIENT]
                              [--dask-cluster DASK_CLUSTER] [--force]
                              [--no-version]

required arguments

-i, --input

Path to normalized file(s)

--output

Path to Parquet file containing ranked features.

optional arguments

--features

Features to include. If not specified, all features are used.

--rank-method

Possible choices: welch_t, student_t, mannwhitney

Method to rank features

Default: 'welch_t'
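
For readers unfamiliar with the default welch_t method, the statistic compares the mean of a perturbation against the mean of the reference while allowing unequal variances. The sketch below computes only the t statistic from two made-up samples; it is a textbook formula written with the standard library, not the SCALLOPS ranking code, and it does not compute p-values.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample, reference):
    """Welch's t statistic for two samples with possibly unequal variances."""
    n1, n2 = len(sample), len(reference)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two values")
    v1, v2 = variance(sample), variance(reference)  # unbiased sample variances
    return (mean(sample) - mean(reference)) / sqrt(v1 / n1 + v2 / n2)


perturbed = [2.0, 4.0, 6.0]  # made-up feature values for one perturbation
control = [1.0, 2.0, 3.0]    # made-up feature values for the reference
t_stat = welch_t(perturbed, control)
print(round(t_stat, 3))
```

A positive value means the feature is higher in the perturbation than in the reference. The statistic is undefined when both groups have zero variance.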

--label-filter

Expression to filter labels (e.g. barcode_Q_mean_0/barcode_Q_mean > 0.5)

--iqr-multiplier

Include values between Q25 - multiplier * IQR and Q75 + multiplier * IQR
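
The IQR rule can be sketched as follows, assuming the conventional fences of Q25 - multiplier * IQR and Q75 + multiplier * IQR. The multiplier of 1.5 used here is only a common convention chosen for the example, since no default is documented above, and the function names are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(values, multiplier=1.5):
    """Return the lower and upper fences of the IQR rule."""
    q25, _, q75 = quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    return q25 - multiplier * iqr, q75 + multiplier * iqr


def iqr_filter(values, multiplier=1.5):
    """Keep only values that fall within the fences."""
    low, high = iqr_bounds(values, multiplier)
    return [v for v in values if low <= v <= high]


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # made-up feature values with one outlier
print(iqr_bounds(values))
print(iqr_filter(values))
```

A larger multiplier widens the fences and keeps more extreme values.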

--perturbation

Field name to group perturbations

Default: 'gene_symbol'

--reference

Reference value in perturbation to compare against.

--by

Stratify by groups when ranking.

--min-labels

Require at least min-labels to include perturbation

Default: 10

--metadata

Path to CSV or Parquet file containing metadata to join with merged data.

--join

Field(s) to join on

--client

URL of the Dask scheduler. Use ‘none’ to disable distributed execution.

--dask-cluster

JSON URL or inline JSON containing dask cluster parameters.

--force

Overwrite existing output

Default: False

--no-version

Do not store command line arguments and scallops version in output metadata.

Default: False
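
To make two of the options above concrete, here is a minimal Python sketch (standard library only) of an IQR-based outlier filter in the spirit of --iqr-multiplier, followed by the Welch t statistic that the default welch_t rank method is named after. The function names, the quartile method, the conventional upper bound Q75 + multiplier * IQR, and the sample values are all assumptions made for illustration; SCALLOPS itself may compute quartiles and significance differently.

```python
import math
import statistics


def iqr_filter(values, multiplier):
    """Keep values inside [Q25 - m * IQR, Q75 + m * IQR]."""
    # The quartile method is an assumption; SCALLOPS may use a different one.
    q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    lower = q25 - multiplier * iqr
    upper = q75 + multiplier * iqr
    return [v for v in values if lower <= v <= upper]


def welch_t(sample, reference):
    """Welch's t statistic for two samples with unequal variances."""
    mean_diff = statistics.mean(sample) - statistics.mean(reference)
    standard_error = math.sqrt(
        statistics.variance(sample) / len(sample)
        + statistics.variance(reference) / len(reference)
    )
    return mean_diff / standard_error


# Hypothetical feature values for one perturbation and for the reference.
perturbed = iqr_filter([1, 2, 3, 4, 5, 6, 7, 8, 100], multiplier=1.5)
reference = [2, 4, 6, 8, 10]
print(perturbed)
print(welch_t([1, 2, 3, 4, 5], reference))
```

In this toy data the value 100 falls outside the IQR window and is dropped before any comparison against the reference is made.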

scallops registration

The scallops registration command provides functionality for performing image registration. This includes image alignment using ITK, cross-correlation-based registration, and applying precomputed transformations to images or labels.

Key Features:

  • ITK Registration: The elastix subcommand performs registration of moving images to fixed images or across timepoints using ITK. It supports the use of pre-configured ITK parameters and outputs transformed images and/or labels in Zarr format.

  • Cross-Correlation Registration: The cross-correlation subcommand registers images by aligning them within and across timepoints, using cross-correlation and specified channels for registration.

  • Transform Application: The transformix subcommand applies previously computed ITK transformations to images or labels, storing the results in a Zarr output directory.

Image registration

usage: scallops registration [-h] {elastix,transformix,cross-correlation} ...

Sub-commands

elastix

Register moving image to fixed image using ITK. If no fixed image is provided, registers moving image to a specified timepoint. Outputs are stored in Zarr format.

scallops registration elastix [-h] --moving MOVING [MOVING ...]
                              [--fixed FIXED [FIXED ...]]
                              [--moving-label [MOVING_LABEL ...]]
                              [--itk-parameters ITK_PARAMETERS [ITK_PARAMETERS ...]]
                              [--moving-image-pattern MOVING_IMAGE_PATTERN]
                              [--fixed-image-pattern FIXED_IMAGE_PATTERN]
                              [--moving-image-spacing MOVING_IMAGE_SPACING]
                              [--fixed-image-spacing FIXED_IMAGE_SPACING]
                              [--moving-channel MOVING_CHANNEL]
                              [--fixed-channel FIXED_CHANNEL]
                              [--transform-output TRANSFORM_OUTPUT_DIR]
                              [--label-output LABEL_OUTPUT_DIR]
                              [--moving-output MOVING_OUTPUT_DIR]
                              [--time TIME] [--unroll-channels]
                              [--sort SORT [SORT ...]] [--no-landmarks]
                              [--landmark-min-score LANDMARK_MIN_SCORE]
                              [--landmark-step-size LANDMARK_STEP_SIZE]
                              [--landmark-image-chunk-size LANDMARK_IMAGE_CHUNK_SIZE]
                              [--landmark-template-padding LANDMARK_TEMPLATE_PADDING [LANDMARK_TEMPLATE_PADDING ...]]
                              [--landmark-initialization {com,none} [{com,none} ...]]
                              [--landmark-com-min-quantile LANDMARK_COM_MIN_QUANTILE]
                              [--landmark-com-max-quantile LANDMARK_COM_MAX_QUANTILE]
                              [--landmark-min-count LANDMARK_MIN_COUNT]
                              [--output-aligned-channels-only]
                              [--itk-channels [ITK_CHANNELS ...]]
                              [--z-index Z_INDEX] [-g [GROUPBY ...]]
                              [-s [SUBSET ...]] [--force] [--client CLIENT]
                              [--dask-cluster DASK_CLUSTER] [--verbose]
                              [--no-version]
required arguments
--moving

Paths to directories containing nd2, tiff, zarr, or other Bio-Formats images

optional arguments
--fixed

Paths to directories containing nd2, tiff, zarr, or other Bio-Formats images

--moving-label

Path to Zarr directories containing labels to transform

--itk-parameters

Paths to files containing ITK parameters or predefined parameter maps

Default: ['affine', 'nl-100']

--moving-image-pattern

Format string to extract metadata from the moving image file name

--fixed-image-pattern

Format string to extract metadata from the fixed image file name

--moving-image-spacing

Physical size y, x if image metadata does not contain this information

--fixed-image-spacing

Physical size y, x if image metadata does not contain this information

--moving-channel

Moving channel index (0-based) to use for alignment

Default: 0

--fixed-channel

Fixed channel index (0-based) to use for alignment

Default: 0

--transform-output

Path to output directory for transformations

Default: 'transforms'

--label-output

Path to save transformed moving labels

--moving-output

Path to save transformed moving image

--time, -t

Time index (0-based) or value for alignment across timepoints

Default: '0'

--unroll-channels

Unroll channels (drop ‘t’ dimension) in output image

Default: False

--sort

Custom sort order. Example: 20231012_20x_6W_IF 20231010_20x_6W_FISH

--no-landmarks

Do not use landmarks to find corresponding regions between moving and fixed images to initialize the registration

Default: False

--landmark-min-score

Minimum score to include matching region for landmark estimation

Default: 0.6

--landmark-step-size

Grid step size for landmark estimation in physical units

Default: 1000

--landmark-image-chunk-size

Image chunk size in physical units

Default: 200

--landmark-template-padding

Template padding in physical units. Values are tried until landmark-min-count landmarks are found.

Default: [750, 1000, 1250, 2250]

--landmark-initialization

Possible choices: com, none

Initial alignment method for landmark estimation: com (center of mass) or none

Default: ['com', 'none']

--landmark-com-min-quantile

Include values >= specified quantile for center of mass computation.

Default: 0.25

--landmark-com-max-quantile

Include values <= specified quantile for center of mass computation.

Default: 0.75

--landmark-min-count

Ensure landmark-min-count landmarks are found.

Default: 100

--output-aligned-channels-only

Whether to output aligned channels only

Default: False

--itk-channels

Paths to files containing ITK parameters or predefined parameter maps for registering across channels

--z-index

Either max or a z-index (0-based)

Default: max

-g, --groupby

Keys to group images.

-s, --subset

Subset of images to include.

--force

Overwrite existing output

Default: False

--client

URL of the Dask scheduler. Use ‘none’ to disable distributed execution.

Default: 'none'

--dask-cluster

JSON URL or inline JSON containing dask cluster parameters.

--verbose

Run in verbose mode. Useful for debugging.

Default: False

--no-version

Do not store command line arguments and scallops version in output metadata.

Default: False
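
The fallback described for --landmark-template-padding ("Values are tried until landmark-min-count landmarks are found") can be pictured as a simple loop. In the Python sketch below, find_landmarks and toy_finder are hypothetical stand-ins rather than SCALLOPS functions, and the behaviour when no padding yields enough landmarks is an assumption, since it is not described above.

```python
def find_landmarks_with_fallback(find_landmarks, paddings, min_count):
    """Try each template padding in order until enough landmarks are found.

    `find_landmarks` is a hypothetical callable that takes a padding
    (in physical units) and returns a list of landmarks.
    """
    landmarks = []
    for padding in paddings:
        landmarks = find_landmarks(padding)
        if len(landmarks) >= min_count:
            return padding, landmarks
    # Assumption: if no padding succeeds, report the last attempt.
    return None, landmarks


def toy_finder(padding):
    """Toy stand-in: a larger padding finds more landmarks (made-up rule)."""
    return list(range(padding // 10))


padding, found = find_landmarks_with_fallback(
    toy_finder, paddings=[750, 1000, 1250, 2250], min_count=100
)
print(padding, len(found))
```

The paddings and minimum count used here are the documented defaults; only the landmark finder is invented.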

transformix

Transform moving image to fixed image using previously computed ITK transformations

scallops registration transformix [-h] --transform TRANSFORM_DIR --output
                                  OUTPUT --images IMAGES
                                  [--image-spacing IMAGE_SPACING]
                                  [--type {images,labels}] [--force]
                                  [--client CLIENT]
                                  [--dask-cluster DASK_CLUSTER]
required arguments
--transform

Path to directory containing transformations

--output

Path to output Zarr directory

--images

Path to Zarr directory to transform

optional arguments
--image-spacing

Physical size y, x if metadata does not contain this information

--type

Possible choices: images, labels

Whether to transform images or labels

Default: 'labels'

--force

Overwrite existing output

Default: False

--client

URL of the Dask scheduler. Use ‘none’ to disable distributed execution.

Default: 'none'

--dask-cluster

JSON URL or inline JSON containing dask cluster parameters.

cross-correlation

Register image across and within cycles

scallops registration cross-correlation [-h] -i IMAGES [IMAGES ...] -o OUTPUT
                                        [--image-pattern IMAGE_PATTERN]
                                        [--across-t-channel ACROSS_T_CHANNEL]
                                        [--within-t-channel [WITHIN_T_CHANNEL ...]]
                                        [--within-t-filter-min REGISTRATION_FILTER_MIN]
                                        [--within-t-filter-max REGISTRATION_FILTER_MAX]
                                        [-g [GROUPBY ...]] [-s [SUBSET ...]]
                                        [--force] [--client CLIENT]
                                        [--dask-cluster DASK_CLUSTER]
required arguments
-i, --images

Paths to input images or CSV/Parquet with image column containing full image path and additional columns containing metadata such as plate, well, t, c, or z. Note that image pattern is ignored when CSV/Parquet is used.

-o, --output

Zarr output directory

optional arguments
--image-pattern

Pattern to extract metadata from file names.

--across-t-channel

Channel index (0-based) to use to register across cycles

--within-t-channel

Channel indices (0-based) to use to register within cycles

--within-t-filter-min

Lower percentile p1. Data outside of the percentile range [p1, p2] is replaced with uniform noise when aligning within t

Default: 0

--within-t-filter-max

Upper percentile p2. Data outside of the percentile range [p1, p2] is replaced with uniform noise when aligning within t

Default: 90

-g, --groupby

Keys to group images.

-s, --subset

Subset of images to include.

--force

Overwrite existing output

Default: False

--client

URL of the Dask scheduler. Use ‘none’ to disable distributed execution.

Default: 'none'

--dask-cluster

JSON URL or inline JSON containing dask cluster parameters.
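
As an illustration of what --within-t-filter-min and --within-t-filter-max describe, the following Python sketch replaces values outside a percentile range with uniform noise. It is a sketch under stated assumptions, not the SCALLOPS code: the helper names are hypothetical, the percentile uses linear interpolation, and the noise is assumed to be drawn uniformly between the two cut-offs.

```python
import random


def percentile(values, p):
    """Percentile p (0-100) by linear interpolation between sorted values."""
    data = sorted(values)
    if p <= 0:
        return data[0]
    if p >= 100:
        return data[-1]
    position = (len(data) - 1) * p / 100
    lower_index = int(position)
    fraction = position - lower_index
    return data[lower_index] * (1 - fraction) + data[lower_index + 1] * fraction


def mask_outside_percentiles(values, p_min=0, p_max=90, seed=0):
    """Replace values outside [p_min, p_max] percentiles with uniform noise."""
    rng = random.Random(seed)
    low = percentile(values, p_min)
    high = percentile(values, p_max)
    # Assumption: noise is drawn uniformly between the two cut-offs.
    return [v if low <= v <= high else rng.uniform(low, high) for v in values]


pixels = list(range(11))  # toy intensities 0..10
print(mask_outside_percentiles(pixels))
```

With the default range of 0 to 90, only the brightest values are replaced, which keeps very bright spots from dominating the within-cycle alignment.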

Predefined Registration parameters

Scallops provides a set of predefined parameters for registration. Options that end in wsireg were adopted from WSIreg.

Available Options

  • rigid

    • Description: Rigid registration using 1 resolution and a small step size.

    • Transformations: Translation, Rotation

  • affine

    • Description: Affine registration using 1 resolution and a small step size.

    • Transformations: Translation, Rotation, Scaling, Shearing

  • nl-100

    • Description: Non-linear registration with B-splines, using 1 resolution and a final grid spacing of 100 microns.

    • Transformations: Non-linear (B-spline)

  • rigid-wsireg

    • Description: Rigid registration using 10 resolutions.

    • Transformations: Translation, Rotation

  • affine-wsireg

    • Description: Affine registration using 10 resolutions.

    • Transformations: Translation, Rotation, Scaling, Shearing

  • similarity-wsireg

    • Description: Similarity registration using 10 resolutions.

    • Transformations: Translation, Rotation, Uniform Scaling

  • nl-wsireg

    • Description: Non-linear registration with B-splines, using 10 resolutions and a final grid spacing of 100 microns.

    • Transformations: Non-linear (B-spline)

  • nl2-wsireg

    • Description: Non-linear registration with B-splines, using 10 resolutions and a final grid spacing of 75 microns.

    • Transformations: Non-linear (B-spline)

  • nl3-wsireg

    • Description: Non-linear registration with B-splines, using 1 resolution and a final grid spacing of 200 microns.

    • Transformations: Non-linear (B-spline)

  • fi_correction-wsireg

    • Description: Rigid registration using 4 resolutions.

    • Transformations: Translation, Rotation

These parameters use mutual information as the image similarity measure; advanced mean squares and advanced normalized correlation versions of these options are available with the suffixes ams and anc respectively. For example, rigid-anc.

Note that parameters can be combined freely, for example: rigid affine nl-100.

To use custom registration parameters, pass a set of JSON files to the --itk-parameters argument. Please refer to the Elastix manual for more information.
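
The composition of predefined parameters can be sketched as a command line assembled in Python. Only flags listed in the elastix usage above are used; the input and output paths are hypothetical placeholders, and the comment about ordering reflects the usual Elastix convention of applying parameter maps in sequence, which is an assumption about SCALLOPS.

```python
import shlex

# Hypothetical input locations; replace with your own directories.
moving = "data/phenotype"
fixed = "data/iss"

command = [
    "scallops", "registration", "elastix",
    "--moving", moving,
    "--fixed", fixed,
    # Assumed to run in the order given: rigid, then affine, then B-spline.
    "--itk-parameters", "rigid", "affine", "nl-100",
    "--transform-output", "transforms",
    "--moving-output", "registered.zarr",
]
print(shlex.join(command))
```

The same list can be passed to subprocess.run(command, check=True) to execute it. Swapping a preset for a variant such as rigid-anc only changes the corresponding list entry.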

scallops segment

The scallops segment command provides a command-line interface (CLI) for performing nuclei and cell segmentation. It supports various segmentation algorithms and outputs segmented labels in Zarr format.

Key Features:

  • Nuclei Segmentation: The nuclei subcommand performs nuclei segmentation using methods such as Stardist and Cellpose, with optional filtering based on area.

  • Cell Segmentation: The cell subcommand performs cell segmentation using methods like Watershed, Cellpose, and Propagation. It also supports various cytoplasmic channels, thresholding, and post-segmentation filtering.

Segmentation

usage: scallops segment [-h] {nuclei,cell} ...

Sub-commands

nuclei

Nuclei segmentation. Outputs a Zarr image containing nuclei labels.

scallops segment nuclei [-h] -i IMAGES [IMAGES ...] -o OUTPUT
                        [--method {cellpose,stardist}]
                        [--image-pattern IMAGE_PATTERN] [-g [GROUPBY ...]]
                        [--dapi-channel DAPI_CHANNEL] [--min-area MIN_AREA]
                        [--max-area MAX_AREA] [--chunks CHUNKS]
                        [--chunk-overlap CHUNK_OVERLAP] [--z-index Z_INDEX]
                        [--no-version] [--stardist-clip]
                        [--stardist-pmin STARDIST_PMIN]
                        [--stardist-pmax STARDIST_PMAX] [-s [SUBSET ...]]
                        [--force] [--client CLIENT]
                        [--dask-cluster DASK_CLUSTER] [--verbose]
required arguments
-i, --images

Paths to input images or CSV/Parquet with image column containing full image path and additional columns containing metadata such as plate, well, t, c, or z. Note that image pattern is ignored when CSV/Parquet is used.

-o, --output

Path to output zarr image directory

optional arguments
--method

Possible choices: cellpose, stardist

Nuclei segmentation algorithm

Default: 'stardist'

--image-pattern

Pattern to extract metadata from file names.

-g, --groupby

Keys to group images.

--dapi-channel

Channel index (0-based) where DAPI is found

Default: 0

--min-area

Filter labels with area < min-area

--max-area

Filter labels with area > max-area

--chunks

Chunk size to use to perform segmentation in chunks

--chunk-overlap

Chunk size overlap to use to perform segmentation using overlapping chunks

--z-index

Either max or a z-index (0-based)

Default: max

--no-version

Do not store command line arguments and scallops version in output metadata.

Default: False

--stardist-clip

Whether to clip normalized image values to between 0 and 1

Default: False

--stardist-pmin

Minimum percentile for image normalization. Default is 3.

--stardist-pmax

Maximum percentile for image normalization. Default is 99.8.

-s, --subset

Subset of images to include.

--force

Overwrite existing output

Default: False

--client

URL of the Dask scheduler. Use ‘none’ to disable distributed execution.

Default: 'none'

--dask-cluster

JSON URL or inline JSON containing dask cluster parameters.

--verbose

Run in verbose mode. Useful for debugging.

Default: False
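
The --min-area and --max-area filters can be illustrated with a small Python sketch on a toy label image. The function name is hypothetical, area is assumed to be a pixel count, and removed labels are assumed to be set to background (0); none of this is taken from the SCALLOPS source.

```python
from collections import Counter


def filter_labels_by_area(labels, min_area=None, max_area=None):
    """Set labels whose pixel count is outside [min_area, max_area] to 0."""
    # Count pixels per label, ignoring background (0).
    areas = Counter(value for row in labels for value in row if value != 0)
    keep = {
        label
        for label, area in areas.items()
        if (min_area is None or area >= min_area)
        and (max_area is None or area <= max_area)
    }
    return [[value if value in keep else 0 for value in row] for row in labels]


# Toy label image: label 1 has 4 pixels, label 2 has 1, label 3 has 8.
labels = [
    [1, 1, 0, 2],
    [1, 1, 0, 0],
    [3, 3, 3, 3],
    [3, 3, 3, 3],
]
for row in filter_labels_by_area(labels, min_area=2, max_area=6):
    print(row)
```

Here label 2 is removed for being too small and label 3 for being too large, leaving only label 1.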

cell

Cell segmentation. Outputs a Zarr image containing cell labels.

scallops segment cell [-h] -i IMAGES [IMAGES ...] -o OUTPUT
                      [--nuclei-label NUCLEI_LABEL]
                      [--method {cellpose,propagation,watershed,watershed-intensity}]
                      [--threshold THRESHOLD]
                      [--threshold-correction-factor THRESHOLD_CORRECTION_FACTOR]
                      [--cyto-channel [CYTO_CHANNEL ...]]
                      [--image-pattern IMAGE_PATTERN] [-g [GROUPBY ...]]
                      [--dapi-channel DAPI_CHANNEL] [--min-area MIN_AREA]
                      [--max-area MAX_AREA] [--chunks CHUNKS]
                      [--chunk-overlap CHUNK_OVERLAP] [--z-index Z_INDEX]
                      [--no-version] [--nuclei-min-area NUCLEI_MIN_AREA]
                      [--nuclei-max-area NUCLEI_MAX_AREA] [--rolling-ball]
                      [--sigma CELL_SEGMENTATION_SIGMA]
                      [--closing-radius CLOSING_RADIUS]
                      [--time CELL_SEGMENTATION_T] [--shrink-nuclei]
                      [-s [SUBSET ...]] [--force] [--client CLIENT]
                      [--dask-cluster DASK_CLUSTER] [--verbose]
required arguments
-i, --images

Paths to input images or CSV/Parquet with image column containing full image path and additional columns containing metadata such as plate, well, t, c, or z. Note that image pattern is ignored when CSV/Parquet is used.

-o, --output

Path to output zarr image directory

optional arguments
--nuclei-label

Path to zarr directory containing nuclei labels for watershed or propagation segmentation

--method

Possible choices: cellpose, propagation, watershed, watershed-intensity

Cell segmentation algorithm. Note that only watershed and propagation will output cells that match nuclei

Default: 'propagation'

--threshold

Threshold for watershed or propagation methods. Either Li, Otsu, Local, or manually determined value

Default: 'Li'

--threshold-correction-factor

Factor to adjust the computed threshold by if threshold is not a manually determined value

Default: 1

--cyto-channel

Channel index (0-based) to infer cell segmentation from. Default is all non-DAPI channels. If more than one channel specified, use minimum across time (cycles) then mean over channels, or if only one time point is present, use mean over channels.

--image-pattern

Pattern to extract metadata from file names.

-g, --groupby

Keys to group images.

--dapi-channel

Channel index (0-based) where DAPI is found

Default: 0

--min-area

Filter labels with area < min-area

--max-area

Filter labels with area > max-area

--chunks

Chunk size to use to perform segmentation in chunks

--chunk-overlap

Chunk size overlap to use to perform segmentation using overlapping chunks

--z-index

Either max or a z-index (0-based)

Default: max

--no-version

Do not store command line arguments and scallops version in output metadata.

Default: False

--nuclei-min-area

Filter nuclei labels with area < nuclei-min-area

--nuclei-max-area

Filter nuclei labels with area > nuclei-max-area

--rolling-ball

Apply rolling ball subtraction to cell mask prior to computing threshold

Default: False

--sigma

Size of gaussian kernel used to smooth the cell mask prior to computing threshold

--closing-radius

Disk radius to use for binary closing cell labels post segmentation

--time

Time indices (0-based) to include when computing cell segmentation mask. Defaults to all time points.

--shrink-nuclei

Shrink nuclei prior to subtraction of nuclei from cells to identify the cytosol.

Default: False

-s, --subset

Subset of images to include.

--force

Overwrite existing output

Default: False

--client

URL of the Dask scheduler. Use ‘none’ to disable distributed execution.

Default: 'none'

--dask-cluster

JSON URL or inline JSON containing dask cluster parameters.

--verbose

Run in verbose mode. Useful for debugging.

Default: False
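
The rule given for --cyto-channel (minimum across time, then mean over channels) can be sketched in Python as follows. The data layout, the function name, and the toy values are assumptions for illustration only; real images are multi-dimensional arrays rather than nested lists.

```python
def cell_mask_image(data, cyto_channels):
    """Combine channels: minimum over time points, then mean over channels.

    data[t][c] is a flat list of pixel values for time point t and channel c.
    """
    n_pixels = len(data[0][0])
    # Step 1: per channel, take the minimum over time points (cycles).
    min_over_t = [
        [min(data[t][c][i] for t in range(len(data))) for i in range(n_pixels)]
        for c in cyto_channels
    ]
    # Step 2: average the per-channel minima across channels.
    return [
        sum(channel[i] for channel in min_over_t) / len(min_over_t)
        for i in range(n_pixels)
    ]


# Toy data: 2 time points, 3 channels (channel 0 is DAPI), 2 pixels each.
data = [
    [[9, 9], [4, 8], [2, 6]],  # t = 0
    [[9, 9], [6, 2], [4, 4]],  # t = 1
]
print(cell_mask_image(data, cyto_channels=[1, 2]))
```

With a single time point the minimum over time leaves the data unchanged, so the result reduces to the mean over channels, matching the description above.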

scallops stitch

The scallops stitch command provides a command-line interface (CLI) for performing stitching of microscopy images.

Key Features:

  • Performance: Utilize dask for parallel processing.

  • Cross-correlation: Use both phase and no normalization in cross-correlation computations, automatically choosing the one that gives the best result.

  • Stage Position Handling: Read stage positions directly from Bio-Formats-supported images, such as .nd2 files, or from a CSV file.

  • Comprehensive Output: Outputs stitched image in OME-ZARR format, stitched positions in Parquet format, PDF report, tile boundary mask, and tile source labels in OME-ZARR format.

  • Z-index: Option to specify specific Z index or perform maximum Z projection.

  • Blending: Enable or disable image blending during stitching. When not blending, use tile closest to well center in overlapping regions.

  • Crop: Crop image tiles to remove edge effects.

  • Radial Correction: Automatically determine K for radial distortion and apply radial correction.

  • Stitching Evaluation: Compute error in overlapping regions after stitching.

Stitch microscopy images

usage: scallops stitch [-h] -i IMAGES [IMAGES ...] --report-output
                       REPORT_OUTPUT [--image-output IMAGE_OUTPUT]
                       [--channel-name [CHANNEL_NAME ...]]
                       [--image-pattern IMAGE_PATTERN] [-g [GROUPBY ...]]
                       [-s [SUBSET ...]] [-c ALIGN_CHANNEL]
                       [--radial-correction-k RADIAL_CORRECTION_K]
                       [--stitch-alpha STITCH_ALPHA]
                       [--max-shift MAX_SHIFT [MAX_SHIFT ...]]
                       [--no-save-image] [--no-save-labels] [--no-evaluate]
                       [--ffp FFP] [--dfp DFP] [--blend {linear,none}]
                       [--output-channels [OUTPUT_CHANNELS ...]]
                       [--crop-y CROP_Y] [--crop-x CROP_X]
                       [--stage-positions STAGE_POSITIONS]
                       [--image-spacing IMAGE_SPACING]
                       [--min-overlap-fraction MIN_OVERLAP_FRACTION]
                       [--random-seed RANDOM_SEED]
                       [--cross-correlation-upsample CROSS_CORRELATION_UPSAMPLE]
                       [--rename RENAME] [--flip-y-axis {1,0}]
                       [--flip-x-axis {1,0}] [--swap-axes {1,0}]
                       [--z-index Z_INDEX] [--force] [--client CLIENT]
                       [--dask-cluster DASK_CLUSTER]
                       [--expected-images EXPECTED_IMAGES] [--no-version]

required arguments

-i, --images

Paths to input images or CSV/Parquet with image column containing full image path and additional columns containing metadata such as plate, well, t, c, or z. Note that image pattern is ignored when CSV/Parquet is used.

--report-output

Output directory for stitched positions and QC report.

optional arguments

--image-output

Output zarr directory for stitched images and masks.

--channel-name

Channel names to save in output image. If specified, must equal the number of channels in input tiles.

--image-pattern

Pattern to extract metadata from file names.

-g, --groupby

Keys to group images.

-s, --subset

Subset of images to include.

-c, --align-channel

Channel index (0-based) to use for alignment.

Default: 0

--radial-correction-k

K to correct for radial distortion. Use auto to automatically determine k and none to disable auto determination.

Default: 'auto'

--stitch-alpha

Significance level for alignment error quantification.

Default: 0.001

--max-shift

Maximum allowed per-tile shift in microns

Default: [50, 100, 150]

--no-save-image

Do not save stitched image.

Default: False

--no-save-labels

Do not save tile boundary label mask or tile source labels.

Default: False

--no-evaluate

Do not evaluate stitching quality.

Default: False

--ffp

Path for flat-field correction profile image.

--dfp

Path for dark-field correction profile image.

--blend

Possible choices: linear, none

Blending method for stitched images

Default: 'none'

--output-channels

Output channels to save in stitched image.

--crop-y

Crop tiles by the given number of pixels along the y dimension when aligning tiles. Set automatically when radial correction is enabled.

--crop-x

Crop tiles by the given number of pixels along the x dimension when aligning tiles. Set automatically when radial correction is enabled.

--stage-positions

Optional CSV or Parquet file containing stage positions. Use when image metadata is missing stage positions. Expected columns name, y, and x, where name is the full image path.

--image-spacing

Physical size y, x if image metadata does not contain this information

--min-overlap-fraction

Minimum tile overlap fraction to include edge in graph. Determined automatically if not provided.

--random-seed

Random seed for reproducibility.

Default: 239753

--cross-correlation-upsample

Upsampling factor for registration precision.

Default: 1

--rename

CSV file mapping old image IDs to new IDs for output file names.

--flip-y-axis

Possible choices: 1, 0

Whether to flip tile y axis. Determined automatically if not provided.

--flip-x-axis

Possible choices: 1, 0

Whether to flip tile x axis. Determined automatically if not provided.

--swap-axes

Possible choices: 1, 0

Whether to swap tile y and x axes. Determined automatically if not provided.

--z-index

Either max, focus, z-index (0-based), or a path to a Parquet file containing columns key and z_index. Focus selects the best z-index using the slope of the image log-log power spectrum.

Default: 'max'

--force

Overwrite existing output

Default: False

--client

URL of the Dask scheduler. Use ‘none’ to disable distributed execution.

Default: 'none'

--dask-cluster

JSON URL or inline JSON containing dask cluster parameters.

--expected-images

Validate that the specified number of images are provided.

--no-version

Do not store command line arguments and scallops version in output metadata.

Default: False
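
When image metadata lacks stage positions, the file expected by --stage-positions can be prepared with a few lines of Python. The sketch below only relies on the documented column names (name, y, x); the tile paths and coordinate values are hypothetical, and the coordinate units are assumed to match what the image metadata would otherwise provide.

```python
import csv
import io

# Hypothetical tile paths and stage coordinates.
positions = [
    {"name": "/data/plate1/A1_tile-102.nd2", "y": 0.0, "x": 0.0},
    {"name": "/data/plate1/A1_tile-103.nd2", "y": 0.0, "x": 1200.0},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "y", "x"])
writer.writeheader()
writer.writerows(positions)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing csv_text to a file and passing that file to --stage-positions supplies one row per tile, with name holding the full image path.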

stitch-preview

The stitch-preview command provides a quick preview of stitched multi-tile microscopy images, allowing users to visualize the result before performing full stitching. It uses the stage positions for stitching and saves the resulting image.

Key Features:

  • Tile Positioning: Stitching uses stage positions and has options to display tile numbers and bounds.

  • Downsampling: Enables downsampling of the image resolution to improve performance and reduce memory requirements.

  • Channel Selection: Allows users to specify the channel to display from multi-channel images.

  • Log Transformation: Optionally apply log transformation to pixel intensities for better visualization of dim images.

Create a multi-tile image using image stage positions.

usage: scallops stitch-preview [-h] -i IMAGES [IMAGES ...] -o OUTPUT
                               [--image-pattern IMAGE_PATTERN]
                               [-g [GROUPBY ...]] [-s [SUBSET ...]] [-n] [-b]
                               [--no-tiles] [-c CHANNEL] [-d DOWNSAMPLE] [-l]
                               [--stage-positions STAGE_POSITIONS]
                               [--z-index Z_INDEX]

required arguments

-i, --images

Paths to input images or CSV/Parquet with image column containing full image path and additional columns containing metadata such as plate, well, t, c, or z. Note that image pattern is ignored when CSV/Parquet is used.

-o, --output

Output directory.

optional arguments

--image-pattern

Pattern to extract metadata from file names.

-g, --groupby

Keys to group images.

-s, --subset

Subset of images to include.

-n, --numbers

Display tile numbers.

Default: False

-b, --bounds

Display tile bounds.

Default: False

--no-tiles

Do not display image tiles.

Default: False

-c, --channel

Channel index (0-based) to display.

Default: 0

-d, --downsample

Downsample image resolution.

Default: 20

-l, --log

Log-transform pixel intensities to help visualize dim images.

Default: False

--stage-positions

Optional CSV file containing stage positions. Use when image metadata is missing stage positions. Expected columns name, y, and x, where name is the full image path.

--z-index

Either max, focus, z-index (0-based), or a path to a Parquet file containing columns key and z_index. Focus selects the best z-index using the slope of the image log-log power spectrum.

Default: 'max'
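
To see why the -l/--log option helps with dim images, consider this small Python sketch. The use of log1p and the rescaling to the range 0 to 1 are assumptions chosen for illustration; the exact transform used by stitch-preview is not specified above.

```python
import math


def log_scale(pixels):
    """Map intensities to [0, 1] after a log1p transform."""
    logged = [math.log1p(p) for p in pixels]
    top = max(logged)
    return [value / top if top > 0 else 0.0 for value in logged]


def linear_scale(pixels):
    """Map intensities to [0, 1] without any transform, for comparison."""
    top = max(pixels)
    return [p / top if top > 0 else 0.0 for p in pixels]


pixels = [0, 10, 100, 10000]  # toy intensities with one very bright pixel
print(linear_scale(pixels))
print(log_scale(pixels))
```

On a linear scale the dim pixels are almost indistinguishable from zero next to the bright one, whereas after the log transform they occupy a visible part of the display range.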

Outputs explained

The SCALLOPS command line produces a series of default and optional outputs, which are described below.

In-Situ sequencing pipeline (pooled-sbs)

Let’s first touch briefly on the command itself (for more information, check the CLI documentation). Say that we want to run the in-situ sequencing pipeline from the command line using Stardist (the default) for nuclei segmentation, followed by watershed cell segmentation with the threshold determined by Li’s method. Let’s assume that you would like to use the test files and that your current working directory is the SCALLOPS root directory. Then:

scallops pooled-sbs pipeline scallops/tests/data/experimentC/input  --barcodes scallops/tests/data/experimentC/barcodes.csv --pheno=scallops/tests/data/experimentC/10X_c0-DAPI-p65ab

Will generate the following directories:

bases       cells       combined    images.zarr phenos      reads

In bases you will find dataframes in Parquet format with the base information extracted for each group. Since by default we group by tile and well, and we only have one well and two tiles, you’ll find:

./bases
├── A1-102.parquet
└── A1-103.parquet

1 directory, 2 files

Both files contain the read id, cycle, channel, intensity, cell id, coordinates y and x, well and tile information:

column               row 1                row 2
y                    5                    5
x                    705                  705
read                 0                    0
peak                 364.59483811395535   364.59483811395535
cell                 0                    0
t                    1                    1
c                    G                    T
intensity            538.2688477073593    2706.2792184281157
corrected_intensity  -48.44409230154324   3510.5365471687214
well                 A1                   A1
tile                 102                  102

Likewise, in the reads directory, you’ll find the dataframes containing the reads info:

./reads
├── A1-102.parquet
└── A1-103.parquet

1 directory, 2 files

Containing the identified reads information such as quantiles and peaks:

column   row 1                  row 2
y        5                      5
x        705                    756
read     0                      1
peak     364.59483811395535     1162.3744344193526
cell     0                      0
barcode  TATTCTTCC              AAGCCAATT
Q_0      0.8225319160232156     1.0
Q_1      0.213521539841818      0.8525545481276788
Q_2      0.5315020019263625     0.41514790479643904
Q_3      1.0                    0.4095041269847144
Q_4      0.1871177283289207     1.0
Q_5      0.6452085822773499     0.5086887399634992
Q_6      1.0                    0.9443351500907629
Q_7      0.008381143001541913   0.5966046145349035
Q_8      0.1944106438921418     1.0
Q_min    0.008381143001541913   0.4095041269847144
well     A1                     A1
tile     102                    102
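
The example rows suggest how the bases and reads tables relate: the first base of the barcode for read 0 (T) is the channel with the highest corrected intensity in cycle 1, and Q_min equals the smallest of Q_0 to Q_8. The Python sketch below encodes that reading of the tables. It is an inference from the example values, not a description of the SCALLOPS implementation, and the function names and the cycle 2 rows are hypothetical.

```python
def call_barcode(base_rows):
    """Call one base per cycle: the channel with the highest corrected intensity.

    base_rows: dicts with keys t (cycle), c (base), corrected_intensity.
    """
    best = {}
    for row in base_rows:
        current = best.get(row["t"])
        if current is None or row["corrected_intensity"] > current["corrected_intensity"]:
            best[row["t"]] = row
    return "".join(best[t]["c"] for t in sorted(best))


def q_min(qualities):
    """Smallest per-cycle quality score of a read."""
    return min(qualities)


rows = [
    # Cycle 1 values are taken from the bases table for read 0.
    {"t": 1, "c": "G", "corrected_intensity": -48.44409230154324},
    {"t": 1, "c": "T", "corrected_intensity": 3510.5365471687214},
    # Cycle 2 values are made up for illustration.
    {"t": 2, "c": "A", "corrected_intensity": 2100.0},
    {"t": 2, "c": "C", "corrected_intensity": 150.0},
]
print(call_barcode(rows))

# Q_0 .. Q_8 of read 0 from the reads table.
qualities = [
    0.8225319160232156, 0.213521539841818, 0.5315020019263625, 1.0,
    0.1871177283289207, 0.6452085822773499, 1.0, 0.008381143001541913,
    0.1944106438921418,
]
print(q_min(qualities))
```

A read with a low Q_min has at least one ambiguous cycle, which is why thresholds on this value are a natural way to filter reads.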

Then, the cells directory includes parquet files with cell information:

./cells
├── A1-102.parquet
└── A1-103.parquet

1 directory, 2 files

with the barcode counts and their corresponding peaks and sequences:

column                row 1               row 2
peak                  438.1286071891493   398.4562017293166
cell                  36                  17
cell_barcode_0        GACCAATGG           CTTCGCACT
cell_barcode_count_0  4                   2
cell_barcode_1        ACCGGTTTA
cell_barcode_count_1  1.0                 0.0
barcode_count         5                   2
well                  A1                  A1
tile                  102                 102
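
The per-cell columns can be read as a summary of the reads assigned to each cell: the most frequent barcode and its count, the second most frequent barcode and its count, and the total number of reads. The following Python sketch reproduces the first example row under that interpretation. The function name is hypothetical, and treating barcode_count as the total read count is an assumption based on the example values (4 + 1 = 5 and 2 + 0 = 2).

```python
from collections import Counter


def summarize_cell(barcodes):
    """Top two barcodes and their counts for the reads assigned to one cell.

    Assumes the cell has at least one read.
    """
    counts = Counter(barcodes).most_common()
    barcode_0, count_0 = counts[0]
    barcode_1, count_1 = counts[1] if len(counts) > 1 else (None, 0)
    return {
        "cell_barcode_0": barcode_0,
        "cell_barcode_count_0": count_0,
        "cell_barcode_1": barcode_1,
        "cell_barcode_count_1": count_1,
        "barcode_count": len(barcodes),
    }


# Reads for cell 36, reconstructed from the counts in the example row.
reads_in_cell = ["GACCAATGG"] * 4 + ["ACCGGTTTA"]
print(summarize_cell(reads_in_cell))
```

A cell whose second barcode count is close to the first is a candidate for ambiguous assignment, which is why both are kept.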

Finally, we have the combined directory with all the information combined:

./combined/
├── A1-102.parquet
└── A1-103.parquet

1 directory, 2 files

column                row 1                 row 2
well                  A1                    A1
tile                  102                   102
cell                  17                    19
peak                  398.4562017293166     147.69660807883213
cell_barcode_0        CTTCGCACT             GCTGCAGTC
cell_barcode_count_0  2.0                   1.0
cell_barcode_1                              CAAATCCCA
cell_barcode_count_1  0.0                   1.0
barcode_count         2.0                   2.0
cells_x               821.2                 890.7610062893082
cells_y               8.631578947368421     10.754716981132075
cells_area            95.0                  159.0
nuclei_max_1          2819                  2122
nuclei_mean_1         2280.957746478873     1663.0069444444443
nuclei_corr_0_1       0.87922644251275      0.7376070465901201
nuclei_y              7.788732394366197     10.11111111111111
nuclei_median_1       2278.0                1682.0
nuclei_max_0          1646                  1792
nuclei_area           71.0                  144.0
nuclei_x              821.7605633802817     891.2569444444445
nuclei_median_0       1237.0                1411.0
nuclei_mean_0         1239.338028169014     1363.8402777777778
sgRNA                 CTTCGACACTGATGATCTGC  GCTGCAAGTCTCCCACCGGA
gene_symbol           ATXN3L                SMAD1
duplicate_prefix      False                 False
sgRNA_1                                     CAAATCCCCAACTCATCTCG
gene_symbol_1                               RNF24
duplicate_prefix_1                          False

I have purposely left the zarr directory for last.

Zarr output

Note that you can choose to generate TIFF instead of Zarr images (see the documentation for more information), and you can also control which images are saved.

We follow the OME-Zarr format. According to Open Microscopy: “OME-Zarr is an implementation of the OME-NGFF specification using the Zarr format. Arrays MUST be defined and stored in a hierarchical organization as defined by the version 2 of the Zarr specification. OME-NGFF metadata MUST be stored as attributes in the corresponding Zarr groups.”

In short, it is a hierarchical way of storing images that is well suited to cloud computing. Going back to our example above, our default images.zarr contains:

./images.zarr/
├── .zgroup
├── A1-102
│   ├── .zattrs
│   ├── .zgroup
│   ├── 0
│   └── labels
├── A1-102-phenotype
│   ├── .zattrs
│   ├── .zgroup
│   └── 0
├── A1-103
│   ├── .zattrs
│   ├── .zgroup
│   ├── 0
│   └── labels
└── A1-103-phenotype
    ├── .zattrs
    ├── .zgroup
    └── 0

These groups contain the images and labels for each grouping. Notice the hidden metadata files .zattrs and .zgroup, which describe each group and its attributes (i.e. how the inner structure is organized). Let’s zoom in on just one of the groupings:

/Users/hleaploj/Playground/testzarr/default/images.zarr
├── .zgroup
├── A1-102
│   ├── .zattrs
│   ├── .zgroup
│   ├── 0
│   │   ├── .zarray
│   │   ├── 0
│   │   │   ├── 0
│   │   │   │   └── 0
│   │   │   │       ├── 0
│   │   │   │       │   ├── 0
│   │   │   │       │   ├── 1
│   │   │   │       │   ├── 2
│   │   │   │       │   └── 3
│   │   │   │       ├── 1
│   │   │   │       │   ├── 0
│   │   │   │       │   ├── 1
│   │   │   │       │   ├── 2
│   │   │   │       │   └── 3
│   │   │   │       ├── 2
│   │   │   │       │   ├── 0
│   │   │   │       │   ├── 1
│   │   │   │       │   ├── 2
│   │   │   │       │   └── 3
│   │   │   │       └── 3
│   │   │   │           ├── 0
│   │   │   │           ├── 1
│   │   │   │           ├── 2
│   │   │   │           └── 3
│   │   │   ├── 1
│   │   │   │   └── 0
.   .   .   .   .
.   .   .   .   .
.   .   .   .   .
│   │       └── 2
│   │           └── 0
│   │               ├── 0
│   │               │   ├── 0
│   │               │   ├── 1
│   │               │   ├── 2
│   │               │   └── 3
│   │               ├── 1
.   .               .   .
.   .               .   .
.   .               .   .
│   │               └── 3
│   │                   ├── 0
│   │                   ├── 1
│   │                   ├── 2
│   │                   └── 3
│   └── labels
│       ├── .zattrs
│       ├── .zgroup
│       ├── cell
│       │   ├── .zattrs
│       │   ├── .zgroup
│       │   └── 0
│       │       ├── .zarray
│       │       ├── 0
│       │       │   ├── 0
│       │       │   └── 1
│       │       ├── 1
.       .       .   .
.       .       .   .
.       .       .   .
│       │       └── 3
│       │           ├── 0
│       │           └── 1
│       ├── cytosol
│       │   ├── .zattrs
│       │   ├── .zgroup
│       │   └── 0
│       │       ├── .zarray
│       │       ├── 0
│       │       │   ├── 0
│       │       │   └── 1
│       │       ├── 1
.       .       .   .
.       .       .   .
.       .       .   .
│       │       └── 3
│       │           ├── 0
│       │           └── 1
│       ├── iss-spots
│       │   ├── .zattrs
│       │   ├── .zgroup
│       │   └── 0
│       │       ├── .zarray
│       │       ├── 0
│       │       │   ├── 0
│       │       │   ├── 1
│       │       │   ├── 2
│       │       │   └── 3
│       │       ├── 1
.       .       .   .
.       .       .   .
.       .       .   .
│       │       └── 3
│       │           ├── 0
│       │           ├── 1
│       │           ├── 2
│       │           └── 3
│       └── nuclei
│           ├── .zattrs
│           ├── .zgroup
│           └── 0
│               ├── .zarray
│               ├── 0
│               │   ├── 0
│               │   └── 1
│               ├── 1
.               .   .
.               .   .
.               .   .
│               └── 3
│                   ├── 0
│                   └── 1
└── A1-102-phenotype
    ├── .zattrs
    ├── .zgroup
    └── 0
        ├── .zarray
        ├── 0
        │   ├── 0
        │   │   ├── 0
        │   │   └── 1
        │   └── 1
        │       ├── 0
        │       └── 1
        └── 1
            ├── 0
            │   ├── 0
            │   └── 1
            └── 1
                ├── 0
                └── 1

Here we see an extra hidden file, .zarray, which marks where an actual image array starts (as opposed to a group). In the case above, the image for well A1, tile 102 has a root called A1-102 with three groups (0, 1 and 2). According to the official format:

> Each multiscale level is stored as a separate Zarr array, which is a folder containing chunk files which compose the array. The name of the array is arbitrary with the ordering defined by the “multiscales” metadata, but is often a sequence starting at 0.
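The group/array distinction can be checked directly on disk. The following is a minimal sketch, assuming a Zarr version 2 directory store in which a directory holding .zgroup is a group and a directory holding .zarray is an array; the function name describe_store is hypothetical and not part of SCALLOPS.

```python
from pathlib import Path


def describe_store(root):
    """Return {relative_path: "group" | "array"} for a Zarr v2 directory store.

    Hypothetical helper: it only inspects the hidden metadata files.
    """
    root = Path(root)
    nodes = {}
    for path in sorted(p for p in root.rglob("*") if p.is_dir()):
        name = path.relative_to(root).as_posix()
        if (path / ".zarray").is_file():
            nodes[name] = "array"
        elif (path / ".zgroup").is_file():
            nodes[name] = "group"
        # Directories with neither file are chunk directories inside an array.
    return nodes
```

Calling describe_store("images.zarr") on the store above would be expected to report A1-102 and A1-102/labels as groups and A1-102/0 as an array.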

The chunked data are stored inside these arrays:

> Chunks are stored with the nested directory layout. All but the last chunk element are stored as directories. The terminal chunk is a file. Together the directory and file names provide the “chunk coordinate” (t, c, z, y, x), where the maximum coordinate will be dimension_size / chunk_size.
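To make the chunk coordinate concrete, here is a small sketch that maps a nested chunk path to the index range it covers on each axis. It assumes the usual Zarr convention that an axis of size n with chunk size c is split into ceil(n / c) chunks indexed from 0; the shape and chunk sizes in the usage note are hypothetical and are not read from the example store.

```python
import math


def chunk_grid(shape, chunks):
    """Number of chunks along each axis: ceil(dimension_size / chunk_size)."""
    return tuple(math.ceil(size / chunk) for size, chunk in zip(shape, chunks))


def chunk_slices(chunk_path, chunks, axes="tczyx"):
    """Map a nested chunk path such as '0/1/0/3/2' to (start, stop) per axis.

    The stop value may exceed the axis size for the last chunk of an axis.
    """
    coords = [int(part) for part in chunk_path.split("/")]
    if len(coords) != len(chunks):
        raise ValueError("chunk path and chunk shape must have the same number of axes")
    return {
        axis: (index * chunk, (index + 1) * chunk)
        for axis, index, chunk in zip(axes, coords, chunks)
    }
```

For a hypothetical array of shape (3, 4, 1, 2048, 2048) with chunks (1, 1, 1, 512, 512), chunk_grid gives four chunks along y and four along x, which is the kind of 0 to 3 numbering visible in the tree above.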

We then see the labels group, where we store all the segmentation information, including nuclei, cells and cytosol. All of these follow the OME-Zarr schema:

> All labels will be listed in .zattrs. Each dimension of the label (t, c, z, y, x) should be either the same as the corresponding dimension of the image, or 1 if that dimension of the label is irrelevant.
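The dimension rule quoted above is easy to express as a check. This is a minimal sketch with a hypothetical function name and hypothetical shapes; it only restates the rule and does not read any SCALLOPS output.

```python
def label_shape_is_compatible(image_shape, label_shape):
    """True if every label dimension equals the image dimension or is 1."""
    if len(image_shape) != len(label_shape):
        return False
    return all(lab == img or lab == 1 for img, lab in zip(image_shape, label_shape))
```

For example, a label of shape (1, 1, 1, 2048, 2048) is compatible with an image of shape (3, 4, 1, 2048, 2048), because t and c are irrelevant for a segmentation mask while y and x match.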

Storing intermediate outputs

Intermediate outputs are stored through the CLI's --save option, which lists the outputs to save (see the CLI documentation):

> --save Outputs to save. Choose from cell-labels,nuclei-labels,cytosol-labels,spot-labels,aligned,cell-mask,max,log,std,peaks,phenotype-aligned,bases,reads,cells,phenotype,combined,crosstalk
> Default: cell-labels,nuclei-labels,cytosol-labels,spot-labels,aligned,phenotype-aligned,bases,reads,cells,phenotype,combined

If you use them all, you’ll find the following (showing only the first 3 levels):
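When scripting a run, it can help to validate the --save value before launching the pipeline. The sketch below copies the choices and the default from the help text above and assumes the option takes a comma-separated list, as that help text suggests. The helper parse_save is hypothetical; it is not the parser SCALLOPS uses.

```python
# Choices and default copied from the --save help text.
SAVE_CHOICES = (
    "cell-labels", "nuclei-labels", "cytosol-labels", "spot-labels", "aligned",
    "cell-mask", "max", "log", "std", "peaks", "phenotype-aligned", "bases",
    "reads", "cells", "phenotype", "combined", "crosstalk",
)
DEFAULT_SAVE = (
    "cell-labels", "nuclei-labels", "cytosol-labels", "spot-labels", "aligned",
    "phenotype-aligned", "bases", "reads", "cells", "phenotype", "combined",
)


def parse_save(value=None):
    """Split a comma-separated --save value and reject unknown names."""
    if value is None:
        return list(DEFAULT_SAVE)
    requested = [item.strip() for item in value.split(",") if item.strip()]
    unknown = [item for item in requested if item not in SAVE_CHOICES]
    if unknown:
        raise ValueError("unknown --save value(s): " + ", ".join(unknown))
    return requested


# Outputs that are only written when requested explicitly.
NON_DEFAULT = sorted(set(SAVE_CHOICES) - set(DEFAULT_SAVE))
```

Here NON_DEFAULT lists cell-mask, crosstalk, log, max, peaks and std, which matches the extra groups that appear in the tree when every output is saved.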

./images.zarr
├── .zgroup
├── A1-102
│   ├── .zattrs
│   ├── .zgroup
│   ├── 0
│   │   ├── .zarray
│   │   ├── 0
│   │   ├── 1
│   │   └── 2
│   └── labels
│       ├── .zattrs
│       ├── .zgroup
│       ├── cell
│       ├── cytosol
│       ├── iss-spots
│       └── nuclei
├── A1-102-cell-mask
│   ├── .zattrs
│   ├── .zgroup
│   └── 0
│       ├── .zarray
│       ├── 0
│       ├── 1
│       ├── 2
│       └── 3
├── A1-102-log
│   ├── .zattrs
│   ├── .zgroup
│   └── 0
│       ├── .zarray
│       ├── 0
│       ├── 1
│       └── 2
├── A1-102-max
│   ├── .zattrs
│   ├── .zgroup
│   └── 0
│       ├── .zarray
│       ├── 0
│       ├── 1
│       └── 2
├── A1-102-peaks
│   ├── .zattrs
│   ├── .zgroup
│   └── 0
│       ├── .zarray
│       ├── 0
│       ├── 1
│       ├── 2
│       └── 3
├── A1-102-phenotype
│   ├── .zattrs
│   ├── .zgroup
│   └── 0
│       ├── .zarray
│       ├── 0
│       └── 1
├── A1-102-std
│   ├── .zattrs
│   ├── .zgroup
│   └── 0
│       ├── .zarray
│       ├── 0
│       ├── 1
│       ├── 2
│       └── 3
├── A1-103
│   ├── .zattrs
│   ├── .zgroup
│   ├── 0
│   │   ├── .zarray
│   │   ├── 0
│   │   ├── 1
│   │   └── 2
│   └── labels
│       ├── .zattrs
│       ├── .zgroup
│       ├── cell
│       ├── cytosol
│       ├── iss-spots
│       └── nuclei
├── A1-103-cell-mask
│   ├── .zattrs
│   ├── .zgroup
│   └── 0
│       ├── .zarray
│       ├── 0
│       ├── 1
│       ├── 2
│       └── 3
├── A1-103-log
│   ├── .zattrs
│   ├── .zgroup
│   └── 0
│       ├── .zarray
│       ├── 0
│       ├── 1
│       └── 2
├── A1-103-max
│   ├── .zattrs
│   ├── .zgroup
│   └── 0
│       ├── .zarray
│       ├── 0
│       ├── 1
│       └── 2
├── A1-103-peaks
│   ├── .zattrs
│   ├── .zgroup
│   └── 0
│       ├── .zarray
│       ├── 0
│       ├── 1
│       ├── 2
│       └── 3
├── A1-103-phenotype
│   ├── .zattrs
│   ├── .zgroup
│   └── 0
│       ├── .zarray
│       ├── 0
│       └── 1
└── A1-103-std
    ├── .zattrs
    ├── .zgroup
    └── 0
        ├── .zarray
        ├── 0
        ├── 1
        ├── 2
        └── 3
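The top-level group names in this listing follow a visible pattern: well, tile, and an optional output suffix. The sketch below groups names by (well, tile) under that assumption. It also assumes well names contain no hyphen, and it uses the placeholder "image" for groups without a suffix; both helpers are hypothetical and should only be given directory names, not hidden files such as .zgroup.

```python
from collections import defaultdict


def parse_group_name(name):
    """Split 'A1-102-cell-mask' into ('A1', 102, 'cell-mask').

    Groups without a suffix, such as 'A1-102', return None as the last item.
    """
    well, tile, *rest = name.split("-", 2)
    return well, int(tile), rest[0] if rest else None


def outputs_by_tile(names):
    """Group top-level Zarr group names by (well, tile)."""
    grouped = defaultdict(list)
    for name in names:
        well, tile, kind = parse_group_name(name)
        grouped[(well, tile)].append(kind or "image")  # "image" is a placeholder label
    return dict(grouped)
```

Applied to the names above, every (well, tile) pair would map to the same list of saved outputs, which is a quick way to spot a tile whose intermediate images are missing.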

Illumination correction (illum-corr)

When BaSiCPy is used for illumination correction, the output includes a directory of models, model, with one subdirectory per channel:

model/
├── c0
│   ├── profiles.npy
│   └── settings.json
├── c1
│   ├── profiles.npy
│   └── settings.json
└── c2
    ├── profiles.npy
    └── settings.json

The profiles.npy files contain the models stored in NumPy binary format, while the settings.json files contain the settings for the correction.

Additionally, it generates one or two TIFF files with the flatfield and, optionally, the darkfield.

If --plot-fit is used, a multipage PDF is generated after the model is trained.
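A quick way to inspect a saved model is to walk the channel subdirectories and load each settings.json. This is a minimal sketch using only the standard library; the function name is hypothetical, and since the keys stored in settings.json are not documented here, the code returns them without interpreting them. The profiles.npy files can be read separately with numpy.load.

```python
import json
from pathlib import Path


def load_channel_settings(model_dir):
    """Return {channel_name: settings_dict} for each channel subdirectory."""
    settings = {}
    for channel_dir in sorted(Path(model_dir).iterdir()):
        settings_file = channel_dir / "settings.json"
        if channel_dir.is_dir() and settings_file.is_file():
            settings[channel_dir.name] = json.loads(settings_file.read_text())
    return settings
```

For the layout above, load_channel_settings("model") would be expected to return one entry each for c0, c1 and c2.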

Dialout analysis (dialout)

Outputs: Reads per spacer_20mer per pool. Example:

| sequence | count_T2-A06 | count_fraction_T2-A06 | count_T2-A04 | count_fraction_T2-A04 | count_T2-A05 | count_fraction_T2-A05 | ID | gene_id | gene_symbol | dialout | mismatches | closest_match |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ATTCACAGTGCTGGTCCCAA | 1138.0 | 0.0017697628704371 | 592.0 | 0.0015467540373676 | 1035.0 | 0.001774706573273 | ENSG00000170142_61 | ENSG00000170142 | UBE2E1 | 5.0 | 0 | |
| GGAGTCCTCGGAGAGCAGGA | 1125.0 | 0.001749545895643 | 477.0 | 0.0012462866145682 | 849.0 | 0.0014557737977863 | ENSG00000263001_74 | ENSG00000263001 | GTF2I | 5.0 | 0 | |
| TATGCTTGTAAACACCTTGG | 1067.0 | 0.0016593470850232 | 607.0 | 0.0015859454403415 | 857.0 | 0.0014694913365169 | ENSG00000165699_523 | ENSG00000165699 | TSC1 | 5.0 | 0 | |
| AAACTCCCTCATCCGCCCGA | 496.0 | 0.0007713553459901 | 257.0 | 0.0006714793709518 | 354.0 | 0.0006070010888296 | | | | | 1 | AAACTCCCTCAGCCGCCCGA |
| GTTGCCCTCGAGGTCAATGT | 496.0 | 0.0007713553459901 | 378.0 | 0.0009876233549408 | 500.0 | 0.0008573461706633 | ENSG00000130725_52 | ENSG00000130725 | UBE2M | 5.0 | 0 | |
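The mismatches and closest_match columns can be read as a Hamming-distance lookup against the design spacers. The sketch below assumes that closest_match is the design spacer with the fewest mismatches to the observed sequence; this is an illustration built from the example values, not the SCALLOPS implementation, and both function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_match(read, design):
    """Return (spacer, mismatches) for the design spacer nearest to read."""
    best = min(design, key=lambda spacer: hamming(read, spacer))
    return best, hamming(read, best)


# Spacers taken from the example output above.
design = [
    "ATTCACAGTGCTGGTCCCAA",
    "GGAGTCCTCGGAGAGCAGGA",
    "TATGCTTGTAAACACCTTGG",
    "AAACTCCCTCAGCCGCCCGA",
    "GTTGCCCTCGAGGTCAATGT",
]
```

With this list, the observed sequence AAACTCCCTCATCCGCCCGA is one substitution away from AAACTCCCTCAGCCGCCCGA, while the other four observed sequences match a design spacer exactly.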

Summarized stats per pool. Example:

| index | n_mapped | fraction_mapped | average_read_count | skew_ratio | drop_out_ratio | n_drop_outs |
|---|---|---|---|---|---|---|
| T2-A06 | 643024 | 0.6599747925724300 | 392.9778761061950 | 3.115920398009950 | 0.0 | 0 |
| T2-A04 | 382737 | 0.66117268637843 | 235.98227474150700 | 3.187183811129850 | 0.0014749262536873200 | 1 |
| T2-A05 | 583195 | 0.6565736737818610 | 354.9631268436580 | 3.104017611447440 | 0.0 | 0 |
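The two example outputs can be cross-checked against each other. In these examples, each count_fraction value equals the count divided by the pool's n_mapped, and the T2-A04 drop_out_ratio equals one drop-out divided by 678. The sketch below reproduces that arithmetic; it is an observation about the example numbers, not a documented definition of the columns.

```python
# Counts for the first sequence (ATTCACAGTGCTGGTCCCAA) and n_mapped per pool,
# copied from the example outputs.
counts = {"T2-A06": 1138.0, "T2-A04": 592.0, "T2-A05": 1035.0}
n_mapped = {"T2-A06": 643024, "T2-A04": 382737, "T2-A05": 583195}

# Fraction of mapped reads assigned to this sequence in each pool.
fractions = {pool: counts[pool] / n_mapped[pool] for pool in counts}

# Number of spacers implied by one drop-out and the reported T2-A04 ratio.
implied_spacers = round(1 / 0.0014749262536873200)
```

The resulting fractions agree with the count_fraction columns for that sequence, and implied_spacers comes out as 678.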