Workflow Reference
Scallops provides two primary end-to-end pipelines written in WDL 1.0 (Workflow Description Language). These workflows are designed for scalability and reproducibility across various environments, including local machines, cloud infrastructure, and high-performance computing (HPC) clusters.
Stitching Workflow
File: stitch_workflow.wdl
This workflow performs illumination correction (flatfield estimation) followed by image stitching. It takes raw microscopy images (e.g., .nd2, .tiff) and converts them into OME-Zarr format.
Workflow Steps
Grouping: The workflow scans the input directories (urls) using the image_pattern. It groups images based on the groupby parameter (default: plate, well, timepoint).
Illumination Correction: (Optional) For each group, it calculates a flatfield image (mean or median projection). This step is parallelized across groups.
Stitching:
Applies the calculated flatfield to the raw tiles.
Corrects for radial distortion.
Aligns tiles using stage positions and cross-correlation.
Stitches tiles into an OME-Zarr image.
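The flatfield calculation in the Illumination Correction step can be pictured as a per-pixel projection across the tiles of one group. The sketch below uses plain Python lists. The function name `flatfield_projection` and the toy tiles are hypothetical, and the actual Scallops implementation may differ (for example in smoothing or normalization).

```python
from statistics import mean, median


def flatfield_projection(tiles, method="mean"):
    """Per-pixel mean or median across equally sized 2D tiles (illustrative only)."""
    aggregate = {"mean": mean, "median": median}[method]
    n_rows, n_cols = len(tiles[0]), len(tiles[0][0])
    return [
        [aggregate([tile[r][c] for tile in tiles]) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Three toy 2x2 tiles standing in for the raw images of one group.
tiles = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 2.0], [5.0, 4.0]],
    [[8.0, 2.0], [7.0, 10.0]],
]
mean_flatfield = flatfield_projection(tiles, "mean")
median_flatfield = flatfield_projection(tiles, "median")
```

The median projection is less sensitive to a single bright tile than the mean, which is one reason a workflow might expose both choices.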
Inputs
Minimal Configuration (Required)
These are the absolute minimum parameters required to run the stitching workflow.
| Parameter | Type | Description |
|---|---|---|
| urls | Array[String] | List of directories containing raw images (e.g., S3 URLs). |
| image_pattern | String | Regex-like pattern to parse filenames (e.g., |
| output_directory | String | Base path for outputs. |
| docker | String | Workflow docker image. |
{
    "urls": ["s3://your-bucket/experiment_data/"],
    "image_pattern": "20231010_10x_6W_SBS_c{t}/plate{plate}/Well{well}_Point{skip}_{skip}_Channel{skip}_Seq{skip}.nd2",
    "output_directory": "s3://your-bucket/experiment_data/stitch/iss/",
    "docker": "772311241819.dkr.ecr.us-west-2.amazonaws.com/scallops:1.0.0"
}
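To see how a pattern such as the `image_pattern` in the example configuration could yield grouping keys, here is a minimal Python sketch. It is not the Scallops parser: the helpers `pattern_to_regex` and `group_files` are hypothetical, treating `{skip}` as an uncaptured wildcard is an assumption, and so is grouping on `t` as the timepoint key.

```python
import re
from collections import defaultdict


def pattern_to_regex(pattern):
    """Convert a '{key}' style pattern into a compiled regex (illustrative only)."""
    parts = re.split(r"\{(\w+)\}", pattern)  # odd indices hold placeholder names
    seen = set()
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pieces.append(re.escape(part))  # literal text between placeholders
        elif part == "skip" or part in seen:
            pieces.append(r"[^/]+?")  # assumed: matched but not captured
        else:
            seen.add(part)
            pieces.append(rf"(?P<{part}>[^/]+?)")
    return re.compile("^" + "".join(pieces) + "$")


def group_files(paths, pattern, groupby):
    """Group paths by the metadata keys named in groupby."""
    regex = pattern_to_regex(pattern)
    groups = defaultdict(list)
    for path in paths:
        match = regex.match(path)
        if match is None:
            continue  # paths that do not fit the pattern are ignored here
        groups[tuple(match.group(key) for key in groupby)].append(path)
    return dict(groups)


pattern = (
    "20231010_10x_6W_SBS_c{t}/plate{plate}/"
    "Well{well}_Point{skip}_{skip}_Channel{skip}_Seq{skip}.nd2"
)
paths = [
    "20231010_10x_6W_SBS_c1/plateA/WellB2_Point0000_0000_ChannelDAPI_Seq0000.nd2",
    "20231010_10x_6W_SBS_c1/plateA/WellB2_Point0001_0000_ChannelDAPI_Seq0001.nd2",
    "20231010_10x_6W_SBS_c2/plateA/WellB2_Point0000_0000_ChannelDAPI_Seq0000.nd2",
    "notes.txt",
]
groups = group_files(paths, pattern, ["plate", "well", "t"])
```

In this toy run the two cycle-1 tiles land in one group and the cycle-2 tile in another, which mirrors how each group is then processed independently.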
Full Parameter Reference (Advanced)
Below is the complete list of exposed options, including optional settings for grouping, distortion correction, and resource allocation.
| Parameter | Type | Description |
|---|---|---|
| groupby | Array[String] | Metadata keys to group tiles by. Default: |
| subset | Array[String] | Filter to process only specific groups (e.g., |
| z_index | String | Specific Z-plane to stitch, or |
| stitch_channel | Int | The reference channel index used for calculating stitching offsets. Default: |
| stitch_radial_correction_k | String | Coefficient for barrel distortion correction. |
| stitch_max_shift | Float | Maximum allowed shift between tiles. |
| stitch_blend | String | Blending method for overlapping regions. |
| stitch_crop | Int | Pixels to crop from edges before stitching. |
| stitch_min_overlap_fraction | Float | Minimum overlap required between tiles. |
| run_illumination_correction | Boolean | Default |
| illumination_agg_method | String | Method for flatfield calculation. Default: |
| expected_images | Int | Expected number of images per group (useful for QC). |
| rename | String | Path to a 2-column CSV mapping image IDs to new IDs. |
| force_stitch | Boolean | Force re-run of stitching even if output exists. |
| Resources | Various | |
Outputs
The workflow generates the following directory structure in output_directory:
illumination_correction/: Contains calculated flatfield (and optionally darkfield) images in TIFF format.
stitch/: Contains the stitched images in OME-Zarr format.
OPS Workflow
File: ops_workflow.wdl
The Optical Pooled Screens (OPS) workflow is a comprehensive pipeline that integrates Phenotypic imaging (IF) with In-Situ Sequencing (ISS).
Workflow Steps
- Phase 1: Phenotype Pre-processing
Registration: Aligns multiple phenotypic rounds (if applicable) to a reference timepoint (e.g., “IF”).
Segmentation: Segments Nuclei and Cells using the registered images.
Object Discovery: Creates labeled object maps for Nuclei, Cells, and Cytosol.
- Phase 2: ISS Pre-processing
Registration: Aligns the ISS anchor round (t0) to the rest of the cycles to prepare the coordinate space.
- Phase 3: Integration & Analysis
Cross-Modality Registration: Aligns the Phenotype images to the ISS coordinate space.
Feature Extraction: Calculates morphological and intensity features for Nuclei, Cells, and Cytosol.
Sequencing Analysis: Detects spots in ISS channels and decodes the sequence (read calling).
Merge: Combines phenotypic features with decoded barcodes into a single dataset.
Inputs
Minimal Configuration (Required)
These are the absolute minimum parameters required to run the OPS workflow.
| Parameter | Type | Description |
|---|---|---|
| output_directory | String | Base path for outputs. |
| iss_url | String | Path to stitched ISS Zarr (Required if running ISS analysis). |
| phenotype_url | String | Path to stitched Phenotype Zarr (Required if running Phenotype analysis). |
| phenotype_dapi_channel | Int | Channel index for DAPI in phenotype images. |
| phenotype_cyto_channel | Array[Int] | Channel indices for Cytoplasm segmentation. |
| reads_labels | String | Which segmentation label to assign reads to (e.g., |
| docker | String | Workflow docker image. |
{
    "output_directory": "s3://your-bucket/experiment/ops_results/",
    "iss_url": "s3://your-bucket/experiment/stitch/iss/stitch/stitch.zarr/",
    "phenotype_url": "s3://your-bucket/experiment/stitch/pheno/stitch/stitch.zarr/",
    "phenotype_dapi_channel": 4,
    "phenotype_cyto_channel": [6],
    "reads_labels": "cell",
    "docker": "772311241819.dkr.ecr.us-west-2.amazonaws.com/scallops:1.0.0"
}
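Before submitting a run, it can help to check a configuration against the minimal parameter table. The following Python sketch is an assumption-laden helper, not part of Scallops: `check_ops_config` is a hypothetical name, and it treats all seven parameters in the table as required even though `iss_url` and `phenotype_url` are only needed for their respective analyses.

```python
REQUIRED_OPS_KEYS = (
    "output_directory",
    "iss_url",
    "phenotype_url",
    "phenotype_dapi_channel",
    "phenotype_cyto_channel",
    "reads_labels",
    "docker",
)


def check_ops_config(config):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = [f"missing: {key}" for key in REQUIRED_OPS_KEYS if key not in config]
    # Type checks follow the Type column of the table (Int and Array[Int]).
    if not isinstance(config.get("phenotype_dapi_channel", 0), int):
        problems.append("phenotype_dapi_channel must be an Int")
    cyto = config.get("phenotype_cyto_channel", [])
    if not (isinstance(cyto, list) and all(isinstance(c, int) for c in cyto)):
        problems.append("phenotype_cyto_channel must be an Array[Int]")
    return problems


config = {
    "output_directory": "s3://your-bucket/experiment/ops_results/",
    "iss_url": "s3://your-bucket/experiment/stitch/iss/stitch/stitch.zarr/",
    "phenotype_url": "s3://your-bucket/experiment/stitch/pheno/stitch/stitch.zarr/",
    "phenotype_dapi_channel": 4,
    "phenotype_cyto_channel": [6],
    "reads_labels": "cell",
    "docker": "772311241819.dkr.ecr.us-west-2.amazonaws.com/scallops:1.0.0",
}
problems = check_ops_config(config)

# A common slip: passing a bare Int where Array[Int] is expected.
bad_config = dict(config, phenotype_cyto_channel=6)
bad_problems = check_ops_config(bad_config)
```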
Full Parameter Reference (Advanced)
Below is the complete list of exposed options covering registration, feature extraction, spot detection, and library configuration.
Data & Grouping
| Parameter | Type | Description |
|---|---|---|
| iss_image_pattern | String | Default: |
| phenotype_image_pattern | String | Default: |
| groupby | Array[String] | Default: |
| subset | Array[String] | Filter specific wells/plates. |
Segmentation & Registration
| Parameter | Type | Description |
|---|---|---|
| reference_phenotype_time | String | Timepoint to use as reference (e.g., |
| phenotype_dapi_channel_before_registration | Int | DAPI index before registration (for pheno-pheno alignment). |
| iss_dapi_channel | Int | DAPI index in ISS images. |
| nuclei_segmentation | String | Method (e.g., |
| cell_segmentation_method | String | Method (e.g., |
| cell_segmentation_extra_arguments | String | Extra flags (e.g., |
| register_across_channels | Boolean | Enable cross-channel registration logic. |
Feature Extraction
| Parameter | Type | Description |
|---|---|---|
| phenotype_nuclei_features | Array[String] | List of features (e.g., |
| phenotype_cell_features | Array[String] | List of features. |
| phenotype_cytosol_features | Array[String] | List of features. |
| features_cell_min_area | Int | Minimum area filter for cells. |
| features_nuclei_min_area | Int | Minimum area filter for nuclei. |
Sequencing (ISS)
| Parameter | Type | Description |
|---|---|---|
| barcodes | String | Path to CSV containing the library design. |
| barcode_column | String | Column name in the barcode CSV. |
| iss_expected_cycles | Int | Number of sequencing cycles. |
| iss_channels | Array[Int] | Channels to use for spot detection. Default: |
| reads_bases | String | Bases order (e.g., |
| spot_detection_sigma_log | Array[Float] | Sigma for Laplacian of Gaussian spot detection. |
Additional Parameters
| Parameter | Type | Description |
|---|---|---|
| model_dir | String | Path containing deep learning model resources (see the FAQ for more details). |
| run_<task> | Boolean | Set to |
| force_<task> | Boolean | Set to |
| Resources | Various | |
| batch_size | Int | Number of groups to process in one batch. |
Outputs
The output_directory will contain subdirectories for every major step:
segment.zarr: Nuclei and Cell labels.
pheno-to-iss-registered.zarr: Phenotype images transformed to align with ISS.
features-nuclei-<index>/, features-cell-<index>/, features-cytosol-<index>/: Parquet files containing calculated features. The <index> refers to the different splits of the data that were run in parallel.
spot-detect.zarr: Raw spot locations.
reads/: Decoded reads per cell.
merge/: Final Output. A merged Parquet dataset linking Cell IDs, Barcodes, and Phenotypic Features.
Running on AWS HealthOmics
AWS HealthOmics provides a managed service for running bioinformatics workflows at scale. Scallops workflows (WDL) are fully compatible with HealthOmics. We recommend using the miniwdl-omics-run tool to simplify the submission process.
Prerequisites
S3 Buckets: You must have S3 buckets for inputs (images) and outputs.
IAM Role: An IAM role with permissions to read/write to your S3 buckets and execution permissions for HealthOmics.
Docker Images: The Scallops Docker image must be in Amazon ECR (Elastic Container Registry).
Step 1: Configure Input JSON
Create a JSON file (e.g., ops_input.json) defining your inputs. Below is a minimal example for the OPS Workflow.
Note: Ensure all S3 paths end with a trailing slash / if they refer to directories.
{
    "iss_url": "s3://your-bucket/experiment/ISS/stitch.zarr/",
    "iss_image_pattern": "{plate}-{well}-{t}",
    "phenotype_url": "s3://your-bucket/experiment/Pheno/stitch.zarr/",
    "phenotype_image_pattern": "{plate}-{well}-{t}",
    "subset": ["A-1", "A-2"],
    "groupby": ["plate", "well"],
    "output_directory": "s3://your-output-bucket/results/experiment_name/",
    "reference_phenotype_time": "IF",
    "phenotype_dapi_channel": 4,
    "phenotype_cyto_channel": [6],
    "phenotype_nuclei_features": ["intensity_*", "sizeshape", "colocalization_*_*", "spots_1,2,3"],
    "phenotype_cell_features": ["intensity_*", "sizeshape", "colocalization_*_*", "spots_1,2,3"],
    "phenotype_cytosol_features": ["intensity_*", "sizeshape", "colocalization_*_*", "spots_1,2,3"],
    "barcodes": "s3://your-bucket/library/barcodes.csv",
    "barcode_column": "opsBarcode",
    "reads_labels": "cell",
    "iss_expected_cycles": 7,
    "reads_bases": "GTAC",
    "segment_cell_threshold_correction_factor": 1.0,
    "cell_segmentation_extra_arguments": "--closing-radius 5",
    "docker": "123456789012.dkr.ecr.us-region-1.amazonaws.com/scallops:latest"
}
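The trailing-slash note is easy to overlook, so a small pre-flight check may be useful. This Python sketch is hypothetical (the names `DIRECTORY_KEYS`, `missing_trailing_slash`, and `with_trailing_slashes` are not part of Scallops), and it assumes that `iss_url`, `phenotype_url`, and `output_directory` are the directory-like inputs, based on the example values in this document.

```python
# Assumed directory-like inputs; file inputs such as "barcodes" are left alone.
DIRECTORY_KEYS = ("iss_url", "phenotype_url", "output_directory")


def missing_trailing_slash(config, keys=DIRECTORY_KEYS):
    """Return the directory keys whose value does not end with '/'."""
    return [k for k in keys if k in config and not config[k].endswith("/")]


def with_trailing_slashes(config, keys=DIRECTORY_KEYS):
    """Return a copy of config with '/' appended where it was missing."""
    fixed = dict(config)
    for key in missing_trailing_slash(config, keys):
        fixed[key] = config[key] + "/"
    return fixed


config = {
    "iss_url": "s3://your-bucket/experiment/ISS/stitch.zarr",
    "phenotype_url": "s3://your-bucket/experiment/Pheno/stitch.zarr/",
    "output_directory": "s3://your-output-bucket/results/experiment_name",
    "barcodes": "s3://your-bucket/library/barcodes.csv",
}
needs_fix = missing_trailing_slash(config)
fixed = with_trailing_slashes(config)
```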
Step 2: Run with miniwdl-omics-run
Use the miniwdl-omics-run utility to submit the workflow. This tool zips your local WDL files, uploads them to S3, and triggers the HealthOmics run.
miniwdl-omics-run \
    scallops/wdl/ops_workflow.wdl \
    -i ops_input.json \
    --role-arn arn:aws:iam::123456789012:role/YourHealthOmicsWorkflowRole \
    --output-uri s3://your-output-bucket/omics-logs/ \
    --name "OPS_Experiment_Run_01"
Arguments Explained:
Workflow File: Points to the local main WDL file (e.g., scallops/wdl/ops_workflow.wdl). It will automatically bundle dependencies like ops_tasks.wdl.
-i: The input JSON file you created in Step 1.
--role-arn: The AWS IAM role ARN that HealthOmics assumes to access S3 and CloudWatch.
--output-uri: The S3 location where HealthOmics will store execution logs (different from the workflow output_directory).
--name: A custom name for the run to identify it in the AWS Console.
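When submitting many runs, the command can be assembled programmatically. The Python sketch below only builds and prints the command line; it does not execute anything. The helper `build_submit_command` is hypothetical, and the flags are copied from the command shown in this section rather than verified against the tool's own help output.

```python
import shlex


def build_submit_command(wdl, inputs_json, role_arn, output_uri, run_name):
    """Assemble the argument list for the submission command shown above."""
    return [
        "miniwdl-omics-run",
        wdl,
        "-i", inputs_json,
        "--role-arn", role_arn,
        "--output-uri", output_uri,
        "--name", run_name,
    ]


command = build_submit_command(
    wdl="scallops/wdl/ops_workflow.wdl",
    inputs_json="ops_input.json",
    role_arn="arn:aws:iam::123456789012:role/YourHealthOmicsWorkflowRole",
    output_uri="s3://your-output-bucket/omics-logs/",
    run_name="OPS_Experiment_Run_01",
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would launch the submission; keeping the arguments as a list avoids shell-quoting mistakes in run names or paths.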
Customizing Workflows
Scallops’ WDL architecture is modular. Key computational steps (such as stitching, registration, and segmentation) are defined as independent Tasks in files like ops_tasks.wdl and stitch_tasks.wdl. This design allows you to construct your own custom workflows by importing these tasks, rather than relying solely on the pre-built end-to-end pipelines.
You can mix and match Scallops tasks with your own custom tasks (e.g., for QC or specific file conversions) to create tailored analysis solutions.
Example: Building a Custom Registration Workflow
Suppose you only need to perform image registration without the full segmentation or sequencing analysis. You can create a simple WDL file that imports the Scallops tasks and calls only the registration step.
Create a new WDL file (e.g., my_registration.wdl).
Import the Scallops tasks file.
Define a workflow that calls the specific task.
version 1.0

# Import the existing Scallops tasks
import "scallops/wdl/ops_tasks.wdl" as tasks

workflow my_custom_registration {
    input {
        String moving_image
        String fixed_image
        String output_dir
        String docker
    }

    # Call the existing Scallops registration task
    call tasks.register_elastix {
        input:
            moving = [moving_image],
            fixed = fixed_image,
            transform_output_directory = output_dir + "/transforms",
            moving_output_directory = output_dir + "/registered_images",
            # Pass through required runtime parameters
            docker = docker,
            cpu = 4,
            memory = "16 GiB",
            # ... (other required inputs like zones, disks, etc.)
    }
}
Modifying Existing Tasks
If the pre-built tasks do not perfectly fit your needs (e.g., you need to change the resource allocation or add a specific command-line flag not currently exposed), you can modify the task definitions directly:
Copy the relevant task file (e.g., ops_tasks.wdl) to your local directory.
Edit the runtime block to adjust memory/CPU, or the command block to add new flags.
Point your workflow to import your modified task file instead of the standard one.
# In your workflow file
import "my_modified_tasks.wdl" as tasks