Input Manifest

Both Pipeline and the CLI expect a CSV manifest listing the slides to process.

Schema

| Column | Required | Notes |
| --- | --- | --- |
| sample_id | yes | Unique identifier for the slide; used as the output file stem |
| image_path | yes | Absolute path to the slide file |
| mask_path | no | Path to a pre-computed binary tissue mask. When blank, slide2vec generates the mask on the fly using the configured segmentation method |
| spacing_at_level_0 | no | Override for the slide’s native level-0 spacing (µm/px). When blank, slide2vec reads the spacing from the slide file’s metadata |
| patient_id | no | Required only for patient-level models (see below) |

Example

sample_id,image_path,mask_path,spacing_at_level_0
slide-1,/data/slide-1.svs,/data/mask-1.png,0.25
slide-2,/data/slide-2.svs,,

mask_path and spacing_at_level_0 may be left blank for any row.
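
Before handing a manifest to slide2vec, it can help to check it against the schema above. The following is a minimal sketch using only the Python standard library; it is not part of slide2vec. The helper name `load_manifest` is hypothetical, and the sketch assumes POSIX-style paths like those in the example.

```python
import csv
import io
from pathlib import PurePosixPath

REQUIRED = ("sample_id", "image_path")
OPTIONAL = ("mask_path", "spacing_at_level_0", "patient_id")

EXAMPLE = (
    "sample_id,image_path,mask_path,spacing_at_level_0\n"
    "slide-1,/data/slide-1.svs,/data/mask-1.png,0.25\n"
    "slide-2,/data/slide-2.svs,,\n"
)


def load_manifest(text):
    """Hypothetical helper: parse manifest CSV text and check it against the schema.

    Returns a list of dicts in which blank or absent optional cells are None.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")

    rows, seen = [], set()
    for line_no, raw in enumerate(reader, start=2):  # the header is line 1
        # Normalise cells: strip whitespace, treat short rows as blank cells.
        row = {k: (v or "").strip() for k, v in raw.items() if k is not None}

        for col in REQUIRED:
            if not row[col]:
                raise ValueError(f"line {line_no}: {col} is blank")
        if row["sample_id"] in seen:
            raise ValueError(f"line {line_no}: duplicate sample_id {row['sample_id']!r}")
        seen.add(row["sample_id"])

        # Assumption: POSIX paths, as in the example manifest.
        if not PurePosixPath(row["image_path"]).is_absolute():
            raise ValueError(f"line {line_no}: image_path must be absolute")

        # Blank optional cells (or absent optional columns) become None.
        for col in OPTIONAL:
            row[col] = row.get(col) or None
        if row["spacing_at_level_0"] is not None:
            row["spacing_at_level_0"] = float(row["spacing_at_level_0"])  # µm/px

        rows.append(row)
    return rows


rows = load_manifest(EXAMPLE)
```

Here `rows[1]["mask_path"]` and `rows[1]["spacing_at_level_0"]` are both `None`, mirroring the blank cells for slide-2. The sketch does not check that the files exist on disk.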

Patient-level manifest

When using a patient-level model (e.g. moozy), add a patient_id column to group slides that belong to the same patient:

sample_id,image_path,patient_id
slide-1a,/data/slide-1a.svs,patient-1
slide-1b,/data/slide-1b.svs,patient-1
slide-2a,/data/slide-2a.svs,patient-2

Slides sharing the same patient_id are aggregated into a single EmbeddedPatient by the model’s patient encoder. sample_id remains the unique slide identifier.
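
To preview which slides will be grouped together, the manifest can be read with the standard library. This sketch only illustrates the grouping implied by patient_id; the aggregation itself is performed by the model's patient encoder, and the helper name `group_by_patient` is hypothetical rather than a slide2vec function.

```python
import csv
import io
from collections import defaultdict

MANIFEST = """\
sample_id,image_path,patient_id
slide-1a,/data/slide-1a.svs,patient-1
slide-1b,/data/slide-1b.svs,patient-1
slide-2a,/data/slide-2a.svs,patient-2
"""


def group_by_patient(text):
    """Hypothetical helper: map each patient_id to its sample_ids, in manifest order."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[row["patient_id"]].append(row["sample_id"])
    return dict(groups)


groups = group_by_patient(MANIFEST)
for patient_id, sample_ids in groups.items():
    print(patient_id, sample_ids)
# patient-1 ['slide-1a', 'slide-1b']
# patient-2 ['slide-2a']
```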

Per-slide embeddings

When running a patient-level model via Pipeline, the intermediate per-slide embeddings can be saved alongside the patient embeddings by setting save_slide_embeddings: true in the config (or by passing ExecutionOptions(save_slide_embeddings=True) in the Python API).
Slide embeddings are written to slide_embeddings/ in the output directory.