API Guide¶
Reference for the Python API. See Getting Started for introductory examples.
slide2vec exposes two main workflows:
- direct in-memory embedding with Model.embed_slide() / Model.embed_slides()
- artifact generation with Pipeline.run()
EmbeddedSlide¶
Model.embed_slide() and Model.embed_slides() return
EmbeddedSlide objects:
- class slide2vec.EmbeddedSlide(*, sample_id, tile_embeddings, slide_embedding, x, y, tile_size_lv0, image_path, mask_path=None, num_tiles=None, mask_preview_path=None, tiling_preview_path=None, latents=None)¶
Bases: object
In-memory result of embedding a single slide.
- tile_embeddings: Any¶
Tile embeddings — torch.Tensor of shape (N, D).
- slide_embedding: Any | None¶
Slide-level embedding — torch.Tensor of shape (D,) for slide-level encoders; None for tile-only encoders.
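As a minimal sketch of working with these attributes (the preset name and slide path are illustrative, and the exact positional argument of embed_slide() is an assumption):

```python
from slide2vec import Model

# Illustrative preset and path; any installed preset works the same way.
model = Model.from_preset("virchow2")
result = model.embed_slide("/data/slide_1.svs")

# One row per kept tile: shape (N, D).
print(result.tile_embeddings.shape)

# None for tile-only encoders, shape (D,) for slide-level encoders.
if result.slide_embedding is not None:
    print(result.slide_embedding.shape)
```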
PreprocessingConfig¶
- class slide2vec.PreprocessingConfig(*, backend='auto', requested_spacing_um=None, requested_tile_size_px=None, requested_region_size_px=None, region_tile_multiple=None, tolerance=0.05, overlap=0.0, tissue_threshold=0.01, read_coordinates_from=None, read_tiles_from=None, on_the_fly=True, gpu_decode=False, adaptive_batching=False, use_supertiles=True, jpeg_backend='turbojpeg', num_cucim_workers=4, resume=False, segmentation=<factory>, filtering=<factory>, preview=<factory>)
Bases: object
Configuration for slide tiling and preprocessing.
- backend: str = 'auto'
Slide reading backend.
"auto" tries cucim → openslide → vips in order. Explicit choices: "cucim", "openslide", "vips", "asap".
- requested_spacing_um: float | None = None
Target spacing in µm/px. Resolved from the model preset when
None.
- requested_tile_size_px: int | None = None
Tile side length in pixels at requested_spacing_um. Resolved from the model preset when
None.
- requested_region_size_px: int | None = None
Parent region side length in pixels (hierarchical mode). Auto-derived as requested_tile_size_px × region_tile_multiple when None.
- region_tile_multiple: int | None = None
Region grid width/height in tiles (e.g. 6 → 6×6 = 36 tiles per region). Enables hierarchical extraction when set; must be ≥ 2.
- tolerance: float = 0.05
Relative spacing tolerance for pyramid level selection (default
0.05).
- overlap: float = 0.0
Fractional tile overlap (0.0 = no overlap).
- tissue_threshold: float = 0.01
Minimum tissue fraction required to keep a tile (default
0.01).
- read_coordinates_from: Path | None = None
Directory containing pre-extracted tile coordinates to reuse, skipping tiling.
- read_tiles_from: Path | None = None
Directory containing pre-extracted tile images to skip the tiling step entirely.
- on_the_fly: bool = True
Read and decode tiles on demand rather than pre-loading into memory.
- gpu_decode: bool = False
Decode tiles on the GPU via CuCIM / nvImageCodec when
True.
- adaptive_batching: bool = False
Dynamically adjust batch size based on tile count.
- use_supertiles: bool = True
Group adjacent tiles into supertile batches for faster I/O.
- jpeg_backend: str = 'turbojpeg'
JPEG decode library — "turbojpeg" (default) or "pillow".
- num_cucim_workers: int = 4
Number of CuCIM reader threads.
- resume: bool = False
Skip slides already present in the output directory when
True.
- segmentation: dict[str, Any]
Forwarded to hs2p segmentation config. Supported keys: method, downsample, sam2_device. See Preprocessing for details.
For a full breakdown of backends, segmentation methods, and preview options, see Preprocessing.
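A hedged configuration sketch (all values are illustrative; any field left out keeps the default listed above):

```python
from slide2vec import PreprocessingConfig

# Illustrative values; unspecified fields keep their documented defaults.
preprocessing = PreprocessingConfig(
    backend="openslide",       # skip "auto" backend detection
    requested_spacing_um=0.5,  # target spacing in µm/px
    requested_tile_size_px=256,
    tissue_threshold=0.05,     # drop tiles with < 5% tissue
    resume=True,               # skip slides already in the output dir
)
```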
ExecutionOptions¶
- class slide2vec.ExecutionOptions(*, output_dir=None, output_format='pt', batch_size=32, num_workers=None, num_preprocessing_workers=None, num_gpus=None, precision=None, prefetch_factor=4, persistent_workers=True, save_tile_embeddings=False, save_slide_embeddings=False, save_latents=False)¶
Bases: object
Runtime execution and output settings.
- num_workers: int | None = None¶
DataLoader worker count. None means auto (capped by CPU / SLURM limit).
- num_preprocessing_workers: int | None = None¶
Tiling worker count. None means auto (capped by CPU / SLURM limit).
- precision: str | None = None¶
Forward-pass dtype — "fp16", "bf16", "fp32", or None (auto-determined from the model preset).
- save_tile_embeddings: bool = False¶
Persist tile embeddings to disk when running a slide-level model.
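A hedged sketch of a typical configuration (values are illustrative; fields left as None are auto-resolved as documented above):

```python
from slide2vec import ExecutionOptions

# Illustrative values; None fields are auto-resolved at run time.
execution = ExecutionOptions(
    output_dir="outputs/run1",
    output_format="pt",         # the documented default
    batch_size=64,
    precision="bf16",           # forward-pass dtype
    save_tile_embeddings=True,  # also persist tile embeddings for a slide-level model
)
```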
Patient-level embedding¶
For patient-level models, use Model.embed_patient() for a single patient
or Model.embed_patients() for a batch.
Single patient¶
from slide2vec import Model
model = Model.from_preset("moozy")
result = model.embed_patient(
    ["/data/slide_1a.svs", "/data/slide_1b.svs"],
    patient_id="patient_1",
)
print(result.patient_id)               # "patient_1"
print(result.patient_embedding.shape)  # torch.Size([768])
print(result.slide_embeddings)         # {"slide_1a": tensor, "slide_1b": tensor}
Multiple patients¶
results = model.embed_patients(
    [
        {"sample_id": "slide_1a", "image_path": "/data/slide_1a.svs", "patient_id": "patient_1"},
        {"sample_id": "slide_1b", "image_path": "/data/slide_1b.svs", "patient_id": "patient_1"},
        {"sample_id": "slide_2a", "image_path": "/data/slide_2a.svs", "patient_id": "patient_2"},
    ]
)
for r in results:
    print(r.patient_id, r.patient_embedding.shape)
embed_patients(...) returns one EmbeddedPatient per unique patient,
ordered by first appearance.
- class slide2vec.EmbeddedPatient(*, patient_id, patient_embedding, slide_embeddings)¶
Bases: object
In-memory result of embedding a single patient.
- patient_embedding: Any¶
Aggregated patient embedding — torch.Tensor of shape (D,).
- slide_embeddings: dict[str, Any]¶
Slide-level embeddings keyed by sample_id — each a torch.Tensor of shape (D,).
Hierarchical Feature Extraction¶
Enable hierarchical mode by setting region_tile_multiple in
PreprocessingConfig:
from slide2vec import PreprocessingConfig

preprocessing = PreprocessingConfig(
    requested_spacing_um=0.5,
    requested_tile_size_px=224,
    region_tile_multiple=6,  # 6×6 = 36 tiles per region
)
The tile embeddings tensor will have shape (R, T, D) instead of (N, D).
See Hierarchical Features for the full explanation.
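The shape relationship can be sketched with plain arithmetic (R and D here are illustrative; T follows directly from region_tile_multiple):

```python
# Illustrative numbers: R regions per slide, D embedding dimensions.
region_tile_multiple = 6
tiles_per_region = region_tile_multiple ** 2  # T = 36
num_regions = 10                              # R (depends on the slide)
embed_dim = 768                               # D (depends on the encoder)

flat_shape = (num_regions * tiles_per_region, embed_dim)  # (N, D)
hier_shape = (num_regions, tiles_per_region, embed_dim)   # (R, T, D)
print(flat_shape)  # (360, 768)
print(hier_shape)  # (10, 36, 768)
```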
Pipeline¶
Use Pipeline for manifest-driven batch processing and disk
outputs:
from slide2vec import ExecutionOptions, Model, Pipeline, PreprocessingConfig
model = Model.from_preset("virchow2")
pipeline = Pipeline(
    model=model,
    preprocessing=PreprocessingConfig(
        requested_spacing_um=0.5,
        requested_tile_size_px=224,
        tissue_threshold=0.1,
    ),
    execution=ExecutionOptions(output_dir="outputs/demo", num_gpus=2),
)
result = pipeline.run(manifest_path="/path/to/slides.csv")
See Input Manifest for the full manifest schema.
Pipeline.run(...) returns a RunResult:
- class slide2vec.RunResult(*, tile_artifacts, hierarchical_artifacts, slide_artifacts, patient_artifacts=<factory>, process_list_path=None)¶
Bases: object
Return value of Pipeline.run().
- hierarchical_artifacts: list[HierarchicalEmbeddingArtifact]¶
Hierarchical embedding artifacts; empty when hierarchical mode is disabled.
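Continuing the Pipeline example above, a sketch of inspecting the result — only the attribute names documented here are used, and the pipeline object is assumed to be constructed as shown earlier:

```python
# Assumes `pipeline` was built as in the Pipeline example above.
result = pipeline.run(manifest_path="/path/to/slides.csv")

print(len(result.tile_artifacts), "tile artifact(s)")
print(len(result.slide_artifacts), "slide artifact(s)")
# Empty list unless hierarchical mode was enabled:
print(len(result.hierarchical_artifacts), "hierarchical artifact(s)")
if result.process_list_path is not None:
    print("process list:", result.process_list_path)
```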
See Output Layout for the full on-disk directory structure and file schemas.