API Guide

Reference for the Python API. See Getting Started for introductory examples.

slide2vec exposes two main workflows:

  • direct in-memory embedding with Model.embed_slide() / Model.embed_slides()

  • artifact generation with Pipeline.run()

EmbeddedSlide

Model.embed_slide() and Model.embed_slides() return EmbeddedSlide objects:

class slide2vec.EmbeddedSlide(*, sample_id, tile_embeddings, slide_embedding, x, y, tile_size_lv0, image_path, mask_path=None, annotation=None, num_tiles=None, mask_preview_path=None, tiling_preview_path=None, latents=None)

Bases: object

In-memory result of embedding a single slide.

sample_id: str

Unique slide identifier.

tile_embeddings: Any

Tile embeddings — torch.Tensor of shape (N, D).

slide_embedding: Any | None

Slide-level embedding — torch.Tensor of shape (D,) for slide-level encoders; None for tile-only encoders.

x: Any

x coordinate (pixels at level 0) of each tile’s top-left corner — array of shape (N,).

y: Any

y coordinate (pixels at level 0) of each tile’s top-left corner — array of shape (N,).

tile_size_lv0: int

Tile side length in pixels at level 0.

image_path: Path

Path to the source slide file.

mask_path: Path | None = None

Path to the tissue mask used for tiling, if any.

annotation: str | None = None

Annotation class this bag of tiles was sampled for. "tissue" for the default tissue-only path, "merged" for the union output mode, or the class name (e.g. "tumor") when annotation-aware sampling fans a slide out into one bag per class. See the annotation-aware sampling documentation.

num_tiles: int | None = None

Number of tiles extracted from the slide.

mask_preview_path: Path | None = None

Path to the mask preview image, if generated.

tiling_preview_path: Path | None = None

Path to the tiling preview image, if generated.

latents: Any | None = None

Encoder latent representations when available; None otherwise.
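Because x and y hold each tile's top-left corner at level 0 and tile_size_lv0 is the side length at that same level, the level-0 bounding box of every tile follows directly from these three fields. A minimal sketch, using plain lists in place of the coordinate arrays; tile_boxes_lv0 is a hypothetical helper and not part of slide2vec:

```python
def tile_boxes_lv0(x, y, tile_size_lv0):
    """Return (x0, y0, x1, y1) level-0 pixel boxes, one per tile (x1/y1 exclusive)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [
        (int(x0), int(y0), int(x0) + tile_size_lv0, int(y0) + tile_size_lv0)
        for x0, y0 in zip(x, y)
    ]


# Stand-in values for EmbeddedSlide.x, .y and .tile_size_lv0.
boxes = tile_boxes_lv0([0, 448], [0, 896], 448)
print(boxes)  # [(0, 0, 448, 448), (448, 896, 896, 1344)]
```

Row i of the resulting list corresponds to row i of tile_embeddings, since both are indexed by tile.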

PreprocessingConfig

class slide2vec.PreprocessingConfig(*, backend='auto', requested_spacing_um=None, requested_tile_size_px=None, requested_region_size_px=None, region_tile_multiple=None, tolerance=0.05, overlap=0.0, read_coordinates_from=None, read_tiles_from=None, on_the_fly=True, gpu_decode=False, adaptive_batching=False, use_supertiles=True, jpeg_backend='turbojpeg', num_cucim_workers=4, resume=False, segmentation=<factory>, filtering=<factory>, preview=<factory>, masks=<factory>, independent_sampling=True)

Bases: object

Configuration for slide tiling and preprocessing.

backend: str = 'auto'

Slide reading backend. "auto" tries cucim → openslide → vips in order. Explicit choices: "cucim", "openslide", "vips", "asap".

requested_spacing_um: float | None = None

Target spacing in µm/px. Resolved from the model preset when None.

requested_tile_size_px: int | None = None

Tile side length in pixels at requested_spacing_um. Resolved from the model preset when None.

requested_region_size_px: int | None = None

Parent region side length in pixels (hierarchical mode). Auto-derived as requested_tile_size_px × region_tile_multiple when None.

region_tile_multiple: int | None = None

Region grid width/height in tiles (e.g. 6 → 6×6 = 36 tiles per region). Enables hierarchical extraction when set; must be ≥ 2.

tolerance: float = 0.05

Relative spacing tolerance for pyramid level selection (default 0.05).

overlap: float = 0.0

Fractional tile overlap (0.0 = no overlap).

read_coordinates_from: Path | None = None

Directory containing pre-extracted tile coordinates to reuse, skipping tiling.

read_tiles_from: Path | None = None

Directory containing pre-extracted tile images to skip the tiling step entirely.

on_the_fly: bool = True

Read and decode tiles on demand rather than pre-loading into memory.

gpu_decode: bool = False

Decode tiles on the GPU via CuCIM / nvImageCodec when True.

adaptive_batching: bool = False

Dynamically adjust batch size based on tile count.

use_supertiles: bool = True

Group adjacent tiles into supertile batches for faster I/O.

jpeg_backend: str = 'turbojpeg'

JPEG decode library — "turbojpeg" (default) or "pillow".

num_cucim_workers: int = 4

Number of CuCIM reader threads.

resume: bool = False

Skip slides already present in the output directory when True.

segmentation: dict[str, Any]

Forwarded to hs2p segmentation config. Supported keys: method, downsample, sam2_device. See Preprocessing for details.

filtering: dict[str, Any]

Forwarded to hs2p tile-filtering config.

preview: dict[str, Any]

Controls whether hs2p writes mask and tiling preview images. Keys: save_mask_preview, save_tiling_preview, downsample.

masks: dict[str, Any]

Annotation-mask vocabulary forwarded to hs2p’s sampling resolver. Keys: output_mode, pixel_mapping, colors, min_coverage. A partial mapping is deep-merged over DEFAULT_MASKS, so callers only state what they override (e.g. {"min_coverage": {"tissue": 0.1}}). The default {background, tissue} block is plain tissue tiling; min_coverage.tissue is the single source of truth for the tissue threshold.

independent_sampling: bool = True

When annotation sampling is active, tile each class independently (True) vs jointly across classes (False).

For a full breakdown of backends, segmentation methods, and preview options, see Preprocessing.
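The deep-merge behaviour described for masks can be pictured with a small recursive merge. This is only a sketch of the general technique, not the library's implementation: deep_merge is a hypothetical helper, and example_defaults holds placeholder values rather than the real contents of DEFAULT_MASKS.

```python
import copy


def deep_merge(defaults, overrides):
    """Return a copy of defaults with overrides applied recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge key by key instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder defaults for illustration only; the real DEFAULT_MASKS may differ.
example_defaults = {
    "output_mode": "tissue",
    "min_coverage": {"background": 0.0, "tissue": 0.5},
}

merged = deep_merge(example_defaults, {"min_coverage": {"tissue": 0.1}})
print(merged["min_coverage"])  # {'background': 0.0, 'tissue': 0.1}
print(merged["output_mode"])   # tissue
```

Only the overridden leaf changes; sibling keys and untouched top-level keys keep their default values, which is why callers can pass a partial mapping.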

ExecutionOptions

class slide2vec.ExecutionOptions(*, output_dir=None, output_format='pt', batch_size=32, num_workers_per_gpu=None, num_preprocessing_workers=None, num_gpus=None, precision=None, prefetch_factor=4, save_tile_embeddings=False, save_slide_embeddings=False, save_latents=False)

Bases: object

Runtime execution and output settings.

output_dir: Path | None = None

Directory where artifacts are written. Required for Pipeline runs.

output_format: str = 'pt'

Tensor serialization format — "pt" (PyTorch, default) or "npz" (NumPy).

batch_size: int = 32

Number of tiles per forward pass.

num_workers_per_gpu: int | None = None

DataLoader worker count per GPU rank. None means auto (capped by CPU / SLURM limit, then split across the resolved GPU count).

num_preprocessing_workers: int | None = None

Tiling worker count. None means auto (capped by CPU / SLURM limit).

num_gpus: int | None = None

Number of GPUs to use. None defaults to all available GPUs.

precision: str | None = None

Forward-pass dtype — "fp16", "bf16", "fp32", or None (auto-determined from the model preset).

prefetch_factor: int = 4

DataLoader prefetch queue depth per worker (default 4).

save_tile_embeddings: bool = False

Persist tile embeddings to disk when running a slide-level model.

save_slide_embeddings: bool = False

Persist slide embeddings to disk when running a patient-level model.

save_latents: bool = False

Persist encoder latent representations when available.

resolved_num_workers_per_gpu()
Return type:

int
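As a rough picture of the "auto" rule described for num_workers_per_gpu (cap by the CPU / SLURM limit, then split across GPUs), the sketch below shows one way such a value could be derived. It is an illustration under stated assumptions, not the body of resolved_num_workers_per_gpu(): auto_workers_per_gpu is a hypothetical helper, and reading the limit from SLURM_CPUS_PER_TASK is an assumption about where the SLURM cap comes from.

```python
import os


def auto_workers_per_gpu(num_gpus, cpu_count=None, slurm_limit=None):
    """Split the usable CPU budget evenly across GPU ranks (at least 1 each)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    budget = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    if slurm_limit is not None:
        budget = min(budget, slurm_limit)
    return max(1, budget // num_gpus)


# Assumption: the SLURM limit is taken from SLURM_CPUS_PER_TASK when it is set.
raw = os.environ.get("SLURM_CPUS_PER_TASK")
slurm_limit = int(raw) if raw and raw.isdigit() else None

print(auto_workers_per_gpu(num_gpus=2, cpu_count=16, slurm_limit=8))  # 4
print(auto_workers_per_gpu(num_gpus=2, slurm_limit=slurm_limit))      # machine-dependent
```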

Patient-level embedding

For patient-level models, use Model.embed_patient() for a single patient or Model.embed_patients() for a batch.

Single patient

from slide2vec import Model

model = Model.from_preset("moozy")
result = model.embed_patient(
    ["/data/slide_1a.svs", "/data/slide_1b.svs"],
    patient_id="patient_1",
)

print(result.patient_id)              # "patient_1"
print(result.patient_embedding.shape) # torch.Size([768])
print(result.slide_embeddings)        # {"slide_1a": tensor, "slide_1b": tensor}

Multiple patients

results = model.embed_patients(
    [
        {"sample_id": "slide_1a", "image_path": "/data/slide_1a.svs", "patient_id": "patient_1"},
        {"sample_id": "slide_1b", "image_path": "/data/slide_1b.svs", "patient_id": "patient_1"},
        {"sample_id": "slide_2a", "image_path": "/data/slide_2a.svs", "patient_id": "patient_2"},
    ]
)

for r in results:
    print(r.patient_id, r.patient_embedding.shape)

embed_patients(...) returns one EmbeddedPatient per unique patient, ordered by first appearance.
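The "one result per unique patient, ordered by first appearance" rule can be illustrated without the library. The sketch below groups input records by patient_id using an insertion-ordered dict; group_by_patient is a hypothetical helper that only demonstrates the ordering, not how slide2vec groups slides internally.

```python
def group_by_patient(records):
    """Map patient_id -> list of sample_ids, keeping first-appearance order."""
    groups = {}  # dicts preserve insertion order (Python 3.7+)
    for record in records:
        groups.setdefault(record["patient_id"], []).append(record["sample_id"])
    return groups


records = [
    {"sample_id": "slide_1a", "image_path": "/data/slide_1a.svs", "patient_id": "patient_1"},
    {"sample_id": "slide_2a", "image_path": "/data/slide_2a.svs", "patient_id": "patient_2"},
    {"sample_id": "slide_1b", "image_path": "/data/slide_1b.svs", "patient_id": "patient_1"},
]

groups = group_by_patient(records)
print(list(groups))         # ['patient_1', 'patient_2']
print(groups["patient_1"])  # ['slide_1a', 'slide_1b']
```

Even though patient_1's second slide is listed after patient_2, patient_1 still comes first because its first record appeared first.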

class slide2vec.EmbeddedPatient(*, patient_id, patient_embedding, slide_embeddings)

Bases: object

In-memory result of embedding a single patient.

patient_id: str

Unique patient identifier.

patient_embedding: Any

Aggregated patient embedding — torch.Tensor of shape (D,).

slide_embeddings: dict[str, Any]

Slide-level embeddings keyed by sample_id — each a torch.Tensor of shape (D,).

Hierarchical Feature Extraction

Enable hierarchical mode by setting region_tile_multiple in PreprocessingConfig:

from slide2vec import PreprocessingConfig

preprocessing = PreprocessingConfig(
    requested_spacing_um=0.5,
    requested_tile_size_px=224,
    region_tile_multiple=6,   # 6×6 = 36 tiles per region
)

The tile embeddings tensor will have shape (R, T, D) instead of (N, D). See Hierarchical Features for the full explanation.
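The sizes implied by this configuration can be worked out by hand. The sketch below derives the auto-computed region side length (requested_tile_size_px × region_tile_multiple) and the number of tiles per region; hierarchical_layout is a hypothetical helper, and reading T in (R, T, D) as the tiles-per-region count is an assumption based on the description above.

```python
def hierarchical_layout(tile_size_px, region_tile_multiple):
    """Return (region_size_px, tiles_per_region) for a square region grid."""
    if region_tile_multiple < 2:
        raise ValueError("region_tile_multiple must be >= 2")
    region_size_px = tile_size_px * region_tile_multiple
    tiles_per_region = region_tile_multiple ** 2
    return region_size_px, tiles_per_region


region_size_px, tiles_per_region = hierarchical_layout(224, 6)
print(region_size_px)    # 1344
print(tiles_per_region)  # 36
```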

Dense Tile Feature Extraction

Some tile encoders can return the spatial grid of ViT patch-token features instead of a single pooled vector per tile. This is useful for dense downstream tasks where patch-token features must stay registered to the input tile.

Dense extraction is a low-level encoder API:

  • get_dense_transform() applies the encoder’s photometric normalization without resize or center-crop, so tile geometry is preserved.

  • encode_tiles_dense(batch) accepts a normalized (B, C, H, W) tensor and returns (B, d, h, w).

  • h and w are resolved from the input size and encoder patch size (for example, a 224 px tile with an 8 px patch size returns a 28 × 28 grid).

Example:

import torch
from PIL import Image

from slide2vec.encoders import encoder_registry

encoder = encoder_registry.require("lunit")().to("cuda")
transform = encoder.get_dense_transform()

tile = Image.open("/data/tile.png").convert("RGB")
batch = transform(tile).unsqueeze(0).to(encoder.device)

with torch.no_grad():
    dense = encoder.encode_tiles_dense(batch)

print(dense.shape)  # torch.Size([1, 384, 28, 28]) for a 224 px Lunit tile

The dense transform deliberately does not resize, crop, or pad. The input height and width passed to encode_tiles_dense must be divisible by the encoder patch size, unless the specific encoder is pinned to a native input size. Unsupported encoders raise NotImplementedError.
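The divisibility requirement and the resulting grid size can be checked before a tile is sent to the encoder. A minimal sketch, assuming the grid is simply the input size divided by the patch size; dense_grid_shape is a hypothetical helper, and it does not model encoders that are pinned to a native input size.

```python
def dense_grid_shape(height, width, patch_size):
    """Return the (h, w) patch-token grid for an input of height x width pixels."""
    if height % patch_size or width % patch_size:
        raise ValueError(
            f"input {height}x{width} is not divisible by patch size {patch_size}"
        )
    return height // patch_size, width // patch_size


print(dense_grid_shape(224, 224, 8))   # (28, 28)
print(dense_grid_shape(448, 224, 16))  # (28, 14)
```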

For H-Optimus encoders, non-native dense extraction requires opting into the variable-size model setting:

encoder = encoder_registry.require("h-optimus-0")(
    dynamic_img_size=True,
    allow_non_recommended_settings=True,
).to("cuda")

Dense Attention Map Extraction

Most ViT tile encoders can also return their frozen per-head prefix-token self-attention as a dense spatial grid. A frozen ViT’s CLS-token attention doubles as a per-pixel feature (Ramchandani et al., arXiv:2602.18747); this is the attention analog of encode_tiles_dense and reuses the same get_dense_transform() (normalization only, geometry preserved).

  • encode_tiles_attention(batch, *, blocks=(-1,), include_registers=False) accepts a normalized (B, C, H, W) tensor and returns (B, K, h, w).

  • K = len(blocks) * (1 + M·include_registers) * nh, where nh is the head count and M the model’s register-token count (0 for models without registers). Each channel is one prefix-token query row’s attention over the patch grid for one head — heads are never reduced.

  • Channels are stacked in the deterministic order [block][cls, reg…][head] (block outer, in the order requested; then CLS, then any register tokens; head innermost). The CLS block (the first nh channels of each block) does not depend on include_registers — registers only append channels.

  • blocks selects transformer blocks (negative indices count from the end); include_registers adds the register-token query rows (Darcet et al.) as extra channels for models that carry them (e.g. Hibou).
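The channel count and ordering rules above reduce to a little index arithmetic. The sketch below computes K and the flat channel index for a given (block, query token, head) triple; both functions are hypothetical helpers written from the description in this section, and the model dimensions in the example are made up.

```python
def attention_channels(num_blocks, num_heads, num_registers=0, include_registers=False):
    """Return K, the channel count of the (B, K, h, w) attention output."""
    queries = 1 + (num_registers if include_registers else 0)  # CLS + registers
    return num_blocks * queries * num_heads


def channel_index(block_pos, query_pos, head, num_heads, num_registers=0,
                  include_registers=False):
    """Flat channel index for the [block][cls, reg...][head] ordering.

    block_pos is the position within the requested `blocks` tuple;
    query_pos is 0 for CLS and 1..M for register tokens.
    """
    queries = 1 + (num_registers if include_registers else 0)
    if not 0 <= query_pos < queries or not 0 <= head < num_heads:
        raise IndexError("query_pos or head out of range")
    return (block_pos * queries + query_pos) * num_heads + head


# Hypothetical model: 12 heads, 4 register tokens, two requested blocks.
print(attention_channels(2, 12, num_registers=4, include_registers=True))  # 120
print(attention_channels(2, 12, num_registers=4))                          # 24
# Second requested block, CLS query, head 3:
print(channel_index(1, 0, 3, 12, num_registers=4, include_registers=True))  # 63
```

With these helpers, the CLS channels of requested block b are the num_heads consecutive indices starting at channel_index(b, 0, 0, ...).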

Example:

import torch
from PIL import Image

from slide2vec.encoders import encoder_registry

encoder = encoder_registry.require("lunit")().to("cuda")
transform = encoder.get_dense_transform()

tile = Image.open("/data/tile.png").convert("RGB")
batch = transform(tile).unsqueeze(0).to(encoder.device)

with torch.no_grad():
    attn = encoder.encode_tiles_attention(batch)  # last block, CLS only

print(attn.shape)  # (1, nh, 28, 28) for a 224 px Lunit tile

Each value is a softmax weight: a slice of one query row over the patch keys, so values are non-negative and a channel’s spatial sum is <= 1 (the prefix-token key columns carry the remaining mass). As with dense extraction, the input must be divisible by the encoder patch size, and unsupported encoders raise NotImplementedError.
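The bound on the spatial sum follows from softmax normalization: the full query row sums to 1 over all keys, and the returned channel keeps only the patch-key part of it. A small pure-Python sketch with made-up logits (one prefix key followed by four patch keys) shows the split:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# One query row for one head: 1 prefix key (CLS) followed by 4 patch keys.
# The logits are made-up numbers for illustration.
num_prefix = 1
row = softmax([2.0, 0.5, 1.0, -0.3, 0.1])

prefix_mass = sum(row[:num_prefix])
patch_mass = sum(row[num_prefix:])  # what one output channel sums to spatially

print(all(w >= 0 for w in row))                      # True
print(patch_mass <= 1.0)                             # True
print(math.isclose(prefix_mass + patch_mass, 1.0))   # True
```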

Implementation note: timm ViTs run a fused SDPA kernel that never materializes the attention matrix, so the attention weights are recomputed from each block’s own projections (bit-equivalent to the weights the fused kernel applies). Hugging Face encoders expose the weights via output_attentions=True, but recent versions of transformers default to an SDPA implementation that does not honor that flag: it emits a warning and returns no attentions. Extraction therefore temporarily switches the model to the eager attention implementation for the forward pass and restores the previous setting afterwards.

Pipeline

Use Pipeline for manifest-driven batch processing and disk outputs:

from slide2vec import ExecutionOptions, Model, Pipeline, PreprocessingConfig

model = Model.from_preset("virchow2")
pipeline = Pipeline(
    model=model,
    preprocessing=PreprocessingConfig(
        requested_spacing_um=0.5,
        requested_tile_size_px=224,
        masks={"min_coverage": {"tissue": 0.1}},
    ),
    execution=ExecutionOptions(output_dir="outputs/demo", num_gpus=2),
)

result = pipeline.run(manifest_path="/path/to/slides.csv")

See Input Manifest for the full manifest schema.

Pipeline.run(...) returns a RunResult:

class slide2vec.RunResult(*, tile_artifacts, hierarchical_artifacts, slide_artifacts, patient_artifacts=<factory>, process_list_path=None)

Bases: object

Return value of Pipeline.run().

tile_artifacts: list[TileEmbeddingArtifact]

Tile embedding artifacts written to disk.

hierarchical_artifacts: list[HierarchicalEmbeddingArtifact]

Hierarchical embedding artifacts; empty when hierarchical mode is disabled.

slide_artifacts: list[SlideEmbeddingArtifact]

Slide embedding artifacts written to disk.

patient_artifacts: list[PatientEmbeddingArtifact]

Patient embedding artifacts; empty when no patient-level model is used.

process_list_path: Path | None = None

Path to process_list.csv, which tracks processing status per sample.

See Output Layout for the full on-disk directory structure and file schemas.