Preprocessing

This page covers the full set of options available in PreprocessingConfig and how to configure them.

class slide2vec.PreprocessingConfig(*, backend='auto', requested_spacing_um=None, requested_tile_size_px=None, requested_region_size_px=None, region_tile_multiple=None, tolerance=0.05, overlap=0.0, tissue_threshold=0.01, read_coordinates_from=None, read_tiles_from=None, on_the_fly=True, gpu_decode=False, adaptive_batching=False, use_supertiles=True, jpeg_backend='turbojpeg', num_cucim_workers=4, resume=False, segmentation=<factory>, filtering=<factory>, preview=<factory>)

Bases: object

Configuration for slide tiling and preprocessing.

backend: str = 'auto'

Slide reading backend. "auto" tries cucim → openslide → vips in order. Explicit choices: "cucim", "openslide", "vips", "asap".

requested_spacing_um: float | None = None

Target spacing in µm/px. Resolved from the model preset when None.

requested_tile_size_px: int | None = None

Tile side length in pixels at requested_spacing_um. Resolved from the model preset when None.

requested_region_size_px: int | None = None

Parent region side length in pixels (hierarchical mode). Auto-derived as requested_tile_size_px × region_tile_multiple when None.

region_tile_multiple: int | None = None

Region grid width/height in tiles (e.g. 6 → 6×6 = 36 tiles per region). Enables hierarchical extraction when set; must be ≥ 2.
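The relationship between tile size, region multiple, and region size can be sketched as plain arithmetic. The helper name `derive_region_size_px` and the 224 px tile size are hypothetical, chosen only for illustration; this is not slide2vec's own code.

```python
def derive_region_size_px(tile_size_px: int, region_tile_multiple: int) -> int:
    """Region side length implied by a tile size and a region grid multiple."""
    # The documentation states that the multiple must be at least 2.
    if region_tile_multiple < 2:
        raise ValueError("region_tile_multiple must be >= 2")
    return tile_size_px * region_tile_multiple


# Illustrative values: 224 px tiles arranged in a 6 x 6 grid.
region_size_px = derive_region_size_px(224, 6)
tiles_per_region = 6 * 6
print(region_size_px, tiles_per_region)
```

With these values the region side is 224 × 6 = 1344 px and each region holds 36 tiles.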

tolerance: float = 0.05

Relative spacing tolerance for pyramid level selection (default 0.05).
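One plausible reading of a relative tolerance is sketched below: a pyramid level matches when its spacing differs from the requested spacing by no more than `tolerance` times the requested spacing. The function names and the example spacings are assumptions made for illustration, and the source does not say what happens when no level matches, so the sketch simply returns None in that case.

```python
from typing import Optional, Sequence


def within_tolerance(level_spacing_um: float,
                     requested_spacing_um: float,
                     tolerance: float = 0.05) -> bool:
    """True when the level spacing is within the relative tolerance."""
    return abs(level_spacing_um - requested_spacing_um) <= tolerance * requested_spacing_um


def matching_level(level_spacings_um: Sequence[float],
                   requested_spacing_um: float,
                   tolerance: float = 0.05) -> Optional[int]:
    """Index of the first pyramid level within tolerance, or None."""
    for index, spacing in enumerate(level_spacings_um):
        if within_tolerance(spacing, requested_spacing_um, tolerance):
            return index
    return None


# Hypothetical pyramid whose native spacings are slightly off the nominal values.
spacings = [0.2525, 0.505, 1.01]
print(matching_level(spacings, 0.5))
print(matching_level(spacings, 0.75))
```

Here the level at 0.505 µm/px is accepted for a 0.5 µm/px request because it is off by 1 %, while no level comes within 5 % of 0.75 µm/px.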

overlap: float = 0.0

Fractional tile overlap (0.0 = no overlap).
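As a rough illustration of a fractional overlap, the sketch below assumes the overlap is measured relative to the tile side length, so that consecutive tiles are offset by tile size × (1 − overlap). Both that interpretation and the helper names are assumptions rather than documented behaviour.

```python
from typing import List


def tile_stride_px(tile_size_px: int, overlap: float) -> int:
    """Step between consecutive tile origins under the assumed interpretation."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0.0, 1.0)")
    return round(tile_size_px * (1.0 - overlap))


def tile_origins(extent_px: int, tile_size_px: int, overlap: float) -> List[int]:
    """Origins along one axis for tiles that fit fully inside the extent."""
    stride = tile_stride_px(tile_size_px, overlap)
    return list(range(0, extent_px - tile_size_px + 1, stride))


print(tile_stride_px(256, 0.0))      # no overlap: stride equals the tile size
print(tile_stride_px(256, 0.25))     # a quarter of each tile is shared with its neighbour
print(len(tile_origins(1024, 256, 0.5)))
```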

tissue_threshold: float = 0.01

Minimum tissue fraction required to keep a tile (default 0.01).
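The filtering rule can be pictured as a comparison between the fraction of tissue pixels in a tile's mask and the threshold. The sketch below uses a nested list of 0/1 values as a stand-in for a binary mask; the helper names are hypothetical, and whether the real comparison is inclusive (>=) is an assumption.

```python
from typing import Sequence


def tissue_fraction(mask: Sequence[Sequence[int]]) -> float:
    """Fraction of mask pixels marked as tissue (1) in a rectangular mask."""
    total = sum(len(row) for row in mask)
    tissue = sum(sum(row) for row in mask)
    return tissue / total if total else 0.0


def keep_tile(mask: Sequence[Sequence[int]], tissue_threshold: float = 0.01) -> bool:
    """Keep the tile when its tissue fraction reaches the threshold (assumed inclusive)."""
    return tissue_fraction(mask) >= tissue_threshold


# A 4 x 4 mask with a single tissue pixel: fraction 1/16 = 0.0625.
sparse = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
empty = [[0] * 4 for _ in range(4)]
print(keep_tile(sparse), keep_tile(empty))
```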

read_coordinates_from: Path | None = None

Directory containing pre-extracted tile coordinates to reuse, skipping tiling.

read_tiles_from: Path | None = None

Directory containing pre-extracted tile images. When set, the tiling step is skipped entirely.

on_the_fly: bool = True

Read and decode tiles on demand rather than pre-loading into memory.

gpu_decode: bool = False

Decode tiles on the GPU via CuCIM / nvImageCodec when True.

adaptive_batching: bool = False

Dynamically adjust batch size based on tile count.

use_supertiles: bool = True

Group adjacent tiles into supertile batches for faster I/O.

jpeg_backend: str = 'turbojpeg'

JPEG decode library — "turbojpeg" (default) or "pillow".

num_cucim_workers: int = 4

Number of CuCIM reader threads.

resume: bool = False

Skip slides already present in the output directory when True.

segmentation: dict[str, Any]

Forwarded to hs2p segmentation config. Supported keys: method, downsample, sam2_device. See Preprocessing for details.

filtering: dict[str, Any]

Forwarded to hs2p tile-filtering config.

preview: dict[str, Any]

Controls whether hs2p writes mask and tiling preview images. Keys: save_mask_preview, save_tiling_preview, downsample.

Backends

The backend field controls which slide-reading library is used:

  • "auto" — tries cucim → openslide → vips in order and picks the first available one

  • "cucim" — NVIDIA cuCIM (fastest for SVS/TIFF on GPU-equipped machines)

  • "openslide" — broad format support, CPU-only

  • "vips" — libvips, good for large TIFF files

  • "asap" — ASAP reader (requires separate installation)
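The "auto" fallback order can be approximated with a small availability probe. This is a sketch, not slide2vec's resolver: the import names (in particular `pyvips` for the vips backend) and the `resolve_backend` helper are assumptions, and a real check would probably also confirm that the native libraries load.

```python
from importlib.util import find_spec
from typing import Callable

# (backend name, assumed Python import name), in the documented "auto" order.
CANDIDATES = (
    ("cucim", "cucim"),
    ("openslide", "openslide"),
    ("vips", "pyvips"),  # assumption: the vips backend is imported as pyvips
)


def _importable(module_name: str) -> bool:
    """True when the top-level module can be located without importing it."""
    return find_spec(module_name) is not None


def resolve_backend(is_available: Callable[[str], bool] = _importable) -> str:
    """Return the first backend in the fallback order whose module is available."""
    for backend, module_name in CANDIDATES:
        if is_available(module_name):
            return backend
    raise RuntimeError("no slide-reading backend is installed")


# Simulate a machine where only openslide and pyvips are installed.
print(resolve_backend(lambda name: name in {"openslide", "pyvips"}))
```

Passing the availability check as a parameter keeps the sketch testable on machines where none of the libraries are installed.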

Tissue Segmentation

segmentation is forwarded directly to hs2p's segmentation pipeline. The method key selects the algorithm:

  • hsv - heuristic based on the HSV colour space. Fast and robust for H&E slides.

  • otsu - thresholds the saturation channel using Otsu’s method.

  • threshold - applies a fixed saturation threshold.

  • sam2 - runs the AtlasPatch SAM2 tissue segmentation model on an internal 8.0 µm/px thumbnail. Requires the atlaspatch package and a compatible GPU. Additional key: sam2_device — device string for SAM2 inference (e.g. "cuda:0" or "cpu").

Example:

from slide2vec import Model, PreprocessingConfig

model = Model.from_preset("virchow2")
preprocessing = PreprocessingConfig(
    segmentation={"method": "sam2", "sam2_device": "cuda"},
)
embedded = model.embed_slide("/path/to/slide.svs", preprocessing=preprocessing)

Or in a YAML config:

tiling:
  seg_params:
    method: "sam2"
    sam2_device: "cuda"
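For intuition about the simpler methods listed above, the following sketch applies a fixed saturation threshold to a tiny RGB image using only the standard library. It illustrates the general idea behind the threshold method and is not hs2p's implementation; the function name and the 0.1 cut-off are hypothetical.

```python
import colorsys
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def saturation_mask(pixels: Sequence[Sequence[Pixel]],
                    threshold: float = 0.1) -> List[List[bool]]:
    """Mark a pixel as tissue when its HSV saturation exceeds the threshold."""
    mask = []
    for row in pixels:
        mask_row = []
        for r, g, b in row:
            # colorsys expects channel values in [0, 1].
            _, saturation, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            mask_row.append(saturation > threshold)
        mask.append(mask_row)
    return mask


# White glass background next to a pink, eosin-like pixel.
image = [[(255, 255, 255), (230, 150, 200)]]
print(saturation_mask(image))
```

White background has zero saturation and is rejected, while stained tissue is typically well saturated and passes the cut-off.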

Preview Images

slide2vec can write a tissue mask preview and a tiling preview for each slide. These are particularly useful for quality control. Both are disabled by default. Enable them via the preview dict:

preprocessing = PreprocessingConfig(
    preview={
        "save_mask_preview": True,
        "save_tiling_preview": True,
        "downsample": 32,
    }
)

Preview images are written to <output_dir>/preview/mask/<sample_id>.png and <output_dir>/preview/tiling/<sample_id>.png. Their paths are also recorded in process_list.csv and on the returned EmbeddedSlide (mask_preview_path, tiling_preview_path).
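The layout described above can be reproduced with pathlib, for example to locate previews in a QC script. The helper name `preview_paths` and the example directory and sample ID are hypothetical; only the directory layout comes from the description above.

```python
from pathlib import Path
from typing import Dict


def preview_paths(output_dir: str, sample_id: str) -> Dict[str, Path]:
    """Expected preview locations for one sample, following the documented layout."""
    root = Path(output_dir) / "preview"
    return {
        "mask_preview_path": root / "mask" / f"{sample_id}.png",
        "tiling_preview_path": root / "tiling" / f"{sample_id}.png",
    }


paths = preview_paths("output", "slide_001")
for name, path in paths.items():
    # The files exist only if the corresponding preview option was enabled.
    print(name, path, path.exists())
```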