Preprocessing
This page covers the full set of options available in PreprocessingConfig
and how to configure them.
- class slide2vec.PreprocessingConfig(*, backend='auto', requested_spacing_um=None, requested_tile_size_px=None, requested_region_size_px=None, region_tile_multiple=None, tolerance=0.05, overlap=0.0, tissue_threshold=0.01, read_coordinates_from=None, read_tiles_from=None, on_the_fly=True, gpu_decode=False, adaptive_batching=False, use_supertiles=True, jpeg_backend='turbojpeg', num_cucim_workers=4, resume=False, segmentation=<factory>, filtering=<factory>, preview=<factory>)
Bases: object
Configuration for slide tiling and preprocessing.
- backend: str = 'auto'
Slide reading backend. "auto" tries cucim → openslide → vips in order. Explicit choices: "cucim", "openslide", "vips", "asap".
- requested_spacing_um: float | None = None
Target spacing in µm/px. Resolved from the model preset when None.
- requested_tile_size_px: int | None = None
Tile side length in pixels at requested_spacing_um. Resolved from the model preset when None.
- requested_region_size_px: int | None = None
Parent region side length in pixels (hierarchical mode). Auto-derived as requested_tile_size_px × region_tile_multiple when None.
- region_tile_multiple: int | None = None
Region grid width/height in tiles (e.g. 6 → 6×6 = 36 tiles per region). Enables hierarchical extraction when set; must be ≥ 2.
- read_coordinates_from: Path | None = None
Directory containing pre-extracted tile coordinates to reuse, skipping tiling.
- read_tiles_from: Path | None = None
Directory containing pre-extracted tile images to skip the tiling step entirely.
- segmentation: dict[str, Any]
Forwarded to hs2p segmentation config. Supported keys: method, downsample, sam2_device. See Preprocessing for details.
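To illustrate how the two hierarchical fields above relate, the following sketch computes the region side length from a tile size and a region_tile_multiple. The helper derive_region_size_px and the example values are hypothetical and not part of slide2vec; the code only mirrors the rule stated in the field descriptions (tile size × multiple, with the multiple required to be ≥ 2).

```python
def derive_region_size_px(tile_size_px: int, region_tile_multiple: int) -> int:
    """Hypothetical helper: region side length = tile side length x multiple."""
    if region_tile_multiple < 2:
        # The field description requires a multiple of at least 2.
        raise ValueError("region_tile_multiple must be >= 2")
    return tile_size_px * region_tile_multiple


# Example values chosen for illustration only.
tile_size_px = 224
region_tile_multiple = 6

region_size_px = derive_region_size_px(tile_size_px, region_tile_multiple)
tiles_per_region = region_tile_multiple ** 2  # 6 x 6 grid of tiles

print(region_size_px, tiles_per_region)
```

Setting requested_region_size_px explicitly would presumably take precedence over this derived value, since the description says the derivation applies only when the field is None.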
Backends
The backend field controls which slide-reading library is used:
- "auto": tries cucim → openslide → vips in order and picks the first available one
- "cucim": NVIDIA cuCIM (fastest for SVS/TIFF on GPU-equipped machines)
- "openslide": broad format support, CPU-only
- "vips": libvips, good for large TIFF files
- "asap": ASAP reader (requires separate installation)
Tissue Segmentation
segmentation is forwarded directly to hs2p's segmentation pipeline.
The method key selects the algorithm:
- hsv: heuristic based on the HSV colour space. Fast and robust for H&E slides.
- otsu: thresholds the saturation channel using Otsu's method.
- threshold: applies a fixed saturation threshold.
- sam2: runs the AtlasPatch SAM2 tissue segmentation model on an internal 8.0 µm/px thumbnail. Requires the atlaspatch package and a compatible GPU. Additional key: sam2_device, the device string for SAM2 inference (e.g. "cuda:0" or "cpu").
Example:
from slide2vec import Model, PreprocessingConfig
model = Model.from_preset("virchow2")
preprocessing = PreprocessingConfig(
    segmentation={"method": "sam2", "sam2_device": "cuda"},
)
embedded = model.embed_slide("/path/to/slide.svs", preprocessing=preprocessing)
Or in a YAML config:
tiling:
  seg_params:
    method: "sam2"
    sam2_device: "cuda"
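For readers unfamiliar with the otsu method listed above, here is a minimal pure-Python sketch of Otsu thresholding applied to a saturation channel. It is not hs2p's implementation: the function names, the 0 to 255 quantisation, and the choice to treat pixels above the threshold as tissue are all assumptions made for illustration.

```python
import colorsys


def otsu_threshold(values, bins=256):
    """Return the bin index that maximises between-class variance.

    `values` must be integers in [0, bins).
    """
    hist = [0] * bins
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of samples with value <= t
    sum0 = 0  # sum of those sample values
    for t in range(bins):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def saturation_mask(rgb_pixels):
    """Assumed convention: pixels with saturation above the threshold are tissue."""
    sats = [
        int(colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] * 255)
        for r, g, b in rgb_pixels
    ]
    threshold = otsu_threshold(sats)
    return threshold, [s > threshold for s in sats]


# Two near-white background pixels and two pinkish pixels (illustrative values).
pixels = [(250, 250, 250), (245, 245, 245), (200, 100, 150), (180, 90, 140)]
threshold, mask = saturation_mask(pixels)
print(threshold, mask)
```

The threshold method differs only in that the saturation cut-off is fixed in advance rather than estimated from the histogram.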
Preview Images
slide2vec can write a tissue mask preview and a tiling preview for each slide.
These are particularly useful for quality control.
Both are disabled by default. Enable them via the preview dict:
preprocessing = PreprocessingConfig(
    preview={
        "save_mask_preview": True,
        "save_tiling_preview": True,
        "downsample": 32,
    }
)
Preview images are written to <output_dir>/preview/mask/<sample_id>.png
and <output_dir>/preview/tiling/<sample_id>.png. Their paths are also
recorded in process_list.csv and on the returned
EmbeddedSlide (mask_preview_path,
tiling_preview_path).
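When scripting quality-control checks, it can be handy to build the expected preview locations from the layout described above. The helper expected_preview_paths and the example directory and sample names are hypothetical and not part of slide2vec; in practice the paths recorded in process_list.csv or on EmbeddedSlide are the authoritative source.

```python
from pathlib import Path


def expected_preview_paths(output_dir, sample_id):
    """Hypothetical helper mirroring the <output_dir>/preview/... layout."""
    preview_dir = Path(output_dir) / "preview"
    return {
        "mask": preview_dir / "mask" / f"{sample_id}.png",
        "tiling": preview_dir / "tiling" / f"{sample_id}.png",
    }


# Illustrative names only; report which previews exist on disk.
paths = expected_preview_paths("outputs", "slide_001")
for kind, path in paths.items():
    print(kind, path, path.exists())
```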