Model Zoo¶
To see all available presets:
from slide2vec import list_models
list_models() # all presets
list_models("tile") # tile-level only
list_models("slide") # slide-level only
Tile-level encoders¶
| Preset | Model | Output dim | Spacing (um) | Notes |
|---|---|---|---|---|
|  |  | 384 |  | Kang et al. (2023) |
|  |  | 384 |  | Grisi et al. (2026) |
|  |  | 512 |  | Lu et al. (2024) |
|  |  | 768 |  | Filiot et al. (2023) |
|  |  | 768 |  | Lu et al. (2024) |
|  |  | 768 |  | Nechaev et al. (2024) |
|  |  | 768 / 1536 |  | Filiot et al. (2024) |
|  |  | 1024 |  | Filiot et al. (2024) |
|  |  | 1024 |  | Nechaev et al. (2024) |
|  |  | 1024 |  | Chen et al. (2024) |
|  |  | 1024 / 2048 |  | Xiang et al. (2024) |
|  |  | 1280 / 2560 |  | Vorontsov et al. (2024) |
|  |  | 1280 / 2560 |  | Zimmermann et al. (2024) |
|  |  | 1536 |  | Chen et al. (2024) |
|  |  | 1536 |  | Xu et al. (2024) |
|  |  | 1536 |  | Saillard et al. (2024) |
|  |  | 1536 |  | Saillard et al. (2024) |
|  |  | 3072 |  | Karasikov et al. (2025) |
Dense tile grids¶
Dense tile feature extraction is available on tile encoders that implement
encode_tiles_dense. It returns a spatial patch-token tensor (B, d, h, w)
instead of the pooled (B, D) tensor returned by encode_tiles.
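As a rough illustration of where h and w come from, the sketch below computes the dense output shape for a ViT-style encoder that cuts a square tile into non-overlapping square patches. The helper name dense_grid_shape and the numbers used (224-pixel tiles, 16-pixel patches, 768-dimensional tokens) are assumptions chosen for the example, not values read from any slide2vec preset.

```python
def dense_grid_shape(batch_size, token_dim, input_size, patch_size):
    """Return the (B, d, h, w) shape of a dense patch-token grid.

    Assumes a ViT-style encoder that splits a square input into
    non-overlapping square patches (hypothetical helper, not slide2vec API).
    """
    if input_size % patch_size != 0:
        raise ValueError("input_size must be a multiple of patch_size")
    side = input_size // patch_size  # patches per row and per column
    return (batch_size, token_dim, side, side)


# Hypothetical numbers: 224-pixel tiles, 16-pixel patches, 768-dim tokens.
shape = dense_grid_shape(batch_size=8, token_dim=768, input_size=224, patch_size=16)
print(shape)  # (8, 768, 14, 14)
```

Under these assumptions each tile yields a 14 x 14 grid of patch tokens, whereas the pooled path would return a single vector per tile.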
The following built-in tile presets are covered by the dense encoder interface:
conch, conchv15, gigapath, h0-mini, h-optimus-0,
h-optimus-1, hibou-b, hibou-l, lunit, midnight, musk,
phikon, phikonv2, prost40m, uni, uni2, virchow, and
virchow2.
Notes:
- Dense grids use patch-token dimensions. For encoders whose pooled output concatenates CLS and mean patch tokens, d can be smaller than the pooled output dimension D.
- get_dense_transform preserves geometry by applying normalization only. Resize, crop, padding, and sliding-window policy are the caller’s responsibility.
- musk dense extraction currently requires its native 384 x 384 input size.
- H-Optimus dense extraction at non-native input sizes requires dynamic_img_size=True and allow_non_recommended_settings=True when constructing the encoder.
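Because the sliding-window policy is left to the caller, one possible policy is sketched below: fixed-size windows at a fixed stride, with the last window shifted back so that it ends at the image border instead of requiring padding. The function names window_origins and window_boxes are hypothetical helpers written for this example and are not part of slide2vec.

```python
def window_origins(length, window, stride):
    """Start offsets of sliding windows covering [0, length).

    The last window is shifted back so it ends exactly at `length`,
    which avoids padding at the cost of some extra overlap.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    if window > length:
        raise ValueError("window larger than image; pad or resize first")
    origins = list(range(0, length - window + 1, stride))
    if origins[-1] + window < length:
        origins.append(length - window)
    return origins


def window_boxes(height, width, window, stride):
    """(top, left, bottom, right) boxes for a square 2D sliding window."""
    return [
        (top, left, top + window, left + window)
        for top in window_origins(height, window, stride)
        for left in window_origins(width, window, stride)
    ]


# Example: cover a 512 x 512 region with 224 x 224 windows.
boxes = window_boxes(512, 512, window=224, stride=224)
print(len(boxes))   # 9
print(boxes[-1])    # (288, 288, 512, 512)
```

Each box can then be cropped, normalized with the dense transform, and encoded. How overlapping outputs are merged is a separate choice that this sketch does not address.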
Dense attention maps¶
Tile encoders that implement encode_tiles_attention return per-head
prefix-token self-attention as a spatial grid (B, K, h, w). See the
“Dense Attention Map Extraction” section of the API Guide for the
channel-order contract and the available options.
The following built-in tile presets are covered: conch, conchv15,
gigapath, h0-mini, h-optimus-0, h-optimus-1, hibou-b,
hibou-l, lunit, midnight, phikon, phikonv2, prost40m,
uni, uni2, virchow, and virchow2.
Notes:
- musk is not covered: its BEiT3 backbone uses a non-timm attention module, so attention extraction raises NotImplementedError (dense patch-token extraction is still available).
- hibou-b / hibou-l carry register tokens; pass include_registers=True to add their query rows as extra channels.
- conch / conchv15 recover attention from their inner timm ViT trunk, the same trunk their dense extraction uses.
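To make the idea of a prefix token's attention "as a spatial grid" concrete, the sketch below reshapes one attention row into an h x w grid. It assumes the sequence is ordered as prefix tokens first, then patch tokens in row-major order. That ordering, and the helper name prefix_attention_grid, are assumptions for illustration only; the actual channel-order contract is the one documented in the API Guide.

```python
def prefix_attention_grid(attn_row, num_prefix_tokens, grid_h, grid_w):
    """Reshape one prefix token's attention over patches into an h x w grid.

    attn_row holds that token's attention weights over the full sequence,
    assumed to be ordered [prefix tokens..., patch tokens in row-major order].
    """
    patch_weights = attn_row[num_prefix_tokens:]
    if len(patch_weights) != grid_h * grid_w:
        raise ValueError("patch count does not match grid_h * grid_w")
    return [patch_weights[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]


# Toy example: one CLS token followed by a 2 x 3 patch grid, single head.
row = [0.40, 0.05, 0.10, 0.15, 0.05, 0.20, 0.05]
grid = prefix_attention_grid(row, num_prefix_tokens=1, grid_h=2, grid_w=3)
print(grid)  # [[0.05, 0.1, 0.15], [0.05, 0.2, 0.05]]
```

Stacking one such grid per head (and per prefix token, when there are several) would give the K channels of a (B, K, h, w) output under these assumptions.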
Slide-level encoders¶
Patient-level encoders¶
Patient-level encoders aggregate multiple slide embeddings for the same patient
into a single patient-level embedding. They require a patient_id column in
the input manifest CSV (or a patient_id key in each slide dict when using
the Python API).
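The grouping that a patient_id column enables can be sketched with the standard library alone. In the example below only patient_id comes from the text above. The slide_path column name, the file names, and the helper group_slides_by_patient are hypothetical and stand in for whatever the real manifest contains.

```python
import csv
import io
from collections import defaultdict


def group_slides_by_patient(manifest_file):
    """Map each patient_id to the list of slides that belong to it."""
    reader = csv.DictReader(manifest_file)
    if "patient_id" not in (reader.fieldnames or []):
        raise ValueError("manifest is missing the patient_id column")
    groups = defaultdict(list)
    for row in reader:
        # "slide_path" is a hypothetical column name used for illustration.
        groups[row["patient_id"]].append(row["slide_path"])
    return dict(groups)


# In-memory manifest standing in for a CSV file on disk.
manifest = io.StringIO(
    "slide_path,patient_id\n"
    "slides/a1.svs,patient-a\n"
    "slides/a2.svs,patient-a\n"
    "slides/b1.svs,patient-b\n"
)
groups = group_slides_by_patient(manifest)
print(groups)
```

A patient-level encoder would then receive the slide embeddings of each group and reduce them to one embedding per patient.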
| Preset | Model | Tile encoder | Spacing (um) | Output dim | Notes |
|---|---|---|---|---|---|
|  |  |  |  | 768 | Kotp et al. (2026) |
Custom registry-backed encoders¶
If you want to use a model that is not shipped with slide2vec, wrap it in
an encoder class and register it under a new preset name.
Where to put the file¶
The registry only sees a preset once the module containing
@register_encoder is imported. slide2vec auto-imports everything under
slide2vec/encoders/models/, so the simplest way to expose a custom encoder
to both the Python API and the CLI is:
1. Add your file as slide2vec/encoders/models/my_tile_model.py.
2. Add it to slide2vec/encoders/models/__init__.py (both the from . import (...) block and __all__).
3. Reinstall in editable mode if needed (pip install -e .).
The preset name can then be used in YAML configs (model.name: my-tile-model),
Model.from_preset(...), and slide2vec.list_models().
Tile encoder example¶
import torch
from torch import Tensor

from slide2vec.encoders import TileEncoder
from slide2vec.encoders import register_encoder, resolve_requested_output_variant


@register_encoder(
    "my-tile-model",
    output_variants={"default": {"encode_dim": 768}},
    default_output_variant="default",
    input_size=224,
    supported_spacing_um=0.5,
    precision="fp16",
    source="my-org/my-tile-model",
)
class MyTileModel(TileEncoder):
    def __init__(self, *, output_variant: str | None = None):
        self._output_variant = resolve_requested_output_variant(output_variant)
        self._device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self._model = self._load_model().eval()

    def _load_model(self):
        ...

    def get_transform(self):
        ...

    def encode_tiles(self, batch: Tensor) -> Tensor:
        return self._model(batch)

    @property
    def encode_dim(self) -> int:
        return 768

    @property
    def device(self) -> torch.device:
        return self._device

    def to(self, device: torch.device | str):
        self._device = torch.device(device)
        self._model = self._model.to(self._device)
        return self
Once the module is imported, the preset is available through the existing API:
from slide2vec import Model
model = Model.from_preset("my-tile-model")
Slide encoder example¶
import torch
from torch import Tensor

from slide2vec.encoders import SlideEncoder
from slide2vec.encoders import register_encoder, resolve_requested_output_variant


@register_encoder(
    "my-slide-model",
    level="slide",
    tile_encoder="my-tile-model",
    tile_encoder_output_variant="default",
    output_variants={"default": {"encode_dim": 512}},
    default_output_variant="default",
    supported_spacing_um=0.5,
    precision="fp16",
    source="my-org/my-slide-model",
)
class MySlideModel(SlideEncoder):
    def __init__(self, *, output_variant: str | None = None):
        self._output_variant = resolve_requested_output_variant(output_variant)
        self._device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self._model = self._load_model().eval()

    def _load_model(self):
        ...

    @property
    def encode_dim(self) -> int:
        return 512

    @property
    def device(self) -> torch.device:
        return self._device

    def to(self, device: torch.device | str):
        self._device = torch.device(device)
        self._model = self._model.to(self._device)
        return self

    def encode_slide(
        self,
        tile_features: Tensor,
        coordinates: Tensor | None = None,
        *,
        tile_size_lv0: int | None = None,
    ) -> Tensor:
        return self._model(tile_features)
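The example above delegates the aggregation itself to self._model. To show the input/output contract in isolation (many tile feature vectors in, one slide vector out), here is a deliberately simple stand-in that uses mean pooling over plain Python lists. Mean pooling and the helper name mean_pool are assumptions for illustration; they are not claimed to be what any built-in slide encoder does.

```python
def mean_pool(tile_features):
    """Average N tile feature vectors of equal length into one vector."""
    if not tile_features:
        raise ValueError("need at least one tile feature vector")
    dim = len(tile_features[0])
    if any(len(vec) != dim for vec in tile_features):
        raise ValueError("all tile feature vectors must have the same length")
    count = len(tile_features)
    return [sum(vec[i] for vec in tile_features) / count for i in range(dim)]


# Three tiles with 2-dimensional features each.
tiles = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
slide_embedding = mean_pool(tiles)
print(slide_embedding)  # [3.0, 4.0]
```

A real slide encoder would typically also use the tile coordinates, which this sketch ignores.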
Multiple weights for the same architecture¶
Encoders are instantiated as encoder_cls(output_variant=...), so the
weights are tied to the registered class. To expose several checkpoints of
the same architecture (e.g. different pretraining stages), put the shared
logic in a base class and register one thin subclass per checkpoint. This
keeps “preset name → exact weights” as a stable invariant and avoids any
runtime configuration of paths.
The built-in phikon encoder
(slide2vec/encoders/models/phikon.py) follows this pattern:
class _PhikonBase(TileEncoder):
    def __init__(self, model_name: str, *, output_variant: str | None = None):
        self._model = AutoModel.from_pretrained(model_name).eval()
        ...


@register_encoder("phikon", ..., source="owkin/phikon")
class Phikon(_PhikonBase):
    def __init__(self, *, output_variant: str | None = None):
        super().__init__("owkin/phikon", output_variant=output_variant)


@register_encoder("phikonv2", ..., source="owkin/phikon-v2")
class PhikonV2(_PhikonBase):
    def __init__(self, *, output_variant: str | None = None):
        super().__init__("owkin/phikon-v2", output_variant=output_variant)
For local checkpoints, swap the HuggingFace identifier for a path (or any
loader you control) in each subclass. Each preset can then be selected
through the usual model.name field in YAML configs or
Model.from_preset(...).
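The same pattern applied to local checkpoints might look like the sketch below. It is a self-contained toy that only records the checkpoint path, so that it runs without torch or slide2vec. The names toy_register, toy_from_preset, and _ToyBase, the preset names, and the /path/to/... locations are all hypothetical placeholders.

```python
TOY_PRESETS = {}


def toy_register(name):
    """Toy stand-in for a registry decorator: map a preset name to a class."""
    def decorator(cls):
        TOY_PRESETS[name] = cls
        return cls
    return decorator


class _ToyBase:
    """Shared logic; a real encoder would load weights from `checkpoint`."""

    def __init__(self, checkpoint, *, output_variant=None):
        self.checkpoint = checkpoint
        self.output_variant = output_variant


@toy_register("my-model-stage1")
class MyModelStage1(_ToyBase):
    def __init__(self, *, output_variant=None):
        # Hypothetical local path, fixed per subclass.
        super().__init__("/path/to/stage1.pt", output_variant=output_variant)


@toy_register("my-model-stage2")
class MyModelStage2(_ToyBase):
    def __init__(self, *, output_variant=None):
        super().__init__("/path/to/stage2.pt", output_variant=output_variant)


def toy_from_preset(name, output_variant=None):
    """Instantiate the class registered under `name`."""
    return TOY_PRESETS[name](output_variant=output_variant)


print(toy_from_preset("my-model-stage1").checkpoint)  # /path/to/stage1.pt
print(toy_from_preset("my-model-stage2").checkpoint)  # /path/to/stage2.pt
```

Because each subclass hard-codes its own checkpoint, a preset name always resolves to the same weights, and callers never pass a path at runtime.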