Output Layout
When running Pipeline (via the Python API or the CLI),
slide2vec writes artifacts under the directory specified by
output_dir.
Directory Structure
<output_dir>/
├── tile_embeddings/
│ ├── <sample_id>.pt
│ └── <sample_id>.meta.json
├── hierarchical_embeddings/ ← only when region_tile_multiple is set
│ ├── <sample_id>.pt
│ └── <sample_id>.meta.json
├── slide_embeddings/ ← only for slide-level models
│ ├── <sample_id>.pt
│ └── <sample_id>.meta.json
├── slide_latents/ ← only when save_latents=True
│ └── <sample_id>.pt
├── patient_embeddings/ ← only for patient-level models
│ ├── <patient_id>.pt
│ └── <patient_id>.meta.json
├── tiles/
│ ├── <sample_id>.coordinates.npz
│ └── <sample_id>.coordinates.meta.json
├── preview/
│ ├── mask/ ← only when save_mask_preview=True
│ │ └── <sample_id>.png
│ └── tiling/ ← only when save_tiling_preview=True
│ └── <sample_id>.png
├── process_list.csv
└── config.yaml
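Because several of these directories are optional, it can help to check which ones a given run actually produced. The following is a minimal sketch using only the standard library; `list_artifacts` is a hypothetical helper written for this page, not part of slide2vec, and it assumes the file naming shown in the tree above.

```python
from pathlib import Path

# Artifact directories from the layout above; optional ones may be absent.
ARTIFACT_DIRS = (
    "tile_embeddings",
    "hierarchical_embeddings",
    "slide_embeddings",
    "slide_latents",
    "patient_embeddings",
    "tiles",
)

# tiles/ stores <id>.coordinates.npz; every other directory stores <id>.pt.
SUFFIXES = {"tiles": ".coordinates.npz"}


def list_artifacts(output_dir):
    """Map each existing artifact directory to the sorted ids found in it."""
    found = {}
    for name in ARTIFACT_DIRS:
        directory = Path(output_dir) / name
        if not directory.is_dir():
            continue  # optional artifact not produced by this run
        suffix = SUFFIXES.get(name, ".pt")
        found[name] = sorted(
            p.name[: -len(suffix)]
            for p in directory.iterdir()
            if p.name.endswith(suffix)
        )
    return found
```

Calling `list_artifacts("outputs/run")` returns a dictionary keyed by directory name, so a missing key indicates that the corresponding artifact was not written.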
Embedding Files
All .pt files can be loaded with torch.load():
import torch
tile_embeddings = torch.load("outputs/run/tile_embeddings/slide-1.pt")
# tile_embeddings: Tensor of shape (N, D)
slide_embedding = torch.load("outputs/run/slide_embeddings/slide-1.pt")
# slide_embedding: Tensor of shape (D,)
Shapes by artifact type:
| Artifact | Tensor shape |
|---|---|
| tile_embeddings | (N, D) |
| slide_embeddings | (D,) |
Embedding Meta Files
Each .pt embedding file has a companion .meta.json with provenance
and shape information. The exact fields depend on the artifact type.
tile_embeddings
{
"sample_id": "slide-1",
"artifact_type": "tile_embeddings",
"backend": "cucim",
"coordinates_meta_path": "<output_dir>/tiles/slide-1.coordinates.meta.json",
"coordinates_npz_path": "<output_dir>/tiles/slide-1.coordinates.npz",
"encoder_level": "tile",
"encoder_name": "prost40m",
"feature_dim": 384,
"format": "pt",
"image_path": "/data/slide-1.tif",
"mask_path": "/data/mask-1.tif",
"num_tiles": 166,
"tile_size_lv0": 224
}
hierarchical_embeddings
Same fields as tile_embeddings (except "artifact_type": "hierarchical_embeddings"), plus:
{
...
"num_regions": 512,
"tiles_per_region": 36
}
slide_embeddings
{
"sample_id": "slide-1",
"artifact_type": "slide_embeddings",
"encoder_level": "slide",
"encoder_name": "prism",
"feature_dim": 1280,
"format": "pt",
"image_path": "/data/slide-1.tif"
}
patient_embeddings
{
"patient_id": "patient-1",
"artifact_type": "patient_embeddings",
"encoder_name": "moozy",
"encoder_level": "patient",
"format": "pt",
"feature_dim": 768,
"num_slides": 2
}
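The meta files make it possible to sanity-check an embedding before using it. The sketch below reads the sidecar next to a `.pt` file and derives the tensor shape it implies. Both helpers are hypothetical, and the mapping relies on an assumption: that the `N` in the `(N, D)` tile tensor equals `num_tiles` and that `D` equals `feature_dim`. Other artifact types return `None` because their shapes are not assumed here.

```python
import json
from pathlib import Path


def load_meta(embedding_path):
    """Read the <id>.meta.json sidecar that sits next to an <id>.pt file."""
    path = Path(embedding_path)
    meta_path = path.with_name(path.stem + ".meta.json")
    with open(meta_path, encoding="utf-8") as handle:
        return json.load(handle)


def expected_shape(meta):
    """Return the tensor shape implied by a meta dict, or None if unknown."""
    kind = meta["artifact_type"]
    dim = meta["feature_dim"]
    if kind == "tile_embeddings":
        # Assumption: one row per tile, so N equals "num_tiles".
        return (meta["num_tiles"], dim)
    if kind == "slide_embeddings":
        return (dim,)
    return None  # no shape is assumed for other artifact types
```

With a tensor loaded via `torch.load()`, comparing `tuple(tensor.shape)` against `expected_shape(load_meta(path))` would flag a mismatched or truncated file early.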
Coordinate Files
During tiling, slide2vec writes a pair of coordinate files for each slide
under tiles/:
- <sample_id>.coordinates.npz: NumPy archive with tile coordinate arrays
- <sample_id>.coordinates.meta.json: tiling provenance and parameters
Coordinate arrays
The .npz contains four arrays with tile coordinate and metadata information.
All four arrays have length N (the number of tiles) and share the same ordering as the rows of the corresponding embedding tensor.
| Array | dtype | Description |
|---|---|---|
| x | | Left edge of each tile in level-0 pixel coordinates |
| y | | Top edge of each tile in level-0 pixel coordinates |
| | | Sequential index of each tile |
| | | Fraction of pixels classified as tissue in each tile |
import numpy as np
data = np.load("outputs/run/tiles/slide-1.coordinates.npz")
x = data["x"] # shape (N,) — level-0 x coordinates
y = data["y"] # shape (N,) — level-0 y coordinates
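Since the coordinates mark the left and top edges of each tile, a bounding box per tile can be derived once the tile size at level 0 is known. The sketch below assumes square tiles whose side length is the `tile_size_lv0` value recorded in the meta files; `tile_boxes` is a hypothetical helper, and the toy coordinates stand in for `data["x"]` and `data["y"]`.

```python
import numpy as np


def tile_boxes(x, y, tile_size_lv0):
    """Return an (N, 4) array of [left, top, right, bottom] in level-0 pixels."""
    x = np.asarray(x)
    y = np.asarray(y)
    # Assumption: tiles are square with side tile_size_lv0 at level 0.
    return np.stack([x, y, x + tile_size_lv0, y + tile_size_lv0], axis=1)


# Toy coordinates for two horizontally adjacent tiles.
boxes = tile_boxes([0, 448], [0, 0], tile_size_lv0=448)
```

Because the coordinate arrays share their ordering with the embedding rows, row `i` of `boxes` describes the region that produced row `i` of the tile embedding tensor.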
Coordinate meta files
The sidecar coordinates.meta.json is a structured file produced by the
tiling pipeline. It contains several sections:
{
"provenance": {
"sample_id": "slide-1",
"image_path": "/data/slide-1.svs",
"mask_path": "/data/mask-1.tif",
"backend": "cucim",
"requested_backend": "auto"
},
"slide": {
"dimensions": [50000, 40000],
"base_spacing_um": 0.25,
"level_downsamples": [1.0, 2.0, 4.0, 8.0, 16.0]
},
"tiling": {
"requested_tile_size_px": 224,
"requested_spacing_um": 0.5,
"effective_tile_size_px": 224,
"effective_spacing_um": 0.503,
"tile_size_lv0": 448,
"n_tiles": 1024,
...
},
"segmentation": { ... },
"filtering": { ... },
"artifact": {
"coordinate_space": "level_0",
"tile_order": "row_major",
...
}
}
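Two checks that this structure makes straightforward are sketched below: how far the effective spacing drifted from the requested one, and whether the recorded tile count matches the number of coordinates loaded from the `.npz`. Both functions are hypothetical helpers that rely only on the field names shown in the example above.

```python
def spacing_deviation(meta):
    """Relative difference between the effective and the requested spacing."""
    tiling = meta["tiling"]
    requested = tiling["requested_spacing_um"]
    effective = tiling["effective_spacing_um"]
    return abs(effective - requested) / requested


def check_tile_count(meta, num_coordinates):
    """Raise if the tile count in the meta disagrees with the loaded arrays."""
    recorded = meta["tiling"]["n_tiles"]
    if recorded != num_coordinates:
        raise ValueError(
            f"meta records {recorded} tiles, but {num_coordinates} coordinates were loaded"
        )
    return recorded


# Values copied from the "tiling" section of the example above.
example_meta = {
    "tiling": {
        "requested_spacing_um": 0.5,
        "effective_spacing_um": 0.503,
        "n_tiles": 1024,
    }
}
deviation = spacing_deviation(example_meta)
```

For a real run, `meta` would come from `json.load()` on the `.coordinates.meta.json` file and `num_coordinates` from `len(data["x"])`.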
These files can be reused across runs via
read_coordinates_from to skip
tiling when only the encoder changes.
Process List
process_list.csv tracks the status of every slide in the manifest:
sample_id,status,error
slide-1,done,
slide-2,done,
slide-3,failed,RuntimeError: slide file not found
Possible status values:
- done: processed successfully
- failed: an error occurred; details are in the error column
- skipped: slide was already present in the output directory
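After a large run, the process list is a convenient place to find the slides that need attention. A minimal sketch with the standard `csv` module follows; `summarize_process_list` is a hypothetical helper, and the in-memory example reuses the rows shown above.

```python
import csv
import io
from collections import Counter


def summarize_process_list(csv_file):
    """Return (status counts, {sample_id: error}) from an open CSV file."""
    counts = Counter()
    failures = {}
    for row in csv.DictReader(csv_file):
        counts[row["status"]] += 1
        if row["status"] == "failed":
            failures[row["sample_id"]] = row["error"]
    return counts, failures


# In-memory stand-in for <output_dir>/process_list.csv.
example = io.StringIO(
    "sample_id,status,error\n"
    "slide-1,done,\n"
    "slide-2,done,\n"
    "slide-3,failed,RuntimeError: slide file not found\n"
)
counts, failures = summarize_process_list(example)
```

For a file on disk, pass a handle opened with `open(path, newline="")`, as the `csv` module documentation recommends.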