Output Layout

When running Pipeline (via the Python API or the CLI), slide2vec writes artifacts under the directory specified by output_dir.

Directory Structure

<output_dir>/
├── tile_embeddings/
│   ├── <sample_id>.pt
│   └── <sample_id>.meta.json
├── hierarchical_embeddings/       ← only when region_tile_multiple is set
│   ├── <sample_id>.pt
│   └── <sample_id>.meta.json
├── slide_embeddings/              ← only for slide-level models
│   ├── <sample_id>.pt
│   └── <sample_id>.meta.json
├── slide_latents/                 ← only when save_latents=True
│   └── <sample_id>.pt
├── patient_embeddings/            ← only for patient-level models
│   ├── <patient_id>.pt
│   └── <patient_id>.meta.json
├── tiles/
│   ├── <sample_id>.coordinates.npz
│   └── <sample_id>.coordinates.meta.json
├── preview/
│   ├── mask/                      ← only when save_mask_preview=True
│   │   └── <sample_id>.png
│   └── tiling/                    ← only when save_tiling_preview=True
│       └── <sample_id>.png
├── process_list.csv
└── config.yaml
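The conditional entries in the tree can be summarized as a small function that predicts which directories a run should produce. This is a minimal sketch, not part of slide2vec: expected_layout is a hypothetical helper, and its slide_level and patient_level flags are stand-ins for "a slide-level or patient-level model was used". The other keyword names mirror the options annotated in the tree.

```python
from pathlib import Path


def expected_layout(output_dir, *, region_tile_multiple=None, slide_level=False,
                    save_latents=False, patient_level=False,
                    save_mask_preview=False, save_tiling_preview=False):
    """Return the set of directories the layout tree predicts for a run.

    Hypothetical helper: slide_level / patient_level are illustrative flags,
    not real slide2vec options.
    """
    root = Path(output_dir)
    # Always present according to the tree.
    dirs = {root / "tile_embeddings", root / "tiles"}
    # Conditional directories, one per annotated branch of the tree.
    if region_tile_multiple is not None:
        dirs.add(root / "hierarchical_embeddings")
    if slide_level:
        dirs.add(root / "slide_embeddings")
    if save_latents:
        dirs.add(root / "slide_latents")
    if patient_level:
        dirs.add(root / "patient_embeddings")
    if save_mask_preview:
        dirs.add(root / "preview" / "mask")
    if save_tiling_preview:
        dirs.add(root / "preview" / "tiling")
    return dirs
```

The root files process_list.csv and config.yaml are shown unconditionally in the tree, so they are left out of the set.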

Per-Annotation Namespacing

The layout above is the tissue-only (default) case. When annotation-aware sampling is enabled, each sampled class gets its own <class>/ subdirectory under every embedding directory, and the tiling artifacts are namespaced the same way:

<output_dir>/
├── tile_embeddings/
│   ├── tumor/<sample_id>.pt
│   └── stroma/<sample_id>.pt
├── slide_embeddings/
│   ├── tumor/<sample_id>.pt
│   └── stroma/<sample_id>.pt
├── tiles/
│   ├── tumor/<sample_id>.coordinates.npz
│   └── stroma/<sample_id>.coordinates.npz
└── preview/
    ├── mask/<sample_id>.png            ← one multi-label mask preview per slide
    └── tiling/
        ├── tumor/<sample_id>.png
        └── stroma/<sample_id>.png

The tissue class and the merged output_mode carry no class label, so their artifacts collapse to the flat root shown earlier; there is no tissue/ subdirectory. process_list.csv has one row per (sample_id, annotation) pair, and each row records that class's own feature_path.
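The namespacing rule can be written as a path resolver. This is a minimal sketch: artifact_path is a hypothetical helper, not a slide2vec function. It treats annotation=None (tissue-only or merged output) and annotation="tissue" the same way, as the flat root, and inserts a <class>/ subdirectory for every other class.

```python
from pathlib import Path


def artifact_path(output_dir, artifact, sample_id, annotation=None, suffix=".pt"):
    """Resolve an artifact path following the per-annotation namespacing rule.

    Hypothetical helper for illustration only.
    """
    base = Path(output_dir) / artifact
    # Only real class labels get their own subdirectory; tissue stays flat.
    if annotation is not None and annotation != "tissue":
        base = base / annotation
    return base / f"{sample_id}{suffix}"
```

preview/mask/ is the exception to the rule: it holds one multi-label preview per slide, so resolve it without an annotation.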

Embedding Files

All .pt files can be loaded with torch.load():

import torch

tile_embeddings = torch.load("outputs/run/tile_embeddings/slide-1.pt")
# tile_embeddings: Tensor of shape (N, D)

slide_embedding = torch.load("outputs/run/slide_embeddings/slide-1.pt")
# slide_embedding: Tensor of shape (D,)

Shapes by artifact type:

Artifact	Tensor shape
`tile_embeddings`	`(N, D)` — N tiles, D feature dimensions
`hierarchical_embeddings`	`(R, T, D)` — R regions, T tiles per region, D feature dimensions
`slide_embeddings`	`(D,)`
`patient_embeddings`	`(D,)`

Dense tile grids are not currently written by Pipeline or the CLI. They are exposed through the low-level tile encoder API as encode_tiles_dense(...); see the API Guide for usage.

Embedding Meta Files

Each .pt embedding file has a companion .meta.json with provenance and shape information; the .pt files under slide_latents/ have none. The exact fields depend on the artifact type.

tile_embeddings

{
  "sample_id": "slide-1",
  "artifact_type": "tile_embeddings",
  "backend": "cucim",
  "coordinates_meta_path": "<output_dir>/tiles/slide-1.coordinates.meta.json",
  "coordinates_npz_path": "<output_dir>/tiles/slide-1.coordinates.npz",
  "encoder_level": "tile",
  "encoder_name": "prost40m",
  "feature_dim": 384,
  "format": "pt",
  "image_path": "/data/slide-1.tif",
  "mask_path": "/data/mask-1.tif",
  "num_tiles": 166,
  "tile_size_lv0": 224
}
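Because num_tiles and the coordinate arrays describe the same tiles, a tile meta file can be cross-checked against its coordinate archive. This is a minimal sketch, assuming coordinates_npz_path holds a real filesystem path (the <output_dir> prefix above is a documentation placeholder); check_tile_meta is a hypothetical helper.

```python
import json

import numpy as np


def check_tile_meta(meta_path):
    """Cross-check a tile_embeddings meta file against its coordinates .npz.

    Hypothetical helper; assumes coordinates_npz_path is a usable path.
    """
    with open(meta_path) as f:
        meta = json.load(f)
    # NpzFile supports the context-manager protocol, which closes the archive.
    with np.load(meta["coordinates_npz_path"]) as coords:
        n_coords = len(coords["x"])
    if n_coords != meta["num_tiles"]:
        raise ValueError(
            f"{meta['sample_id']}: meta reports {meta['num_tiles']} tiles, "
            f"coordinates file has {n_coords}"
        )
    return meta
```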

hierarchical_embeddings

Same fields as tile_embeddings (except "artifact_type": "hierarchical_embeddings"), plus:

{
  ...
  "num_regions": 512,
  "tiles_per_region": 36
}
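A hierarchical array can be reduced either to one row per tile or to one vector per region. This is a minimal sketch on a NumPy array (call .numpy() on a CPU tensor returned by torch.load); the helper names are hypothetical, and mean pooling over the tile axis is only one illustrative choice, not something slide2vec does for you.

```python
import numpy as np


def flatten_regions(hier):
    """Flatten an (R, T, D) array to (R * T, D): one row per tile, region-major."""
    r, t, d = hier.shape
    return hier.reshape(r * t, d)


def region_mean(hier):
    """Mean-pool the T tiles of each region into a single (R, D) array."""
    if hier.ndim != 3:
        raise ValueError(f"expected shape (R, T, D), got {hier.shape}")
    return hier.mean(axis=1)
```

With the example meta above, a (512, 36, D) array flattens to (18432, D) and pools to (512, D). Whether the flattened rows follow the same order as the coordinates file is not stated here, so confirm it before pairing the two.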

slide_embeddings

{
  "sample_id": "slide-1",
  "artifact_type": "slide_embeddings",
  "encoder_level": "slide",
  "encoder_name": "prism",
  "feature_dim": 1280,
  "format": "pt",
  "image_path": "/data/slide-1.tif"
}

patient_embeddings

{
  "patient_id": "patient-1",
  "artifact_type": "patient_embeddings",
  "encoder_name": "moozy",
  "encoder_level": "patient",
  "format": "pt",
  "feature_dim": 768,
  "num_slides": 2
}
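The meta fields are enough to predict the shape of the matching .pt tensor, following the shape table in Embedding Files. This is a minimal sketch; expected_shape is a hypothetical helper, and it assumes hierarchical meta files carry num_regions and tiles_per_region as shown above.

```python
def expected_shape(meta):
    """Predict the tensor shape of an embedding .pt file from its meta dict.

    Hypothetical helper mirroring the shape table in Embedding Files.
    """
    kind = meta["artifact_type"]
    d = meta["feature_dim"]
    if kind == "tile_embeddings":
        return (meta["num_tiles"], d)
    if kind == "hierarchical_embeddings":
        return (meta["num_regions"], meta["tiles_per_region"], d)
    if kind in ("slide_embeddings", "patient_embeddings"):
        return (d,)
    raise ValueError(f"unknown artifact_type: {kind!r}")
```

A loaded tensor can then be checked with tuple(tensor.shape) == expected_shape(meta).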

Coordinate Files

During tiling, slide2vec writes a pair of coordinate files for each slide under tiles/:

<sample_id>.coordinates.npz — numpy archive with tile coordinate arrays
<sample_id>.coordinates.meta.json — tiling provenance and parameters

Coordinate arrays

The .npz contains four arrays with tile coordinate and metadata information. All four arrays have length N (the number of tiles) and share the same ordering as the rows of the corresponding embedding tensor.

Array	dtype	Description
`x`	`int64`	Left edge of each tile in level-0 pixel coordinates
`y`	`int64`	Top edge of each tile in level-0 pixel coordinates
`tile_index`	`int32`	Sequential index of each tile
`tissue_fractions`	`float32`	Fraction of pixels classified as tissue in each tile

import numpy as np

data = np.load("outputs/run/tiles/slide-1.coordinates.npz")
x = data["x"]   # shape (N,) — level-0 x coordinates
y = data["y"]   # shape (N,) — level-0 y coordinates
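Because the coordinate arrays share the row order of the embedding tensor, one boolean mask selects matching tiles from both. This is a minimal sketch, assuming the embeddings are a NumPy array (convert a torch tensor with .numpy()); the helper name filter_by_tissue and the 0.5 default threshold are illustrative.

```python
import numpy as np


def filter_by_tissue(embeddings, coords, min_fraction=0.5):
    """Keep tiles whose tissue fraction is at least min_fraction.

    embeddings: (N, D) array whose rows follow the coordinate ordering.
    coords: mapping (e.g. the loaded .npz) with "x", "y", "tissue_fractions".
    """
    keep = np.asarray(coords["tissue_fractions"]) >= min_fraction
    if keep.shape[0] != embeddings.shape[0]:
        raise ValueError("coordinate arrays and embeddings differ in length")
    # The same mask applies to every per-tile array, preserving alignment.
    x = np.asarray(coords["x"])[keep]
    y = np.asarray(coords["y"])[keep]
    return embeddings[keep], x, y
```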

Coordinate meta files

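The sidecar <sample_id>.coordinates.meta.json file is written by the tiling pipeline and groups its fields into six sections: provenance, slide, tiling, segmentation, filtering, and artifact. In the excerpt below, ... marks omitted fields: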

{
  "provenance": {
    "sample_id": "slide-1",
    "image_path": "/data/slide-1.svs",
    "mask_path": "/data/mask-1.tif",
    "backend": "cucim",
    "requested_backend": "auto"
  },
  "slide": {
    "dimensions": [50000, 40000],
    "base_spacing_um": 0.25,
    "level_downsamples": [1.0, 2.0, 4.0, 8.0, 16.0]
  },
  "tiling": {
    "requested_tile_size_px": 224,
    "requested_spacing_um": 0.5,
    "effective_tile_size_px": 224,
    "effective_spacing_um": 0.503,
    "tile_size_lv0": 448,
    "n_tiles": 1024,
    ...
  },
  "segmentation": { ... },
  "filtering": { ... },
  "artifact": {
    "coordinate_space": "level_0",
    "tile_order": "row_major",
    ...
  }
}
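Coordinates are stored in level-0 pixels ("coordinate_space": "level_0"), so mapping them onto another pyramid level means dividing by that level's factor in slide.level_downsamples, and the pixel spacing at that level is base_spacing_um times the same factor. This is a minimal sketch with hypothetical helper names:

```python
import numpy as np


def to_level(x, y, level_downsamples, level):
    """Map level-0 pixel coordinates onto the pixel grid of a pyramid level."""
    ds = level_downsamples[level]
    # Floor so that a tile's top-left corner stays inside the level grid.
    lx = np.floor(np.asarray(x) / ds).astype(np.int64)
    ly = np.floor(np.asarray(y) / ds).astype(np.int64)
    return lx, ly


def level_spacing_um(base_spacing_um, level_downsamples, level):
    """Physical pixel spacing (um/px) at a pyramid level."""
    return base_spacing_um * level_downsamples[level]
```

With the values above, level 1 has a spacing of 0.25 × 2.0 = 0.5 µm/px, equal to requested_spacing_um, and a 224 px tile at that spacing covers 448 level-0 pixels, consistent with the reported tile_size_lv0.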

These files can be reused across runs via read_coordinates_from to skip tiling when only the encoder changes.

Process List

process_list.csv tracks the status of every slide in the manifest:

sample_id,status,error
slide-1,done,
slide-2,done,
slide-3,failed,RuntimeError: slide file not found

Possible status values:

done — processed successfully
failed — an error occurred; details are in the error column
skipped — slide was already present in the output directory
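The process list is plain CSV, so failed slides can be collected for a rerun with the standard library. This is a minimal sketch; summarize_process_list is a hypothetical helper. With annotation-aware sampling a sample_id appears once per class, so key the failures on (sample_id, annotation) instead; the annotation column name is an assumption.

```python
import csv
from collections import Counter


def summarize_process_list(path):
    """Count rows per status and map each failed sample_id to its error."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    counts = Counter(row["status"] for row in rows)
    failed = {
        row["sample_id"]: row["error"]
        for row in rows
        if row["status"] == "failed"
    }
    return counts, failed
```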