Configuring the Model

Using the configurations.yml file

A configurations.yml file provides a single place to control how the model runs. The most important settings in this file are:

  1. The directory containing the scale images to process
  2. The directory containing the accompanying metadata
  3. The directory where you want the model output file to be written
  4. The directory containing the trained model weights (i.e., the multimodal-model-v2025.pth file) and, if desired, the Segment Anything Model (SAM) weights. If you simply cloned the repo and have not moved anything, the trained model weights file will be alongside the model script in the scripts subdirectory of the cloned repository. The SAM weights will be wherever you saved them after downloading.

Absolute file paths are generally recommended to avoid unintended behavior but will vary from computer to computer.
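
As a rough illustration of the distinction, the following Python sketch uses the standard library's pathlib module to test whether a path string is absolute. The function name and example paths are hypothetical, and PureWindowsPath is used only so that drive-letter paths like those in this guide are interpreted the same way on any operating system.

```python
from pathlib import PureWindowsPath


def is_absolute_windows_path(path_string: str) -> bool:
    """Return True if the string is an absolute Windows-style path (drive plus root)."""
    return PureWindowsPath(path_string).is_absolute()


# Hypothetical paths, for illustration only.
print(is_absolute_windows_path("G:/Shared drives/example/Raw_Images"))  # True
print(is_absolute_windows_path("weights/model.pth"))  # False
```

A relative path such as the second one is resolved against whatever directory the script is run from, which is the usual source of the unintended behavior mentioned above.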

This YAML file is structured as key: value pairs. The order of the entries does not matter, but every key must be present and spelled exactly as the model expects. For example:

configurations.yml
# --------------------------------------------------------------------------------------------
# Configuration for pre-processing scale images (crop, pad, and normalization)
# --------------------------------------------------------------------------------------------

# -----Paths and general options-----
raw_image_path: "G:/Shared drives/NMFS SEFSC FATES Advanced Technology/BIOLOGY_LIFE_HISTORY_DATA/2020_Plant_10/Raw_Images"
preprocessed_image_path: "G:/Shared drives/NMFS SEFSC FATES Advanced Technology/BIOLOGY_LIFE_HISTORY_DATA/2020_Plant_10/Cropped"
input_type: ".tif"
output_type: ".jpg"
segment: "binary"

# -----Binary Threshold segmentation parameters-----
binary_threshold: 100

# -----Segment Anything Model (SAM) parameters-----
points_per_side: 16
stability_score_thresh: 0.93
downsample: 0.5
sam_model_type: "vit_b"
sam_weights_path: "C:/Users/user.name/Documents/GitHubRepos/FATES-BLH-ScaleAgeing/scripts/weights/sam_vit_b_01ec64.pth"

# -----Cropping and padding parameters-----
pad: 0.2
bottom_pad: 0.4

# -----Normalization options-----
normalization: "none"
invert: False

# --------------------------------------------------------------------------------------------
# Configuration for age inference 
# --------------------------------------------------------------------------------------------

# -----Model paths-----
image_path: "G:/Shared drives/NMFS SEFSC FATES Advanced Technology/BIOLOGY_LIFE_HISTORY_DATA/2020_Plant_10/Cropped"
metadata_path: "G:/Shared drives/NMFS SEFSC FATES Advanced Technology/BIOLOGY_LIFE_HISTORY_DATA/2020_Plant_10/Metadata/metadata.csv"
out_path: "G:/Shared drives/NMFS SEFSC FATES Advanced Technology/BIOLOGY_LIFE_HISTORY_DATA/2020_Plant_10/Model_Predictions/predictions.csv"
model_path: "C:/Users/user.name/Documents/FATES-BLH-ScaleAgeing/scripts/weights/multimodal-model-v2025.pth"
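
Because every key must be present and spelled exactly, it can be worth checking a configuration file before a long run. The Python sketch below is one possible way to do that and is not part of the model code. The key list is copied from the example above, and treating it as the complete required set is an assumption. The names EXPECTED_KEYS, check_keys, and load_config are hypothetical, and load_config relies on the third-party PyYAML package.

```python
# Keys copied from the example file above; treating this as the full required
# set is an assumption, not something confirmed by the model code.
EXPECTED_KEYS = {
    "raw_image_path", "preprocessed_image_path", "input_type", "output_type",
    "segment", "binary_threshold", "points_per_side", "stability_score_thresh",
    "downsample", "sam_model_type", "sam_weights_path", "pad", "bottom_pad",
    "normalization", "invert", "image_path", "metadata_path", "out_path",
    "model_path",
}


def check_keys(config: dict) -> tuple:
    """Return (missing, unexpected) sets of key names for a loaded configuration."""
    found = set(config)
    return EXPECTED_KEYS - found, found - EXPECTED_KEYS


def load_config(path: str) -> dict:
    """Read a YAML file into a dict (requires the third-party PyYAML package)."""
    import yaml  # imported here so check_keys works without PyYAML installed

    with open(path, encoding="utf-8") as handle:
        # safe_load returns None for an empty file, so fall back to an empty dict.
        return yaml.safe_load(handle) or {}
```

With this sketch, a misspelled key such as Model_Path would be reported twice: model_path would appear in the missing set and Model_Path in the unexpected set.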

Tip: A note on reproducibility

To document workflows and make future runs reproducible, consider creating a new configuration file for each model run (for example, one for each data set to be processed). There is no restriction on what this file can be called; you will tell the model which file to use when you execute the script. A naming convention such as configuration-2024-atl.yml may therefore be sensible.

Where to save your configuration file

There are two schools of thought when it comes to organizing the model files. It is ultimately up to the user to choose whichever convention is best for them.

Option 1: Alongside the model

Storing the configurations.yml file in the same directory as the model scripts is advantageous when running the model because you will not need to include the full directory path when you specify which configuration file to use. Since you will be executing the model script from the directory in which it resides, the system will automatically find the configurations.yml file in that same directory.

The disadvantage to this option is that your directory may quickly become cluttered with different configuration files for different model runs.

Option 2: Alongside the data

One might opt instead to store the configurations.yml file in the same directory as the data to be processed or in the directory where the model output will be written. This is helpful for documenting workflows and ensuring reproducibility since it will be easy to see how a given data set was processed.

The disadvantage of this option is that you must supply the full path to the configuration file when running the model.
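
The trade-off between the two options comes down to how a bare file name is resolved. The Python sketch below illustrates this under the assumption that the model script looks up a relative configuration path in the directory it is run from; that behavior is typical but is not confirmed here for this particular script. The function name resolve_config and the data directory are hypothetical, and PureWindowsPath is used so the drive-letter paths behave the same on any operating system.

```python
from pathlib import PureWindowsPath


def resolve_config(config_arg: str, working_dir: PureWindowsPath) -> PureWindowsPath:
    """Show where a configuration argument would point when run from working_dir."""
    candidate = PureWindowsPath(config_arg)
    # An absolute path is used as-is; a bare name is looked up in working_dir.
    return candidate if candidate.is_absolute() else working_dir / candidate


# Directory the model script is assumed to be run from.
scripts_dir = PureWindowsPath("C:/Users/user.name/Documents/FATES-BLH-ScaleAgeing/scripts")

# Option 1: a bare file name is enough because the file sits next to the script.
print(resolve_config("configurations.yml", scripts_dir))

# Option 2: the file lives with the data (hypothetical location), so the full path is needed.
print(resolve_config("D:/data/run1/configuration-2024-atl.yml", scripts_dir))
```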

Note

These are by no means the only options. Whatever convention is adopted, consistency is key. Your future self will thank you some day.