Evaluate Model Performance

Please read about getting started with Segments first.

Model Performance Segments

Model Performance Segments allow you to organize your dataset in Aquarium and understand model performance across important subsets of your dataset.

Use Model Performance Segments to:

  • Group subsets of your dataset for performance reporting based on metadata, embedding space, model performance, domain, etc.

  • Define target performance thresholds for Precision, Recall and F1 against the grouped set of frames.

  • Compare multiple inference sets across all Model Performance Segments and identify areas of under- or over-performance.
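The metrics reported for each segment can be sketched as follows. This is an illustrative computation from raw detection counts, not the Aquarium API:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true positive,
    false positive, and false negative counts for one segment."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Example counts for a single inference set on one segment.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```

Here precision is 0.9, recall is 0.75, and F1 is their harmonic mean (about 0.818).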

Model Performance Segment Types

  • Split - Use split segments to track and report model performance on the Test, Training and Validation sets within your dataset.

  • Regression Test - Use regression test segments to define subsets of the dataset that mimic your domain's hardest problems, set target model performance thresholds, and then easily assess pass/fail for every model experiment or release candidate.

  • Scenario - Use scenario segments to evaluate and compare model performance against other meaningful subsets of your data.
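The pass/fail check a Regression Test segment applies can be sketched as: every targeted metric must meet or exceed its threshold. This is a hypothetical illustration of the logic, not Aquarium's implementation:

```python
def regression_test_passes(metrics, targets):
    """Return True only if every targeted metric meets or
    exceeds its threshold; untargeted metrics are ignored."""
    return all(metrics[name] >= threshold
               for name, threshold in targets.items())

# Illustrative values for one release candidate on one segment.
candidate = {"precision": 0.92, "recall": 0.81, "f1": 0.86}
targets = {"precision": 0.90, "recall": 0.85}

result = regression_test_passes(candidate, targets)
# Fails: recall (0.81) is below its 0.85 target.
```

Thresholds are only enforced for metrics you set targets on, which matches the optional performance target described below.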

Creating Model Performance Segments

Anywhere you can select a set of frames within a dataset, you can define a Model Performance Segment.

  1. Select a set of frames (e.g. based on a query, embedding space, model performance, etc.)

  2. Click the Add to Segment button

  3. Ensure the Frame / Crop toggle is set to Frame

  4. Choose the Model Performance Segment type

  5. Fill in relevant metadata, including the optional performance target threshold

  6. Click Save
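The fields collected in the steps above can be sketched as a single segment definition. The field names and values here are hypothetical, chosen only to mirror each step, and are not the Aquarium API:

```python
# Hypothetical data shape for a Model Performance Segment definition.
segment = {
    "name": "night-intersections",            # illustrative segment name
    "element_type": "frame",                  # step 3: Frame / Crop toggle
    "segment_type": "regression_test",        # step 4: split, regression_test, or scenario
    "frame_ids": ["frame_001", "frame_002"],  # step 1: the selected frames
    # step 5: optional performance target thresholds
    "targets": {"precision": 0.90, "recall": 0.85, "f1": 0.87},
}
```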

Once submitted, your Model Performance Segments will appear in both the Metrics View and the Model Performance subsection of the Segments page.

Learn more about analyzing your inference sets in the context of Model Performance Segments.

Managing Model Performance Segments

Access the details for each Model Performance Segment from the Segments page under the Model Performance subsection.

From the elements tab of the segment details page, use Similarity Search to identify other frames from the dataset that may belong in your Regression Test or Scenario Model Performance Segments.

From the metrics tab of the segment details page, set or update performance targets and view relative model performance for all submitted inference sets.

Common Use Cases

Measuring Model Performance Against the Test and Validation Sets

  • For every dataset you upload to Aquarium, provide split as a metadata field.

  • For each split:

    • Use the query bar to filter the dataset to frames in that split (e.g. user__split:training)

    • Select all frames and click the Add to Segment button

    • Choose the Split segment type and fill in the relevant metadata.

Once defined, every uploaded inference set will automatically have performance calculated for each split. Compare and contrast performance across splits or inference sets using the Model Metrics View.
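The grouping a Split segment defines can be sketched as follows. The frame records here are hypothetical stand-ins for your uploaded metadata:

```python
from collections import defaultdict

# Hypothetical frames, each carrying a user-provided split metadata field.
frames = [
    {"frame_id": "f1", "split": "training"},
    {"frame_id": "f2", "split": "validation"},
    {"frame_id": "f3", "split": "training"},
    {"frame_id": "f4", "split": "test"},
]

# Group frame IDs by split; each group corresponds to one Split segment.
by_split = defaultdict(list)
for frame in frames:
    by_split[frame["split"]].append(frame["frame_id"])
```

Each inference set's metrics can then be computed independently over each group, which is what enables per-split comparison in the Model Metrics View.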

Evaluate Release Candidates or Experiments Against Business Critical Outcomes using Regression Tests

  • Based on model performance or domain context, define sets of frames that represent critical scenarios the model must perform well in.

  • Set metric thresholds based on the current deployed model's baseline or other outcome oriented targets.

  • Use the multi-model comparison view on the Scenarios tab to quickly learn whether a release candidate model:

    • improved or regressed relative to the baseline

    • surpassed the target thresholds for the individual Regression Test
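Both checks above can be sketched in a few lines. The function and field names are hypothetical, illustrating the comparison on a single Regression Test segment using F1 as the tracked metric:

```python
def compare_candidate(baseline_f1, candidate_f1, target_f1):
    """Compare a release candidate to the deployed baseline on one
    Regression Test segment: did it improve, and does it meet the target?"""
    delta = candidate_f1 - baseline_f1
    return {
        "improved": delta > 0,               # better than the baseline?
        "delta": round(delta, 4),            # magnitude of the change
        "meets_target": candidate_f1 >= target_f1,  # passes the threshold?
    }

# Illustrative values: improved over baseline, but still below target.
report = compare_candidate(baseline_f1=0.82, candidate_f1=0.85, target_f1=0.88)
```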
