Viewing Your Dataset
Once you've uploaded a dataset through the Aquarium API, you can go to https://illume.aquariumlearning.com/ to explore and understand what's in your dataset.
To begin, select your project from the dropdown at the top right. Then choose a dataset to look at using the "Datasets" dropdown and click "Search" to populate the page.
You can switch between different dataset views using the four icons to the right of the search bar.
You can use the "Display Settings" button to toggle settings like label transparency, number of datapoints displayed per row, etc.

Grid View

The first and default view for dataset exploration is the Grid View. This view lets you quickly look through your dataset and understand what it "looks like" at a glance. Labels, if available, are overlaid on the underlying data.
You can click into an individual datapoint to see more details, such as its metadata (timestamp, device ID, etc.) and its labels.
In this detailed view, you can also toggle label text on the bounding boxes. When toggled on, labels are always shown for larger, more foregrounded boxes, while labels for smaller boxes appear on hover.
You can also select multiple elements to group into issues. You can either select them one by one by clicking the circle in the top-left of each image card, or select a range of items at once by shift-clicking.

Histogram View

The second view for dataset understanding is the Histogram View.
You often want to view the distribution of your metadata across your dataset. This is particularly useful for understanding if you have an even spread of classes, times of day, etc. in your dataset. Simply click the dropdown, select a metadata field, and you can see a histogram of the distribution of that value across the dataset.
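The same kind of distribution check can be done offline before you even open the UI. Here's a minimal sketch using Python's standard library, assuming a list of per-datapoint metadata dicts; the field names are illustrative, not Aquarium's actual schema:

```python
from collections import Counter

# Hypothetical metadata records; in practice these would come from your
# own dataset export (field names here are illustrative).
datapoints = [
    {"class": "car", "time_of_day": "day"},
    {"class": "car", "time_of_day": "night"},
    {"class": "pedestrian", "time_of_day": "day"},
    {"class": "car", "time_of_day": "day"},
]

def histogram(records, field):
    """Count how often each value of `field` appears across the dataset."""
    return Counter(r[field] for r in records)

print(histogram(datapoints, "class"))        # Counter({'car': 3, 'pedestrian': 1})
print(histogram(datapoints, "time_of_day"))  # Counter({'day': 3, 'night': 1})
```

A skewed count here (e.g. very few nighttime images) is exactly the kind of imbalance the Histogram View surfaces visually.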

Embedding View

The third view for dataset understanding is the Embedding View.
The previous methods of data exploration rely heavily on metadata to find interesting parts of your dataset. However, metadata doesn't always exist for important types of variation in your data. Neural network embeddings can be used to index the raw data in your dataset and better understand its distribution.
The embedding view plots variation in the raw underlying data. Each point in the chart represents a single datapoint. In the Image view, each point is a whole image or "row" in your dataset. In the Crop view, each point represents an individual label or inference crop within an image.
The closer points are to each other, the more similar they are. The farther apart they are, the more different. Using the embedding view, you can understand the types of differences in your raw data, find clusters of similar datapoints, and examine outlier datapoints.
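The "closer means more similar" intuition comes down to distances between embedding vectors. A minimal NumPy sketch with toy 2-D vectors (real embeddings come from a neural network and are typically projected down to 2-D for plotting):

```python
import numpy as np

# Toy 2-D "embeddings"; these values are made up for illustration.
embeddings = np.array([
    [0.0, 0.0],   # point 0
    [0.1, 0.0],   # point 1: very close to point 0 (similar datapoint)
    [5.0, 5.0],   # point 2: far away (an outlier)
])

# Pairwise Euclidean distances via broadcasting.
diff = embeddings[:, None, :] - embeddings[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Nearest neighbor of each point (ignoring itself).
np.fill_diagonal(dist, np.inf)
nearest = dist.argmin(axis=1)
print(nearest)  # [1 0 1]
```

Clusters in the embedding view correspond to groups of points with small mutual distances, and outliers are points far from all others.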
You can also color the embedding points with metadata to understand the distribution of metadata relative to the distribution of the raw data.
To select a group of points for visualization, you can hold shift + click and draw a lasso around a group of points. You can then scroll through individual examples in the panel with the arrows. You can also adjust the size of the detail panel by dragging the corner.
It's also possible to change which part of the image you're looking at in the preview pane. You can zoom in and out using your mouse's scroll wheel or a two-finger scroll, and you can click and drag the image to pan the view around it.
Once you select a set of images or crops from other Explore views (such as Grid View or Metrics View), you can use the "View Embeddings" button to switch to the embedding view with the selection highlighted. This is also available for individual crops when viewing individual frames.

Example: Finding Labeling Errors With Embedding Analysis

Coloring the embedding view with metadata can help you understand the distribution of metadata relative to the distribution of the raw data. This can be used to spot label errors / inconsistencies. When similar examples have different labels, this can indicate a problem with the labels.
For example, two very similar datapoints of sitting people might be labeled as very different classes.
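This check can be sketched offline as well: flag datapoints whose nearest embedding neighbor carries a different label. The vectors and class names below are toy values for illustration, not Aquarium output:

```python
import numpy as np

# Toy embeddings and labels; in practice the embeddings come from a
# neural network and the labels from your ground truth.
embeddings = np.array([
    [0.0, 0.0],
    [0.1, 0.1],   # nearly identical to the first point...
    [4.0, 4.0],
    [4.1, 4.0],
])
labels = ["person_sitting", "chair", "car", "car"]  # ...but labeled differently

# Pairwise Euclidean distances, ignoring self-distances.
diff = embeddings[:, None, :] - embeddings[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
np.fill_diagonal(dist, np.inf)
nearest = dist.argmin(axis=1)

# Flag points whose nearest neighbor carries a different label.
suspects = [i for i, j in enumerate(nearest) if labels[i] != labels[j]]
print(suspects)  # [0, 1]
```

Points 0 and 1 sit right next to each other in embedding space but disagree on class, which is the signature of a labeling error or inconsistency.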

Updating Label Colors

If you want to change the color associated with a specific label, you can do so by clicking on the color square next to the label in the "Display Settings" menu.

Setting Max Visible Confidence

In the "Display Settings" menu, you can adjust the max visible confidence so that only lower confidence inferences appear:
Note that the Min Confidence Threshold setting is different from Max Visible Confidence. Together with the Min IOU Threshold, it determines how ground truth labels are matched to inference labels for metrics calculations (see Metrics Methodology for more details).
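As a rough illustration of how such matching can work, here is a simplified greedy sketch (not necessarily Aquarium's exact algorithm): inferences below the confidence threshold are dropped first, then each ground truth box is matched to the best remaining inference whose IoU clears the threshold.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match(gt_boxes, inferences, min_conf=0.5, min_iou=0.5):
    """Greedily match each ground truth box to the best remaining
    inference above both thresholds. Returns (gt_index, inf_index) pairs."""
    # Drop low-confidence inferences first, as a Min Confidence Threshold would.
    candidates = [(i, box) for i, (box, conf) in enumerate(inferences)
                  if conf >= min_conf]
    matches, used = [], set()
    for g, gt in enumerate(gt_boxes):
        best, best_iou = None, min_iou
        for i, box in candidates:
            if i in used:
                continue
            score = iou(gt, box)
            if score >= best_iou:
                best, best_iou = i, score
        if best is not None:
            used.add(best)
            matches.append((g, best))
    return matches

gt = [(0, 0, 10, 10)]
preds = [((1, 1, 11, 11), 0.9),    # good overlap, high confidence -> matched
         ((50, 50, 60, 60), 0.3)]  # filtered out by min_conf
print(match(gt, preds))  # [(0, 0)]
```

Matched pairs feed metrics like precision and recall; unmatched inferences count as false positives and unmatched ground truth boxes as false negatives.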