While analyzing your datasets and model inferences, you are likely to find problems! These can range from bad data or labels to actual deficiencies in model performance.
Aquarium provides a way to track these issues and then take the proper corrective action to fix them.
An issue is a grouping of datapoints, sometimes with specific labels and inferences associated with them. You can create or add to an issue from most views in Aquarium.
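To make the idea of an issue as a grouping concrete, here is a minimal sketch in Python. It is only an illustration of the concept, not Aquarium's internal data model or client API; the names `Issue`, `IssueElement`, `datapoint_id`, `label_id`, and `inference_id` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass(frozen=True)
class IssueElement:
    """One member of an issue (hypothetical model, not Aquarium's schema).

    A member is a datapoint, optionally narrowed to a specific label
    or inference on that datapoint.
    """
    datapoint_id: str
    label_id: Optional[str] = None
    inference_id: Optional[str] = None


@dataclass
class Issue:
    """A named grouping of elements; a set keeps members unique."""
    name: str
    elements: Set[IssueElement] = field(default_factory=set)

    def add(self, element: IssueElement) -> None:
        self.elements.add(element)

    def remove(self, element: IssueElement) -> None:
        # discard() does nothing if the element is not present
        self.elements.discard(element)

    def datapoint_ids(self) -> Set[str]:
        return {e.datapoint_id for e in self.elements}


issue = Issue("missing pedestrian labels")
issue.add(IssueElement("frame_001"))
issue.add(IssueElement("frame_002", label_id="label_17"))
issue.add(IssueElement("frame_001"))  # same element again, so the set is unchanged
```

Modeling the grouping as a set means that adding the same datapoint twice from different views leaves the issue unchanged.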
In the Grid View and Model Analysis View, you can select datapoints by clicking the circle at the top-left of each card. Hold shift and click to select multiple datapoints. Once you've selected the datapoints, use the "Add To Issue" button to create a new issue, or select an existing issue from the dropdown to add them to it.
In the Embedding View, you can select individual datapoints by clicking on them or select a group of datapoints with shift-click lasso-ing. Then you can use the "Add To Issue" button as before.
You can use the "inIssue" and "notInIssue" filters in the query bar to include or exclude datapoints based on whether they already belong to an issue. This is particularly useful for looking at the queue of failure datapoints that you have not yet triaged into an issue.
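Conceptually, these filters behave like set-membership tests. The sketch below illustrates that idea in plain Python. It is not Aquarium's query syntax or API; the function names and the `issues` dictionary (issue name mapped to a set of datapoint IDs) are assumptions made for the example.

```python
def in_issue(datapoint_ids, issue_members):
    """Keep only datapoints that belong to the given issue."""
    return [d for d in datapoint_ids if d in issue_members]


def not_in_any_issue(datapoint_ids, issues):
    """Keep only datapoints that have not been triaged into any issue."""
    # Union of all issue member sets; an empty dict yields an empty set.
    triaged = set().union(*issues.values())
    return [d for d in datapoint_ids if d not in triaged]


# Hypothetical failure queue and issues, for illustration only.
failures = ["a", "b", "c", "d"]
issues = {"bad labels": {"a"}, "night scenes": {"c"}}

untriaged = not_in_any_issue(failures, issues)
already_in_bad_labels = in_issue(failures, issues["bad labels"])
```

Here `untriaged` holds the datapoints that still need to be reviewed, which is the triage queue described above.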
The following filter selects datapoints that are not in any issue:
The Issues page, accessible from the top bar, allows you to access all issues you've created so far.
You can click into an issue to view what datapoints you've added to it so far, remove datapoints from the issue, export datapoints as a JSON file, or take corrective action to fix it.
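Once exported, the JSON file can be inspected with a few lines of Python. The export schema is not described here, so the sketch below assumes only that the file contains a top-level JSON array of objects; that assumption, the `summarize_export` helper, and the sample fields are all hypothetical.

```python
import json
from collections import Counter


def summarize_export(raw_json):
    """Return (record count, how often each key appears) for an export.

    Assumes a top-level JSON array of objects; the real export format
    may differ, so check it before relying on this.
    """
    records = json.loads(raw_json)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    key_counts = Counter()
    for record in records:
        if isinstance(record, dict):
            key_counts.update(record.keys())
    return len(records), dict(key_counts)


# Made-up sample standing in for the contents of an exported file.
sample = '[{"id": "frame_001"}, {"id": "frame_002", "note": "blurry"}]'
count, keys = summarize_export(sample)
```

For a real export, read the file contents with `open(path).read()` and pass the resulting string to `summarize_export`.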
Protip: we recommend keeping the Issues page open in a separate tab. This way, you can keep adding to an issue in one tab and then review what you've added to that issue in another tab.
Typically, an issue is caused by either a problem in the data or a problem with your model.
If the issue is caused by a problem with your data (such as a missing or inconsistent label), Aquarium can integrate with your labeling provider. This way, you can click a button to send the data to your labeling provider for correction, and then retrain and re-evaluate your model on clean data.
If the issue is caused by a deficiency in your model (the model does badly on a particular type of difficult example), Aquarium can help automate the process of collecting more data of that type so that your model performs better the next time you retrain. You can upload large unlabeled datasets. We then use our neural network embedding search to find examples in those unlabeled datasets that are similar to the datapoints in the current issue, and send them to your labeling provider.
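The general idea behind an embedding similarity search can be sketched as follows. This is a generic nearest-neighbor illustration using cosine similarity over toy vectors, not Aquarium's actual search; the function names, the `top_k` parameter, and the data are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(issue_embeddings, unlabeled, top_k=2):
    """Rank unlabeled datapoints by their best match to any issue embedding.

    issue_embeddings must be non-empty; unlabeled maps ID -> embedding.
    """
    scored = []
    for dp_id, emb in unlabeled.items():
        best = max(cosine_similarity(emb, q) for q in issue_embeddings)
        scored.append((best, dp_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [dp_id for _, dp_id in scored[:top_k]]


# Toy 2-D embeddings; real embeddings have many more dimensions.
issue_embeddings = [[1.0, 0.0], [0.9, 0.1]]
unlabeled = {
    "u1": [1.0, 0.05],
    "u2": [0.0, 1.0],
    "u3": [0.7, 0.7],
    "u4": [-1.0, 0.0],
}
candidates = find_similar(issue_embeddings, unlabeled, top_k=2)
```

In this toy example, `candidates` contains the two unlabeled datapoints whose embeddings point in the most similar direction to the issue's examples. Those would be the ones to send for labeling.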
If these features are of interest to you, contact your Aquarium representative or email us for more details.