2020-11-06

App Improvements

Projects Summary Page

The main "Projects" page now presents a summary view of your organizations projects, including the number of issues and member datasets / inference sets.
It also provides support in the UI for archiving datasets and projects, which was previously only exposed via the python client.
Project List
Project Summary + Archival Controls

Issue Element Quick-View

Previously, looking at the full context for an issue element would require opening a new tab, which could be cumbersome when attempting to look through many samples. Now, clicking on an issue element will quickly open a full detail view in the same window, like in the main dataset search interface.
Issue View
Clicking on an Element

Issues From Multiple Datasets

Previously, issues would support adding elements from multiple datasets, but wouldn't track the full history of which datasets / inference sets each element came from -- just the element ids and the project id. Issues now fully support adding elements from multiple datasets within the project.
Issue containing elements from multiple datasets

Previous/Next Frame Buttons

When looking through search results, the left and right arrow keys (or the corresponding buttons in the UI) now move to the previous or next frame without having to close the full-screen view.

No More Empty Issue Names

If you're like me, you've accidentally created an issue with no name. That's no longer possible.

Python Client / Data Upload Improvements

Reduced Memory Load While Uploading (Round 1)

When uploading larger datasets (especially with large embedding vectors), the python client's memory usage could get prohibitively high. Newer python client versions have a cleaner upload process, which reduces temporary duplication of dataset information in memory.

Explicit Type Schemas for Custom Metadata

By default, Aquarium attempts to infer datatypes from your custom dataset metadata fields. This works well in simple cases, but can lead to non-obvious schema-related failures, especially when the metadata includes nullable fields.
The python client now allows users to explicitly specify the schema that should be used for custom metadata fields, and will warn when you're attempting to provide values that are likely to produce errors later.
# Before
# Oh no, this might break later on due to type inference failures
frame.add_user_metadata('some_field', maybe_null_value)

# Now
# Oh ok
frame.add_user_metadata('some_field', maybe_null_value, 'int')

Data Ingestion Job Status

Until now, the data ingestion process has been pretty opaque. You upload data, then wait, and hope it resolves as successful some time later.
The python client now supports reporting data ingestion job status. You can see when the job is accepted, when resources are fully allocated to it, and whether it succeeds or fails. This should be a great starting point for seeing what's happening with your datasets, and we're planning a lot more work on this in the coming weeks.
# What python client usage looks like now

Creating Dataset
Waiting until the processing job finishes.
Processing job state: TRIGGERED
Processing job state: PENDING
Processing job state: RUNNING
Processing job state: DONE
Dataset is fully processed.
Submitting Inferences
Waiting until the processing job finishes.
Processing job state: TRIGGERED
Processing job state: PENDING
Processing job state: RUNNING
Processing job state: DONE
Inferences are fully processed.
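
For reference, here's a rough sketch of the kind of upload script that would produce the output above. The client API shown here (the module import, the Client / LabeledDataset / LabeledFrame objects, and the wait_until_finish flag) is assumed for illustration and may not match the client exactly; check the python client documentation for the actual call names.

# Illustrative sketch only -- the method names and the wait_until_finish
# flag below are assumptions, not taken from the client's API reference.
import aquariumlearning as al

client = al.Client()
client.set_credentials(api_key='YOUR_API_KEY')

dataset = al.LabeledDataset()
for frame_id in ['frame_001', 'frame_002']:  # placeholder frame ids
    frame = al.LabeledFrame(frame_id=frame_id)
    # ... add images, labels, and custom metadata to the frame ...
    dataset.add_frame(frame)

# With job status reporting, the client can block until processing
# completes, printing the job state (TRIGGERED -> PENDING -> RUNNING -> DONE)
# along the way.
client.create_dataset(
    'my_project',
    'my_dataset',
    dataset,
    wait_until_finish=True,  # assumed flag name
)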

Embedding Visualization Speedups

For a variety of reasons, the embedding visualizer requires the most attention to scale with larger datasets. We've made a lot of behind-the-scenes changes over the last few weeks to better support datasets with millions of labels.
This should manifest as reduced network bandwidth usage (thanks to better local asset caching), faster load times, and a more responsive embedding UI.