2020-11-06

App Improvements

Projects Summary Page

The main "Projects" page now presents a summary view of your organization's projects, including the number of issues and member datasets / inference sets.

It also adds UI support for archiving datasets and projects, which was previously only available via the python client.

Issue Element Quick-View

Previously, looking at the full context for an issue element would require opening a new tab, which could be cumbersome when attempting to look through many samples. Now, clicking on an issue element will quickly open a full detail view in the same window, like in the main dataset search interface.

Issues From Multiple Datasets

Previously, issues would support adding elements from multiple datasets, but wouldn't track the full history of which datasets / inference sets each element came from -- just the element ids and the project id. Issues now track the originating dataset / inference set for each element, fully supporting elements from multiple datasets within the same project.

Previous/Next Frame Buttons

When looking through search results, the left and right arrow keys (or the corresponding buttons in the UI) will now move to the previous or next frame without having to close the full-screen view.

No More Empty Issue Names

If you're like me, you've accidentally created an issue with no name. That's no longer possible.

Python Client / Data Upload Improvements

Reduced Memory Load While Uploading (Round 1)

When uploading larger datasets (especially with large embedding vectors), the python client's memory usage could get prohibitively high. Newer python client versions have a cleaner upload process, which reduces temporary duplication of dataset information in memory.

Explicit Type Schemas for Custom Metadata

By default, Aquarium attempts to infer datatypes from your custom dataset metadata fields. This works well in the simple cases, but can lead to non-obvious schema-related failures, especially when the metadata includes nullable fields.

The python client now allows users to explicitly specify the schema that should be used for custom metadata fields, and will warn when you're attempting to provide values that are likely to produce errors later.

# Before
# Oh no, this might break later on due to type inference failures
frame.add_user_metadata('some_field', maybe_null_value)

# Now
# Oh ok
frame.add_user_metadata('some_field', maybe_null_value, 'int')
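To see why nullable fields trip up inference, here's a minimal illustration of the general problem (this is a simplified sketch, not the Aquarium client's actual internals): a type guessed from observed values has nothing to go on when the first value seen is null.

```python
def infer_type(value):
    """Naively guess a field's type from a single observed value."""
    if value is None:
        return None  # no information -- inference fails
    return type(value).__name__

# A nullable field whose observed value happens to be None gives
# inference nothing to work with, so the schema stays unknown and
# may conflict with later non-null values:
print(infer_type(None))  # -> None
print(infer_type(3))     # -> 'int'
```

Passing an explicit type string, as in the example above, sidesteps this entirely: the schema is fixed up front regardless of which values happen to be null.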

Data Ingestion Job Status

Until now, the data ingestion process has been pretty opaque. You upload data, then wait, and hope it resolves as successful some time later.

The python client now supports reporting data ingestion job status. You can see as the job is accepted, when resources are fully allocated to it, and whether it succeeds or fails. This should be a great starting point to see what's happening with your datasets, and we're planning a lot more work on this in the coming weeks.

# What python client usage looks like now

Creating Dataset
Waiting until the processing job finishes.
Processing job state: TRIGGERED
Processing job state: PENDING
Processing job state: RUNNING
Processing job state: DONE
Dataset is fully processed.
Submitting Inferences
Waiting until the processing job finishes.
Processing job state: TRIGGERED
Processing job state: PENDING
Processing job state: RUNNING
Processing job state: DONE
Inferences are fully processed.
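The waiting behavior shown above can be sketched as a simple polling loop. The state names follow the log output; the `get_job_state` callable and `poll_interval_s` parameter are illustrative assumptions, not the client's actual API.

```python
import time

TERMINAL_STATES = {"DONE", "FAILED"}

def wait_for_job(get_job_state, poll_interval_s=5.0):
    """Poll a job-status source until it reaches a terminal state,
    printing each state transition along the way."""
    last_state = None
    while True:
        state = get_job_state()
        if state != last_state:
            print(f"Processing job state: {state}")
            last_state = state
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_interval_s)

# Example with a stubbed status source:
states = iter(["TRIGGERED", "PENDING", "RUNNING", "DONE"])
result = wait_for_job(lambda: next(states), poll_interval_s=0.0)
print(result)  # -> DONE
```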

Embedding Visualization Speedups

For a variety of reasons, the embedding visualizer requires the most attention to scale with larger datasets. We've made a lot of behind-the-scenes changes over the last few weeks to better support datasets with millions of labels.

This should manifest as reduced network bandwidth (thanks to better local asset caching), faster load times, and a more responsive embedding UI.
