Batch Exports

How to export your data in batches large or small

Overview

Aquarium lets you export your data in a variety of formats. The format of the exported data depends on the view and workflow you are working in when you start the export.

There are two main places you can batch export data from within Aquarium:

  • A segment

  • A main view

The data included in a batch export ranges from just frame IDs to comprehensive exports that contain almost all of the data initially ingested by Aquarium.

Querying Your Data To Be Exported

Your exported data reflects the current query in the view. For example, if you export from any main view without a query applied, you will export the entire dataset.

If you apply a query to the view, the element count updates to reflect only the data that matches the query.

Exporting Data from a Segment

All segments except those of type Collection Campaign have the ability to batch export their data in a CSV or JSON format. Currently, Collection Campaigns can only be exported as JSON.

We recommend grouping your data into segments and exporting your data from there. For all segment types, there is a standard batch export available.

Segments of type Collection Campaign have an additional feature that lets you search for new data within an unlabeled dataset. This feature also provides an additional export of your data, which is covered separately.

How To Export Segment Data

For all types of segments, open the segment view and click the "Elements" tab. You will find an "Actions" button in the top right corner of the screen. Click this button to expand the dropdown, where you can download a batch export file containing just the data in the segment, in either JSON or CSV format.

If you would like to enable the option "Export To Labeling" and send your data directly to a labeling provider via webhook, you can view the docs for that here.

How To Export Collection Campaign Segments

Before you can export data under this tab, you have to kick off a search through the unlabeled dataset. To learn how to run a Collection Campaign, follow this guide.

A segment of type Collection Campaign has one additional place from which you can batch export data: a separate tab labeled "Collected". Click the arrow button to expand the dropdown and access the download button to export your data.

Segment Export Data Format

When exporting segment data, you'll notice far more data included than in the main view exports. The data varies depending on the ML task type of the project, whether the segment is crop-based or frame-based, and whether the crops are inference or ground truth crops.

For an example of what this export looks like, here is a link that downloads an example JSON file for a frame-level segment export, built from the open source RarePlanes dataset (the export contains only one element).

Export Data Format

[
   {
      "element_id":"", # unique id in segment
      "element_type":"frame", #frame or crop segment
      "dataset":"", # name of dataset
      "inference_set": "", # name of inference set
      "frame_id":"", # unique frame id
      "frame_data":{
         "_idx": , # int value
         "coordinate_frames":[ # details of coordinate frame
            {
               "coordinate_frame_id":"DEFAULT",
               "coordinate_frame_metadata":null,
               "coordinate_frame_type":"IMAGE"
            }
         ],
         "custom_metrics":null, # if any custom metrics were added in
         "date_captured":"2022-06-15T15:36:45.365725+00:00",
         "deleted":null,
         "device_id":"default_device", # if custom device id was specified
         "frame_id":"", # unique frame id
         "frame_window": , # window id, unix timestamp of when data was uploaded
         "geo_data":"{}", 
         "inference_metadata":null, # will be null if frame level segment
         "label_data":[], # will be empty if frame level segment
         "reuse_latest_embedding":false,
         "sensor_data":[ # details of the sensor data
            {
               "coordinate_frame":"DEFAULT",
               "data_urls":"{\"image_url\": \"https://storage.googleapis.com/aquarium-public/datasets/rareplanes/train/PS-RGB_tiled/100_1040010029990A00_tile_319.png\"}",
               "date_captured":"2022-06-15T15:36:45.365729+00:00",
               "mirror_asset":false,
               "sensor_id":"DEFAULT",
               "sensor_metadata":"{}",
               "sensor_type":"IMAGE_V0"
            }
         ],
         "table":"", # reference to table in BigQuery
         "task_id":"", # reference to frame id
         "window":  # unix timestamp for when data was uploaded
      },
      "issue_name":"", # name of the segment
      "label_metadata":{ # any metadata at the label level
         "confidenceThreshold":0.1,
         "iouThreshold":0.5
      },
      "aq_link":"", # link to the element in aquarium
      "issue_id":"" # segment id
   }
]
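
One detail of this format is easy to miss: "data_urls" (like "geo_data" and "sensor_metadata") is a JSON-encoded string rather than a nested object, so it needs a second decoding step. The following is a minimal Python sketch of that step, assuming the export has already been loaded into a list of dictionaries. The helper name image_urls_from_segment_export and the sample values are hypothetical and are not part of Aquarium.

```python
import json


def image_urls_from_segment_export(elements):
    """Map each frame_id to the image URLs found in its sensor data."""
    urls_by_frame = {}
    for element in elements:
        frame_data = element.get("frame_data") or {}
        urls = []
        for sensor in frame_data.get("sensor_data") or []:
            # "data_urls" is a JSON-encoded string, so decode it a second time.
            data_urls = json.loads(sensor.get("data_urls") or "{}")
            if "image_url" in data_urls:
                urls.append(data_urls["image_url"])
        urls_by_frame[element["frame_id"]] = urls
    return urls_by_frame


# Hypothetical sample shaped like the segment export (most fields omitted).
sample_export = [
    {
        "element_id": "element-1",
        "element_type": "frame",
        "frame_id": "frame-1",
        "frame_data": {
            "sensor_data": [
                {
                    "sensor_id": "DEFAULT",
                    "data_urls": json.dumps(
                        {"image_url": "https://example.com/frame-1.png"}
                    ),
                }
            ]
        },
    }
]

urls = image_urls_from_segment_export(sample_export)
print(urls)
```

For a downloaded export file, the list of elements can be obtained by passing the opened file to json.load before calling the helper. Note that the annotated format shown above contains comments and blank placeholder values for illustration, so it is not itself valid JSON.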

Exporting Data From a Main View

In this context the main views are:

  • Grid View

  • Analysis View

  • Embedding View

  • Model Metrics View

In each one of these views, you will see a button toward the top right corner that has the download symbol and the number of elements that will be downloaded.

Click this button to export your data in JSON format. Remember that any query applied to the view determines which data is exported.

Batch Export Data Format

For this kind of batch export, the data format is basic: a list of all of the frame IDs currently queried in that view.

The data will be exported as a JSON file and will look like this:

[
  {
    "frameId": "100_1040010046CD1500_tile_187"
  },
  {
    "frameId": "100_1040010046CD1500_tile_476"
  }
]
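
As a minimal sketch of consuming this file, the Python snippet below parses the sample shown above and collects the frame IDs into a list. It assumes only the "frameId" key from the example. The variable names are illustrative, and in practice the text would come from the downloaded file rather than an inline string.

```python
import json

# Sample export text copied from the format shown above.
export_text = """
[
  {"frameId": "100_1040010046CD1500_tile_187"},
  {"frameId": "100_1040010046CD1500_tile_476"}
]
"""

records = json.loads(export_text)

# Each record is an object with a single "frameId" key.
frame_ids = [record["frameId"] for record in records]

# De-duplicate and sort, in case the list is used as a lookup elsewhere.
unique_frame_ids = sorted(set(frame_ids))

print(len(frame_ids), "frame IDs,", len(unique_frame_ids), "unique")
```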

If your workflow requires additional data to be exported, reach out to the team and we can see what other solutions could be possible.
