Release Notes v117, July 2021

This document contains detailed information about the first public data release of the cubic millimeter dataset in July 2021.

Cell Numbers

The functional dataset contains 115,372 functionally identified region of interest (ROI) masks. Because cells can appear in more than one imaging plane and scan, this corresponds to an estimated 75,909 excitatory neurons. The functionally imaged volume is distinct from the reconstructed electron microscopy (EM) volume, so of those, an estimated 82,428 ROI masks (and 53,195 estimated neurons) overlap with the two EM reconstruction portions: approximately 2/3 with the larger portion of the EM dataset and 1/3 with the smaller portion.

Within the larger portion of the electron microscopy dataset, an automated nucleus detection (see details here) determined that there were 144,120 cells, of which 82,247 were neurons in the segmented volume. Nucleus detection has not yet been run on the smaller portion, but based upon its size it’s expected to contain an additional ~75,000 cells and 43,000 neurons.

More neurons may be inside the functional scan planes, but lack of activity or GCaMP expression could cause them not to have an ROI mask. Functional imaging was done in a mouse line that expressed the indicator only in excitatory cells, so inhibitory neurons are also not expected to have functional signals in this dataset. Functional imaging is more challenging in deeper layers of cortex (deep layer 5 and layer 6), so the functional data is expected to contain fewer recorded neurons in those regions.

Synapse detection in the larger portion detected 337,312,429 synapses that were within the segmented volume and 186,268,895 for the smaller portion, for a total of 523,581,324.
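
As a quick consistency check on the figures quoted in this section, the following sketch recomputes the synapse total and two derived ratios. The variable names are illustrative only; the numbers are copied from the text above.

```python
# Counts quoted in the Cell Numbers section.
roi_masks = 115_372
estimated_neurons = 75_909
cells_larger = 144_120
neurons_larger = 82_247
synapses_larger = 337_312_429
synapses_smaller = 186_268_895

# Total synapses across both EM portions.
total_synapses = synapses_larger + synapses_smaller

# Average number of ROI masks per estimated neuron (cells can appear in
# more than one imaging plane and scan).
masks_per_neuron = roi_masks / estimated_neurons

# Fraction of detected cells in the larger portion that were called neurons.
neuron_fraction = neurons_larger / cells_larger

print(f"total synapses: {total_synapses:,}")
print(f"ROI masks per estimated neuron: {masks_per_neuron:.2f}")
print(f"neuron fraction (larger portion): {neuron_fraction:.3f}")
```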

Data Manifest

Below are links to cloud paths and data descriptions for each of the data components available for download.

Minnie65 - Structural, Connectivity, and Cell Typing Data

Name

Cloudpaths

Short Description

Type (size)

Fine-aligned Image (EM)

https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/minnie65/em

https://storage.googleapis.com/iarpa_microns/minnie/minnie65/em

Multi-resolution electron microscopy (EM) imagery from 8,8,40 nm and above.

Precompute Image Data (117 TB)

Description: This contains the fine-aligned electron microscopy (EM) image data downsampled to 8,8,40 nm resolution, stored in precomputed image format. Lower-resolution downsamplings are available in this bucket as well, including [16, 16, 40], [32, 32, 40], [64, 64, 40], [128, 128, 80], [256, 256, 160], [512, 512, 320], [1024, 1024, 640], and [2048, 2048, 1280]. The folder contains many files; to read data use cloud-volume or TensorStore, and for bulk download use igneous, the AWS CLI, or the gsutil CLI.
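
A minimal sketch of how the image volume might be opened with cloud-volume at one of the resolutions listed above. It assumes that the position of a resolution in the list equals its mip level in the bucket, which should be checked against the volume's info file; the helper names are hypothetical, and `open_volume` needs the cloud-volume package and network access.

```python
EM_HTTPS = "https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/minnie65/em"

# Resolutions (nm per voxel in x, y, z) listed in the description, finest first.
RESOLUTIONS_NM = [
    (8, 8, 40), (16, 16, 40), (32, 32, 40), (64, 64, 40),
    (128, 128, 80), (256, 256, 160), (512, 512, 320),
    (1024, 1024, 640), (2048, 2048, 1280),
]


def precomputed_path(https_url):
    """Prefix an HTTPS bucket URL with the precomputed:// scheme."""
    return "precomputed://" + https_url


def mip_for_resolution(resolution_nm):
    """Index of a resolution in the list above (ASSUMED to equal the mip level)."""
    return RESOLUTIONS_NM.index(tuple(resolution_nm))


def open_volume(resolution_nm=(64, 64, 40)):
    """Open the EM volume; requires `pip install cloud-volume` and network access."""
    # Imported lazily so the helpers above work without cloud-volume installed.
    from cloudvolume import CloudVolume
    return CloudVolume(
        precomputed_path(EM_HTTPS),
        mip=mip_for_resolution(resolution_nm),
        use_https=True,
        progress=False,
    )


print(precomputed_path(EM_HTTPS))
print(mip_for_resolution((64, 64, 40)))
```

Working at a coarser mip level keeps cutouts small, which is usually preferable for browsing before fetching full-resolution data.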

Proofread Segmentation (v117)

https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/minnie65/seg

https://storage.googleapis.com/iarpa_microns/minnie/minnie65/seg

Multi-resolution flat / static cellular segmentation voxels and meshes from 8,8,40 nm and above.

Compressed Sharded Precomputed Segmentation Data (12 TB)

Description: This contains the fixed state of the cellular segmentation after proofreading as of June 11, 2021, where each voxel has been assigned an ID which is unique to each cellular object at 8,8,40, along with downsampled versions. Not all objects have been proofread, but a summary of the most focused efforts on cells can be found in the proofreading status metadata. In addition, the mesh folder contains meshes of each object at 3 different levels of downsampling. The folder contains many files; to read data use cloud-volume or TensorStore, and for bulk download use igneous, the AWS CLI, or the gsutil CLI.

Watershed Segmentation

https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/minnie65/ws


The supervoxel segmentation

Precomputed Sharded Compressed Segmentation (42 TB)

Description: The individual supervoxels predicted by the affinity network before they were agglomerated by the automated segmentation and then modified through proofreading. The folder contains many files; to read data use cloud-volume or TensorStore, and for bulk download use igneous, the AWS CLI, or the gsutil CLI.

PSD Segmentation

https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/minnie65/clefts


Voxel segmentation of each synapse (post-synaptic density - PSD) detected.

Precomputed Compressed Segmentation Data (127 GB)

Description: This contains a flattened segmentation of the synaptic clefts where each voxel has been assigned an ID which is unique to each synapse at 8,8,40. The folder contains many files; to read data use cloud-volume, and for bulk download use igneous, the AWS CLI, or the gsutil CLI.

Nucleus Segmentation

https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/minnie65/nuclei

Voxel segmentation and meshes of each cell nucleus detected in image volume.

Precomputed Compressed Segmentation Data (26.8 GB)

Description: This contains a flattened nucleus segmentation where each voxel has been assigned an ID which is unique to each nucleus at 8,8,40. The folder contains many files; to read data use cloud-volume, and for bulk download use igneous or the AWS CLI.

Nucleus Detection

https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/minnie65/nucleus_detection/nucleus_detection_v0.csv

Metadata about each nucleus detection, including the cellular segment that it overlaps with.

CSV
(10.5 MB)

Description: A table of nucleus detections from a nucleus detection model developed by Shang Mu, Leila Elabbady, Gayathri Mahalingam and Forrest Collman. Only nucleus detections with volume > 25 um^3 are included; detections below that threshold are false positives, though some false positives above the threshold remain.

Column descriptions:

  • id: corresponds to the ID from the nucleus detection and segmentation
  • valid: internal check, uniformly ‘t’
  • pt_position_{x,y,z}: the location in 4,4,40 nm voxels of the nucleus location
  • pt_supervoxel_id: the ID of the supervoxel from the watershed segmentation that is under the pt_position
  • pt_root_id: the ID of the segment/root_id under the pt_position from the Proofread Segmentation (v117).
  • volume: the volume of the nucleus detection in um^3
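
Because point positions are given in 4,4,40 nm voxels while the segmentation volumes start at 8,8,40 nm, a unit conversion is needed before looking a point up in a volume. The sketch below shows one way to do it; the function names and the example point are hypothetical.

```python
VOXEL_NM = (4, 4, 40)      # resolution of the pt_position columns
SEG_VOXEL_NM = (8, 8, 40)  # finest resolution of the segmentation volumes


def voxels_to_nm(pt, voxel_nm=VOXEL_NM):
    """Convert a voxel coordinate to nanometers."""
    return tuple(p * r for p, r in zip(pt, voxel_nm))


def voxels_to_um(pt, voxel_nm=VOXEL_NM):
    """Convert a voxel coordinate to micrometers."""
    return tuple(n / 1000 for n in voxels_to_nm(pt, voxel_nm))


def to_segmentation_voxels(pt):
    """Convert a 4,4,40 nm voxel coordinate to an 8,8,40 nm voxel index."""
    return tuple(n // r for n, r in zip(voxels_to_nm(pt), SEG_VOXEL_NM))


# Hypothetical point, not taken from the released table.
example_pt = (200_000, 150_000, 20_000)
print(voxels_to_um(example_pt))
print(to_segmentation_voxels(example_pt))
```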

Nucleus Neuron Classification

https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/minnie65/nucleus_neuron_classification/nucleus_neuron_svm.csv

An automated annotation of which nuclei are neurons.

CSV
(12.8 MB)

Description: This table contains a prediction of which nucleus detections are neurons and which are likely not neurons. It is based on a model trained by Leila Elabbady (Allen Institute) on nucleus segmentations in Basil, processed for features such as volume, foldedness, and location in cortex, and then applied to Minnie65. In Basil the model had a cross-validated F1 score of 0.97 and a recall of 0.97 for neurons. Manual validation performed on a column of 1316 nuclei in Minnie65 measured a recall of 0.996 and a precision of 0.969.

Column descriptions:

  • id: corresponds to the ID from the nucleus detection and segmentation
  • valid: internal check, uniformly ‘t’
  • pt_position_{x,y,z}: the location in 4,4,40 nm voxels of the nucleus location
  • pt_supervoxel_id: the ID of the supervoxel from the watershed segmentation that is under the pt_position
  • pt_root_id: the ID of the segment/root_id under the pt_position from the Proofread Segmentation (v117).
  • classification_system: uniformly “is_neuron” for all entries.
  • cell_type: ‘neuron’ if the classifier called this a neuron, ‘not-neuron’ if it did not. The ‘not-neuron’ class contains both non-neuronal cells and false positive detections.
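
The nucleus detection and classification tables share the id column, so they can be joined to keep only detections called neurons. The sketch below uses the standard csv module on small in-memory samples; the rows are made up, and it assumes the CSV headers match the column names listed above.

```python
import csv
import io


def neuron_nuclei(detection_csv, classification_csv):
    """Return detection rows whose id is labeled 'neuron' in the classification table."""
    neuron_ids = {
        row["id"]
        for row in csv.DictReader(classification_csv)
        if row["cell_type"] == "neuron"
    }
    return [row for row in csv.DictReader(detection_csv) if row["id"] in neuron_ids]


# Hypothetical sample rows standing in for the downloaded files.
detections = io.StringIO(
    "id,valid,pt_root_id,volume\n"
    "1,t,1001,310.5\n"
    "2,t,1002,95.2\n"
    "3,t,1003,280.0\n"
)
classification = io.StringIO(
    "id,valid,pt_root_id,classification_system,cell_type\n"
    "1,t,1001,is_neuron,neuron\n"
    "2,t,1002,is_neuron,not-neuron\n"
    "3,t,1003,is_neuron,neuron\n"
)

neurons = neuron_nuclei(detections, classification)
print([row["id"] for row in neurons])
```

With the real files, the two StringIO objects would be replaced by open file handles for the downloaded CSVs.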

Synapse Graph

https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/minnie65/synapse_graph/synapses_pni_2.csv

Metadata about each synapse detection, including which cellular segmentation(s) are pre/post synaptic.

CSV
(47.5 GB)

Description: This CSV contains columns of metadata about the synapse detections.

Column descriptions:

  • id: corresponds to the ID from the PSD segmentation volume
  • valid: internal check, uniformly ‘t’
  • pre_pt_position_{x,y,z}: the location in 4,4,40 nm voxels of the pre-synaptic point
  • post_pt_position_{x,y,z}: the location in 4,4,40 nm voxels of the post-synaptic point
  • ctr_pt_position_{x,y,z}: the location of the center of mass of the PSD segmentation in 4,4,40 nm voxels
  • pre_pt_supervoxel_id: the ID of the supervoxel from the watershed segmentation that is under the pre_pt_position
  • post_pt_supervoxel_id: the ID of the supervoxel from the watershed segmentation that is under the post_pt_position
  • pre_pt_root_id: the ID of the segment/root_id under the pre_pt_position from the Proofread Segmentation (v117).
  • post_pt_root_id: the ID of the segment/root_id under the post_pt_position from the Proofread Segmentation (v117).
  • size: the size in 4,4,40 nm voxels of the PSD segmentation object.
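
Since the synapse table is tens of gigabytes, it may be easier to stream it row by row than to load it whole. The following sketch counts synapses per (pre, post) root ID pair using the columns described above; the function name, the sample rows, and the choice to skip rows where pre and post IDs are equal are all illustrative assumptions.

```python
import csv
import io
from collections import Counter


def connection_counts(synapse_rows, skip_self=True):
    """Count synapses for each (pre_pt_root_id, post_pt_root_id) pair."""
    counts = Counter()
    for row in synapse_rows:
        pre, post = row["pre_pt_root_id"], row["post_pt_root_id"]
        if skip_self and pre == post:
            continue  # assumption: same-segment pairs are not of interest here
        counts[(pre, post)] += 1
    return counts


# Hypothetical rows; the real file would be read with open(path, newline="").
sample = io.StringIO(
    "id,pre_pt_root_id,post_pt_root_id,size\n"
    "1,100,200,50\n"
    "2,100,200,70\n"
    "3,200,100,30\n"
    "4,300,300,10\n"
)

counts = connection_counts(csv.DictReader(sample))
for (pre, post), n in sorted(counts.items()):
    print(pre, "->", post, n)
```

Because csv.DictReader yields one row at a time, memory use depends on the number of distinct connected pairs rather than on the file size.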

Proofreading Status

https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/proofreading_status/proofreading_status_public_release.csv

Metadata about which cells have undergone what level of proofreading

CSV
(56 KB)

Description: The proofreading status of neurons that have been comprehensively proofread as of v117. Axon and dendrite compartment status are marked separately under 'status_axon' and 'status_dendrite', as proofreading effort was applied differently to the different compartments in some cells. There are three possible status values for each compartment: 'non' indicates no comprehensive proofreading. 'clean' indicates that all false merges have been removed, but not all tips have necessarily been followed. 'extended' indicates that the cell is clean and all tips have been followed as far as a proofreader was able to. Very small false axon merges (axon fragments approximately 5 microns or less in length) were considered acceptable for clean neurites. Note that this table does not list all edited cells, only those with comprehensive effort toward the status mentioned here. It is meant to serve as a resource for analysis by listing objects that have undergone different levels of quality control by humans.

Column descriptions:

  • id: a unique identifier for this row
  • valid: internal check, uniformly ‘t’
  • pt_position_{x,y,z}: the location in 4,4,40 nm voxels at a cell body or similar core position for the cell
  • pt_supervoxel_id: the ID of the supervoxel from the watershed segmentation that is under the pt_position
  • pt_root_id: the ID of the segment/root_id under the pt_position from the Proofread Segmentation (v117).
  • valid_id: the root id when the proofreading was last checked. If the current root id in 'pt_root_id' is not the same as 'valid_id', there is no guarantee that the proofreading status is correct. The two should match for all rows in this release.
  • status_dendrite: the status of proofreading for this cell’s dendrites. (clean, extended, non)
  • status_axon: the status of proofreading for this cell’s axon. (clean, extended, non)
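
One way to use this table is to select cells whose compartments meet a minimum proofreading level before analysis. The sketch below keeps rows whose pt_root_id still matches valid_id and whose axon and dendrite statuses fall in an accepted set; the function name, default thresholds, and sample rows are hypothetical.

```python
import csv
import io


def proofread_root_ids(rows, axon_ok=("clean", "extended"),
                       dendrite_ok=("clean", "extended")):
    """Return pt_root_id values whose status columns meet the requested levels."""
    return [
        row["pt_root_id"]
        for row in rows
        if row["pt_root_id"] == row["valid_id"]  # status only trusted if IDs match
        and row["status_axon"] in axon_ok
        and row["status_dendrite"] in dendrite_ok
    ]


# Hypothetical rows using the column names described above.
sample = io.StringIO(
    "id,pt_root_id,valid_id,status_dendrite,status_axon\n"
    "1,10,10,extended,clean\n"
    "2,11,11,clean,non\n"
    "3,12,99,extended,extended\n"
)

selected = proofread_root_ids(csv.DictReader(sample))
print(selected)
```

For analyses of outputs only, passing axon_ok=("extended",) would restrict the list to cells with fully followed axons.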

Functional Co-registration

https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/functional_coregistration/func_unit_em_match_release.csv

Metadata about which cellular segmentations correspond to which functional ROIs

CSV
(14 KB)

Description: A table of EM nuclear centroids manually matched to corresponding units from the functional scans. Functional imaging performed by Paul Fahey and Jake Reimer of BCM. A functional unit is uniquely identified by its session, scan_idx and unit_id. An EM centroid may be present in more than one imaging field and therefore be associated with more than one functional unit. Coregistration of two-photon imaging data and EM data performed by AIBS. Coregistration: Nuno da Costa and Mark Takeno of AIBS generated correspondence points between the datasets and Dan Kapner of AIBS fit the transform. (Github: https://github.com/AllenInstitute/em_coregistration/tree/phase3). Matching: Functional unit to EM cell matching protocol developed by Stelios Papadopoulos of BCM and performed by trained personnel. Briefly, a summary image of the functional imaging field was compared to its corresponding plane from the coregistered EM volume. Both nearby neuronal somas and vessel features were used as fiducials to confirm the accuracy of coregistration locally and to determine the functional unit to EM cell match.

Column descriptions:

  • id: a unique identifier for this row
  • valid: internal check, uniformly ‘t’
  • pt_position_{x,y,z}: the location in 4,4,40 nm voxels at a cell body for the cell
  • pt_supervoxel_id: the ID of the supervoxel from the watershed segmentation that is under the pt_position
  • pt_root_id: the ID of the segment/root_id under the pt_position from the Proofread Segmentation (v117).
  • session: the ID indicating the imaging period for the mouse
  • scan_idx: the index of the scan within the imaging session
  • unit_id: the ID of the functional ROI (unique per scan)
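
Because one EM cell can be matched to several functional units, it can help to group the rows by pt_root_id, with each unit identified by its (session, scan_idx, unit_id) triple. A minimal sketch, using made-up rows and a hypothetical function name:

```python
import csv
import io
from collections import defaultdict


def units_by_root_id(rows):
    """Map each pt_root_id to its list of (session, scan_idx, unit_id) triples."""
    mapping = defaultdict(list)
    for row in rows:
        unit = (int(row["session"]), int(row["scan_idx"]), int(row["unit_id"]))
        mapping[row["pt_root_id"]].append(unit)
    return dict(mapping)


# Hypothetical rows using the column names described above.
sample = io.StringIO(
    "id,pt_root_id,session,scan_idx,unit_id\n"
    "1,10,4,7,100\n"
    "2,11,4,7,101\n"
    "3,10,5,3,220\n"
)

matches = units_by_root_id(csv.DictReader(sample))
for root_id, units in matches.items():
    print(root_id, units)
```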

Minnie35 - Structural, Connectivity, and Cell Typing Data 

Name

Cloudpath

Short Description

Type (size)

Fine aligned image

https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/minnie35/em

Precomputed Image (55 TB)

Proofread Segmentation v0

https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/minnie35/seg

https://storage.googleapis.com/microns_phase3/minnie/minnie35/seg

Precomputed Sharded Compressed Segmentation (10 TB)

Watershed segmentation

https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/minnie35/ws

The supervoxel segmentation

Precomputed Sharded Compressed Segmentation (22 TB)

PSD segmentation

https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/minnie35/clefts

Precomputed Compressed Segmentation (94 GB)

Synapse Graph

https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/minnie35/synapse_graph/assigned.csv.gz

CSV (14 GB)

Minnie - Functional and Experimental Data

Name

Cloudpath

Short Description

Type (size)

Stimulus Presentation

s3://bossdb-open-data/iarpa_microns/minnie/functional_data/stimulus_movies/

Visual stimulus presented during functional imaging scans

AVI (multiple, 9.8 GB each, 186.2 GB total)

Description: The visual stimulus shown to the animal in each scan for 19 scans was recreated by aligning, concatenating, and temporally filtering individual stimulus clips into a single movie, which was sampled by interpolation at scan depth frame times and saved as an avi file.  Please see technical documentation for details.

Functional Imaging Scans

s3://bossdb-open-data/iarpa_microns/minnie/functional_data/two_photon_functional_scans/

Two-photon functional imaging scans

TIF (multiple, 66-95 GB each, 1.3 TB total)

Description: The two-photon imaging collected during 19 scans was raster- and motion-corrected, then saved as TIF files.  Please see technical documentation for details.

Structural Imaging Stack

s3://bossdb-open-data/iarpa_microns/minnie/functional_data/two_photon_structural_stacks

Two-photon structural volume enclosing imaged area

TIF (multiple, 1.2-9.4 GB each, 10.6 GB total)

Description: Two-photon volume imaging including vasculature label of the tissue enclosing the two-photon imaged area, saved at original and upsampled resolutions as TIF files.  Please see technical documentation for details.

DataJoint Database

s3://bossdb-open-data/iarpa_microns/minnie/functional_data/two_photon_processed_data_and_metadata/

Functional Data, Metadata, Experimental Data

SQL, Containers (225 GB total)

Description: Scan metadata and processed data including scan and stack metadata, synchronized stimulus movies, synchronized behavioral traces, cell segmentation masks, calcium traces, and inferred spikes, pre-ingested into a containerized MySQL v5.7 database, schematized using DataJoint. Please see technical documentation for more details. Instructions for setting up the containers are available at https://github.com/cajal/microns-nda-access.
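
The functional data paths in this table are given as s3:// URIs, while the structural data tables use HTTPS links to the same bossdb-open-data bucket. The sketch below converts between the two forms using the standard virtual-hosted S3 URL pattern; the function name is hypothetical, and note that an HTTPS URL for a folder prefix will not by itself list the files under it.

```python
from urllib.parse import urlparse


def s3_to_https(s3_uri):
    """Convert s3://bucket/key to https://bucket.s3.amazonaws.com/key."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URI: {s3_uri}")
    return f"https://{parsed.netloc}.s3.amazonaws.com{parsed.path}"


stimulus_uri = "s3://bossdb-open-data/iarpa_microns/minnie/functional_data/stimulus_movies/"
print(s3_to_https(stimulus_uri))
```

For bulk transfers of the large AVI and TIF files, a dedicated client such as the AWS CLI is likely more practical than per-file HTTPS requests.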