Learning the heterogeneous representation of brain's structure from serial SEM images using a masked autoencoder
Published: 2023-06-08
Volume: 17
ISSN: 1662-5196
Container-title: Frontiers in Neuroinformatics
Short-container-title: Front. Neuroinform.
Authors: Cheng Ao, Shi Jiahao, Wang Lirong, Zhang Ruobing
Abstract
Introduction: The exorbitant cost of accurately annotating large-scale serial scanning electron microscope (SEM) images as ground truth for training has always been a great challenge for brain map reconstruction by deep learning methods in neural connectome studies. The representation ability of a model is strongly correlated with the number of such high-quality labels. Recently, the masked autoencoder (MAE) has been shown to effectively pre-train Vision Transformers (ViT) and improve their representational capabilities.

Methods: In this paper, we investigated a self-pre-training paradigm for serial SEM images with MAE to implement downstream segmentation tasks. We randomly masked voxels in three-dimensional brain image patches and trained an autoencoder to reconstruct the neuronal structures.

Results and discussion: We tested different pre-training and fine-tuning configurations on three serial SEM datasets of mouse brains: two public ones, SNEMI3D and MitoEM-R, and one acquired in our lab. A series of masking ratios was examined, and the optimal ratio for pre-training efficiency was identified for 3D segmentation. The MAE pre-training strategy significantly outperformed supervised learning from scratch. Our work shows that the general framework of MAE can be a unified approach for effectively learning the representation of heterogeneous neural structural features in serial SEM images, greatly facilitating brain connectome reconstruction.
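The masking step described in the Methods (randomly hiding voxel patches of a 3D volume before reconstruction) can be sketched as follows. This is a minimal illustrative NumPy sketch, not the authors' implementation; the function name `random_mask_3d`, the patch size, and the use of zero-filling for masked patches are assumptions for illustration.

```python
import numpy as np

def random_mask_3d(volume, patch_size=(4, 4, 4), mask_ratio=0.75, seed=0):
    """Split a 3D volume into non-overlapping patches and zero out a random
    subset of them, mimicking MAE-style masking (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    d, h, w = volume.shape
    pd, ph, pw = patch_size
    assert d % pd == 0 and h % ph == 0 and w % pw == 0, "volume must tile evenly"
    nd, nh, nw = d // pd, h // ph, w // pw
    n_patches = nd * nh * nw
    n_masked = int(round(mask_ratio * n_patches))
    # Choose which patches to hide.
    masked_idx = rng.choice(n_patches, size=n_masked, replace=False)
    mask = np.zeros(n_patches, dtype=bool)
    mask[masked_idx] = True
    out = volume.copy()
    for idx in np.flatnonzero(mask):
        i, j, k = np.unravel_index(idx, (nd, nh, nw))
        out[i*pd:(i+1)*pd, j*ph:(j+1)*ph, k*pw:(k+1)*pw] = 0.0
    return out, mask.reshape(nd, nh, nw)

# Example: an 8x8x8 volume tiled into eight 4x4x4 patches; at a 0.75
# masking ratio, six of the eight patches are zeroed out.
vol = np.ones((8, 8, 8), dtype=np.float32)
masked_vol, mask = random_mask_3d(vol, mask_ratio=0.75)
```

In a real MAE pipeline the encoder sees only the visible patches and a lightweight decoder reconstructs the masked ones; the reconstruction loss (typically mean squared error on masked voxels) drives the pre-training.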
Publisher
Frontiers Media SA
Subject
Computer Science Applications,Biomedical Engineering,Neuroscience (miscellaneous)