BiomedParse transforms biomedical image analysis with groundbreaking precision and scalability

By News-Medical

Discover how BiomedParse redefines biomedical image analysis, tackling complex shapes and scaling new heights in precision and efficiency across nine imaging modalities!

Study: A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities. Image Credit: Microsoft Research

Background

Biomedical image analysis is essential for understanding physiology and anatomy at multiple scales, but traditional approaches handle image segmentation (partitioning an image to separate objects of interest from the background) and object detection and recognition (identifying objects and their locations in an image) as separate tasks. This disjointed methodology can miss opportunities for joint learning across tasks, limiting both efficiency and accuracy.

Segmentation often requires user-drawn bounding boxes to locate objects, which presents three key challenges. First, it demands domain expertise to identify objects accurately. Second, rectangular bounding boxes poorly represent objects with irregular or complex shapes. Third, these methods do not scale to images with numerous objects, such as cells in whole-slide pathology images, where manually outlining each object is impractical. Moreover, by focusing solely on segmentation, traditional methods neglect semantic information from related tasks, such as object types or metadata, further reducing segmentation quality. To overcome these challenges, researchers in the present study developed BiomedParse, a unified biomedical model that integrates image segmentation, object detection, and recognition without relying on bounding boxes.

About the Study

To create a model capable of joint segmentation, detection, and recognition, the researchers developed a large-scale resource called BiomedParseData, which combines 45 biomedical segmentation datasets. Semantic information from these datasets, which is often noisy and inconsistent, was harmonized into a unified biomedical object ontology using GPT-4 and manual review processes. This ontology consisted of three categories (histology, organ, and abnormality), 15 meta-object types, and 82 specific object types. To support training, GPT-4 was used to generate synonymous descriptions for semantic labels, expanding the dataset to 6.8 million image–mask–description triples.
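To illustrate, the sketch below shows one plausible way an image–mask–description triple and its GPT-4-generated synonym expansion could be represented in code. The field names and example labels are illustrative assumptions for this summary, not the authors' actual data schema.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class BiomedTriple:
    """Hypothetical record for one image–mask–description triple (illustrative only)."""
    image: np.ndarray            # H x W (x C) pixel array
    mask: np.ndarray             # binary mask of the target object, same H x W
    description: str             # harmonized text label, e.g. "neoplastic cells in breast pathology"
    category: str = "abnormality"      # one of the three ontology categories (assumed example)
    meta_object_type: str = "tumor"    # one of the 15 meta-object types (assumed example)
    synonyms: List[str] = field(default_factory=list)  # GPT-4-generated paraphrases of the description

def expand_with_synonyms(triple: BiomedTriple) -> List[BiomedTriple]:
    """Create one training example per synonymous description (sketch of the augmentation idea)."""
    variants = [triple.description] + triple.synonyms
    return [BiomedTriple(triple.image, triple.mask, text,
                         triple.category, triple.meta_object_type)
            for text in variants]
```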

BiomedParse uses a modular design based on the SEEM (Segment Everything Everywhere All at Once) architecture. It includes an image encoder, a text encoder, a mask decoder, and a meta-object classifier for joint training with semantic information. The system operates without bounding boxes, in contrast to state-of-the-art methods such as MedSAM; instead, BiomedParse uses text prompts for segmentation and recognition, allowing broader scalability. Evaluation metrics included Dice scores for segmentation accuracy and silhouette scores for embedding quality, and statistical tests, including the Kolmogorov–Smirnov test, were used to assess BiomedParse’s ability to detect invalid text prompts. The system’s performance was validated across nine imaging modalities: pathology, computed tomography (CT), magnetic resonance imaging (MRI), ultrasound, X-ray, fluorescence microscopy, electron microscopy, phase-contrast microscopy, and brightfield microscopy. The results were compared with those of other segmentation models, including the Segment Anything Model (SAM) and Medical SAM (MedSAM).
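For context, the Dice score is a standard overlap metric between a predicted and a reference mask, and a two-sample Kolmogorov–Smirnov test compares two distributions. The snippet below is a minimal, generic sketch of both: a Dice implementation for binary masks, plus one hypothetical way a KS test could flag an invalid prompt by comparing the predicted pixel-probability distribution against a reference distribution from valid prompts. The thresholding procedure is an assumption, not the study's exact method.

```python
import numpy as np
from scipy.stats import ks_2samp

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice coefficient for two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def prompt_looks_valid(pred_probs: np.ndarray,
                       reference_probs: np.ndarray,
                       alpha: float = 0.05) -> bool:
    """Sketch: compare the pixel-probability distribution predicted for a prompt
    against a reference distribution built from known-valid prompts. A small
    Kolmogorov–Smirnov p-value suggests the prompt does not match the image.
    (Illustrative procedure and threshold, not the authors' exact method.)"""
    result = ks_2samp(pred_probs.ravel(), reference_probs.ravel())
    return result.pvalue >= alpha
```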

Results and Discussion

BiomedParse was found to achieve state-of-the-art results across image segmentation, object detection, and recognition tasks. On a test set of 102,855 instances spanning nine modalities, BiomedParse achieved the best Dice scores, outperforming MedSAM even when MedSAM was provided oracle bounding boxes. When tested on more realistic scenarios with bounding boxes generated by Grounding DINO, BiomedParse’s superiority became even more evident, particularly for challenging modalities like pathology and CT.

BiomedParse showed significant advantages in segmenting irregularly shaped objects, with which traditional bounding box-based methods struggled. Using text prompts such as “glandular structure in colon pathology,” BiomedParse achieved a median Dice score of 0.942, compared with scores below 0.75 for SAM and MedSAM without bounding boxes. The improvement correlated strongly with object irregularity, highlighting BiomedParse's capability to handle complex shapes. For example, BiomedParse achieved a 39.6% higher Dice score than the best competing method on irregular objects.
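Shape irregularity can be quantified in several ways; the sketch below uses one common proxy, one minus the object's solidity (its area divided by the area of its convex hull), computed with scikit-image. This is an illustrative measure only, not necessarily the definition used in the study.

```python
import numpy as np
from skimage.measure import label, regionprops

def shape_irregularity(mask: np.ndarray) -> float:
    """Proxy for irregularity: 1 - solidity, where solidity = object area / convex-hull area.
    Values near 0 indicate compact, convex shapes; values near 1 indicate highly
    irregular ones. (Illustrative metric only.)"""
    regions = regionprops(label(mask.astype(np.uint8)))
    if not regions:
        return 0.0
    # Use the largest connected component as the object of interest.
    largest = max(regions, key=lambda r: r.area)
    return 1.0 - largest.solidity
```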

For object recognition, BiomedParse identified and labeled all objects in an image without user-provided prompts. Compared to Grounding DINO, BiomedParse achieved higher precision, recall, and F1 scores. Its performance improved further as the number of objects in an image increased. Real-world validation showed BiomedParse successfully annotated immune and cancer cells in pathology slides, closely matching pathologists' annotations. While human pathologists may provide coarse-grained annotations, BiomedParse offers precise and comprehensive labeling, suggesting its potential to reduce clinician workloads in clinical applications.
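The precision, recall, and F1 scores used in such detection comparisons follow directly from counts of matched, spurious, and missed objects. The snippet below is a generic illustration; how predicted and reference objects are matched (for example, by label and spatial overlap) is omitted and would be an assumption.

```python
def detection_scores(true_positives: int, false_positives: int, false_negatives: int):
    """Generic precision / recall / F1 from object-level match counts."""
    precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) else 0.0
    recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Example: 90 correctly recognized objects, 5 spurious detections, 10 missed objects
print(detection_scores(90, 5, 10))  # -> (0.947..., 0.9, 0.923...)
```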

BiomedParse's limitations include its need for post-processing to differentiate individual object instances, lack of conversational capabilities, and reduction of three-dimensional (3D) modalities to two-dimensional image slices, potentially missing spatiotemporal information.

Conclusion

In conclusion, BiomedParse outperformed previous biomedical image analysis methods across major imaging modalities and proved more scalable and accurate, especially in recognizing and segmenting complex, irregularly shaped objects. The tool opens new avenues for high-throughput, automated, image-based biomedical discovery, reducing manual intervention and potentially accelerating research. Future efforts could focus on extending BiomedParse to three-dimensional data and enabling interactive, conversational capabilities for more tailored applications.

Journal reference:

  • Zhao, T., Gu, Y., Yang, J., Usuyama, N., Lee, H. H., Kiblawi, S., Naumann, T., Gao, J., Crabtree, A., Abel, J., Piening, B., Bifulco, C., Wei, M., Poon, H., & Wang, S. (2024). A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities. Nature Methods, 1-11. DOI: 10.1038/s41592-024-02499-w, https://www.nature.com/articles/s41592-024-02499-w