A fully open and generalizable foundation model for ultrasound clinical applications

Hongyuan Zhang1, Yuheng Wu1,2, Mingyang Zhao3,4,*, Zhiwei Chen1,5, Rebecca Li6, Fei Zhu1,*, Haohan Zhao1,2, Xiaohua Yuan7, Meng Yang8, Chunli Qiu9, Xiang Cong9, Haiyan Chen10, Lina Luan11, Randolph H.L. Wong12, Huai Liao13, Colin A Graham6, Shi Chang7, Guowei Tao9, Dong Yi1, Zhen Lei1,4,14, Nassir Navab15, Sebastien Ourselin16, Jiebo Luo1,17, Hongbin Liu1,14,16, Gaofeng Meng1,4,14,*
1Center for Artificial Intelligence and Robotics, Hong Kong Institute of Science & Innovation, Chinese Academy of Sciences, Hong Kong, China; 2City University of Hong Kong, Hong Kong, China; 3State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; 4University of Chinese Academy of Sciences, Beijing, China; 5The Chinese University of Hong Kong, Hong Kong, China; 6Accident and Emergency Medicine Academic Unit, The Chinese University of Hong Kong, Hong Kong, China; 7Xiangya Hospital Central South University, Changsha, China; 8Hunan Frontline Medical Technology Co., Ltd, Changsha, China; 9Qilu Hospital of Shandong University, Jinan, China; 10Zhongshan Hospital of Fudan University, Shanghai, China; 11Shanghai Geriatric Medical Center, Shanghai, China; 12Division of Cardiothoracic Surgery, Department of Surgery, The Chinese University of Hong Kong, Hong Kong, China; 13Department of Pulmonary and Critical Care Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China; 14State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; 15Computer Aided Medical Procedures, Technical University of Munich, Munich, Germany; 16School of Biomedical Engineering & Imaging Sciences, King's College London, UK; 17Department of Computer Science, University of Rochester, USA
* Corresponding authors

Abstract

The inherent safety and versatility of ultrasound imaging have made it widely accessible in modern clinical settings for disease diagnosis and health management. Artificial intelligence (AI) that can effectively learn ultrasound representations by integrating multi-source data holds significant promise for advancing clinical care. However, the scarcity of large labeled datasets in real-world clinical environments and the limited generalizability of task-specific models have hindered the development of generalizable clinical AI models for ultrasound applications. In this study, we present EchoCare, a novel ultrasound foundation model for generalist clinical use, developed via self-supervised learning on our curated, publicly available, large-scale unlabeled dataset EchoCareData. EchoCareData comprises 4.5 million ultrasound images, sourced from over 20 countries across 5 continents and acquired with a diverse range of imaging devices, thus encompassing global cohorts that are multi-center, multi-device, and multi-ethnic. Unlike prior studies that adopt off-the-shelf vision foundation model architectures, we introduce a hierarchical classifier into EchoCare to enable joint learning of pixel-level and representation-level features, capturing both global anatomical contexts and local ultrasound characteristics. With minimal training, EchoCare outperforms state-of-the-art comparison models across 10 representative downstream ultrasound benchmarks of varying diagnostic difficulty, spanning disease diagnosis, lesion segmentation, organ detection, landmark prediction, quantitative regression, image enhancement and report generation. The code and pretrained model are publicly released, rendering EchoCare accessible for fine-tuning and local adaptation, supporting extensibility to additional applications. EchoCare provides a fully open and generalizable foundation model to boost the development of AI technologies for diverse clinical ultrasound applications.
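The joint pixel-level and representation-level learning mentioned above can be illustrated with a minimal sketch. The shapes, the masking ratio, and the equal 50/50 loss weighting below are illustrative assumptions, not the published EchoCare configuration; the arrays are random stand-ins for encoder and decoder outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (not from the paper).
n_patches, feat_dim, patch_pixels, n_classes = 16, 32, 64, 10

features = rng.normal(size=(n_patches, feat_dim))     # patch embeddings
recon = rng.normal(size=(n_patches, patch_pixels))    # decoded pixels
target = rng.normal(size=(n_patches, patch_pixels))   # original pixels
mask = np.arange(n_patches) % 4 != 0                  # mask 75% of patches

# Pixel-level term: reconstruction error on the masked patches only.
pixel_loss = np.mean((recon[mask] - target[mask]) ** 2)

# Representation-level term: cross-entropy of a linear classifier head
# applied to the mean-pooled image embedding.
W = rng.normal(size=(feat_dim, n_classes)) * 0.01
logits = features.mean(axis=0) @ W
logits -= logits.max()                                # numerical stability
probs = np.exp(logits) / np.exp(logits).sum()
label = 3                                             # dummy class label
rep_loss = -np.log(probs[label])

total_loss = 0.5 * pixel_loss + 0.5 * rep_loss
print(round(float(total_loss), 4))
```

The two terms pull the encoder toward complementary goals: the reconstruction term preserves local ultrasound texture, while the classification term shapes a globally discriminative representation.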

EchoCareData

EchoCareData integrates multi-center, multi-region, and multi-device sources, covering 23 hospitals in more than 20 countries across 5 continents, ensuring diversity in clinical practices, patient demographics, and imaging equipment.

At a glance: 23 hospitals worldwide · diverse ultrasound devices · 5 continents · 20+ countries/regions

Results

SEGMENTATION

We evaluated different foundation models on three representative ultrasound clinical benchmarks for anatomical segmentation: the DDTI dataset for thyroid nodule segmentation, the Mus-V dataset for arterial-venous vessel segmentation, and an abdominal multi-organ segmentation dataset.
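Segmentation quality on benchmarks such as these is conventionally scored with the Dice similarity coefficient. The page does not spell out the exact evaluation protocol, so the snippet below is a generic sketch of that metric; the `dice_score` helper and the toy masks are illustrative, not part of the released code.

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float((2.0 * inter + eps) / (pred.sum() + gt.sum() + eps))

# Toy 4x4 masks: 2 overlapping pixels, 4 predicted, 4 ground truth.
pred = np.zeros((4, 4), dtype=bool); pred[0, :] = True
gt = np.zeros((4, 4), dtype=bool); gt[0, 2:] = True; gt[1, :2] = True
print(dice_score(pred, gt))  # ≈ 0.5
```

Per-organ Dice scores are typically averaged over cases, and for the multi-organ benchmark over structures as well.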


ENHANCEMENT

We evaluated EchoCare on the low-quality ultrasound image enhancement task using the USenhance benchmark dataset, which encompasses real-world clinical scans from 109 patients across five anatomical regions: thyroid, kidney, liver, breast, and carotid artery.
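Enhancement results on USenhance are customarily reported with full-reference metrics such as peak signal-to-noise ratio (PSNR) against the high-quality scan. The sketch below shows that metric on synthetic data; the `psnr` helper, image sizes, and noise level are illustrative assumptions, not the benchmark's official evaluation script.

```python
import numpy as np

def psnr(enhanced: np.ndarray, reference: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between an image and its reference."""
    mse = np.mean((enhanced.astype(np.float64) - reference.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

# Toy example: a reference image and a noisy "low-quality" copy of it.
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
noisy = np.clip(reference + rng.normal(0.0, 10.0, size=reference.shape), 0, 255)
print(round(psnr(noisy, reference), 2))
```

Higher PSNR means the enhanced output is closer to the high-quality reference; structural metrics such as SSIM are usually reported alongside it.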

REPORT GENERATION

To evaluate the effectiveness of our developed foundation model in ultrasound report generation, we integrate EchoCare into an existing Transformer-based encoder–decoder report generator, whose input is the global visual features extracted from the ultrasound images. The integrated model is then fine-tuned on the USData Liver dataset, which contains paired ultrasound images and corresponding expert-written reports.
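One common way to feed global visual features into such a decoder is to project them into the decoder's embedding space and prepend them to the report token embeddings as visual "prefix" tokens. The sketch below assumes this scheme; the dimensions, the linear projection `W_proj`, and the random stand-in features are illustrative, not the paper's actual integration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes (not from the paper): one feature vector per image
# from the encoder; the decoder expects d_model-dimensional tokens.
n_images, feat_dim, d_model = 2, 768, 512

# Global visual features from the image encoder (random stand-ins).
global_feats = rng.normal(size=(n_images, feat_dim))

# A learned linear projection maps encoder features into the decoder's
# embedding space; projected vectors become visual prefix tokens.
W_proj = rng.normal(size=(feat_dim, d_model)) * 0.02
visual_tokens = global_feats @ W_proj                  # (n_images, d_model)

report_tokens = rng.normal(size=(12, d_model))         # embedded report tokens
decoder_input = np.concatenate([visual_tokens, report_tokens], axis=0)
print(decoder_input.shape)  # → (14, 512)
```

The decoder then attends over both the visual prefix and the report tokens when generating the next word of the report.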
