Memory-based unsupervised video clinical quality assessment with multi-modality data in fetal ultrasound.
Zhao H., Zheng Q., Teng C., Yasrab R., Drukker L., Papageorghiou AT., Noble JA.
In obstetric sonography, the quality of acquisition of ultrasound scan video is crucial for accurate (manual or automated) biometric measurement and fetal health assessment. However, the nature of fetal ultrasound involves free-hand probe manipulation and this can make it challenging to capture high-quality videos for fetal biometry, especially for the less-experienced sonographer. Manually checking the quality of acquired videos would be time-consuming, subjective and requires a comprehensive understanding of fetal anatomy. Thus, it would be advantageous to develop an automatic quality assessment method to support video standardization and improve diagnostic accuracy of video-based analysis. In this paper, we propose a general and purely data-driven video-based quality assessment framework which directly learns a distinguishable feature representation from high-quality ultrasound videos alone, without anatomical annotations. Our solution effectively utilizes both spatial and temporal information of ultrasound videos. The spatio-temporal representation is learned by a bi-directional reconstruction between the video space and the feature space, enhanced by a key-query memory module proposed in the feature space. To further improve performance, two additional modalities are introduced in training which are the sonographer gaze and optical flow derived from the video. Two different clinical quality assessment tasks in fetal ultrasound are considered in our experiments, i.e., measurement of the fetal head circumference and cerebellar diameter; in both of these, low-quality videos are detected by the large reconstruction error in the feature space. Extensive experimental evaluation demonstrates the merits of our approach.