Cookies on this website

We use cookies to ensure that we give you the best experience on our website. If you click 'Accept all cookies' we'll assume that you are happy to receive all cookies and you won't see this message again. If you click 'Reject all non-essential cookies' only necessary cookies providing core functionality such as security, network management, and accessibility will be enabled. Click 'Find out more' for information on how to change your cookie settings.

While performing an ultrasound (US) scan, sonographers direct their gaze at regions of interest to verify that the correct plane is acquired and to interpret the acquisition frame. Predicting sonographer gaze on US videos is useful for identification of spatio-temporal patterns that are important for US scanning. This paper investigates utilizing sonographer gaze, in the form of gaze-tracking data, in a multimodal imaging deep learning framework to assist the analysis of the first trimester fetal ultrasound scan. Specifically, we propose an encoderdecoder convolutional neural network with skip connections to predict the visual gaze for each frame using 115 first trimester ultrasound videos; 29,250 video frames for training, 7,290 for validation and 9,126 for testing. We find that the dataset of our size benefits from automated data augmentation, which in turn, alleviates model overfitting and reduces structural variation imbalance of US anatomical views between the training and test datasets. Specifically, we employ a stochastic augmentation policy search method to improve segmentation performance. Using the learnt policies, our models outperform the baseline: KLD, SIM, NSS and CC (2.16, 0.27, 4.34 and 0.39 versus 3.17, 0.21, 2.92 and 0.28).

Original publication




Conference paper



Publication Date





361 - 374


Data augmentation, Fetal ultrasound, First trimester, Gaze tracking, Single frame saliency prediction, U-Net