Medical Video Analytics and Deep Learning


A wise person once said, “A picture speaks a thousand words,” referring to the amount of information contained in a single picture. Along the same lines, it wouldn’t be grossly wrong to say that “A video illustrates a thousand pictures”. A video can carry many times more information than a single image.

Video analytics has been a hot research topic in the recent past, mostly because we are surrounded by intelligent video-capturing devices recording how we live and what we do. With recent advances in computer vision, we can now mine such massive visual data for valuable insight into what is happening in the world. However, analysing video content, extracting valuable information from it and enabling a machine to “learn” the context of a video have emerged as research directions in their own right.

In medical diagnosis, video analytics has mostly taken a back seat to image analysis. Still, we believe video analytics will be one of the biggest value additions across many areas of medicine and diagnostics.

We at SigTuple have researched video analytics and applied our techniques in the real world to aid diagnosis and solve complex problems. In this blog, we would like to share our experience and outline some of the challenges of video analytics in Andrology, specifically in human semen analysis.

To begin with, semen analysis is one of the core investigations for assessing male fertility in clinical Andrology. Our capability, Aadi, uses video analytics to estimate pivotal parameters of ejaculated human spermatozoa: the count of motile sperm, their motility (velocity, speed, wobble, etc.), their concentration and their morphological characteristics.
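To make the motility parameters a little more concrete, here is a minimal sketch of how the kinematic measures commonly reported in computer-assisted semen analysis could be computed from one tracked sperm. It is an illustration only, not a description of how Aadi works. The function names, the moving-average smoothing and its window size are assumptions made for this example, and the track is assumed to be a list of (x, y) head positions in micrometres, one per frame.

```python
import math


def path_length(points):
    """Sum of straight-line distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def smooth(points, window=5):
    """Centred moving average; the window shrinks symmetrically near the
    ends so that the first and last points stay where they are."""
    n = len(points)
    smoothed = []
    for i in range(n):
        half = min(window // 2, i, n - 1 - i)
        chunk = points[i - half: i + half + 1]
        smoothed.append((sum(p[0] for p in chunk) / len(chunk),
                         sum(p[1] for p in chunk) / len(chunk)))
    return smoothed


def motility_parameters(track, fps, window=5):
    """Kinematic measures for one track (assumed units: micrometres, frames/s).

    VCL: curvilinear velocity, along the actual point-to-point path
    VSL: straight-line velocity, first position to last position
    VAP: average-path velocity, along the smoothed path
    LIN = VSL / VCL (linearity), WOB = VAP / VCL (wobble),
    STR = VSL / VAP (straightness)
    """
    if len(track) < 2:
        raise ValueError("a track needs at least two positions")
    duration = (len(track) - 1) / fps
    vcl = path_length(track) / duration
    vsl = math.dist(track[0], track[-1]) / duration
    vap = path_length(smooth(track, window)) / duration
    return {
        "VCL": vcl,
        "VSL": vsl,
        "VAP": vap,
        "LIN": vsl / vcl if vcl else 0.0,
        "WOB": vap / vcl if vcl else 0.0,
        "STR": vsl / vap if vap else 0.0,
    }
```

A sperm swimming in a perfectly straight line gives LIN and WOB of 1, while a zigzag track gives lower values, because the actual path is longer than the smoothed and straight-line paths.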

While there has been a lot of research on sperm tracking in video, the key challenges in semen analysis are discussed below.

Object Detection: The aim is to precisely locate objects of interest in video frames. Detecting objects alone is much simpler; the challenge is to detect only the objects of interest. In semen analysis, one tough nut to crack was differentiating sperm from pus cells and spermatids (immature sperm). To use a metaphor, it is like building a face detector from videos shot from a moving bus: the footage is likely to be plagued by variable appearance, texture, complex motion, blurriness and frequent occlusions.

Object Tracking: First, for the uninitiated, the motion of a sperm is so complex that it is difficult to guess a “plausible trajectory”. Next, the objects (sperm) are similar, if not identical, so a single generic feature method can hardly distinguish one object of interest from another. Finally, sperm move across different depths of the suspension, causing frequent occlusions that are extremely challenging to handle. These occlusions generally fragment the trajectories, making it harder to arrive at a one-to-one mapping for each sperm over the length of the video.
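To show where the one-to-one mapping comes from, and why occlusions break it, here is a minimal sketch of a very simple baseline: greedy nearest-neighbour linking between two consecutive frames. It is not the approach we use. The function name, the `max_jump` threshold and the data layout are all assumptions made for this example.

```python
import math


def link_detections(prev_tracks, detections, max_jump):
    """Greedy nearest-neighbour linking between two consecutive frames.

    prev_tracks: dict mapping track id -> (x, y) last known position
    detections:  list of (x, y) positions detected in the current frame
    max_jump:    largest displacement accepted between frames (hypothetical
                 threshold, same units as the positions)

    Returns (matches, lost, new):
      matches: dict track id -> index into detections
      lost:    track ids with no detection this frame (e.g. occluded)
      new:     detection indices that belong to no existing track
    """
    # Every (distance, track id, detection index) candidate, closest first.
    candidates = sorted(
        (math.dist(position, detection), track_id, j)
        for track_id, position in prev_tracks.items()
        for j, detection in enumerate(detections)
    )
    matches, used = {}, set()
    for distance, track_id, j in candidates:
        if distance > max_jump:
            break  # all remaining candidates are even farther away
        if track_id in matches or j in used:
            continue  # keep the mapping one-to-one
        matches[track_id] = j
        used.add(j)
    lost = [track_id for track_id in prev_tracks if track_id not in matches]
    new = [j for j in range(len(detections)) if j not in used]
    return matches, lost, new
```

When a sperm dips out of focus for a few frames, its track lands in `lost` and it later reappears in `new` under a fresh id. That is exactly the trajectory fragmentation described above. Because sperm look alike and cross paths often, position alone is rarely enough to stitch the fragments back together.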

Calculating Geometric Measurements: While object detection and tracking are tough enough, the difficulty multiplies when we also want to estimate the geometric measurements of each object. Moreover, the measurements need to be accurate even though the scene is plagued with frequent occlusions. For example, a sperm that is oval in shape might appear circular, or a spermatid that is circular might appear oval. What’s more, a sperm with an occluded tail segment might look no different from a spermatid or a pus cell!
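As a small illustration of what a geometric measurement can look like, the sketch below computes circularity, 4πA / P², for a closed outline. This is a standard shape descriptor, used here only as an example and not necessarily one of the measurements we compute. The outline is assumed to come from some earlier segmentation step as a list of (x, y) points, and `ellipse_contour` is a hypothetical helper that generates test shapes.

```python
import math


def polygon_area(contour):
    """Area of a closed polygon via the shoelace formula."""
    n = len(contour)
    total = 0.0
    for i in range(n):
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]  # wrap around to close the outline
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(contour):
    """Length of the closed outline."""
    n = len(contour)
    return sum(math.dist(contour[i], contour[(i + 1) % n]) for i in range(n))


def circularity(contour):
    """4 * pi * area / perimeter**2: close to 1 for a circle, lower for
    elongated shapes."""
    perimeter = polygon_perimeter(contour)
    return 4.0 * math.pi * polygon_area(contour) / (perimeter * perimeter)


def ellipse_contour(a, b, n=360):
    """Hypothetical helper: n points on an ellipse with semi-axes a and b."""
    return [(a * math.cos(2 * math.pi * k / n),
             b * math.sin(2 * math.pi * k / n)) for k in range(n)]
```

A circular outline scores very close to 1, while a 2:1 oval scores roughly 0.84. If occlusion hides part of an oval head, the visible outline can score much closer to 1, which is how an oval sperm head ends up being mistaken for a round cell.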

Phew! Does that sound super complex? We’re happy to say that we’ve solved this complexity without compromising on accuracy! How we solved it is an epic of its own, and we will share the details of our solution in a separate blog post. In a nutshell, we applied our expertise in deep learning to model high-level visual concepts and the temporal dynamics of videos. We are now able to boost video analysis performance significantly by building discriminative, compact and efficient-to-compute features with a conventional 2D convolutional network, and by wrapping it in a context connection across the temporal dimension, applied to consecutive frames of the video.
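The details of our architecture are for that later post, so the sketch below should not be read as our method. It shows only one generic way to give per-frame features some temporal context: stacking each frame's feature vector with those of its neighbouring frames. The feature vectors are assumed to have been produced already by some per-frame 2D network, and the function name and the `radius` parameter are hypothetical.

```python
def add_temporal_context(frame_features, radius=1):
    """Concatenate each frame's features with those of its neighbours.

    frame_features: list of equal-length feature vectors, one per frame
                    (assumed to come from a per-frame 2D network)
    radius:         number of neighbouring frames taken on each side

    Frames near the start or end of the clip reuse the first or last frame
    in place of the missing neighbours (edge padding).
    """
    n = len(frame_features)
    stacked_features = []
    for t in range(n):
        stacked = []
        for offset in range(-radius, radius + 1):
            index = min(max(t + offset, 0), n - 1)  # clamp to a valid frame
            stacked.extend(frame_features[index])
        stacked_features.append(stacked)
    return stacked_features
```

With `radius=1`, every frame ends up with a vector three times as long, describing what was seen just before, at and just after that frame. A downstream classifier or tracker can then use motion cues that a single frame cannot provide.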

Comments and brickbats are, as always, welcome!

