The art of teaching machines to interpret and analyse medical images


Ever since Anton van Leeuwenhoek’s pioneering work with the microscope and Wilhelm Röntgen’s discovery of X-rays, medical image analysis has developed into a life-saving discipline, offering previously unattainable opportunities for diagnosis. Medical images have played an important role in everyday medical practice ever since. Interpreting them is regarded as one of the most critical skills a medical professional can possess, as diagnostic errors pose a potential threat to patient safety.

Medical image analysis is often described as an acquired art. People are often amazed when an expert correctly discerns, in a single fixation, whether a medical image holds even the subtlest hint of abnormality. Researchers across the globe have attributed this level of expertise to the extraordinary perceptual skills that expert medical practitioners possess: their ability to ignore irrelevant information and to see patterns in the data as a whole, rather than viewing them as separate pieces of unrelated information. This expertise undoubtedly makes them some of the most indispensable professionals on the planet. But experts are, by definition, rare. As a consequence, there is an acute shortage of medical practitioners. Hence, the need of the hour is to build systems that equip them with sufficient information, data and inferences to reduce the time they need to analyse and make decisions.

We at SigTuple embarked on a journey to fill this lacuna. We wanted to aid medical practitioners with critical insights and analysis of medical images, by teaching machines to interpret, learn, relearn and analyse these images using state-of-the-art Artificial Intelligence (AI) and Machine Learning techniques. In this article, I want to share a few of the umpteen hurdles that we have had to tackle in our short journey so far.

For the uninitiated: when someone talks about AI and machine learning, more often than not they are bound to talk about data. Data is an ab-initio requirement for machine learning and AI; both rely on the data, and on the correctness of the data, that is fed to them.

At the outset, it is a no-brainer to consider medical data one of the purest forms of “big data”. The data exhibits orders of magnitude higher complexity in all four characteristics of big data, namely,

(i) Volume : At one end of the spectrum, there is a need to analyse billions of particles in the sub-nanometer size range; at the other, we need to analyse the entire human body, consisting of billions of parameters (as depicted in figure 1). That’s not all: if we now multiply these billions of data points by the millions of patients who are treated every day, it is surely one humongous volume of data!

Figure 1 : A chart depicting the wide spectrum of scale of the objects versus the volume of the data-points obtained from those objects in various medical images.

(ii) Variety : On one hand, there are data points that are completely digitised through high-resolution imaging systems. On the other hand, there are critical data points like physical blood and urine slides that have little or no digitisation, and then there are unstructured pieces of data, like handwritten notes of a report or diagnosis, which are only partially digitised (depicted in figure 2). Not to forget, there are truckloads of temporal, spatial and spatio-temporal datasets, generated in considerable volumes thanks to recent advancements in medical imaging.

The biggest challenge in data variety, though, is not just the variety itself; it is the variety combined with the lack of digital data. To put this in perspective, a few hundred million patients get their blood and urine samples tested every day in India alone, and less than 0.01% of these samples are available in digital form. Needless to say, without the data being available in digital form, it is impossible to even think of electronic storage, let alone machine learning!

Figure 2 : A chart depicting the frequency of different tests versus the availability of the tests’ data in digital form.

(iii) Veracity : Every modality in the analysis phase depends not just on the type of data, but also on a gazillion other parameters, including the method of analysis, the chemical reagents, the stains, the thickness of the glass slide, etc. A sample depiction of data veracity is shown in figure 3. The only thing that is consistent across medical data is this uncertainty in the veracity of the data points.

Figure 3 : Data veracity introduced due to various parameters in blood slides.

(iv) Velocity : With a few hundred million patients undergoing one form of diagnosis or another every day, the velocity of the data is undoubtedly high. Needless to say, the analysis of these data points also has to be efficient enough to give real-time or near real-time responses. What’s more intriguing, though, is that the response time has to get faster without any compromise on accuracy!
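To get a rough feel for the figures quoted above, here is a back-of-the-envelope sketch. The article only says “a few hundred million” per day, so the value of 300 million used below is purely an assumed, illustrative number, and the variable names are hypothetical.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the text.
# ASSUMPTION: "a few hundred million" is taken as 300 million for illustration only.
SECONDS_PER_DAY = 24 * 60 * 60

samples_per_day = 300_000_000             # assumed illustrative value
digital_fraction_upper_bound = 0.01 / 100  # "less than 0.01%" from the text

# Velocity: average number of samples a system would have to handle each second.
samples_per_second = samples_per_day / SECONDS_PER_DAY

# Variety: upper bound on how many of those samples exist in digital form.
digital_samples_per_day = samples_per_day * digital_fraction_upper_bound

print(f"average load: about {samples_per_second:,.0f} samples per second")
print(f"digitally available: fewer than {digital_samples_per_day:,.0f} samples per day")
```

Under that assumption, the average load runs into thousands of samples per second, while only a tiny sliver of each day’s samples could be fed to a learning system at all.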

So, does handling the 4 Vs solve a major chunk of medical data analysis? We felt so too. We thought that just solving these data issues would solve most of the problems in making machines learn to interpret and analyse medical images. However, it was far from true! As we delved deeper into constructing the knowledge base from which the machine learns the patterns in the data, a basic assumption of any AI and machine learning technique, the correctness of the data, came into question. We realised that so much perception goes into medical image analysis that the outcome is known to suffer from a lack of reproducibility and consistency. Do you realise how huge a problem this is? Can we even imagine fitting a model on data that has no ground truth to learn from, validate against or test against?

Let me take an example. We wanted to seek the opinions of two experts on the labels of a few White Blood Cells (WBCs), so we shared images of a few prospective WBCs with them, expecting the two experts to concur on all but a few of the data points. To our surprise, there was a large variation in the labels assigned to the cells. The first row in the figure below shows a few of the conflicting labels that were assigned.

Figure 4 : An example of inter- and intra-observer variability in assigning ground truth

Okay, perception is bound to vary with time and environment, so we thought we might be able to resolve the label conflicts if we sought the experts’ inputs on the same data again after a few weeks. Voilà! It added salt to the wound and uncovered yet another element in the Pandora’s box of ground truth. Not only were the labels different between the experts, but there were also cases where an expert’s new labels conflicted with the labels that the same expert had assigned previously (shown in the above figure). Essentially, this is a case of not just inter-observer variability, but also intra-observer variability. It is a bigger problem than meets the eye: the challenge this volatility poses is that the correctness of an analysis can be neither established nor verified. It is a curious case of the definition of ground truth evolving from “a never changing truth” to “an ever changing truth”!
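One common way to put a number on this kind of disagreement is Cohen’s kappa, which measures how much two sets of labels agree beyond what chance alone would produce. The sketch below is only an illustration and not a description of what we run at SigTuple: the function name `cohen_kappa` and the cell labels are made up for the example, and the label lists are hypothetical, not our data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two labelings agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each labeling's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both labelings use one and the same label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# HYPOTHETICAL labels for six cell images (illustrative only, not real data).
expert1_week0 = ["neutrophil", "lymphocyte", "monocyte", "neutrophil", "lymphocyte", "eosinophil"]
expert2_week0 = ["neutrophil", "monocyte", "monocyte", "neutrophil", "lymphocyte", "neutrophil"]
expert1_week4 = ["neutrophil", "lymphocyte", "lymphocyte", "neutrophil", "monocyte", "eosinophil"]

# Inter-observer: two experts, same images, same session.
inter_kappa = cohen_kappa(expert1_week0, expert2_week0)
# Intra-observer: the same expert, same images, a few weeks apart.
intra_kappa = cohen_kappa(expert1_week0, expert1_week4)

print(f"inter-observer kappa: {inter_kappa:.2f}")
print(f"intra-observer kappa: {intra_kappa:.2f}")
```

A kappa of 1 means perfect agreement and a kappa near 0 means agreement no better than chance. An intra-observer value well below 1 is exactly the situation described above: even a single expert’s labels cannot be treated as a fixed ground truth.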

So, can you imagine a machine learning or AI problem that has to be solved using a dataset which has implicitly high variance due to external factors and whose ground truth is not constant? Does the problem seem ridiculously challenging and unsolvable?

Nope, not really! This is part of our daily routine, and perhaps only a snapshot of the myriad super-complex data science challenges that we face at SigTuple.