Manthana: Our Indigenous AI Platform

We at SigTuple are building five different startups.

Actually, we build smart screening solutions for the medical industry – to analyse peripheral blood smears, urine, semen, retinal scans and chest x-rays. Look at it closely and those are five different startups right there. We often get asked, “How are you able to work and deliver on multiple products in such a short amount of time?” Everyone is surprised that we’re even thinking of attempting more than one.

Of course, that’s because we have a stellar team. But let me introduce you to the only non-human member of that team, and the one that does all the heavy lifting – Manthana. Named after a seminal episode from Indian mythology, Manthana is our home-grown data and AI platform.

“AI Platform” is the new “Big Data” – something everyone claims to be doing, but very few actually do. In a conventional sense, an AI platform is something which allows you to train models using various techniques on varied data. For SigTuple, such a restricted definition will hardly solve anything. Manthana enables five major capabilities for us – data management, annotations, model training, continuous learning, and analysis/reporting.

Data Management

It is a capital mistake to theorise before one has data.
Sherlock Holmes, A Scandal in Bohemia

For most people working on AI and machine learning, data is usually logs pulled out from servers or content uploaded by users of the product. SigTuple has multiple partners sharing visual medical data with us (images, videos, digitised slides, x-rays, scans etc.). Corresponding medical reports may or may not be shared. Managing this data is critical. Apart from basic bookkeeping, we need to know the quality of each data entity coming in, that is, whether it’s usable or not. Since this can’t be done manually, the quality check might itself involve computer vision or even AI techniques. Once data is ingested, we need to be able to access it smartly: searching visually, on quality metrics and on associated metadata. All these capabilities around ingestion, management, quality and access of data in SigTuple are enabled by Manthana – saving us tons and tons of work and making life much easier.

Annotations

Water, water everywhere;
Nor any drop to drink
Samuel Taylor Coleridge, The Rime of the Ancient Mariner

Once data has been ingested into the system, model training and other analysis still cannot commence, because the data is not annotated. Annotated medical visual data is a mirage that everyone is chasing because it does not exist at all. Creating good quality annotated medical data sets is a painful process because of something called “inter-observer variability”. That’s a concise and polite way of saying that there’s a lot of grey area in all verticals of medicine and so a group of medical specialists may annotate the same image with different labels. Each of the thousands of images we ingest needs to be annotated by a panel of consulting medical specialists and a consensus mechanism needs to be used to come up with the final annotated dataset. All such activity needs to be tracked with an effective audit trail for backtracking and regulatory requirements. The entire orchestration is handled for us by Manthana (thankfully!) with very few manual checkpoints in the workflow. The collection of annotated visual medical data sets being built by SigTuple through this process is a gold mine in itself.
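
As a rough illustration of what a consensus mechanism over a panel's labels could look like, here is a minimal majority-vote sketch. It is not Manthana's actual mechanism, which is not described here; the function name, the annotator IDs, the labels and the 60% agreement threshold are all assumptions made for the example.

```python
from collections import Counter
from typing import Optional


def consensus_label(votes: dict[str, str], min_agreement: float = 0.6) -> Optional[str]:
    """Return the most common label if enough annotators agree, else None.

    `votes` maps a (hypothetical) annotator ID to the label that annotator chose.
    `min_agreement` is an assumed threshold, not a documented Manthana setting.
    """
    if not votes:
        return None
    label, count = Counter(votes.values()).most_common(1)[0]
    # Accept the label only when its share of the votes reaches the threshold.
    return label if count / len(votes) >= min_agreement else None


# Two of three specialists agree, so the label is accepted.
votes = {"dr_a": "neutrophil", "dr_b": "neutrophil", "dr_c": "lymphocyte"}
accepted = consensus_label(votes)

# An even split falls below the threshold and would be escalated for review.
disputed = consensus_label({"dr_a": "neutrophil", "dr_b": "lymphocyte"})
```

Keeping the raw `votes` alongside the accepted label is what makes an audit trail possible: the final dataset entry can always be traced back to who said what.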

Model Training

[bzzz]
Neo: I know kung-fu.
Morpheus: Show me.
The Matrix

The model training process at SigTuple is also a little different from the usual. At least for the near future, it’ll rarely be the case that we have all the annotated data required for a particular model right at the onset of training. As we keep adding more data partners and the annotation workflows keep churning out validated annotations, the available training data will keep growing. Data scientists at SigTuple need to employ techniques to incrementally train existing models (by frequent additions to training data) and improve them. So right off the bat, there’s a need for a versioning system for models, with metrics to showcase the difference in performance between two versions. Each version of a model is invoked on an unseen testbed to validate its performance. The validation testbed itself is not static and will grow along with the training set as and when new data and variants are ingested into the system. Manthana handles versioning of the model itself as well as of the different kinds of datasets associated with it (training, validation etc.). Each entry in a dataset can be traced back to the source it came from and the list of consultants who annotated it.
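
To make the versioning idea concrete, here is a small sketch of a registry that ties each model version to the dataset versions it was trained and validated on, and reports metric differences between two versions. The class names, field names, dataset IDs and metric values are all hypothetical; this is one possible shape for such a system, not a description of Manthana's internals.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelVersion:
    version: int
    training_set_id: str      # hypothetical ID of the training dataset version
    validation_set_id: str    # hypothetical ID of the validation testbed version
    metrics: dict[str, float]


@dataclass
class ModelRegistry:
    name: str
    versions: list[ModelVersion] = field(default_factory=list)

    def register(self, training_set_id: str, validation_set_id: str,
                 metrics: dict[str, float]) -> ModelVersion:
        """Record a new version; version numbers start at 1 and increase by 1."""
        entry = ModelVersion(len(self.versions) + 1, training_set_id,
                             validation_set_id, dict(metrics))
        self.versions.append(entry)
        return entry

    def compare(self, old: int, new: int) -> dict[str, float]:
        """Return new-minus-old for every metric present in both versions."""
        a = self.versions[old - 1].metrics
        b = self.versions[new - 1].metrics
        return {k: round(b[k] - a[k], 6) for k in a.keys() & b.keys()}


registry = ModelRegistry("wbc_classifier")
registry.register("train-v1", "val-v1", {"sensitivity": 0.91, "specificity": 0.96})
registry.register("train-v2", "val-v2", {"sensitivity": 0.94, "specificity": 0.95})
delta = registry.compare(old=1, new=2)
```

Because the validation testbed grows over time, a difference computed across two testbed versions is only indicative. A stricter comparison would re-evaluate the older model on the newer testbed before taking the difference.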

Any solution that we’re building, whether for blood or retinal scans, will not be a single model solution. It will employ a host of models each solving a specific problem and working in tandem along with many statistical and computer vision modules to arrive at the final report. Apart from tracking training data, versions, performance metrics for all these modules, Manthana also provides data scientists at SigTuple an ever-growing repository of all kinds of deep learning, machine learning and computer vision techniques which they can employ to implement a specific model efficiently and quickly.

Continuous Learning

“A bruise is a lesson… and each lesson makes us better.”
George R.R. Martin, A Game of Thrones

Manthana is a continuous learning platform. Performance metrics for each model employed in our solutions are tracked continuously, both through verification of output and through the feedback loop that reports provide for corrections. In addition to periodic incremental training, a dip in performance automatically triggers a new training run. The run first creates verification sets on unseen data for the consultant panel, then uses their verifications and corrections to augment the model’s existing training data so that the next version performs better. If multiple training runs do not show improvement, the data scientist is prompted to try either a different model configuration or a different technique. These capabilities and workflows allow us to improve each model continuously and to make these improvements available instantly to our users.
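
The two triggers in this loop, a dip in tracked performance and a run of versions that fail to improve, can be sketched as two small checks. The function names, window size, drop threshold and patience value below are assumptions chosen for illustration; the real criteria are not spelled out in this post.

```python
def needs_retraining(history: list[float], window: int = 3, max_drop: float = 0.02) -> bool:
    """True if the mean of the latest `window` scores fell more than
    `max_drop` below the mean of the `window` scores before them."""
    if len(history) < 2 * window:
        return False  # not enough observations to compare two windows
    recent = sum(history[-window:]) / window
    baseline = sum(history[-2 * window:-window]) / window
    return baseline - recent > max_drop


def is_stalled(version_scores: list[float], patience: int = 3) -> bool:
    """True if none of the last `patience` versions beat the best earlier score."""
    if len(version_scores) <= patience:
        return False
    best_before = max(version_scores[:-patience])
    return max(version_scores[-patience:]) <= best_before


# Accuracy dropped from about 0.95 to about 0.90: trigger a training run.
dip = needs_retraining([0.95, 0.95, 0.95, 0.90, 0.90, 0.90])

# Three versions in a row failed to beat 0.92: prompt the data scientist.
stalled = is_stalled([0.90, 0.92, 0.91, 0.92, 0.90])
```

Comparing window means rather than single readings keeps one noisy measurement from triggering a retraining run on its own.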

Apart from continuous improvement of models, Manthana also needs automated workflows that suggest adding new labels to existing models, or adding new capabilities by introducing new models into a solution (e.g. adding malaria detection to the regular blood smear analysis). These automatic upgrades to our solutions have helped us wow our beta users. This is a totally new paradigm for the medical community, where the norm is still expensive, monolithic machines that must be replaced by even more expensive but still monolithic machines whenever an incremental capability needs to be added.

Analysis & Reporting

Skadoosh!
— Po, Kung Fu Panda

Manthana does not just cater to the internal workflows of SigTuple; it also supports the commercial analysis pipelines for all our solutions. Every solution is modelled as a directed acyclic graph of various computer vision, statistical and AI modules, each solving a specific problem and contributing to some part of the final report. After collation and analysis of findings from each module, an insightful and interactive report is made available to medical institutions and specialists, who are our end users. Each metric in the report is backed by enough visual evidence to obviate the need for the specialist to look at the physical scan or sample. These reports can be consumed on any computing device and hence enable tele-medicine. The specialist can make any required corrections on the report before signing it off to be released to the patient. The corrections, of course, are fed back into the continuous learning process. Manthana supports a comprehensive reporting framework to enable these capabilities.
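
A solution modelled as a directed acyclic graph can be run by visiting modules in topological order, so that each module sees the outputs of the modules it depends on. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+). The module names, their stub implementations and the `run_pipeline` helper are hypothetical stand-ins, not SigTuple's actual pipeline.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each module maps to the set of modules it depends on.
pipeline = {
    "quality_check": set(),
    "cell_detection": {"quality_check"},
    "cell_counts": {"cell_detection"},
    "report": {"cell_counts", "quality_check"},
}

# Stub implementations standing in for real vision, statistical and AI modules.
modules = {
    "quality_check": lambda sample, inputs: {"usable": True},
    "cell_detection": lambda sample, inputs: {"cells": 3},
    "cell_counts": lambda sample, inputs: {"total": inputs["cell_detection"]["cells"]},
    "report": lambda sample, inputs: {
        "total_cells": inputs["cell_counts"]["total"],
        "usable": inputs["quality_check"]["usable"],
    },
}


def run_pipeline(graph, modules, sample):
    """Run every module after its dependencies, passing their results along."""
    results = {}
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: results[dep] for dep in graph[name]}
        results[name] = modules[name](sample, inputs)
    return results


results = run_pipeline(pipeline, modules, sample=None)
final_report = results["report"]
```

`TopologicalSorter` raises `graphlib.CycleError` if the graph contains a cycle, which is a cheap way to validate a pipeline definition before running it.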

All infrastructure required to run these analyses in near real time is managed by Manthana, with auto-scaling capability built right into it. It also decides which modules need special resources (e.g. GPUs) to execute and allocates them accordingly, optimising overall costs and analysis time.

We realised early on that the deep integration of capabilities and workflows we envisioned could not be accomplished by any AI platform out there. Therefore, Manthana was born. All in all, Manthana is indispensable to anything and everything that SigTuple is building, which is why we claim that we’re not building five products, but just one – Manthana. There is still tons of work to be done, but it’s safe to say that the evolution of Manthana will be a key aspect in the story of SigTuple.

Image credit: http://pre12.deviantart.net/ea0f/th/pre/i/2011/237/d/3/hal9000stockby_powerbuldog-d47sjlx.png