Development of an AI-assisted algorithm for the prediction of novel causal genes and variants for Mendelian disorders from whole-genome sequencing

Advances in DNA sequencing technologies have enabled the rapid and cost-efficient identification of causal genes and variants for a number of diseases. This is especially true for Mendelian disorders, where patients who carry a causative variant in their genome can finally obtain a definitive diagnosis of their disease. However, even with this revolutionary technology, the success rate of genetic diagnosis via next-generation sequencing is currently only around 30% for undiagnosed Mendelian disease cases. This is in part due to the limitations of the analytical methods available to identify and prioritize causal variants from the vast amounts of sequencing data generated. Currently, the genetic diagnosis of Mendelian disorders is performed by comparing the genome of a patient to those of a large number of controls. Such comparisons generally produce a large list of genetic variants that are unique to the patient. Many of these are probably benign, and identifying the causal gene and variant can be a real challenge. To address this problem, we have developed a novel method that ranks candidate genes and variants using an AI-assisted algorithm that relies on IBM Watson’s text mining approach. As a proof of concept, we used a large whole-genome sequencing (WGS) dataset on retinitis pigmentosa (RP) with 523 cases and 2,143 controls. Our method consists of the following steps:
1) Select the inclusion criteria for variants that maximize the difference between the true positive rate in patients and the false positive rate in controls, based on previously known causal genes from a public database.
2) Using these inclusion criteria, create a list of candidate genes and variants.
3) Use IBM Watson to sort and prioritize this list of genes.

Using this strategy on the RP WGS dataset, we were able to identify and prioritize 994 candidate genes. Notably, many of our top-ranked genes shared structural and functional features with previously known RP genes. We also succeeded in increasing the diagnosis rate of RP from 37% to 52% by incorporating these top-ranked candidates, without increasing the rate of false positives in controls. Going forward, we plan to further improve the approach by integrating other AI technologies that rely on omics or image analysis data. We also plan to develop a gene and variant registry with the aim of constructing a comprehensive infrastructure in Japan for studying the genetics of intractable diseases. In this registry, various AI technologies will be implemented to perform integrative analyses across various diseases.
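Step 1 of the method above, choosing inclusion criteria that maximize the difference between the true positive rate in patients and the false positive rate in controls, can be illustrated with a small grid search. The following is only a minimal sketch under stated assumptions, not the procedure actually used in this work: the two criteria (a maximum allele frequency `max_af` and a minimum deleteriousness score `min_score`), the field names `af` and `score`, the function names, and the toy data are all hypothetical. Each individual is represented by the variants they carry in previously known causal genes, and an individual counts as positive if at least one of those variants passes the criteria.

```python
from itertools import product


def carries_qualifying_variant(variants, max_af, min_score):
    """Return True if any variant passes both (hypothetical) thresholds."""
    return any(v["af"] <= max_af and v["score"] >= min_score for v in variants)


def positive_rate(individuals, max_af, min_score):
    """Fraction of individuals carrying at least one qualifying variant."""
    if not individuals:
        return 0.0
    hits = sum(
        carries_qualifying_variant(variants, max_af, min_score)
        for variants in individuals
    )
    return hits / len(individuals)


def select_inclusion_criteria(cases, controls, af_grid, score_grid):
    """Grid-search the thresholds that maximize TPR (cases) minus FPR (controls)."""
    best = None
    for max_af, min_score in product(af_grid, score_grid):
        tpr = positive_rate(cases, max_af, min_score)
        fpr = positive_rate(controls, max_af, min_score)
        diff = tpr - fpr
        # Keep the first combination that reaches the highest difference.
        if best is None or diff > best["diff"]:
            best = {
                "max_af": max_af,
                "min_score": min_score,
                "tpr": tpr,
                "fpr": fpr,
                "diff": diff,
            }
    return best


# Toy data (invented for illustration): one list of variants per individual,
# restricted to variants in previously known causal genes.
cases = [
    [{"af": 0.0001, "score": 0.9}],
    [{"af": 0.002, "score": 0.8}],
    [{"af": 0.03, "score": 0.2}],
    [],
]
controls = [
    [{"af": 0.03, "score": 0.9}],
    [{"af": 0.002, "score": 0.1}],
    [],
    [],
]

best = select_inclusion_criteria(
    cases, controls, af_grid=[0.001, 0.01, 0.05], score_grid=[0.0, 0.5]
)
print(best)
```

On this toy data the search settles on `max_af = 0.01` and `min_score = 0.5`, which flags half of the cases and none of the controls. In practice the criteria, their ranges, and the search strategy would depend on the annotations available for the dataset.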