Updates

Duncan NRI New Computational Tools

The researchers at the Jan and Dan Duncan Neurological Research Institute (NRI) have developed the following computational tools that speed up the pace of scientific discovery all over the world:

Parmesan


A new natural language processing (NLP) tool, PARsing ModifiErS via Article aNnotations (PARMESAN), can search for up-to-date information, assemble it into a central knowledge base, and even predict drugs that are likely to correct specific protein imbalances. This artificial intelligence (AI)-powered tool scans public biomedical literature databases (PubMed and PubMed Central) to identify and rank descriptions of gene-gene and drug-gene regulatory relationships. What stands out about PARMESAN in particular is its ability to leverage curated information to predict undiscovered relationships.

Openseize


New open-source software for analyzing large-scale one-dimensional digital signals such as electroencephalography (EEG) and electromyography (EMG) recordings. Openseize provides instructions (a ‘data producer’) for extracting right-sized chunks of data from a file on disk, each of which fits easily in an average computer’s memory. Using this data producer, users can build custom processing pipelines that operate on each chunk of data to extract important biomarkers such as frequency densities and spike rates. This sequence of steps is repeated chunk by chunk until the entire dataset is analyzed, and the analyzed segments are then assembled into the final product. The software’s applicability extends well beyond electrical signals in the brain and muscles: in principle, it can analyze any one-dimensional digital signal, such as audio, which opens many more ways to use it for biomedical and other data in the future.
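The chunk-by-chunk workflow described above can be illustrated with a minimal Python sketch. It does not use the Openseize API; the function names (`produce_chunks`, `count_threshold_crossings`, `process_in_chunks`, `write_demo_signal`), the raw float64 file layout, and the threshold-based spike proxy are all assumptions made for illustration only.

```python
import array
import os
import tempfile
from typing import Callable, Iterator, List


def produce_chunks(path: str, chunk_size: int) -> Iterator[array.array]:
    """Yield successive chunks of float64 samples from a raw binary file.

    Only one chunk is held in memory at a time (hypothetical 'data producer').
    """
    itemsize = array.array("d").itemsize
    with open(path, "rb") as f:
        while True:
            raw = f.read(chunk_size * itemsize)
            if not raw:
                break
            chunk = array.array("d")
            chunk.frombytes(raw)
            yield chunk


def count_threshold_crossings(chunk: array.array, threshold: float) -> int:
    """Count samples above the threshold: a crude, per-sample spike proxy."""
    return sum(1 for x in chunk if x > threshold)


def process_in_chunks(
    path: str, chunk_size: int, step: Callable[[array.array], int]
) -> List[int]:
    """Apply `step` to every chunk and assemble the per-chunk results."""
    return [step(chunk) for chunk in produce_chunks(path, chunk_size)]


def write_demo_signal(n_samples: int) -> str:
    """Write a synthetic signal (one large sample every 100) to a temp file."""
    samples = array.array(
        "d", (5.0 if i % 100 == 0 else 0.1 for i in range(n_samples))
    )
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        samples.tofile(f)
    return path


if __name__ == "__main__":
    demo_path = write_demo_signal(1000)
    counts = process_in_chunks(
        demo_path, 250, lambda c: count_threshold_crossings(c, 1.0)
    )
    print(counts, sum(counts))
    os.remove(demo_path)
```

A generator keeps memory use bounded by the chunk size rather than the file size. A real pipeline would also need to handle events that straddle chunk boundaries, which this sketch ignores.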

GeneEMBED


This tool compares the functional perturbations induced in gene interaction network neighborhoods in complex, polygenic diseases. In the pilot study of two independent Alzheimer's disease (AD) cohorts of 5,169 exomes and 969 genomes, GeneEMBED identified novel candidate genes. These were differentially expressed in postmortem AD brains and modulated neurological phenotypes in mice. Four candidates that were differentially overexpressed and modified neurodegeneration in vivo are PLEC, UTRN, TP53, and POLD1. Notably, TP53 and POLD1 are involved in DNA break repair and are inhibited by approved drugs. While these data show proof of concept in AD, GeneEMBED is a general approach that should be broadly applicable to identifying genes relevant to the risk mechanisms and therapy of other complex diseases.

PolyA-miner


More than half of human genes undergo alternative polyadenylation (APA) and generate mRNA transcripts of varying lengths. Increasing awareness of APA’s role in human health and disease has propelled the development of several 3’ sequencing (3’Seq) techniques. Despite the recent data explosion, computational tools designed specifically for 3’Seq data are not well established. PolyA-miner was developed specifically for 3’Seq data. It accounts for all non-proximal to non-distal APA switches using vector projections and reflects precise gene-level 3’UTR changes. PolyA-miner is less susceptible to inherent data variations and can also effectively identify novel APA sites that go undetected by reference-based approaches. With the emerging importance of APA in studying human diseases, PolyA-miner can significantly accelerate data analysis and help decode the missing pieces of the underlying APA dynamics.
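To give a feel for how a vector projection can summarize a change in polyadenylation site usage, here is a small Python sketch. It is not PolyA-miner's published algorithm: the read counts, the function names, and the choice to measure the part of one usage vector that is orthogonal to another are assumptions made purely for illustration.

```python
import math
from typing import List, Sequence


def to_proportions(counts: Sequence[float]) -> List[float]:
    """Convert per-site read counts for one gene into usage proportions."""
    total = sum(counts)
    if total == 0:
        raise ValueError("gene has no reads")
    return [c / total for c in counts]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(v: Sequence[float], onto: Sequence[float]) -> List[float]:
    """Vector projection of `v` onto the direction of `onto`."""
    scale = dot(v, onto) / dot(onto, onto)
    return [scale * x for x in onto]


def usage_shift(control: Sequence[float], treated: Sequence[float]) -> float:
    """Length of the part of the treated usage vector orthogonal to control.

    Zero means the relative site usage is unchanged; larger values mean the
    treated sample uses the sites in different proportions (hypothetical score).
    """
    c = to_proportions(control)
    t = to_proportions(treated)
    p = project(t, c)
    residual = [ti - pi for ti, pi in zip(t, p)]
    return math.sqrt(dot(residual, residual))


if __name__ == "__main__":
    # Three APA sites in one hypothetical gene, ordered proximal to distal.
    print(usage_shift([10, 20, 30], [20, 40, 60]))  # same proportions
    print(usage_shift([10, 20, 30], [40, 15, 5]))   # shift toward proximal site
```

Because the score works on the whole usage vector, it is not limited to a two-site proximal versus distal comparison, which is the property the paragraph above highlights.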

CrypSplice


Alternative splicing of RNA is the key mechanism by which a single gene codes for multiple functionally diverse proteins. Several studies have established compromised RNA homeostasis (splicing errors) and identified a previously unknown class of exons, ‘cryptic’ exons, in RNA transcripts. These cryptic exons are often associated with various neurological disorders and cancers. Genome-wide detection of cryptic splice sites can facilitate a comprehensive understanding of the underlying disease mechanisms and support the development of therapeutic strategies. CrypSplice can effectively quantify and evaluate cryptic splicing patterns from RNA-Seq data using a beta-binomial distribution model. CrypSplice revealed extensive cryptic splicing in amyotrophic lateral sclerosis and spinocerebellar ataxia models.
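For readers unfamiliar with the beta-binomial distribution mentioned above, the following Python sketch computes its probability mass function and an upper-tail probability for junction read counts. How CrypSplice parameterizes and tests its model is not described here, so the shape parameters `alpha` and `beta`, the function names, and the example counts are assumptions for illustration only.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """P(X = k) when k of n reads support a junction.

    The per-sample junction usage rate is assumed to follow a
    Beta(alpha, beta) distribution, which allows more variance than a
    plain binomial model.
    """
    if not 0 <= k <= n:
        return 0.0
    log_choose = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    )
    log_p = log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
    return math.exp(log_p)


def upper_tail(k: int, n: int, alpha: float, beta: float) -> float:
    """P(X >= k): chance of seeing at least k supporting reads out of n."""
    return sum(beta_binomial_pmf(i, n, alpha, beta) for i in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical example: 12 of 40 reads support a cryptic junction, while
    # the assumed background usage rate is low (mean alpha / (alpha + beta)).
    print(upper_tail(12, 40, alpha=1.0, beta=19.0))
```

Working in log space with `math.lgamma` avoids overflow in the binomial coefficient when read depths are large.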

ModelMatcher


Researchers at the Jan and Dan Duncan Neurological Research Institute (Duncan NRI) at Texas Children’s Hospital have created and launched ModelMatcher, a new virtual global networking/matchmaking platform designed to facilitate and enhance the pace of preclinical discovery and therapeutic development for numerous new and existing genetic disorders. ModelMatcher is the brainchild of Duncan NRI investigators and Baylor College of Medicine faculty, Dr. Shinya Yamamoto and Dr. Zhandong Liu. Dr. Yamamoto is a fruit fly biologist, and Dr. Liu is a computational biologist. Their teams, including Duncan NRI and Baylor graduate student, Michael Harnish, and staff programmers Lucian Liu and Seon Young Kim, created ModelMatcher as a common online space where diverse stakeholders could come together to connect, interact, generate new ideas, innovate and share discoveries with the common goal of improving the lives of patients and families.

Aminode


A user-friendly web tool, developed by Dr. Marco Sardiello’s team, for the routine and rapid inference of evolutionarily constrained regions (ECRs), a hallmark of sites that are critically important for a protein’s structure or function. Aminode is pre-loaded with the results of an analysis of the whole human proteome compared with proteomes from 62 additional vertebrate species. The profiles of the relative rates of amino acid substitution and the ECR maps of human proteins can be searched and downloaded from the Aminode website.

CRISPRcloud


A user-friendly, cloud-based data analysis pipeline, co-developed by researchers in the laboratories of Drs. Zhandong Liu and Huda Zoghbi, for the deconvolution of pooled CRISPR screening datasets. This tool serves the dual purpose of extracting, clustering and analyzing raw next-generation sequencing files derived from pooled screening experiments and presenting the results in a user-friendly way on a secure web-based platform.

MARRVEL


A publicly available website, co-developed by Dr. Hugo Bellen’s team and researchers in the Undiagnosed Diseases Network, that integrates information from six human genetic databases and seven model organism databases. For any given variant or gene, MARRVEL curates information from model organism-specific databases and displays, on a single webpage, a concise summary of the human gene’s homologs in budding and fission yeast, worm, fly, fish, mouse, and rat.