MutationTaster



Reference: Schwarz J.M., Rödelsperger C., Schuelke M., Seelow D. MutationTaster evaluates disease-causing potential of sequence alterations. Nature Methods (2010) 7 (8) 575-576.
Hosted: Developed and maintained by the Charité – Universitätsmedizin Berlin (http://www.mutationtaster.org/).

Summary:
MutationTaster is a fast web-based application that evaluates the disease-causing potential of DNA sequence variants. It combines information from various sources and evaluates it with a naive Bayes classifier.

Methodology:
• Conservation is analysed using an alignment of orthologues
• Splice sites are analysed using NNSplice on a window of 60 bases around the mutation site
• Potential alterations in polyadenylation signals are assessed using the program polyadq
• Any alterations in the Kozak consensus sequence are assessed
• The relationship between the altered amino acid and specific protein features is assessed, as well as any changes in protein length
• Data from dbSNP and HapMap are integrated
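
The windowing step used for splice site analysis can be illustrated with a minimal sketch. It only shows how a wild-type and a mutant window around a single base change might be cut out before being passed to a splice site predictor; it does not call NNSplice. The function names, the 0-based position and the default of 30 flanking bases on each side are assumptions made for illustration, since the text above does not say how the 60 bases are distributed around the site.

```python
def window_around(sequence: str, position: int, flank: int = 30) -> str:
    """Return the bases within `flank` of a 0-based position, clipped to the sequence ends."""
    if not 0 <= position < len(sequence):
        raise ValueError("position lies outside the sequence")
    start = max(0, position - flank)
    end = min(len(sequence), position + flank + 1)
    return sequence[start:end]


def apply_substitution(sequence: str, position: int, alt_base: str) -> str:
    """Return a copy of the sequence with a single base replaced."""
    if len(alt_base) != 1:
        raise ValueError("alt_base must be a single nucleotide")
    if not 0 <= position < len(sequence):
        raise ValueError("position lies outside the sequence")
    return sequence[:position] + alt_base + sequence[position + 1:]


def wild_and_mutant_windows(sequence: str, position: int, alt_base: str, flank: int = 30):
    """Cut the same window from the wild-type and the mutated sequence."""
    mutant = apply_substitution(sequence, position, alt_base)
    return window_around(sequence, position, flank), window_around(mutant, position, flank)
```

Both windows cover the same coordinates, so any difference in predicted splice sites between them can be attributed to the alteration.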

A number of data sources are used:
• DNA and protein sequences are retrieved from Ensembl
• Protein features are retrieved from SwissProt/UniProt
• SNP information is from dbSNP, mapped onto human genome build GRCh37
• Genotype frequencies are from HapMap

A naive Bayes classifier is trained with known polymorphisms and disease mutations. Different models are used for synonymous, non-synonymous and complex alterations (amino acid changes that may result in frameshifts, premature stop codons, etc.).
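
To make the classification step concrete, the following is a minimal sketch of a categorical naive Bayes classifier with Laplace smoothing, with one model kept per alteration type. It is not MutationTaster's implementation: the class, the feature names (`conserved`, `in_protein_feature`) and the four training rows are hypothetical toy values chosen only to show the mechanics.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()              # label -> number of training rows
        self.value_counts = defaultdict(Counter)   # (label, feature) -> value counts
        self.feature_values = defaultdict(set)     # feature -> values seen in training

    def fit(self, rows, labels):
        for row, label in zip(rows, labels):
            self.class_counts[label] += 1
            for feature, value in row.items():
                self.value_counts[(label, feature)][value] += 1
                self.feature_values[feature].add(value)
        return self

    def predict_proba(self, row):
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # class prior
            for feature, value in row.items():
                if feature not in self.feature_values:
                    continue  # ignore features never seen in training
                count = self.value_counts[(label, feature)][value]
                n_values = len(self.feature_values[feature])
                score += math.log((count + self.alpha) / (n + self.alpha * n_values))
            log_scores[label] = score
        # Normalise in log space to avoid underflow.
        peak = max(log_scores.values())
        weights = {label: math.exp(s - peak) for label, s in log_scores.items()}
        norm = sum(weights.values())
        return {label: w / norm for label, w in weights.items()}


# One model per alteration type; only one is trained here, on toy data.
models = {kind: NaiveBayes() for kind in ("synonymous", "non_synonymous", "complex")}
toy_rows = [
    {"conserved": True, "in_protein_feature": True},
    {"conserved": True, "in_protein_feature": False},
    {"conserved": False, "in_protein_feature": False},
    {"conserved": False, "in_protein_feature": False},
]
toy_labels = ["disease-causing", "disease-causing", "polymorphism", "polymorphism"]
models["non_synonymous"].fit(toy_rows, toy_labels)
proba = models["non_synonymous"].predict_proba(
    {"conserved": True, "in_protein_feature": True}
)
```

Keeping a separate model per alteration type means each one is trained only on features that are meaningful for that type of change.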

Input:
• A gene transcript must be entered. This can be selected from a list by supplying the HGNC symbol, NCBI GeneID or Ensembl gene ID. An Ensembl transcript ID can also be entered directly.
• The reference for the alteration (coding sequence, transcript or gene) must be specified.
• The alteration can be specified by entering a few nucleotides around the mutation, the position of a single base change, or the position of an insertion or deletion.

The scripts can be downloaded and run locally on a Unix-based system. This enables batch queries, which may be required as part of a next-generation sequencing pipeline.

Output:
A prediction is given as either ‘disease-causing’ or ‘polymorphism’, along with a probability value indicating the security of the prediction (values close to 1 being most secure). A breakdown of all of the components that contribute to the prediction is also provided.
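
When predictions are collected in a pipeline, the label and its accompanying value can be handled together. The sketch below is one hypothetical way to do this; the `Prediction` class, the `triage` helper and the 0.9 cut-off are assumptions for illustration and are not part of MutationTaster or its output format.

```python
from dataclasses import dataclass

LABELS = {"disease-causing", "polymorphism"}


@dataclass(frozen=True)
class Prediction:
    label: str          # one of LABELS
    probability: float  # security of the prediction, 1 being most secure

    def __post_init__(self):
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must lie between 0 and 1")


def triage(predictions, threshold=0.9):
    """Split predictions into confident ones and ones to review manually.

    The threshold is an arbitrary illustrative value, not a recommended cut-off.
    """
    confident = [p for p in predictions if p.probability >= threshold]
    uncertain = [p for p in predictions if p.probability < threshold]
    return confident, uncertain
```

A low value does not flip the label; it only indicates that the prediction is less secure and that the component breakdown deserves a closer look.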