PolyPhen-2 (Polymorphism Phenotyping version 2)



Reference: Adzhubei I.A., Schmidt S., Peshkin L., Ramensky V.E., Gerasimova A., Bork P., Kondrashov A.S., Sunyaev S.R. A method and server for predicting damaging missense mutations. Nat Methods 7(4) (2010) 248-249.
Hosted: Developed by the Sunyaev lab at Harvard Medical School. (http://genetics.bwh.harvard.edu/pph2/)

Summary:
PolyPhen-2 uses sequence- and structure-based information to predict the effect of missense variants (amino acid substitutions) on protein function, combining the evidence with a Bayesian (naive Bayes) classifier.

Methodology:
• Clustered and refined multiple sequence alignments (MSAs) are built, and any functional annotation is identified, for example annotations covering the variant position.
• PolyPhen-2 also calculates profile- and identity-based scores, which are combined with structural properties such as solvent accessibility, hydrophobic propensity and B-factor, as well as a number of other features. All of these features are combined using two Bayesian probabilistic models, HumDiv and HumVar, each trained on a different dataset.
HumDiv was compiled from all damaging alleles in UniProtKB with known effects on molecular function that cause Mendelian diseases, together with differences between human proteins and their mammalian orthologues, which are assumed to be non-damaging.
HumVar was trained on human disease-causing mutations in UniProtKB and common human non-synonymous SNPs (nsSNPs, MAF > 1%) with no disease-associated annotation.
• For Mendelian disease diagnostics, the HumVar model is recommended, as it is designed to distinguish mutations with drastic effects from normal human variation. The HumDiv model is recommended when even mildly deleterious alleles should be treated as damaging.
• The classifier outputs a probability that the variant is damaging. False positive and true positive rates are estimated for this probability, and thresholds are applied to give a qualitative prediction of ‘benign’, ‘possibly damaging’ or ‘probably damaging’.
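The idea of combining several per-variant features into one probability can be illustrated with a toy naive Bayes classifier. This is a minimal sketch, not PolyPhen-2's code: the feature names (`conservation_delta`, `accessibility`, `b_factor`), the per-class means and standard deviations, the 0.5 prior and the Gaussian class-conditional densities are all assumptions chosen for brevity. PolyPhen-2 learns its own feature set and distributions from the HumDiv or HumVar training data.

```python
import math
from typing import Dict, Tuple

# Hypothetical class-conditional parameters: (mean, standard deviation) per feature.
# All numbers are invented for illustration, not taken from PolyPhen-2.
PARAMS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "damaging": {
        "conservation_delta": (2.0, 0.8),  # higher = wild type more conserved than mutant
        "accessibility": (0.2, 0.15),      # relative solvent accessibility, 0..1
        "b_factor": (20.0, 8.0),           # crystallographic B-factor
    },
    "neutral": {
        "conservation_delta": (0.5, 0.6),
        "accessibility": (0.5, 0.25),
        "b_factor": (35.0, 12.0),
    },
}
PRIOR_DAMAGING = 0.5  # assumed prior probability of the "damaging" class


def log_gaussian(x: float, mean: float, sd: float) -> float:
    """Log density of N(mean, sd**2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)


def prob_damaging(features: Dict[str, float]) -> float:
    """Posterior P(damaging | features), treating features as conditionally independent."""
    log_d = math.log(PRIOR_DAMAGING)
    log_n = math.log(1.0 - PRIOR_DAMAGING)
    for name, value in features.items():
        log_d += log_gaussian(value, *PARAMS["damaging"][name])
        log_n += log_gaussian(value, *PARAMS["neutral"][name])
    # Logistic of the log-odds, written so that math.exp never overflows.
    log_odds = log_d - log_n
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


buried_conserved = {"conservation_delta": 2.3, "accessibility": 0.1, "b_factor": 18.0}
exposed_variable = {"conservation_delta": 0.2, "accessibility": 0.7, "b_factor": 40.0}
print(f"{prob_damaging(buried_conserved):.3f}")
print(f"{prob_damaging(exposed_variable):.3f}")
```

The "naive" assumption is what allows each feature's log-likelihood to be summed independently; it keeps the model simple at the cost of ignoring correlations between features.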
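Mapping a probability to one of the three qualitative labels can be sketched as a simple cutoff lookup. The cutoffs used here (0.957 and 0.453 for HumDiv, 0.909 and 0.447 for HumVar) are the values commonly quoted alongside PolyPhen-2 scores, for example in the dbNSFP documentation. They are an assumption in this sketch and should be checked against the PolyPhen-2 release and model actually in use. `classify` is a hypothetical helper, not part of PolyPhen-2.

```python
from typing import Dict, Literal, Tuple

Model = Literal["HumDiv", "HumVar"]

# (minimum for "probably damaging", minimum for "possibly damaging") per model.
# Assumed values as commonly quoted (e.g. dbNSFP documentation); verify before use.
CUTOFFS: Dict[str, Tuple[float, float]] = {
    "HumDiv": (0.957, 0.453),
    "HumVar": (0.909, 0.447),
}


def classify(prob: float, model: Model) -> str:
    """Map a PolyPhen-2-style probability to a qualitative label for the given model."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"probability out of range: {prob}")
    probably_min, possibly_min = CUTOFFS[model]
    if prob >= probably_min:
        return "probably damaging"
    if prob >= possibly_min:
        return "possibly damaging"
    return "benign"


# The same score can fall into different classes depending on the model.
print(classify(0.93, "HumDiv"))  # possibly damaging
print(classify(0.93, "HumVar"))  # probably damaging
```

Because each model has its own cutoffs, a prediction should always be reported together with the model (HumDiv or HumVar) that produced it.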

Additional options:
• BLAST hits can be sorted by e-value (default) or sequence identity.
• Map to mismatch: by default, a hit is rejected if its amino acid at the position corresponding to the variant differs from the amino acid in the input sequence. Setting this option to ‘YES’ should be done with caution, and only where structural homologues cannot otherwise be found.
• Structural parameters can be calculated for the first hit or all hits.
• Contacts can also be calculated for the first hit or all hits.
• Sequences whose alignment is shorter than a minimum length can be filtered out.
• Sequences whose identity in the alignment falls below a minimum value can be filtered out.
• The maximum gap length allowed in the alignment can be changed to filter out more distant homologues.
• The distance threshold for residue contacts can be changed from the default of 6 Å.
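The hit-selection options above can be pictured as a sort followed by a chain of filters. This is a minimal sketch under stated assumptions: the `Hit` record, its field names, the hypothetical `select_hits` function and its default parameter values are illustrative only, not PolyPhen-2's internal data structures or its actual defaults.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """Hypothetical summary of one BLAST hit against a structure database."""
    pdb_id: str
    evalue: float
    identity: float          # fraction of identical residues in the alignment, 0..1
    aln_length: int          # number of aligned residues
    max_gap: int             # longest gap in the alignment
    residue_at_variant: str  # hit residue aligned to the variant position


def select_hits(
    hits: List[Hit],
    query_residue: str,
    sort_by: str = "evalue",      # "evalue" (ascending) or "identity" (descending)
    map_to_mismatch: bool = False,
    min_aln_length: int = 0,
    min_identity: float = 0.0,
    max_gap_length: int = 10**6,
    all_hits: bool = False,       # False = keep only the first (best) hit
) -> List[Hit]:
    if sort_by == "evalue":
        ordered = sorted(hits, key=lambda h: h.evalue)
    elif sort_by == "identity":
        ordered = sorted(hits, key=lambda h: h.identity, reverse=True)
    else:
        raise ValueError(f"unknown sort key: {sort_by}")
    kept = [
        h for h in ordered
        # Reject residue mismatches unless "map to mismatch" is enabled.
        if (map_to_mismatch or h.residue_at_variant == query_residue)
        and h.aln_length >= min_aln_length
        and h.identity >= min_identity
        and h.max_gap <= max_gap_length
    ]
    return kept if all_hits else kept[:1]


hits = [
    Hit("1abc", 1e-30, 0.45, 120, 3, "R"),
    Hit("2xyz", 1e-50, 0.90, 200, 1, "K"),  # residue differs from the query
    Hit("3def", 1e-10, 0.30, 40, 12, "R"),
]
print([h.pdb_id for h in select_hits(hits, "R", all_hits=True)])         # ['1abc', '3def']
print([h.pdb_id for h in select_hits(hits, "R", map_to_mismatch=True)])  # ['2xyz']
```

The example shows why "map to mismatch" needs care: enabling it lets the best-scoring hit through even though its residue at the variant position is not the one in the input sequence.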
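A residue-contact check with a distance threshold can be sketched as follows. Assumptions: each residue is given as a list of atom coordinates in ångströms, and two residues count as in contact when any pair of their atoms is closer than the threshold. PolyPhen-2's exact contact definition (which atoms are used, whether ligands or other chains are included) may differ, and `contacts_of` is a hypothetical helper.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def min_atom_distance(a: List[Coord], b: List[Coord]) -> float:
    """Smallest Euclidean distance between any atom of residue a and any atom of residue b."""
    return min(math.dist(p, q) for p in a for q in b)


def contacts_of(
    target: str,
    residues: Dict[str, List[Coord]],
    threshold: float = 6.0,  # ångströms; matches the 6 Å default described above
) -> List[str]:
    """Sorted IDs of residues with at least one atom closer than `threshold` to the target."""
    target_atoms = residues[target]
    return sorted(
        rid for rid, atoms in residues.items()
        if rid != target and min_atom_distance(target_atoms, atoms) < threshold
    )


residues = {
    "A:10": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "A:11": [(4.0, 0.0, 0.0)],  # 2.5 Å from the nearest A:10 atom
    "A:30": [(0.0, 7.0, 0.0)],  # 7.0 Å away: outside the default threshold
    "B:5": [(0.0, 0.0, 5.5)],   # 5.5 Å away, on another chain
}
print(contacts_of("A:10", residues))                 # ['A:11', 'B:5']
print(contacts_of("A:10", residues, threshold=8.0))  # ['A:11', 'A:30', 'B:5']
```

Raising the threshold widens the neighbourhood considered around the variant, so more residues are reported as contacts.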