PolyPhen (Polymorphism Phenotyping)



Reference: Ramensky V., Bork P., Sunyaev S. Human non-synonymous SNPs: server and survey. Nucleic Acids Research (2002) 30 (17) 3894-3900.
Hosted: Developed by the Bork group (EMBL) and the Sunyaev lab; hosted by Harvard Medical School (http://genetics.bwh.harvard.edu/pph/). This tool is no longer maintained or updated.

Summary:
PolyPhen predicts the effect of nsSNPs by combining a number of properties relating to the structure and function of the encoded protein.

Methodology:
• A sequence-based characterisation of the substitution site is performed to determine whether it lies in a functionally important region (e.g. an active site).
• Mutations at these sites are assessed against a substitution matrix to evaluate their effect.
• Homologous sequences are obtained from a BLAST search against the NCBI non-redundant database and used to calculate position-specific independent counts (PSIC) scores.
• Structural features such as amino acid atomic contacts and solvent accessibility are also assessed, and empirically determined cutoffs are used to predict whether the substitution is ‘probably damaging’, ‘possibly damaging’ or ‘benign’.
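
The final step above can be pictured as a small rule set that maps per-site features to one of the three labels. The Python sketch below is only an illustration of that idea and is not PolyPhen's actual rule set: the names `SiteFeatures` and `classify_substitution`, the choice of features, and every numeric cutoff are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class SiteFeatures:
    """Hypothetical per-substitution features; the names are illustrative only."""
    in_functional_site: bool   # e.g. the residue is an annotated active site
    psic_difference: float     # absolute difference in PSIC score between the two residues
    buried: bool               # residue has low solvent accessibility
    lost_contacts: int         # atomic contacts lost as a result of the substitution


# Placeholder cutoffs chosen for illustration, NOT the values used by PolyPhen.
PSIC_HIGH = 2.0
PSIC_MODERATE = 1.5


def classify_substitution(f: SiteFeatures) -> str:
    """Map features to a label using simple, assumed threshold rules."""
    if f.in_functional_site or f.psic_difference >= PSIC_HIGH:
        return "probably damaging"
    if f.psic_difference >= PSIC_MODERATE or (f.buried and f.lost_contacts > 0):
        return "possibly damaging"
    return "benign"
```

For example, under these assumed cutoffs a buried residue that loses contacts but has a small PSIC difference would be labelled ‘possibly damaging’.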

Input:
The user can provide a UniProt ID or copy and paste the query sequence in FASTA format. The substitution (one per run) is also required.
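
Before submitting, it can help to check that the substitution is consistent with the query sequence. The Python sketch below reads the first record from FASTA text and confirms that the stated wild-type residue is present at the stated position. The compact `K2R` notation (wild-type residue, 1-based position, variant residue) and all function names are assumptions made for illustration; the server may expect the position and residues in a different form.

```python
import re
from typing import Tuple

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def read_fasta_sequence(text: str) -> str:
    """Return the sequence of the first record in FASTA-formatted text."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    parts = []
    for ln in lines[1:]:
        if ln.startswith(">"):  # stop at the start of a second record
            break
        parts.append(ln)
    return "".join(parts).upper()


def parse_substitution(spec: str) -> Tuple[str, int, str]:
    """Parse an assumed 'K2R'-style string into (wild type, position, variant)."""
    m = re.fullmatch(r"([A-Za-z])(\d+)([A-Za-z])", spec.strip())
    if m is None:
        raise ValueError(f"cannot parse substitution: {spec!r}")
    wild, pos, variant = m.group(1).upper(), int(m.group(2)), m.group(3).upper()
    if wild not in AMINO_ACIDS or variant not in AMINO_ACIDS:
        raise ValueError(f"non-standard amino acid in {spec!r}")
    return wild, pos, variant


def check_substitution(sequence: str, spec: str) -> Tuple[str, int, str]:
    """Raise ValueError unless the wild-type residue matches the sequence."""
    wild, pos, variant = parse_substitution(spec)
    if not 1 <= pos <= len(sequence):
        raise ValueError(f"position {pos} is outside the sequence")
    if sequence[pos - 1] != wild:
        raise ValueError(
            f"sequence has {sequence[pos - 1]} at position {pos}, not {wild}"
        )
    return wild, pos, variant
```

A call such as `check_substitution(read_fasta_sequence(fasta_text), "K2R")` returns the parsed substitution or raises `ValueError` if the residue at position 2 is not K.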

Additional options:
• The structural database used can be either PDB (Protein Data Bank, a database of protein structures) or PQS (Protein Quaternary Structure, a database of protein structures in complexes). PDB is likely to be quicker, but PQS is the default because it may provide additional residue contact information.
• BLAST hits can be sorted by e-value (default) or sequence identity.
• Map to mismatch – By default, a hit is rejected if its amino acid at the corresponding position differs from the amino acid in the input sequence. Changing to ‘YES’ should be done with caution and only where structural homologues cannot be found.
• Structural parameters can be calculated for the first hit or all hits.
• Contacts can also be calculated for the first hit or all hits.
• Hits shorter than a minimum alignment length can be filtered out.
• Maximum gap length can be changed to filter out more distant homologues.
• The threshold for residue contacts can be changed from the default of 6 Å.
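
Two of the options above, the ‘map to mismatch’ filter and the residue contact threshold, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not PolyPhen's implementation: the function names and data layout are hypothetical, and treating two residues as in contact when any pair of their atoms lies within the threshold is an assumed definition.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]  # x, y, z in angstroms


def keep_hit(query_residue: str, hit_residue: str,
             map_to_mismatch: bool = False) -> bool:
    """Reject a hit whose aligned residue differs from the query residue,
    unless the (assumed) map_to_mismatch flag is switched on."""
    return map_to_mismatch or query_residue == hit_residue


def residues_in_contact(target_atoms: List[Coord],
                        other_residues: Dict[str, List[Coord]],
                        threshold: float = 6.0) -> List[str]:
    """Return the names of residues with at least one atom within
    `threshold` angstroms of any atom of the target residue."""
    contacts = []
    for name, atoms in other_residues.items():
        if any(math.dist(a, b) <= threshold
               for a in target_atoms for b in atoms):
            contacts.append(name)
    return contacts
```

Lowering `threshold` returns fewer contacting residues and raising it returns more, which is the trade-off the last option exposes.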