SIFT (Sorting Intolerant From Tolerant)



Reference: Kumar P., Henikoff S., Ng P.C. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature Protocols. (2009) 4 (7) 1073-1081.
Hosted: J. Craig Venter Institute, a world leader in genomic research. Funding is expected to be ongoing. (http://sift.jcvi.org/)


Summary:
SIFT is a popular web-based tool that uses sequence homology from multiple sequence alignments (MSAs) to predict whether amino acid substitutions will be tolerated or damaging to protein function.

Methodology:
• Amino acid distributions at each alignment column are combined with a probability matrix to calculate a probability for every possible substitution. Each probability is then normalised by the probability of the most frequent amino acid at that position.
• These normalised probabilities separate ‘tolerated’ substitutions (normalised probability > 0.05) from ‘damaging’ ones (normalised probability ≤ 0.05).
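The two steps above can be sketched in Python for a single alignment column. This is a simplified illustration, not SIFT's implementation: a uniform pseudocount (a hypothetical `pseudocount` parameter) stands in for the probability matrix, sequences are not weighted, and gap characters are ignored. Each probability is divided by the largest probability at that position, so the most likely amino acid scores 1.0, and the 0.05 cut-off from the text is then applied.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def normalised_probabilities(column, pseudocount=1.0):
    """Normalised probability of each amino acid at one alignment column.

    Simplification (assumption): a uniform pseudocount replaces SIFT's
    probability matrix, sequences are unweighted, and gaps are ignored.
    """
    counts = Counter(aa for aa in column.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    probs = {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
    top = max(probs.values())
    # Scale so the most probable amino acid at this position scores 1.0
    return {aa: p / top for aa, p in probs.items()}


def classify(score, threshold=0.05):
    """Apply the cut-off: above the threshold is tolerated, otherwise damaging."""
    return "tolerated" if score > threshold else "damaging"


scores = normalised_probabilities("L" * 30 + "I" * 10)
for aa in "LIP":
    print(aa, round(scores[aa], 3), classify(scores[aa]))
```

For a column of 30 leucines and 10 isoleucines, isoleucine scores 11/31 (about 0.35, tolerated), while proline, which is never observed, scores 1/31 (about 0.03, damaging).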

Version options:
• SIFT Human Genome DB - provides predictions for a list of chromosome positions and alleles.
• SIFT Human Protein DB - provides predictions for all Ensembl transcripts with an assigned ENSP number.
• SIFT dbSNP DB - provides predictions for all SNPs in NCBI’s dbSNP.
• SIFT Single Protein Tools* - provides predictions for a single protein of interest. (*Recommended, to minimise user error.)

Alignment options:
SIFT can generate an MSA from a subset of PSI-BLAST results, aiming to select homologues with similar functions. Users can also submit their own alignment, and carefully curated alignments are generally recommended.
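If you submit your own alignment, a quick sanity check beforehand can catch formatting mistakes. The sketch below assumes the alignment is in aligned FASTA format; which formats SIFT accepts, and any requirement on sequence order, are not stated here. `read_fasta_alignment` and `check_alignment` are hypothetical helpers, not part of SIFT.

```python
def read_fasta_alignment(text):
    """Parse aligned FASTA text into a dict of name -> sequence (input order kept)."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without a name")
            name = fields[0]  # first word of the header is the identifier
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def check_alignment(records):
    """Return the alignment length, or raise if sequences differ in length."""
    if not records:
        raise ValueError("alignment is empty")
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return lengths.pop()


aln = read_fasta_alignment(">query\nMKV-LT\n>homologue_1\nMKIALT\n")
print(check_alignment(aln))  # 6
```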

Other options:
For general diagnostic use, the default settings should be adequate.

Database
: If SIFT generates the MSA, the database (DB) it searches can be selected.
• SWISS-PROT - small but high quality. May not contain enough sequences for prediction.
• SWISS-PROT/TrEMBL - larger than SWISS-PROT and still of good quality.
• UniRef90 (default) - a clustered sequence DB derived from the UniProt Knowledgebase.
• NCBI non-redundant (nr) - the largest set, but can be of poor quality. Use it only if SIFT warns that there are not enough sequences to make a prediction with the other databases.
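The guidance above amounts to a simple fallback rule, sketched here. The preference order is an assumption: the text only fixes UniRef90 as the default and NCBI nr as the last resort, so SWISS-PROT/TrEMBL is placed in between and SWISS-PROT alone is left out as the smallest set. `choose_database` and the `enough_sequences` mapping are hypothetical bookkeeping, not a SIFT feature.

```python
# Assumed preference order: the default first, NCBI nr only as a last resort.
PREFERENCE = ("UniRef90", "SWISS-PROT/TrEMBL", "NCBI nr")


def choose_database(enough_sequences, preference=PREFERENCE):
    """Return the first database for which SIFT gave no 'not enough sequences' warning.

    `enough_sequences` maps database name -> True if a run against that
    database produced enough sequences. Returns None if none was adequate.
    """
    for db in preference:
        if enough_sequences.get(db, False):
            return db
    return None


print(choose_database({"UniRef90": False, "SWISS-PROT/TrEMBL": False, "NCBI nr": True}))
```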

Median conservation of sequences
: (default 3.00, recommended) During the PSI-BLAST iterations, this threshold limits how diverse the selected homologues can be.
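The median conservation statistic can be approximated as follows. This sketch assumes conservation is scored per column as information content in bits (log2 20 minus the Shannon entropy of the observed amino acids, giving a range of 0 to about 4.32), with gaps ignored and no sequence weighting or pseudocounts; SIFT's exact calculation may differ. It only computes the statistic for a finished alignment and does not reproduce how SIFT applies the threshold while selecting PSI-BLAST hits.

```python
import math
from collections import Counter
from statistics import median

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MAX_BITS = math.log2(20)  # about 4.32 bits: a single amino acid at the position


def column_information(column):
    """Information content (bits) of one alignment column, ignoring gaps."""
    residues = [aa for aa in column.upper() if aa in AMINO_ACIDS]
    if not residues:
        return 0.0  # assumption: an all-gap column carries no information
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
    return MAX_BITS - entropy


def median_conservation(sequences):
    """Median per-column information content of equal-length aligned sequences."""
    if not sequences or len({len(s) for s in sequences}) != 1:
        raise ValueError("need one or more aligned sequences of equal length")
    length = len(sequences[0])
    columns = ("".join(seq[i] for seq in sequences) for i in range(length))
    return median(column_information(col) for col in columns)


alignment = ["MKVLT", "MKILT", "MRVLS", "MKVIT"]
print(f"median conservation: {median_conservation(alignment):.2f} bits (default threshold 3.00)")
```

Under this measure, identical sequences give the maximum of about 4.32 bits, and more diverse alignments give lower medians.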

Remove sequences
: (default 90%, recommended) Removes homologues that are more than this percentage identical to the query, since such near-identical sequences may cause substitutions to be predicted as ‘tolerated’.
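The identity filter can be sketched as below. Percent identity has several definitions; this hypothetical version counts identical residues over aligned columns where neither sequence has a gap, and keeps homologues at or below the threshold (so sequences more than 90% identical are removed). SIFT's own identity calculation may differ.

```python
GAPS = set("-.")


def percent_identity(query, other):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(query) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(query.upper(), other.upper())
             if a not in GAPS and b not in GAPS]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def remove_similar(query, homologues, max_identity=90.0):
    """Drop homologues more than `max_identity` percent identical to the query."""
    return [seq for seq in homologues if percent_identity(query, seq) <= max_identity]


query = "ACDEFGHIKL"
homologues = ["ACDEFGHIKL", "ACDEFGHIKM", "ACDWYGHPKL"]
print(remove_similar(query, homologues))  # ['ACDEFGHIKM', 'ACDWYGHPKL']
```

One side effect of skipping gapped columns is that a short fragment matching the query exactly scores 100% and is removed, even though it covers only part of the protein.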