Annotation is the process of assigning functional categories to genes or gene products. In Blast2GO this assignment is done for each sequence based on the information available for the homologous sequences retrieved by Blast. Blast2GO annotation proceeds through a two-step strategy:
For the first step, the Blast results are parsed, and the identifiers of the Blast hits are used to query the Gene Ontology database and recover the associated functional terms. The evidence code of each annotation is recovered as well. Evidence codes indicate how a functional assignment in the Gene Ontology database was obtained. For example, the evidence code “inferred from direct assay” (IDA) indicates that the function was assigned to the gene on the basis of an experimental assay, so the annotation is of high value. If the evidence code is “electronic annotation” (IEA), the annotation was generated by automatic methods without human intervention and is therefore more prone to error.
Once all this information is gathered, an annotation score is computed for each {GO, Query Sequence} pair. The GO term is assigned to the query sequence if its annotation score is above a threshold provided by the user AND no child term has a sufficient annotation score. The annotation score is computed as:
Annotation score{GO, Seq} = (max.sim * ECw) + ((#GO - 1) * GOw)
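where:
max.sim is the highest similarity (in %) among the hits that support the GO term,
ECw is the weight given to the evidence code of that supporting annotation,
#GO is the number of GO terms found in the hits that map to the candidate term (the term itself or its child terms), and
GOw is the weight added for each such term beyond the first.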
EXAMPLE
Consider a query sequence with three hit sequences carrying the following GO terms:
Hit sequence 1: 60% similarity; one GO term: GO1 with Evidence Code = IDA
Hit sequence 2: 65% similarity; one GO term: GO2 with Evidence Code = ISS
Hit sequence 3: 67% similarity; one GO term: GO3 with Evidence Code = IEA
GO2 and GO3 are sibling terms with parent term GO4
Let us compute the Annotation Score (AS) and the annotation output in a number of scenarios:
Scenario 1: ECw(IDA) = 1, ECw(ISS) = 0.8, ECw(IEA) = 0.7; GOw = 0; threshold = 55
AS(GO1) = (60 * 1) + ((1-1) * 0) = 60 > 55 --> GO1 is transferred to the query sequence
AS(GO2) = (65 * 0.8) + ((1-1) * 0) = 52 < 55 --> GO2 is NOT transferred
AS(GO3) = (67 * 0.7) + ((1-1) * 0) ≈ 47 < 55 --> GO3 is NOT transferred
AS(GO4) = (67 * 0.7) + ((2-1) * 0) ≈ 47 < 55 --> GO4 is NOT transferred
Scenario 2: ECw(IDA) = 1, ECw(ISS) = 0.8, ECw(IEA) = 0.7; GOw = 5; threshold = 55
AS(GO1) = (60 * 1) + ((1-1) * 5) = 60 > 55 --> GO1 is transferred to the query sequence
AS(GO2) = (65 * 0.8) + ((1-1) * 5) = 52 < 55 --> GO2 is NOT transferred
AS(GO3) = (67 * 0.7) + ((1-1) * 5) ≈ 47 < 55 --> GO3 is NOT transferred
AS(GO4) = (67 * 0.7) + ((2-1) * 5) ≈ 52 < 55 --> GO4 is NOT transferred
Scenario 3: ECw(IDA) = 1, ECw(ISS) = 0.8, ECw(IEA) = 0.7; GOw = 5; threshold = 50
AS(GO1) = (60 * 1) + ((1-1) * 5) = 60 > 50 --> GO1 is transferred to the query sequence
AS(GO2) = (65 * 0.8) + ((1-1) * 5) = 52 > 50 --> GO2 is transferred to the query sequence
AS(GO3) = (67 * 0.7) + ((1-1) * 5) ≈ 47 < 50 --> GO3 is NOT transferred
AS(GO4) = (67 * 0.7) + ((2-1) * 5) ≈ 52 > 50 --> GO4 is NOT transferred (its child term GO2 is transferred)
Scenario 4: ECw = 1 for all evidence codes; GOw = 5; threshold = 55
AS(GO1) = (60 * 1) + ((1-1) * 5) = 60 > 55 --> GO1 is transferred to the query sequence
AS(GO2) = (65 * 1) + ((1-1) * 5) = 65 > 55 --> GO2 is transferred
AS(GO3) = (67 * 1) + ((1-1) * 5) = 67 > 55 --> GO3 is transferred
AS(GO4) = (67 * 1) + ((2-1) * 5) = 72 > 55 --> GO4 is NOT transferred (its child terms are transferred)
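The scoring rule and the child-term check from the example above can be reproduced with a short Python sketch. This is only an illustration of the formula as written in this section, not Blast2GO code: the names Candidate, annotation_score and annotate are hypothetical, the comparison against the threshold is assumed to be strict, and only direct child terms are checked, which is enough for the GO1 to GO4 example but would need extending to all descendants in a real GO graph.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """One candidate GO term for a query sequence (hypothetical helper type)."""
    term: str                       # GO term label, e.g. "GO1"
    max_sim: float                  # max.sim: similarity (%) of the supporting hit
    evidence: str                   # evidence code of that hit's annotation
    n_go: int                       # #GO in the formula
    children: Tuple[str, ...] = ()  # direct child terms among the candidates


def annotation_score(c: Candidate, ec_weights: Dict[str, float],
                     go_weight: float) -> float:
    """AS = (max.sim * ECw) + ((#GO - 1) * GOw)."""
    return c.max_sim * ec_weights[c.evidence] + (c.n_go - 1) * go_weight


def annotate(candidates: List[Candidate], ec_weights: Dict[str, float],
             go_weight: float, threshold: float) -> List[str]:
    """Return the terms whose score exceeds the threshold and that have
    no child term which also exceeds it."""
    scores = {c.term: annotation_score(c, ec_weights, go_weight)
              for c in candidates}
    passing = {term for term, score in scores.items() if score > threshold}
    return sorted(
        c.term for c in candidates
        if c.term in passing
        and not any(child in passing for child in c.children)
    )


# Data from the example: GO4 is the parent of GO2 and GO3.
CANDIDATES = [
    Candidate("GO1", 60, "IDA", 1),
    Candidate("GO2", 65, "ISS", 1),
    Candidate("GO3", 67, "IEA", 1),
    Candidate("GO4", 67, "IEA", 2, children=("GO2", "GO3")),
]
EC_WEIGHTS = {"IDA": 1.0, "ISS": 0.8, "IEA": 0.7}
ALL_ONE = {code: 1.0 for code in EC_WEIGHTS}

# GOw = 0, threshold = 55
print(annotate(CANDIDATES, EC_WEIGHTS, go_weight=0, threshold=55))  # ['GO1']
# GOw = 5, threshold = 50: GO4 passes but its child GO2 is transferred instead
print(annotate(CANDIDATES, EC_WEIGHTS, go_weight=5, threshold=50))  # ['GO1', 'GO2']
# All evidence codes weighted 1, GOw = 5, threshold = 55
print(annotate(CANDIDATES, ALL_ONE, go_weight=5, threshold=55))     # ['GO1', 'GO2', 'GO3']
```

Lowering the threshold or raising the evidence code weights lets more specific terms through, and the child check then suppresses the parent term GO4 even though its own score is sufficient.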
OTHER FINE-TUNING PARAMETERS
Additionally, a number of filters can be used to fine-tune the annotation:
Generate the GO functional annotation for 500 citrus genes and analyse the annotation results (takes some 3 to 5 minutes).