Annotation with Blast2GO

Introduction

Annotation is the process of assigning functional categories to genes or gene products. In Blast2GO this assignment is done for each sequence based on the information available for the homologous sequences retrieved by Blast. Blast2GO annotation proceeds through a two-step strategy:

  1. All GO terms for the Blast hit sequences are collected
  2. A selection of terms is made from this original pool to extract the most reliable annotation

For the first step, Blast results are parsed, and the identifiers of the Blast hits are extracted and used to query the Gene Ontology database to recover the associated functional terms. The evidence code of each annotation is also recovered. Evidence codes indicate how a functional assignment in the Gene Ontology database was obtained. For example, the evidence code “inferred from direct assay” (IDA) indicates that the function was assigned to the gene on the basis of an experimental assay; such an annotation is therefore of high value. The evidence code “inferred from electronic annotation” (IEA) means that the annotation was generated by automatic methods without human intervention and is therefore more prone to error.
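The collection step can be pictured with a minimal Python sketch. This is not Blast2GO's implementation: the dictionary `GO_DB` is a hypothetical stand-in for the Gene Ontology database, and the hit identifiers, GO terms, and the function name `collect_candidate_terms` are invented for illustration only.

```python
# Hypothetical stand-in for the Gene Ontology database:
# hit identifier -> list of (GO term, evidence code) annotations.
GO_DB = {
    "hit1": [("GO1", "IDA")],
    "hit2": [("GO2", "ISS")],
    "hit3": [("GO3", "IEA")],
}


def collect_candidate_terms(blast_hits, go_db):
    """Gather (GO term, evidence code, similarity) for every annotated hit.

    `blast_hits` is a list of (hit identifier, similarity in percent) pairs,
    as they might be parsed from a Blast result.
    """
    pool = []
    for hit_id, similarity in blast_hits:
        # Hits without any GO annotation contribute nothing to the pool.
        for go_term, evidence in go_db.get(hit_id, []):
            pool.append((go_term, evidence, similarity))
    return pool


# "hit4" has no entry in GO_DB, so it adds no candidate terms.
blast_hits = [("hit1", 60.0), ("hit2", 65.0), ("hit4", 70.0)]
pool = collect_candidate_terms(blast_hits, GO_DB)
print(pool)
```

The resulting pool keeps the evidence code and the hit similarity next to each GO term, which is the information the selection step needs.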

Once all this information is gathered, an annotation score is computed for each {GO, Query Sequence} pair, and the GO term is assigned to the query sequence if its annotation score is above a threshold provided by the user AND no child term has a sufficient annotation score. The annotation score is computed as:

                       Annotation score{GO, Seq} = (max.sim * ECw) + ((#GO - 1) * GOw)

where:

EXAMPLE

Consider a query sequence with three hit sequences that carry the following GO terms:

      Hit sequence 1: 60% similarity; one GO term: GO1 with Evidence Code = IDA
      Hit sequence 2: 65% similarity; one GO term: GO2 with Evidence Code = ISS
      Hit sequence 3: 67% similarity; one GO term: GO3 with Evidence Code = IEA
      GO2 and GO3 are sibling terms with parent term GO4

Let us compute the Annotation Score (AS) and the annotation output in a number of scenarios:

ECw (IDA = 1, ISS = 0.8, IEA = 0.7), GOw = 0, threshold = 55:

         AS(GO1) = (60 * 1) + ((1-1) * 0) = 60 > 55 --> GO1 is transferred to the query sequence
         AS(GO2) = (65 * 0.8) + ((1-1) * 0) = 52 < 55 --> GO2 is NOT transferred
         AS(GO3) = (67 * 0.7) + ((1-1) * 0) = 47 < 55 --> GO3 is NOT transferred
         AS(GO4) = (67 * 0.7) + ((2-1) * 0) = 47 < 55 --> GO4 is NOT transferred

ECw (IDA = 1, ISS = 0.8, IEA = 0.7), GOw = 5, threshold = 55:

         AS(GO1) = (60 * 1) + ((1-1) * 5) = 60 > 55 --> GO1 is transferred to the query sequence
         AS(GO2) = (65 * 0.8) + ((1-1) * 5) = 52 < 55 --> GO2 is NOT transferred
         AS(GO3) = (67 * 0.7) + ((1-1) * 5) = 47 < 55 --> GO3 is NOT transferred
         AS(GO4) = (67 * 0.7) + ((2-1) * 5) = 52 < 55 --> GO4 is NOT transferred

ECw (IDA = 1, ISS = 0.8, IEA = 0.7), GOw = 5, threshold = 50:

         AS(GO1) = (60 * 1) + ((1-1) * 5) = 60 > 50 --> GO1 is transferred to the query sequence
         AS(GO2) = (65 * 0.8) + ((1-1) * 5) = 52 > 50 --> GO2 is transferred to the query sequence
         AS(GO3) = (67 * 0.7) + ((1-1) * 5) = 47 < 50 --> GO3 is NOT transferred
         AS(GO4) = (67 * 0.7) + ((2-1) * 5) = 52 > 50 --> GO4 is NOT transferred (a child term, GO2, is transferred)

ECw (IDA = 1, ISS = 1, IEA = 1), GOw = 5, threshold = 55:

         AS(GO1) = (60 * 1) + ((1-1) * 5) = 60 > 55 --> GO1 is transferred to the query sequence
         AS(GO2) = (65 * 1) + ((1-1) * 5) = 65 > 55 --> GO2 is transferred
         AS(GO3) = (67 * 1) + ((1-1) * 5) = 67 > 55 --> GO3 is transferred
         AS(GO4) = (67 * 1) + ((2-1) * 5) = 72 > 55 --> GO4 is NOT transferred (child terms are transferred)
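
The scoring and selection rule can also be sketched in a few lines of Python. This is an illustrative reading of the example, not Blast2GO's own code, and it rests on assumptions: every name (`HitTerm`, `annotation_scores`, `select_annotations`) is hypothetical; each term is given at most one parent, although the real Gene Ontology is a graph; max.sim and ECw for a parent term are taken from its most similar supporting hit, as in the GO4 lines above; and #GO is counted as the number of distinct hit terms at or below the term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HitTerm:
    go: str            # GO term carried by a Blast hit
    similarity: float  # hit similarity in percent
    evidence: str      # evidence code of that annotation


def annotation_scores(hit_terms, parents, ec_weights, go_weight):
    """Score every hit term and every ancestor reachable through `parents`."""
    best = {}     # term -> (max.sim, ECw) of its most similar supporting hit
    support = {}  # term -> distinct hit terms at or below it (gives #GO)
    for hit in hit_terms:
        term = hit.go
        while term is not None:
            if term not in best or hit.similarity > best[term][0]:
                best[term] = (hit.similarity, ec_weights[hit.evidence])
            support.setdefault(term, set()).add(hit.go)
            term = parents.get(term)  # None once the top term is reached
    return {
        term: sim * ecw + (len(support[term]) - 1) * go_weight
        for term, (sim, ecw) in best.items()
    }


def select_annotations(scores, parents, threshold):
    """Keep terms scoring above the threshold that have no passing descendant."""
    passing = {term for term, score in scores.items() if score > threshold}
    shadowed = set()  # ancestors of passing terms
    for term in passing:
        ancestor = parents.get(term)
        while ancestor is not None:
            shadowed.add(ancestor)
            ancestor = parents.get(ancestor)
    return sorted(passing - shadowed)


hits = [
    HitTerm("GO1", 60.0, "IDA"),
    HitTerm("GO2", 65.0, "ISS"),
    HitTerm("GO3", 67.0, "IEA"),
]
parents = {"GO2": "GO4", "GO3": "GO4"}  # GO2 and GO3 share the parent GO4

# ECw: IDA = 1, ISS = 0.8, IEA = 0.7; GOw = 5; threshold = 50
scores = annotation_scores(hits, parents, {"IDA": 1.0, "ISS": 0.8, "IEA": 0.7}, 5)
print(select_annotations(scores, parents, threshold=50))  # ['GO1', 'GO2']
```

With all evidence code weights set to 1 and a threshold of 55, the same functions return GO1, GO2 and GO3, and GO4 is dropped because its child terms are already transferred.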

OTHER FINE-TUNING PARAMETERS

Additionally, a number of filters can be used to fine-tune the annotation:

Exercise

Generate the GO functional annotation for 500 citrus genes and analyse the annotation results (this takes about 3 to 5 minutes).

How to cite Blast2GO?