Differences

This shows you the differences between two versions of the page.

--- biclustering [2011/06/06 12:31]
aamadoz [Filter]
+++ biclustering [2017/05/24 10:36] (current)
@@ Line 1: / Line 1: @@
+====== Introduction ======
+Different genes have different expression levels according to their specific function under each condition. **Biclustering** identifies groups of **genes with similar expression patterns under a specific subset of conditions**. These conditions may correspond to different time points, for example in time series expression data.
+====== Input data ======
+The general structure of biclustering input data is a **numerical matrix of gene expression measurements**. Measurements of the same **gene** must be in the same **row**, while measurements under the same **experimental condition** (the same time point, for instance) must be in the same **column**.
+The **first column** of the data file must contain a **gene identifier**, for example the gene name, and the **first row** must contain a **time point identifier** for each column. Note that the word GENES must appear in the first column of the first row **without any comment symbol**.
+An example of a biclustering input data file with this structure looks like this:
+      GENES    condition1    condition2    condition3    condition4    condition5
+      YAL001C  -0.19         -0.77         -0.17         -0.19         0.13
+      YAL002W  0.83          -0.01         -0.77         -0.62         0.14
+      YAL003W  -0.36         -0.22         0.22          -0.28         0.41
+      YAL004W  1.64          1.14          0.88          -0.07         0.03
+      YAL005C  1.55          1.58          1.34          0.01          0.53
+      ...      ...           ...           ...           ...           ...
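+The layout described above can be checked before uploading a file. The following is a minimal sketch, not part of Babelomics: the function name ''parse_expression_matrix'' is hypothetical, and it assumes whitespace-delimited fields, so identifiers must not contain spaces.

```python
def parse_expression_matrix(text):
    """Parse a whitespace-delimited expression matrix whose header starts with GENES.

    Assumption: fields are separated by spaces or tabs. The delimiter the
    tool actually expects is not stated in this page.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    if header[0] != "GENES":
        raise ValueError("the first field of the first row must be GENES")
    conditions = header[1:]

    genes, matrix = [], []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            raise ValueError(
                f"row {fields[0]!r} has {len(fields) - 1} values, "
                f"expected {len(conditions)}"
            )
        genes.append(fields[0])
        matrix.append([float(v) for v in fields[1:]])
    return genes, conditions, matrix


example = """\
GENES    condition1    condition2    condition3    condition4    condition5
YAL001C  -0.19         -0.77         -0.17         -0.19         0.13
YAL002W  0.83          -0.01         -0.77         -0.62         0.14
"""
genes, conditions, matrix = parse_expression_matrix(example)
print(genes, len(conditions), matrix[1][0])
```

+A row with too few or too many values raises a ''ValueError'', which catches headers whose identifiers were accidentally split by a space.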
+====== Methods ======
+===== Params =====
+You can choose how the biclustering analysis should deal with missing values.
+The **type of missing values** list offers the following options:
+  * **Remove**: it deletes the complete gene row if the gene has a missing value.
+  * **Jump**: it deletes the complete gene row only if the missing value affects the bicluster. This is the default option.
+  * **Fill**: it assigns a value where a missing value is found. There are different methods to assign this new value:
+      * __Mean of genes__: the average of all the values of the same gene (row).
+      * __Mean of neighbors__: the average of the values of a selected number of neighbors.
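+The **Remove** option and the __Mean of genes__ fill can be illustrated with a short sketch. The function names are hypothetical, missing values are represented as ''None'' by assumption, and this is not the code Babelomics runs. __Mean of neighbors__ is left out because this page does not define how neighbors are chosen.

```python
def remove_rows_with_missing(matrix):
    """Remove: drop every gene row that contains at least one missing value."""
    return [row for row in matrix if all(v is not None for v in row)]


def fill_with_gene_mean(matrix):
    """Fill (mean of genes): replace each missing value by the mean of its row."""
    filled = []
    for row in matrix:
        known = [v for v in row if v is not None]
        if not known:
            raise ValueError("a row with no measured values cannot be filled")
        mean = sum(known) / len(known)
        filled.append([mean if v is None else v for v in row])
    return filled


# Assumption: None marks a missing measurement.
matrix = [
    [1.0, None, 3.0],
    [0.5, 0.5, 0.5],
]
print(remove_rows_with_missing(matrix))  # keeps only the complete row
print(fill_with_gene_mean(matrix))       # the gap is replaced by the row mean
```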
+You can also compute patterns of opposite sign with the **'compute sign changes'** option. By default, this option is not selected and the resulting patterns look like the following image:
+{{:images:biclustering:biclustering_signs_false.jpeg|compute_sign_changes_false}}
+Computing sign changes gives you patterns with both signs, as you can see in the following example:
+{{:images:biclustering:biclustering_signs_true.jpeg|compute_sign_changes_true}}
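+As a rough illustration of what an opposite-sign pattern is, the sketch below swaps the Up and Down symbols of a pattern string. It assumes patterns are written with the U/D/N coding used in the results view; the function name ''opposite_pattern'' is hypothetical and this is not the tool's implementation.

```python
# Translation table that swaps U (up) and D (down); N (no change) is left as is.
_OPPOSITE = str.maketrans("UD", "DU")


def opposite_pattern(pattern):
    """Return the pattern with every Up replaced by Down and vice versa."""
    return pattern.translate(_OPPOSITE)


print(opposite_pattern("UDNU"))  # DUND
```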
+===== Sorting =====
+Biclustering **results can be sorted by** different criteria:
+  * **Genes**: number of genes.
+  * **Time points**: number of conditions.
+  * **Size**: number of genes and number of conditions.
+  * **P-value**: bicluster p-value. This is the default option.
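+The sorting criteria can be sketched as follows. The record fields, the function name ''sort_biclusters'' and the sample values are hypothetical. Two further assumptions are made: **Size** is taken as genes multiplied by conditions, and larger biclusters come first while p-values are sorted in ascending order. The page does not state either point.

```python
def sort_biclusters(biclusters, criterion="p_value"):
    """Sort bicluster records by one of the four criteria listed above."""
    keys = {
        "genes": lambda b: b["genes"],
        "time_points": lambda b: b["conditions"],
        # Assumption: size = number of genes x number of conditions.
        "size": lambda b: b["genes"] * b["conditions"],
        "p_value": lambda b: b["p_value"],
    }
    # Assumption: smallest p-value first, largest count first otherwise.
    descending = criterion != "p_value"
    return sorted(biclusters, key=keys[criterion], reverse=descending)


# Made-up records, for illustration only.
biclusters = [
    {"id": "b1", "genes": 12, "conditions": 5, "p_value": 0.01},
    {"id": "b2", "genes": 30, "conditions": 3, "p_value": 0.04},
    {"id": "b3", "genes": 8, "conditions": 9, "p_value": 0.001},
]
print([b["id"] for b in sort_biclusters(biclusters)])          # ['b3', 'b1', 'b2']
print([b["id"] for b in sort_biclusters(biclusters, "size")])  # ['b2', 'b3', 'b1']
```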
+===== Filter =====
+Biclustering **results can be filtered by** different criteria:
+  * **Constant patterns**: it excludes patterns in which genes do not change over time or conditions.
+  * **Minimum number of genes**: it keeps only biclusters with at least a given number of genes.
+  * **Minimum number of time points**: it keeps only biclusters with at least a given number of conditions.
+  * **Maximum p-value**: it keeps only biclusters whose p-value does not exceed a given maximum.
+  * **Minimum overlapping percentage**: it keeps only biclusters with at least a given percentage of overlap between biclusters.
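+The first four filters can be sketched as simple predicates over bicluster records. The field names, the function name ''filter_biclusters'' and the sample records are hypothetical. A constant pattern is assumed to be one made only of N (no change) symbols. The overlap filter is omitted because the page does not specify how the overlap percentage is computed.

```python
def filter_biclusters(biclusters, min_genes=1, min_time_points=1,
                      max_p_value=1.0, drop_constant=False):
    """Keep the biclusters that pass every active filter."""
    kept = []
    for b in biclusters:
        # Assumption: a constant pattern contains only N symbols.
        if drop_constant and set(b["pattern"]) <= {"N"}:
            continue
        if b["genes"] < min_genes:
            continue
        if b["conditions"] < min_time_points:
            continue
        if b["p_value"] > max_p_value:
            continue
        kept.append(b)
    return kept


# Made-up records, for illustration only.
biclusters = [
    {"id": "b1", "genes": 12, "conditions": 5, "p_value": 0.01, "pattern": "UDNU"},
    {"id": "b2", "genes": 30, "conditions": 3, "p_value": 0.04, "pattern": "NN"},
    {"id": "b3", "genes": 8, "conditions": 9, "p_value": 0.2, "pattern": "UUDDNUDU"},
]
# At least 5 conditions and a p-value of at most 0.05.
selected = filter_biclusters(biclusters, min_time_points=5, max_p_value=0.05)
print([b["id"] for b in selected])  # ['b1']
```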
+====== Worked Examples ======
+===== Example 1. Cell cycle data =====
+  - Go to the Babelomics page and select Biclustering analysis from the //Expression// menu.
+  - Press //Online Examples// and select **example number 1**. The parameters and form fields are then filled in automatically. This example is prepared to perform a biclustering analysis with default parameters except for the filters. The selected filters are a minimum of 5 conditions and a maximum p-value of 0.05.
+  - Press run, and wait for your job to be finished.
+  - When the process finishes, a new //green job// is shown on the right side of the web page. Press it to check your results.
+**Results**
+The results are displayed as in the following image:
+{{:images:biclustering:biclustering_results.jpeg|biclustering_example_1_results}}
+The **Biclusters file** contains information about all results, including a numerical matrix of expression measurements for the selected genes in each bicluster.
+The **main box** of the visualization tool shows the biclusters obtained. From the **top menu** you can change how they are visualized:
+  * **Matrix**: a gradient color expression matrix. Option selected by default.
+{{:images:biclustering:matrix_view.jpeg|matrix_view}}
+  * **Expression**: a plot with gene expression values across all conditions.
+{{:images:biclustering:expression_view.jpeg|expression_view}}
+  * **Pattern**: shows the expression pattern at each step relative to the previous condition.
+{{:images:biclustering:pattern_view.jpeg|pattern_view}}
+  * **Trends**: shows the expression pattern.
+{{:images:biclustering:trend_view.jpeg|trend_view}}
+The **menu on the right** contains a section with **general information** about the biclustering analysis, a section with **information about the selected bicluster**, a section where you can **manage different filters** and a **visualization section**.
+The **pattern coding** is as follows:
+^U^D^N^
+|Up|Down|No change|
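+To make the coding concrete, the sketch below turns a row of expression values into a U/D/N string by comparing each condition with the previous one. The function name ''discretize'' and the threshold of 0.5 are assumptions chosen for illustration. The discretization actually used by the tool may differ.

```python
def discretize(values, threshold=0.5):
    """Encode the change between consecutive conditions as U, D or N.

    Assumption: a change of at least `threshold` counts as Up, a change of at
    most `-threshold` counts as Down, and anything in between is No change.
    """
    symbols = []
    for previous, current in zip(values, values[1:]):
        change = current - previous
        if change >= threshold:
            symbols.append("U")
        elif change <= -threshold:
            symbols.append("D")
        else:
            symbols.append("N")
    return "".join(symbols)


# Values of gene YAL002W from the input data example.
print(discretize([0.83, -0.01, -0.77, -0.62, 0.14]))  # DDNU
```

+A row with five conditions yields four symbols, one per transition between consecutive conditions.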
+**Questions**
+  * How many biclusters did you obtain?
+  * In these results, what is the minimum number of genes found in a bicluster? And the maximum?
+  * Select the pattern view in the top menu of the visualization box. Then, select the first bicluster (id:18191). How would you interpret this graph? What is your answer based on?
+  * Select the trend plot of this bicluster. Is it different from its pattern plot?
+  * In general, what are the differences between pattern plots and trend plots?
+Run the same example with different options from the Babelomics interface. Compare the results.
+===== Example 2. Heat stress data =====
+  - Go to the Babelomics page and select Biclustering analysis from the //Expression// menu.
+  - Press //Online Examples// and select **example number 2**. The parameters and form fields are then filled in automatically. This example is prepared to perform a biclustering analysis that removes missing values, computes sign changes and filters biclusters with a minimum of 5 conditions and a maximum p-value of 0.05.
+  - Press run, and wait for your job to be finished.
+  - When the process finishes, a new //green job// is shown on the right side of the web page. Press it to check your results.
+**Questions**
+  * How many biclusters did you obtain?
+  * In these results, what is the minimum number of genes found in a bicluster? And the maximum?
+  * Go to page 2 of the results and select the bicluster with id:85. Then, click on **'Bicluster info'** in the right menu. Are there any differences between derivative and trend patterns? What do those differences mean? In both patterns, how do the gene expression values behave?
+  * Send this bicluster to FatiGO analysis. Are its genes functionally enriched?
+  * Then, click on **'Filter with these bicluster's genes'** in the right menu. You will see a panel for entering the genes that will filter the biclustering results. Click on **'Show filter preview'**. In how many biclusters does your list of genes appear? What percentage of their genes is in your list? Modify your list of genes and preview your filter.
+  * Click on **'Filter'** to obtain your filtered biclusters. If you want to manage and navigate through different filters, click on **'Manage filters'** in the **'Filters'** section of the right menu.
+====== References ======
+  * Madeira SC, Teixeira MC, Oliveira AL, Sá-Correia I (2010) **Identification of Regulatory Modules in Time Series Gene Expression Data using a Linear Time Biclustering Algorithm**. [[http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.34|IEEE/ACM Transactions on Computational Biology and Bioinformatics]] 7(1):153-165. {{:example_data:biclustering:references:madeira_et_al_2010.pdf|Pdf}}
+  * Madeira SC, Oliveira AL (2009) **A polynomial time biclustering algorithm for finding approximate expression patterns in gene expression time series**. [[http://www.ncbi.nlm.nih.gov/pubmed/19497096|Algorithms for molecular biology]] 4:8.
+  * Madeira SC, Oliveira AL (2004) **Biclustering algorithms for biological data analysis: a survey**. [[http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1324618|IEEE/ACM Transactions on Computational Biology and Bioinformatics]] 1(1):24-45. {{:example_data:biclustering:references:madeira_oliveira_2004.pdf|Pdf}}
+  * Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D, Brown PO (2000) **Genomic Expression Programs in the Response of Yeast Cells to Environmental Changes**. [[http://www.ncbi.nlm.nih.gov/pubmed/11102521|Molecular Biology of the Cell]] 11:4241-4257.
+  * Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM (1999) **Systematic Determination of Genetic Network Architecture**. [[http://www.nature.com/ng/journal/v22/n3/abs/ng0799_281.html|Nature Genetics]] 22:281-285.