biclustering [2011/06/06 12:31] aamadoz [Filter] |
biclustering [2017/05/24 10:36] (current) |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== Introduction ====== | ||
+ | Genes show different expression levels depending on their specific function under each condition. **Biclustering** identifies groups of **genes with similar expression patterns under a specific subset of conditions**. These conditions may correspond to different time points, for example in time series expression data. | ||
+ | |||
+ | ====== Input data ====== | ||
+ | |||
+ | The general structure of biclustering input data is a **numerical matrix of gene expression measurements**. Measurements of the same **gene** must be in the same **row**, while measurements under the same **experimental condition** (the same time point, for instance) must be in the same **column**. | ||
+ | |||
+ | The **first column** of the data file must contain a **gene identifier**, for example the gene name, and the **first row** must contain a **time point identifier** for each column. Note that the word GENES must appear in the first column of the first row **without any comment symbol**. | ||
+ | |||
+ | An example of a biclustering input data file with the above-mentioned structure would look like this: | ||
+ | |||
+ | GENES condition1 condition2 condition3 condition4 condition5 | ||
+ | YAL001C -0.19 -0.77 -0.17 -0.19 0.13 | ||
+ | YAL002W 0.83 -0.01 -0.77 -0.62 0.14 | ||
+ | YAL003W -0.36 -0.22 0.22 -0.28 0.41 | ||
+ | YAL004W 1.64 1.14 0.88 -0.07 0.03 | ||
+ | YAL005C 1.55 1.58 1.34 0.01 0.53 | ||
+ | ... ... ... ... ... ... | ||
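The layout described above can be checked before uploading a file. The following is only a minimal sketch, not part of Babelomics: the function name ''read_expression_matrix'' is hypothetical, fields are assumed to be separated by whitespace or tabs, and missing values are assumed to be written as ''NA'' (the page does not state how they are encoded).

```python
def read_expression_matrix(text, missing_token="NA"):
    """Parse an expression matrix whose header starts with the word GENES.

    Assumptions (not confirmed by the tool's documentation): fields are
    separated by whitespace/tabs and missing values are written as `NA`.
    Returns (conditions, genes, matrix); missing values become None.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    if header[0] != "GENES":
        raise ValueError("the first field of the first row must be the word GENES")
    conditions = header[1:]

    genes, matrix = [], []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            raise ValueError(f"row {fields[0]!r} has {len(fields) - 1} values, "
                             f"expected {len(conditions)}")
        genes.append(fields[0])
        matrix.append([None if f == missing_token else float(f)
                       for f in fields[1:]])
    return conditions, genes, matrix


example = (
    "GENES\tcondition1\tcondition2\tcondition3\n"
    "YAL001C\t-0.19\t-0.77\t-0.17\n"
    "YAL002W\t0.83\tNA\t-0.77\n"
)
conditions, genes, matrix = read_expression_matrix(example)
```

Each row of ''matrix'' corresponds to one gene and each column to one condition, matching the structure described above.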
+ | |||
+ | ====== Methods ====== | ||
+ | |||
+ | ===== Params ===== | ||
+ | |||
+ | You can choose how the biclustering analysis should deal with missing values. | ||
+ | |||
+ | The **type of missing values** list offers the following options: | ||
+ | * **Remove**: it deletes the complete gene row if the gene has a missing value. | ||
+ | * **Jump**: it deletes the complete gene row only if the missing value affects the bicluster. Selected by default. | ||
+ | * **Fill**: it assigns a value wherever a missing value is found. There are different methods to compute this new value: | ||
+ | * __Mean of genes__: the average of all the values of the same gene (row). | ||
+ | * __Mean of neighbors__: the average of the values of a selected number of neighbors. | ||
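To make the difference between **Remove** and **Fill** (mean of genes) concrete, here is a small sketch. It is not the tool's implementation: the function names are hypothetical, and missing values are assumed to be represented as ''None'' in a list-of-rows matrix.

```python
def remove_rows_with_missing(genes, matrix):
    """'Remove': drop every gene row that contains at least one missing value."""
    kept = [(g, row) for g, row in zip(genes, matrix) if None not in row]
    return [g for g, _ in kept], [row for _, row in kept]


def fill_with_gene_mean(matrix):
    """'Fill' with 'Mean of genes': replace each missing value with the
    average of the observed values in the same row (gene)."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        if not observed:
            # No observed value to average: leave the row unchanged.
            filled.append(list(row))
            continue
        mean = sum(observed) / len(observed)
        filled.append([mean if v is None else v for v in row])
    return filled


genes = ["YAL001C", "YAL002W"]
matrix = [[1.0, None, 3.0], [0.5, 0.5, 0.5]]

kept_genes, kept_rows = remove_rows_with_missing(genes, matrix)
filled_rows = fill_with_gene_mean(matrix)
```

With **Remove**, only ''YAL002W'' survives; with **Fill**, the missing value of ''YAL001C'' becomes 2.0, the mean of 1.0 and 3.0.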
+ | |||
+ | You can also compute patterns of opposite sign with the **'compute sign changes'** option. By default, this option is not selected and the resulting patterns look like the following image: | ||
+ | |||
+ | {{:images:biclustering:biclustering_signs_false.jpeg|compute_sign_changes_false}} | ||
+ | |||
+ | Computing sign changes gives you patterns with both signs, as you can see in the following example: | ||
+ | |||
+ | {{:images:biclustering:biclustering_signs_true.jpeg|compute_sign_changes_true}} | ||
+ | |||
+ | ===== Sorting ===== | ||
+ | |||
+ | Biclustering **results can be sorted by** different criteria: | ||
+ | * **Genes**: number of genes. | ||
+ | * **Time points**: number of conditions. | ||
+ | * **Size**: number of genes and number of conditions. | ||
+ | * **P-value**: bicluster p-value. Selected by default. | ||
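The sorting criteria above can be sketched as key functions over a list of biclusters. Everything here is illustrative: the ''Bicluster'' class is hypothetical, **Size** is assumed to mean genes × conditions, and the sort directions (largest first for counts, smallest first for p-values) are assumptions, since the page does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Bicluster:
    """Hypothetical container for one biclustering result."""
    id: int
    genes: list
    conditions: list
    p_value: float


# Assumed orderings: larger counts first, smaller p-values first.
SORT_KEYS = {
    "genes": lambda b: -len(b.genes),
    "time points": lambda b: -len(b.conditions),
    # Assumption: 'size' is the number of cells, genes x conditions.
    "size": lambda b: -(len(b.genes) * len(b.conditions)),
    "p-value": lambda b: b.p_value,
}


def sort_biclusters(biclusters, criterion="p-value"):
    """Return a new list sorted by one of the criteria listed above."""
    return sorted(biclusters, key=SORT_KEYS[criterion])


results = [
    Bicluster(1, ["g1", "g2", "g3"], ["c1", "c2"], 0.04),
    Bicluster(2, ["g1", "g2"], ["c1", "c2", "c3", "c4"], 0.001),
    Bicluster(3, ["g1"], ["c1"], 0.2),
]
by_p_value = [b.id for b in sort_biclusters(results)]
by_genes = [b.id for b in sort_biclusters(results, "genes")]
```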
+ | |||
+ | ===== Filter ===== | ||
+ | |||
+ | Biclustering **results can be filtered by** different criteria: | ||
+ | |||
+ | * **Constant patterns**: it excludes patterns in which gene expression does not change over time or across conditions. | ||
+ | * **Minimum number of genes**: it keeps only biclusters with at least the given number of genes. | ||
+ | * **Minimum number of time points**: it keeps only biclusters with at least the given number of conditions. | ||
+ | * **Maximum p-value**: it keeps only biclusters whose p-value does not exceed the given maximum. | ||
+ | * **Minimum overlapping percentage**: it keeps only biclusters with at least the given percentage of overlap with other biclusters. | ||
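As a rough illustration of how the first four filters combine, the sketch below applies them to a list of results. The ''Bicluster'' class, the function names and the use of a U/D/N pattern string to detect constant patterns are all assumptions made for this example. The overlap filter is left out because the page does not define how overlap is measured.

```python
from dataclasses import dataclass


@dataclass
class Bicluster:
    """Hypothetical container for one biclustering result."""
    id: int
    genes: list
    conditions: list
    p_value: float
    pattern: str  # assumed to be a string of U/D/N symbols


def is_constant(pattern):
    """A pattern is treated as constant when every step is 'N' (no change)."""
    return all(symbol == "N" for symbol in pattern)


def filter_biclusters(biclusters, min_genes=1, min_time_points=1,
                      max_p_value=1.0, drop_constant=False):
    """Keep only the biclusters that pass every selected filter."""
    return [
        b for b in biclusters
        if len(b.genes) >= min_genes
        and len(b.conditions) >= min_time_points
        and b.p_value <= max_p_value
        and not (drop_constant and is_constant(b.pattern))
    ]


three_genes = ["g1", "g2", "g3"]
results = [
    Bicluster(1, three_genes, ["c1", "c2", "c3", "c4", "c5"], 0.01, "UDNU"),
    Bicluster(2, three_genes, ["c1", "c2", "c3", "c4"], 0.01, "UDN"),
    Bicluster(3, three_genes, ["c1", "c2", "c3", "c4", "c5"], 0.2, "UUUU"),
    Bicluster(4, three_genes, ["c1", "c2", "c3", "c4", "c5"], 0.01, "NNNN"),
]
kept = filter_biclusters(results, min_time_points=5, max_p_value=0.05)
kept_non_constant = filter_biclusters(results, min_time_points=5,
                                      max_p_value=0.05, drop_constant=True)
```

The call with ''min_time_points=5'' and ''max_p_value=0.05'' mirrors the filter settings used in the worked examples on this page.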
+ | |||
+ | |||
+ | ====== Worked Examples ====== | ||
+ | |||
+ | ===== Example 1. Cell cycle data ===== | ||
+ | |||
+ | - Go to the Babelomics page and select Biclustering analysis from the //Expression// menu. | ||
+ | - Press //Online Examples// and select **example number 1**; the parameters and form fields are then filled in automatically. As you can see, this example performs a biclustering analysis with default parameters except for the filters. The selected filters are a minimum of 5 conditions and a maximum p-value of 0.05. | ||
+ | - Press run and wait for your job to finish. | ||
+ | - When the process finishes, a new //green job// is shown on the right side of the web page. Press it to check your results. | ||
+ | |||
+ | **Results** | ||
+ | |||
+ | The results are visualized as in the following image: | ||
+ | |||
+ | {{:images:biclustering:biclustering_results.jpeg|biclustering_example_1_results}} | ||
+ | |||
+ | The **Biclusters file** contains information about all results, including a numerical matrix of expression measurements corresponding to the selected genes in each bicluster. | ||
+ | |||
+ | In the **main box** of the visualization tool, you will see the biclusters obtained. From the **top menu** you can change how they are visualized: | ||
+ | * **Matrix**: a gradient color expression matrix. Selected by default. | ||
+ | |||
+ | {{:images:biclustering:matrix_view.jpeg|matrix_view}} | ||
+ | |||
+ | * **Expression**: a plot with gene expression values across all conditions. | ||
+ | |||
+ | {{:images:biclustering:expression_view.jpeg|expression_view}} | ||
+ | |||
+ | * **Pattern**: shows the expression pattern relative, at each step, to the previous condition. | ||
+ | |||
+ | {{:images:biclustering:pattern_view.jpeg|pattern_view}} | ||
+ | |||
+ | * **Trends**: shows the expression pattern. | ||
+ | |||
+ | {{:images:biclustering:trend_view.jpeg|trend_view}} | ||
+ | |||
+ | The **menu on the right** contains a section with **general information** about the biclustering analysis, a section with **information on a selected bicluster**, a section where you can **manage different filters** and a **visualization section**. | ||
+ | |||
+ | **Pattern coding** is as follows: | ||
+ | |||
+ | ^U^D^N^ | ||
+ | |Up|Down|No change| | ||
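The U/D/N coding can be illustrated by comparing each condition with the previous one. This is only a sketch under stated assumptions: the function names and the ''tolerance'' parameter are hypothetical, and the tool's actual discretization of expression values may differ. The ''opposite_pattern'' helper shows one plausible reading of a pattern "of opposite sign" (U and D swapped), which the page does not define precisely.

```python
def encode_pattern(values, tolerance=0.0):
    """Encode a row of expression values as U (up), D (down) or N (no change),
    comparing each condition with the previous one.

    `tolerance` is a hypothetical threshold: absolute changes that do not
    exceed it are coded as N.
    """
    symbols = []
    for previous, current in zip(values, values[1:]):
        change = current - previous
        if change > tolerance:
            symbols.append("U")
        elif change < -tolerance:
            symbols.append("D")
        else:
            symbols.append("N")
    return "".join(symbols)


def opposite_pattern(pattern):
    """Swap U and D, leaving N untouched (assumed meaning of 'opposite sign')."""
    return pattern.translate(str.maketrans("UD", "DU"))


# Row YAL004W from the input data example above.
row = [1.64, 1.14, 0.88, -0.07, 0.03]
strict = encode_pattern(row)                  # every change counts
tolerant = encode_pattern(row, tolerance=0.2) # small changes coded as N
mirrored = opposite_pattern(strict)
```

A row with five conditions yields a pattern of four symbols, one per step between consecutive conditions.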
+ | |||
+ | **Questions** | ||
+ | |||
+ | * How many biclusters did you obtain? | ||
+ | * In these results, what is the minimum number of genes that you find in a bicluster? And the maximum number of genes? | ||
+ | * Select the pattern view at the top menu of the visualization box. Then, select the first bicluster (id:18191). How would you interpret this graph? What is your answer based on? | ||
+ | * Select the trend plot of this bicluster. Is it different from its pattern plot? | ||
+ | * In general, what are the differences between pattern plots and trend plots? | ||
+ | |||
+ | Run the same example with different options from the Babelomics interface. Compare the results. | ||
+ | ===== Example 2. Heat stress data ===== | ||
+ | |||
+ | - Go to the Babelomics page and select Biclustering analysis from the //Expression// menu. | ||
+ | - Press //Online Examples// and select **example number 2**; the parameters and form fields are then filled in automatically. As you can see, this example performs a biclustering analysis that removes missing values, computes sign changes and filters biclusters with a minimum of 5 conditions and a maximum p-value of 0.05. | ||
+ | - Press run and wait for your job to finish. | ||
+ | - When the process finishes, a new //green job// is shown on the right side of the web page. Press it to check your results. | ||
+ | |||
+ | **Questions** | ||
+ | |||
+ | * How many biclusters did you obtain? | ||
+ | * In these results, what is the minimum number of genes that you find in a bicluster? And the maximum number of genes? | ||
+ | * Go to page 2 of the results and select the bicluster with id:85. Then, click on **'Bicluster info'** in the right menu. Are there any differences between the derivative and trend patterns? What do those differences mean? In both patterns, how do the gene expression values behave? | ||
+ | * Send this bicluster to FatiGO analysis. Are its genes functionally enriched? | ||
+ | * Then, click on **'Filter with these bicluster's genes'** in the right menu. You will see a panel where you can enter the genes used to filter the biclustering results. Click on **'Show filter preview'**. In how many biclusters does your list of genes appear? What percentage of genes do you have in your list? Modify your list of genes and preview your filter. | ||
+ | * Click on **'Filter'** to obtain your filtered biclusters. If you want to manage and navigate through different filters, click on **'Manage filters'** in the **'Filters'** section of the right menu. | ||
+ | |||
+ | ====== References ====== | ||
+ | |||
+ | * Sara C. Madeira, Miguel C. Teixeira, Arlindo L. Oliveira, Isabel Sá-Correia (2010) **Identification of Regulatory Modules in Time Series Gene Expression Data using a Linear Time Biclustering Algorithm**. [[http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.34|IEEE/ACM Transactions on Computational Biology and Bioinformatics]] 7(1), pp. 153-165. {{:example_data:biclustering:references:madeira_et_al_2010.pdf|Pdf}} | ||
+ | * Madeira SC, Oliveira AL (2009) **A polynomial time biclustering algorithm for finding approximate expression patterns in gene expression time series**. [[http://www.ncbi.nlm.nih.gov/pubmed/19497096|Algorithms for molecular biology]] 4:8. | ||
+ | * Madeira SC, Oliveira AL (2004) **Biclustering algorithms for biological data analysis: a survey**. [[http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1324618|IEEE/ACM Transactions on Computational Biology and Bioinformatics]] 1(1):24-45. {{:example_data:biclustering:references:madeira_oliveira_2004.pdf|Pdf}} | ||
+ | * A.P. Gasch, P.T. Spellman, C.M. Kao, O. Carmel-Harel, M.B. Eisen, G. Storz, D. Botstein, and P.O. Brown (2000) **Genomic Expression Programs in the Response of Yeast Cells to Environmental Changes**. [[http://www.ncbi.nlm.nih.gov/pubmed/11102521|Molecular Biology of the Cell]] vol. 11, pp. 4241-4257. | ||
+ | * S. Tavazoie, J.D. Hughes, M.J. Campbell, R.J. Cho and G.M. Church (1999) **Systematic Determination of Genetic Network Architecture**. [[http://www.nature.com/ng/journal/v22/n3/abs/ng0799_281.html|Nature Genetics]] vol. 22, pp. 281-285. |