--- differential_expression_example [2010/09/30 15:44]
mbleda
+++ differential_expression_example [2017/05/24 10:36] (current)
@@ Line 1: / Line 1: @@
+====== Differential Expression Analysis ======
+In this example we are going to perform a complete differential expression analysis with [[http://babelomics.bioinfo.cipf.es|Babelomics]].
+**<fc #800000><fs medium>__1. Get an experimental DataSet__</fs></fc>**((For further information on How to get an experimental DataSet go [[data_downloading|here]].))
+We are interested in analysing the differences between cases and controls in **Pulmonary sarcoidosis**:
+    * Go to the [[http://www.ncbi.nlm.nih.gov/geo/|GEO website]] and download the raw data of the experiment [[http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16538|GSE16538]].
+    * Take a look at the information provided in the experiment's description: summary, sample description, authors, PubMed ID. Focus your attention on information concerning the microarray chip and the samples. We can see that there are 12 samples (6 cases and 6 controls) and that the platform used is **Affymetrix Human Genome U133 Plus 2.0 Array**.
+\\
+{{ :images:general:geo3.png |GEO Experiment}}
+\\
+    * The downloaded file "**GSE16538_RAW.tar**" contains the **CEL files** needed to continue with our study. If you are interested, you can extract the archive; inside you will find 12 compressed files, each corresponding to a single sample. Here you have the {{:example_data:pulmonary_sarcoidosis:sarcoidosis_gse16538_sampleinfo.txt|sample information for GSE16538 DataSet}}. \\ \\
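If you want to check the archive before uploading it, the following minimal sketch lists the CEL files it contains. It uses only Python's standard `tarfile` module; the function name `list_cel_members` is hypothetical, and the assumption that the per-sample files end in `.cel` or `.cel.gz` (case-insensitive) is ours, not something stated by GEO or Babelomics.

```python
import tarfile


def list_cel_members(tar_path):
    """Return the sorted names of CEL files stored in a tar archive.

    Assumption: per-sample files end in '.cel' or '.cel.gz' (any case).
    """
    with tarfile.open(tar_path) as archive:
        return sorted(
            name
            for name in archive.getnames()
            if name.lower().endswith((".cel", ".cel.gz"))
        )
```

Calling `list_cel_members("GSE16538_RAW.tar")` on the downloaded file should return one entry per sample, so for this experiment you would expect 12 names.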
+**<fc #800000>__<fs medium>2. Upload your data</fs>__</fc>**((For further information on How to upload data go [[data_upload|here]].))
+   * Go to //**Upload data**// and upload the GSE16538_RAW.tar file that you have just downloaded. \\ \\
+**<fc #800000>__<fs medium>3. Data Preprocessing</fs>__</fc>** \\
+**<fs medium>Normalization</fs>**((For further information about Normalization go [[microarray_normalization|here]].)) \\ The purpose of normalization is to adjust the data for unwanted imbalances, drifts or biases so that samples can be compared with one another. Preprocessing includes three steps: //background correction//, //between-array normalization// and //reporter summarization//.
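To give an idea of what between-array normalization does, here is a minimal sketch of quantile normalization, the between-array step used by RMA. It is an illustration only, not the code Babelomics runs: the function name `quantile_normalize` is hypothetical, ties are broken by position for simplicity, and background correction and reporter summarization are not shown.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equally long sample columns.

    Each value is replaced by the mean of the values that share its rank
    across all samples, so every sample ends up with the same distribution.
    Simplification: tied values are ranked by their position.
    """
    n_rows = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the i-th smallest value over all samples.
    rank_means = [
        sum(col[i] for col in sorted_cols) / len(columns) for i in range(n_rows)
    ]
    normalized = []
    for col in columns:
        # Row indices of this sample ordered from lowest to highest value.
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


samples = [[5, 2, 3], [4, 1, 6]]  # two toy samples, three probes each
print(quantile_normalize(samples))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After this step both toy samples contain exactly the same set of values (1.5, 3.5, 5.5), which is why the boxes in a box-plot of normalized data are expected to look alike.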
+   * Since our DataSet uses a single-color Affymetrix platform, go to //**Processing data**// > //**Normalize**// > //**Expression**// > //**One-channel**// > //**Affymetrix**//.
+      - Browse in the uploaded data and select the raw data of our DataSet (the one you have just uploaded).
+      - Tick **RMA**(([[rma|RMA Normalization]])) as the analysis method.
+      - Insert a **job name** (we will call it '//rma_sarcoidosis//') and **Run**.
+      - When the process has finished, go to //**Job list**// in the right menu and click on the job ('//rma_sarcoidosis//' in our case). \\ There you can see a summary of the **Normalization process**. \\ Focus your attention now on the plots provided: \\ \\  - **__Box-plot__**: each box represents the distribution of intensity measurements of one array. You would expect **all of them to show the same shape**. If you are normalizing two-color arrays you would also expect all of the boxes to be centered at zero. \\ \\  {{ :images:differential_expression_example:box_plot.png |Box plot}} \\  As expected, all samples show a high resemblance in shape and also share a similar median value (around 6.5). \\ However, it may happen that we have a **problematic sample**. We can identify it because the boxes of low-quality arrays are noticeably elevated or more spread out relative to the other arrays. Click {{|here}} to see an example of box-plot with a problematic sample. \\ \\ - **__MA plot__**: there is one MA plot per sample. It represents the normalized intensity distribution of each sample against a consensus mean sample. A LOESS line fitting the trend between M and A values is drawn in red. After normalization you expect no trend in the LOESS line, that is, you expect it to be as close as possible to the horizontal 0 axis. \\ \\ {{ :images:differential_expression_example:ma_plot.png |MA Plot}} \\ All samples seem to be correct, so we can go on with the preprocessing. \\ \\
+      - Go to //**Data list**// in the right menu. You will see a file called '//rma.summary//'. This file contains the **expression data matrix** generated after the normalization. You can download it(([[data_administration|How to download a file?]])) if you want to see its format. Here you have an example: \\ \\  {{ :images:differential_expression_example:datamatrix.png?800 |Data matrix}} \\ \\
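The M and A values behind an MA plot can be computed directly from the expression data matrix. The sketch below assumes log2-scale intensities stored as one list per sample and takes the per-probe mean across all samples as the consensus reference; the function name `ma_values` and this choice of reference are assumptions for illustration, and the LOESS fit drawn by Babelomics is not reproduced here.

```python
def ma_values(columns, sample_index):
    """Return (M, A) pairs for one sample against the mean of all samples.

    columns: list of samples, each a list of log2 intensities (same length).
    M is the difference to the reference, A the average with the reference.
    """
    n_samples = len(columns)
    sample = columns[sample_index]
    pairs = []
    for row, value in enumerate(sample):
        # Consensus reference: mean intensity of this probe over all samples.
        reference = sum(col[row] for col in columns) / n_samples
        pairs.append((value - reference, (value + reference) / 2))
    return pairs


matrix = [[6.0, 8.0], [8.0, 8.0]]  # two toy samples, two probes
print(ma_values(matrix, 0))  # [(-1.0, 6.5), (0.0, 8.0)]
```

For a well-normalized array the M values scatter around zero over the whole range of A, which is what the flat red line in the plot summarizes.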
+**<fs medium>Edit data</fs>** \\ In this step we assign to each sample its corresponding experimental conditions or parameters. This is an essential feature of the relationship between the assay and the sample data.
+   * Go to //**Processing data**// > //**Edit**// and create a **new variable**.((For further information about data edition go [[edit_data|here]].))
+      - Browse in //Job generated// and select the expression data matrix created in '//rma_sarcoidosis//'.
+      - Create a **Variable** called 'PHENOTYPE' with values 'NORMAL' and 'SARCOIDOSIS'.
+      - Assign the corresponding **value to each sample** according to the information provided in the experiment and //Submit//. At the end you will have something like this: \\ \\ {{ :images:edit_data:assingn_variables.png |Edit data}} \\ \\
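Conceptually, the variable created above is just a mapping from sample name to class label. The following sketch builds such a mapping and counts the samples per class, which is a quick way to confirm the expected 6 versus 6 design. The function name `assign_phenotype` and the sample names in the example are hypothetical placeholders, not the real GSE16538 identifiers.

```python
from collections import Counter


def assign_phenotype(sample_names, case_names):
    """Map each sample to 'SARCOIDOSIS' (case) or 'NORMAL' (control)."""
    cases = set(case_names)
    unknown = cases - set(sample_names)
    if unknown:
        # A case name that is not a column of the matrix is probably a typo.
        raise ValueError(f"unknown sample names: {sorted(unknown)}")
    return {
        name: "SARCOIDOSIS" if name in cases else "NORMAL"
        for name in sample_names
    }


# Hypothetical sample names, for illustration only.
labels = assign_phenotype(["s1", "s2", "s3", "s4"], ["s3", "s4"])
print(Counter(labels.values()))  # Counter({'NORMAL': 2, 'SARCOIDOSIS': 2})
```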
+**<fc #800000>__<fs medium>4. Differential Expression Analysis</fs>__</fc>** ((For further information about Differential Expression go [[differential_expression|here]].)) \\ \\
+Our objective is to analyze differential gene expression between a sarcoidotic and a normal lung, so we are going to carry out a **two-class comparison**. For further information about class comparison go [[class_comparison|here]]. \\
+   * Go to //**Expression**// > //**Differential expression**// > //**Class comparison**//.
+      - Browse in //Job generated// and select the expression data matrix created in '//rma_sarcoidosis//'.
+      - Select the **class name** '//PHENOTYPE//'. Automatically, you will see two drop-down menus with the values to compare. For our case, choose '//NORMAL//' in one of them and '//SARCOIDOSIS//' in the other. \\ \\ {{ :images:differential_expression_example:class_to_analyse.png |Select class to analyse}} \\ \\
+      - Set **Limma**(([[class_comparison|T-test]])) as the test to apply.
+      - Set also **Benjamini and Hochberg (BH), FDR** as the multiple-test correction.
+      - Set the adjusted p-value threshold to **0.05**(([[adjustedpvalues|Adjusted p-value]])).
+      - **Submit** and, when the job has finished, take a look at the results. \\ \\ {{ :images:differential_expression_example:heatmap.png |Heatmap}} \\ \\ What you can see after the analysis is a **heatmap** with the intensities of the genes that are significantly differentially expressed. Each <fc #008000>**gene**</fc> is represented in a row and each <fc #808000>**condition or array**</fc> is represented in a column. High gene expression measurements are shown in <fc #FF0000>**red**</fc> while lower measurements are shown in <fc #0000FF>**blue**</fc>. Genes are sorted according to their expression patterns, in the same order as in the output file. Experimental conditions or arrays are ordered by their labels. \\ \\  {{ :images:differential_expression_example:de_files.png |Files to download}} \\ \\ The red square in the image above shows the number of significantly differentially expressed genes. Marked in green are the files you can download by clicking on them: \\ \\ - <fc #008000>**1.**</fc> //__Limma output file__//: this file contains a table with the statistic value, p-value, and adjusted p-value for each of the genes in the array. \\ - <fc #008000>**2.**</fc> //__Significative values dataset__//: an **expression data matrix**(([[babelomics_expression_data|Expression Data Matrix]])) containing only the 500 most significant differentially expressed genes. \\ - <fc #008000>**3.**</fc> //__Significative values table__//: like the 'limma output' (1), this file contains a table with the statistic value, p-value, and adjusted p-value, but only for the 500 most significant differentially expressed genes. \\ \\ At the bottom of the page you will see a collection of links which will help you go on with the **Functional profiling analysis**((To continue with this analysis, click [[fatigo_pipeline_example|here]].)). 
\\ \\ {{ :images:differential_expression_example:continue_processing.png |Continue processing}}
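To make the adjusted p-value column of the output files more concrete, here is a minimal sketch of the Benjamini and Hochberg (BH) adjustment applied to a list of raw p-values. It illustrates the standard procedure only; the function name `benjamini_hochberg` is hypothetical, the p-values in the example are invented, and the moderated statistics computed by Limma are not reproduced.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.5]  # invented p-values for four genes
adjusted = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adjusted]
print(significant)  # [True, False, False, False]
```

In this toy example three genes have a raw p-value below 0.05, but only the first one remains significant after the correction, which is why the threshold in the form is applied to the adjusted p-value rather than to the raw one.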