The first step is to obtain an experimental dataset. If we do not have one of our own, we can find one in a public repository such as GEO or ArrayExpress. Both archives provide a user-friendly interface for querying their databases: the user can browse or search the experiments by free text (e.g. experiment accession numbers, authors, laboratory, publication, keywords) and filter the retrieved experiments by species, array design or experiment type. Once the desired experiment is identified, the user can find more information about the samples, protocols, experimental design, etc. and, most importantly, can export the data associated with the selected experiment.
Access to each repository is described below.
1. Go to the GEO home page: http://www.ncbi.nlm.nih.gov/geo/
2. Enter a keyword or any valid accession number.
GEO data can be retrieved in several ways:
As with any other Entrez database, keywords or a simple Boolean phrase may be entered and restricted to any number of supported attribute fields, enabling effective query and mining of GEO data. Tools available under the 'Preview/Index' tab can help you construct complex, fielded queries.
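As a rough illustration of such a fielded query, the sketch below assembles a Boolean search term and the corresponding NCBI E-utilities `esearch` URL for the GEO DataSets database (`db=gds`). The helper names `build_geo_query` and `build_esearch_url` are hypothetical, the keyword 'stress' is only a placeholder, and the field tags `[Organism]` and `[Entry Type]` are assumed to be the ones you want; the 'Preview/Index' tab remains the place to check which fields are supported. The code only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_query(keywords, organism=None, entry_type=None):
    """Combine free-text keywords with optional field restrictions using AND.

    Hypothetical helper: the field tags used here are assumptions for
    illustration, not an exhaustive list of what GEO supports.
    """
    terms = [keywords]
    if organism:
        terms.append(f'"{organism}"[Organism]')
    if entry_type:
        terms.append(f"{entry_type}[Entry Type]")
    return " AND ".join(terms)


def build_esearch_url(term, retmax=20):
    """Return an esearch URL that queries GEO DataSets (db=gds) for `term`."""
    params = {"db": "gds", "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Placeholder query: human series records matching the keyword 'stress'.
query = build_geo_query("stress", organism="Homo sapiens", entry_type="gse")
url = build_esearch_url(query)
print(query)
print(url)
```

The same search term can be pasted directly into the GEO search box; the URL form is only convenient when the query has to be repeated from a script.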
3. Identify a DataSet of interest
After querying GEO, we will get a list of results with the related DataSets. There are some features that will help us to identify the appropriate dataset:
4. Once you have identified a DataSet of interest, click on the record link. This link leads to a page with information about the experiment (summary, sample description, etc.), the authors and the PubMed ID. We will focus on the information concerning the microarray chip and the samples. In this example, the experiment has 12 samples (6 cases and 6 controls) and the platform used is the Affymetrix Human Genome U133 Plus 2.0 Array.
To download the raw data of the experiment, go to the bottom of the page and click on your preferred download mode: ftp or html.
Here is an example:
Archive/File | Name | Date | Time | Size (bytes) | Type |
---|---|---|---|---|---|
Archive | GSE16538_RAW.tar | 06/11/2009 | 07:35:06 | 64798720 | TAR |
File | GSM415386.CEL.gz | 06/10/2009 | 10:45:28 | 5516509 | CEL |
File | GSM415387.CEL.gz | 06/10/2009 | 10:45:32 | 5514041 | CEL |
File | GSM415388.CEL.gz | 06/10/2009 | 10:45:35 | 5396385 | CEL |
File | GSM415389.CEL.gz | 06/10/2009 | 10:45:38 | 5391068 | CEL |
File | GSM415390.CEL.gz | 06/10/2009 | 10:45:41 | 5321878 | CEL |
File | GSM415391.CEL.gz | 06/10/2009 | 10:45:44 | 5370707 | CEL |
File | GSM415392.CEL.gz | 06/10/2009 | 10:45:47 | 5273116 | CEL |
File | GSM415393.CEL.gz | 06/10/2009 | 10:45:50 | 5347133 | CEL |
File | GSM415394.CEL.gz | 06/10/2009 | 10:45:53 | 5442786 | CEL |
File | GSM415395.CEL.gz | 06/10/2009 | 10:45:56 | 5474703 | CEL |
File | GSM415396.CEL.gz | 06/10/2009 | 10:45:59 | 5400721 | CEL |
File | GSM415397.CEL.gz | 06/10/2009 | 10:46:02 | 5329862 | CEL |
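Once an archive such as `GSE16538_RAW.tar` has been downloaded, its gzip-compressed CEL files have to be unpacked before analysis. The following is a minimal sketch using only the Python standard library. The function names `geo_raw_url` and `unpack_raw_archive` are hypothetical, and the URL layout returned by `geo_raw_url` is an assumption based on the usual directory convention of the GEO FTP site; if it does not match, use the link shown on the record page instead.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def geo_raw_url(accession):
    """Guess the download URL of the raw-data archive of a GEO series.

    Assumption: series are grouped in directories where the last three
    digits of the accession are replaced by 'nnn' (GSE16538 -> GSE16nnn).
    """
    stub = accession[:-3] + "nnn"
    return (
        "https://ftp.ncbi.nlm.nih.gov/geo/series/"
        f"{stub}/{accession}/suppl/{accession}_RAW.tar"
    )


def unpack_raw_archive(tar_path, out_dir):
    """Extract every *.CEL.gz member of a tar archive as an uncompressed CEL file.

    Returns the sorted list of paths of the files written to `out_dir`.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cel_files = []
    with tarfile.open(tar_path) as archive:
        for member in archive.getmembers():
            # Keep only the base name so files cannot be written outside out_dir.
            name = Path(member.name).name
            if not member.isfile() or not name.endswith(".CEL.gz"):
                continue
            target = out_dir / name[: -len(".gz")]
            # Decompress the member on the fly while copying it to disk.
            with archive.extractfile(member) as compressed, \
                    gzip.open(compressed) as src, \
                    open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            cel_files.append(target)
    return sorted(cel_files)


print(geo_raw_url("GSE16538"))
```

Calling `unpack_raw_archive("GSE16538_RAW.tar", "cel_files")` on the archive listed above should leave one `.CEL` file per sample in the `cel_files` directory, which makes it easy to check that all 12 samples are present.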
To retrieve data from ArrayExpress instead:
1. Go to the ArrayExpress home page: http://www.ebi.ac.uk/arrayexpress/
2. In the Experiments box, on the left-hand side of the page, type a word, phrase or GO term by which you want to retrieve experiments (e.g. 'stress') and click the Query button.
3. Choosing a DataSet.
This will bring up a window with a list of experiments in reverse order of publication. For each experiment the following information is displayed:
By clicking the + button on the left-hand side of each row you will get a more detailed view of each experiment.
4. Downloading data.
Data is sometimes offered in two ways: