Get an experimental DataSet

The very first step is to get an experimental dataset. If we do not have one of our own, we can easily find one in a public repository such as GEO or ArrayExpress. Both archives provide a user-friendly interface for querying their databases: the user can browse or query the experiments via free-text search (e.g. experiment accession numbers, authors, laboratory, publication, keywords) and filter the retrieved experiments by species, array design, or experiment type. Once the desired experiment is identified, the user can find more information about the samples, protocols used, experimental design, etc. and, most importantly, can export the data associated with the selected experiment.

How to access these repositories is described below.

Getting data from GEO

1. Go to the GEO home page: http://www.ncbi.nlm.nih.gov/geo/

2. Enter a keyword or any valid accession number.

GEO data can be retrieved in several ways:

As with any other Entrez database, keywords or a simple Boolean phrase may be entered and restricted to any number of supported attribute fields, enabling effective query and mining of GEO data. Tools available under the 'Preview/Index' tab can help you construct complex, fielded queries.
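The same kind of fielded Boolean query can also be assembled programmatically. The following is only a minimal sketch: it assumes the NCBI E-utilities `esearch` endpoint with the GEO DataSets database (`db=gds`) and the `[Organism]` and `[Entry Type]` field tags. The helper names `build_geo_query` and `build_esearch_url` are hypothetical, and the keyword 'stress' is just an illustration. The code only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumed endpoint: the NCBI E-utilities search service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_query(keywords, organism=None, entry_type=None):
    """Join free-text keywords with optional fielded restrictions using AND.

    The [Organism] and [Entry Type] tags are assumptions about the fields
    supported by the GEO DataSets database.
    """
    terms = [keywords]
    if organism:
        terms.append(f'"{organism}"[Organism]')
    if entry_type:
        terms.append(f"{entry_type}[Entry Type]")
    return " AND ".join(terms)


def build_esearch_url(term, max_results=20):
    """Return an esearch URL for the GEO DataSets database (db=gds)."""
    params = {"db": "gds", "term": term, "retmax": max_results}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Illustrative query: series records about 'stress' in human samples.
term = build_geo_query("stress", organism="Homo sapiens", entry_type="gse")
url = build_esearch_url(term)
print(term)
print(url)
```

Opening the printed URL in a browser should return the identifiers of the matching records, provided the field tags assumed above are accepted by the service.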


Accessing GEO

3. Identify a DataSet of interest

After querying GEO, we will get a list of results with the related DataSets. There are some features that will help us to identify the appropriate dataset:


Choosing a DataSet

4. Once you have identified a DataSet of interest, click on the record link. This link takes us to a page with information about the experiment (summary, sample description, etc.) as well as the authors and the PubMed ID. We are going to focus on the information concerning the microarray chip and the samples. We can see that the experiment has 12 samples (6 cases and 6 controls) and that the platform used is the Affymetrix Human Genome U133 Plus 2.0 Array (see footnote 1).

In order to download the raw data of the experiment, go to the bottom of the page and click on your preferred download mode: ftp or http.
Download raw data

Here is an example of the file listing for a raw data archive:

Archive/File  Name              Date        Time      Size (bytes)  Type
Archive       GSE16538_RAW.tar  06/11/2009  07:35:06  64798720      TAR
File          GSM415386.CEL.gz  06/10/2009  10:45:28  5516509       CEL
File          GSM415387.CEL.gz  06/10/2009  10:45:32  5514041       CEL
File          GSM415388.CEL.gz  06/10/2009  10:45:35  5396385       CEL
File          GSM415389.CEL.gz  06/10/2009  10:45:38  5391068       CEL
File          GSM415390.CEL.gz  06/10/2009  10:45:41  5321878       CEL
File          GSM415391.CEL.gz  06/10/2009  10:45:44  5370707       CEL
File          GSM415392.CEL.gz  06/10/2009  10:45:47  5273116       CEL
File          GSM415393.CEL.gz  06/10/2009  10:45:50  5347133       CEL
File          GSM415394.CEL.gz  06/10/2009  10:45:53  5442786       CEL
File          GSM415395.CEL.gz  06/10/2009  10:45:56  5474703       CEL
File          GSM415396.CEL.gz  06/10/2009  10:45:59  5400721       CEL
File          GSM415397.CEL.gz  06/10/2009  10:46:02  5329862       CEL
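Once a RAW archive like this one has been downloaded, the compressed CEL files still have to be unpacked before analysis. The sketch below shows one possible way to do this with the Python standard library. The helper names `geo_raw_url` and `extract_cel_files` are hypothetical, and the URL pattern is an assumption about how the GEO file server lays out series directories, so it is worth checking against the download link shown on the record page.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def geo_raw_url(accession):
    """Guess the download URL of the RAW archive for a GEO series.

    Assumption: series are grouped in directories where the last three
    digits of the accession are replaced by 'nnn' (GSE16538 -> GSE16nnn).
    """
    digits = accession[3:]
    stub = f"GSE{digits[:-3]}nnn"
    return (
        "https://ftp.ncbi.nlm.nih.gov/geo/series/"
        f"{stub}/{accession}/suppl/{accession}_RAW.tar"
    )


def extract_cel_files(tar_path, out_dir):
    """Unpack every *.CEL.gz member of a RAW tar into out_dir as *.CEL."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cel_paths = []
    with tarfile.open(tar_path) as archive:
        for member in archive.getmembers():
            # Keep only the base name so members cannot escape out_dir.
            name = Path(member.name).name
            if not (member.isfile() and name.upper().endswith(".CEL.GZ")):
                continue
            target = out_dir / name[:-3]  # drop the trailing ".gz"
            with archive.extractfile(member) as compressed, \
                    gzip.open(compressed) as src, \
                    open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            cel_paths.append(target)
    return sorted(cel_paths)


print(geo_raw_url("GSE16538"))
```

With the archive from the listing saved locally, `extract_cel_files("GSE16538_RAW.tar", "cel")` would be expected to produce twelve uncompressed `.CEL` files in the `cel` directory.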

Getting data from ArrayExpress

1. Go to the ArrayExpress home page: http://www.ebi.ac.uk/arrayexpress/

2. In the Experiments box, on the left-hand side of the page, type a word, phrase, or GO term by which you want to retrieve experiments (e.g. 'stress') and click the Query button.
Querying ArrayExpress

3. Choosing a DataSet.

The query will bring up a window listing the experiments in reverse order of publication. For each experiment, the following information is displayed:

By clicking the + button on the left-hand side of each row you will get a more detailed view of each experiment.
Accessing an experiment

4. Downloading data.

Data is sometimes offered in two ways:

1) Platforms: Babelomics can read 5 file formats from 3 different platforms (or, more appropriately, from 3 different scanners). In Babelomics (as in microarray analysis in general) we consider such files to be the raw data of the microarray experiment, that is, the starting point of the data analysis process.