Babelomics, as you see it now, is the result of many years of work and development. This version, released in 2010, is the convergence of two separate suites for genomic data analysis: GEPAS and the old Babelomics.
The first program implemented as a web server was the SOTA algorithm, in late 2000 at the CNIO. During 2001, new tools were added within an integrated environment: the preprocessor, further clustering methods (SOM, hierarchical), gene selection methods (the pomelo tool), supervised clustering, and functional profiling (the popular FatiGO).
GEPAS version 1.0 was first officially released at the CNIO in mid-2002 and was soon published in the first special web server issue of NAR in 2003 (Herrero et al., 2003, NAR).
During 2003, GEPAS development focused on gene selection and prediction. New versions of pomelo (gene selection) and the Tnasas program (predictor) were added to GEPAS. DNMAD, a module for the normalization of two-color arrays, was also included. Array-CGH was supported for the first time through a viewer, the InSilicoCGH tool. Finally, another functional profiling module was added: FatiWise. This second version was officially released in the special web server issue of NAR in 2004 (Herrero et al., 2004, NAR).
During 2004, GEPAS received further additions, such as the k-means clustering algorithm, as well as improvements to programs such as DNMAD and InSilicoCGH. However, the most important improvement in this version was the link to a new suite for functional annotation: Babelomics. During 2004, GEPAS was used to analyse more than 75,000 experiments, an average of 300 experiments per day. This version was officially released in the special web server issue of NAR in 2005 (Vaquerizas et al., 2005, NAR).
GEPAS version 2.0 was released in August 2005. It provided a completely new interface and added new tools, among them CAAT, a hierarchical cluster analyser and viewer, and ISACGH, a tool for estimating copy number in array-CGH experiments that includes visualization of the results. Normalization for Affymetrix arrays was also provided.
GEPAS version 3.0 was released in February 2006 (Montaner et al., 2006, NAR). CAAT was fully integrated into the clustering section. Differential gene expression analysis was extended beyond the simple t-test with new, more reliable tests such as the data-adaptive test, SAM and the Bayesian regularised t-test (the pomelo module was discontinued). ISACGH was improved with a DAS server that allows the results to be visualized over Ensembl, and functional annotation is provided through a direct link to Babelomics. A new database schema for cross-equivalence between the different gene identifiers and the functional terms was also implemented. Although it is not visible to users, it increases the number of available gene ID equivalences. The Babelomics suite was also improved: a new interface was added that makes it possible to check functional enrichment in heterogeneous terms (GO, pathways, etc.) simultaneously, more terms were added to FatiScan, and the GSEA method was implemented.
In September 2006, the obsolete Tnasas tool was deprecated and replaced by a new tool, Prophet. Prophet (Medina et al., 2007, Bioinformatics) was much more efficient, and it was the only tool in its category able to build predictors that could then be used to predict class membership for new samples.
During 2007, GEPAS was completely re-engineered on top of SOAP web services and Web 2.0 technologies such as AJAX. This made it possible to design a new interface that supports asynchronous use as well as project, job and user management. Users can choose between traditional anonymous sessions without logging in (as in previous versions) and logging into the new environment with a username and password. The new environment offers persistent sessions in which data remain stored, as well as facilities for tracking the operations performed. The beta version of GEPAS 4.0 was launched in June 2007.
In July 2007, an improved version of IDconverter (new code name: Rosetta), the protein and gene ID converter, was implemented, covering a large number of species and databases. Rosetta allows any microarray file to be imported regardless of the IDs used in the platform. More species and gene references were added, and the converter module now supports more than 10 species and more than 40 ID references for human (including SNP and orthology information).
GEPAS 4.0 was launched in October 2007. This was the last version of the tool developed on its own, outside Babelomics.
Up to 2005, several bioinformatics tools focused on genomic functional profiling had been developed within our research group. Babelomics was born as an effort to combine all those tools into a general-purpose analysis suite, giving users easier access to, and a better understanding of, the implemented methodologies.
FatiGO was published in 2004 (Al-Shahrour et al., 2004).
FatiScan was officially released in 2005 (Al-Shahrour et al., 2005, Bioinformatics), although it had been running since 2004.
The first version of Babelomics was developed at the CIPF and published in the special web server issue of NAR in 2005 (Al-Shahrour et al., 2005, NAR). It included FatiGO, FatiWise, TransFat, TMT and GenomeGO, as well as FatiScan.
Babelomics 2.0 was released in February 2006. The FatiGO+ module incorporated the functionality of FatiWise, TransFat and GenomeGO, which were discontinued. More biological information was added, including the CisRed database and bioentities obtained through text-mining methods using the almaKnowledgeServer. TMT was improved with new information on tissue expression. The MARMITE module, which uses these bioentities for functional annotation, was also included. FatiScan, for the study of functionally related genes, was improved, and the GSEA method (Subramanian et al., 2005) was also included.
FatiGO+ (Al-Shahrour et al., 2007), Marmite and MarmiteScan (Minguez et al., 2007) were published during 2007.
Babelomics 3.0 was released in February 2008. It was extensively re-engineered and included web services and Web 2.0 technology features, a new user interface with persistent sessions and a new, extended database of gene identifiers. Babelomics 3.0 allowed sub-selection of terms in order to test more focused hypotheses. Gene-annotation correspondence tables could also be imported, which allowed testing with user-defined functional modules. Finally, a tool for the “de novo” functional annotation of sequences was included in the system, which made it possible to analyse organisms that are not yet annotated.
In 2010, Babelomics was completely reprogrammed as version 4.0. Many technical improvements were included, such as full user and project management. Scientific developments, such as the inclusion of protein-protein interaction analysis tools, are also available in this release. A main characteristic of this version is that its twin tool, GEPAS, is now fully integrated into Babelomics.