Babelomics' formats: Expression Data Matrix

Babelomics stores expression measurements (intensity measurements in the case of microarray data) in a matrix format. Each matrix contains the measurements from all the arrays in an experiment. Each array is stored in a column and each gene in a row.

All arrays in the experiment are assumed to be of the same type and therefore to share the same gene identifiers. These gene identifiers are stored in the first column of the data matrix, where they name its rows.

These matrices are saved in plain text files. Columns are separated by tabs and rows are separated by a newline or carriage return. An end of line is also required at the very end of the file. Reading and rewriting these data should be straightforward with any spreadsheet program.
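As an illustration of the layout described above, the following Python sketch serializes a small matrix as tab-separated text with an end of line after the last row. The function name `format_expression_matrix` and the two-decimal number formatting are assumptions made for this example; they are not part of Babelomics.

```python
def format_expression_matrix(gene_ids, rows):
    """Return the matrix as tab-separated text, one gene per line.

    gene_ids: list of row names (first column of the file).
    rows: list of lists of floats, one inner list per gene.
    Hypothetical helper: the name and the two-decimal formatting
    are assumptions for this sketch, not Babelomics requirements.
    """
    lines = []
    for gene_id, row in zip(gene_ids, rows):
        fields = [gene_id] + [f"{value:.2f}" for value in row]
        lines.append("\t".join(fields))
    # The format requires an end of line at the very end of the file.
    return "\n".join(lines) + "\n"


text = format_expression_matrix(
    ["gene1", "gene2"],
    [[10.23, 9.98], [10.51, 9.74]],
)
```

When writing the resulting string to disk, opening the file with `newline="\n"` keeps Python from translating the line endings on some platforms.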

Almost any line in the file that starts with the symbol # is treated as a comment line and is not used in computations. The exceptions are lines that start with one of the Babelomics reserved words, which themselves contain the # symbol.

Empty lines are not allowed in the data files. Remove them, or comment them out with #, before submitting your data; otherwise you may get unexpected errors.
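A file can be checked against these rules before submission. The Python sketch below is only an illustration: the function name `find_format_problems` is hypothetical, every line starting with # is simply skipped (reserved-word lines are not interpreted), and the check that all data rows have the same number of columns is an assumption that follows from the matrix layout rather than a rule stated here.

```python
def find_format_problems(text):
    """Return a list of human-readable problems found in a matrix file.

    Hypothetical helper for illustration; not part of Babelomics.
    """
    problems = []
    # The file must finish with an end of line.
    if text and not text.endswith(("\n", "\r")):
        problems.append("file does not end with an end of line")

    expected_columns = None
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip() == "":
            problems.append(f"line {number}: empty line")
            continue
        if line.startswith("#"):
            # Comment line (or reserved-word line, not interpreted here).
            continue
        columns = len(line.split("\t"))
        if expected_columns is None:
            expected_columns = columns
        elif columns != expected_columns:
            # Assumption: every data row has the same number of columns.
            problems.append(
                f"line {number}: expected {expected_columns} columns, "
                f"found {columns}"
            )
    return problems


good = "gene1\t10.23\t9.98\n# a comment\ngene2\t10.51\t9.74\n"
bad = "gene1\t10.23\t9.98\n\ngene2\t10.51"
good_problems = find_format_problems(good)
bad_problems = find_format_problems(bad)
```

An empty list means that none of the checked rules were violated; it does not guarantee that Babelomics will accept the file.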

An expression data file with the structure described above would look like this:

    gene1	10.23	9.98	10.41	10.55	10.65	9.69
    gene2	10.51	9.74	10.65	10.63	10.43	10.35
    gene3	9.89	10.02	9.89	11.03	10.21	10.77
    gene4	10.25	10.83	8.94	10.16	10.49	10.46
    gene...	...	...	...	...	...	...
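For readers who prefer a script to a spreadsheet, a file in this format can be loaded with a few lines of Python. This is a minimal sketch under stated assumptions: the function name `read_expression_matrix` is hypothetical, all values are assumed to be numeric, and lines starting with # are skipped without interpreting reserved words. The placeholder row of dots in the example above stands for the remaining genes and is not valid input.

```python
def read_expression_matrix(text):
    """Parse tab-separated matrix text into (gene_ids, values).

    gene_ids: list of strings taken from the first column.
    values: list of lists of floats, one inner list per gene.
    Hypothetical helper for illustration; not part of Babelomics.
    """
    gene_ids = []
    values = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip() == "":
            # Empty lines are not allowed in the format.
            raise ValueError(f"line {number}: empty line")
        if line.startswith("#"):
            continue  # comment line, ignored in this sketch
        fields = line.split("\t")
        gene_ids.append(fields[0])
        values.append([float(field) for field in fields[1:]])
    return gene_ids, values


sample = (
    "gene1\t10.23\t9.98\t10.41\n"
    "gene2\t10.51\t9.74\t10.65\n"
)
gene_ids, values = read_expression_matrix(sample)

# Measurements of the second array (second data column) for every gene.
second_array = [row[1] for row in values]
```

Keeping the identifiers and the numeric rows in two parallel lists mirrors the file layout: position `i` in `gene_ids` names row `i` of `values`, and each array is a column across those rows.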