Activity 3

Objective: Cluster analysis of zebrafish embryogenesis data

Data: We would like to perform a hierarchical clustering analysis of the genes in “zebrafish_embryo.txt”. This example file contains the first 999 of the 3,657 genes that showed significant differential expression in the study by Mathavan et al. (2005).

Workflow + questions:

Open the file and explore its structure.
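If you prefer to explore the file programmatically before uploading it, the sketch below shows one way to do so with pandas. It assumes a tab-delimited layout with gene identifiers in the first column and sample (developmental stage) names in the first row; the actual layout of “zebrafish_embryo.txt” may differ, so check it in a text editor first. The function names `load_expression` and `summarize` are hypothetical helpers, and the toy table stands in for the real file.

```python
import io

import pandas as pd


def load_expression(source, sep="\t"):
    """Read an expression matrix with genes as rows and samples as columns."""
    # Assumption: the first column holds gene IDs and the first row holds sample names.
    return pd.read_csv(source, sep=sep, index_col=0)


def summarize(table):
    """Collect a few basic facts about the structure of the matrix."""
    return {
        "n_genes": table.shape[0],
        "n_samples": table.shape[1],
        "n_missing": int(table.isna().sum().sum()),
        "samples": list(table.columns),
    }


# Toy stand-in for the real file (hypothetical values, not taken from the study).
toy_file = io.StringIO(
    "gene\tstage1\tstage2\tstage3\n"
    "g1\t0.1\t1.2\t2.3\n"
    "g2\t-0.5\t\t0.4\n"
    "g3\t1.0\t0.9\t-1.1\n"
)
toy_table = load_expression(toy_file)
toy_summary = summarize(toy_table)
print(toy_summary)
```

To try it on the real data, pass the file path instead of the toy buffer, for example `load_expression("zebrafish_embryo.txt")`, adjusting `sep` if the file is not tab-delimited.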
Upload your file to Babelomics 5.0 and go to the Expression > Clustering section.
Cluster the samples under the following scenarios:
1. UPGMA + Euclidean
  - Do you see any patterns of gene expression across the developmental stages?
  - Can you download the resulting trees in Newick format? Are you familiar with this format?
2. UPGMA + Correlation coeff. (Pearson)
3. Which distance measure produces the more meaningful clustering?
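To build some intuition for the two scenarios above outside Babelomics, here is a minimal sketch using SciPy. It is not what Babelomics runs internally; it only illustrates the same ideas. Average linkage (`method="average"`) is UPGMA, and SciPy's `"correlation"` metric is 1 minus the Pearson correlation coefficient. The four toy profiles and the helpers `upgma`, `two_groups` and `to_newick` are hypothetical and chosen so that the two distances disagree: Euclidean distance groups profiles of similar magnitude, while the correlation distance groups profiles of similar shape. The last helper writes the tree as a Newick string, a nested-parentheses text format in which each label is followed by its branch length.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage, to_tree
from scipy.spatial.distance import pdist


def upgma(profiles, metric):
    """Build an average-linkage (UPGMA) tree for the given distance metric."""
    distances = pdist(profiles, metric=metric)
    # Guard against tiny negative values caused by floating-point rounding.
    distances = np.clip(distances, 0.0, None)
    return linkage(distances, method="average")


def two_groups(tree, labels):
    """Cut the tree into two clusters and return them as sets of labels."""
    assignment = fcluster(tree, t=2, criterion="maxclust")
    return {
        frozenset(lab for lab, g in zip(labels, assignment) if g == gid)
        for gid in set(assignment)
    }


def to_newick(tree, labels):
    """Serialize a SciPy linkage matrix as a Newick string."""
    def render(node, parent_height):
        # Branch length is the height difference between parent and child.
        length = parent_height - node.dist
        if node.is_leaf():
            return f"{labels[node.id]}:{length:.2f}"
        kids = ",".join(
            render(k, node.dist) for k in (node.get_left(), node.get_right())
        )
        return f"({kids}):{length:.2f}"

    root = to_tree(tree)
    kids = ",".join(
        render(k, root.dist) for k in (root.get_left(), root.get_right())
    )
    return f"({kids});"


# Toy profiles (hypothetical): a and b rise, c and d fall;
# a and c have low values, b and d have high values.
labels = ["a", "b", "c", "d"]
profiles = np.array(
    [[1, 2, 3, 4], [10, 20, 30, 40], [4, 3, 2, 1], [40, 30, 20, 10]],
    dtype=float,
)

euclid_tree = upgma(profiles, "euclidean")
corr_tree = upgma(profiles, "correlation")
euclid_groups = two_groups(euclid_tree, labels)
corr_groups = two_groups(corr_tree, labels)
euclid_newick = to_newick(euclid_tree, labels)

print("Euclidean groups:", euclid_groups)
print("Pearson groups:  ", corr_groups)
print("Newick (Euclidean):", euclid_newick)
```

On this toy input the Euclidean tree pairs a with c and b with d (similar magnitude), whereas the correlation tree pairs a with b and c with d (similar trend). Keep this difference in mind when you compare the two Babelomics results.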
Repeat the analysis with the same distance measures, this time using the SOTA (Self-Organizing Tree Algorithm) method:
1. SOTA + Euclidean
2. SOTA + Correlation coeff. (Pearson)
3. Do the results change depending on the method or on the distance measure?