Activity 3

Objective: Cluster analysis of zebrafish embryogenesis data

Data: We will perform a hierarchical clustering analysis of the genes in “zebrafish_embryo.txt”. This example file contains the first 999 of the 3,657 genes that showed significant differential expression in the study by Mathavan et al. (2005).

Workflow + questions:

  1. Open the file and explore its structure.
  2. Upload your file to Babelomics 5.0, then go to the section Expression > Clustering.
  3. Cluster the samples under the following scenarios:
    1. UPGMA + Euclidean
      • Do you see any patterns of gene expression across the different developmental stages?
      • Can you download the resulting trees in Newick format? Are you familiar with this format?
    2. UPGMA + Pearson correlation coefficient
    3. Which distance measure gives the better clustering?
  4. Repeat the analysis using the same distance measures and the SOTA (Self-Organizing Tree Algorithm) method:
    1. SOTA + Euclidean
    2. SOTA + Pearson correlation coefficient
    3. Do the results change depending on the clustering method or the distance measure?
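
To see what the two distance measures in steps 3 and 4 actually compare, it can help to try them on a tiny example outside Babelomics. The sketch below is only an illustration and is not how Babelomics computes its trees: the four gene profiles are invented toy values (not taken from "zebrafish_embryo.txt"), the function names are hypothetical, and the UPGMA routine is a minimal version that reports only the tree topology in Newick format, without branch lengths.

```python
import math

def euclidean(a, b):
    """Euclidean distance: sensitive to the absolute expression level."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_distance(a, b):
    """1 - Pearson correlation: sensitive to the shape of the profile only."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return 1.0 - cov / (sd_a * sd_b)

def upgma(profiles, dist):
    """Minimal UPGMA (average linkage); returns a topology-only Newick string."""
    names = list(profiles)
    # Each cluster id maps to (Newick text, number of genes in the cluster).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances, keyed by (smaller id, larger id).
    d = {(i, j): dist(profiles[names[i]], profiles[names[j]])
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)              # closest pair of clusters
        (text_i, size_i), (text_j, size_j) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        merged = {}
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            merged[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(merged)
        clusters[next_id] = (f"({text_i},{text_j})", size_i + size_j)
        next_id += 1
    newick, _ = next(iter(clusters.values()))
    return newick + ";"

# Invented toy profiles over four time points (NOT real zebrafish data).
# geneA/geneB rise, geneC/geneD fall; geneB and geneD have ~10x higher levels.
toy = {
    "geneA": [1, 2, 3, 4],
    "geneB": [10, 20, 30, 40],
    "geneC": [4, 3, 2, 1],
    "geneD": [40, 30, 20, 12],
}

euclid_tree = upgma(toy, euclidean)
pearson_tree = upgma(toy, pearson_distance)
print("UPGMA + Euclidean:", euclid_tree)
print("UPGMA + Pearson:  ", pearson_tree)
```

On this toy input the Euclidean tree pairs genes with similar expression levels, while the Pearson tree pairs genes with similar trends over time. Keep that difference in mind when comparing the Babelomics results for the real data.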