Activity 1

Goal: We would like to build a predictor that classifies patients as ALL (Acute Lymphoblastic Leukemia) or AML (Acute Myeloid Leukemia).

Data:

Workflow:

  1. Open both files in a text editor:
    • How many genes and samples does each file contain? Do both files contain the same genes?
    • Is there anything particular about the header lines?
  2. Upload your files to Babelomics 5.0, then go to Expression > Class Prediction.
  3. Select these parameters:
    • Algorithm: KNN
    • Error estimation: KFold. Repeats: 10; folds: 5
    • Feature selection: Correlation-based Feature Selection (CFS)
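
Step 1 can also be checked with a short script instead of counting by eye. The sketch below is only an illustration and rests on assumptions that are not stated in this activity: that the files are tab-delimited, that header or metadata lines start with "#", and that the first column holds the gene identifier. The two small matrices (TRAIN_TEXT, TEST_TEXT) and the function name summarize_expression_file are invented for the example; replace the io.StringIO objects with open("your_file") to try it on the real data.

```python
import io


def summarize_expression_file(handle):
    """Collect header lines, gene IDs and per-row sample counts.

    Assumptions (not confirmed by the activity text): tab-delimited file,
    header/metadata lines start with '#', first column is the gene ID.
    """
    headers, genes, sample_counts = [], [], set()
    for raw in handle:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue                      # skip blank lines
        if line.startswith("#"):
            headers.append(line)          # assumed header/metadata line
            continue
        fields = line.split("\t")
        genes.append(fields[0])           # gene identifier
        sample_counts.add(len(fields) - 1)  # remaining columns are samples
    return {
        "headers": headers,
        "genes": genes,
        "sample_counts": sorted(sample_counts),
    }


# Made-up miniature files standing in for the real train and test data.
TRAIN_TEXT = (
    "#samples\ts1\ts2\ts3\n"
    "#class\tALL\tALL\tAML\n"
    "geneA\t1.0\t2.0\t3.0\n"
    "geneB\t0.5\t0.1\t0.9\n"
)
TEST_TEXT = (
    "#samples\tt1\tt2\n"
    "geneA\t1.5\t2.5\n"
    "geneB\t0.4\t0.8\n"
)

train_summary = summarize_expression_file(io.StringIO(TRAIN_TEXT))
test_summary = summarize_expression_file(io.StringIO(TEST_TEXT))

for name, summary in (("train", train_summary), ("test", test_summary)):
    print(name, "genes:", len(summary["genes"]),
          "| samples per row:", summary["sample_counts"],
          "| header lines:", len(summary["headers"]))

same_genes = set(train_summary["genes"]) == set(test_summary["genes"])
print("same genes in both files:", same_genes)
```

If sample_counts contains more than one value, some rows have a different number of columns than others, which is worth checking before uploading the file.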

Questions:

  1. Train results:
    • The summary includes three tables and a summary plot. Could you explain the meaning of each of them?
    • How many genes were used for the prediction?
    • Are there any samples that are more difficult to classify?
  2. Test results:
    • Could you comment on the final predictions for the group of new individuals?
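
To get a feeling for what the error estimation in the workflow (KFold, 10 repeats, 5 folds) measures, and for why some samples may be harder to classify than others, the idea can be sketched in plain Python. This is not how Babelomics implements it: the number of neighbours (k=3), the Euclidean distance, the toy two-gene data set and all function names are assumptions made for the illustration, and the CFS feature selection step is left out entirely. One ALL sample is deliberately placed among the AML samples so that it shows up as a difficult case.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples closest to `query`."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def repeated_kfold(x, y, folds=5, repeats=10, k=3, seed=0):
    """Repeated k-fold cross-validation.

    Returns a confusion table {(true, predicted): count} and, for each
    sample, how many times it was misclassified over all repeats.
    """
    rng = random.Random(seed)
    confusion = Counter()
    errors_per_sample = [0] * len(x)
    for _ in range(repeats):
        idx = list(range(len(x)))
        rng.shuffle(idx)                      # new random partition per repeat
        for f in range(folds):
            held_out = idx[f::folds]          # every sample is held out once
            held = set(held_out)
            train_idx = [i for i in idx if i not in held]
            tx = [x[i] for i in train_idx]
            ty = [y[i] for i in train_idx]
            for i in held_out:
                pred = knn_predict(tx, ty, x[i], k)
                confusion[(y[i], pred)] += 1
                if pred != y[i]:
                    errors_per_sample[i] += 1
    return confusion, errors_per_sample


# Made-up expression values for two hypothetical genes (not the real data).
samples = [
    (1.0, 1.2), (0.8, 1.0), (1.2, 0.9), (1.1, 1.1), (0.9, 0.8),  # ALL
    (4.9, 5.0),                                # ALL, but AML-like on purpose
    (5.0, 5.2), (5.1, 4.8), (4.8, 5.1), (5.2, 5.0), (4.7, 4.9),  # AML
]
labels = ["ALL"] * 6 + ["AML"] * 5

confusion, errors_per_sample = repeated_kfold(samples, labels)

total = sum(confusion.values())
correct = sum(n for (true, pred), n in confusion.items() if true == pred)
print("accuracy over all repeats:", round(correct / total, 3))
for (true, pred), n in sorted(confusion.items()):
    print(f"true={true} predicted={pred}: {n}")
print("misclassifications per sample:", errors_per_sample)
```

Reading the output this way mirrors the questions above: the confusion counts show which class is confused with which, and a sample that is misclassified in most or all repeats is a candidate for a "difficult" sample.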