Worked examples
Colorectal cancer study
Colorectal cancer (CRC), commonly known as colon cancer or bowel cancer, is a cancer arising from uncontrolled cell growth in the colon or rectum (parts of the large intestine), or in the appendix. Genetic analysis shows that colon and rectal tumours are essentially the same cancer genetically. A recent microarray study systematically searched for genes differentially expressed in early-onset CRC. This study compares 12 CRC patients and 10 healthy control individuals. The gene expression datasets can be found in the GEO database under the identifier GSE4107.
From the downloaded CEL files, we can obtain the normalized data matrix. For this step we are going to use a user-friendly web tool called Babelomics. The steps needed for this analysis are described in this tutorial. If you want to skip this step, you can download the normalized matrix and the experimental design from here.
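As an illustration of what an experimental design encodes, the sketch below builds a sample-to-condition table for this study (10 controls, 12 CRC) and serializes it as tab-separated text. The two-column tab-separated layout, the sample names, and the function names build_design and design_to_tsv are assumptions made for illustration; check the downloadable experimental design for the layout the tool actually expects.

```python
import csv
import io


def build_design(control_samples, disease_samples,
                 control_label="CONTROL", disease_label="CRC"):
    """Return (sample, condition) rows, controls first."""
    rows = [(s, control_label) for s in control_samples]
    rows += [(s, disease_label) for s in disease_samples]
    return rows


def design_to_tsv(rows):
    """Serialize rows as tab-separated text, one sample per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample names; the real ones come from the GSE4107 CEL files.
controls = [f"control_{i:02d}" for i in range(1, 11)]  # 10 healthy controls
cases = [f"crc_{i:02d}" for i in range(1, 13)]         # 12 CRC patients

design = build_design(controls, cases)
tsv_text = design_to_tsv(design)
print(tsv_text)
```

The condition labels written here should match whatever is later typed as the first and second condition in the analysis form.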
Log in to PATHiWAYS and create a new bucket. See the instructions for creating a new bucket in the data management section.
Upload the expression matrix and the experimental design to My WebDrive. See the instructions for uploading data in the data management section.
Click the Press to run PATHiWAYS button to start the analysis.
Set Human as the species and Affymetrix Human Genome U133 Plus 2.0 Array as the platform.
Select your normalized matrix by clicking the Browse data button.
Select your experimental design file by clicking the Browse data button, then type the first and second conditions for the analysis. The first condition acts as the control and the second as the disease. In this case, type CONTROL and CRC respectively.
Under other parameters, select 90th percentile as the way to summarize the probabilities of all the genes in a node.
In the Pathways section, check the All option to run the analysis on all pathways.
In the Job section, give the new job a name (e.g. CRC study) and select the previously created bucket.
Submit the job (press the run button).
A more detailed interpretation of the results can be found in the Result interpretation section.
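To make the 90th percentile option more concrete, the following minimal sketch collapses the per-gene values of each pathway node into a single node value. It is not the tool's implementation: the percentile definition (linear interpolation between ranks), the function names, and the example gene values are all assumptions chosen for illustration.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def summarize_nodes(node_genes, gene_values, q=90):
    """Collapse the values of all genes in each node into one value per node."""
    return {
        node: percentile([gene_values[g] for g in genes], q)
        for node, genes in node_genes.items()
    }


# Hypothetical node membership and per-gene probabilities.
node_genes = {"node_A": ["g1", "g2", "g3"], "node_B": ["g4"]}
gene_values = {"g1": 0.2, "g2": 0.5, "g3": 0.9, "g4": 0.7}

node_values = summarize_nodes(node_genes, gene_values)
print(node_values)
```

A high percentile lets a node be represented by its most strongly expressed genes rather than by an average that weakly expressed genes would pull down.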
Breast cancer study
Breast cancer is a type of cancer originating from breast tissue, most commonly from the inner lining of milk ducts or the lobules that supply the ducts with milk. Cancers originating from ducts are known as ductal carcinomas, while those originating from lobules are known as lobular carcinomas. Breast cancer occurs in humans and other mammals. While the overwhelming majority of human cases occur in women, male breast cancer can also occur.
The breast cancer dataset, comprising 31 normal and 31 malignant samples of peripheral blood mononuclear cells (PBMC), was obtained from the Gene Expression Omnibus (GEO) public repository (GSE27562).
From the downloaded CEL files, we can obtain the normalized data matrix. For this step we are going to use a user-friendly web tool called Babelomics. The steps needed for this analysis are described in this tutorial. If you want to skip this step, you can download the normalized matrix and the experimental design from here.
Log in to the web tool and create a new bucket. See the instructions for creating a new bucket in the data management section.
Upload the expression matrix and the experimental design to My WebDrive. See the instructions for uploading data in the data management section.
Click the Press to run PATHiPRED button to start the analysis.
Set Human as the species and Affymetrix Human Genome U133 Plus 2.0 Array as the platform.
Select your normalized matrix by clicking the Browse data button.
Select your experimental design file by clicking the Browse data button and choose the experimental design type (Categorical). Type the first and second conditions for the analysis. The first condition acts as the control and the second as the disease. In this case, type normal and malignant respectively.
Under other parameters, select a Summ of 90th percentile as the way to summarize the probabilities of all the genes in a node, and select a K-fold of 10. Note that K-fold is the number of partitions used in the k-fold cross-validation.
In the Pathways section, check the All option to run the analysis on all pathways.
In the Job section, give the new job a name (e.g. Breast cancer study) and select the previously created bucket.
Submit the job (press the run button).
A more detailed interpretation of the results can be found in the PATHiPRED result interpretation section.
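For readers unfamiliar with the K-fold parameter, the sketch below shows how 10-fold cross-validation partitions the 62 samples of this dataset: each sample is held out for testing exactly once while the model is trained on the remaining folds. This is a generic illustration, not PATHiPRED's code; the shuffling seed, the unstratified splitting, and the function names are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Taking every k-th index gives fold sizes that differ by at most one.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 31 normal + 31 malignant samples = 62 samples in total.
splits = list(train_test_splits(62, k=10))
for train, test in splits:
    print(len(train), len(test))
```

With 62 samples and K-fold set to 10, two folds hold 7 samples and the other eight hold 6, so every model is trained on 55 or 56 samples.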
New dataset prediction
The resulting model can be used to make predictions on a new dataset from the PATHiPRED results web page.
You can download CEL files from another study of expression data from human breast tissue (GSE7904) and customize your experimental design. Here, we propose a prepared example of 11 CEL files, including 7 normal and 4 BRCA1 samples (experimental design).
Press the Apply to a new dataset button on the PATHiPRED results page.
Set Human as the species and Affymetrix Human Genome U133 Plus 2.0 Array as the platform.
Upload the compressed CEL files to My WebDrive. See the instructions for uploading data in the data management section.
Select your compressed CEL files by clicking the Browse data button.
Select your experimental design file by clicking the Browse data button and choose the experimental design type (Categorical). Type the control (condition 1) and disease (condition 2) labels; for the proposed example, the first is Normal and the second is BRCA1.
Under other parameters, select a Summ of 90th percentile as the way to summarize the probabilities of all the genes in a node.
In the Pathways section, check the All option to run the analysis on all pathways.
In the Job section, give the new job a name (e.g. Breast cancer study) and select the previously created bucket.
Submit the job (press the run button).
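Before submitting, it can help to confirm that a customized experimental design contains only the two condition labels typed into the form, in the expected numbers. The sketch below does this for the proposed example (7 Normal, 4 BRCA1). The tab-separated "sample, condition" layout, the file names, the strict case-sensitive matching, and the function names are assumptions for illustration, not documented behaviour of the tool.

```python
from collections import Counter


def count_conditions(design_text):
    """Count samples per condition in 'sample<TAB>condition' text."""
    counts = Counter()
    for line in design_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        _sample, condition = line.split("\t", 1)
        counts[condition.strip()] += 1
    return counts


def check_design(design_text, control, disease):
    """Return (n_control, n_disease); reject any other label."""
    counts = count_conditions(design_text)
    unexpected = set(counts) - {control, disease}
    if unexpected:
        raise ValueError(f"unexpected condition labels: {sorted(unexpected)}")
    return counts[control], counts[disease]


# Hypothetical file names standing in for the 11 CEL files of the example.
lines = [f"sample_{i:02d}.CEL\tNormal" for i in range(1, 8)]
lines += [f"sample_{i:02d}.CEL\tBRCA1" for i in range(8, 12)]
design_text = "\n".join(lines)

n_control, n_disease = check_design(design_text, "Normal", "BRCA1")
print(n_control, n_disease)
```

The check is deliberately strict about spelling and capitalization, so a label typed as normal instead of Normal is reported rather than silently ignored.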
Erlotinib sensitivity study
An approach proposed by several research groups during the past decade is to build genomic predictors of drug response from large panels of cancer cell lines. Analyses of these data show promise for improving our understanding of the mechanisms of action of drugs. Erlotinib targets the epidermal growth factor receptor (EGFR) tyrosine kinase, which is highly expressed and occasionally mutated in various forms of cancer. Erlotinib has shown a survival benefit in the treatment of lung cancer in phase III trials.
Erlotinib-screened bone cell lines were obtained from ArrayExpress (E-MTAB-783). Cell line drug sensitivity was measured as the concentration at which the drug inhibited 50% of cellular growth (IC50) and was obtained from the Genomics of Drug Sensitivity in Cancer study.
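To illustrate what an IC50 value represents, the sketch below estimates it from a dose-response series by interpolating, on a log-concentration scale, between the two doses that bracket 50% growth. In this study the IC50 values are taken as published, and dedicated curve fitting is normally used to derive them, so this is only a simplified illustration of the definition; the doses, growth fractions, and function name are hypothetical.

```python
import math


def ic50_by_interpolation(concentrations, growth_fractions):
    """Estimate the concentration at which growth falls to 50%.

    concentrations: increasing, positive doses.
    growth_fractions: growth relative to untreated cells (1.0 = no inhibition).
    Returns None when growth never crosses 50% in the tested range.
    """
    points = list(zip(concentrations, growth_fractions))
    for (c_lo, g_lo), (c_hi, g_hi) in zip(points, points[1:]):
        if g_lo >= 0.5 >= g_hi and g_lo != g_hi:
            # Fraction of the way from the lower to the higher dose.
            t = (g_lo - 0.5) / (g_lo - g_hi)
            log_c = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


# Hypothetical dose-response measurements (arbitrary concentration units).
doses = [0.01, 0.1, 1.0, 10.0]
growth = [0.95, 0.80, 0.40, 0.10]

ic50 = ic50_by_interpolation(doses, growth)
print(ic50)
```

A lower IC50 means the cell line is more sensitive to the drug, which is why it can serve as a continuous outcome in the experimental design.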
Log in to the web tool and create a new bucket. See the instructions for creating a new bucket in the data management section.
Upload the compressed CEL files and the experimental design to My WebDrive. See the instructions for uploading data in the data management section.
Click the Press to run PATHiPRED button to start the analysis.
Set Human as the species and Affymetrix Human Genome U133A 2.0 Array as the platform.
Select your compressed CEL files by clicking the Browse data button.
Select your experimental design file by clicking the Browse data button and choose the experimental design type (Continuous).
Under other parameters, select a Summ of 90th percentile as the way to summarize the probabilities of all the genes in a node, and select a K-fold of 10. Note that K-fold is the number of partitions used in the k-fold cross-validation.
In the Pathways section, check the All option to run the analysis on all pathways.
In the Job section, give the new job a name (e.g. Erlotinib sensitivity study) and select the previously created bucket.
Submit the job (press the run button).
A more detailed interpretation of the results can be found in the PATHiPRED result interpretation section.