association_analysis [2010/09/26 14:52]
dmontaner
association_analysis [2017/05/24 10:36] (current)
+====== Association tests ======

+The purpose of this set of tests is to study the association between genetic markers and phenotypes or traits. In general, the idea of population association studies is to identify patterns of polymorphisms that vary systematically between individuals with different disease states and could therefore represent the effects of risk-enhancing or protective alleles.
+
+The strength of the association between genotype and phenotype can be assessed with the different tests proposed in this section; which test to use depends mainly on the type of input data.
+\\  \\
+(More details are explained below this introduction for every specific test).
+
+===== Chi-square case/control =====
+
+
+The **chi-square** test statistic is designed **to test** the null hypothesis that there is no **association** between the rows and columns of a contingency table. For example, to determine whether there is an association between a particular SNP variant and a phenotype (case/control), one might collect data that can be assembled into a 2x2 table. In this case, the two columns are defined by whether the subject has the disease (case) or not (control), while the rows represent the two alleles of the SNP. The cells of the table contain the number of observations or patients as defined by these two variables.
+\\ \\
+For every SNP, a 2x2 contingency table is built by counting the number of times each allele of the SNP appears in case and in control samples. The test then checks whether the allele proportions differ between the two phenotype groups (case and control).
+\\ \\
+The statistic is calculated as the sum, over all cells of the table, of the squared difference between the observed and expected counts divided by the expected count. When the observed counts deviate significantly from the expected counts, it is unlikely that the null hypothesis is true, and likely that there is a row-column association. Conversely, a small chi-square value indicates that the observed values are similar to the expected values, leading us to conclude that the null hypothesis is plausible.
+
+In terms of p-values, a chi-square p-value of 0.05 or less is conventionally interpreted as justification for rejecting the null hypothesis that the row variable is unrelated to the column variable.
+\\ \\
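The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own implementation: the function name ''chi_square_2x2'' is hypothetical, no continuity correction is applied, and the p-value uses the fact that a 2x2 table has one degree of freedom.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows are alleles, columns are case/control. All row and column
    totals must be non-zero, otherwise an expected count is zero.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null hypothesis of no association.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For example, ''chi_square_2x2(10, 20, 30, 40)'' returns a small statistic and a p-value well above 0.05, so the null hypothesis of no association would not be rejected for that table.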
+
+**Example:**
+Observed values for data presented in a 2x2 contingency table (columns represent phenotype, rows genotype)
+\\
+^   ^ Case ^ Control ^ Total ^
+^ allele A | a | b | a+b |
+^ allele T | c | d | c+d |
+^ Total | a+c | b+d | n |
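
In the notation of this table, the expected count of a cell under the null hypothesis is its row total times its column total divided by n. For a 2x2 table the sum over the four cells reduces to a closed form (Pearson statistic without continuity correction, one degree of freedom); the expression is given here as a standard identity, not as a description of the tool's internals:

$$
E_{A,\mathrm{case}} = \frac{(a+b)(a+c)}{n}, \qquad
\chi^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E} = \frac{n\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}
$$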
+
+**HINT**: When there is a **small number of counts** in the table, the use of the **chi-square test statistic may not be appropriate**. Specifically, it has been recommended that this test not be used if any cell in the table has an expected count of less than one, or if more than 20 percent of the cells have an expected count of less than five. Under this scenario, //Fisher's exact// test is recommended for hypothesis testing.
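
A minimal sketch of how such a check could be automated, assuming the commonly cited rule that the chi-square test is avoided when any expected count is below one or when more than 20 percent of the cells have an expected count below five. The function names are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_is_appropriate(table):
    """Return False when the small-count rule suggests Fisher's exact test instead."""
    cells = [e for row in expected_counts(table) for e in row]
    if min(cells) < 1:
        return False
    # Fraction of cells whose expected count is below five.
    small = sum(1 for e in cells if e < 5)
    return small / len(cells) <= 0.20
```

With the table ''[[10, 20], [30, 40]]'' every expected count is at least 12, so the chi-square test is acceptable; with ''[[1, 2], [3, 4]]'' all expected counts are below five and the exact test would be preferred.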
+
+===== Fisher's exact test =====
+
+The purpose of this test ([[http://en.wikipedia.org/wiki/Fisher_exact_test|Fisher's exact test]]) is similar to that of the chi-square test: it studies the **association between genotype** and **disease trait** (phenotype) using contingency tables. Unlike the chi-square test, Fisher's exact test is used when the **sample sizes are small**.
+\\ \\
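As an illustration of what the exact test computes for a 2x2 table, the sketch below enumerates every table with the same row and column totals and sums the hypergeometric probabilities that are no larger than that of the observed table. The function name is hypothetical, and this two-sided definition is one common convention, assumed here rather than taken from the tool.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all margins are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so ties with the observed table are included.
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, total)
```

Because the probabilities are computed exactly from the margins, no minimum expected count is required, which is why the test suits small samples.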
+We provide p-values and adjusted (corrected) p-values to assess significance when testing the null hypothesis that there is no association between the variables...
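
The text does not say which correction is used for the adjusted p-values. Purely as an assumption for illustration, the sketch below shows two widely used adjustments when one test is run per SNP: Bonferroni and Benjamini-Hochberg (false discovery rate). The function names are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Bonferroni controls the chance of any false positive and is the more conservative of the two; Benjamini-Hochberg controls the expected proportion of false positives among the SNPs reported as significant.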
+
+In the results file ...
+
+
+===== Linear Model =====
+
+The **linear model** allows for multiple covariates when testing for both **quantitative trait** and disease trait SNP association, and for interactions with those covariates. The **covariates** can be either **continuous** or binary; a categorical covariate must first be converted into a set of binary dummy variables. \\
+\\
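
A minimal sketch of the two steps described above, under stated assumptions: a categorical covariate is turned into binary dummy variables (dropping one reference level), and a quantitative trait is then regressed on the genotype plus covariates by ordinary least squares. The function names, the additive 0/1/2 genotype coding and the normal-equations solver are illustrative choices, not the tool's implementation.

```python
def dummy_encode(values):
    """One binary column per level, omitting the first (reference) level."""
    levels = sorted(set(values))
    return [[1 if v == level else 0 for level in levels[1:]] for v in values]

def ols(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] for rows X and trait y.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting; the design matrix must have full column rank.
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * p for x, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]
```

Each row of ''X'' would hold the genotype code of one individual followed by its covariates, for example ''[genotype, age] + site_dummies''; the coefficient on the genotype column is the estimated SNP effect adjusted for the covariates.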
+
+===== Logistic test =====
+
+Like the linear model, the **logistic model** allows for multiple covariates. It is the appropriate choice when the **phenotype is binary** (a case/control disease trait) rather than quantitative.
+\\ \\
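
To make the logistic model concrete, the sketch below fits a logistic regression of case/control status (coded 1/0) on a genotype column plus optional covariates using Newton-Raphson. It is an illustration under stated assumptions: the function names are hypothetical, the iteration count is fixed, and no safeguards are included for perfectly separated data.

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * p for x, p in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

def logistic_fit(X, y, iterations=25):
    """Maximum-likelihood coefficients [intercept, b1, ...] for a 0/1 outcome y."""
    rows = [[1.0] + [float(v) for v in row] for row in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        # Predicted probability of being a case for each individual.
        probs = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
                 for r in rows]
        gradient = [sum((yi - p) * r[j] for r, yi, p in zip(rows, y, probs))
                    for j in range(k)]
        hessian = [[sum(p * (1 - p) * r[i] * r[j] for r, p in zip(rows, probs))
                    for j in range(k)] for i in range(k)]
        # Newton-Raphson update: beta += (X'WX)^-1 X'(y - p).
        step = solve(hessian, gradient)
        beta = [b + s for b, s in zip(beta, step)]
    return beta
```

The exponential of a fitted coefficient is the odds ratio for a one-unit increase in that variable, so with a single binary predictor the result agrees with the odds ratio computed directly from the 2x2 table.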