Great caution is needed when dealing with large sets of data because of the high probability of spurious associations. Many conventional testing strategies in gene expression analysis, genotyping, and related settings involve conducting an unusually high number of tests.
Addressing multiple testing properly is a rather complex problem. Many of the conventional correction methods (e.g. Bonferroni or Sidak) adjust for multiplicity by dividing a reasonable significance threshold (e.g. p < 0.05) by the number of tests performed, so that each individual test is assessed at a more stringent level. When many thousands of tests are performed, this approach risks being too conservative.
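As a minimal sketch of how such adjusted thresholds are computed (the choice of alpha = 0.05 and m = 10000 tests is purely illustrative):

```python
# Bonferroni and Sidak threshold adjustment for m simultaneous tests.
alpha = 0.05   # nominal significance level
m = 10000      # e.g. one test per gene (hypothetical count)

# Bonferroni: reject an individual test only if p < alpha / m.
bonferroni_threshold = alpha / m

# Sidak: exact family-wise control under independence of the tests.
sidak_threshold = 1 - (1 - alpha) ** (1 / m)

print(bonferroni_threshold)  # 5e-06
print(sidak_threshold)       # slightly less stringent than Bonferroni
```

With ten thousand tests the per-test threshold drops to 5e-06, which illustrates why these corrections become very conservative at genomic scale.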
In the context of multiple independent tests (one per gene, or one per annotation) it is instead considered more appropriate to control the proportion of errors among the identified functional terms whose differences among groups of genes or proteins cannot be attributed to chance. The expectation of this proportion is the False Discovery Rate (FDR). Different procedures offer strong control of the FDR under independence and certain specific types of positive dependence of the test statistics (Benjamini and Hochberg, 1995), while resampling-based procedures control error rates under arbitrary dependence of the test statistics (Westfall and Young, 1993).
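The Benjamini–Hochberg step-up procedure can be sketched as follows; the function name and the list of p-values are hypothetical, and the sketch assumes independent tests:

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return the indices of hypotheses rejected by the BH step-up
    procedure at target FDR level q (sketch, assuming independence)."""
    m = len(pvalues)
    # Sort p-values in increasing order, remembering original positions.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            k_max = rank
    # Reject all hypotheses with the k_max smallest p-values.
    return sorted(order[:k_max])

# Illustrative p-values (hypothetical data).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042,
         0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, q=0.05))  # [0, 1]
```

Note that a per-test Bonferroni threshold of 0.05/10 = 0.005 would reject only the first hypothesis here, whereas BH also rejects the second: controlling the FDR buys power relative to family-wise error control.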