Sampling, Sample Size and Power to Detect a Statistically Significant Difference


The selection of a sampling method and sample size is a critical step in the applied research process. The choice depends on factors such as the focus and characteristics of the study, the research population and the effect sizes of interest.

Recently, we were approached by a patron who was seeking to explore the prevalence of a rare disease in a study population and compare it to the already established disease prevalence in a super-population. The study population of a little under 10,000 individuals was relatively static and separated into strata of different sizes.

The questions that we were asked to address, while taking account of constrained financial resources, were:

  • What is the best method to establish the disease prevalence in the study population?
  • What is the minimum sample size to ensure at least 80% power of the test to detect a particular difference in prevalence between the study population and the super-population?

There were a couple of interesting complications that we had to consider. Firstly, the disease was most certainly not distributed in proportion to stratum size (Complication 1). Secondly, the medical test used was not one hundred percent accurate: its sensitivity was lower than its specificity, and both were below 1 (Complication 2).
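To make Complication 2 concrete, the following is a minimal Python sketch of how an imperfect test distorts the observed share of positives, and how that share can be mapped back to a prevalence with the standard Rogan-Gladen correction. The function names and the numerical values (1% prevalence, 0.90 sensitivity, 0.98 specificity) are hypothetical illustrations, not figures from the study, and the sketch assumes sensitivity and specificity are known exactly.

```python
def apparent_prevalence(true_prev, sensitivity, specificity):
    """Expected share of positive test results for a given true prevalence.

    True cases test positive with probability `sensitivity`;
    disease-free individuals test positive with probability 1 - specificity.
    """
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


def rogan_gladen(apparent_prev, sensitivity, specificity):
    """Correct an apparent prevalence for test error (Rogan-Gladen estimator).

    Assumes sensitivity and specificity are known without error.
    The result is clipped to the [0, 1] interval.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must be better than chance: Se + Sp > 1")
    estimate = (apparent_prev + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))


# Hypothetical values for illustration only.
observed = apparent_prevalence(true_prev=0.01, sensitivity=0.90, specificity=0.98)
corrected = rogan_gladen(observed, sensitivity=0.90, specificity=0.98)
print(f"apparent prevalence:  {observed:.4f}")
print(f"corrected prevalence: {corrected:.4f}")
```

With these hypothetical inputs the apparent prevalence is nearly three times the true one, because false positives among the many disease-free individuals outweigh the cases missed by the test.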

After some research and consultation, we reached the conclusion that the most appropriate sampling method would be to sample from each stratum in proportion to its size. This method would produce a sample that is representative of the study population and therefore would help address the possible variability of prevalence across strata (Complication 1).
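As an illustration of proportional allocation, the sketch below splits a total sample across strata in proportion to their sizes, using largest-remainder rounding so that the stratum samples add up exactly to the total. The stratum sizes and the total sample of 1,000 are hypothetical, and the rounding rule is one reasonable choice rather than the one necessarily used in the study.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Allocate `total_sample` units across strata in proportion to size.

    Uses largest-remainder rounding: take the integer part of each
    stratum's exact quota, then hand the leftover units to the strata
    with the largest fractional parts.
    """
    population = sum(stratum_sizes)
    quotas = [total_sample * size / population for size in stratum_sizes]
    allocation = [int(q) for q in quotas]
    shortfall = total_sample - sum(allocation)
    # Strata ordered by descending fractional remainder.
    order = sorted(range(len(quotas)),
                   key=lambda i: quotas[i] - allocation[i],
                   reverse=True)
    for i in order[:shortfall]:
        allocation[i] += 1
    return allocation


# Hypothetical strata summing to a little under 10,000 individuals.
sizes = [5000, 3000, 1500, 400]
print(proportional_allocation(sizes, total_sample=1000))  # [505, 303, 152, 40]
```

Every stratum is sampled at roughly the same rate, so each individual in the study population has approximately the same chance of selection regardless of stratum.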

To conduct sample size and power calculations, we developed a simulation experiment in Stata as an alternative to pre-existing theory-based sample size and power formulas. Our decision to pursue this avenue was driven primarily by the potential consequences of failing to address the inaccuracy of the medical test. Specifically, if ignored, Complication 2 could have caused an upward bias in the estimate of prevalence in the study population, because for a rare disease the false positives among the many disease-free individuals can outnumber the false negatives among the few true cases. This would have resulted in an inflated estimate of power, as seen from the graph below.
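The idea behind such a simulation can be sketched in a few lines. The Python code below is not the Stata experiment described above; it is a simplified illustration that ignores stratification and any finite population correction, assumes sensitivity and specificity are known exactly, and uses a two-sided one-sample z-test. The function name `simulate_power` and all parameter values are hypothetical. It compares a naive test, which treats the share of positive results as the prevalence, with an adjusted test, which compares that share against the share expected under the null hypothesis once test error is accounted for.

```python
import math
import random


def simulate_power(p0, p1, sensitivity, specificity, n,
                   reps=300, z_crit=1.96, seed=12345):
    """Estimate power by simulation for a naive and an adjusted z-test.

    p0: prevalence in the super-population (null value).
    p1: assumed true prevalence in the study population.
    Returns (naive_power, adjusted_power) as rejection frequencies.
    """
    rng = random.Random(seed)
    # Probability that a sampled individual tests positive.
    q1 = sensitivity * p1 + (1 - specificity) * (1 - p1)  # under p1
    q0 = sensitivity * p0 + (1 - specificity) * (1 - p0)  # under p0
    se_naive = math.sqrt(p0 * (1 - p0) / n)
    se_adjusted = math.sqrt(q0 * (1 - q0) / n)

    naive_rejections = 0
    adjusted_rejections = 0
    for _ in range(reps):
        positives = sum(rng.random() < q1 for _ in range(n))
        observed = positives / n
        # Naive: ignores test error and compares directly with p0.
        if abs(observed - p0) / se_naive > z_crit:
            naive_rejections += 1
        # Adjusted: compares with the positive share expected under p0.
        if abs(observed - q0) / se_adjusted > z_crit:
            adjusted_rejections += 1
    return naive_rejections / reps, adjusted_rejections / reps


# Hypothetical scenario: 1% in the super-population, 2% in the study population.
naive, adjusted = simulate_power(p0=0.01, p1=0.02,
                                 sensitivity=0.90, specificity=0.98, n=1000)
print(f"naive power:    {naive:.2f}")
print(f"adjusted power: {adjusted:.2f}")
```

In this hypothetical scenario the naive test rejects in almost every replication, largely because of false positives rather than the true difference in prevalence, while the adjusted test rejects far less often. Repeating the call over a grid of sample sizes gives a power curve from which the smallest sample size reaching 80% power can be read off.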