CM6.{1-2,4} | Statistical Data Workflow: Summary & Reflection

KEY TAKEAWAYS

This module has walked you through the complete statistical data workflow as practised in community medicine:

Research question formulation (CM6.1): A SMART, PICO-structured research question is the starting point of every investigation — it determines design, variables, and analysis before any data are collected.

Types of data (CM6.2, CM6.4): Variables are classified by scale (nominal → ordinal → interval → ratio) and discreteness (discrete vs continuous). Scale of measurement determines which statistics and tests are valid.

Sampling techniques (CM6.4): When a census is impractical, probability sampling gives every unit a known, non-zero chance of selection, which supports representativeness and allows sampling error to be estimated. The five techniques (simple random, stratified, systematic, cluster, and multistage) are matched to the research context, with multistage sampling most common in large national surveys.
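To make three of these techniques concrete, here is a minimal Python sketch of simple random, systematic, and proportionally allocated stratified sampling. The sampling frame of 1,000 numbered units, the strata named "A" and "B", the sample size of 50, and all function names are assumptions made up for illustration; they are not taken from the module's data.

```python
import random


def simple_random_sample(frame, n, rng):
    """Draw n units without replacement; every unit is equally likely."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Pick a random start, then take every k-th unit from the frame."""
    k = len(frame) // n          # sampling interval
    start = rng.randrange(k)     # random start within the first interval
    return [frame[start + i * k] for i in range(n)]


def stratified_sample(strata, n, rng):
    """Simple random sample within each stratum, proportional to its size.

    Rounding can make the total differ slightly from n when the
    proportions are not whole numbers.
    """
    total = sum(len(units) for units in strata.values())
    return {
        name: rng.sample(units, round(n * len(units) / total))
        for name, units in strata.items()
    }


# Hypothetical sampling frame: units numbered 1 to 1000 (illustration only).
frame = list(range(1, 1001))
# Hypothetical strata: 600 units in stratum A, 400 in stratum B.
strata = {"A": frame[:600], "B": frame[600:]}

rng = random.Random(2024)        # fixed seed so the draw is repeatable

srs = simple_random_sample(frame, 50, rng)
sys_sample = systematic_sample(frame, 50, rng)
strat = stratified_sample(strata, 50, rng)

print(len(srs), len(sys_sample), {name: len(s) for name, s in strat.items()})
```

In this sketch, stratified sampling guarantees that each stratum contributes in proportion to its size, whereas simple random sampling only achieves that on average. Cluster and multistage sampling are left out for brevity.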

Classification and presentation (CM6.2): Raw data are organised into frequency distributions and presented as histograms (continuous), bar charts (discrete/categorical), pie charts (proportions), or ogives (cumulative frequencies). Histogram vs bar chart is a high-yield distinction: histogram bars touch because the underlying scale is continuous, whereas bar chart bars are separated by gaps.

Descriptive statistics (CM6.4): Central tendency (mean for symmetric data, median for skewed/ordinal) and dispersion (SD for spread of individuals, SE for precision of the mean, CV for relative variability, IQR for skewed data) together characterise a distribution. The SD–SE distinction is clinically and statistically critical.
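The measures named above can be computed in a few lines. The following is a minimal sketch using Python's standard library. The five birth weights are invented for illustration and the function name `describe` is hypothetical. The standard relationships used are SE = SD / √n and CV = (SD / mean) × 100.

```python
import math
import statistics


def describe(values):
    """Return central tendency and dispersion measures for a sample."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)                     # sample SD (n - 1 denominator)
    q1, _q2, q3 = statistics.quantiles(values, n=4)   # quartile cut points
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,                        # spread of individual values
        "se": sd / math.sqrt(n),         # precision of the sample mean
        "cv_percent": sd / mean * 100,   # variability relative to the mean
        "iqr": q3 - q1,                  # spread of the middle half of the data
    }


# Hypothetical birth weights in grams (illustration only); one very low value.
weights_g = [1200.0, 2900.0, 3000.0, 3100.0, 3200.0]

summary = describe(weights_g)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

With these illustrative values the mean (2680 g) falls below the median (3000 g), because the single low weight pulls the mean down while leaving the median unchanged. Note also that SE is always smaller than SD for n > 1, which is why reporting SE in place of SD understates how much individuals vary.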

In the next module, these descriptive skills feed directly into hypothesis testing, probability distributions, and tests of significance — the inferential engine of community medicine research.

REFLECT

Return to the district civil surgeon's challenge in the hook: she has data from 200 households in a district of 40,000 families. Using the concepts from this module, consider:

Which sampling method would give the most representative 200 households from a district spanning both urban and rural areas?

If the neonatal weight data are skewed because of a few severely growth-restricted infants, should she report the mean or median birth weight to the health committee, and why?

If she wants to compare variability in neonatal weight across two talukas that have different mean weights, which measure of dispersion is most appropriate?

Jot down your answers, then discuss with a classmate: do you agree on all three choices? If not, what is the reasoning behind each position?