CM6.{1-2,4} | Statistical Data Workflow — SDL Guide (Part 3)
Applying the Statistical Data Workflow: A Community Example
To consolidate the workflow, consider a complete community medicine investigation from beginning to end.
Scenario: The Block Medical Officer of a tribal block in Odisha wants to assess the nutritional status of adolescent girls (10–19 years) in the block, which has 35 villages spread across a forested area with poor road connectivity.
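Step 1 — Research Question (CM6.1): Using an adapted PICO(T) frame: 'What is the prevalence of undernutrition (BMI-for-age z-score below −2 SD per the WHO growth reference) among adolescent girls aged 10–19 years in [Block Name], Odisha, in 2024?' Here P = adolescent girls aged 10–19 years, O = undernutrition, and T = 2024. A descriptive prevalence question has no intervention or comparison arm; the candidate exposures are classified in Step 2. The question is specific, measurable, achievable, and relevant to a district-level policy decision about the SABLA programme.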
Step 2 — Variable Classification (CM6.2, CM6.4): Primary outcome: BMI-for-age z-score (continuous, interval scale: z-scores can be negative and have no true zero, so ratios between them are not meaningful). Derived binary outcome: undernourished vs not undernourished (nominal, dichotomous). Exposure variables: dietary intake (ordinal: food frequency categories), water source (nominal: well/hand-pump/piped), presence of SABLA centre in village (nominal: yes/no).
Step 3 — Sampling (CM6.4): A complete census of all 35 villages might be feasible for a preliminary survey; however, with 12,000 adolescent girls in the block and limited staff, a stratified multistage sample is chosen. Villages are stratified into 'accessible' (within 10 km of a metalled road) and 'remote' (beyond 10 km). Ten villages are randomly selected from each stratum (stage 1, with villages as clusters). Within each selected village, all adolescent girls are listed from Anganwadi records and 30 are selected by simple random sampling (stage 2). Total n = 2 strata × 10 villages × 30 girls = 600 girls.
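The two-stage selection described in Step 3 can be sketched in Python. This is a minimal illustration, not the survey team's actual procedure: the sampling frame is hypothetical (the 18/17 split of villages between strata, the 340 girls per village, and all identifiers are assumptions made only so the example runs), and the fixed seed is there for reproducibility.

```python
import random

def two_stage_sample(strata, villages_per_stratum=10, girls_per_village=30, seed=2024):
    """strata maps stratum name -> {village name -> list of girl IDs}."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    sample = []
    for stratum, villages in strata.items():
        # Stage 1: simple random sample of villages within the stratum
        chosen = rng.sample(sorted(villages), villages_per_stratum)
        for village in chosen:
            # Stage 2: simple random sample of girls from the village list
            girls = rng.sample(villages[village], girls_per_village)
            sample.extend((stratum, village, girl) for girl in girls)
    return sample

def build_frame(n_villages, prefix, girls_each=340):
    """Hypothetical frame: village and girl IDs are placeholders, not real records."""
    return {
        f"{prefix}{v:02d}": [f"{prefix}{v:02d}-G{i:03d}" for i in range(girls_each)]
        for v in range(1, n_villages + 1)
    }

# Assumed split of the 35 villages: 18 accessible, 17 remote
frame = {"accessible": build_frame(18, "A"), "remote": build_frame(17, "R")}
sample = two_stage_sample(frame)
print(len(sample))  # 2 strata x 10 villages x 30 girls = 600
```

In a real frame, a village with fewer than 30 listed girls would make `rng.sample` raise `ValueError`, so the protocol would need a rule for small villages (for example, taking all listed girls).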
Step 4 — Data Organisation (CM6.2): Weight and height are measured and BMI-for-age z-scores computed. The data are entered into a frequency distribution table with non-overlapping class intervals for the BMI-for-age z-score (< −3, −3 to < −2, −2 to < −1, −1 to < 0, ≥ 0), and a histogram is drawn. Categorical variables are cross-tabulated (contingency table: nutritional status × SABLA access).
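As a rough sketch of how the frequency distribution table in Step 4 could be built, the snippet below assigns each z-score to a class interval and counts the results. It assumes half-open intervals in which a boundary value falls in the higher class (so exactly −3 goes to the second class), and the ten z-scores are invented toy values, not survey data.

```python
from collections import Counter

# Class labels in table order; CUTS are the upper (exclusive) limits of the first four
LABELS = ["< -3", "-3 to < -2", "-2 to < -1", "-1 to < 0", ">= 0"]
CUTS = [-3.0, -2.0, -1.0, 0.0]

def classify(z):
    """Return the class-interval label for one BMI-for-age z-score."""
    for cut, label in zip(CUTS, LABELS):
        if z < cut:
            return label
    return LABELS[-1]

def frequency_table(z_scores):
    """Return (label, count, percent) rows in class order."""
    counts = Counter(classify(z) for z in z_scores)
    n = len(z_scores)
    return [(label, counts[label], 100 * counts[label] / n) for label in LABELS]

# Toy values for illustration only
toy = [-3.4, -3.0, -2.5, -2.0, -1.6, -1.2, -0.4, 0.0, 0.7, -2.1]
table = frequency_table(toy)
for label, count, pct in table:
    print(f"{label:>10}  {count:>3}  {pct:5.1f}%")
```

Because every value lands in exactly one class, the counts always sum to the sample size, which is a quick check worth doing on any hand-built table.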
Step 5 — Descriptive Summary (CM6.4): Mean BMI-for-age z-score = −1.8 (SD 1.4). Median = −1.6. Prevalence of undernutrition (z-score < −2) = 38.5% (95% CI to be computed in SDL 2). Access to SABLA centre: 42% of sampled girls — will be tested for association in SDL 2.
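The summary measures in Step 5 could be computed along the following lines. This is a sketch using Python's standard `statistics` module; the five z-scores are toy values chosen for easy checking, and the function name `describe` is only illustrative. The last line converts the reported prevalence back to a count: 38.5% of 600 girls is 231 girls.

```python
import statistics

def describe(z_scores, cutoff=-2.0):
    """Mean, median, sample SD, and percent of z-scores below the cutoff."""
    n = len(z_scores)
    below = sum(1 for z in z_scores if z < cutoff)
    return {
        "n": n,
        "mean": statistics.mean(z_scores),
        "median": statistics.median(z_scores),
        "sd": statistics.stdev(z_scores),  # sample SD (n - 1 denominator)
        "prevalence_pct": 100 * below / n,
    }

# Toy values for illustration only, not survey data
summary = describe([-3.0, -2.5, -2.0, -1.5, -1.0])
print(summary)

# Reported prevalence expressed as a count of the 600 sampled girls
undernourished = round(0.385 * 600)
print(undernourished)  # 231
```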
Interpretation: The mean (−1.8) is slightly lower than the median (−1.6), suggesting mild negative skew: a tail of severely undernourished girls pulls the mean below the median, so the median better represents the typical girl. The BMO reports these descriptive statistics to the health committee along with the histogram, giving a transparent and reproducible summary that supports evidence-based programme planning.
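One way to put a rough number on the skew suggested by the mean and median is Pearson's second (median) skewness coefficient, 3 × (mean − median) / SD. The guide itself does not use this coefficient; it is brought in here only as an illustrative check using the summary values reported in Step 5.

```python
def pearson_median_skewness(mean, median, sd):
    """Pearson's second skewness coefficient: 3 * (mean - median) / SD."""
    return 3 * (mean - median) / sd

# Summary values reported in Step 5
skew = pearson_median_skewness(mean=-1.8, median=-1.6, sd=1.4)
print(f"{skew:.2f}")  # -0.43
```

A negative value means the mean lies below the median (a longer left tail), and a magnitude below 0.5 is conventionally read as only mild skew. The histogram remains the better guide to shape.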
SELF-CHECK
In a sample of 200 women, the mean haemoglobin is 10.8 g/dL with an SD of 2.1 g/dL. What is the Standard Error of the Mean (SE)?
A. 2.1 g/dL
B. 0.148 g/dL
C. 0.21 g/dL
D. 4.41 g/dL
Answer: B. 0.148 g/dL
SE = SD / √n = 2.1 / √200 = 2.1 / 14.14 ≈ 0.148 g/dL. The SD (2.1 g/dL) describes variability among individuals in the sample — how spread out individual haemoglobin values are. The SE (0.148 g/dL) is much smaller and describes the precision of the sample mean as an estimate of the population mean; it would be used to construct a 95% confidence interval (approximately mean ± 2 SE = 10.8 ± 0.296 g/dL). As sample size increases, SE decreases, but SD remains approximately the same — a key distinction tested in NMC papers.
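The arithmetic in this answer can be reproduced with a short Python sketch. The function names are illustrative, the multiplier of 2 follows the approximation used above (rather than 1.96), and the n = 800 case is added only to show how SE shrinks with sample size while SD is held fixed.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

def approx_ci95(mean, sd, n, multiplier=2.0):
    """Approximate 95% CI as mean +/- multiplier * SE."""
    se = standard_error(sd, n)
    return mean - multiplier * se, mean + multiplier * se

se_200 = standard_error(2.1, 200)
low, high = approx_ci95(10.8, 2.1, 200)
print(f"SE = {se_200:.3f} g/dL")                      # SE = 0.148 g/dL
print(f"Approx. 95% CI: {low:.2f} to {high:.2f} g/dL")  # 10.50 to 11.10 g/dL

# Quadrupling the sample size halves the SE (SD assumed unchanged)
se_800 = standard_error(2.1, 800)
print(f"SE at n = 800: {se_800:.3f} g/dL")
```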