CM7.{1,3} | CM7.{1,3} | Epidemiology Concepts and Data Sources — SDL Guide (Part 2)
Evaluating and Applying Data Sources
Choosing a data source is not merely a matter of availability: it requires evaluating whether the source can answer the specific epidemiological question with adequate completeness, validity, timeliness, and representativeness. These four criteria guide the evaluation.
Completeness refers to the proportion of events in the population that are actually captured by the system. India's CRS captures approximately 86–90% of births, but a much smaller proportion of deaths is registered with a medically certified cause, especially in rural areas where death registration and medical certification are incomplete. The SRS, though sample-based, achieves higher effective completeness through its dual-record reconciliation design.
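The idea of completeness under a dual-record design can be illustrated with a small calculation. The sketch below is an assumption-laden teaching example, not the SRS's actual procedure: the SRS reconciles its two records through field re-verification, whereas this code swaps in the textbook two-list (Chandrasekaran-Deming style) estimate, which assumes the two lists are independent. All counts and function names are hypothetical.

```python
def dual_record_estimate(matched: int, only_list_a: int, only_list_b: int) -> float:
    """Two-list estimate of total events, assuming the lists are independent."""
    if matched <= 0:
        raise ValueError("need at least one matched event")
    total_a = matched + only_list_a  # all events found by list A
    total_b = matched + only_list_b  # all events found by list B
    return total_a * total_b / matched


def completeness(captured: int, estimated_total: float) -> float:
    """Proportion of estimated events that a system actually captured."""
    if estimated_total <= 0:
        raise ValueError("estimated_total must be positive")
    return captured / estimated_total


# Hypothetical counts for one sample unit (not real SRS data).
matched, only_enumerator, only_survey = 180, 12, 8

estimated_births = dual_record_estimate(matched, only_enumerator, only_survey)
enumerator_completeness = completeness(matched + only_enumerator, estimated_births)

print(f"Estimated births: {estimated_births:.1f}")
print(f"Enumerator completeness: {enumerator_completeness:.1%}")
```

The point of the sketch is that neither list needs to be complete on its own: events missed by one record are partly recovered by the other, which is why a sample-based system can outperform a full-coverage register on completeness.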
Validity (accuracy) asks whether the information recorded is correct. Cause-of-death data from CRS is often of low validity in rural areas because deaths are certified by non-physicians or not medically certified at all. Verbal autopsy methods — structured interviews with family members — have been developed as a validity-improving alternative for uncertified deaths. NFHS data on height, weight, and haemoglobin is measured directly (high validity); self-reported morbidity has lower validity due to recall bias and lay-person disease labelling.
Timeliness is critical for outbreak detection and programme monitoring. Sentinel surveillance systems and the IDSP S-form (syndromic reporting by health workers at the community level) are designed for rapid (weekly or daily) reporting. NFHS rounds, by contrast, have been conducted at intervals of roughly 4 to 10 years, which makes them useful for secular trend analysis but not for real-time response.
Representativeness asks whether the source covers the population of interest. Hospital-based records capture only those who seek care, systematically excluding the poorest, most rural, and least-mobile segments of the population (health-seeking bias). Sample-based national systems (SRS, NFHS) are designed to be representative through stratified probability sampling; the NFHS uses a stratified multistage design.
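How stratified multistage sampling produces a representative sample can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the NFHS protocol: the sampling frame, stratum names, and function name are hypothetical, and villages are drawn by simple random sampling, whereas real surveys typically select first-stage units with probability proportional to size.

```python
import random


def two_stage_stratified_sample(frame, psus_per_stratum, households_per_psu, seed=0):
    """Draw villages within each stratum, then households within each village.

    frame maps stratum -> {village_id: [household_ids]}.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sample = {}
    for stratum, villages in frame.items():
        # Stage 1: select villages (primary sampling units) within the stratum.
        chosen = rng.sample(sorted(villages), k=min(psus_per_stratum, len(villages)))
        # Stage 2: select households within each chosen village.
        sample[stratum] = {
            village: rng.sample(
                villages[village], k=min(households_per_psu, len(villages[village]))
            )
            for village in chosen
        }
    return sample


# Hypothetical frame: 2 strata, 4 villages per stratum, 10 households per village.
frame = {
    stratum: {
        f"{stratum}-village-{v}": [f"{stratum}-v{v}-hh{h}" for h in range(10)]
        for v in range(4)
    }
    for stratum in ("rural", "urban")
}

sample = two_stage_stratified_sample(frame, psus_per_stratum=2, households_per_psu=3)
for stratum, villages in sample.items():
    print(stratum, {village: len(households) for village, households in villages.items()})
```

Because every stratum contributes villages and every selected village contributes households, no segment of the population is excluded by design, which is the contrast with hospital records drawn above.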
In practice, most epidemiological questions require triangulation, that is, combining multiple sources. For example, estimating India's true tuberculosis burden requires combining Nikshay notifications (officially diagnosed cases), mortality data from SRS/CRS (deaths attributable to TB), prevalence surveys (undiagnosed cases in the community), and mathematical modelling. No single source is sufficient.
When a student is asked 'which data source would you use to assess infant mortality trends across Indian states?', the answer is the SRS — because it is the only system with the sampling design, dual-record completeness correction, and state-level disaggregation needed for this comparison. The NFHS would be an acceptable alternative for a specific point-in-time estimate. Hospital records would be inappropriate (health-seeking bias). This type of applied matching — question to source to justification — is what CM7.3 demands.
CLINICAL PEARL
The IDSP 'S-form' catches outbreaks faster than laboratory confirmation. India's Integrated Disease Surveillance Programme uses three report types: the S-form (suspected cases; syndrome-based, reported weekly by health workers at the community/peripheral level) captures unusual clusters of fever, jaundice, or respiratory illness before aetiology is confirmed; the P-form (presumptive cases) is completed by medical officers at health facilities on the basis of clinical diagnosis; the L-form (laboratory) reports confirmed cases. In major outbreaks (Surat plague 1994, Nipah Kerala 2018), the earliest signal was an unusual cluster of illness recognised at the front line, not a laboratory result. As a doctor, reporting a cluster of unexplained deaths or febrile illness to the district surveillance officer, even before a cause is established, is an epidemiological obligation, not an optional courtesy.
SELF-CHECK
A health researcher wants to estimate the prevalence of hypertension in rural India with state-level disaggregation. Which data source is MOST appropriate?
A. Civil Registration System (CRS)
B. Hospital outpatient records
C. National Family Health Survey (NFHS)
D. Integrated Disease Surveillance Programme (IDSP) weekly reports
Answer: C. National Family Health Survey (NFHS)
The NFHS includes direct blood-pressure measurement in a nationally representative household sample with state-level disaggregation — making it the most appropriate source for estimating hypertension prevalence in the community. CRS records vital events (births/deaths), not chronic disease prevalence. Hospital records suffer from health-seeking bias and miss undiagnosed hypertension. IDSP is designed for communicable disease outbreak surveillance, not NCD prevalence estimation.
Applying Epidemiology to Public Health Practice
The translation from epidemiological concept to public health action is best understood through a worked scenario. Consider a district health officer in Rajasthan who receives a report: over the past three weeks, 47 cases of acute watery diarrhoea have been notified from a single block, compared to the usual 5–8 per week. She must answer five questions in sequence, each requiring a different epidemiological data source:
Step 1. Confirm the excess: Is 47 cases truly above baseline? She retrieves the past 12 months of IDSP P-form reports for that block to establish a seasonal baseline. Averaged over three weeks, 47 cases is about 16 per week, roughly two to three times the usual 5–8, which confirms an excess consistent with an epidemic.
Step 2. Define the population at risk and describe the pattern: Who is affected by person (age, sex, occupation), place (which villages, which water sources), and time (when did cases begin, and is the curve point-source or propagated)? She uses the IDSP S-form data and a spot map overlaid on a geographic information system.
Step 3. Generate hypotheses about cause: The place clustering around a borewell and the abrupt onset (point-source epidemic curve) suggest contaminated water. She draws on national sentinel surveillance data and laboratory guidelines from IDSP to structure sample collection.
Step 4. Test and confirm: Vibrio cholerae is confirmed in the laboratory from stool and water samples. She now uses the CRS to identify any deaths in the block during this period and to assess the case-fatality ratio (deaths among cases divided by total cases), bearing in mind that rural death registration is incomplete and may need cross-checking against field reports.
Step 5. Implement and evaluate control: Water chlorination, ORS distribution, and case isolation. She will monitor IDSP P-form reports weekly to confirm that the epidemic curve is falling.
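The counts behind Steps 1 and 4 reduce to two small calculations, sketched below. The sketch reads the 47 cases as a three-week total, as the scenario states, and compares the weekly average with the usual 5–8 per week. The scenario reports no deaths, so the death count used for the case-fatality ratio is a hypothetical placeholder; the function names are likewise illustrative.

```python
def weekly_excess_ratio(observed_cases, weeks, baseline_low, baseline_high):
    """Return (min_ratio, max_ratio) of observed weekly cases to the baseline range."""
    if weeks <= 0 or baseline_low <= 0 or baseline_high < baseline_low:
        raise ValueError("weeks and baseline bounds must be positive and ordered")
    observed_per_week = observed_cases / weeks
    # Dividing by the upper baseline gives the most conservative ratio.
    return observed_per_week / baseline_high, observed_per_week / baseline_low


def case_fatality_ratio(deaths_among_cases, total_cases):
    """Case-fatality ratio as a percentage; deaths must come from the counted cases."""
    if total_cases <= 0 or not 0 <= deaths_among_cases <= total_cases:
        raise ValueError("need total_cases > 0 and 0 <= deaths_among_cases <= total_cases")
    return 100 * deaths_among_cases / total_cases


# Figures from the scenario: 47 cases in 3 weeks against a usual 5-8 per week.
min_ratio, max_ratio = weekly_excess_ratio(47, weeks=3, baseline_low=5, baseline_high=8)

# Hypothetical: the scenario does not report deaths; 1 death is assumed for illustration.
cfr_percent = case_fatality_ratio(deaths_among_cases=1, total_cases=47)

print(f"Excess: {min_ratio:.1f}x to {max_ratio:.1f}x baseline; CFR: {cfr_percent:.1f}%")
```

Two design points are worth noting. Observed and baseline counts must cover the same unit of time before they are compared, and the numerator of the case-fatality ratio must be a subset of the denominator, so deaths found in the CRS have to be matched to notified cases rather than simply counted.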
This worked scenario demonstrates that epidemiology is not abstract theory: it is the operational logic that joins a data signal to a field action. The concepts (triad, person-place-time, natural history) provide the interpretive framework; the data sources (IDSP, SRS, CRS) provide the raw material; the principles (completeness, validity, timeliness, representativeness) determine which source is trustworthy at each step.
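The person-place-time description in Step 2 can be sketched from a line list. The records below are entirely hypothetical (the scenario supplies no case-level data), and the field and function names are assumptions chosen for illustration. The sketch tabulates cases by date of onset, which gives the raw counts for an epidemic curve, and by village, which gives the counts a spot map would display.

```python
from collections import Counter
from datetime import date

# Hypothetical line list: one record per notified case.
line_list = [
    {"onset": date(2024, 6, 3), "village": "Village A"},
    {"onset": date(2024, 6, 4), "village": "Village A"},
    {"onset": date(2024, 6, 4), "village": "Village A"},
    {"onset": date(2024, 6, 4), "village": "Village B"},
    {"onset": date(2024, 6, 5), "village": "Village A"},
    {"onset": date(2024, 6, 6), "village": "Village C"},
]


def epidemic_curve(cases):
    """Count cases per onset date, in chronological order (time)."""
    counts = Counter(case["onset"] for case in cases)
    return sorted(counts.items())


def cases_by_place(cases):
    """Count cases per village, highest count first (place)."""
    return Counter(case["village"] for case in cases).most_common()


for onset, n in epidemic_curve(line_list):
    print(onset.isoformat(), "#" * n)  # crude text histogram of the curve

print(cases_by_place(line_list))
```

A sharp single peak in the onset counts points towards a point-source exposure, while successive waves suggest propagated spread; clustering of counts in one village directs attention to a shared source such as a borewell.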