The National Family Health Survey 2015–2016
We used the fourth round of the NFHS from 2015 to 2016 to conduct this study. Households were defined as a group of individuals who normally live together and take their meals from a common kitchen. Overall, this survey used a stratified two-stage sampling frame (states, and urban/rural areas within states) to select households and participants [22]. More specifically, households were selected from primary sampling units, defined as groups of adjacent households, which were villages in rural areas and census enumeration blocks in urban areas. We collectively refer to these village and census enumeration blocks as communities in this paper. As such, this dataset contains data from each of India’s 36 states/union territories, all 640 districts, 28,522 out of over 650,000 communities, 601,509 households, and 699,686 women between the ages of 15–49. For the purposes of this paper, the survey included data on a total of 259,627 children from 180,227 households.
Variables and sample sizes
The outcomes included in our analysis were child height-for-age Z score (HAZ), weight-for-height Z score (WHZ), weight-for-age Z score (WAZ), standardized hemoglobin measures (HB), and birthweight (measured in grams). We also included stunting (<− 2 SD HAZ), wasting (<− 2 SD WHZ), underweight (<− 2 SD WAZ), anemia (< 11.0 g/dL), and low birthweight (< 2500 g). The dataset had complete HAZ, WAZ, and WHZ data for 225,002 children from 164,664 households. The dataset had complete HB data for 209,496 children from 157,746 households, and birthweight data for 184,852 children from 140,572 households. Sample sizes are fully described in Fig. 1 below. The number of states/union territories [23] and districts (640) stayed the same in all the analyses hence they are not reported in the figure.
Covariates included 19 risk factors known to be associated with child malnutrition [7, 24, 25]. We included three different nutrition variables, early breastfeeding initiation, vitamin A supplementation (only asked in reference to children between 6 and 59 months), and the use of iodized salt. Each of these three variables was dichotomized yes/no. We included four environmental variables, household access to improved sanitation, household use of clean cooking fuel, safe child stool disposal, and household access to improved drinking water. These variables were also dichotomized yes/no. A total of six health coverage variables were included. These were whether the child experienced an infectious disease in the past two weeks, complete child vaccinations, presence of skilled birth attendant at birth, provision of oral rehydration therapy after diarrhea, care seeking for pneumonia, and whether family planning needs were met. All of these variables were dichotomized yes/no. Household wealth, mother’s level of education, and child’s birth order were included as socioeconomic risk factors. Households were dichotomized by either being in the poorest wealth quintile, or in all other quintiles. Mother’s education was dichotomized based on whether the mother had received no education or at least primary education. We dichotomized birth order as before or after the 6th birth. Finally, we included three maternal characteristic risk factors which were mother’s age at marriage, maternal height, and maternal body mass index (BMI). Age at marriage was dichotomized as above or below 18. Mother’s height was dichotomized as above and below 145 cm, and mother’s BMI was dichotomized as above and below 18.5 kg/m2.
After performing listwise deletion for all missing observations for the above covariates, our fully adjusted sample sizes were 204,979 children with HAZ, WAZ, and WHZ outcome data, 190,227 children with HB outcome data, and 167,838 children with birthweight outcome data.
Statistical analysis
Our statistical analysis consisted of a series of four- and five- level models in order to compare the addition of households as a level of analysis. Additionally, each set of models included an ‘unadjusted’ and ‘adjusted’ model. The unadjusted models contained only child age (in months) and sex, while the fully adjusted models contained all 19 risk factors outlined above. Further, our analysis was composed of primary and secondary analyses. The primary analysis included the full sample of households, with varying numbers of children per household, whereas the secondary analysis focused on a restricted subsample composed of only those households with more than one child. This approach was taken in order to examine whether household variance estimates were attenuated by between-child estimates in households with more than one child.
Continuous outcomes
We used multilevel modeling to decompose the proportion of variation in our continuous outcomes – HAZ, WHZ, WAZ, HB, and birthweight – attributable to children at level one, nested in urban/rural communities at level two, districts at level three, and states at level four. Multilevel modeling is a statistical methodology commonly used in the field of public health to elucidate the effects of both compositional and contextual factors on health [26,27,28]. For this four-level model, we estimated equation (1), which took the basic form Yijkl = β0 + β1Xijkl + (e0ijkl + u0jkl + v0kl + f0l) where Yijkl is one of the outcomes for child i nested in community j, district k, and state l. In this model, Xijkl is a vector of covariates, and the random effects e0ijkl, u0jkl, v0kl, and f0l are the residual differentials for children, communities, districts, and states, respectively. We then included households such that we decomposed the proportion of variation in the same continuous outcomes attributable to children at level one, nested in households at level two, communities at level three, districts at level four, and states at level five. For this five-level model, we estimated equation (2), which took the basic form Yijklm = β0 + β1Xijkl. + (e0iyjkl + h0yjkl + u0jkl + v0kl + f0l) where Yiyjkl is one of the outcomes for child i nested in household y, community j, district k, and state l. In this model, Xiyjkl is a vector of covariates, and the random effects e0iyjkl, h0yjkl, u0jkl, v0kl, and f0l are the residual differentials for children, households, communities, districts, and states, respectively. In both the four-level and five-level models, the residual differentials for children, households, communities, districts, and states are assumed to be normally distributed with a mean of 0 and a variance of \({\sigma}_{e0}^2\), \({\sigma}_{h0}^2\), \({\sigma}_{u0}^2\), \({\sigma}_{v0}^2\), and \({\sigma}_{f0}^2\), respectively. The variances in the four and five-level models are the parameters of interest and signify the between-child (\({\sigma}_{e0}^2\Big)\), between-household (\({\sigma}_{h0}^2\)), between-community (\({\sigma}_{u0}^2\)), between-district (\({\sigma}_{v0}^2\Big)\), and between-state (\({\sigma}_{f0}^2\)) variations in child i experiencing the outcome. Finally, we repeated the five-level analysis for a subsample of households with more than one child.
Binary outcomes
For the binary outcomes stunting, wasting, underweight, anemia, and low birthweight, we estimated four level models for the probability of a child i, in community j, in district k, in state l experiencing the outcome Yijkl = 1 as equation (3) logit(πijkl) = β0 + β1Xijkl + (u0jkl + v0kl + f0l), where πijkl is the log odds of the outcome in child i, Xijkl is a vector of covariates, and the random effects are the residual differentials for communities (u0jkl), districts (v0kl), and states (f0l). We then added households in order to decompose the proportion of variation in the same binary outcomes attributable to children at level one, nested in households at level two, communities at level three, districts at level four, and states at level five. For this five-level model, we estimated the probability of child i, in household y, in community j, in district k, in state l experiencing the outcome Yiyjkl = 1 as equation (4) logit(πiyjkl) = β0 + β1Xiyjkl + (hyjkl + u0jkl + v0kl + f0l). In this case, the random effects hyjkl, u0jkl, v0kl, and f0l are the residual differentials for households, communities, districts, and states, respectively. The same assumptions and parameter definitions used for equations (1) and (2) were applied to equations (3) and (4). However, the variance of the lowest levels cannot be estimated when considering binary outcomes. As such, the child-level random effect and variance is not shown. Again, we repeated the five-level analysis for households with more than one child.
We used the Monte Carlo Markov Chain (MCMC) method with a burn-in of 5000 cycles monitored over 50,000 chain iterations in MLwiN 3.05 software to conduct the analysis and produce the estimates for the continuous and binary outcomes in the four and five level models [29, 30].