Comparison of small-area deprivation measures as predictors of chronic disease burden in a low-income population

Background Measures of small-area deprivation may be valuable in geographically targeting limited resources to prevent, diagnose, and effectively manage chronic conditions in vulnerable populations. We developed a census-based small-area socioeconomic deprivation index specifically to predict chronic disease burden among publicly insured Medicaid recipients in South Carolina, a relatively poor state in the southern United States. We compared the predictive ability of the new index with that of four other small-area deprivation indicators.
Methods To derive the ZIP Code Tabulation Area-Level Palmetto Small-Area Deprivation Index (Palmetto SADI), we evaluated ten census variables across five socioeconomic deprivation domains, identifying the combination of census indicators most highly correlated with a set of five chronic disease conditions among South Carolina Medicaid enrollees. In separate validation studies, we used both logistic and spatial regression methods to assess the ability of Palmetto SADI to predict chronic disease burden among state Medicaid recipients relative to four alternative small-area socioeconomic deprivation measures: the Townsend index of material deprivation; a single-variable poverty indicator; and two small-area designations of health care resource deprivation, Primary Care Health Professional Shortage Area and Medically Underserved Area/Medically Underserved Population.
Results Palmetto SADI was the best predictor of chronic disease burden (presence of at least one condition and presence of two or more conditions) among state Medicaid recipients compared to all alternative deprivation measures tested.
Conclusions A low-cost, regionally optimized socioeconomic deprivation index, Palmetto SADI can be used to identify areas in South Carolina at high risk for chronic disease burden among Medicaid recipients and other low-income Medicaid-eligible populations for targeted prevention, screening, diagnosis, disease self-management, and care coordination activities.


Background
In the United States, persons with chronic conditions are overrepresented in Medicaid [1], a publicly funded social health insurance program for persons with low incomes and limited resources [2]. Policy and programming efforts to control spending and improve health outcomes among Medicaid enrollees must address the health care requirements of high-need, high-cost recipients with chronic diseases. Low-cost small-area assessment tools based on existing data may be especially valuable in geographically targeting limited resources to prevent, diagnose, and effectively manage chronic conditions in high-risk Medicaid populations.
Increasingly, small-area measures of social and material deprivation [3] are used to discern geographic patterns of morbidity [4,5] and mortality [6,7]. The use of these measures in health research is theoretically grounded in internationally recognized social determinants of health literature, which consistently identifies worse health outcomes in socioeconomically disadvantaged communities [8]. One such measure, the Townsend deprivation index, has been used widely in population health studies. Developed in the United Kingdom, this small-area deprivation measure consists of four census-based component indicators reflecting local levels of unemployment, home ownership, household crowding, and vehicle availability [9]. The Townsend deprivation index has been used to evaluate associations between community deprivation and such diverse health outcomes as bacteremic pneumonia [10], tuberculosis [5,11], sexually transmitted infections [5], infant mortality [7], and motor vehicle deaths [12]. Similarly, a single-variable poverty index (proportion of the population living below a designated poverty level) has been used extensively in studies exploring associations between community deprivation and poor health. Poverty rates have been employed, for instance, as neighborhood-level predictors of low birth weight [13], AIDS [14], tuberculosis [5,11], pneumonia [10], stroke mortality [15], and all-cause mortality [16]. Several investigators have noted worse health outcomes in areas lacking sufficient numbers of health care providers [17][18][19]. Two US Health Resources and Services Administration (HRSA) small-area health care resource deprivation designations, Primary Care Health Professional Shortage Area (PC-HPSA) and Medically Underserved Area/Medically Underserved Population (MUA/MUP) [20], thus also might prove useful in identifying US communities at risk for poor health.
Although the Townsend deprivation index, single-variable poverty index, and health care resource deprivation designations are used widely in health planning and evaluation, these measures may not be optimally suited for purposes of community health need assessment in all geographic regions or across diverse population groups. Indeed, a marked trend exists in the development of region/population-specific small-area deprivation indexes for health research. Since 2000, for example, deprivation measures have been constructed and applied in health studies in Quebec, Canada [21]; Verona, Northern Italy [22]; France [23,24]; Australia [25]; Puerto Rico [26]; Switzerland [27]; Denmark [28]; Sweden [29]; Nova Scotia, Canada [30]; and Quito City, Ecuador [31]. Six of these measures were introduced in just four years between 2012 and 2015 [26][27][28][29][30][31].
To our knowledge, no socioeconomic deprivation measure has been developed specifically for assessment of a Medicaid population in the United States. To facilitate health policy and programming, we developed a census-based small-area socioeconomic deprivation index optimized to predict chronic disease burden among Medicaid recipients in South Carolina, a largely impoverished Southern state where more than one in five residents are enrolled in the Medicaid system [32]. Based on the conceptual framework of Aday [33], this index measures community-level resource deprivation that puts low-income Medicaid enrollees and other vulnerable individuals at increased risk for poor health. Information derived from the index can help state agencies, health care providers, non-profit organizations, and community groups better target limited social, economic, and health care resources to improve population health (Fig. 1). In this paper we describe the construction of the new index, the Palmetto Small-Area Deprivation Index (Palmetto SADI); compare its ability to predict Medicaid population chronic disease burden with that of four alternative small-area deprivation measures; and identify its potential to strengthen chronic disease prevention, screening, diagnosis, self-management, and care coordination activities for at-risk populations. Our study illustrates the development of a region/population-specific, census-based small-area deprivation measure and shows that such an optimized index can outperform other widely employed deprivation indicators in predicting region/population-specific health outcomes.

Deprivation index construction
The US Census Bureau provides detailed population and housing data at multiple geographic levels. US census and survey data products are updated regularly and are available online at no cost, making them especially valuable to state and local health planners with limited financial resources. We sought to create a census-based index of socioeconomic deprivation to predict chronic disease burden among South Carolina Medicaid enrollees at the ZIP Code Tabulation Area (ZCTA) level. Census-defined ZCTAs are composed of whole census blocks and spatially approximate USPS five-digit ZIP Code mail delivery areas [34]. These small-area units have served as proxies for residential neighborhoods in previous health studies [11][35][36][37]. ZCTAs are appropriate units of analysis when, as in our case, residential address limitations (missing, incomplete, or invalid street address data) prevent the geolocation and evaluation of spatial data at finer scales (e.g., across census tracts or census block groups). There are 424 ZCTAs in South Carolina with an average population of about 10,800 persons [38].
Based on a literature review, we evaluated a range of Census 2000 population and housing indicators [39] for inclusion in the deprivation index (Table 1). We assessed two variables in each of five distinct socioeconomic domains: education (percentage of persons 25 years and older without a high school diploma, percentage of persons 16 to 19 years not enrolled in school and not a high school graduate); income (percentage of noninstitutionalized population below the federal poverty level, percentage of households with income less than $15,000); employment (percentage of persons 16 and older unemployed, percentage of persons 16 to 64 working part-time); social fragmentation (percentage of persons 15 and older unmarried or separated, percentage of families with own children under 18 years headed by a single female); and material deprivation (percentage of housing units that are renter-occupied, percentage of housing units with no vehicle available). These five domains have been identified previously as relevant dimensions of small-area socioeconomic deprivation and have been consistently operationalized by others using the same or similar census measures [4,9,40,41].
We evaluated chronic disease burden among South Carolina Medicaid recipients across five adverse chronic health conditions: cardiovascular disease (CVD); diabetes; end-stage renal disease (ESRD); hypertension; and obesity. These diagnostic categories are among the most common and costliest chronic conditions affecting South Carolina Medicaid enrollees. Chronic disease status for the state's approximately 1 million Medicaid recipients was determined using primary and secondary diagnosis codes contained in South Carolina Medicaid administrative data sets from fiscal year 2010 (July 2009 to June 2010) [42]. ZCTA-level prevalence rates per 1,000 Medicaid enrollees were calculated for each chronic condition (Table 1).
In developing the new socioeconomic deprivation index, we sought to minimize the total number of census-based predictor variables while maximizing correlation with ZCTA-level Medicaid chronic disease rates. We scaled each predictor ($X_i$) using a Z-score transformation to create a set of Z-score variables ($Z_i$) defined for $n_i$ observations $j = 1, \ldots, n_i$ based on the associated original variable:

$$Z_{ij} = \frac{X_{ij} - \bar{X}_i}{s_i}, \qquad s_i = \sqrt{\frac{1}{n_i - 1} \sum_{j=1}^{n_i} \left( X_{ij} - \bar{X}_i \right)^2},$$

where $\bar{X}_i$ is the sample mean and $s_i$ is the sample standard deviation of the $i$th predictor. This transformation ensures that each of the Z-score variables is standardized to have mean 0 and variance 1. We then calculated the mean correlation of each transformed variable across the set of five chronic condition prevalence rates. The single predictor with the highest mean correlation was the first component, $X_{i_1}$, included in the index. Thus, the best single-predictor index, $S_1\{X_{i_1}\}$, was defined as

$$S_1\{X_{i_1}\} = Z_{i_1}.$$

Additional variables were included only if the new measure represented a domain not yet in the index $S_k\{X_{i_1}, \ldots, X_{i_k}\}$ and only if including the new variable $X_{i_{k+1}}$ increased the resulting index's mean correlation with the set of five chronic conditions ($\mathrm{Cond}_c$, $c = 1, \ldots, 5$):

$$\frac{1}{5} \sum_{c=1}^{5} \mathrm{corr}\left( S_{k+1}, \mathrm{Cond}_c \right) > \frac{1}{5} \sum_{c=1}^{5} \mathrm{corr}\left( S_k, \mathrm{Cond}_c \right), \qquad S_{k+1} = S_k + Z_{i_{k+1}}.$$

In constructing the index we considered only ZCTAs with complete attribute data across all ten census variables evaluated and for which Medicaid chronic disease prevalence rates could be calculated (N = 392). Thus developed, the final deprivation index, Palmetto SADI, consisted of three component variables: percentage of persons 25 years and older without a high school diploma, percentage of noninstitutionalized persons below the federal poverty level, and percentage of housing units with no vehicle available. In a factor analysis of all predictors, the three variables comprising the new index loaded on a single factor. The component variable loading scores were nearly identical; we thus considered each of the components to be of equal weight in its contribution to the overall index score.
ZCTA-level index scores were derived by summing ZCTA-specific Z-scores for each component variable. Additive Z-score methods have been employed in the construction of other socioeconomic deprivation measures [24], including the widely known Townsend index. Had the factor analysis identified multiple factors or had the components loaded differentially, component variable weighting might have been indicated. That there was a single factor with similar loadings is consistent with the summative Z-score approach used.
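The selection and summation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names (`z_scores`, `build_index`), the variable names, and the six-area toy data are all hypothetical, and ties or missing data are not handled.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize a variable to mean 0 and sample variance 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def mean_corr(index, conditions):
    """Mean correlation of an index with every condition's prevalence rates."""
    return mean(pearson(index, rates) for rates in conditions.values())


def build_index(predictors, domains, conditions):
    """Greedy forward selection of Z-scored predictors (one per domain)."""
    z = {name: z_scores(vals) for name, vals in predictors.items()}
    n_areas = len(next(iter(z.values())))
    chosen, index, best = [], [0.0] * n_areas, float("-inf")
    while True:
        used = {domains[name] for name in chosen}
        trials = []
        for name, zvals in z.items():
            if domains[name] in used:  # domain already represented
                continue
            trial = [a + b for a, b in zip(index, zvals)]  # additive Z-score index
            trials.append((mean_corr(trial, conditions), name, trial))
        if not trials:
            break
        score, name, trial = max(trials, key=lambda t: t[0])
        if score <= best:  # stop when mean correlation no longer improves
            break
        chosen, index, best = chosen + [name], trial, score
    return chosen, index


# Hypothetical toy data for six areas (not taken from the study).
predictors = {
    "no_diploma": [1, 2, 3, 4, 5, 6],
    "poverty": [2, 1, 4, 3, 6, 5],
    "part_time": [3, 3, 1, 6, 2, 5],
}
domains = {"no_diploma": "education", "poverty": "income", "part_time": "employment"}
conditions = {"diabetes": [3, 3, 7, 7, 11, 11]}

chosen, index = build_index(predictors, domains, conditions)
print(chosen)
```

In this toy case the third candidate is rejected because adding it lowers the mean correlation, which mirrors the stopping rule described in the text.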
A number of alternative methods have been used to construct small-area socioeconomic deprivation measures [24,31]. We investigated the selection of deprivation index component variables using boosted regression methods based on regression forests. Boosted regression, or boosting, is a statistical learning algorithm that averages the results of large numbers of decision trees (forests) to derive predicted values. This data mining algorithm has proven valuable in wide-ranging health studies, including investigations of dengue transmission [43], gene expression [44], and complex epidemiologic interaction effects [45]. Using boosting methods, we estimated the relative influence of each of the ten socioeconomic covariates (two variables in five socioeconomic domains) in predictive models of each of the five chronic disease outcomes identified previously. Allowing for 20,000 possible models, we selected the three  [20]. ZCTAs with population centroids located within federally designated PC-HPSAs and/or MUAs/MUPs were classified accordingly. South Carolina Medicaid administrative data from fiscal year 2012 (July 2011 to June 2012) were used to identify chronic disease status for state Medicaid enrollees [48]. We first tested the capacity of Palmetto SADI to predict chronic disease burden among a random sample of Medicaid enrollees as measured across five selected conditions (CVD, diabetes, ESRD, hypertension, and obesity). Two chronic disease burden indicators-one reflecting the presence of at least one chronic condition and the other representing the presence of two or more conditions-were created for a random sample of 5,000 Medicaid recipients geocoded at the ZCTA level using recipient residential address data. Utilizing this sample, we performed logistic regression analyses to evaluate the ability of Palmetto SADI and four alternative measures of small-area deprivation to predict chronic disease burden among Medicaid enrollees based on their ZCTA of residence.
In these analyses Palmetto SADI, Townsend, and poverty were evaluated as continuous measures; PC-HPSA and MUA/MUP were modeled as binomial variables. We evaluated the performance of all models using the area under the Receiver Operating Characteristic curve (AUC). This statistic summarizes a model's discrimination, i.e., ability to correctly classify individuals' chronic disease status. AUC values close to 1 show near perfect discrimination. The model fit was evaluated using the corrected Akaike information criterion measure (AIC). This is a measure of the model's deviance or difference from a saturated (perfectly predicting) model. Lower values of AIC indicate a preferable model. Bootstrapping was used to estimate standard errors of the AUC and AIC values, which allowed assessment of significant differences across models; by this approach, we generated 199 random samples (with replacement) from the original data and reestimated each of the five models. Approximate standard errors were given by the standard deviation of results from the bootstrap samples. For example, the standard error of the observed area under the curve $AUC_o$ is

$$\widehat{SE}(AUC_o) = \sqrt{\frac{1}{B - 1} \sum_{i=1}^{B} \left( AUC_i - \overline{AUC} \right)^2},$$

where $AUC_i$ is the area under the curve of the model estimated from the $i$th bootstrap sample, $\overline{AUC}$ is the mean of the bootstrap values, and $B = 199$ is the number of bootstrap samples.

Next, we derived ZCTA-level total Medicaid population and chronic disease counts for each of the five chronic conditions represented in logistic regression analyses, based on georeferenced data for the entire Medicaid population (N = 1,024,034). We further derived two ZCTA-level chronic disease burden counts (presence of at least one chronic condition and presence of two or more conditions).
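The bootstrap standard error of the AUC described above can be illustrated with a short, standard-library Python sketch. It makes one simplifying assumption that is not in the paper: instead of refitting a logistic model in every bootstrap sample, it ranks individuals directly by the deprivation score, which gives the same AUC as a single-predictor logistic model with a positive slope. The function names and the seed are hypothetical.

```python
import random
from statistics import stdev


def auc(scores, labels):
    """Probability that a random case outranks a random non-case (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_se_auc(scores, labels, n_boot=199, seed=0):
    """Standard deviation of the AUC across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(scores)
    reps = []
    while len(reps) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC needs both cases and non-cases
            reps.append(auc([scores[i] for i in idx], ys))
    return stdev(reps)


# Hypothetical scores and disease indicators for eight individuals.
scores = [0.2, 1.1, -0.4, 0.9, 1.5, -1.0, 0.3, 0.8]
labels = [0, 1, 0, 0, 1, 0, 1, 1]
print(auc(scores, labels), bootstrap_se_auc(scores, labels))
```

Resamples that contain only cases or only non-cases are redrawn here, a pragmatic choice for tiny samples that would rarely matter with 5,000 records.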
We calculated odds ratios to assess associations between high socioeconomic deprivation as measured by Palmetto SADI, the Townsend index, and the poverty measure (top versus bottom quartile of each continuous deprivation measure distribution) and each of the seven chronic condition indicators (five single conditions, presence of any condition, presence of two or more conditions). Similarly, we calculated odds ratios to evaluate associations between two binomial measures of health care provider resource deprivation (PC-HPSA, MUA/MUP) and each of the seven chronic condition measures.
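As a worked illustration of the top-versus-bottom-quartile comparison, the following Python sketch computes an unadjusted odds ratio from case counts. The 95 % confidence interval uses the Woolf (log) method, which is an assumption on our part since the text does not say how intervals were obtained; the function names and counts are hypothetical.

```python
import math


def quartile_groups(scores):
    """Return indices of the bottom-quartile and top-quartile areas."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    q = len(scores) // 4
    return order[:q], order[-q:]


def odds_ratio(cases_hi, n_hi, cases_lo, n_lo):
    """Unadjusted odds ratio (high vs low deprivation) with a Woolf 95% CI."""
    a, b = cases_hi, n_hi - cases_hi  # high deprivation: cases, non-cases
    c, d = cases_lo, n_lo - cases_lo  # low deprivation: cases, non-cases
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - 1.96 * se_log)
    upper = math.exp(math.log(or_) + 1.96 * se_log)
    return or_, lower, upper


# Hypothetical counts: 30 of 100 enrollees with a condition in the top quartile,
# 15 of 100 in the bottom quartile.
print(odds_ratio(30, 100, 15, 100))
```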
We performed Ordinary Least Squares (OLS) and spatial regression analyses to further evaluate small-area deprivation measure associations with chronic disease burden at the ZCTA level, again based on georeferenced data for the entire Medicaid population. Chronic disease prevalence rates were calculated for five conditions (CVD, diabetes, ESRD, hypertension, and obesity). Two chronic disease burden prevalence rates (presence of at least one chronic condition and presence of two or more conditions) also were calculated. As in previous logistic regression analyses, Palmetto SADI and four alternative measures of small-area deprivation were modeled. Preliminary OLS regression analyses with spatial diagnostics (Moran's I) indicated statistically significant spatial autocorrelation in all models tested. Spatial regression models (spatial lag or spatial error models as indicated by Lagrange Multiplier test statistics) were employed to account for the spatial autocorrelation of modeled variables. Spatial regression results are reported. AIC and Schwarz Bayesian information criterion (BIC) values from spatial regressions were used to evaluate goodness of fit for each small-area deprivation model, with lower values indicating preferable models. To ensure greater prevalence rate stability and protect recipient confidentiality in mapped results, all ZCTA-level index validation analyses were restricted to ZCTAs with at least 30 Medicaid enrollees (N = 372). The operationalization of small-area deprivation measures for this set of ZCTAs is summarized in Table 2. Logistic regression modeling and bootstrapping procedures were performed using Stata software Version 12 [49]. OLS and spatial regressions were conducted using GeoDa version 1.6 [50]. All geoprocessing was performed using ESRI ArcGIS 10.2 [51].
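For readers unfamiliar with the spatial diagnostic mentioned above, the sketch below computes Moran's I for a set of area values (for example, OLS residuals) and a spatial weights matrix. It shows only the statistic itself, not GeoDa's implementation or its significance test; the four-area "chain" of neighbors and the function name are hypothetical.

```python
def morans_i(values, weights):
    """Moran's I for area values and a square spatial weights matrix.

    weights[i][j] > 0 when areas i and j are neighbors; the diagonal is 0.
    """
    n = len(values)
    m = sum(values) / n
    dev = [v - m for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


# Hypothetical layout: four areas in a row, each adjacent to its neighbors.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], chain))    # similar values cluster: positive I
print(morans_i([1, -1, 1, -1], chain))  # alternating values: negative I
```

Values well above the expectation under no autocorrelation, -1/(n - 1), suggest that neighboring areas resemble each other, which is the situation that motivates spatial lag or spatial error models.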

Results
Approximately 15 % of all South Carolina Medicaid recipients had at least one of the five chronic conditions considered in the construction of the deprivation index; nearly 6 % had two or more conditions. Figure 2 illustrates a clear association between observed rates of chronic disease burden (as indicated by the presence of at least one select chronic condition) among a random sample of Medicaid enrollees and the predicted probability of chronic disease burden based on ZCTA-level socioeconomic deprivation as measured by Palmetto SADI (observed rates are depicted as dots with associated 95 % confidence intervals; a curved line represents the predicted probability). In logistic regression analyses based on a random sample of 5,000 Medicaid recipients, Palmetto SADI was a better predictor of chronic disease burden (presence of at least one chronic condition and presence of two or more conditions) than the Townsend index, poverty measure, PC-HPSA designation, and MUA/MUP designation. The Palmetto SADI model had a significantly higher AUC (P < 0.001) and a significantly lower AIC (P < 0.001) compared to all four alternative models (Table 3). In separately performed age category analyses, Palmetto SADI was the best predictor of chronic disease burden (at least one chronic condition, two or more chronic conditions) in adult Medicaid recipients and the overall best predictor of chronic disease burden among child Medicaid beneficiaries as measured across three chronic conditions affecting children: asthma, diabetes, and obesity. In children, there was no statistical difference between the two best predictors of any chronic condition, Palmetto SADI and the Townsend index; nor was there any statistical difference between the two best predictors of comorbidity, Palmetto SADI and the poverty measure.
Unadjusted odds ratios indicated significantly higher levels of chronic disease in high- versus low-deprivation ZCTAs, regardless of the deprivation indicator used. For all chronic conditions but obesity, the observed odds ratios were highest when Palmetto SADI was used to identify high-deprivation areas. Likewise, odds ratios for both chronic disease burden indicators (at least one chronic condition, two or more chronic conditions) were highest when Palmetto SADI was used to identify high socioeconomic deprivation (Table 4).
Consistent with logistic regression results, spatial regression analyses identified Palmetto SADI as the best small-area deprivation predictor of chronic disease burden (at least one condition, two or more conditions)  (Table 5). Separate age category analyses showed Palmetto SADI was the best predictor of any chronic disease and multiple chronic conditions among adult Medicaid recipients. For child Medicaid beneficiaries, there was no substantial difference between the two best small-area deprivation measures, Palmetto SADI and the Townsend index, as predictors of childhood chronic disease burden. The lack of discrimination between these two deprivation indicators likely reflects the low prevalence of chronic disease measured among child enrollees. Figure 3 shows the geographic distribution of Palmetto SADI high deprivation ZCTAs (top quartile of ordered ZCTA-level Palmetto SADI scores) and high disease prevalence ZCTAs (top quartile of ordered ZCTA-level chronic disease burden rates, prevalence of at least one chronic condition) in South Carolina. Substantial spatial coincidence of high deprivation and high disease prevalence areas exists. If not geographically coincident, high disease prevalence areas typically adjoin Palmetto SADI high-deprivation areas.

Discussion
We found significantly higher levels of chronic disease in high- versus low-deprivation ZCTAs, regardless of the deprivation measure used, a result that is consistent with a growing international body of literature indicating higher rates of wide-ranging adverse health outcomes in resource-poor communities [4,5,8,11,29,30,52]. Notably, the highest odds ratios for chronic disease burden were associated with the Palmetto SADI operationalization of small-area socioeconomic deprivation. In both logistic and spatial regression analyses, the Palmetto SADI model was the best overall predictor of chronic disease burden (any condition and two or more conditions) among South Carolina Medicaid enrollees, compared with four alternative small-area deprivation models. Our results indicate the widely used Townsend index and single-variable poverty index are not always the best small-area deprivation measures by which to identify at-risk populations for targeted health interventions. Similarly, we found HRSA PC-HPSAs and MUAs/MUPs less predictive of chronic disease burden than Palmetto SADI, a finding in line with calls in the United States to revise HPSA and MUA designation criteria to better reflect population health care need, in addition to provider supply and demand [53]. The ability of Palmetto SADI to accurately identify areas of high chronic disease burden is of value to policy and decision makers responsible for the geographic allocation of limited health care resources. Resource allocation efficiency, however, also requires that the inaccurate identification of high burden areas by the index be minimized (i.e., the measure's false positive rate should be low). Utilizing a model-specific cutoff value to ensure equality of means, we calculated the false positive rates of Palmetto SADI and the four alternative deprivation measures in identifying areas of high chronic disease burden (presence of any condition).
Of the measures tested, Palmetto SADI had the lowest false positive rate (15.8 %); the Townsend index had the second lowest rate (17.6 %).
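The false positive rate referred to here is, in its usual definition, the share of areas without high disease burden that a measure nonetheless flags as high deprivation. The minimal Python sketch below assumes that definition and takes the flags as given; it does not reproduce the model-specific cutoff procedure, and the function name and example flags are hypothetical.

```python
def false_positive_rate(flagged, truly_high):
    """FP / (FP + TN): flagged areas among those without high disease burden."""
    fp = sum(1 for f, t in zip(flagged, truly_high) if f and not t)
    tn = sum(1 for f, t in zip(flagged, truly_high) if not f and not t)
    return fp / (fp + tn)


# Hypothetical flags for six areas: deprivation-based flag vs observed high burden.
flagged = [True, True, False, False, True, False]
truly_high = [True, False, False, False, True, False]
print(false_positive_rate(flagged, truly_high))  # 1 false positive, 3 true negatives
```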
Although small-area deprivation measures have proven useful in geospatial assessments of population health and health inequality, such measures are subject to criticism, particularly in terms of variable selection and index construction [6]. We based our initial selection of ten candidate variables on a review of relevant literature. All of the variables we considered as index components represent widely recognized socioeconomic deprivation domains [4,9,40,41]. Our decision to weight each of the component variables equally in an additive Z-score index was based on the results of a factor analysis in which all three variables loaded on a single factor with nearly identical loading scores. Our exploration of an alternative construction method failed to yield a superior index. Ultimately, the construction of a deprivation index must be consistent with clearly defined planning and policy goals [54]. With this guideline in mind, we developed Palmetto SADI specifically to identify areas of high chronic disease burden among South Carolina Medicaid recipients. The high predictive validity [47] of the derived index established in logistic and spatial  Beyond the recognition of conceptual and methodological challenges associated with the construction of any socioeconomic deprivation measure, several limitations specific to the development, validation, and application of Palmetto SADI should be identified. First, chronic disease status was determined using diagnostic codes in Medicaid administrative data sets. Administrative data are widely used in health studies and the validity of such data sets has been established [55]. More accurate information about individual recipient health status, however, might be derived from patient clinical records. Second, behavioral health disorders were not considered in the development of the index. Further research is needed to evaluate the ability of Palmetto SADI to predict such chronic behavioral conditions as ADHD and depression. 
Third, index validation analyses only included ZCTAs with 30 or more Medicaid enrollees. The ability of the new index relative to other deprivation measures to predict chronic disease burden in very small Medicaid population areas thus remains uncertain. Fourth, the ZCTA-level Palmetto SADI does not permit evaluation of chronic disease burden at finer geographic scales. Residential address quality issues (missing, incomplete, or invalid street address information) prevented us from georeferencing Medicaid recipients at census tract or census block group levels. More than 98% of recipients, however, could be geocoded at the ZCTA level. Caution should be exercised in the use of ZCTAs in health systems research, particularly because postal ZIP Codes and census ZCTAs do not always correspond, either in nominal or spatial terms [56]. In this study we minimized potential ZCTA-level geocoding errors by using street address data whenever available and by using both ZIP and ZIP-plus-4 centroid coordinate data when street address information was missing or incomplete. Lastly, the new index was constructed specifically to predict chronic disease burden among South Carolina Medicaid enrollees. Further research is needed to evaluate the utility of the index for this or similar analytic purposes in neighboring Southern states and other geographic regions. As indicated by specific policy or programming requirements, the methodology described might be used to construct census-based socioeconomic deprivation measures for both smaller (e.g., census tract, census block group) and larger (e.g., hospital referral region, county) areas. "Tailored" deprivation indexes [22] also might be created to predict chronic disease burden or other health conditions among different subpopulations (e.g., children, older adults, or women). 
As this study illustrates, user-derived, census-based small-area deprivation measures can outperform such widely employed deprivation indicators as the Townsend index and single-variable poverty measure in predicting region/population-specific health outcomes.
The development of Palmetto SADI is consistent with calls for better measures of social and health deprivation that permit the identification and reduction of health disparities across time and space [57] and that inform decisions regarding the geographic allocation of health resources [53]. The derivation of the new index parallels the construction of other recent region/population-specific small-area deprivation measures for health research [26][27][28][29][30][31]. Palmetto SADI is the first socioeconomic deprivation index developed specifically to inform policy and programming for a US Medicaid population. The new index can be introduced to public health and health care stakeholders in South Carolina as regionally relevant and straightforward in interpretation, thereby encouraging support for, and actual use of, the information tool. Palmetto SADI can be used to identify areas at high risk for chronic disease burden among Medicaid recipients and other Medicaid-eligible low-income populations for targeted prevention, screening, diagnosis, disease self-management, and care coordination activities. Our spatial visualization results suggest that in many instances such intervention efforts could appropriately be extended into areas immediately surrounding (adjacent to) high-deprivation neighborhoods. Geographically targeted interventions aimed at early diagnosis, appropriate disease management, and effective care coordination all can improve chronic disease outcomes and may yield health care cost savings by reducing patient emergency room visits, hospitalizations, hospital readmissions, and unnecessary prescription drug use [58,59]. Coordinated and continuous chronic disease management also may slow disease progression, allowing patients to maintain functional status [55] and thereby avoid or delay expensive long-term institutional care.
Decision making to prevent and more effectively manage chronic disease in vulnerable populations requires consideration of factors other than small-area socioeconomic deprivation. Palmetto SADI may be most valuable as a policy and program planning tool when combined with other small-area assessment strategies measuring such factors as healthy food availability [60], health care accessibility (remoteness) [25], health professional workforce supply [25], adequacy of health care provider education programs [61], health care utilization, and health care quality. The integration of Palmetto SADI with diverse data elements like these, especially in the context of a geographic information system (GIS), could strengthen efforts to locate at-risk populations, identify gaps between health need and available health care and other community resources, target program initiatives, and encourage stakeholder collaboration to promote population health and reduce health disparities over time and space.

Conclusions
As a predictor of chronic disease burden among South Carolina Medicaid recipients, Palmetto SADI outperformed all alternative small-area deprivation measures tested. Palmetto SADI can be used to identify areas in South Carolina at high risk for chronic disease burden among Medicaid recipients and other low-income Medicaid-eligible populations for targeted prevention, disease management, and care coordination activities.