A Model of Disparities: Clinical, Environmental, and Sociodemographic Risk Factors Associated with Likelihood of COVID-19 Contraction


 Background

By mid-May 2020, there were over 1.5 million cases of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, or COVID-19, across the U.S., with new confirmed cases continuing to rise following the re-opening of most states. Prior studies have focused mainly on clinical risk factors associated with serious illness and mortality from COVID-19. Emerging risk factors in the U.S., including clinical, sociodemographic, and environmental variables associated with contraction of COVID-19, have not been widely studied to assess disparities across populations.
Methods

A multivariable statistical model was used to identify predictors associated with COVID-19 contraction in the study population of 34,503 patients, comparing laboratory confirmed positive and negative COVID-19 cases in the Providence Health System (U.S.) between February 28 and April 27, 2020. Publicly available data were utilized as approximations for social determinants of health, and patient-level clinical and sociodemographic factors were extracted from the electronic medical record.
Results

Higher risk of contraction was associated with older age (OR 1.69; 95% CI 1.41–2.02, p < 0.0001), male gender (OR 1.32; 95% CI 1.21–1.44, p < 0.0001), Asian race (OR 1.43; 95% CI 1.18–1.72, p = 0.0002), Black/African American race (OR 1.51; 95% CI 1.25–1.83, p < 0.0001), Latino ethnicity (OR 2.07; 95% CI 1.77–2.41, p < 0.0001), non-English language (OR 2.09; 95% CI 1.7–2.57, p < 0.0001), high school education or less (OR 1.02; 95% CI 1.01–1.14, p = 0.04), residing in a neighborhood with financial insecurity (OR 1.10; 95% CI 1.01–1.25, p = 0.04), low air quality (OR 1.01; 95% CI 1.0–1.04, p = 0.05), housing insecurity (OR 1.32; 95% CI 1.16–1.5, p < 0.0001) or transportation insecurity (OR 1.11; 95% CI 1.02–1.23, p = 0.03), and living in senior living communities (OR 1.69; 95% CI 1.23–2.32, p = 0.001).
Conclusions

Risks associated with COVID-19 contraction reflect disparities across age, race, ethnicity, language, socioeconomic status, and living conditions. Health promotion and disease prevention strategies should prioritize groups most vulnerable to contraction and address structural inequities that contribute to risk through social and economic policy.

maintain health and prevent disease. 9 Determinants of contraction for COVID-19, including employment, education level, income, and housing conditions, which could influence the ability to practice physical distancing measures and/or shelter in place, remain understudied, especially among communities of lower socioeconomic status. 10 Social determinants can directly impact the rates of infectious disease mortality, as poverty and inadequate housing can influence the burden of disease, including who becomes infected and who responds to treatment. 9 As deaths continue to rise, new data are emerging in the U.S. that show sociodemographic factors and other social determinants of health are critical to understanding the unequal burden of infection and attributed mortality due to COVID-19. Communities of color and/or low socioeconomic status are experiencing disproportionate rates of serious illness if infected, due to preexisting economic and health inequities. 11,12 Pursuing rigorous models that address the risk of bias found in prior models 1 and incorporating existing and emerging findings is critical to slow the transmission of COVID-19 in the communities most susceptible to contraction.
With limited publicly available data, healthcare systems are key contributors to understanding all relevant patient and population level characteristics to inform population health strategies. As new findings emerge, more scientific studies are needed to demonstrate the significance of disparities in risk of contraction between populations, distinct from mortality risk, and their relevance to persistent health disparities across race, ethnicity, socioeconomic status, language, age, and geography. 13 Public health approaches, including biologic, behavioral, political and structural interventions, should account for the sociocultural influences and structural mechanisms that contribute to increased risk of contraction when developing targeted public health practices and policies at the community and national level.

Study Design and Setting
This study was conducted at Providence, the third largest not-for-profit health system in the U.S., serving more than five million people across seven states in the Western and Southwestern U.S.

Data Source
Data were collected from the Providence enterprise data warehouse, including patient demographic, social, and behavioral history information, chronic conditions documented in clinical history, current conditions, prescribed medications, laboratory testing, and acute and ambulatory healthcare utilization. Recognizing that social determinants of health account for a larger portion of health outcomes than medical care determinants, 14 electronic medical record (EMR) data were supplemented with publicly available data, including the U.S. Census Bureau's 2018 American Community Survey (ACS) and CDC air quality data. Patient addresses were geocoded, and social determinant of health information, at the census block group or tract level, was matched at the patient level.

Participants and Procedures
Patients residing in Alaska, Washington, Oregon, Montana, and California (Los Angeles and parts of Orange County) who were tested for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection between February 28, 2020 and April 27, 2020 were included in the modeling. Testing mechanisms included swabs from respiratory specimens appropriate for viral RNA testing from eight testing platforms.

Outcomes and Predictors
The principal dependent variable for our model was COVID-19 contraction, as indicated by a positive lab test. Relative risk of contraction was calculated as the ratio of each individual's predicted probability to the population mean predicted probability. The resulting score expresses a patient's risk relative to the average population risk.
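The relative risk score described above can be illustrated with a minimal sketch. The function name and the example probabilities are hypothetical and are not taken from the study; the sketch only assumes that each patient already has a predicted probability from the model.

```python
from statistics import mean


def relative_risk_scores(predicted_probabilities):
    """Divide each patient's predicted probability by the population mean."""
    population_mean = mean(predicted_probabilities)
    if population_mean <= 0:
        raise ValueError("population mean probability must be positive")
    return [p / population_mean for p in predicted_probabilities]


# Hypothetical predicted probabilities for four patients (not study data).
example_probabilities = [0.02, 0.05, 0.10, 0.13]
example_scores = relative_risk_scores(example_probabilities)

# A score above 1 means higher-than-average predicted risk;
# a score below 1 means lower-than-average predicted risk.
for probability, score in zip(example_probabilities, example_scores):
    print(f"p={probability:.2f} -> relative risk {score:.2f}")
```

By construction the scores average to 1 across the population, so a score of 1.7 reads as roughly 70% above the average predicted risk.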
The examined risk factors were informed by a comprehensive review of prior scientific studies that documented the risk factors of mortality and the CDC list of groups at higher risk for severe illness. 8 Distributions of all continuous variables, including age, BMI, number of medications, and neighborhood financial insecurity, were examined for normality and transformed into categorical attributes. Comorbidities were determined by problem list documentation or clinical encounter diagnoses using standard International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) nomenclature and further summarized into a measure of disease severity using total number of chronic conditions. Substance, tobacco, and alcohol consumption were captured from social history assessments and clinician documentation. We added covariates derived from patient level data to the model to act as proxies for experiencing prolonged close physical proximity to others. These covariates include transportation insecurity, relationship status, employment, housing insecurity, and age-stratified communal living. To assess disparities in contraction, age, gender, race, ethnicity, and language were added to the model. We used Glottolog, a repository for the world's languages, to assign language groups. Geographic regions and clinical symptoms were included for risk adjustment. Census data on educational attainment and financial insecurity were used to assess socioeconomic status. Missing data were recoded as unknown and included in the analysis. Detailed covariate definitions and data sources are shown in the supplement.

Statistical Methods and Modeling
Descriptive statistics were used to summarize study participants. Continuous variables were described by means and standard deviations, while categorical variables were described using frequencies and percentages. We conducted bivariate analysis to assess whether each factor had a significant effect on the outcome using the chi-square test and Student's t-test, as appropriate. All covariates with p<0.25 in the bivariate analysis were considered for model inclusion, since use of a more traditional level of 0.05 often fails to identify variables whose association with the outcome could become stronger in the presence of other variables. 15 In addition, all variables of known clinical importance found in previous studies that could make an important contribution were included to improve upon previous models. 1 Risk of contraction for all independent predictors was quantified with odds ratios (OR) and 95% confidence intervals.
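As an illustration of the bivariate screening step, the following sketch computes a Pearson chi-square test for a single binary factor and applies the p<0.25 inclusion threshold. It also shows an unadjusted odds ratio with a Wald 95% confidence interval; this is only an assumption for illustration, since the odds ratios reported in this study come from the multivariable model and are adjusted. The function names and the 2x2 counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table.

    a, b: positive and negative tests among patients with the factor
    c, d: positive and negative tests among patients without the factor
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval (all cells > 0)."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not study data): 30/70 positive/negative with the
# factor, 20/80 positive/negative without it.
chi2, p_value = chi_square_2x2(30, 70, 20, 80)
odds_ratio, ci_lower, ci_upper = odds_ratio_wald_ci(30, 70, 20, 80)
keep_for_model = p_value < 0.25  # screening threshold described in the text
```

With these counts the factor would not be significant at 0.05 but would still pass the 0.25 screen, which is the situation the more lenient threshold is meant to capture.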
A multivariable logistic regression model was built to estimate the likelihood of contraction. Stepwise selection with backward elimination was used to allow broader inclusion of variables of interest and determine joint predictive capability.
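For reference, the general form of a multivariable logistic regression and its link to the reported odds ratios can be written as follows. The symbols are generic notation introduced here for illustration, and the Wald form of the confidence interval is an assumption, since the text does not state how the intervals were computed.

```latex
\operatorname{logit}(p_i) = \ln\frac{p_i}{1 - p_i}
  = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij},
\qquad
\mathrm{OR}_j = e^{\hat{\beta}_j},
\qquad
95\%\ \mathrm{CI}_j = \exp\!\left(\hat{\beta}_j \pm 1.96\,\mathrm{SE}(\hat{\beta}_j)\right)
```

Here $p_i$ is the probability that patient $i$ tests positive, $x_{ij}$ is the value of covariate $j$ for that patient, and $\mathrm{OR}_j$ is the odds ratio for covariate $j$ with the other covariates held fixed.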
Initial parameters for the model were identified in the training set and then tested at the subsequent step, with data randomly partitioned into two independent subsets: 80% for training and building the model and 20% for testing. The model's predictive performance, discrimination, and calibration were evaluated using the area under the receiver operating characteristic (ROC) curve and the Hosmer-Lemeshow (HL) goodness-of-fit statistic. The observed and expected frequencies within each decile of risk were compared. 15 All data manipulation and modeling were completed in SAS EG (SAS Institute, Cary, NC).
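The partitioning and discrimination steps can be sketched as follows. This is not the SAS workflow used in the study; it is a small Python illustration with hypothetical function names and toy data. The area under the ROC curve is computed through its rank interpretation: the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half.

```python
import random


def train_test_split(records, train_fraction=0.8, seed=0):
    """Randomly partition records into training and testing subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve from pairwise comparisons (ties count 0.5).

    The pairwise loop is quadratic in the sample size, which is fine for a
    sketch but too slow for a data set of tens of thousands of patients.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (not study data).
train_set, test_set = train_test_split(range(10))
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

An area of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of positive and negative cases.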

Results
A total of 34,503 patients were included in the study. The average age was 50 years old (SD 20), 59.6% (21,209) were female, 12% (4,183) were identified as non-white race, and 66% (22,610) had at least one comorbidity. Within the study population, 7.5% (2,578) tested positive and 92.5% (31,925) tested negative for COVID-19. Of patients testing positive, 36% (924) were hospitalized and 9% (240) died during the study period. Notable differences in testing rates of COVID-19 were observed for the number of chronic conditions and polypharmacy factors. Smaller variations were seen for employment, language, BMI, serious mental illness, substance use, age-stratified communal living, and housing insecurity. Little to no variation existed for gender, race, ethnicity, religious affiliation, and air quality (Table 1). Table 2 shows which of the twenty-nine sociodemographic, clinical, and environmental covariates were associated with the greatest odds of contraction in the multivariable model. Other factors were significant in the model, but associated with less risk, requiring further discussion and study.

Environmental Risk Factors
Individuals living in areas with low air quality (OR 1.01; 95% CI 1.0–1.04, p=0.05) were at higher risk of contraction, as were those experiencing financial insecurity (OR 1.10; 95% CI 1.01–1.25, p=0.04), or living in areas with transportation insecurity (OR 1.11; 95% CI 1.02–1.23, p=0.03). Individuals living in senior living facilities were more likely than those living in non-communal environments to contract the virus (OR 1.69; 95% CI 1.23–2.32, p=0.001). Individuals who live in a neighborhood with high rates of housing insecurity (OR 1.32; 95% CI 1.16–1.5, p<0.0001) were at higher risk of contraction compared to those living in neighborhoods without housing insecurity.
The model performed consistently across training and testing data sets, with an area under the ROC curve of 0.78 and an HL chi-square of 4.4 (p=0.81). The probabilities of contraction partitioned into "deciles of risk" (i.e. equal groups from smallest to the largest) did not highlight any "underperforming" areas.
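The deciles-of-risk calibration check can be sketched as below. The function names and toy data are hypothetical, and the p-value helper assumes the usual convention of groups minus two degrees of freedom (8 for deciles), which the text does not state explicitly.

```python
import math


def hosmer_lemeshow(labels, probabilities, groups=10):
    """Return the HL chi-square and a per-group (observed, expected, size) table."""
    pairs = sorted(zip(probabilities, labels))
    n = len(pairs)
    if n < groups:
        raise ValueError("need at least one observation per group")
    statistic = 0.0
    table = []
    for g in range(groups):
        # Equal-sized groups ordered from lowest to highest predicted risk.
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / size
        if not 0.0 < mean_p < 1.0:
            raise ValueError("group mean probability must lie strictly in (0, 1)")
        # Group contribution: (O - E)^2 / (n_g * pbar * (1 - pbar)).
        statistic += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
        table.append((observed, expected, size))
    return statistic, table


def chi_square_sf_even_df(x, df):
    """Upper-tail chi-square probability, closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical, perfectly calibrated toy data (not study data): two risk groups.
toy_probabilities = [0.2] * 5 + [0.8] * 5
toy_labels = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
toy_statistic, toy_table = hosmer_lemeshow(toy_labels, toy_probabilities, groups=2)

# Assuming 8 degrees of freedom, a chi-square of 4.4 gives a p-value of about 0.82.
reported_p = chi_square_sf_even_df(4.4, 8)
```

A small statistic with a large p-value, as reported here, indicates no detectable gap between observed and expected positives across the risk groups.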

Discussion
To our knowledge, this is the first study conducted that examines a multitude of clinical, sociodemographic, and environmental risk factors that can contribute to higher rates of contraction and applies the factors to develop a predictive model that can assess disparities of risk in populations. This retrospective risk of contraction study identified several risk factors also associated with serious illness in prior studies, including older age and greater risk progression with age, 3 male gender, 16 comorbidities of diabetes, 7 and chronic kidney disease, 17 higher BMI, 18 and immunosuppression. 19 However, factors found in previous studies for risk of mortality, including hypertension, 3 and other variables associated with groups at higher risk identified by the CDC including those with cardiovascular disease, liver disease, lung disease, or asthma, 8 were not significant factors associated with contraction. Being prescribed more than ten medications or having a greater number of chronic conditions was associated with less risk of contraction, suggesting behavioral differences between groups based on perceived risk. Further research is needed to understand differences between risks associated with serious illness and mortality and contraction, as well as factors that may facilitate or impede engagement in physical distancing or other preventative health behaviors, which may vary widely based on barriers, structural inequities, or personal choice.
Healthcare access through a relationship with a primary care provider was associated with a lower risk of contraction; however, this may be a result of higher rates of testing for COVID-19 compared to individuals with no primary care provider. Receiving secure electronic communication through the EMR suggests that access to health advice and education may reduce risk. Further research is needed to identify how healthcare access, utilization, and health communications could reduce risk for vulnerable groups. Serious mental illness and drug use were associated with lower risk; however, further study is necessary to understand known mechanisms for risk of contraction. Variability in risk across regional geography necessitates continued study. The findings of this study indicate that risk factors such as socioeconomic status, race, ethnicity, environmental living conditions, and healthcare access are intersecting variables across populations, and may collectively contribute to disparities in the risk of contraction among vulnerable groups.
Older age is associated with both higher risk of contracting COVID-19 and higher mortality 20 compared to younger cohorts. Older adults living in senior communities are also at higher risk of contraction, which could be due to dependency on caregivers to complete activities of daily living, which makes physical distancing a challenge. Dementia was also associated with risk of contraction, likely due to a higher reliance on daily caregiving.
Higher risk of contraction among black, indigenous, and/or people of color may be associated with other sociodemographic and environmental characteristics found to also be significant in this study. African Americans and Latinos are more likely to live in communities with poor air quality, 21 work in jobs that do not allow telecommuting, 22 and lack access to healthcare, 23 which may increase the risk of contraction and contribute to racial disparities in mortality. Chronic conditions such as obesity, stroke, and diabetes, and premature death also affect racial and ethnic groups disproportionately compared to whites, although rates differ between groups. 13 More research is needed to identify the risk and protective factors for contraction, including within-group variation and among indigenous communities. Communities of color are also more likely to experience lower socioeconomic status, 24 and be employed as essential workers. 10 For vulnerable groups, lack of personal transportation is both a barrier to healthcare access 25 and a source of increased exposure to others, contributing to disparities in contraction.
Given the known mechanism for community transmission, variables selected as approximations for social and living conditions that might increase the risk of contraction, such as being in a married relationship or having a significant other, being employed, lacking access to a personal vehicle for transportation, and living in overcrowded housing, were significant factors for increased risk, also evident in disparities across socioeconomic status and race. Religious affiliation was also associated with increased risk, which may be attributed to attendance of large religious services or other behaviors associated with religious identity.
Having limited English proficiency (LEP) can be a barrier to accessing health services and understanding health information, which can be exacerbated when written translations and trained translators are not available. 26 Over the course of the pandemic, health information has changed rapidly, which can adversely affect indigenous and immigrant communities. During the Ebola epidemic in West Africa, language barriers were an obstacle to slowing the spread of the disease. 27 People with LEP are also more likely to have low health literacy compared to English speakers and are at a higher risk of poor health. 28 Anti-immigrant policies also impose barriers to accessing healthcare and discourage care seeking, particularly among undocumented immigrants. 29 Culturally and linguistically appropriate interventions are essential, including communication materials of varying formats and reading levels developed through transcreation, where native language speakers work in tandem with English speakers, as well as the use of community health workers that can engage with underserved groups. 30
People experiencing housing insecurity may experience challenges with physical distancing, especially when housing is crowded, or may be less able to engage in hand washing when facilities or running water may be limited. 31 Both factors could facilitate the spread of the virus. Additional research is needed to understand the impact of housing insecurity, living conditions, and environments on COVID-19 contraction.

Study Limitations
The model did not include any patient data outside the Providence Health System. Although the organization serves a diverse patient population across seven states, the generalizability of the study results to the entire U.S. population may be limited. Furthermore, inconsistent availability and reliability of the testing could bias the results. With limited testing available and evolving screening guidelines, clinical discernment and personal bias could impact which individuals received testing and thus influence rates of testing in certain populations. When developing this model, we intended for the study to include all major covariates; however, since COVID-19 research is changing, it is likely that there are other factors associated with the likelihood of contraction that are not yet well known and thus not present in the observed data. We were not able to account for people's behaviors, which could bias the results. Additional research is needed to understand additional factors correlated with higher instances of COVID-19 related to inpatient utilization and risk of mortality.

Conclusions
The findings of associated risk factors, as well as the models to predict risk, have important implications for healthcare systems, public health departments, and city and state governments to further reduce the risk of contraction and spread of COVID-19 in communities that may be disproportionately impacted. The ability to assess the risk of contraction can inform targeted public health approaches given known health outcomes, healthcare utilization patterns, social and cultural practices, and underlying social determinants of health that exist within those populations. Linguistically and culturally appropriate prevention education, healthcare access including routine care and COVID-19 testing, and efforts to address substandard housing and poor working conditions are essential to reducing risk among vulnerable groups, especially communities of lower socioeconomic status which experience a greater incidence of infectious diseases. 32 Now, and as the nation recovers, addressing the disparities in contraction that contribute to rates of serious illness and mortality among vulnerable communities is needed to alleviate the disproportionate burden of the pandemic and persisting health disparities.

The Providence Institutional Review Board (IRB) approved this study for all gathered data and analysis. In accordance with 45 CFR 46.116(d), a waiver of informed consent and a Waiver of Authorization were approved in accordance with 45 CFR 164.512(i)(2)(ii) on 4/2/2020 under Expedited Review Procedures. The IRB was satisfied that the use or disclosure of protected health information involved no more than a minimal risk to the privacy of individuals.

Consent for Publication
Not applicable

Availability of Data and Materials
The datasets generated and analyzed during the current study are not publicly available, as the Providence IRB stipulated that all patient level data would reside within the Providence secured computer network, accessible only to the study investigators, and locked up on Providence property. The publicly available data were accessed via a proprietary data vendor and cannot be shared publicly due to the vendor's contractual agreement. The underlying publicly available data sources include the 2018 American Community Survey and the Centers for Disease Control and Prevention Air Quality.

Competing Interests
The authors declare that they have no competing interests.

Funding
This was an internally funded study, with no external financial interest. The study aimed to improve patient and population outcomes and support the healthcare system's response to COVID-19. The corresponding authors had full access to all data in the study and had final responsibility for the decision to submit for publication.

Authors' Contributions
YR and JB were responsible for study design, data collection, data management, and data analysis. All authors were responsible for data interpretation. YR, HM and WH wrote the first draft of the manuscript. HM and JC were responsible for the scientific literature review. All authors contributed to the final draft. All authors read and approved the final manuscript.