### Study population

This study is based on LOMAS (Longitudinal Multilevel Analysis in Scania), a record linkage database that includes all individuals living in Scania, Sweden, during the period 1968 to 2006. Scania is the southernmost part of Sweden and contains approximately 12% of the Swedish population. The project has been approved by the Regional Ethical Committee in South Sweden. LOMAS was assembled with the permission and assistance of Statistics Sweden, The National Board of Health and Welfare (Centre for Epidemiology), and the Region of Scania (Unit of Social Medicine). A unique ten-digit personal identification number assigned to each person in Sweden was used by the Swedish authorities to link the different registers. However, the research database does not contain the real personal identification number of the individuals but rather an encrypted number that ensures their anonymity. Our investigation uses information from the 1970 Swedish Census, the 1970 Population Register, and the Mortality Register for the period 1970-2000.

In the present analysis we defined a baseline cohort composed of all 49,154 individuals aged 65 to 69 and residing in Skåne on 31^{st} December 1970. The dataset consists of 42,838 households in 402 parishes within 69 municipalities of the Scania region, thus providing four levels of hierarchy. We followed individuals from baseline until death or the end of follow-up on 31^{st} December 2000. Over 90% of individuals in this cohort completed the follow-up with the outcome event (death) observed, with 7.6% of males and 5.4% of females censored.

Since our study is a pure cohort study, migrants into the study region after 1970 are not part of the study sample. We assumed a very low mobility of the elderly population in general, and thus assumed that the number of such migrants would be very small. Initial cohort members who migrated within the study region, or even within the country, are included because their death should have been registered and kept in the regional or national databases. Those who emigrated to other countries after 1970 are treated as censored at the time of study. The proportion of censored observations ranged from 1.2% to 4.7% for men and from 3.4% to 12.3% for women across the five age groups. The proposed survival time models are designed to handle such data.
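As an illustration of how such right-censored follow-up data can be coded, the following minimal Python sketch derives a follow-up time and an event indicator for one cohort member. The function name `follow_up` and the choice to censor emigrants at their emigration date are assumptions made for this example only; the text does not specify the exact censoring date.

```python
from datetime import date

BASELINE = date(1970, 12, 31)          # cohort definition date given in the text
END_OF_FOLLOW_UP = date(2000, 12, 31)  # end of follow-up given in the text

def follow_up(death=None, emigration=None):
    """Return (years_observed, event); event is 1 for an observed death, 0 if censored.

    Assumption (not from the source): emigrants are censored at the emigration date.
    """
    # Death is listed first so that it wins a tie on the same date.
    candidates = []
    if death is not None:
        candidates.append((death, 1))
    if emigration is not None:
        candidates.append((emigration, 0))
    candidates.append((END_OF_FOLLOW_UP, 0))
    # The earliest of death, emigration and end of follow-up closes the record.
    end, event = min(candidates, key=lambda c: c[0])
    years = (end - BASELINE).days / 365.25
    return years, event
```

A death recorded after the end of follow-up is therefore treated as censored at 31 December 2000, in line with the follow-up window described above.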

### Predictors of life expectancy

Age and gender are individual level variables. We categorized age at baseline in 1970 into five categories of one year (i.e. 65, 66, 67, 68 and 69) and used 65 years as reference in the comparisons.

Socio-economic status was defined at the household and municipality level. The categorisation at each level was based on descriptive pattern analysis of the data to find thresholds of the variables that more or less maximised differences between categories. At the household level, we defined a variable with 4 categories by combining household monthly disposable income per family member with family size to represent household socioeconomic position (Household SEP). The poorest were families that had more than 4 members and no disposable income. The next poorest were families that had 1-4 members and no income. The next level comprised families with an income between 1 and 1000 SEK per family member, and the top group were those with more than 1000 SEK per family member. For the socioeconomic characteristics of the municipalities we used the median individual income of each municipality in 1970 as a municipality level variable.
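The household classification described above can be sketched as a small Python function. The function name `household_sep` and the numeric codes 1 (poorest) to 4 (top group) are hypothetical labels chosen for this example; treating any positive income up to 1000 SEK as the third group is likewise an assumption.

```python
def household_sep(income_per_head_sek, family_size):
    """Return a household SEP code from 1 (poorest) to 4 (top group).

    Codes are illustrative labels, not the coding used in the original data.
    """
    if income_per_head_sek > 1000:
        return 4                      # more than 1000 SEK per family member
    if income_per_head_sek > 0:
        return 3                      # between 1 and 1000 SEK per family member
    if family_size > 4:
        return 1                      # no disposable income, more than 4 members
    return 2                          # no disposable income, 1-4 members
```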

### Statistical methods

Multilevel accelerated failure time models fit time of death to individuals, taking into account attributes of the time until death at both individual and context levels [15, 22]. Let *T* indicate time to death. The simplest form of a 2-level model can be expressed as *y* = log(*T*) = *β*_{0} + *v*_{2} + *v*_{1}, where the fixed parameter *β*_{0} = log(*t*_{0}) is the logarithm of the median life expectancy without adjusting for confounding, and the two random variables *v*_{2} and *v*_{1} reflect the between-area and within-area (between-individual) variation around the median life expectancy, respectively. In other words, the median life expectancy varies between areas where individuals resided and between individuals within an area. The variances of the two terms, *σ*_{*v*2}^{2} and *σ*_{*v*1}^{2}, are the estimates of the between-area and within-area variance, respectively. The model with log link assumes independence of the two random variables on the log scale, and hence their variances are additive and sum to the total variance of log(LE) [15]. For data with more than 2 levels, as in our case with municipalities at level 4, parishes at level 3, households at level 2 and individuals at level 1, the total variance can be further disentangled explicitly to estimate the variance at each level. The model is extended by adding two more random parameters, *v*_{4} and *v*_{3}, so that the total variance of individual life expectancy on the log scale has four components and sums to *σ*_{*v*4}^{2} + *σ*_{*v*3}^{2} + *σ*_{*v*2}^{2} + *σ*_{*v*1}^{2}. In the situation where a large proportion of households consists of only one individual, variance components can be constructed to reflect the data structure so that the level 2 variance measures only variability across those households with more than one individual.
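Because the level-specific variances are additive on the log scale, the share of the total variance attributable to each level follows directly from the estimated components. The Python sketch below illustrates this arithmetic; the function name `variance_shares` and the numbers in the usage example are invented for illustration and are not estimates from this study.

```python
def variance_shares(components):
    """Return each level's share of the total variance of log(LE).

    `components` maps a level name to its estimated variance; independence of
    the random effects (additive variances) is assumed, as in the model above.
    """
    total = sum(components.values())
    return {level: var / total for level, var in components.items()}

# Illustrative values only (not results from the study).
example = {
    "municipality": 0.002,  # level 4
    "parish": 0.003,        # level 3
    "household": 0.045,     # level 2
    "individual": 0.450,    # level 1
}
shares = variance_shares(example)
```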

To adjust for confounding or to estimate the average inequality between subgroups defined by, for example, age, gender or income level, the model can be further extended, as in any regression model, by adding these variables as covariates. Furthermore, the distribution of life expectancy by gender, for example, can be specified by the mean and variance together for men and women respectively, using a variance partitioning model [22]. This provides us with a full picture of the health inequality of a subgroup at both the average level and variability at the individual level.

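Given the observed lifetime of individuals as the outcome variable, the estimated median life expectancy of the study population would be exp(*β*_{0}), and exp(*β*_{0} + *βX*_{*i*}) for the *i*^{th} individual with covariate X, with the parameters replaced by their estimates. The survival probability of the *i*^{th} individual is obtained from the cumulative probability function of the distribution, for example, *S*_{*i*}(*t*) = 1 − Φ((log(*t*) − *y*_{*i*})/*σ*) for the log-Normal distribution, where *y*_{*i*} is the fitted value of log(*T*) for that individual, *σ* the standard deviation of log(*T*) and Φ the standard Normal distribution function.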
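The relationship between the model parameters, the median survival time and the survival probability can be sketched in Python as follows. This assumes the usual log-linear accelerated failure time form with a log-Normal distribution; the function names and the argument `sigma` (the standard deviation of log time) are assumptions of this example, not notation from the source.

```python
import math

def median_survival(beta0, coefs=(), x=()):
    """Median survival time exp(beta0 + sum(coef * x)) under a log-linear model."""
    return math.exp(beta0 + sum(b * xi for b, xi in zip(coefs, x)))

def lognormal_survival(t, mu, sigma):
    """Survival probability S(t) = 1 - Phi((log t - mu) / sigma)."""
    z = (math.log(t) - mu) / sigma
    # Standard Normal CDF written with the error function.
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 1.0 - cdf
```

By construction, the survival probability evaluated at the median survival time is 0.5, which offers a quick sanity check on any fitted values.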

Indices of so-called relative and absolute health inequalities [2, 12] for life expectancy can be derived from the model estimates. For Individual-Mean differences, IM[*a*,*b*] = Σ_{*i*} |*y*_{*i*} − *μ*|^{*a*} / (*n* *μ*^{*b*}), where *a* is the power order of the numerator and *b* that of the denominator, *μ* the population mean of the inequality variable *y* and *n* the population size. When both *a* and *b* are equal to 2, i.e. the quadratic term for both the numerator and denominator, the classical coefficient of variation (CV), an index of relative health inequality, applies. In the model-based version, *n* is the total number of individuals, the population mean is taken from the parameter estimate associated with a subgroup in the model, and the spread from the standard error estimate at individual level 1. The IM can be calculated for individuals in different groups such as gender or household socioeconomic groups. Replacing the individual-level quantities with their parish-level counterparts, we obtain an estimate of the relative inequalities among parishes.
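A direct computation of the Individual-Mean index can be sketched in Python as below. It assumes the index takes the form Σ|*y*_{*i*} − *μ*|^{*a*} / (*n* *μ*^{*b*}), which is what the definitions of *a*, *b*, *μ* and *n* above suggest; the function name `im_index` is hypothetical.

```python
def im_index(y, a, b):
    """Individual-Mean difference index, assuming sum(|y_i - mu|^a) / (n * mu^b)."""
    n = len(y)
    mu = sum(y) / n
    return sum(abs(yi - mu) ** a for yi in y) / (n * mu ** b)

# With a = b = 2 the index is the population variance divided by the squared
# mean, i.e. the square of the coefficient of variation.
values = [8.0, 10.0, 12.0]
im_22 = im_index(values, 2, 2)
```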

At the individual level, we can consider Inter-Individual differences, indicated as II[*a*,*b*] = Σ_{*i*} Σ_{*j*} |*y*_{*i*} − *y*_{*j*}|^{*a*} / (2*n*^{2} *μ*^{*b*}). The specific index of so-called absolute inequalities, taken relative to the population mean, is in theory comparable with the traditional Gini coefficient in the single level case. When the numerator takes a cubic term, i.e. *a* = 3 or II[3,1], the index gives more weight to the extreme individuals. Here the inequality variable *y* is the survival probability estimated from the multilevel model including the level 1 random effects *v*_{1}. For comparison purposes we also calculated the traditional Gini coefficient [13] for inter-individual differences in survival probability and inter-area differences in parish LE, based on both raw data and model estimates. For the inter-parish inequalities based on model estimates, for example, the *y* variable is the estimated median life expectancy of each parish (*M*_{v 3}).
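The Inter-Individual index and the traditional Gini coefficient can be compared numerically with a short Python sketch. It assumes the pairwise form Σ_{*i*}Σ_{*j*} |*y*_{*i*} − *y*_{*j*}|^{*a*} / (2*n*^{2}*μ*^{*b*}) for the index; the function names are hypothetical, and the quadratic-time double loop is only suitable for small illustrative inputs.

```python
def ii_index(y, a, b):
    """Inter-Individual difference index, assuming
    sum_i sum_j |y_i - y_j|^a / (2 * n^2 * mu^b)."""
    n = len(y)
    mu = sum(y) / n
    total = sum(abs(yi - yj) ** a for yi in y for yj in y)
    return total / (2 * n ** 2 * mu ** b)

def gini(y):
    """Gini coefficient from the rank-based formula on sorted values."""
    ys = sorted(y)
    n = len(ys)
    weighted = sum(rank * value for rank, value in enumerate(ys, start=1))
    return 2.0 * weighted / (n * sum(ys)) - (n + 1.0) / n
```

Under this assumed form, `ii_index(y, 1, 1)` coincides with `gini(y)`, while `ii_index(y, 3, 1)` gives more weight to large pairwise differences.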

To address the four objectives of the study, three steps of analysis were carried out. At Step 1 for objective (i), we fitted a series of models with random effects included for 2, 3 or all 4 levels in order to identify the variance components of random effects that best fitted the pattern in the data. All models included sex and age of individuals in 1970 as covariates. For each model we tested the significance of the variance components using the Wald statistic [15]. This analysis enabled us to assess how much of the variation in individuals' life expectancy was attributable to each of the four levels and to establish a basic variance component model in which the variation of life expectancy can be estimated at each level.
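A Wald test of a single parameter, such as a variance component, can be sketched in Python as follows. This is a generic illustration using a chi-square reference distribution with one degree of freedom; the function name `wald_test` is hypothetical, and whether the original analysis applied any adjustment for testing a variance on the boundary of its parameter space is not stated in the text.

```python
import math

def wald_test(estimate, std_error):
    """Return (Wald chi-square statistic, p-value) for H0: parameter = 0.

    The p-value uses the chi-square distribution with 1 degree of freedom,
    which equals the two-sided tail probability of a standard Normal z-score.
    """
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z * z, p_value
```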

At Step 2 for objective (ii), we examined the extent to which the household socio-economic group affects individuals' health by adding the household socio-economic variable into the basic model established in Step 1. The differences in the median life expectancy between individuals from different family socio-economic groups are captured by the new regression coefficients in the model, controlled for age and sex. The differences in the estimated variances at each level between the new model including the household SES variable and the baseline model without the variable reflected the variability in life expectancy explained by the variable only at either individual level or higher levels. Other risk factors potentially confounding the relationship between household SES and individual mortality could have been included in the model to illuminate relationships further, but in the interests of presenting a simple model for illustrative purposes, we have not pursued such an analysis.

For objective (iii), we then estimated variance in life expectancy for men and women and for each household SES group in order to compare the distribution of LE among the subgroups.

Finally, in Step 3 for objective (iv), we calculated the indices of relative and absolute health inequalities based on our models, and compared them with the traditional coefficient of variation (CV) and Gini coefficient.

We also assessed the validity of our models by comparing the age-gender specific life expectancy in remaining years estimated by our models to those by the Kaplan-Meier (KM) estimator based on the raw data.
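For reference, the Kaplan-Meier product-limit estimator used in such a comparison can be sketched in plain Python. The function names are hypothetical, and reading off the median survival time as the summary of remaining life expectancy is an assumption of this example; the text does not state which summary of the KM curve was compared.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    `events` holds 1 for an observed death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations sharing the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored cases both leave the risk set after time t.
        n_at_risk -= removed
    return curve

def km_median(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached within follow-up
```

Applied separately to each age and gender group, the resulting medians can then be set against the corresponding model-based estimates.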