Based on census data, we divided electoral areas into metro/capital, urban and rural. Proportional to the population in each resulting stratum, we drew a random sample of enumeration areas. From a list of all registered schools, we matched schools to each enumeration area, identifying a total of 1194 schools in this way. In three of the nine provinces, additional funding allowed over-sampling to increase the local relevance.
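The proportional allocation described above can be sketched as follows. This is a minimal illustration, not the study's sampling procedure: the function names, the stratum population counts, the sampling frame and the sample size of 100 are all hypothetical, and largest-remainder rounding is an assumed way to reach whole numbers of enumeration areas.

```python
import random


def allocate_proportionally(stratum_population, total_sample):
    """Split total_sample across strata in proportion to stratum population.

    Uses largest-remainder rounding so the allocations sum to total_sample.
    """
    total_pop = sum(stratum_population.values())
    exact = {s: total_sample * p / total_pop for s, p in stratum_population.items()}
    alloc = {s: int(x) for s, x in exact.items()}
    leftover = total_sample - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


def draw_enumeration_areas(frame, allocation, seed=0):
    """Draw a simple random sample of enumeration areas within each stratum."""
    rng = random.Random(seed)
    return {s: rng.sample(frame[s], allocation[s]) for s in allocation}


# Hypothetical population counts and frame, for illustration only.
example_population = {"metro/capital": 500_000, "urban": 300_000, "rural": 200_000}
example_allocation = allocate_proportionally(example_population, 100)

example_frame = {s: [f"{s}-{i}" for i in range(200)] for s in example_population}
example_sample = draw_enumeration_areas(example_frame, example_allocation)
```

Fixing the seed makes the draw reproducible, which helps when the sampled list has to be audited later.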
The Provincial Departments of Education in all nine provinces gave permission for the study as part of curricular activity, typically in the context of Life Skills classes. The facilitator explained to each class that the questionnaire was voluntary and could be stopped at any time. Facilitators also explained that no questionnaire would be marked with an identity, and they arranged classroom logistics to permit each learner some privacy.
Concept development and pilot
Because there is no word for rape in several of the South African languages, we used the expression "forced sex without consent" in nine of the 11 official languages. We arrived at this through feedback from results of a pilot study that included 9000 youth in urban, rural and remote communities in the nine provinces (27 pilot sites, in nine languages). The pilot questionnaire used "rape" or its equivalent in three languages, and a variety of phrases in other languages, intended to communicate the same meaning. In each site, separate focus groups for male and female youth considered the pilot results and discussed the wording in their own language. After translation and back-translation of the resulting phrase by someone not associated with the study, the final formulation went through two to five rounds of questionnaire piloting in the nine languages of implementation. Focus groups in urban and rural areas in each province discussed and validated the outcomes, as the design team tried to be sure we were measuring what we intended to measure.
The anonymous, facilitated self-administered instrument included questions on attitudes and experience regarding sexual violence and HIV risk. In each classroom, a facilitator read each question and explained its meaning following a pre-tested script in English, Sesotho, Sepedi, Setswana, Setsonga, Tshivenda, IsiZulu, IsiXhosa and Afrikaans, depending on the needs of the class.
Schoolchildren answered questions about the following outcome measures: did you suffer forced sex without consent in the last year; have you ever been forced to have sex without your consent by a learner, a teacher, another adult, a family member; at what age were you first forced to have sex without your consent; were you forced to have sex without your consent by a male, female, both. In addition to their age and sex, learners also provided information on HIV risk-related knowledge, attitudes and practices, their exposure and preferences towards national intervention programmes, and perceived HIV status; we share findings concerning those measures elsewhere.
Data collection and management
Data collection took place from 7 October to 22 November 2002. Teams visited a total of 5162 classes in 1191 schools. We employed several measures to reduce bias. Facilitators asked educators to leave the class prior to the survey, and asked participants not to write their names or any identifying marks on the questionnaires. Facilitators made serious efforts to prevent viewing of questionnaire responses by nearby students, instructing children to cover questionnaire responses with exercise books. They arranged for the provision of "shield" books for pupils who did not have one. Respondents completed questionnaires on their own, turning them facedown once completed. Facilitators collected questionnaires from learners and placed them in an envelope which they immediately sealed. The sealed envelopes were only opened again at data entry. We informed learners of this process prior to handing out questionnaires, to assure them their responses would remain anonymous. Four scanners read and verified data from the questionnaires.
We corrected for unequal representation of provinces by weighting the national estimates of forced sex in the last year and "ever". The full sample and the raising factors applied to estimate national prevalence rates are reported elsewhere. Risk analysis used the Mantel-Haenszel procedure, which stratifies the main contrast by other factors to make sure the finding cannot be explained by covariates (age, sex, HIV risk-related knowledge, attitudes and practices, exposure and preferences towards national intervention programmes, and perceived HIV status).
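As an illustration of the stratified contrast, the sketch below computes the standard Mantel-Haenszel pooled odds ratio and the Mantel-Haenszel chi-square (without continuity correction) from a set of 2x2 tables, one per stratum of the covariate. The function names and the counts are hypothetical; this is not the study's data or software.

```python
def mantel_haenszel(tables):
    """Pooled odds ratio and chi-square across strata.

    Each table is (a, b, c, d):
      a = exposed with outcome,   b = exposed without outcome,
      c = unexposed with outcome, d = unexposed without outcome.
    """
    or_num = or_den = 0.0
    observed = expected = variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        or_num += a * d / n
        or_den += b * c / n
        observed += a
        # Expectation and variance of a under the null hypothesis,
        # conditional on the table margins.
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    pooled_or = or_num / or_den
    chi_square = (observed - expected) ** 2 / variance
    return pooled_or, chi_square


# Hypothetical counts for two strata (for example, two age groups).
example_tables = [(10, 90, 5, 95), (20, 80, 10, 90)]
example_or, example_chi2 = mantel_haenszel(example_tables)
```

With these made-up counts the pooled odds ratio is 2.2. Because each stratum contributes its own comparison, a contrast that survives pooling is not simply an artefact of the stratifying factor.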
We adjusted for the dependency between reports from participants from the same cluster, using the adjusted Mantel-Haenszel chi-square statistics of Zhang and Boos. This reduces the chi-square estimate, widening the confidence intervals roughly in proportion to the intra-cluster correlation coefficient. We opted for 99% confidence intervals to offset the effect of multiple testing in the principal contrasts. We then examined the mutual influence of factors that affected forced sex with logistic regression (stepping down from a saturated model) in CIETmap, which derives odds ratios for each determinant, taking into account the others in the final model. The saturated initial model included urban/rural, type of school, province, age, attitudes about sex (need to have sex to show love, girls have the right to refuse sex, girls like sexually violent guys), age at sexual debut, how often they talk about sex, ever forced sex with someone else, believe condoms prevent HIV/AIDS, belief about personal HIV status and other abuse (verbal, beating).