# Data Appraisal

## Estimates of Sampling Error

Estimates from a sample survey are affected by two types of error: (1) non-sampling errors and (2) sampling errors. Non-sampling errors result from mistakes made during data collection and data processing. Numerous efforts were made during implementation of the 2005-2006 MICS to minimize this type of error; however, non-sampling errors are impossible to avoid and difficult to evaluate statistically.

Sampling errors, by contrast, can be evaluated statistically. The sample of respondents to the 2005-2006 MICS is only one of many possible samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability in the survey results across all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.

Sampling errors are measured in terms of the standard error of a particular statistic (mean or percentage), which is the square root of its variance. The standard error is used to calculate a confidence interval within which the true value for the population can be assumed to fall. For the key statistics presented in MICS, the interval is the estimate plus or minus two standard errors, which is approximately equivalent to a 95 percent confidence interval.

If the sample of respondents had been a simple random sample, it would have been possible to use straightforward formulae for calculating sampling errors. However, the 2005-2006 MICS sample is the result of a multi-stage stratified design, and consequently more complex formulae are needed. The SPSS Complex Samples module was used to calculate sampling errors for the 2005-2006 MICS. This module uses the Taylor linearization method of variance estimation for survey estimates that are means or proportions. The method is documented in the SPSS file CSDescriptives.pdf, found under the Help, Algorithms options in SPSS.

Sampling errors have been calculated for a select set of statistics (all of which are proportions, due to the limitations of the Taylor linearization method) for the national sample, for urban and rural areas, and for each of the five regions. For each statistic, the following are reported: the estimate, its standard error, the coefficient of variation (or relative error: the ratio of the standard error to the estimate), the design effect, the square root of the design effect (DEFT: the ratio of the standard error under the given sample design to the standard error that would result if a simple random sample had been used), and the 95 percent confidence interval (+/- 2 standard errors). Details of the sampling errors are presented in the sampling errors appendix to the report and in the sampling errors table presented in the external resources.
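The relationship between the reported quantities can be illustrated with a minimal Python sketch. It is not the SPSS procedure: it takes an estimate and a design-based standard error as given (as they would appear in the sampling errors table) and derives the other columns from them. The function name `sampling_error_summary`, the use of the unweighted number of cases `n`, the simple random sample formula `sqrt(p * (1 - p) / n)`, and the input values are all assumptions made for illustration and are not MICS results.

```python
import math


def sampling_error_summary(estimate, std_error, n):
    """Derive the reported sampling-error columns for a proportion.

    estimate  -- estimated proportion (between 0 and 1, exclusive)
    std_error -- standard error under the actual (complex) sample design
    n         -- unweighted number of cases (assumed basis for the SRS comparison)
    """
    # Standard error a simple random sample of the same size would give
    # (assumed textbook formula for a proportion).
    srs_std_error = math.sqrt(estimate * (1 - estimate) / n)

    # DEFT: ratio of the design-based standard error to the SRS standard error.
    deft = std_error / srs_std_error

    return {
        "cv": std_error / estimate,          # coefficient of variation (relative error)
        "deff": deft ** 2,                   # design effect (ratio of variances)
        "deft": deft,                        # square root of the design effect
        "ci_lower": estimate - 2 * std_error,  # estimate minus two standard errors
        "ci_upper": estimate + 2 * std_error,  # estimate plus two standard errors
    }


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
summary = sampling_error_summary(estimate=0.40, std_error=0.02, n=1200)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

With these hypothetical inputs the design-based standard error is about 1.41 times the simple random sample value, so the design effect is about 2: the complex design roughly doubles the variance relative to a simple random sample of the same size.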

## Other Forms of Data Appraisal

A series of data quality tables and graphs is available for reviewing the quality of the data. These include the following:

- Age distribution of the household population
- Age distribution of eligible women and interviewed women
- Age distribution of eligible children and children for whom the mother or caretaker was interviewed
- Age distribution of children under age 5 by 3 month groups
- Age and period ratios at boundaries of eligibility
- Percent of observations with missing information on selected variables
- Presence of mother in the household and person interviewed for the under 5 questionnaire
- School attendance by single year age
- Sex ratio at birth among children ever born, surviving and dead by age of respondent
- Distribution of women by time since last birth
- Scatterplot of weight by height, weight by age and height by age
- Graph of male and female population by single years of age
- Population pyramid

The results of each of these data quality tables are shown in the appendix of the final report and are also given in the external resources section.

The general rule for presentation of missing data in the final report tabulations is that a column is presented for missing data if the percentage of cases with missing data is 1% or more. Cases with missing data on the background characteristics (e.g. education) are included in the tables, but the missing data rows are suppressed and noted at the bottom of the tables in the report (not in the SPSS output, however).
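The 1% rule for showing a missing-data column can be written as a short check. This is a hedged sketch rather than the tabulation syntax actually used for the report: the function name `needs_missing_column`, the default threshold argument, and the counts in the usage lines are hypothetical.

```python
def needs_missing_column(n_missing, n_total, threshold_pct=1.0):
    """Return True if a table should show a separate missing-data column.

    Applies the stated rule: show the column when the percentage of cases
    with missing data is at or above the threshold (1% by default).
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    pct_missing = 100.0 * n_missing / n_total
    return pct_missing >= threshold_pct


# Hypothetical counts, for illustration only.
show_column = needs_missing_column(n_missing=12, n_total=1000)   # 1.2% missing
hide_column = needs_missing_column(n_missing=5, n_total=1000)    # 0.5% missing
print(show_column, hide_column)
```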