Archived Content

Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived.

5. Census data processing

5.1 Introduction

This chapter describes the processing of all completed questionnaires, from their receipt through to the creation of an accurate and complete census database. The steps described below are questionnaire registration, questionnaire imaging and data capture, editing, error correction, failed edit follow-up, coding, dwelling classification and non-response adjustments, and imputation.

The automated processes implemented for the 2011 Census had to be monitored to ensure that every Canadian residence was enumerated once and only once. The Master Control System (MCS) was built to control and monitor the process flow. It held a master listing of all dwellings in Canada, each identified by a unique identifier, and was updated daily with each dwelling's status in the census process flow (e.g., delivered, received, processed). Reports were generated and made available online so that managers could ensure census operations were efficient and effective.
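The idea of a single master listing, keyed by a unique dwelling identifier and updated with each dwelling's status, can be illustrated with a small in-memory sketch. The class name, method names and status values below are hypothetical; the source says only that the MCS held one entry per dwelling, was updated daily and fed online reports.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical status values, modelled on the examples given in the text.
    DELIVERED = "delivered"
    RECEIVED = "received"
    PROCESSED = "processed"


@dataclass
class MasterControl:
    """Toy master listing: exactly one entry per dwelling identifier."""
    dwellings: dict[str, Status] = field(default_factory=dict)

    def register_dwelling(self, dwelling_id: str) -> None:
        # Refusing duplicate identifiers is what keeps each dwelling counted once.
        if dwelling_id in self.dwellings:
            raise ValueError(f"duplicate dwelling id: {dwelling_id}")
        self.dwellings[dwelling_id] = Status.DELIVERED

    def apply_daily_update(self, updates: dict[str, Status]) -> None:
        # Updates for unknown identifiers are rejected instead of creating new dwellings.
        for dwelling_id, status in updates.items():
            if dwelling_id not in self.dwellings:
                raise KeyError(f"unknown dwelling id: {dwelling_id}")
            self.dwellings[dwelling_id] = status

    def report(self) -> dict[Status, int]:
        # Number of dwellings at each stage, the kind of summary a manager might consult.
        counts = {s: 0 for s in Status}
        for status in self.dwellings.values():
            counts[status] += 1
        return counts


mcs = MasterControl()
for dwelling_id in ["D1", "D2", "D3"]:
    mcs.register_dwelling(dwelling_id)
mcs.apply_daily_update({"D1": Status.RECEIVED, "D2": Status.PROCESSED})
print(mcs.report())
```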

5.2 Receipt and registration

Respondents who completed paper questionnaires mailed them back to a data processing centre. Canada Post registered their receipt automatically by scanning the barcode on the front of each questionnaire through the transparent window of the return envelope, then delivered the envelopes to the Data Operations Centre. Each day, Canada Post sent a file listing the census questionnaires received at each regional processing plant, by date of receipt.

Responses submitted over the Internet or through a Census Help Line telephone interview went directly to the Data Operations Centre, where their receipt was registered automatically.

The registration of each returned questionnaire was flagged on the MCS at Statistics Canada in near real time. The MCS generated a list of all dwellings for which no questionnaire had been received and transmitted it to field operations for follow-up. Registration updates were sent to field operations daily to prevent follow-up with households that had since completed their questionnaire, either by telephone or over the Internet.
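The follow-up list amounts to a set difference between the master listing and the questionnaires registered through every channel, recomputed as each day's registrations arrive. A minimal sketch, assuming a hypothetical record layout of (dwelling identifier, channel, date received); the actual file formats exchanged with Canada Post and field operations are not described in the source.

```python
from datetime import date

# Hypothetical registration record: (dwelling_id, channel, date_received).
Registration = tuple[str, str, date]


def received_ids(registrations: list[Registration]) -> set[str]:
    """Merge receipts from every channel (paper scan file, Internet, telephone)."""
    return {dwelling_id for dwelling_id, _channel, _day in registrations}


def follow_up_list(master_listing: list[str], registrations: list[Registration]) -> list[str]:
    """Dwellings in the master listing with no questionnaire registered so far."""
    return sorted(set(master_listing) - received_ids(registrations))


master = ["D1", "D2", "D3", "D4"]
registrations = [
    ("D1", "paper", date(2011, 5, 12)),
    ("D3", "internet", date(2011, 5, 12)),
]
print(follow_up_list(master, registrations))  # ['D2', 'D4']

# Next day's update: D4 responds by telephone and drops off the follow-up list.
registrations.append(("D4", "telephone", date(2011, 5, 13)))
print(follow_up_list(master, registrations))  # ['D2']
```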

5.3 Imaging and keying from images

In 2011, three census questionnaires were imaged: 2A, 2C and 3A. Image quality improved relative to 2006 because black-and-white scanners were replaced with colour scanners. The imaging process comprised the following steps:

  • Document preparation: mailed-back questionnaires were removed from their envelopes, and foreign objects such as clips and staples were detached in preparation for scanning. The questionnaires were batched by questionnaire type, and those in booklet format were separated into single sheets by cutting off the spine.
  • Scanning: the questionnaires were converted to digital images.
  • Automated image quality assessment: an automated system analyzed the images for errors or anomalies. Images that failed this assessment were sent to a document analysis operator for review.
  • Document analysis: images containing anomalies were presented to an operator, who could accept the image as is, send it directly to key entry, or send it to be rescanned.
  • Automated recognition: the system attempted to recognize handwritten responses and marks on the questionnaire automatically.
  • Key entry: operators entered the responses that automated recognition could not determine with sufficient confidence.
  • Check-out: once the questionnaires had been processed successfully through all of the above steps, the paper questionnaires were checked out of the system. Check-out is a quality assurance process that confirms the images and captured data are of sufficient quality that the paper questionnaires are no longer needed for subsequent processing. Questionnaires flagged as containing errors were pulled at check-out and reprocessed.
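The hand-off between automated recognition and key entry can be pictured as a confidence threshold applied field by field. This is a sketch under stated assumptions: the 0.90 cut-off, the field names and the `RecognizedField` type are hypothetical, since the source does not say how "sufficient confidence" was defined.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-off: the source does not state the confidence level used.
CONFIDENCE_THRESHOLD = 0.90


@dataclass
class RecognizedField:
    name: str
    value: Optional[str]  # None when recognition produced no value at all
    confidence: float     # recognition confidence between 0 and 1


def route_fields(fields: list[RecognizedField]) -> tuple[list[RecognizedField], list[RecognizedField]]:
    """Split fields into those accepted from recognition and those sent to key entry."""
    accepted: list[RecognizedField] = []
    to_key_entry: list[RecognizedField] = []
    for f in fields:
        if f.value is not None and f.confidence >= CONFIDENCE_THRESHOLD:
            accepted.append(f)
        else:
            to_key_entry.append(f)
    return accepted, to_key_entry


fields = [
    RecognizedField("age", "34", 0.97),
    RecognizedField("sex", "F", 0.62),
    RecognizedField("mother_tongue", None, 0.0),
]
accepted, to_key_entry = route_fields(fields)
print([f.name for f in to_key_entry])  # ['sex', 'mother_tongue']
```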

5.4 Coverage edits

At this stage, a number of automated edits were performed on respondent data. These edits were designed to detect cases where invalid persons may have been created through respondent error or data capture error. Examples include data erroneously entered in a blank person column, crossed-off data that was captured in error, and data provided for the same person more than once, usually because duplicate questionnaires were received (e.g., one spouse completed the Internet version while the other filled in the paper questionnaire and mailed it back). The edits were also designed to detect possibly missing usual residents, that is, cases where data were not provided for every household member listed at the beginning of the questionnaire.

About 45% of edit failure cases were resolved by the system; the remainder were forwarded to processing clerks. An interactive system enabled the clerks to examine the captured data and compare them with the questionnaire image, where one existed (online questionnaires have no image). Clerks resolved edit failures by deleting invalid or duplicate persons and adding missing ones (i.e., creating blank person records) as necessary and appropriate, or by following up with the respondents.
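The split between cases the system resolves and cases forwarded to clerks can be illustrated for duplicate persons. In this minimal sketch, person records sharing a hypothetical matching key (normalized name plus date of birth) are compared: exact duplicates are deleted automatically, and same-key records with conflicting answers are left for a clerk. The key, the record layout and the resolution rule are all assumptions, not the census edit specifications.

```python
from collections import defaultdict


def coverage_edit(persons: list[dict]) -> tuple[list[str], list[str]]:
    """
    Flag person records that look like the same individual entered twice.

    Returns (auto_deleted, for_clerk): exact duplicates are deleted automatically,
    while same-key records with conflicting answers are left for a clerk.
    """
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for p in persons:
        # Hypothetical matching key: normalized name plus date of birth.
        key = (p["name"].strip().lower(), p["birth_date"])
        groups[key].append(p)

    auto_deleted: list[str] = []
    for_clerk: list[str] = []
    for records in groups.values():
        first = records[0]
        for other in records[1:]:
            if other["answers"] == first["answers"]:
                auto_deleted.append(other["record_id"])  # resolved by the system
            else:
                for_clerk.append(other["record_id"])     # conflicting data: a clerk decides
    return auto_deleted, for_clerk


persons = [
    {"record_id": "P1", "name": "Ana Silva", "birth_date": "1970-03-02", "answers": {"sex": "F"}},
    {"record_id": "P2", "name": "ana silva ", "birth_date": "1970-03-02", "answers": {"sex": "F"}},
    {"record_id": "P3", "name": "Luc Roy", "birth_date": "1968-11-20", "answers": {"sex": "M"}},
    {"record_id": "P4", "name": "Luc Roy", "birth_date": "1968-11-20", "answers": {"sex": "F"}},
]
auto_deleted, for_clerk = coverage_edit(persons)
print(auto_deleted, for_clerk)  # ['P2'] ['P4']
```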

5.5 Completion edits and failed edit follow-up

Following the coverage edits, another set of automated edits was run on census questionnaires to detect cases with too many missing responses or with indications that data may not have been provided for all usual residents in the household. Households failing these edits were sent for follow-up: an interviewer telephoned the respondent, using a computer-assisted telephone interviewing (CATI) application, to resolve any coverage issues and fill in the missing information. The data were then sent back to the Data Operations Centre and reintegrated into the system for subsequent processing.
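The two triggers for failed edit follow-up described above can be sketched as a single predicate. The 50% missing-answer threshold and the household layout are hypothetical; the source says only that households with "too many" missing responses, or signs of missing usual residents, were sent for follow-up.

```python
from typing import Any

# Hypothetical trigger: more than half of the captured answers missing.
MAX_MISSING_SHARE = 0.5


def needs_follow_up(persons_listed: int, persons: list[dict[str, Any]]) -> bool:
    """Completion edit: decide whether a household is sent for telephone follow-up."""
    # Fewer person records than names listed suggests usual residents are missing.
    if len(persons) < persons_listed:
        return True
    answers = [value for person in persons for value in person.values()]
    if not answers:
        return True
    missing = sum(1 for value in answers if value is None)
    return missing / len(answers) > MAX_MISSING_SHARE


print(needs_follow_up(2, [{"age": 40, "sex": "M"}]))  # True: one listed resident has no data
```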

5.6 Coding

The census questionnaire contained questions whose answers could be checked off from a list, with a space for a written response if none of the listed choices applied. These written responses underwent automated coding, which assigned each one a numerical code using Statistics Canada reference files, code sets and standard classifications. Reference files for the automated matching process were built from actual responses to past censuses, as well as from administrative files. Specially trained coders and subject-matter specialists resolved cases where a code could not be assigned automatically. The following questions required coding: relationship to Person 1, language spoken at home and mother tongue.

Overall, 93% of the answers were coded automatically.
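Automated coding against a reference file can be sketched as normalize-then-look-up, with unmatched write-ins routed to manual coding and the automatic coding rate computed at the end. The reference entries and codes below are hypothetical placeholders, and exact matching after normalization is an assumption; the source does not describe the matching rules used.

```python
import re

# Hypothetical miniature reference file; real codes come from Statistics Canada classifications.
REFERENCE_FILE = {
    "mandarin": "L001",
    "tagalog": "L002",
    "cree": "L003",
}


def normalize(text: str) -> str:
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def auto_code(write_ins: list[str], reference: dict[str, str]) -> tuple[list[tuple[str, str]], list[str]]:
    """Return (coded, manual): matched responses with their codes, and unmatched responses."""
    coded: list[tuple[str, str]] = []
    manual: list[str] = []
    for response in write_ins:
        code = reference.get(normalize(response))
        if code is None:
            manual.append(response)  # left for coders and subject-matter specialists
        else:
            coded.append((response, code))
    return coded, manual


write_ins = ["Mandarin", "  TAGALOG ", "Cree.", "Plautdietsch"]
coded, manual = auto_code(write_ins, REFERENCE_FILE)
rate = len(coded) / len(write_ins)
print(f"{rate:.0%} coded automatically")  # 75% coded automatically
```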

5.7 Classification and non-response adjustments for unoccupied and non-response dwellings

The Dwelling Classification Survey (DCS) was used to estimate the error rates in the field classification of dwellings as occupied or unoccupied in the self-enumeration collection areas of the census. Based on this information, adjustments were made to the census database. The DCS selected a random sample of 1,729 self-enumeration collection units (CUs), which were revisited in July and August 2011 to reassess, as of Census Day, the occupancy status of each dwelling from which no response had been received. The DCS estimated that 13.8% of the 1,099,156 dwellings classified as unoccupied were actually occupied, and that 30.8% of the 317,976 non-responding dwellings classified as occupied or of unknown occupancy status were actually unoccupied. Estimates based on the DCS sample were used to adjust the occupancy status of individual dwellings. At the Canada level, this increased the number of occupied dwellings by 3.3% and decreased the number of unoccupied dwellings by 5.0%.
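To give a sense of scale, multiplying each dwelling count by the corresponding DCS error rate gives the approximate number of dwellings those rates imply were misclassified. These are rough magnitudes only: the published rates are rounded, and the adjustment was applied dwelling by dwelling, so the products need not match the official adjusted counts or the 3.3% and 5.0% changes.

```python
def implied_reclassified(dwellings: int, estimated_rate: float) -> int:
    """Dwellings implied by applying a (rounded) estimated misclassification rate."""
    return round(dwellings * estimated_rate)


unoccupied_to_occupied = implied_reclassified(1_099_156, 0.138)
occupied_or_unknown_to_unoccupied = implied_reclassified(317_976, 0.308)
print(unoccupied_to_occupied)             # 151684
print(occupied_or_unknown_to_unoccupied)  # 97937
```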

After the DCS adjustment of occupancy status, each occupied dwelling with total non-response had its number of usual residents (if not known) and all of its responses to the census questions imputed by borrowing the unimputed responses of another household in the same CU. This process, called whole household imputation (WHI), imputed 99% of the total non-response households. Using a single donor under WHI was more computationally efficient, and less likely to produce implausible results, than using several donors within the main edit and imputation (E&I) process. The remaining 1% of total non-response households, for which WHI found no donor household, were imputed as part of the main edit and imputation process.
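The single-donor idea behind WHI can be sketched as follows. This is a minimal illustration, assuming that donors are drawn at random from responding households in the same CU and that, when the number of usual residents is already known, only donors of that size are eligible. The source does not say how donors were chosen, so the random draw, the size restriction and the field names are all assumptions.

```python
import copy
import random
from collections import defaultdict


def whole_household_imputation(households: list[dict], seed: int = 0):
    """
    Impute each non-responding household from one responding donor household in the same CU.

    Returns (imputed, unresolved): imputed maps household id -> (donor id, copied persons);
    unresolved lists households with no suitable donor, left for the main E&I process.
    """
    rng = random.Random(seed)  # hypothetical: donor chosen at random among candidates
    donors_by_cu: dict[str, list[dict]] = defaultdict(list)
    for h in households:
        if h["responded"]:
            donors_by_cu[h["cu"]].append(h)

    imputed: dict[str, tuple[str, list[dict]]] = {}
    unresolved: list[str] = []
    for h in households:
        if h["responded"]:
            continue
        size = h.get("size")  # number of usual residents, None if not known
        candidates = [
            d for d in donors_by_cu.get(h["cu"], [])
            if size is None or len(d["persons"]) == size
        ]
        if not candidates:
            unresolved.append(h["id"])
            continue
        donor = rng.choice(candidates)
        # The whole household comes from a single donor; deep copy keeps the records independent.
        imputed[h["id"]] = (donor["id"], copy.deepcopy(donor["persons"]))
    return imputed, unresolved


households = [
    {"id": "H1", "cu": "CU1", "responded": True, "size": 2, "persons": [{"age": 40}, {"age": 38}]},
    {"id": "H2", "cu": "CU1", "responded": False, "size": None, "persons": []},
    {"id": "H3", "cu": "CU2", "responded": False, "size": None, "persons": []},
    {"id": "H4", "cu": "CU1", "responded": False, "size": 3, "persons": []},
]
imputed, unresolved = whole_household_imputation(households)
print(imputed["H2"][0], unresolved)  # H1 ['H3', 'H4']
```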

More details on the DCS can be found in Section 6.

5.8 Edit and imputation

The data collected in any survey or census contain some omissions, inconsistencies or invalid responses. For example, a respondent might be unwilling to answer a question, fail to remember the right answer, or misunderstand the question. Other errors, such as incorrect coding, can also occur.

The final clean-up of the data, done in the edit and imputation process, was for the most part fully automated. Two types of imputation were applied. The first, called 'deterministic imputation,' assigned specific values under certain conditions when the resolution of the problem was clear and unambiguous: detailed edit rules identified these conditions, and the variables involved in the rules were then assigned pre-determined values. The second, called 'minimum-change nearest-neighbour donor imputation,' applied a series of detailed edit rules to identify any missing or inconsistent responses. When a record with missing or inconsistent responses was identified, another record having the most characteristics in common with the record in error was selected as a donor. Data from this donor record were borrowed and used to make the minimum number of changes to the variables needed to resolve all missing or inconsistent responses. The Canadian Census Edit and Imputation System (CANCEIS) was the automated system used for nearly all deterministic and minimum-change nearest-neighbour donor imputation in the 2011 Census.
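The following simplified sketch illustrates the idea behind minimum-change nearest-neighbour donor imputation on a single person record. It is not CANCEIS: the fields, edit rules, distance function and number of nearest donors considered are hypothetical. For a failing record, it ranks clean donors by similarity, finds for each of the nearest donors the smallest set of fields that must be copied for the record to pass the edits, and keeps the fix with the fewest changes (ties go to the nearer donor).

```python
from itertools import combinations
from typing import Any, Optional

Record = dict[str, Any]

# Hypothetical variables and edit rules, for illustration only.
FIELDS = ("age", "marital_status", "relationship")


def passes_edits(rec: Record) -> bool:
    """Edit rules: no missing values, and anyone under 15 must be single."""
    if any(rec[f] is None for f in FIELDS):
        return False
    return not (rec["age"] < 15 and rec["marital_status"] != "single")


def distance(a: Record, b: Record) -> float:
    """Dissimilarity over fields reported in both records (smaller is closer)."""
    d = 0.0
    for f in FIELDS:
        if a[f] is None or b[f] is None:
            continue
        d += abs(a[f] - b[f]) / 100 if f == "age" else float(a[f] != b[f])
    return d


def minimal_fix(record: Record, donor: Record) -> tuple[Record, int]:
    """Copy the fewest donor fields that make the record pass the edits."""
    must = [f for f in FIELDS if record[f] is None]  # missing values are always filled
    optional = [f for f in FIELDS if f not in must and record[f] != donor[f]]
    for k in range(len(optional) + 1):
        for extra in combinations(optional, k):
            candidate = dict(record)
            for f in (*must, *extra):
                candidate[f] = donor[f]
            if passes_edits(candidate):
                return candidate, len(must) + k
    raise ValueError("donor does not pass the edits")


def impute(record: Record, donors: list[Record], n_nearest: int = 3) -> Record:
    """Minimum-change nearest-neighbour donor imputation for one record."""
    if passes_edits(record):
        return record
    clean = [d for d in donors if passes_edits(d)]
    nearest = sorted(clean, key=lambda d: distance(record, d))[:n_nearest]
    best: Optional[tuple[Record, int]] = None
    for donor in nearest:  # nearest first, so ties go to the closer donor
        fixed, changes = minimal_fix(record, donor)
        if best is None or changes < best[1]:
            best = (fixed, changes)
    return best[0] if best is not None else record


donors = [
    {"age": 11, "marital_status": "single", "relationship": "child"},
    {"age": 38, "marital_status": "married", "relationship": "spouse"},
]
print(impute({"age": 10, "marital_status": "married", "relationship": "child"}, donors))
# {'age': 10, 'marital_status': 'single', 'relationship': 'child'}
print(impute({"age": 30, "marital_status": "married", "relationship": None}, donors))
# {'age': 30, 'marital_status': 'married', 'relationship': 'spouse'}
```

In the first call, a single change (marital status) suffices, so the age reported by the respondent is kept; in the second, only the missing relationship is filled.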
