Correspondence Analysis



1. Introduction

This paper is an introduction to correspondence analysis, a statistical method for analyzing and describing, graphically and concisely, large contingency tables, that is, tables in which the cell at the intersection of a row and a column holds the number of individuals who share the characteristic of the row and that of the column.
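To make the definition concrete, here is a minimal sketch of how a contingency table can be built from raw records. The records, the category names and the `contingency_table` helper are all hypothetical and serve only as an illustration; they do not come from any data set discussed in this paper.

```python
from collections import Counter

# Hypothetical records: one (row category, column category) pair per individual.
records = [
    ("farmer", "rural"), ("farmer", "rural"), ("clerk", "urban"),
    ("clerk", "urban"), ("clerk", "rural"), ("farmer", "urban"),
    ("clerk", "urban"),
]

def contingency_table(pairs):
    """Count the individuals sharing each (row, column) pair of categories."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Cell (i, j) holds the number of individuals with row i and column j.
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

rows, cols, table = contingency_table(records)

# Print the table with its row and column labels.
print("        " + "  ".join(f"{c:>6}" for c in cols))
for r, line in zip(rows, table):
    print(f"{r:<8}" + "  ".join(f"{n:>6}" for n in line))
```

Each individual is counted in exactly one cell, so the cells always sum to the number of records.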


A. Data Analysis

In the statistics courses taught in economics curricula, we usually try to estimate the parameters of an a priori model: that is the aim of econometrics. It is always a deductive approach, in which we first establish a model, then use the data to estimate the "true" parameters of this model, and finally quantify how well the model fits the data. The institutionalist school of statistics criticizes this way of looking at reality through the narrow window of the R² (the coefficient of determination).

Another approach is possible, neglected by economists but not in other fields. Here we start from the data themselves (with a little a priori thinking to decide which data are worth collecting) and then try, by progressive abstraction, to see the patterns in the data: to find out how the series of figures are organized and which variables or groups of variables are correlated. This is the inductive approach. In statistics, this branch has developed considerably in recent years (thanks to the computer), using mathematical tools more complicated than the means, variances and empirical correlation coefficients that our descriptive statistics use. This new branch took the name of Exploratory Data Analysis, or Data Analysis.

One could define data analysis as a group of statistical methods designed to describe huge data files concisely. Theoreticians of data analysis are often very critical of econometrics: "Under the name of mathematical statistics, some authors have built a pompous science, rich in hypotheses that are never satisfied in practice" (Benzecri: 1980, p.3).


B. Factor Analysis

The various methods of factor analysis form a flourishing branch of the data analysis family. What is factor analysis?

From a table of n observations on p variables, which describes a cloud of n points in a p-dimensional space (if p<n), factor analysis determines the first k axes of an orthogonal system of axes that account for the most variance in the cloud. The underlying structure of the data revealed in this way may allow the researcher to interpret it intuitively, for example: the data are separated along a first dimension (axis) that may be interpreted as the level of income, or culture, or rurality/urbanity, etc., depending on the data we are looking at. The fundamental idea is to eliminate the redundancy in the original data by means of a reduced number of variables (the factors), which are combinations of the original variables. This is a classic inductive method, used as an exploratory tool to unearth the fundamental empirical regularities of a data file. It is a very powerful descriptive tool. Correspondence analysis is a factor analysis method that uses categorical variables (that is, non-continuous or discretized ones).
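The idea of finding the first k orthogonal axes that account for the most variance can be sketched in a few lines. The example below is only an illustration under stated assumptions: it works on continuous variables (a principal-axes computation, not correspondence analysis itself), the four observations are made up, and the axes are found by power iteration with deflation on the covariance matrix, a simple stand-in for the full eigenvalue decomposition that a numerical library would normally provide. All function names are hypothetical.

```python
import math
import random

def center(data):
    """Subtract the mean of each variable (column) from every observation."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in data]

def covariance(centered):
    """p x p covariance matrix of the centered n x p table."""
    n, p = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / n for j in range(p)]
            for i in range(p)]

def leading_axis(cov, iterations=500, seed=0):
    """Power iteration: unit vector along which `cov` has the most variance."""
    p = len(cov)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Variance of the cloud along v (v has unit length).
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return v, variance

def principal_axes(data, k):
    """First k orthogonal axes, each with its share of the total variance."""
    cov = covariance(center(data))
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))
    axes = []
    for _ in range(k):
        v, var = leading_axis(cov)
        axes.append((v, var / total))
        # Deflation: remove the variance already explained by this axis.
        cov = [[cov[i][j] - var * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    return axes

# Made-up table: n = 4 observations on p = 2 strongly correlated variables.
data = [[2.0, 1.0], [3.0, 3.0], [4.0, 3.0], [5.0, 5.0]]
axes = principal_axes(data, 2)
for number, (direction, share) in enumerate(axes, start=1):
    print(f"axis {number}: direction={direction}, share of variance={share:.3f}")
```

In this made-up table the two variables move together, so nearly all of the variance falls on the first axis: a single factor summarizes the data almost as well as the two original variables, which is the redundancy that factor analysis aims to eliminate.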




François Micheloud's Homepage