Correspondence Analysis

7. Extensions and Limitations

A. Supplementary points

You may wish to add to a graph some points that you think will help you interpret it, without wanting those points to take part in the construction of the factorial axes, which would alter the analysis with elements foreign to our specific question. In this case, you can add points that have a profile (that is, they belong to the table, as a row or a column) but zero weight (as far as the axes are concerned, they do not exist). Such points are known as supplementary points. They are projected onto the new set of axes after the factorial axes have been constructed. Their contribution to inertia is therefore nil. Good software will nevertheless report the same indicators as for the active points, but these must of course be read as "if my supplementary points were active, they would have such and such CTR, COR, MAS, etc."
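A minimal sketch in Python/NumPy of how a supplementary row can be placed on the axes, assuming the usual SVD formulation of simple correspondence analysis (the function names, dictionary keys and the small table are hypothetical, for illustration only). The axes are computed from the active table alone; the supplementary row is then positioned through the transition formula, i.e. its profile multiplied by the column standard coordinates. Its mass never enters the computation, which is why its contribution to inertia is zero, while a quality-of-representation figure (COR) can still be computed from its distance to the centroid.

```python
import numpy as np

def simple_ca(N):
    """Simple CA of a two-way table of non-negative values (rows x columns)."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                              # correspondence matrix
    r = P.sum(axis=1)                            # row masses (MAS)
    c = P.sum(axis=0)                            # column masses (MAS)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)       # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    k = min(N.shape) - 1                         # number of non-trivial axes
    U, sv, V = U[:, :k], sv[:k], Vt[:k].T
    return {
        "col_mass": c,
        "inertias": sv ** 2,                     # principal inertia of each axis
        "row_coords": U * sv / np.sqrt(r)[:, None],  # active rows, principal coords
        "col_std": V / np.sqrt(c)[:, None],          # columns, standard coords
    }

def project_supplementary_row(counts, model):
    """Place a zero-weight row on axes already computed from the active table."""
    a = np.asarray(counts, dtype=float)
    profile = a / a.sum()                        # only the profile matters, not the size
    coords = profile @ model["col_std"]          # transition formula
    c = model["col_mass"]
    d2 = ((profile - c) ** 2 / c).sum()          # squared chi-square distance to centroid
    cos2 = coords ** 2 / d2                      # quality of representation (COR)
    return coords, cos2

# Hypothetical active table (4 rows x 3 columns) and one supplementary row.
N = np.array([[20, 10, 5], [10, 20, 10], [5, 10, 30], [15, 5, 5]])
model = simple_ca(N)
sup_coords, sup_cos2 = project_supplementary_row([8, 12, 4], model)
```

A useful sanity check: projecting an active row as if it were supplementary returns exactly its active coordinates, since only the profile is used.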

Supplementary points are no mere curiosity: you will encounter them very frequently in practice. Here are three situations that may lead you to make a point supplementary:

(1) A point that is inherently different from the rest and does not interest us as such, but could help us interpret the others.

(2) A low-weight outlier whose off-centre position can change the shape of your graph enough to hide the more interesting contrasts between the other points, particularly the more important ones (i.e. those with a greater MAS).

(3) You may want to subdivide a category (for example, we might have wanted to represent the Université point and, at the same time, show the difference between men and women at university) without counting individuals several times. In this case, we make the subcategories supplementary points.

I could have added some supplementary points, such as income classes or professions (in rows), to help interpret the second axis.
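Rule (3) works because a category's point is the mass-weighted average (barycentre) of the points of its subcategories, so the subcategories can be displayed as supplementary points without touching the axes or counting anyone twice. A minimal NumPy sketch checking this on a hypothetical table (helper names and counts are invented for illustration), where the last column, University, is also available split by sex:

```python
import numpy as np

def ca_axes(N):
    """Simple CA: active column principal coords and row standard coords."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)          # row and column masses
    expected = np.outer(r, c)
    U, sv, Vt = np.linalg.svd((P - expected) / np.sqrt(expected),
                              full_matrices=False)
    k = min(N.shape) - 1                          # non-trivial axes only
    U, sv, V = U[:, :k], sv[:k], Vt[:k].T
    col_coords = V * sv / np.sqrt(c)[:, None]     # active columns, principal coords
    row_std = U / np.sqrt(r)[:, None]             # rows, standard coords
    return col_coords, row_std

def project_supplementary_columns(B, row_std):
    """Project supplementary columns given as counts over the active rows."""
    B = np.asarray(B, dtype=float)                # shape (n_rows, n_supplementary)
    profiles = B / B.sum(axis=0)                  # column profiles
    return profiles.T @ row_std                   # one coordinate row per column

# Hypothetical table: 4 row categories x 3 columns; the last column
# ("University") is also available split into (men, women).
by_sex = np.array([[3, 2], [6, 4], [12, 18], [2, 3]])
N = np.column_stack([[20, 10, 5, 15], [10, 20, 10, 5], by_sex.sum(axis=1)])

col_coords, row_std = ca_axes(N)                  # axes use the parent column only
sub_coords = project_supplementary_columns(by_sex, row_std)

# The parent point is the mass-weighted average of its subcategory points.
weights = by_sex.sum(axis=0)
barycentre = weights @ sub_coords / weights.sum()
```

On the map, the Université point therefore lies on the segment joining its men and women points, closer to the heavier of the two.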

 

B. Multiple Correspondence Analysis

The method presented here works on only two categorical variables, I and J. There is, however, a multivariate version of the method, less pure but widely used to analyze large questionnaires, where the synthetic description made possible by the computer is indispensable. I won't explain it here, but two things can be said: (1) the general principles of interpretation are the same as those explained here; (2) the percentage of inertia explained by the first axis is very low compared with that of the simple analysis (about 3 to 5%), but this measure gives a very pessimistic idea of the share of inertia described by the graph.

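The device used has a rather off-putting name but is in fact quite simple: the questionnaire is recoded as a complete disjunctive table (one 0/1 column per answer category, with a 1 marking the answer each individual gave), or as Burt's table, which crosses every pair of questions and is obtained by multiplying the transposed disjunctive table by the table itself.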
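A minimal NumPy sketch with a hypothetical questionnaire of 8 respondents and Q = 3 questions (helper names and answers are invented for illustration). It builds the complete disjunctive table Z and Burt's table, then checks two standard facts: the total inertia of Z equals J/Q - 1 (J answer categories in all) whatever the answers, which is one reason its percentages of inertia look so small; and the principal inertias of Burt's table are the squares of those of Z, so the two codings give the same axes but different percentages.

```python
import numpy as np

def principal_inertias(N):
    """Principal inertias (eigenvalues) of a simple CA of table N."""
    P = np.asarray(N, dtype=float)
    P = P / P.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    expected = np.outer(r, c)
    sv = np.linalg.svd((P - expected) / np.sqrt(expected), compute_uv=False)
    return sv ** 2

def complete_disjunctive(rows):
    """Indicator matrix Z: one 0/1 column per category of each question."""
    blocks = []
    for answers in zip(*rows):                    # one question at a time
        categories = sorted(set(answers))
        blocks.append([[1.0 if a == cat else 0.0 for cat in categories]
                       for a in answers])
    return np.hstack(blocks)

# Hypothetical questionnaire: 8 respondents, Q = 3 questions, J = 8 categories.
answers = [("yes", "low", "city"), ("no", "high", "town"),
           ("yes", "mid", "city"), ("no", "low", "village"),
           ("yes", "high", "town"), ("no", "mid", "village"),
           ("yes", "low", "town"), ("no", "high", "city")]
Z = complete_disjunctive(answers)                 # 8 x 8 indicator matrix
burt = Z.T @ Z                                    # Burt's table: all pairwise crosstabs

lam_Z = principal_inertias(Z)
lam_B = principal_inertias(burt)
share_first_axis = lam_Z[0] / lam_Z.sum()         # the "pessimistic" percentage
```

With real questionnaires, where J is large, the fixed total J/Q - 1 grows and the first-axis share of Z drops accordingly.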

C. Extensions

Correspondence analysis is often applied to "crosstabs" (tables that cross two categorical variables) that are not true contingency tables. In fact, at some cost in mathematical rigor, one can use tables whose cells contain something other than counts of statistical individuals, provided two criteria are respected: homogeneity (all cell contents are measured in the same unit) and exhaustiveness (each case can be classified in one and only one category of each of the two variables). Statisticians, with their long training in mathematics, will grind their teeth here and remind us of this when we "abuse" the method in this way. But, after all, if the method used this way helps us in our research, why not?
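Why homogeneity matters can be illustrated with a minimal NumPy sketch on hypothetical figures (the table and helper are invented for illustration). Since CA only works on proportions, expressing the whole table in another unit leaves the principal inertias unchanged, whereas converting a single column, i.e. mixing units, alters the masses and profiles and hence the whole analysis.

```python
import numpy as np

def principal_inertias(N):
    """Principal inertias of a simple CA of a table of non-negative values."""
    P = np.asarray(N, dtype=float)
    P = P / P.sum()                               # CA only sees proportions
    r, c = P.sum(axis=1), P.sum(axis=0)
    expected = np.outer(r, c)
    sv = np.linalg.svd((P - expected) / np.sqrt(expected), compute_uv=False)
    return sv ** 2

# Hypothetical values in thousands of dollars: 4 product categories x 3 countries.
values = np.array([[120.0, 40.0, 15.0],
                   [30.0, 90.0, 25.0],
                   [10.0, 20.0, 70.0],
                   [55.0, 35.0, 30.0]])

same_unit = principal_inertias(values * 1000)     # whole table in dollars
mixed = values.copy()
mixed[:, 0] *= 1000                               # only the first column in dollars
mixed_units = principal_inertias(mixed)
```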

D. Examples

As an illustration of the wonderful range of applications of this method, here is a short list of studies that have been carried out using correspondence analysis.

  • A table giving the spending of 37 social categories on 126 consumption items.
  • A table whose columns gave the 210 possible answers to a questionnaire given to Iranian peasants, with the 240 respondents in rows.
  • A study of a table giving the UN votes of the delegates of 127 countries on 13 important votes in 1967. The votes were coded 1 for yes, 0 for no and 1/2 for abstentions. The first factor clearly opposed a group centered on the US to a very dense cloud around the USSR. Other factors could be interpreted as an isolation factor, an abstention factor, etc.
  • A table giving the value of Brazil's machinery imports in 1971, broken down into 128 machine categories (rows) and 16 exporting countries (columns).
  • A typological study by paleontologists of a sample of 349 equid skulls (horses, zebras, etc.) aimed at classifying them. The table gave, for each skull (in rows), the results of 25 craniometric measurements.
  • More recently, The New Yorker asked a linguist to study an anonymous book about the Clinton presidential campaign. The magazine gave the researcher 15 possible authors of the book, as well as a sample of text written by each. I think the linguist analyzed a table with the 15 texts plus the anonymous book in rows, each cell counting the number of times word j was found in text i. The true author was found.
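The text does not say exactly how the linguist built the table, so the following is only a sketch of the kind of table described (rows = texts, columns = words, cells = counts), using the Python standard library. The tokenizing regular expression and the optional vocabulary argument (for example, to keep only a chosen list of words) are my own assumptions, not details of the actual study.

```python
import re
from collections import Counter

def word_count_table(texts, vocabulary=None):
    """Contingency table: rows = texts, columns = words, cell = occurrences."""
    # Hypothetical tokenizer: lower-case runs of ASCII letters and apostrophes.
    counts = [Counter(re.findall(r"[a-z']+", text.lower())) for text in texts]
    if vocabulary is None:
        vocabulary = sorted(set().union(*counts))   # every word seen in any text
    # A Counter returns 0 for a word that does not occur in a text.
    table = [[c[word] for word in vocabulary] for c in counts]
    return list(vocabulary), table

# Toy usage with two tiny texts.
vocab, table = word_count_table(["The cat sat.", "The dog and the cat."])
```

The result is a true contingency table, so it can be passed as-is to any simple CA routine.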

As you can see, all kinds of data can be used for correspondence analysis, although its favorite data remain "true" contingency tables and, for multiple correspondence analysis, questionnaires.

To conclude, a remark that holds for most exploratory multivariate statistical methods: the number of decisions the analyst must make at the different stages of the analysis, and the cumulative effect of those decisions, can lead to quite different outcomes. Although general guidelines can be given to the researcher at each step, a good dose of subjectivity will always be part of the final result.

E. Software

Correspondence analysis is done exclusively by computer. Several software packages exist, such as SPSS (in the data reduction module) and other packages for the "soft" social sciences (honni soit qui mal y pense!). Please note that SPSS could not, at the time I used it, handle supplementary points, and that it produces small graphs that are not easily resized. The solution is supposed to be to export the factor scores to a graphics program and make a nice graph there, but personally I managed neither to place the point labels under the points nor even to draw two series of points in different colors. I finally resorted to the... Xerox machine and then... oh well, in this digital age I won't say more. Suffice it to say that software designers still have a lot to do. A last tip on using SPSS: always tick "constant aspect ratio", otherwise the program will resize your graph to whatever size the window is.
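For the export route, a minimal sketch assuming Python with matplotlib (neither is discussed in this text; the function name, colours and markers are illustrative choices). It draws rows and columns as two series of different colours, puts each label just under its point, and fixes a 1:1 aspect ratio so that distances on the map are not distorted, the same concern as SPSS's constant aspect ratio option.

```python
import matplotlib
matplotlib.use("Agg")                     # draw off-screen; no window needed
import matplotlib.pyplot as plt

def plot_factor_map(row_xy, col_xy, row_labels, col_labels):
    """Rows and columns as two coloured series, labels under points, 1:1 axes."""
    fig, ax = plt.subplots(figsize=(6, 6))
    series = ((row_xy, row_labels, "tab:blue", "o"),
              (col_xy, col_labels, "tab:red", "^"))
    for xy, labels, colour, marker in series:
        xs = [p[0] for p in xy]
        ys = [p[1] for p in xy]
        ax.scatter(xs, ys, color=colour, marker=marker)
        for x, y, label in zip(xs, ys, labels):
            # Offset each label 8 points straight down from its point.
            ax.annotate(label, (x, y), xytext=(0, -8), textcoords="offset points",
                        ha="center", va="top", color=colour)
    ax.axhline(0, color="grey", linewidth=0.5)    # origin = centroid of the cloud
    ax.axvline(0, color="grey", linewidth=0.5)
    ax.set_aspect("equal", adjustable="datalim")  # same scale on both axes
    ax.set_xlabel("Axis 1")
    ax.set_ylabel("Axis 2")
    return fig, ax

# Hypothetical factor scores and labels.
fig, ax = plot_factor_map([(0.3, -0.1), (-0.2, 0.4)], [(0.5, 0.2)],
                          ["Farmers", "Workers"], ["University"])
```

Calling `fig.savefig("map.png", dpi=150)` then writes a map whose size no longer depends on any window.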

Interested readers can find a complete, though already dated, discussion in (Greenacre 1993).

François Micheloud's Homepage