- Hofmann H.: Multivariate Categorical
Data: Mosaic Plots. (3.5Mb) in Graphics of Large Datasets, Springer,
New York, 2006.

- Hofmann H.: Interactive biplots
for visual modelling. (3.7Mb) in Proceedings in Computational
Statistics 16th Symposium held in Prague, Czech Republic, Antoch,
Jaromir (Ed.), pp. 235–250, 2004.

- Ahn J.S., Cook D., Hofmann H.: A Projection
Pursuit Method on the multidimensional squared Contingency Table
(400k) Computational Statistics, Vol 18 (4), pp. 605–626,
2003.

This paper was originally based on a collaboration between Di and Ju Sun; I only got involved right after I joined ISU in 2002. I helped rewrite section 2 and fixed some of the proofs.

- Hofmann H.: Constructing and
reading mosaicplots Computational Statistics and Data Analysis
(250k), Vol 43, No. 4, pp. 565-580, 2003.

- Hofmann H.: Generalised
Odds Ratios for Visual Modelling. (1.6Mb) In Journal of Computational
and Graphical Statistics, 10, pp 628-640, 2002.

- Unwin A., Hofmann H., Wilhelm A.: Direct
Manipulation Graphics for Data Mining. (380k) International
Journal of Image and Graphics, Vol. 2, No. 1, pp. 49-65, 2002.

The first half of the paper is part of a 'real' collaboration: those ideas stem from intensive discussions about how best to visualize association rules. It is therefore hard to tell afterwards who first came up with which idea; sections 3.2.2 to the end come directly from my own research.

- Graphics of Large Datasets, with Antony Unwin and Martin Theus,
Springer, New York, 2006.

- Graphical Tools for the Exploration of Multivariate Categorical
Data, ISBN 3831116601, 2001.

in Google Books

- MANET. MANET is for exploring data, whether
raw data, transformed data or model residuals. MANET provides a
range of graphical tools specially designed for studying multivariate
features. Anyone involved in analysing data will find MANET useful
for gaining insights into the structure and relationships of their
data sets.

- GGobi. GGobi is software for data exploration using highly interactive statistical graphics. My main contribution is a first implementation of area charts, such as barcharts and histograms.

- Info Vis Contest 2005: "Boom and Bust
of Technology Companies at the Turn of the 21st Century"

with Hadley Wickham, Dianne Cook, Junjie Sun, Christian Röttger; our submission received a first prize and is now part of the Information Visualization Benchmarks Repository.

- ASA Data Expo 2006: "Glaciers Melt
as Mountains Warm"

with Hadley Wickham, Jonathan Hobbs, Dianne Cook; our poster won the second prize.

- Presidential Address WNAR 2003: Graphics - an Ace in the Sleeve of a Statistician (3.3 MB PDF)

Statistical graphics have a long tradition, dating back to the late 1700s when William Playfair primped up his Commercial and Political Atlas with plots. Success stories attest to the fact that lives have literally been saved by statistical graphics. When, for example, cholera struck London, the source was found using graphics.

Graphical displays give the data analyst a unique framework for exploration, especially as we understand more about the possibilities and limits of human visual perception. A statistical framework underlying graphics helps determine whether what we see is actually there probabilistically.

Technically, the capabilities of computing systems are very much improved from twenty years ago. The approaches are very different too, but the demands grow at least at the same rate, if not faster. Challenges for modern visualization are ever-increasing data sets of growing complexity. New sources of data emerge, such as in the developing genomics and proteomics communities. Graphical displays provide a vehicle for matching experts' knowledge with statistical tools, and for communicating to a wider audience.

Producing good graphics is an art, like a good magician's trick. Unlike the magician, a good statistician does not want to produce an illusion but to reveal the hidden qualities of the data. We will show numerous famous examples of statistical graphics, from the early beginnings of the field up to modern visualization.

- DIV 2006: Mosaicplots and their Variations
(19.3 MB PDF)

Mosaicplots were introduced by Hartigan and Kleiner (1981) as a way of visualizing contingency tables. Named for their resemblance to the art form, mosaicplots represent cells of a contingency table by a composition of rectangles. Both size and position of these rectangles are meaningful for the interpretation of mosaicplots, making them one of the more advanced plots around. With a little practice they become an invaluable tool in the representation and exploration of multivariate categorical data. We will discuss ways of constructing mosaicplots (Hofmann, 2000). Mosaicplots have the huge advantage of preserving all information of multivariate contingency tables while presenting an overview at the same time. As mosaicplots follow the hierarchy of their counterpart contingency tables exactly, the order of variables in the tables is crucial. Finding the “right” or at least a “good” ordering is commonly found to be one of the main difficulties first-time users experience with mosaicplots. We will discuss effects of changes in the order and give recommendations on how to obtain “good plots”. Modelling of multivariate categorical data is usually done with loglinear models. It can be shown (Hofmann, 2001; Theus and Lauer, 1999; Friendly, 1992) that mosaicplots have excellent mathematical properties, which allow visual assessments of the strength of interaction e[…] tools for checking residuals and modelling assumptions. We will discuss relationships between mosaicplots and loglinear models. Close relatives of the mosaicplot such as fluctuation diagrams and double decker plots (Hofmann et al., 2000) have been found very useful in practice. We are going to have a look into those and other important variations of mosaicplots. All of these variations are essentially simplifications of the default construction of mosaics. While losing some information, these plots put additional emphasis on a specific aspect of the data.
From a visualizer’s point of view, both treemaps, introduced by Shneiderman (1992), and trellis plots (Becker et al., 1994) are generalizations of two di[…] same structure, trellis plots are more flexible by not necessarily displaying numbers as rectangles. Treemaps, on the other hand, do show the data by rectangles, but are able to deal with more general partitions than mosaicplots. These generalizations do not come without losses, though. We will compare mosaicplots to these other forms of displays in section 4 and comment on strengths and weaknesses of each of them. Implementations of mosaicplots are becoming more frequent. An implementation in R was done by Emerson (1998). Mosaicplots in JMP (John Sall, 1989) have some limited interactive features. Fully interactive mosaicplots are implemented e.g. in MANET (Unwin, Hawkins, Hofmann, and Siegl, 1997), Mondrian (Theus, 2002) and KLIMT (Urbanek, 2002).
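To make the construction in this abstract concrete, here is a minimal Python sketch of the basic two-variable case. It is an illustration only, not the construction from Hofmann (2000) or the code of any of the packages named above. The function name `mosaic_rectangles` and the example counts are hypothetical, the table is assumed to have a positive total, and the gaps that real mosaicplots draw between tiles are left out.

```python
def mosaic_rectangles(table):
    """Return {(i, j): (x, y, width, height)} tiles in the unit square.

    table[i][j] is the count for level i of the first variable and
    level j of the second variable (hypothetical input format).
    Assumes the table total is positive; gaps between tiles are omitted.
    """
    total = sum(sum(row) for row in table)
    tiles = {}
    x = 0.0
    for i, row in enumerate(table):
        row_total = sum(row)
        # First split: width is the marginal proportion of level i.
        width = row_total / total
        y = 0.0
        for j, count in enumerate(row):
            # Second split: height is the proportion of level j given level i.
            height = count / row_total if row_total else 0.0
            tiles[(i, j)] = (x, y, width, height)
            y += height
        x += width
    return tiles


# Made-up counts, for illustration only.
table = [[10, 30], [20, 40]]
tiles = mosaic_rectangles(table)

# Swapping the variable order (transposing the table) changes the layout.
swapped = mosaic_rectangles([list(col) for col in zip(*table)])
```

Each tile's area equals its cell count divided by the table total, so no information in the table is lost. The tiles in `swapped` have the same areas but different widths and heights, which is why the order of variables matters when reading the plot.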

- Stat 330: Probability and Statistics for Computer Science

lecture notes (11.1MB PDF), syllabus, and sample homework assignments (PDF)

- Stat 511: Statistical Methods

lecture notes (1MB PDF), syllabus, and sample homework assignments (PDF)

This course was held at a distance; sample tapes are:

- Nested models in R, Split-Plot Designs: Real Media Stream, Windows Media Stream
- Introduction to the Bootstrap: Real Media Stream, Windows Media Stream