Heike Hofmann

Tenure case

Selected Publications

Books

  • Graphics of Large Datasets, with Antony Unwin and Martin Theus, Springer, New York, 2006.
  • Graphical Tools for the Exploration of Multivariate Categorical Data, ISBN 3831116601, 2001.
    in Google Books

Creative Research

Software

  • MANET. MANET is for exploring data, whether raw data, transformed data or model residuals. MANET provides a range of graphical tools specially designed for studying multivariate features. Anyone involved in analysing data will find MANET useful for gaining insights into the structure and relationships of their data sets.
  • GGobi. GGobi is software for data exploration using highly interactive statistical graphics. My main contribution is a first implementation of area charts, such as barcharts and histograms.

Prizes

Selected Talks

  • Presidential Address WNAR 2003: Graphics - an Ace in the Sleeve of a Statistician (3.3 MB PDF)
    Statistical graphics have a long tradition, dating back to the late 1700s, when William Playfair primped up his Commercial and Political Atlas with plots. Success stories attest to the fact that lives have literally been saved by statistical graphics. When cholera struck London, for example, the source was found using graphics.
    Graphical displays give the data analyst a unique framework for exploration, especially as we understand more about the possibilities and limits of human visual perception. A statistical framework underlying graphics helps determine, in probabilistic terms, whether what we see is actually there.
    Technically, the capabilities of computing systems are very much improved from twenty years ago. The approaches are very different too, but the demands grow at the same rate at least, if not faster. Challenges for modern visualization are ever-increasing data sets of growing complexity. New sources of data emerge, such as in the developing genomics and proteomics communities. Graphical displays provide a vehicle for matching experts' knowledge with statistical tools, and for communicating to a wider audience.
    Producing good graphics is an art, like a good magician's trick. Unlike the magician, a good statistician does not want to produce an illusion but to reveal the hidden qualities of the data. We will show numerous famous examples of statistical graphics, from the early beginnings up to modern visualization.
  • DIV 2006: Mosaicplots and their Variations (19.3 MB PDF)
    Mosaicplots were introduced by Hartigan and Kleiner (1981) as a way of visualizing contingency tables. Named for their resemblance to the art form, mosaicplots represent cells of a contingency table by a composition of rectangles. Both size and position of these rectangles are meaningful for the interpretation of mosaicplots, making them one of the more advanced plots around. With a little practice they become an invaluable tool in the representation and exploration of multivariate categorical data. We will discuss ways of constructing mosaicplots (Hofmann, 2000).
    Mosaicplots have the huge advantage of preserving all information of multivariate contingency tables while presenting an overview at the same time. As mosaicplots follow the hierarchy of their counterpart contingency tables exactly, the order of variables in the tables is crucial. Finding the “right” or at least a “good” ordering is commonly found to be one of the main difficulties first-time users experience with mosaicplots. We will discuss the effects of changes in the order and give recommendations on how to obtain “good plots”.
    Modelling of multivariate categorical data is usually done with loglinear models. It can be shown (Hofmann, 2001; Theus and Lauer, 1999; Friendly, 1992) that mosaicplots have excellent mathematical properties, which allow visual assessments of the strength of interaction e[...] tools for checking residuals and modelling assumptions. We will discuss relationships between mosaicplots and loglinear models.
    Close relatives of the mosaicplot such as fluctuation diagrams and double decker plots (Hofmann et al., 2000) have been found very useful in practice. We are going to have a look into those and other important variations of mosaicplots. All of these variations are essentially simplifications of the default construction of mosaics. While losing some information, these plots put additional emphasis on a specific aspect of the data.
    From a visualizer’s point of view, both treemaps, introduced by Shneiderman (1992), and trellis plots (Becker et al., 1994) are generalizations of two di[...] same structure, trellis plots are more flexible by not necessarily displaying numbers as rectangles. Treemaps, on the other hand, do show the data by rectangles, but are able to deal with more general partitions than mosaicplots. These generalizations do not come without losses, though. We will compare mosaicplots to these other forms of displays in section 4 and comment on the strengths and weaknesses of each of them.
    Implementations of mosaicplots are becoming more frequent. An implementation in R was done by Emerson (1998). Mosaicplots in JMP (John Sall, 1989) have some limited interactive features. Fully interactive mosaicplots are implemented, e.g., in MANET (Unwin, Hawkins, Hofmann, and Siegl, 1997), Mondrian (Theus, 2002) and KLIMT (Urbanek, 2002).

Sample lecture notes