Thursday, July 17, 2008

Starplot


A Starplot allows the user to easily visualize and compare multiple variables across observations. Each observation is shown as a star shape, with each ray (radius) representing one variable, or characteristic, of that observation. The length of a ray is proportional to the value of its variable. The resulting shape is then compared with the shapes of starplots formed from other observations. Starplots are easiest to compare when all of the variables have their scales aligned in the same direction, so that a longer ray always means the same thing (for example, more of a desirable quality). This starplot shows a number of variables relating to automobiles. Each star is a different car model. Each ray is a variable such as price, mpg, headroom, trunk space, etc.
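
As a rough sketch of how the rays of one star could be computed, the Python below scales each variable to the range 0 to 1 and places its ray at an evenly spaced angle. The function name `star_points`, the car values and the minimum/maximum ranges are all made up for illustration; they are not taken from the plot described above.

```python
import math

def star_points(values, mins, maxs):
    """Return the (x, y) endpoint of each ray for one observation."""
    n = len(values)
    points = []
    for i, (v, lo, hi) in enumerate(zip(values, mins, maxs)):
        # Scale the variable to [0, 1] so rays are comparable across variables.
        length = (v - lo) / (hi - lo) if hi > lo else 0.0
        # Spread the rays evenly around the circle.
        angle = 2 * math.pi * i / n
        points.append((length * math.cos(angle), length * math.sin(angle)))
    return points

# Hypothetical values for one car: price, mpg, headroom, trunk space.
car = [6000, 25, 3.0, 12]
mins = [3000, 10, 1.5, 5]
maxs = [15000, 40, 5.0, 23]
pts = star_points(car, mins, maxs)
```

Joining the endpoints in order gives the star outline for that car; repeating the calculation with the same `mins` and `maxs` for every car keeps the stars comparable.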

Tuesday, July 15, 2008

Correlation Matrix


A Correlation Matrix is a statistical visualization that organizes pairs of variables by correlation. Correlation measures the strength and direction of the relationship between two variables. When it is necessary to examine the correlation between more than one pair of variables in a single analysis, a correlation matrix is often used. The matrix shows every pairwise correlation, and because the correlation of A with B equals that of B with A, it is symmetric about its diagonal. This correlation matrix represents the relationships between various investment markets and helps investors diversify by highlighting highly correlated funds.
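
To make the idea concrete, here is a minimal sketch that builds such a matrix. It assumes the Pearson correlation coefficient is the measure being used, which the description above does not specify. The market names and return figures are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Map every pair of column names to its correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}

# Hypothetical periodic returns for three markets.
returns = {
    "stocks": [1.0, 2.0, 3.0, 4.0],
    "bonds": [4.0, 3.0, 2.0, 1.0],
    "gold": [1.0, 3.0, 2.0, 4.0],
}
m = correlation_matrix(returns)
```

In this toy data, `m["stocks"]["bonds"]` comes out at -1 (the two move in opposite directions), every diagonal entry is 1, and `m["stocks"]["gold"]` equals `m["gold"]["stocks"]`.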

Saturday, July 12, 2008

Similarity Matrix


A Similarity Matrix shows how similar pairs of variables are along a scale. Similarity matrices are often used in genetic visualizations. The matrix is usually a square made up of a grid of smaller squares, with the color of each small square indicating the level of similarity between the two data points it pairs. The diagram shown above is a differential expression signature similarity matrix from a study on gene expression profiling of long-lived dwarf mice that compares longevity-associated genes and their relationships with diet, gender and aging. Dark red colors indicate high similarity; pairs with non-significant similarity have no coloring.

Monday, July 7, 2008

Stem and Leaf Plot


A Stem and Leaf Plot is designed to show the shape and distribution of data. It is similar to a histogram placed on its side, but in addition to showing the frequency of each interval (the stem), it gives the individual values within the interval. The leaf of each data value is its rightmost digit. The digits to the left of the leaf form the stem. This simple stem and leaf plot shows the distribution of weight and was generated by the statistical software program Minitab.
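
The stem/leaf split described above can be sketched in a few lines of Python. This is an illustrative version for non-negative whole numbers only, not a reproduction of Minitab's output format, and the weights are invented.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the text lines of a stem and leaf plot for non-negative integers."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # leaf = rightmost digit, stem = the rest
        groups[stem].append(leaf)
    lines = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(d) for d in groups.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

# Hypothetical weights.
weights = [112, 115, 121, 124, 124, 130, 138, 152]
lines = stem_and_leaf(weights)
print("\n".join(lines))
```

Each printed row is one stem followed by its leaves in ascending order, so the length of a row plays the role of a histogram bar.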

Sunday, July 6, 2008

Box Plot


A Box Plot, or Box and Whisker Diagram, is a summarizing visualization used to show the median, upper quartile, lower quartile, smallest observation and largest observation of statistical data; it sometimes also marks outliers as individual points. The box plot was created by John Tukey. Box plots are frequently used to compare multiple data sets, and are oriented either horizontally or vertically. This vertical Box Plot compares employee salaries in 2004 across grades by gender, with blue symbols representing men and red symbols representing women. The diagram allows the user to see that the median salary is lower for women at all grade levels, and that the gap widens significantly at higher grades.
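
The five values a box plot summarizes can be computed with Python's standard `statistics` module, as in the sketch below. Two assumptions are worth flagging: quartile definitions vary between tools (the "inclusive" method is used here), and the 1.5 × IQR cutoff for outliers is a common convention rather than something stated in the description above. The salary figures are invented.

```python
import statistics

def five_number_summary(data):
    """Return (min, lower quartile, median, upper quartile, max)."""
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)

def outliers(data, k=1.5):
    """Values more than k * IQR beyond the quartiles (assumed convention)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]

# Hypothetical salaries, in thousands.
salaries = [30, 32, 35, 36, 38, 40, 41, 43, 90]
summary = five_number_summary(salaries)
extreme = outliers(salaries)
```

The box spans the two quartiles with a line at the median, and any value returned by `outliers` would be drawn as a separate point.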

Saturday, July 5, 2008

Histogram


A Histogram is a graphic visualization that uses adjacent vertical bars to represent tabulated frequencies. Each bar covers an interval (class) of values, and the area of the bar, rather than its height as in a true bar graph, is proportional to the frequency of that interval. When the class intervals are of equal size, the height of each bar is also proportional to the frequency. The shape of the graph shows how the values are distributed about the mean, and it lets the user analyze a large dataset on a single graph by revealing the primary, secondary and tertiary peaks and how prominent each one is. It is often necessary to condense the data into ranges or classes defined by intervals, as in the example above. This histogram shows the frequency distribution of the live weights of 150 chickens selected randomly from a market.
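
The difference between bar area and bar height is easiest to see with unequal class intervals. The sketch below counts values per interval and then divides by the interval width to get a height whose area equals the frequency. The function name, the weights and the interval edges are illustrative and are not the chicken data from the example above.

```python
def histogram(values, edges):
    """Return (counts, heights) for the intervals defined by `edges`.

    Each interval is [edges[i], edges[i + 1]), except the last one,
    which also includes its right edge.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    # Height = frequency / width, so that area (height * width) = frequency.
    heights = [c / (edges[i + 1] - edges[i]) for i, c in enumerate(counts)]
    return counts, heights

# Hypothetical weights and deliberately unequal intervals.
weights = [1.1, 1.4, 1.6, 1.7, 1.9, 2.2, 2.8, 3.5]
edges = [1.0, 1.5, 2.0, 4.0]
counts, heights = histogram(weights, edges)
```

Here the second and third intervals both hold three values, but the third is four times as wide, so its bar is only a quarter as tall.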

Friday, July 4, 2008

Parallel Coordinate Graph



In a Parallel Coordinate Graph each variable is plotted on its own vertical axis, and each data element is drawn as a line connecting its value on every axis. Parallel coordinate plots are used to analyze relationships and correlations between multiple variables. The parallel coordinate graph above plots genes that fit a model of the heat shock gene from a study on a biological microarray data set of gene expression levels. The lines are the ribosomal protein genes, plotted across vertical axes that record the log ratio of expression for each experiment.
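
A minimal sketch of the underlying geometry: each record becomes a polyline with one vertex per axis. The version below rescales every axis to the range 0 to 1 independently, which is one common choice and an assumption here; the graph described above may instead use a shared log-ratio scale. The data rows are invented.

```python
def parallel_coordinates(rows):
    """Turn each row into a list of (axis_index, scaled_value) vertices."""
    n_axes = len(rows[0])
    lows = [min(r[k] for r in rows) for k in range(n_axes)]
    highs = [max(r[k] for r in rows) for k in range(n_axes)]
    lines = []
    for r in rows:
        line = []
        for k in range(n_axes):
            span = highs[k] - lows[k]
            # Scale to [0, 1]; a constant axis is drawn at mid-height.
            y = (r[k] - lows[k]) / span if span else 0.5
            line.append((k, y))
        lines.append(line)
    return lines

# Hypothetical values for three records measured on three axes.
rows = [[0.0, 1.0, -1.0], [2.0, 0.0, 1.0], [1.0, 0.5, 0.0]]
polylines = parallel_coordinates(rows)
```

Records with similar profiles produce polylines that run close together, which is how clusters and correlated variables show up in the plot.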