Graphics

Introduction

This section shows how to make R graphics from rpy2, using some of the different graphics systems available to R users.

The purpose of this section is to get users started, so that they can then work out from the R documentation how to make the same plots with rpy2.

Graphical devices

With R, all graphics are plotted into a so-called graphical device. Graphical devices can be interactive, such as X11, or non-interactive, such as png or pdf. Non-interactive devices write their output to files.

By default, an interactive R session opens an interactive device whenever one is needed. A non-interactive graphical device, on the other hand, has to be opened explicitly.

Note

Do not forget to close a non-interactive device when done. This can be required to flush pending data from the buffer.

The object grdevices, obtained with importr, represents the R package grDevices. Example with the R functions png and dev.off:

from rpy2.robjects.packages import importr
grdevices = importr('grDevices')

grdevices.png(file="path/to/file.png", width=512, height=512)
# plotting code here
grdevices.dev_off()
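A device left open may not have flushed its output, so it can help to tie opening and closing together with a context manager: dev_off() then runs even when the plotting code raises an exception. The sketch below is plain Python. The helper name open_device is hypothetical, and the functions that open and close the device are passed in as arguments (with rpy2 they would be grdevices.png and grdevices.dev_off). The usage example substitutes small recording functions so that it runs without R.

```python
from contextlib import contextmanager


@contextmanager
def open_device(open_fn, close_fn, **device_args):
    """Hypothetical helper: open a graphical device and always close it.

    open_fn and close_fn stand for e.g. grdevices.png and grdevices.dev_off.
    """
    open_fn(**device_args)
    try:
        yield
    finally:
        # Runs even if the plotting code raised, so pending data is flushed.
        close_fn()


# Usage with stand-in functions that only record the calls made.
calls = []
with open_device(lambda **kw: calls.append(("open", kw)),
                 lambda: calls.append(("close",)),
                 file="path/to/file.png", width=512, height=512):
    calls.append(("plot",))  # plotting code would go here
```

With rpy2, the with statement would read open_device(grdevices.png, grdevices.dev_off, file="path/to/file.png", width=512, height=512) and enclose the plotting code.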

The package contains an Environment grdevices_env that can be used to access an object known to belong to that R package, e.g.:

>>> palette = grdevices.palette()
>>> print(palette)
[1] "black"   "red"     "green3"  "blue"    "cyan"    "magenta" "yellow"
[8] "gray"

Getting ready

To run the examples in this section, we first import rpy2.robjects and define a few helper functions.

from rpy2 import robjects
from rpy2.robjects import Formula
from rpy2.robjects.vectors import IntVector, FloatVector
from rpy2.robjects.lib import grid
from rpy2.robjects.packages import importr

# The R 'print' function
rprint = robjects.globalenv.get("print")
stats = importr('stats')
grdevices = importr('grDevices')
base = importr('base')

Package lattice

Introduction

Importing the package lattice is done the same way as for other R packages.

lattice = importr('lattice')

Scatter plot

We use the dataset mtcars, and will use the lattice function xyplot to make scatter plots.

xyplot = lattice.xyplot

Lattice works with formulae (see Formulae), so we build one and store values in its environment. Making a plot is then a matter of calling the function xyplot with the formula as an argument.

datasets = importr('datasets')
mtcars = datasets.mtcars
formula = Formula('mpg ~ wt')
formula.getenvironment()['mpg'] = mtcars.rx2('mpg')
formula.getenvironment()['wt'] = mtcars.rx2('wt')

p = lattice.xyplot(formula)
rprint(p)
_images/graphics_lattice_xyplot_13.png

The display of group information can be done simply by using the named parameter groups. This will indicate the different groups by color-coding.

p = lattice.xyplot(formula, groups = mtcars.rx2('cyl'))
rprint(p)
_images/graphics_lattice_xyplot_23.png

An alternative to color-coding is to have points in different panels. In lattice, this is done by specifying it in the formula.

formula = Formula('mpg ~ wt | cyl')
formula.getenvironment()['mpg'] = mtcars.rx2('mpg')
formula.getenvironment()['wt'] = mtcars.rx2('wt')
formula.getenvironment()['cyl'] = mtcars.rx2('cyl')

p = lattice.xyplot(formula, layout = IntVector((3, 1)))
rprint(p)
_images/graphics_lattice_xyplot_33.png
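Conditioning on cyl (the | cyl part of the formula) means, roughly, that the rows are split according to the value of cyl and each subset is drawn in its own panel. That splitting step can be sketched in plain Python; the function split_by and the four rows below are illustrative only, and this is not how lattice computes its panels.

```python
from collections import defaultdict


def split_by(rows, key):
    """Group rows (dicts) by the value of one column: one group per panel."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[key]].append(row)
    # Sort the panel keys so that the panels come out in a stable order.
    return dict(sorted(panels.items()))


# Hypothetical rows with the three columns used in 'mpg ~ wt | cyl'.
rows = [
    {"mpg": 21.0, "wt": 2.62, "cyl": 6},
    {"mpg": 22.8, "wt": 2.32, "cyl": 4},
    {"mpg": 18.7, "wt": 3.44, "cyl": 8},
    {"mpg": 21.4, "wt": 3.215, "cyl": 6},
]
panels = split_by(rows, "cyl")
for cyl, subset in panels.items():
    print(cyl, [(r["wt"], r["mpg"]) for r in subset])
# 4 [(2.32, 22.8)]
# 6 [(2.62, 21.0), (3.215, 21.4)]
# 8 [(3.44, 18.7)]
```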

Box plot

p = lattice.bwplot(Formula('mpg ~ factor(cyl) | gear'),
                   data = mtcars, fill = 'grey')
rprint(p, nrow=1)
_images/graphics_lattice_bwplot_13.png

Other plots

The R package lattice contains a number of other plots, which unfortunately cannot all be detailed here.

volcano = datasets.volcano
p = lattice.wireframe(volcano, shade = True,
                      zlab = "",
                      aspect = FloatVector((61.0/87, 0.4)),
                      light_source = IntVector((10,0,10)))
rprint(p)
_images/graphics_lattice_wireframe_13.png

Splitting the information into different panels can also be specified in the formula. Here we show an artificial example where the split is made according to the values plotted on the Z axis.

reshape = importr('reshape')
dataf = reshape.melt(volcano)
# overlap is written as 0.25: with Python 2 integer division, 1/4 would be 0
dataf = dataf.cbind(ct = lattice.equal_count(dataf.rx2("value"), number=3, overlap=0.25))
p = lattice.wireframe(Formula('value ~ X1 * X2 | ct'), data = dataf, shade = True,
                      aspect = FloatVector((61.0/87, 0.4)),
                      light_source = IntVector((10,0,10)))
rprint(p, nrow = 1)
_images/graphics_lattice_wireframe_23.png

Package ggplot2

Introduction

The R package ggplot2 implements the Grammar of Graphics. While more documentation on the package and its usage with R can be found on the ggplot2 website, this section introduces the basic concepts required to build plots. The R package ggplot2 must be installed in the R used by rpy2.

The package uses grid, a lower-level plotting infrastructure that can be accessed through the module rpy2.robjects.lib.grid. Some knowledge of grid is required whenever several plots must share the same device, arbitrary graphical elements must be overlaid, or a plot needs significant customization or editing.

Here again, having data in a DataFrame is expected (see DataFrame for more information on such objects).

import math, datetime
import rpy2.robjects.lib.ggplot2 as ggplot2
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
base = importr('base')

datasets = importr('datasets')
mtcars = datasets.mtcars
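rnorm = stats.rnorm
# '+' between rpy2 vectors concatenates them (as with Python lists), so each
# column holds 300 + 100 = 400 values: 300 drawn around 0, then 100 around 3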
dataf_rnorm = robjects.DataFrame({'value': rnorm(300, mean=0) + rnorm(100, mean=3),
                                  'other_value': rnorm(300, mean=0) + rnorm(100, mean=3),
                                  'mean': IntVector([0, ]*300 + [3, ] * 100)})

Plot

gp = ggplot2.ggplot(mtcars)

pp = gp + \
     ggplot2.aes_string(x='wt', y='mpg') + \
     ggplot2.geom_point()

pp.plot()
_images/graphics_ggplot2mtcars3.png
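The + operator builds the plot step by step: each term (aesthetics mapping, geometry, and later facets or options) is added to the plot on its left. Adding to a ggplot object yields a new plot and leaves the original unchanged, which is why gp can be reused in several of the examples below. The toy class here only illustrates that pattern in plain Python; its name and the string components are hypothetical and say nothing about how rpy2 implements it.

```python
class ToyPlot:
    """Hypothetical stand-in for a ggplot object: data plus a list of components."""

    def __init__(self, data, components=()):
        self.data = data
        self.components = tuple(components)

    def __add__(self, component):
        # Return a new plot; the left operand is left unchanged.
        return ToyPlot(self.data, self.components + (component,))


gp = ToyPlot("mtcars")
pp = gp + "aes(x=wt, y=mpg)" + "geom_point()"
print(pp.components)  # ('aes(x=wt, y=mpg)', 'geom_point()')
print(gp.components)  # ()
```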

Aesthetics mapping

An important concept for the grammar of graphics is the mapping of variables, or columns in a data frame, to graphical representations.

As shown for lattice, a third variable can be represented on the same plot using color encoding; here this is done by declaring it as a mapping (the parameter col when calling the constructor for AesString).

gp = ggplot2.ggplot(mtcars)

pp = gp + \
     ggplot2.aes_string(x='wt', y='mpg', col='factor(cyl)') + \
     ggplot2.geom_point()

pp.plot()
_images/graphics_ggplot2mtcarscolcyl3.png
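Mapping col to factor(cyl) means that each distinct level of the factor (4, 6 and 8 cylinders in mtcars) gets its own color, and every point takes the color of its level. A simplified model of such a discrete scale in plain Python follows; the function name and the three-color palette are hypothetical, and ggplot2 chooses its own default hues.

```python
def discrete_color_scale(values, palette):
    """Assign one palette color per distinct level, in sorted level order."""
    levels = sorted(set(values))
    if len(levels) > len(palette):
        raise ValueError("not enough colors for the number of levels")
    color_of = dict(zip(levels, palette))
    # Every value takes the color assigned to its level.
    return [color_of[v] for v in values]


cyl = [6, 4, 6, 8, 4]
print(discrete_color_scale(cyl, ["red", "green", "blue"]))
# ['green', 'red', 'green', 'blue', 'red']
```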

The size of the graphical symbols plotted (here circular dots) can also be mapped to a variable, and so can their shape:

pp = gp + \
     ggplot2.aes_string(x='wt', y='mpg', size='factor(carb)',
                        col='factor(cyl)', shape='factor(gear)') + \
     ggplot2.geom_point()

pp.plot()
_images/graphics_ggplot2aescolsize3.png

Geometry

The geometry is how the data are represented. So far we used a scatter plot of points, but there are other ways to represent our data.

Looking at the distribution of univariate data can be achieved with a histogram:

gp = ggplot2.ggplot(mtcars)

pp = gp + \
     ggplot2.aes_string(x='wt') + \
     ggplot2.geom_histogram()

pp.plot()
_images/graphics_ggplot2geomhistogram3.png

The bars can also be filled according to the number of cylinders:

gp = ggplot2.ggplot(mtcars)

pp = gp + \
     ggplot2.aes_string(x='wt', fill='factor(cyl)') + \
     ggplot2.geom_histogram()

pp.plot()
_images/graphics_ggplot2geomhistogramfillcyl3.png

Bar-based representations of several densities on the same figure often lack clarity, and a line-based representation, either geom_freqpoly() (frequencies drawn as broken lines) or geom_density() (a density estimate), can be better.

gp = ggplot2.ggplot(dataf_rnorm)
pp = gp + \
     ggplot2.aes_string(x='value', fill='factor(mean)') + \
     ggplot2.geom_density(alpha = 0.5)
pp.plot()
_images/graphics_ggplot2geomfreqpolyfillcyl3.png

Whenever a large number of points are present, it can be useful to represent the density of “dots” on the scatter plot.

With 2D bins:

gp = ggplot2.ggplot(dataf_rnorm)

pp = gp + \
     ggplot2.aes_string(x='value', y='other_value') + \
     ggplot2.geom_bin2d() + \
     ggplot2.opts(title =  'geom_bin2d')
pp.plot()

With a kernel density estimate:

gp = ggplot2.ggplot(dataf_rnorm)

pp = gp + \
     ggplot2.aes_string(x='value', y='other_value') + \
     ggplot2.geom_density2d() + \
     ggplot2.opts(title =  'geom_density2d')
pp.plot()

With hexagonal bins:

gp = ggplot2.ggplot(dataf_rnorm)

pp = gp + \
     ggplot2.aes_string(x='value', y='other_value') + \
     ggplot2.geom_hex() + \
     ggplot2.opts(title =  'geom_hex')
pp.plot()
_images/graphics_ggplot2geombin2d3.png
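With 2D bins, the idea is to count how many points fall into each rectangular cell of a grid and to map those counts to a fill color. The counting step can be sketched in plain Python; the function bin2d, its bin-width parameters and the sample points are hypothetical, and ggplot2 picks its own bin boundaries.

```python
import math
from collections import Counter


def bin2d(xs, ys, width_x, width_y):
    """Count points per rectangular cell; cells are keyed by integer indices."""
    counts = Counter()
    for x, y in zip(xs, ys):
        # Cell (i, j) covers [i*width_x, (i+1)*width_x) x [j*width_y, (j+1)*width_y).
        cell = (math.floor(x / width_x), math.floor(y / width_y))
        counts[cell] += 1
    return counts


xs = [0.1, 0.4, 1.2, -0.3, 0.2]
ys = [0.2, 0.9, 0.5, -0.1, 0.3]
counts = bin2d(xs, ys, width_x=1.0, width_y=1.0)
print(sorted(counts.items()))
# [((-1, -1), 1), ((0, 0), 3), ((1, 0), 1)]
```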

Box plot:

gp = ggplot2.ggplot(mtcars)

pp = gp + \
     ggplot2.aes_string(x='factor(cyl)', y='mpg') + \
     ggplot2.geom_boxplot()

pp.plot()
_images/graphics_ggplot2geomboxplot3.png

Boxplots can be used to represent a summary of the data with an emphasis on location and spread.

gp = ggplot2.ggplot(mtcars)

pp = gp + \
     ggplot2.aes_string(x='factor(cyl)', y='mpg', fill='factor(cyl)') + \
     ggplot2.geom_boxplot()

pp.plot()
_images/graphics_ggplot2aescolboxplot3.png

Models fitted to the data are also easy to add to a plot:

pp = gp + \
     ggplot2.aes_string(x='wt', y='mpg') + \
     ggplot2.geom_point() + \
     ggplot2.stat_smooth(method = 'lm')
pp.plot()
_images/graphics_ggplot2addsmooth3.png

Besides lm, the method can be one of glm, gam, loess or rlm, and a formula can be given to specify the model being fitted (see the figure below).

_images/graphics_ggplot2addsmoothmethods3.png

GeomSmooth also accepts an aesthetics mapping with group, which indicates that a separate fit should be made for each group.

pp = gp + \
     ggplot2.aes_string(x='wt', y='mpg') + \
     ggplot2.geom_point() + \
     ggplot2.geom_smooth(ggplot2.aes_string(group = 'cyl'),
                         method = 'lm')
pp.plot()
_images/graphics_ggplot2smoothbycyl3.png

Encoding the information in the column cyl is again only a matter of specifying it in the AesString mapping.

pp = ggplot2.ggplot(mtcars) + \
     ggplot2.aes_string(x='wt', y='mpg', col='factor(cyl)') + \
     ggplot2.geom_point() + \
     ggplot2.geom_smooth(ggplot2.aes_string(group = 'cyl'),
                         method = 'lm')
pp.plot()
_images/graphics_ggplot2_smoothbycylwithcolours.png

As can already be observed in the examples with GeomSmooth, several geometry objects can be added on top of each other to build the final plot. For example, a marginal rug can be added along the axes of a regular scatter plot:

gp = ggplot2.ggplot(mtcars)

pp = gp + \
     ggplot2.aes_string(x='wt', y='mpg') + \
     ggplot2.geom_point() + \
     ggplot2.geom_rug()

pp.plot()
_images/graphics_ggplot2geompointandrug3.png
gp = ggplot2.ggplot(dataf_rnorm)

pp = gp + \
     ggplot2.aes_string(x='value', y='other_value') + \
     ggplot2.geom_point(alpha = 0.3) + \
     ggplot2.geom_density2d(ggplot2.aes_string(col = '..level..')) + \
     ggplot2.opts(title =  'point + density')
pp.plot()
_images/graphics_ggplot2geompointdensity2d3.png

Polygons can be used for maps, as shown in the relatively artificial example below:

maps = importr('maps')  # package providing the map data used by map_data()
fr = ggplot2.map_data('france')

# add a column indicating which region names have an "o".
fr = fr.cbind(has_o = base.grepl('o', fr.rx2("region"),
                                 ignore_case = True))
p = ggplot2.ggplot(fr) + \
    ggplot2.geom_polygon(ggplot2.aes(x = 'long', y = 'lat',
                                     group = 'group', fill = 'has_o'),
                         col="black")
p.plot()
_images/graphics_ggplot2map_polygon3.png

Facets

Splitting the data into panels, in a similar fashion to what we did with lattice, is now a matter of adding facets. A central concept of ggplot2 is that plots are made of added graphical elements, and adding a specification such as “I want my data to be split into panels” is then a matter of adding that information to an existing plot.

For example, splitting the plots on the data in column cyl is still simply done by adding a FacetGrid.

pp = gp + \
     ggplot2.aes_string(x='wt', y='mpg') + \
     ggplot2.geom_point() + \
     ggplot2.facet_grid(ro.Formula('. ~ cyl')) + \
     ggplot2.geom_smooth(ggplot2.aes_string(group="cyl"),
                         method = "lm",
                         data = mtcars)

pp.plot()
_images/graphics_ggplot2smoothbycylfacetcyl3.png

The way the data are represented (the geometry, in the terminology of the grammar of graphics) is still specified the usual way.

pp = gp + \
     ggplot2.aes_string(x='wt') + \
     ggplot2.geom_histogram(binwidth=2) + \
     ggplot2.facet_grid(ro.Formula('. ~ cyl'))

pp.plot()
_images/graphics_ggplot2histogramfacetcyl3.png
pp = gp + \
     ggplot2.aes_string(x='wt', y='mpg') + \
     ggplot2.geom_point() + \
     ggplot2.geom_abline(intercept = 30)
pp.plot()
_images/graphics_ggplot2_qplot_43.png
pp = gp + \
     ggplot2.aes_string(x='wt', y='mpg') + \
     ggplot2.geom_point() + \
     ggplot2.geom_abline(intercept = 30) + \
     ggplot2.geom_abline(intercept = 15)
pp.plot()
_images/graphics_ggplot2_qplot_53.png
pp = gp + \
     ggplot2.aes_string(x='wt', y='mpg') + \
     ggplot2.geom_point() + \
     ggplot2.stat_smooth(method = 'lm', fill = 'blue',
                         color = 'red', size = 3)
pp.plot()
_images/graphics_ggplot2smoothblue3.png

Complex example

Section author: John Owens

This example uses ggplot2 to plot 4 datasets (all combinations of 2 colors and 2 line types/marker types). It contains examples of mapping attributes to colors, line types, and labels; using Python to manipulate data in an R DataFrame; the melt operator (from the reshape package) to transform the data from a “horizontal” (wide) representation to the “vertical” (long) representation suitable for ggplot2; and numerous customizations of the output graph, including axis labels, colors, line types, line and marker sizes, and log-scale major and minor lines.
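The melt step turns every row carrying two measurements (Perf1 and Perf2) into two rows carrying one measurement each, plus a column naming which measurement it was. A plain-Python sketch of that reshaping on a list of dicts follows; the function melt_rows is hypothetical and mirrors only the idea, not the reshape package's implementation. The two rows are the 'red' input data used in the code below, with dates kept as strings.

```python
def melt_rows(rows, id_vars, measure_vars, variable_name="variable",
              value_name="value"):
    """Hypothetical sketch of melt: wide rows in, one row per measurement out."""
    out = []
    # Stack one block of output rows per measured column.
    for measure in measure_vars:
        for row in rows:
            new_row = {key: row[key] for key in id_vars}
            new_row[variable_name] = measure
            new_row[value_name] = row[measure]
            out.append(new_row)
    return out


wide = [
    {"Date": "2008-06-25", "color": "red", "Perf1": 1090.0, "Perf2": 215.0},
    {"Date": "2009-09-23", "color": "red", "Perf1": 2500.0, "Perf2": 500.0},
]
long_rows = melt_rows(wide, ["Date", "color"], ["Perf1", "Perf2"],
                      variable_name="PerfType", value_name="Performance")
for r in long_rows:
    print(r)
```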

# set up data structures for mapping attributes to colors, line types, and 
#   labels
colormap_raw = [['red', '#ff0000'],
                ['green', '#76b900']]
colormap_labels = [['red', 'RED'],
                   ['green', 'GREEN']]
colormap = ro.StrVector([elt[1] for elt in colormap_raw])
colormap.names = ro.StrVector([elt[0] for elt in colormap_raw])

linemap_raw = [['Perf2', 'dashed'],
               ['Perf1', 'solid']]
linemap = ro.StrVector([elt[1] for elt in linemap_raw])
linemap.names = ro.StrVector([elt[0] for elt in linemap_raw])

# input data, which may normally come from an external csv file
# note the use of base.I which makes R interpret these as explicit data
#   rather than store them as factors; since we're manipulating them 
#   directly, we need them stored explicitly
input_dataframes = { 
   'red' : ro.DataFrame({ 'Date' : base.as_Date(ro.StrVector(("2008-06-25", "2009-09-23"))),
                          'Perf1' : ro.FloatVector((1090,2500)),
                          'Perf2' : ro.FloatVector((215,500))
                          }),
   'green' : ro.DataFrame({ 'Date' : base.as_Date(ro.StrVector(("2008-06-15", 
                                                                "2010-04-15"))),
                            'Perf1' : ro.FloatVector((922,1030)),
                            'Perf2' : ro.FloatVector((78,515))
                            })
   }

# create empty data frame df ...
df = ro.DataFrame({})
for color in ['green', 'red']:
  # ... then for each input data frame, read that data frame (perhaps
  # from a file), append column of color names, then append to df
  df = df.rbind(input_dataframes[color].
                cbind(ro.DataFrame({'color' : 
                                    base.I(ro.StrVector([color]))})))

# now do some data processing

# read out 'Date' column, convert using python dateutil parser, put
#   back into 'Date' column
# example of taking data in R dataframe, changing it in python, then
#   putting it back

#df[tuple(df.colnames).index('Date')] = \
#    base.as_Date(df.rx2('Date'))

# what is the range of Perf1 and Perf2? we use this for custom log plot lines
perfs = df[tuple(df.colnames).index('Perf1')] + \
        df[tuple(df.colnames).index('Perf2')]
gflops_range = [ round(math.log10(min(perfs))), 
                 round(math.log10(max(perfs))) ]

# we have data that looks like this:
# [date, perf1, perf2, color]
# note there's two measurements per line.
# instead we want data that looks like this:
# [date, perf, color, perftype] where perftype is perf1 or perf2
# the right operator for this is "melt" in the "reshape" package

# melt from horizontal into vertical format
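# the argument names are those of reshape's melt.data.frame (id.vars,
#   measure.vars, variable_name); column names are passed as R vectors
df = ro.r.melt(df,
               **{'id.vars': ro.StrVector(['Date', 'color']),
                  'measure.vars': ro.StrVector(['Perf1', 'Perf2']),
                  'variable_name': 'PerfType'})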
# rename resulting value column to Performance
new_names = list(df.colnames)
new_names[new_names.index('value')] = 'Performance'
df.colnames = ro.StrVector(new_names)

# now we have 4 datasets: {red, green} x {perf1, perf2}
# plot the colored datasets in their respective colors
# plot the PerfTypes as solid (circle markers) and dashed (triangle markers) 
#   lines

# plot with both log and linear y scales
# aes_string: set the axis labels and what we're plotting
# opts: set the title and the size of the legend keys
#   note the use of **{} to allow setting "legend.key.size" as a keyword
# scale_colour_manual: associate color datasets with actual colors and names
# geom_point and geom_line: thicker points and lines
# scale_linetype_manual: associate perf types with linetypes
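# two plots side by side: push a viewport holding a 1 x 2 layout first
grid.newpage()
grid.viewport(layout = grid.layout(1, 2)).push()
for col_i, yscale in enumerate(['log', 'linear']):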
  vp = grid.viewport(**{'layout.pos.col':col_i+1, 'layout.pos.row': 1})
  pp = ggplot2.ggplot(df) + \
      ggplot2.aes_string(x='Date', y='Performance', color='color', 
                         shape='PerfType', linetype='PerfType') + \
      ggplot2.opts(**{'title' : 
                      'Performance vs. Color',
                      'legend.key.size' : ro.r.unit(1.4, "lines") } ) + \
      ggplot2.scale_colour_manual("Color", 
                                  values=colormap,
                                  breaks=colormap.names,
                                  labels=[elt[1] for elt in 
                                          colormap_labels]) + \
      ggplot2.geom_point(size=3) + \
      ggplot2.scale_linetype_manual(values=linemap) + \
      ggplot2.geom_line(size=1.5)

  # custom y-axis lines: major lines ("breaks") at every 10^n; minor lines
  #   ("minor_breaks") at 1..9 times each 10^n
  if yscale == 'log':
    pp = pp + \
        ggplot2.scale_y_log10(breaks = ro.r("10^(%d:%d)" % (gflops_range[0], 
                                                            gflops_range[1])),
                              minor_breaks = 
                              ro.r("rep(10^(%d:%d), each=9) * rep(1:9, %d)" %
                                   (gflops_range[0] - 1, gflops_range[1], 
                                    gflops_range[1] - gflops_range[0])))

  pp.plot(vp = vp)
_images/graphics_ggplot2perfcolor_both3.png
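On the log scale, the code places major lines at the powers of ten spanned by the data and minor lines at 1 to 9 times each power of ten, starting one decade below. The arithmetic can be checked in plain Python; the function log_breaks is hypothetical, and it builds the intended sets directly rather than reproducing the rep() expressions. The arguments 78 and 2500 are the smallest and largest Perf1/Perf2 values in the example data.

```python
import math


def log_breaks(lo, hi):
    """Hypothetical helper: major breaks 10**n and minor breaks k * 10**n."""
    # Exponents of the powers of ten nearest to the data range (as gflops_range).
    first, last = round(math.log10(lo)), round(math.log10(hi))
    major = [10 ** n for n in range(first, last + 1)]
    # Minor breaks: 1..9 times each power of ten, from one decade below `first`.
    minor = [k * 10 ** n for n in range(first - 1, last + 1) for k in range(1, 10)]
    return major, minor


major, minor = log_breaks(78, 2500)
print(major)                  # [100, 1000]
print(minor[:3], minor[-1])   # [10, 20, 30] 9000
```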

Package grid

The grid package is the underlying plotting environment for lattice and ggplot2 figures. In a few words, it consists of pushing and popping systems of coordinates (viewports) onto a stack, and plotting graphical elements into them. The system can be thought of as a scene graph, with each viewport a node in the graph.

>>> from rpy2.robjects.lib import grid

Getting a new page is achieved by calling the function grid.newpage().

Calling layout() creates a layout; for example, a layout with one row and three columns:

>>> lt = grid.layout(1, 3)

That layout can be used to construct a viewport:

>>> vp = grid.viewport(layout = lt)

The created viewport corresponds to a graphical entity. It can be pushed onto the stack, becoming the current viewport, with the method grid.Viewport.push():

>>> vp.push()
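The stack of viewports is really a tree: pushing adds a child below the current viewport, popping removes the current one, and going up or down moves between viewports without removing anything (compare the methods listed for grid.Viewport under Classes below). A toy model in plain Python, with hypothetical class names, may help picture this; it is not how grid or rpy2 implement viewports.

```python
class ToyViewport:
    """Hypothetical stand-in for a grid viewport: a named node in a tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []


class ToyViewportTree:
    """Toy model of the viewport tree; not how grid or rpy2 implement it."""

    def __init__(self):
        self.root = ToyViewport("ROOT")
        self.current = self.root

    def push(self, name):
        # Pushing adds a child of the current viewport and makes it current.
        vp = ToyViewport(name, parent=self.current)
        self.current.children.append(vp)
        self.current = vp
        return vp

    def pop(self, n=1):
        # Popping removes the current viewport and returns to its parent.
        for _ in range(n):
            if self.current is self.root:
                raise ValueError("cannot pop the root viewport")
            parent = self.current.parent
            parent.children.remove(self.current)
            self.current = parent

    def up(self, n=1):
        # Going up moves to the parent but keeps the viewport in the tree.
        for _ in range(n):
            if self.current is self.root:
                raise ValueError("already at the root viewport")
            self.current = self.current.parent

    def down(self, name):
        # Breadth-first search below the current viewport; returns levels descended.
        frontier = [(child, 1) for child in self.current.children]
        while frontier:
            vp, depth = frontier.pop(0)
            if vp.name == name:
                self.current = vp
                return depth
            frontier.extend((child, depth + 1) for child in vp.children)
        raise LookupError(name)

    def path(self):
        # Names from the root down to the current viewport.
        names = []
        vp = self.current
        while vp is not None:
            names.append(vp.name)
            vp = vp.parent
        return "/".join(reversed(names))


tree = ToyViewportTree()
tree.push("layout")
tree.push("cell")
tree.up()
print(tree.path())        # ROOT/layout
print(tree.down("cell"))  # 1
tree.pop()
print(tree.path())        # ROOT/layout
```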

Example:

grid.newpage()
# create a rows/columns layout
lt = grid.layout(2, 3)
vp = grid.viewport(layout = lt)
# push it onto the plotting stack
vp.push()

# create a viewport located at (1,1) in the layout
vp = grid.viewport(**{'layout.pos.col':1, 'layout.pos.row': 1})
# create a (unit) rectangle in that viewport
grid.rect(vp = vp).draw()

vp = grid.viewport(**{'layout.pos.col':2, 'layout.pos.row': 2})
# create text in the viewport at (2,2)
grid.text("foo", vp = vp).draw()

vp = grid.viewport(**{'layout.pos.col':3, 'layout.pos.row': 1})
# create a (unit) circle in the viewport (1,3)
grid.circle(vp = vp).draw()
_images/graphics_grid3.png
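In a layout created only from its numbers of rows and columns, such as grid.layout(2, 3), the cells share the page equally and row 1 is the top row. The sketch below computes the region covered by a cell in normalized coordinates (0 to 1, origin at the bottom left), using exact fractions; the function cell_region is hypothetical and assumes equal row heights and column widths, which is grid's default when no widths or heights are given.

```python
from fractions import Fraction


def cell_region(row, col, nrow, ncol):
    """Return (left, bottom, width, height) of a 1-based layout cell."""
    width = Fraction(1, ncol)
    height = Fraction(1, nrow)
    left = (col - 1) * width
    # Row 1 is the top row, so rows are counted down from the top edge.
    bottom = 1 - row * height
    return left, bottom, width, height


# The three cells used in the example above, in a 2 x 3 layout.
for row, col in [(1, 1), (2, 2), (1, 3)]:
    print((row, col), [str(v) for v in cell_region(row, col, 2, 3)])
# (1, 1) ['0', '1/2', '1/3', '1/2']
# (2, 2) ['1/3', '0', '1/3', '1/2']
# (1, 3) ['2/3', '1/2', '1/3', '1/2']
```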

Custom ggplot2 layout with grid

grid.newpage()

# create a viewport for the main plot; plot(vp = vp) pushes it itself
vp = grid.viewport(width = 1, height = 1)

p = ggplot2.ggplot(datasets.rock) + \
    ggplot2.geom_point(ggplot2.aes_string(x = 'area', y = 'peri')) + \
    ggplot2.theme_bw()
p.plot(vp = vp)

# a smaller viewport for the inset plot, again pushed by plot(vp = vp)
vp = grid.viewport(width = 0.6, height = 0.6, x = 0.37, y=0.69)
p = ggplot2.ggplot(datasets.rock) + \
    ggplot2.geom_point(ggplot2.aes_string(x = 'area', y = 'shape')) + \
    ggplot2.opts(**{'axis.text.x': ggplot2.theme_text(angle = 45)})

p.plot(vp = vp)
_images/graphics_ggplot2withgrid3.png
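For the inset, x and y give the center of the viewport (grid's default justification) as fractions of the enclosing viewport, while width and height give its size. Since the enclosing viewport here covers the whole page, the inset's corners can be worked out directly, as in this plain-Python sketch; the function viewport_box is hypothetical and only handles centered viewports in normalized units.

```python
def viewport_box(x, y, width, height, parent=(0.0, 0.0, 1.0, 1.0)):
    """Hypothetical helper: region of a centered viewport within its parent.

    parent is (left, bottom, width, height) in page coordinates; x, y, width
    and height are fractions of the parent, with (x, y) the viewport's center.
    """
    p_left, p_bottom, p_width, p_height = parent
    left = p_left + (x - width / 2) * p_width
    bottom = p_bottom + (y - height / 2) * p_height
    return left, bottom, width * p_width, height * p_height


left, bottom, w, h = viewport_box(0.37, 0.69, 0.6, 0.6)
print(round(left, 2), round(bottom, 2), round(left + w, 2), round(bottom + h, 2))
# 0.07 0.39 0.67 0.99
```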

Classes

class rpy2.robjects.lib.grid.Viewport(o)

Bases: rpy2.robjects.robject.RObject

Drawing context. Viewports can be thought of as nodes in a scene graph.

classmethod current()

Return the current viewport in the stack.

classmethod default(**kwargs)
classmethod down(name, strict=False, recording=True)

Return the number of Viewports it went down

classmethod pop(n)

Pop n viewports from the stack.

push(recording=True)
classmethod seek(name, recording=True)

Seek and return a Viewport given its name

classmethod up(n, recording=True)

Go up n viewports

classmethod viewport(**kwargs)

Constructor: create a Viewport

class rpy2.robjects.lib.grid.Grob(o)

Bases: rpy2.robjects.robject.RObject

Graphical object

draw(recording=True)

Draw a graphical object (calling the R function grid::grid.draw())

classmethod grob(**kwargs)

Constructor (uses the R function grid::grob())

class rpy2.robjects.lib.grid.GTree(o)

Bases: rpy2.robjects.lib.grid.Grob

gTree

classmethod grobtree(**kwargs)

Constructor (uses the R function grid::grobTree())

classmethod gtree(**kwargs)

Constructor (uses the R function grid::gTree())

Class diagram

Inheritance diagram of rpy2.robjects.lib.grid