RPy

A simple and efficient access to R from Python


A Demonstration of RPy: Tim Churches

RPy, written by Walter Moreira and maintained by Gregory Warnes, is a Python extension module for using the R programming environment for data analysis and graphics from within Python.

RPy is available from the RPy project web page. As Walter notes, RPy was inspired by RSPython by Duncan Temple Lang. RSPython allows R to be called from Python and vice-versa (i.e. Python can be embedded in R), as well as providing more general facilities for exploiting the object-oriented aspects of both Python and R. However, at least for me, RPy is a lot easier to use. It is my sincere hope that RPy and RSPython can be merged in a display of el Norte/el Sur co-operation, so we can have the best of both.

The following example provides a small taste of both the power of the R environment and the ease with which RPy allows this power to be used from within Python.

The data are eruption times for the Old Faithful geyser which, along with Yogi Bear, is located in the Yellowstone National Park in Wyoming, USA. The data file faithful.dat was exported from the faithful example dataset which comes as part of R. The R code in this example was borrowed directly from Section 8.2 of "An Introduction to R, Version 1.4.1" by W.N. Venables, D.M. Smith and the R Development Core Team. Minimal changes to the original R code were required to make it work from within Python, thanks to RPy.
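The layout of faithful.dat is not shown on this page. As a minimal sketch, assuming the file holds one column header line followed by whitespace-separated rows of eruption duration and waiting time, the parsing step could be isolated in a small helper. The function name parse_faithful and the sample rows are hypothetical, chosen only for illustration, and the code avoids print statements so that it runs under both old and new Python versions.

```python
def parse_faithful(lines):
    """Parse header-plus-rows text into two parallel lists.

    Assumes the first line is a column header and every other
    non-blank line is '<duration> <waiting time>'.
    """
    durations = []
    waits = []
    rows = iter(lines)
    next(rows)  # skip the column header line
    for row in rows:
        fields = row.split()  # split on any whitespace; drops the newline
        if not fields:
            continue  # ignore blank lines
        durations.append(float(fields[0]))
        waits.append(int(fields[1]))
    return {"eruption_duration": durations, "waiting_time": waits}


# Hypothetical sample in the assumed layout (not the full data file).
sample = ["eruptions waiting\n", "3.600 79\n", "1.800 54\n", "3.333 74\n"]
data = parse_faithful(sample)
print(data["eruption_duration"])
print(data["waiting_time"])
```

Because the helper accepts any iterable of lines, an open file object can be passed to it directly in place of the sample list.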

The following Python code (faithful.py):

from rpy import *

faithful_data = {"eruption_duration":[],
                 "waiting_time":[]}
		 
f = open('faithful.dat','r')

for row in f.readlines()[1:]: # skip the column header line
    splitrow = row[:-1].split(" ")
    faithful_data["eruption_duration"].append(float(splitrow[0]))
    faithful_data["waiting_time"].append(int(splitrow[1]))

f.close()

ed = faithful_data["eruption_duration"] 
edsummary = r.summary(ed)
print "Summary of Old Faithful eruption duration data"
for k in edsummary.keys():
    print k + ": %.3f" % edsummary[k]
print    
print "Stem-and-leaf plot of Old Faithful eruption duration data"
print r.stem(ed)

r.png('faithful_histogram.png',width=733,height=550)
r.hist(ed,r.seq(1.6, 5.2, 0.2), prob=1,col="lightgreen",
       main="Old Faithful eruptions",xlab="Eruption duration (seconds)")
r.lines(r.density(ed,bw=0.1),col="orange")
r.rug(ed)
r.dev_off()

long_ed = filter(lambda x: x > 3, ed)
r.png('faithful_ecdf.png',width=733,height=550)
r.library('stepfun')
r.plot(r.ecdf(long_ed), do_points=0, verticals=1, col="blue",
       main="Empirical cumulative distribution function" +
            " of Old Faithful eruptions longer than 3 seconds")
x = r.seq(3,5.4,0.01)  # grid on which the fitted normal CDF is drawn
r.lines(x, r.pnorm(x, mean=r.mean(long_ed),
        sd=r.sqrt(r.var(long_ed))), lty=3, lwd=2, col="red")
r.dev_off()

r.png('faithful_qq.png',width=733,height=550)
r.par(pty="s")
r.qqnorm(long_ed,col="blue")
r.qqline(long_ed,col="red")
r.dev_off()

r.library('ctest')
print
print("Shapiro-Wilks normality test of Old Faithful eruptions" +\
      " longer than 3 seconds")
sw = r.shapiro_test(long_ed)
print "W = %.4f" % sw['statistic']['W']
print "p-value = %.5f" % sw['p.value']

print
print("One-sample Kolmogorov-Smirnov test of Old Faithful eruptions" +\
      " longer than 3 seconds")
ks = r.ks_test(long_ed,"pnorm", mean=r.mean(long_ed), 
               sd=r.sqrt(r.var(long_ed)))
print "D = %.4f" % ks['statistic']['D']
print "p-value = %.4f" % ks['p.value']
print "Alternative hypothesis: %s" % ks['alternative']
print

produces the following output:

Summary of Old Faithful eruption duration data
Mean: 3.488
Median: 4.000
3rd Qu.: 4.454
1st Qu.: 2.163
Min.: 1.600
Max.: 5.100

Stem-and-leaf plot of Old Faithful eruption duration data

  The decimal point is 1 digit(s) to the left of the |

  16 | 070355555588
  18 | 000022233333335577777777888822335777888
  20 | 00002223378800035778
  22 | 0002335578023578
  24 | 00228
  26 | 23
  28 | 080
  30 | 7
  32 | 2337
  34 | 250077
  36 | 0000823577
  38 | 2333335582225577
  40 | 0000003357788888002233555577778
  42 | 03335555778800233333555577778
  44 | 02222335557780000000023333357778888
  46 | 0000233357700000023578
  48 | 00000022335800333
  50 | 0370

None

Shapiro-Wilks normality test of Old Faithful eruptions longer than 3 seconds
W = 0.9793
p-value = 0.01052

One-sample Kolmogorov-Smirnov test of Old Faithful eruptions longer than 3 seconds
D = 0.0661
p-value = 0.4284
Alternative hypothesis: two.sided

and these graphs: faithful_histogram.png, faithful_ecdf.png and faithful_qq.png.

Impressed? I was.
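For readers curious about what the D value reported by the Kolmogorov-Smirnov test measures, here is a rough pure-Python sketch: D is the largest vertical gap between the empirical cumulative distribution function of the sample and the fitted normal curve. This is not how R or RPy implement the test, it does not compute a p-value, and the helper names and sample values are hypothetical. The standard deviation uses the n - 1 denominator, matching the sqrt(var(...)) call in the listing above.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def sample_mean_sd(values):
    """Sample mean and standard deviation (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / float(n)
    var = sum((v - mean) ** 2 for v in values) / float(n - 1)
    return mean, math.sqrt(var)


def ks_statistic(values, cdf):
    """Largest gap between the empirical CDF of values and cdf."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i + 1)/n at x,
        # so the gap is checked on both sides of the jump.
        d = max(d, (i + 1) / float(n) - f, f - i / float(n))
    return d


# Illustrative values only, not the Old Faithful data.
sample = [3.6, 3.333, 4.533, 4.7, 4.35, 3.917, 4.2, 4.8]
mean, sd = sample_mean_sd(sample)
d = ks_statistic(sample, lambda x: normal_cdf(x, mean, sd))
print("D = %.4f" % d)
```

A small D, as in the output above, means the empirical curve stays close to the fitted normal curve over the whole range of the data.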

Cheers,

Tim Churches


Last update on 10-2012