R for Ecologists

Data Frames

It's tempting to think of data.frames as glorified matrices whose columns we can access by field name without knowing which column number each one is. Actually, data.frames are lists in which every list item (column) is required to be exactly the same length.
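To see the list structure for ourselves, here is a minimal sketch. The data.frame comm and its columns sp1 and sp2 are made-up values for illustration, not the data used elsewhere in this tutorial.

```r
# Hypothetical example data: three plots (rows), two species (columns)
comm <- data.frame(sp1 = c(0, 1, 2), sp2 = c(0.5, 0, 3))

is.list(comm)     # TRUE: a data.frame is a list of columns
is.matrix(comm)   # FALSE: it is not a matrix
lengths(comm)     # every list item (column) has the same length, here 3

# A column can be pulled out by name, in data.frame style or in list style
comm$sp1
comm[["sp1"]]
```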

The distinction becomes important when we want to perform some operations on the data frame, especially mathematical operations. For example, if our data on community abundance are in a data.frame called comm and we want the mean abundance of all species in our plots, we cannot do

mean(comm)

and get an answer. What we will get is

[1] NA
Warning message:
In mean.default(comm) : argument is not numeric or logical: returning NA

because a data.frame is not itself a numeric object, even if all the data in the data.frame ARE numeric. Instead we have to do

mean(as.matrix(comm))

[1] 0.05693787

Actually, it's quite maddening. sum(comm), min(comm) and max(comm) all work, but mean(comm) and many other functions do not. So, you have two options.

1. Convert your data.frame to a matrix permanently, and give up the option to address columns simply by field name (the $ operator does not work on a matrix), or
2. Get used to surrounding your data.frame name with as.matrix() if you want to do math on the whole data.frame.
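The two options can be sketched as follows. The data.frame comm and the name comm_mat are made-up for illustration; only the function calls come from the text above.

```r
# Hypothetical example data: three plots (rows), two species (columns)
comm <- data.frame(sp1 = c(0, 1, 2), sp2 = c(0.5, 0, 3))

# sum(), min() and max() accept the data.frame directly
sum(comm)   # 6.5
min(comm)   # 0
max(comm)   # 3

# Option 1: convert permanently to a matrix
comm_mat <- as.matrix(comm)
mean(comm_mat)       # 1.083333
comm_mat[, "sp1"]    # columns are still reachable by name through indexing,
                     # but comm_mat$sp1 is an error on a matrix

# Option 2: keep the data.frame and wrap it only when doing the math
mean(as.matrix(comm))
comm$sp1             # field-name access is still available
```

Which option is less trouble probably depends on how often we need whole-table arithmetic compared with access to single columns.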

You won't run into this problem too often, because usually you're operating on a single column (field), but here's a common example, using a data.frame called taxon.

table(taxon)

Error in table(taxon) : attempt to make a table with >= 2^31 elements

table(as.matrix(taxon))

    0   0.2   0.5     1     2     3     4     5
24851     3  1818   202   104    36    18     8
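The reason for the error is that table() treats each column of a data.frame as a separate factor and tries to cross-tabulate all of them against each other, so the number of cells grows with every added column. A small sketch, using a made-up two-column data.frame rather than the real data, shows the difference:

```r
# Hypothetical example data: three plots (rows), two species (columns)
taxon <- data.frame(sp1 = c(0, 1, 1), sp2 = c(0, 0, 2))

# On the data.frame, table() cross-tabulates sp1 against sp2
crosstab <- table(taxon)
dim(crosstab)    # 2 2: one dimension per column

# On the matrix, table() counts how often each value occurs overall
counts <- table(as.matrix(taxon))
counts           # value 0 occurs 3 times, 1 twice, 2 once
```

With only two species the cross-tabulation is harmless, but with hundreds of species it would need one dimension per species, which is presumably what triggers the error above.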

It's actually strongly preferable to keep the data in a data.frame if they may be used as explanatory variables in a statistical model. Many model functions allow a data= clause in the model statement to tell the function where to find the variables of interest, and a data.frame will operate more smoothly.
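As a minimal sketch of the data= clause, here is a linear model fitted with lm(). The data.frame site and its columns elev and cover are made-up values for illustration only.

```r
# Hypothetical example data: elevation and cover of one species in five plots
site <- data.frame(elev  = c(100, 200, 300, 400, 500),
                   cover = c(2, 3, 5, 6, 9))

# data = site tells lm() to look up cover and elev as columns of site,
# so the formula can refer to them by field name alone
fit <- lm(cover ~ elev, data = site)
coef(fit)
```

Without the data= clause we would have to write the formula as site$cover ~ site$elev, which works but is clumsier.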