Data Frames
It's tempting to think of data.frames as glorified matrices whose columns we
can access by field name without knowing their position. Actually,
data.frames are lists where every list item (column) is required to be
exactly the same length.
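A quick way to convince yourself of the list structure is to build a tiny data.frame and ask R what it is. This is only a sketch: the object toy and its columns sp1 and sp2 are made up for illustration and are not the data used in these notes.

```r
# Hypothetical toy data: three plots, two species (not the real comm data)
toy <- data.frame(sp1 = c(0, 1, 2), sp2 = c(0.5, 0, 3))

is.list(toy)      # TRUE: a data.frame is a list underneath
is.matrix(toy)    # FALSE: it is not a matrix
lengths(toy)      # every list item (column) has the same length, here 3

# Columns can be pulled out by field name, like the items of any list
toy$sp1
toy[["sp2"]]

# Items whose lengths cannot be reconciled are refused
bad <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
inherits(bad, "try-error")   # TRUE
```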
The distinction becomes important when we want to perform some operations on the
data frame, especially mathematical operations. For example, if our data on
community abundance are in a data.frame called comm and we want the mean
abundance of all species in our plots, we cannot do
mean(comm)
and get an answer. What we will get is
[1] NA
Warning message:
In mean.default(comm) :
argument is not numeric or logical: returning NA
because a data.frame is a list, not a numeric vector, even if every column in the
data.frame IS numeric. Instead we have to do
mean(as.matrix(comm))
[1] 0.05693787
Actually, it's quite maddening. sum(comm), min(comm) and max(comm) all work,
but mean(comm) and many other functions do not.
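The difference comes from how the functions are defined: sum(), min() and max() belong to R's Summary group of generics, for which base R supplies a data.frame method, while mean() has no such method in current versions of R. The sketch below shows the behavior on a made-up data.frame (toy is hypothetical) along with a few alternatives, depending on whether you want one grand mean or one mean per column.

```r
# Hypothetical toy data (not the real comm data)
toy <- data.frame(sp1 = c(0, 1, 2), sp2 = c(0.5, 0, 3))

# Summary-group functions accept an all-numeric data.frame directly
total   <- sum(toy)
largest <- max(toy)

# mean() falls through to mean.default() and returns NA with a warning
m_bad <- suppressWarnings(mean(toy))

# Grand mean over every cell: flatten first
m_mat  <- mean(as.matrix(toy))
m_list <- mean(unlist(toy))

# One mean per column (species)
col_means <- colMeans(toy)
sapply(toy, mean)    # same values, computed column by column
```

If any column is not numeric (a factor of plot names, say), sum(), min() and max() will refuse the data.frame too, so the convenience only holds for all-numeric data.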
So, you have two options.
- convert your data.frame to a matrix permanently, and give up the option to
address columns simply by field name, or
- get used to surrounding your data.frame name with as.matrix() if you
want to do math on the whole data.frame.
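To make the trade-off concrete, here is a minimal sketch of both options on a made-up data.frame (toy and toy_mat are hypothetical names). Note that a matrix keeps its column names, so you can still index with [ , "name"]; what you lose is the simple $ shorthand.

```r
# Hypothetical toy data (not the real comm data)
toy <- data.frame(sp1 = c(0, 1, 2), sp2 = c(0.5, 0, 3))

# Option 1: convert permanently to a matrix
toy_mat <- as.matrix(toy)
mean(toy_mat)                 # math on the whole object now just works
by_name <- toy_mat[, "sp1"]   # column names survive the conversion
dollar  <- try(toy_mat$sp1, silent = TRUE)   # but $ is not valid on a matrix

# Option 2: keep the data.frame and convert only when doing whole-table math
mean(as.matrix(toy))
toy$sp1                       # $ access by field name still available
```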
You won't run into this problem too often, because usually you're operating on a
single column (field), but here's a common example.
table(taxon)
Error in table(taxon) : attempt to make a table with >= 2^31 elements
table(as.matrix(taxon))
    0   0.2   0.5     1     2     3     4     5
24851     3  1818   202   104    36    18     8
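The error arises because table() treats each column of a data.frame as a separate classifying factor and tries to cross-tabulate all of them against each other, so the result has one dimension per column. With many species columns that array becomes impossibly large. A small sketch with a made-up two-column data.frame (toy is hypothetical) shows the difference:

```r
# Hypothetical toy data: two species, three plots
toy <- data.frame(sp1 = c(0, 1, 0), sp2 = c(1, 0, 0))

# On the data.frame: a 2 x 2 cross-tabulation of sp1 by sp2
cross <- table(toy)
dim(cross)

# On the matrix: one count per distinct value over all cells
counts <- table(as.matrix(toy))
counts

# unlist() flattens the columns into a single vector, giving the same counts
table(unlist(toy))
```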
It's actually strongly preferable to keep the data in a data.frame if they may be used as
explanatory variables in a statistical model. Many model functions accept a data=
argument in the model call to tell the function where to find the variables of interest,
and a data.frame will operate more smoothly.
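As a minimal sketch of the data= idiom, here is a linear model fit with lm(). The data.frame site and its columns elev and abund are invented for illustration only.

```r
# Hypothetical site data: elevation and abundance for five plots
site <- data.frame(elev  = c(100, 200, 300, 400, 500),
                   abund = c(1.0, 1.8, 3.1, 3.9, 5.2))

# The formula names the variables; data= says where to find them
fit <- lm(abund ~ elev, data = site)
coef(fit)

# Prediction also expects a data.frame with the same field names
pred <- predict(fit, newdata = data.frame(elev = 250))
pred
```

Because the variables are looked up by field name inside the data.frame, the same formula can be reused on another data.frame with the same columns without rewriting the model call.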