Simple Functions

tapply()

Data: species richness in different vegetation transects

    Veg <- read.table(file = "Vegetation2.txt", header = TRUE)

Verify that it is read in correctly:

    names(Veg)
    head(Veg)

Species richness overall & per transect - the long way!

    m <- mean(Veg$R)

The variable m contains the mean richness of all 8 transects, and m1 through m8 hold the mean richness per transect, each obtained by applying mean to the part of Veg$R that belongs to one transect. Note that the mean command is applied to Veg$R, which is a vector of data. It is not a matrix, hence there is no need for a comma between the square brackets.

Mean species richness per transect - a better way!

The R function tapply performs the same operation as the code above (for m1 through m8), but with a single line of code:

    tapply(Veg$R, Veg$Transect, mean)

or, with the arguments named:

    tapply(X = Veg$R, INDEX = Veg$Transect, FUN = mean)

The tapply function splits the data of the first variable (R) based on the levels of the second variable (Transect). To each subgroup of data, it applies a function, in this case the mean, but we can also use the standard deviation (function sd()), variance (function var()), length (function length()), and so on. The following line, for example, counts the observations per transect:

    Le <- tapply(Veg$R, Veg$Transect, length)

Calculating the mean and standard deviation in the same way and combining the results gives an output in which each row shows the mean richness, standard deviation, and number of observations per transect.

To calculate the mean, minimum, maximum, standard deviation, and length of the full series, we still need to use mean(Veg$R), min(Veg$R), max(Veg$R), sd(Veg$R), and length(Veg$R). This is laborious if we wish to calculate the mean of a large number of variables, such as all the numerical variables of the vegetation data. We specifically say 'numerical' as one cannot calculate the mean of a factor. There are 20 numerical variables in the vegetation dataset, columns 5–25 of the data frame Veg. However, we do not need to type in the mean command 20 times. R provides other functions, similar to tapply, to address this situation: lapply() and sapply().

It is important to realise that tapply calculates the mean (or any other function) for subsets of observations of a variable, whereas lapply and sapply calculate the mean (or any other function) of one or more variables, using all observations. The word FUN stands for function, and must be written in capitals. Instead of the mean, you can use any other function as an argument for FUN, and you can write your own functions.

So, what is the difference between sapply() and lapply()? The major difference lies in the presentation of the output. The output of lapply is presented as a list, whereas sapply gives it as a vector. The choice depends on the format in which you would like the output.

The variable that contains the data in lapply and sapply needs to be a data frame. This will not work:

    sapply(cbind(Veg$R, Veg$ROCK, Veg$LITTER, Veg$ML, Veg$BARESOIL), FUN = mean)

It will produce one long vector of data, because the output of the cbind command is a matrix, not a data frame, so the function is applied to every single value. It can easily be changed to a data frame:

    sapply(data.frame(cbind(Veg$R, Veg$ROCK, Veg$LITTER, Veg$ML, Veg$BARESOIL)), FUN = mean)

Note that we have lost the variable labels. To avoid this, make a proper data frame before running the sapply function. Alternatively, use the colnames function after combining the data with the cbind() function.

Another function that gives basic information on variables is the summary() command. The argument can be a variable, the output from a cbind() command, or a data frame.
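To try these calls without the Vegetation2.txt file, here is a minimal sketch on a small made-up data frame. The values are invented for illustration, only two transects and three numerical columns are used, and the object names Me, Sd, per_transect, means_vec, means_list, cells, unnamed, and named are assumptions chosen for this example rather than names from the original data or code.

```r
# Hypothetical stand-in for Veg: these values are made up for illustration
# and are NOT taken from Vegetation2.txt.
Veg <- data.frame(
  Transect = c(1, 1, 1, 2, 2, 2),
  R        = c(8, 6, 10, 5, 7, 9),
  ROCK     = c(27, 26, 30, 18, 24, 25),
  LITTER   = c(30, 20, 24, 14, 22, 16)
)

# The long way: subset the vector Veg$R with square brackets.
# Veg$R is a vector, not a matrix, so no comma is needed inside [ ].
m  <- mean(Veg$R)
m1 <- mean(Veg$R[Veg$Transect == 1])
m2 <- mean(Veg$R[Veg$Transect == 2])

# The better way: one value per level of Transect in a single call.
Me <- tapply(Veg$R, Veg$Transect, mean)
Sd <- tapply(Veg$R, Veg$Transect, sd)
Le <- tapply(Veg$R, Veg$Transect, length)
per_transect <- cbind(Me, Sd, Le)  # one row per transect

# sapply and lapply work variable by variable, using all observations.
num_cols   <- c("R", "ROCK", "LITTER")
means_vec  <- sapply(Veg[, num_cols], FUN = mean)  # named numeric vector
means_list <- lapply(Veg[, num_cols], FUN = mean)  # named list

# Pitfall: cbind() returns a matrix, so FUN is applied to every cell.
cells <- sapply(cbind(Veg$R, Veg$ROCK), FUN = mean)

# Wrapping the matrix in data.frame() restores column-wise behaviour,
# but the variable labels are replaced by default names.
unnamed <- sapply(data.frame(cbind(Veg$R, Veg$ROCK)), FUN = mean)

# Building a proper data frame first keeps the labels.
named <- sapply(data.frame(R = Veg$R, ROCK = Veg$ROCK), FUN = mean)

# summary() accepts a single variable or a whole data frame.
summary(Veg$R)
summary(Veg[, num_cols])
```

Printing per_transect, means_vec, and means_list side by side shows the difference in output format, and comparing length(cells) with length(unnamed) shows why the matrix version does not give one mean per variable.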