Common Problems And Solutions In R

Common Problems And Solutions In R


Last edit: dombennett (14 Oct 2016 11:24) | Revisions: 2 | Created by: dombennett | Rating: 0


The idea of this page is to present common (and often confusing) grievances encountered with R…. and how to resolve them!

Extracting a vector from a data frame of characters

Problem

I often want to loop through a data frame and extract a vector from it. This is fine if that data frame contains numbers. For example, the following code generates a data frame of numbers, extracting the first row creates a vector equal to a vector of 1 to 5.

numbers.df <- data.frame (c1=rep (1, 5),
                          c2=rep (2, 5),
                          c3=rep (3, 5),
                          c4=rep (4, 5),
                          c5=rep (5, 5))
row.element <- 1:5
all (row.element %in% numbers.df[1,])  # TRUE

But if I switch to characters, it doesn't work.

letters.df <- data.frame (c1=rep ('a', 5),
                          c2=rep ('b', 5),
                          c3=rep ('c', 5),
                          c4=rep ('d', 5),
                          c5=rep ('e', 5))
row.element <- c ('a', 'b', 'c', 'd', 'e')
all (row.element %in% letters.df[1,])  # FALSE!

How should I resolve this? Printing letters.df[1,] returns this:

> letters.df[1,]
  c1 c2 c3 c4 c5
1  a  b  c  d  e

So perhaps if I convert the vector into characters it will work.

all (row.element %in% as.character (as.vector (letters.df[1, ])))  # FALSE

No, because this has simply converted the vector into a vector of 1s.

> as.character (as.vector (letters.df[1, ]))
[1] "1" "1" "1" "1" "1"

Solution 1

The problem is to do with levels in the data frame. When we extract our row we also need to drop the data frame class. (By default, [] preserve the data.frame class even though its a single row). This returns a list so we need to then convert this into a vector using unlist(). Now there are still levels but these refer to levels within the vector rather than the original data.frame. To drop these we can use as.character().

res <- as.character (unlist (letters.df[1, ,drop=TRUE]))
all (row.element %in% res)  # TRUE

Solution 2

The above solution is complicated. The problem is to do with levels. Therefore, another solution is to force all strings not to be made into factors when we create our data frame.

letters.df <- data.frame (c1=rep ('a', 5),
                          c2=rep ('b', 5),
                          c3=rep ('c', 5),
                          c4=rep ('d', 5),
                          c5=rep ('e', 5),
                          stringsAsFactors=FALSE)
all (row.element %in% letters.df[1,])  # TRUE!

Now the data.frame acts in the same way that numbers.df did. This is the easiest and most straightforward solution, however, it does come at the cost of losing factor functionality (e.g. tapply()) on the data frame.


rating: 0+x
Add a New Comment
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License