Extract a specified word from a vector using R

I have text like

text<- "i am happy today :):)"

I want to extract :) from a text vector and report how often it occurs.

3 answers

Here is one idea that could be easily generalized:

text<- c("i was happy yesterday :):)",
         "i am happy today :)",
         "will i be happy tomorrow?")

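# Characters lost by deleting every ":)", divided by the pattern length (2).
# fixed = TRUE treats ":)" as a literal string rather than a regex.
(nchar(text) - nchar(gsub(":)", "", text, fixed = TRUE))) / 2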
# [1] 2 1 0
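
To show how this could be generalized, here is a minimal sketch that wraps the same arithmetic in a function. The name `count_fixed` and its arguments are made up for illustration and are not part of base R. It assumes the pattern is a literal string and counts non-overlapping occurrences.

```r
# Hypothetical helper (not from the original answer): count non-overlapping
# occurrences of the literal string `pattern` in each element of `x`.
count_fixed <- function(x, pattern) {
  stopifnot(is.character(x), is.character(pattern),
            length(pattern) == 1L, nchar(pattern) > 0L)
  # Delete every occurrence, then see how many characters disappeared
  removed <- gsub(pattern, "", x, fixed = TRUE)
  (nchar(x) - nchar(removed)) / nchar(pattern)
}

text <- c("i was happy yesterday :):)",
          "i am happy today :)",
          "will i be happy tomorrow?")

count_fixed(text, ":)")
# [1] 2 1 0
count_fixed(text, "happy")
# [1] 1 1 1
```

Dividing by `nchar(pattern)` replaces the hard-coded 2, so the same function works for patterns of any length.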

I assume you only want to count the occurrences, or do you also want to remove :) from the string?

For the count you can do:

length(gregexpr(":)",text)[[1]])

which gives 2. A more general solution for a character vector:

sapply(gregexpr(":)",text),length)

Edit:

Josh O'Brien noted that this also returns 1 for a string that contains no :), since gregexpr returns -1 in that case. To fix this, you can use:

sapply(gregexpr(":)",text),function(x)sum(x>0))

which is a little less elegant.
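
One way to avoid special-casing the -1 is to pass the gregexpr result to regmatches, which returns the matched substrings themselves and an empty vector where nothing matched. This is only a sketch of that variant, not part of the original answer. It assumes R 3.2.0 or later for `lengths`, and the variable names are illustrative.

```r
text <- c("i was happy yesterday :):)",
          "i am happy today :)",
          "will i be happy tomorrow?")

# gregexpr() locates every match; regmatches() extracts the matched substrings.
# fixed = TRUE treats ":)" as a literal string rather than a regex.
matches <- regmatches(text, gregexpr(":)", text, fixed = TRUE))

# One count per element; a string without a match yields character(0),
# so its length is 0 and no -1 check is needed.
counts <- lengths(matches)
counts
# [1] 2 1 0

# Total frequency across the whole vector
sum(counts)
# [1] 3
```

Because `matches` holds the extracted smileys, it covers both halves of the question: extracting the pattern and reporting its frequency.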


Another approach, for a single string:

mytext<- "i am happy today :):)"

# The following line inserts semicolons to split on
myTextSub<-gsub(":)", ";:);", mytext)

# Then split and unlist
myTextSplit <- unlist(strsplit(myTextSub, ";"))

# Then see how many times the smiley turns up
length(grep(":)", myTextSplit))

For a vector of length > 1:

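mytext <- rep("i am happy today :):)", 2)
# The replacement string is not a regex, so ")" needs no escaping there
myTextSub <- gsub(":\\)", ";:);", mytext)
# Keep the list structure: one vector of pieces per input string
myTextSplit <- strsplit(myTextSub, ";")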

sapply(myTextSplit,function(x){
  length(grep(":)", x))
})
