Regular expression in R with gregexpr: counting overlapping matches

I'm trying to count the instances of three consecutive "a" events, i.e. "aaa".

The string will contain only lowercase letters, for example "abaaaababaaa".

I tried the following code snippet, but the behavior is not quite what I am looking for.

x <- "abaaaababaaa"
gregexpr("aaa", x)

I would like the match to return 3 instances of the "aaa" event, not 2.

Assume indexing starts at 1.

  • The first appearance of "aaa" is at index 3.
  • The second appearance of "aaa" is at index 4. (this is not captured by gregexpr)
  • The third appearance of "aaa" is at index 10.
+5
3 answers

To catch overlapping matches, you can use a lookahead as follows:

gregexpr("a(?=aa)", x, perl=TRUE)
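As a minimal sketch of how this call can be turned into a count (the variable names m, starts and n are illustrative, not from the original post): the lookahead checks for two more "a" characters without consuming them, so each match is a single "a" and the next search can start one character later.

```r
x <- "abaaaababaaa"

# (?=aa) asserts that "aa" follows but does not consume it,
# so matches may begin at adjacent positions.
m <- gregexpr("a(?=aa)", x, perl = TRUE)[[1]]

# Start positions of every (possibly overlapping) "aaa"
starts <- as.integer(m)

# gregexpr returns -1 when nothing matches, so guard the count
n <- if (starts[1] == -1L) 0L else length(starts)

print(starts)  # 3 4 10
print(n)       # 3
```

Note that the reported match.length is 1 for each match here, not 3, because only the first "a" is part of the match itself.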

+6


your.string <- "abaaaababaaa"
# number of start positions for a 3-character window
n.windows <- max(nchar(your.string) - 2, 0)
x <- unlist(strsplit(your.string, NULL))
# build every 3-character window, one per start position
x2 <- character(n.windows)
for (i in seq_len(n.windows))
  x2[i] <- paste(x[i], x[i+1], x[i+2], sep="")
hits <- grep("aaa", x2)
cat("occurrences of <aaa> in <your.string> is,",
    length(hits), "and they are at index", hits)
> occurrences of <aaa> in <your.string> is, 3 and they are at index 3 4 10

Source: R-help (Fran).
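The same sliding-window idea can also be sketched without an explicit loop, assuming base R's vectorized substring(). The names pattern, k, windows and hits below are illustrative only.

```r
your.string <- "abaaaababaaa"
pattern <- "aaa"
k <- nchar(pattern)

# All valid start positions for a window of width k
starts <- seq_len(max(nchar(your.string) - k + 1, 0))

# substring() is vectorized over start and stop positions
windows <- substring(your.string, starts, starts + k - 1)

# Fixed-string comparison; no regular expression is needed here
hits <- which(windows == pattern)

print(length(hits))  # 3
print(hits)          # 3 4 10
```

This compares literal strings only, so it fits a fixed pattern such as "aaa" but not a general regular expression.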

+1

Here is another approach that uses gregexpr.

x<-"abaaaababaaa"
# nest in lookahead + capture group
# to get all instances of the pattern "(ab)|b"
matches<-gregexpr('(?=((ab)|b))', x, perl=TRUE)
# regmatches will reference the match.length attr. to extract the strings
# so move match length data from 'capture.length' to 'match.length' attr
attr(matches[[1]], 'match.length') <- as.vector(attr(matches[[1]], 'capture.length')[,1])
# extract substrings
regmatches(x, matches)
# [[1]]
# [1] "ab" "b"  "ab" "b"  "ab" "b" 

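The key is to wrap the pattern in a capture group, and the capture group in a lookahead. gregexpr then returns the match positions together with a capture.length attribute, a matrix whose first column holds the lengths matched by the first capture group. Convert that column to a vector and move it into the match.length attribute (which is all zeros, because the whole pattern sits inside a lookahead), and regmatches can extract the strings.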
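To see why the attribute swap is needed, the attributes can be inspected directly. This is a small sketch using the same x and pattern as above; the variable names m, ml and cl are illustrative.

```r
x <- "abaaaababaaa"
m <- gregexpr("(?=((ab)|b))", x, perl = TRUE)[[1]]

# The overall match is zero-width, because everything is inside the lookahead
ml <- attr(m, "match.length")

# Column 1 of capture.length belongs to the outer capture group ((ab)|b)
cl <- as.vector(attr(m, "capture.length")[, 1])

print(as.integer(m))  # start positions: 1 2 6 7 8 9
print(ml)             # 0 0 0 0 0 0
print(cl)             # 2 1 2 1 2 1
```

With match.length left at zero, regmatches would return empty strings, which is why the capture lengths are copied over first.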

As the type of the end result suggests, with a few modifications this can be vectorized for the case where x is a list of strings.

x<-list(s1="abaaaababaaa", s2="ab")
matches<-gregexpr('(?=((ab)|b))', x, perl=TRUE)
# make a function that replaces match.length attr with capture.length
set.match.length<-
function(x) structure(x, match.length=as.vector(attr(x, 'capture.length')[,1]))
# set match.length to capture.length for each match object
matches<-lapply(matches, set.match.length)
# extract substrings
mapply(regmatches, x, lapply(matches, list))
# $s1
# [1] "ab" "b"  "ab" "b"  "ab" "b" 
# 
# $s2
# [1] "ab" "b" 
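Applied to the pattern from the question, the same technique might look like the sketch below. It assumes the single-string case, and the names m, starts and found are illustrative.

```r
x <- "abaaaababaaa"

# Capture group inside a lookahead: zero-width match, but "aaa" is captured
m <- gregexpr("(?=(aaa))", x, perl = TRUE)
starts <- as.integer(m[[1]])

# Copy the capture lengths into match.length so regmatches can use them
attr(m[[1]], "match.length") <- as.vector(attr(m[[1]], "capture.length")[, 1])
found <- regmatches(x, m)[[1]]

print(starts)         # 3 4 10
print(length(found))  # 3
```

Here length(found) gives the count asked for in the question, and starts gives the indices.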
0