I want to know how to vectorize and memoize a user-defined function in R. It seems my way of thinking does not match the way R operates, so I gladly welcome any links to good reading material. For example, The R Inferno is a good resource, but it did not help me figure out memoization in R.
More generally, can you provide an appropriate use case for the memoise or R.cache packages?
I could not find other discussions on this. A search for "memoise" or "memoize" on r-bloggers.com returns zero results. A search for these keywords at http://r-project.markmail.org/ does not return any discussions either. I emailed a mailing list and did not receive a complete response.
I am not solely interested in memoizing the GC function, and I am aware of Bioconductor and the various packages available there.
Here is my data:
seqs <- c("","G","C","CCC","T","","TTCCT","","C","CTC")
Some sequences are missing, so they are empty "".
I have a function to calculate the GC content:
> GC <- function(s) {
if (!is.character(s)) return(NA)
n <- nchar(s)
if (n == 0) return(NA)
m <- gregexpr('[GCSgcs]', s)[[1]]
if (m[1] < 1) return(0)
return(100.0 * length(m) / n)
}
Works:
> GC('')
[1] NA
> GC('G')
[1] 100
> GC('GAG')
[1] 66.66667
> sapply(seqs, GC)
                  G         C       CCC         T               TTCCT
       NA 100.00000 100.00000 100.00000   0.00000        NA  40.00000        NA
        C       CTC
100.00000  66.66667
I want to memoize it. Then I want to vectorize it.
Apparently, I must have the wrong setup for using either the memoise or the R.cache R package:
> system.time(dummy <- sapply(rep(seqs,100), GC))
user system elapsed
0.044 0.000 0.054
>
> library(memoise)
> GCm1 <- memoise(GC)
> system.time(dummy <- sapply(rep(seqs,100), GCm1))
user system elapsed
0.164 0.000 0.173
>
> library(R.cache)
> GCm2 <- addMemoization(GC)
> system.time(dummy <- sapply(rep(seqs,100), GCm2))
user system elapsed
10.601 0.252 10.926
Note that the memoized versions are slower, not faster.
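To illustrate when caching might pay off, here is a minimal base-R sketch. It rests on an assumption, not a measurement: that the per-call cost of a cache lookup outweighs a function as cheap as GC, so memoization only helps when the wrapped function is expensive. The names memoize_chr, slow_len and fast_len are hypothetical and are not part of memoise or R.cache.

```r
# Hypothetical helper: memoize a function that takes one non-empty string.
# (The empty string cannot be used as a key in an environment.)
memoize_chr <- function(f) {
  cache <- new.env(hash = TRUE, parent = emptyenv())
  function(key) {
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(key), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Stand-in for an expensive computation; counts how often it really runs.
calls <- 0
slow_len <- function(s) {
  calls <<- calls + 1
  Sys.sleep(0.01)
  nchar(s)
}

fast_len <- memoize_chr(slow_len)
keys <- rep(c("G", "CCC", "TTCCT"), 50)
out <- vapply(keys, fast_len, integer(1))
calls  # 3: one real evaluation per distinct key, 147 cache hits
```

With 150 inputs but only 3 distinct keys, the slow function runs 3 times. For a function that takes microseconds, the lookup itself may cost as much as the call it avoids.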
I also tried the hash package, but I do not understand the output. The sequence C should have a value of 100, not NULL.
I tried both has.key(s, cache) and exists(s, cache) for the lookup, and both cache[s] <<- result and cache[[s]] <<- result for the assignment.
> cache <- hash()
> GCc <- function(s) {
if (!is.character(s) || nchar(s) == 0) {
return(NA)
}
if(exists(s, cache)) {
return(cache[[s]])
}
result <- GC(s)
cache[[s]] <<- result
return(result)
}
> sapply(seqs,GCc)
[[1]]
[1] NA
$G
[1] 100
$C
NULL
$CCC
[1] 100
$T
NULL
[[6]]
[1] NA
$TTCCT
[1] 40
[[8]]
[1] NA
$C
NULL
$CTC
[1] 66.66667
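A possible explanation for the NULL values, stated as an assumption because the internals of the hash package are not shown here: by default exists() also searches enclosing environments, so keys such as "C" and "T", which are names of existing R objects, are reported as present before anything has been stored. The sketch below shows that behaviour with a plain environment and a cache that avoids it. The names empty_cache, gc_cache and GCenv are hypothetical.

```r
# GC as defined in the post
GC <- function(s) {
  if (!is.character(s)) return(NA)
  n <- nchar(s)
  if (n == 0) return(NA)
  m <- gregexpr("[GCSgcs]", s)[[1]]
  if (m[1] < 1) return(0)
  100.0 * length(m) / n
}

# exists() follows enclosing environments unless inherits = FALSE
empty_cache <- new.env()
exists("T", envir = empty_cache)                    # TRUE, nothing was stored
exists("T", envir = empty_cache, inherits = FALSE)  # FALSE

# Cache backed by an environment with no parent, looked up without inheritance
gc_cache <- new.env(hash = TRUE, parent = emptyenv())

GCenv <- function(s) {
  if (!is.character(s) || nchar(s) == 0) return(NA)
  if (exists(s, envir = gc_cache, inherits = FALSE)) {
    return(get(s, envir = gc_cache, inherits = FALSE))
  }
  result <- GC(s)
  assign(s, result, envir = gc_cache)
  result
}

seqs <- c("", "G", "C", "CCC", "T", "", "TTCCT", "", "C", "CTC")
res <- sapply(seqs, GCenv)
```

Here res holds the same values as sapply(seqs, GC), including 100 for both occurrences of "C" and 0 for "T".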
At least vectorizing works:
> GCv <- Vectorize(GC)
> GCv(seqs)
                  G         C       CCC         T               TTCCT
       NA 100.00000 100.00000 100.00000   0.00000        NA  40.00000        NA
        C       CTC
100.00000  66.66667
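Vectorize() still calls GC once per element. As a sketch of an alternative, the version below works on the whole character vector at once, and a second wrapper computes each distinct sequence only once, which gives a memoization-like effect within a single call. GCvec and GCuniq are hypothetical names, and the result is unnamed, unlike the output above.

```r
# Vectorized GC content: one call handles a whole character vector.
GCvec <- function(s) {
  if (!is.character(s)) return(rep(NA_real_, length(s)))
  n <- nchar(s)
  # gregexpr() returns -1 for "no match", so count positive positions only
  hits <- vapply(gregexpr("[GCSgcs]", s), function(m) sum(m > 0), integer(1))
  out <- 100 * hits / n
  out[n == 0] <- NA_real_  # empty sequences have no defined GC content
  out
}

# Compute each distinct sequence once, then map results back to the input order.
GCuniq <- function(s) {
  u <- unique(s)
  GCvec(u)[match(s, u)]
}

seqs <- c("", "G", "C", "CCC", "T", "", "TTCCT", "", "C", "CTC")
v1 <- GCvec(seqs)
v2 <- GCuniq(rep(seqs, 100))
```

Whether the unique() and match() step is worth it depends on how many duplicates the input contains. For mostly distinct sequences GCvec alone is probably enough.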
stackoverflow: