How to correctly count occurrences of a string in a string with Ruby

I have a 300 MB text file, and I want to count the occurrences of each of 10,000 substrings in it. How can I do this quickly?

I currently use the following code:


content = IO.read("path/to/mytextfile")
Word.each do |w|
  w.occurrence = content.scan(w.name).size
  w.save
end

Word is an ActiveRecord class.

It took almost a full day to finish the count. Is there any way to make this faster? Thanks.

Edit 1: Thanks again. I am running Rails 2.3.9. The name column of the word table contains the strings I am looking for, and it holds only unique values. Instead of using Word.each, I now load the records in batches (1,000 rows per batch). This should help.

I rewrote all the code using an idea from bpaulon. The count now takes only a few hours to complete.

, utf8 encode

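# Assumption: both helpers are defined on String, since they call slice and unpack on self.
class String
  # Return at most the first n characters (the /m flag lets . match newlines).
  def truncate(n)
    slice(/\A.{0,#{n}}/m)
  end

  # Count UTF-8 characters rather than bytes.
  def utf8_length
    unpack('U*').size
  end
end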

?


, -.

, db, , mongo mysql, db , "counter".

, " , ". , , , , IO, , .


: ? , , Word.name, ( ) . \n? , , , , , .

, 20 , 0 30000 . 0 40 , 20 60, 40 80 ..

, , .

, , , , , , , , Words.count 300Mb.

+1

scan , . , , , , 300 .

Word ActiveRecord, , , . , , , . , Word, , .

, , , , ..

, .


:

/, , . , , ?

, , , , , , , , , 10 000 , 10 000 , DB, .

Ruby , , , Ruby 1.9+. RVM Ruby. , rvm notes .

Word ? ?


: , , id, . - Qaru https://dba.stackexchange.com/ , . , , .

: " ".

, , SQL Word.each. - "select * from word"? , Rails 10 000 , . - "select * from word where id=1", , . , " ".

, , content - , , . , , ? , , unique , .

, , Ruby ? , 100 1000 . -r profile. , , , .

Rails ?

+3

"Word" Trie, , , .

, , Trie . , . " " :

  • node. ( , - )
  • node. ( )
  • node. ( - "" )

- , , , "" Trie, . , , , Trie.
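The trie idea mentioned in this answer can be sketched roughly as follows. This is only a minimal illustration, not the answerer's code: the method name count_with_trie and the hash-based node layout are assumptions made for the example, and the words are passed in as plain strings rather than Word records.

```ruby
# Hypothetical helper: counts how often each word occurs in text by
# walking the text once and following a trie built from the words.
def count_with_trie(words, text)
  # Build the trie. Each node is a Hash of character => child node;
  # the :word key marks a node where a complete word ends.
  root = {}
  words.each do |word|
    node = root
    word.each_char { |ch| node = (node[ch] ||= {}) }
    node[:word] = word
  end

  counts = Hash.new(0)
  chars = text.chars.to_a
  chars.each_index do |start|
    node = root
    i = start
    # Follow the trie for as long as the text keeps matching from this position.
    while i < chars.length && (node = node[chars[i]])
      counts[node[:word]] += 1 if node[:word]
      i += 1
    end
  end
  counts
end

counts = count_with_trie(["ab", "abc", "b"], "abcab")
puts counts.inspect
```

Each position of the text is checked against all words at once, instead of scanning the whole text once per word. Note that this sketch counts overlapping occurrences, whereas String#scan counts non-overlapping ones, so the two can disagree for words such as "aa". It also turns the text into an array of characters, so a 300 MB file would probably need to be processed in chunks.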

, , , , , .

, , , , , .

0
