Real-time parallel processing in Rails

I'm developing a kind of personalized search engine in Ruby on Rails, and I'm currently trying to find a better way to sort the results based on a user record in real time.

Example: the items being searched can have tags (separate objects with identifiers). For example, an item has tags = [1, 5, 10, 23, 45].

The user, on the other hand, may flag some tags as being of particular interest, so say that the user has tags = [5, 23].

The score used to sort the results should take into account the number of item tags that match the user's tags. For example, an item's score would be based 50% on the item's own attributes and 50% on a user-dependent ranking (the number of matching tags).
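A minimal Ruby sketch of the 50/50 blend described above. The helper names (`tag_overlap`, `personalized_score`) are hypothetical, and two things are assumptions rather than part of the question: the item's own score is already normalized to 0..1, and the tag part is normalized by the number of user tags.

```ruby
require "set"

# Hypothetical helper: fraction of the user's tags that also appear on the item.
# Normalizing by the user's tag count is an assumption, not stated in the question.
def tag_overlap(item_tags, user_tags)
  return 0.0 if user_tags.empty?
  (item_tags.to_set & user_tags.to_set).size.to_f / user_tags.size
end

# Blend a precomputed item score (assumed to lie in 0..1) with the tag overlap.
# weight is the share given to the user-dependent part (0.5 gives the 50/50 split).
def personalized_score(item_score, item_tags, user_tags, weight: 0.5)
  (1 - weight) * item_score + weight * tag_overlap(item_tags, user_tags)
end

# The tag lists from the example above; 0.8 is a made-up item score.
score = personalized_score(0.8, [1, 5, 10, 23, 45], [5, 23])
```

Both user tags appear on the item here, so the overlap part contributes its full weight.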

One idea was to build this into the sort function of the information retrieval system. But in Sphinx, which I will probably use, that would be very inconvenient to implement (when the user vector is large). I don't know about Lucene/Solr, but they don't seem to have the advanced non-text search capabilities that I need anyway (distance, date, time, etc.).

- -, . , , 100-1000 , Rails .

, , - 1000 , , .

, , , skynet .., , ( ?).

, MR, ? , , , ?

(sidenote: , , Google, , " Google: - ". ( ) , )


Map/Reduce , SQL, .

Assuming you have tables like these:

users (id, ...)
items (id, ...)
tags (id, ...)
users_tags (user_id, tag_id)
items_tags (item_id, tag_id)

you could maintain an additional table:

users_items_tags (user_id, item_id, tag_id)

" ".

The scoring query is then:

  select item_id, count(tag_id) as score
    from users_items_tags
   where user_id = <USER_ID>
group by item_id
order by score desc
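To make the query's behavior concrete, here is a rough in-memory Ruby equivalent. The sample rows and the `item_scores` helper are hypothetical; a real application would run the SQL against the database instead.

```ruby
# Sample rows of users_items_tags as [user_id, item_id, tag_id] (made-up data).
rows = [
  [1, 100, 5],
  [1, 100, 23],
  [1, 200, 5],
  [2, 100, 10]
]

# Mirrors the query: filter by user, group by item, count tags, sort descending.
def item_scores(rows, user_id)
  rows
    .select { |uid, _item_id, _tag_id| uid == user_id }
    .group_by { |_uid, item_id, _tag_id| item_id }
    .map { |item_id, group| [item_id, group.size] }
    .sort_by { |_item_id, score| -score }
end

scores = item_scores(rows, 1)
```

Each resulting pair is an item id followed by the number of tags that item shares with the user.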

When a user adds a tag, users_items_tags is populated like this:

insert into users_items_tags (user_id, item_id, tag_id)
     select <USER_ID>, item_id, <TAG_ID>
       from items_tags
      where tag_id = <TAG_ID>
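The insert statement above adds one row per item that carries the given tag. A small in-memory Ruby sketch of the same step, with made-up data and a hypothetical helper name `rows_for_new_user_tag`:

```ruby
# Sample rows of items_tags as [item_id, tag_id] (made-up data).
items_tags = [[100, 5], [100, 23], [200, 5], [300, 7]]

# Builds the users_items_tags rows for one user and one tag:
# one [user_id, item_id, tag_id] row per item that has the tag.
def rows_for_new_user_tag(items_tags, user_id, tag_id)
  items_tags
    .select { |_item_id, tid| tid == tag_id }
    .map { |item_id, tid| [user_id, item_id, tid] }
end

new_rows = rows_for_new_user_tag(items_tags, 1, 5)
```

The cost of this step grows with the number of items that carry the tag, so a popular tag produces many rows.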

. , /.

. , , , . , . .
