Graph database (Neo4j) vs. relational database: need help with design

I need to work with an open source project (BioJava), but I am not satisfied with its performance in some areas and want to spend some time improving it.

For example, I have a text database encoded this way:

chrX    Cufflinks   exon    65175856    65175971    .   .   .   gene_id "XLOC_002576"; transcript_id "TCONS_00004217"; exon_number "1"; gene_name "RP6-159A1.2"; oId "CUFF.3698.1"; nearest_ref "ENST00000456392"; class_code "p"; tss_id "TSS3873";    
chrX    Cufflinks   exon    128986006   128986088   .   .   .   gene_id "XLOC_002577"; transcript_id "TCONS_00004218"; exon_number "1"; oId "CUFF.3750.1"; class_code "u"; tss_id "TSS3874";

Not every field is required. Each gene_id can be associated with several transcript_id values (1..n), and each transcript_id has one or more exons.
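To make the format concrete, here is a minimal sketch of how the last (attribute) column of one such line could be split into key/value pairs. `GtfAttributes` and `parse` are hypothetical names for illustration, not part of BioJava, and the sketch assumes that values never contain a semicolon.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of BioJava: parses the attribute column of one line.
final class GtfAttributes {

    // Turns `gene_id "X"; transcript_id "Y";` into an ordered key -> value map.
    // Assumption: values never contain ';', so splitting on ';' is safe.
    static Map<String, String> parse(String attributeColumn) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (String pair : attributeColumn.split(";")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue; // trailing ';' or stray whitespace
            }
            int space = trimmed.indexOf(' ');
            if (space < 0) {
                continue; // key without a value: skipped in this sketch
            }
            String key = trimmed.substring(0, space);
            String value = trimmed.substring(space + 1).trim().replace("\"", "");
            attributes.put(key, value);
        }
        return attributes;
    }

    public static void main(String[] args) {
        String column = "gene_id \"XLOC_002577\"; transcript_id \"TCONS_00004218\"; "
                + "exon_number \"1\"; oId \"CUFF.3750.1\"; class_code \"u\"; tss_id \"TSS3874\";";
        Map<String, String> attributes = parse(column);
        System.out.println(attributes.get("gene_id"));
        // Optional fields such as gene_name are simply absent from the map.
        System.out.println(attributes.containsKey("gene_name"));
    }
}
```

Because optional fields are just missing keys, callers can test with `containsKey` instead of relying on a fixed column layout.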

The library loads the entire text file into an ArrayList, and every search iterates over the whole list. This works well with small lists, but in my case I have 10^10 queries against a really big list, and it takes a couple of days on a good computer.

Would Neo4j be a good choice? What would be a good way to implement it? For example, is it bad to create nodes that hold only a String and establish relationships between them? Or would it be better to use HSQLDB with a single table?

Please note that I do not need persistence, but speed and synchronization are required.

EDIT: if you want, you can see the project here.

2 answers

, "", "". , " ", RDBMS . , neo4j, , , -, "", " "

, , Hsqldb, , 3 (, , exon) hashmaps .
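As a rough illustration of the hash map idea, the sketch below keeps the parsed records in memory and indexes them by ID, so a lookup no longer scans the whole list. It shows only two maps (gene_id to transcript IDs, transcript_id to exons). `ExonIndex`, `Exon`, and the method names are made up for this example and are not BioJava API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory index; names are illustrative, not BioJava API.
final class ExonIndex {

    // One exon feature: coordinates plus the IDs that link it to its parents.
    static final class Exon {
        final String chromosome;
        final long start;
        final long end;
        final String geneId;
        final String transcriptId;

        Exon(String chromosome, long start, long end, String geneId, String transcriptId) {
            this.chromosome = chromosome;
            this.start = start;
            this.end = end;
            this.geneId = geneId;
            this.transcriptId = transcriptId;
        }
    }

    // gene_id -> transcript_ids (1..n) and transcript_id -> exons (1..n).
    private final Map<String, Set<String>> transcriptsByGene = new HashMap<>();
    private final Map<String, List<Exon>> exonsByTranscript = new HashMap<>();

    // Registers one exon under both its gene and its transcript.
    void add(Exon exon) {
        transcriptsByGene
                .computeIfAbsent(exon.geneId, k -> new LinkedHashSet<>())
                .add(exon.transcriptId);
        exonsByTranscript
                .computeIfAbsent(exon.transcriptId, k -> new ArrayList<>())
                .add(exon);
    }

    // Average O(1) lookups instead of a full scan of the list.
    Set<String> transcriptsOfGene(String geneId) {
        return transcriptsByGene.getOrDefault(geneId, Collections.emptySet());
    }

    List<Exon> exonsOfTranscript(String transcriptId) {
        return exonsByTranscript.getOrDefault(transcriptId, Collections.emptyList());
    }

    public static void main(String[] args) {
        ExonIndex index = new ExonIndex();
        index.add(new Exon("chrX", 65175856L, 65175971L, "XLOC_002576", "TCONS_00004217"));
        index.add(new Exon("chrX", 128986006L, 128986088L, "XLOC_002577", "TCONS_00004218"));
        for (String transcriptId : index.transcriptsOfGene("XLOC_002576")) {
            System.out.println(transcriptId + " exons: " + index.exonsOfTranscript(transcriptId).size());
        }
    }
}
```

If the maps are filled once at load time and only read afterwards, several threads can query them without locking, provided the index is safely published to those threads. If entries are added while queries run, something like `ConcurrentHashMap` would be needed instead.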

0

Neo4J , , .. , , . , :

(gene) -> (transcript) -> (exon)

Neo4J , " XLOC_002576, , ". ( , , , , , ).

, Neo4J . ( ), , , , , hadoop HDFS.

, . ? ""?

0
