I work on a kind of DWH project (not really, but close enough). There is a problem we run into constantly, and I was wondering whether there is a better solution. It goes as follows.
We receive several large files whose records list every state each user has been in, for example:
UID | State | Date
1 | Active | 20120518
2 | Inactive | 20120517
1 | Inactive | 20120517
...
We are usually interested in the last state of each user. So far so good: with a little sorting we can get the data the way we want it. The only problem is that these files are usually large, around 20-60gb, and sorting them is sometimes a pain, since the sorting logic is usually not that simple.
As a rule, we load everything into our Oracle database and use intermediate tables and materialized views for this. However, performance sometimes bites us.
20-60gb is big, but not that big. I mean, there should be a somewhat more specialized way to handle these records, right?
I see two main ways to solve the problem:
1) Programming outside the DBMS, with scripts and compiled code. But this is probably not very flexible unless more time is spent on developing something. In addition, I might have to take care of administering the machine's resources, which I would rather not worry about.
2) Load everything into the DBMS (in our case, Oracle) and use whatever tools it provides for sorting and reshaping the data. That would be my choice, although I'm not sure whether we are using all the right tools, or using them correctly, on Oracle 10g.
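To make option 1 concrete, here is a minimal sketch of what I have in mind, not a tested solution. It assumes the pipe-delimited layout from the sample above, dates written as YYYYMMDD, and that one entry per distinct user fits in memory. The function name `latest_states` is made up for illustration. It reads the input once and keeps only the newest record per UID, so no sort is needed.

```python
import io


def latest_states(lines):
    """Return {uid: (date, state)} with the most recent record per user."""
    latest = {}
    for line in lines:
        parts = [p.strip() for p in line.split("|")]
        # Skip the header, blank lines and anything malformed.
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        uid, state, date = parts
        # Assumption: dates are YYYYMMDD, so string comparison orders them.
        # On equal dates the first record seen is kept.
        if uid not in latest or date > latest[uid][0]:
            latest[uid] = (date, state)
    return latest


# Small in-memory stand-in for one of the large files.
sample = io.StringIO(
    "UID | State | Date\n"
    "1 | Active | 20120518\n"
    "2 | Inactive | 20120517\n"
    "1 | Inactive | 20120517\n"
)
result = latest_states(sample)
for uid, (date, state) in sorted(result.items()):
    print(uid, state, date)
```

For a real file the same function can be fed an open file object instead of the `io.StringIO` sample. Memory use then grows with the number of distinct users rather than with the file size, which is the point of avoiding the sort.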
Question:
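Files of up to 60gb are large, but not huge. Is there a better, more specialized way to get the last state of each user from them?
Thanks!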