We are designing a table for ad-hoc analysis that will capture many value fields over time for the claims we receive. The table structure is essentially (pseudo-code):
table_huge (
    claim_key int not null,
    valuation_date_key int not null,
    value_1 some_number_type,
    value_2 some_number_type,
    [etc...],
    constraint pk_huge primary key (claim_key, valuation_date_key)
);
All value fields are numeric. Requirements: the table must capture at least the last 12 years (hopefully more) of claims. Each claim must have a valuation date for every month-end between the start of the claim and the current date. Typical claim volumes range from 50 thousand to 100 thousand per year.
Adding this all up, I project a table with a row count on the order of 100 million, which may grow to 500 million over the years depending on business needs. The table will be rebuilt every month. Consumers will only run selects. Apart from the monthly rebuild, no updates, inserts, or deletes will occur.
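As a rough sanity check on the order-of-100-million figure, here is a minimal Python sketch of the arithmetic. It rests on assumptions of mine that are not part of the requirements: claims arrive evenly through the year, and a claim opened k months ago contributes exactly k month-end rows. The function name `estimate_rows` is purely illustrative.

```python
def estimate_rows(claims_per_year: int, years_of_history: int) -> int:
    """Estimate the row count when every claim gets one row per month-end.

    Assumptions (illustrative only): claims arrive evenly, i.e.
    claims_per_year / 12 per month, and a claim opened k months ago
    has k month-end valuation rows up to the current date.
    """
    months = years_of_history * 12
    # Monthly cohorts contribute 1 + 2 + ... + months rows per claim,
    # which is months * (months + 1) / 2. Multiply by claims per month
    # (claims_per_year / 12) using integer math: divide by 2 * 12 = 24.
    return claims_per_year * months * (months + 1) // 24


for volume in (50_000, 100_000):
    print(f"{volume:>7,} claims/year -> {estimate_rows(volume, 12):,} rows")
```

Under these assumptions, 12 years of history gives about 43.5 million rows at 50 thousand claims per year and 87 million at 100 thousand, so the order-of-100-million estimate looks reasonable. The count grows roughly with the square of the history length, because older claims keep accumulating month-end rows.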
I am coming at this from the business (consumer) side, but I have an interest in keeping IT costs down while preserving the analytical value of this table. We are not particularly concerned about fast response times from the table, but we will occasionally need to throw a couple of dozen queries at it and get all the results back within a day or three.
For the sake of argument, assume the technology stack is, I don't know, in the 80th percentile of modern hardware.
I have the following questions:
- , , ?
- SO + 100M
, ?
- ,
- ( ?)?
, , , , .
, , - . !