I need to store telemetry data that is generated every few minutes from more than 10,000 nodes (which may increase), each of which transfers data via the Internet to a server for logging. I will also need to query this data from a web application.
I'm having trouble deciding what the best storage solution would be.
Each node has a unique identifier, and each package of variables will have a timestamp (which will probably need to be generated by the server).
Each telemetry packet contains all the variables, so conceptually the data can easily be stored in a single database table with one column per variable. The serial number plus the timestamp is sufficient as a key. Each telemetry packet is 64 bytes, including the device identifier and timestamp, which comes to roughly 100+ GB per year.
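As a sanity check on the 100+ GB figure, the estimate can be reproduced in a few lines. This is a minimal Python sketch that assumes a fixed reporting interval (the question only says "every few minutes", so 2, 3 and 5 minutes are tried) and counts raw packet bytes only, ignoring index and storage-engine overhead; the function name `raw_bytes_per_year` is hypothetical.

```python
# Back-of-envelope storage estimate for the raw telemetry rows.
# Assumption: "every few minutes" is modelled as a fixed reporting
# interval; index and storage-engine overhead are NOT included.

PACKET_BYTES = 64   # from the question: includes device id + timestamp
NODES = 10_000      # current node count (expected to grow)


def raw_bytes_per_year(nodes: int, interval_minutes: float,
                       packet_bytes: int = PACKET_BYTES) -> float:
    """Return the raw payload bytes written per year."""
    packets_per_node_per_day = 24 * 60 / interval_minutes
    return nodes * packets_per_node_per_day * 365 * packet_bytes


if __name__ == "__main__":
    for interval in (2, 3, 5):
        gb = raw_bytes_per_year(NODES, interval) / 1e9
        print(f"every {interval} min: {gb:.1f} GB/year")
```

At a 3-minute interval this comes to about 112 GB per year (about 168 GB at 2 minutes and 67 GB at 5 minutes), consistent with the 100+ GB figure; indexes and per-row overhead will add to it.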
I would like to be able to query variables over time ranges, and also store summary reports of this data so that I can draw graphs.
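The two access patterns above, a time-range query for one variable and pre-computed summaries for graphing, can be illustrated independently of the database. This minimal Python sketch assumes readings arrive as (timestamp, value) pairs for a single device and variable; `in_range` and `hourly_rollup` are hypothetical helpers, and hourly min/avg/max/count buckets are just one possible choice of summary.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def in_range(rows, start, end):
    """Return the (ts, value) rows with start <= ts < end, in time order."""
    return sorted((ts, v) for ts, v in rows if start <= ts < end)


def hourly_rollup(rows):
    """Bucket (ts, value) rows by hour -> {hour_start: (min, avg, max, count)}."""
    buckets = defaultdict(list)
    for ts, v in rows:
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(v)
    return {
        hour: (min(vs), mean(vs), max(vs), len(vs))
        for hour, vs in sorted(buckets.items())
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    # Hypothetical readings for one device, one every 20 minutes.
    readings = [(t0 + timedelta(minutes=20 * i), float(i)) for i in range(6)]
    window = in_range(readings, t0, t0 + timedelta(hours=2))
    for hour, (mn, avg, mx, n) in hourly_rollup(window).items():
        print(hour.isoformat(), mn, avg, mx, n)
```

Writing these rollups into a separate summary table (for example, one row per device, variable and hour) means a graph covering a year reads 8,760 hourly rows per device and variable instead of roughly 175,000 raw readings at a 3-minute interval.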
Now, what is the best way to handle this? I'm very familiar with MySQL, so I'm leaning toward it. If I went with MySQL, would it make sense to have a separate table for each device identifier? Would that make queries much faster, or would having 10,000 tables cause problems?
I don't think I will need to query variables from all devices at once, but I might. Or should I just put everything into one table and use a MySQL cluster if it gets really big?
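Rather than 10,000 per-device tables, a common alternative is a single table whose primary key is (device_id, ts): each device's readings are then stored next to each other in key order, and a per-device time-range query becomes one index range scan. With InnoDB, MySQL's default storage engine, which clusters rows by primary key, the same schema should behave similarly. The sketch below uses Python's built-in `sqlite3` module as a self-contained stand-in (`WITHOUT ROWID` gives a comparable clustered layout in SQLite), and the `temp` and `voltage` columns are hypothetical placeholders for the real variables.

```python
import sqlite3

# One table for all devices. The composite primary key (device_id, ts)
# doubles as the index for per-device time-range queries. WITHOUT ROWID
# makes SQLite store rows in primary-key order, similar to how InnoDB
# clusters rows by its primary key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS telemetry (
    device_id INTEGER NOT NULL,
    ts        INTEGER NOT NULL,  -- Unix epoch seconds
    temp      REAL,              -- hypothetical variable column
    voltage   REAL,              -- hypothetical variable column
    PRIMARY KEY (device_id, ts)
) WITHOUT ROWID
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def insert_rows(conn, rows):
    """rows: iterable of (device_id, ts, temp, voltage) tuples."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO telemetry VALUES (?, ?, ?, ?)", rows)


def temp_series(conn, device_id, start, end):
    """temp readings for one device with start <= ts < end, oldest first."""
    cur = conn.execute(
        "SELECT ts, temp FROM telemetry "
        "WHERE device_id = ? AND ts >= ? AND ts < ? ORDER BY ts",
        (device_id, start, end),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = open_db()
    # Two devices reporting every 180 s (hypothetical data).
    insert_rows(conn, [(d, 180 * i, 20.0 + i, 3.3)
                       for d in (1, 2) for i in range(5)])
    print(temp_series(conn, 1, 180, 720))
```

Queries that span all devices for one time range cannot use the (device_id, ts) key prefix efficiently; if they turn out to be needed, a secondary index on ts (or a separate summary table) would be the usual addition.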
Or is there a better solution? I have looked at some non-relational databases, but I don't see anything that fits the bill or looks very mature. For example, MongoDB would add a lot of storage overhead per row (document), and I don't know how efficient it would be, compared to MySQL, to query the value of one variable over a large time range. MySQL, by contrast, has been around for a long time and is reliable.
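The per-row overhead concern about MongoDB can be roughly quantified: MongoDB stores documents as BSON, and every BSON document repeats each field name as a string. This Python sketch computes the uncompressed BSON size of a flat document directly from the layout in the BSON specification and compares it with a fixed-width packed row; the field names and types are hypothetical, and the `_id` ObjectId that MongoDB adds when none is supplied is included.

```python
import struct

# Value sizes (bytes) for the BSON types used here, per the BSON spec.
BSON_VALUE_SIZE = {"double": 8, "int32": 4, "datetime": 8, "objectid": 12}


def bson_doc_size(fields):
    """Encoded size of a flat BSON document described as [(name, type), ...].

    document = int32 total length + elements + trailing 0x00 byte
    element  = 1 type byte + field name as a C string (name + 0x00) + value
    """
    size = 4 + 1
    for name, bson_type in fields:
        size += 1 + len(name.encode("utf-8")) + 1 + BSON_VALUE_SIZE[bson_type]
    return size


# Hypothetical packet: device id, timestamp and six double-precision variables.
VARIABLES = ["temperature", "humidity", "pressure", "voltage", "current", "rssi"]
FIELDS = ([("_id", "objectid"), ("device_id", "int32"), ("ts", "datetime")]
          + [(name, "double") for name in VARIABLES])

# The same data as a fixed-width row: int32 id, int64 timestamp, 6 doubles,
# little-endian with no padding.
PACKED_ROW = struct.Struct("<iq6d")

if __name__ == "__main__":
    print("BSON document:", bson_doc_size(FIELDS), "bytes")
    print("packed row:   ", PACKED_ROW.size, "bytes")
```

With these hypothetical field names the document comes to 154 bytes versus 60 for the packed row, mostly because every document repeats its field names. Shorter field names narrow the gap, and real on-disk sizes also depend on compression (MongoDB's default WiredTiger engine compresses data), indexes, and InnoDB's own per-row overhead on the MySQL side, so treat this only as a comparison of the raw encodings.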
I would also like to be able to replicate and back up the data easily.
Any ideas? If anyone has done something similar, I would be very grateful to hear about it!