Columnar storage technology is the most valuable feature of this solution.
It lets us meet the SLAs for our daily processes.
There is room for improvement in the following areas:
Restore table:
Right now, the feature only permits restoring a table into the same cluster, based on a snapshot. I would like a similar option to move data across clusters; currently, I have to UNLOAD from cluster A and then COPY into cluster B. I would like to use the snapshots already taken to bring the data into whichever cluster I need.
Maybe the current design cannot support this, because it is based on nodes and data distribution.
But our real scenario is this: if we lose data and need to recover it in another cluster, we have to:
1) Restore the table in the current cluster under a different name
2) UNLOAD the data to S3
3) COPY the data into the new cluster. When we are talking about billions of records, this is complex to do.
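The three-step workaround above can be sketched in Redshift SQL; the table name, S3 bucket, and IAM role below are placeholders, not values from my environment:

```sql
-- Step 1 (done via the console or CLI): restore the table from a snapshot
-- under a new name, e.g. my_table_restored, in the source cluster.

-- Step 2: export the restored copy to S3 (bucket and IAM role are placeholders)
UNLOAD ('SELECT * FROM my_table_restored')
TO 's3://my-bucket/restore/my_table_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
GZIP;

-- Step 3: on the target cluster, load the files back in
COPY my_table
FROM 's3://my-bucket/restore/my_table_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
GZIP;
```

With billions of records, the UNLOAD and COPY steps alone can take hours, which is why a direct cross-cluster restore would be so valuable.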
Vacuum process:
The vacuum needs to be segmented. For example, after 24 hours of execution I had to cancel the process, and 0% of a big table was sorted.
For big tables (billions of records), if the table is 100% unsorted, the vacuum can take more than 24 hours. If we don't have that timeframe, we have to work around it by moving data out to additional tables and running the vacuum in batches on the main table.
Why? Because if I run the vacuum directly on the main table and stop it after five hours, zero records will have been sorted. I would like to run the vacuum on the main table, stop it whenever I need to, and still keep whatever was already vacuumed, like an incremental process.
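A minimal sketch of that batching workaround, assuming a hypothetical `events` table with an `event_date` column (all names and dates here are illustrative):

```sql
-- Move one slice of the big table out to a side table
CREATE TABLE events_batch AS
SELECT * FROM events WHERE event_date < '2018-04-01';

DELETE FROM events WHERE event_date < '2018-04-01';

-- The vacuum now works on a smaller unsorted region,
-- so it can finish inside the maintenance window
VACUUM FULL events;

-- Reinsert the slice; the reinserted rows land in the unsorted
-- region and get sorted on the next pass, batch by batch
INSERT INTO events SELECT * FROM events_batch;
DROP TABLE events_batch;
```

This works, but it is exactly the kind of manual segmentation I would prefer the vacuum process to do for me incrementally.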
I have used this solution for around three years.
We did encounter stability issues, e.g., if you use more than 25 nodes (ds2.xlarge), the cluster is totally unstable.
I have not experienced any scalability issues.
I would rate the technical support a 9/10 for normal issues.
However, for advanced issues, I would give it a 5/10, since I had to work directly with the AWS engineers.
Initially, we were using the Microsoft SQL solution. We decided to move over to this product due to the DWH volume and performance.
In my opinion, the setup was normal.
Based on the quality of the product and its price, it is one of the best options available on the market right now.
We also looked at the Oracle solution.
You need to make sure that the space used in the DWH stays at a maximum of 50% of the total capacity.
You must create processes to vacuum and analyze tables frequently. Also, before creating tables, you should choose the right encoding, DISTKEY, and sort keys.
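As an illustration of that last point, here is a hypothetical table definition that picks these up front, plus the routine maintenance commands; the table, columns, and key choices are examples, not a template for every workload:

```sql
-- Hypothetical sales table: compression, distribution, and sort keys
-- are all chosen at creation time
CREATE TABLE sales (
    sale_id     BIGINT        ENCODE zstd,
    customer_id BIGINT        ENCODE zstd,
    sale_date   DATE          ENCODE raw,   -- leading sort key is often left raw
    amount      DECIMAL(12,2) ENCODE zstd
)
DISTKEY (customer_id)    -- collocates rows that are joined on customer_id
SORTKEY (sale_date);     -- range filters on date scan fewer blocks

-- Routine maintenance, run frequently on a schedule
VACUUM sales;
ANALYZE sales;
```

Choosing these at creation time matters because changing distribution or sort keys later effectively means rebuilding the table.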