What are the cluster capacity, performance, and concurrency expectations?
When users run queries in Amazon Redshift, the queries are routed to query queues. Each query queue contains several query slots. Each queue is allocated a portion of the cluster's available memory. A queue's memory is divided among the queue's query slots.
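The memory split described above can be sketched as a small calculation. The cluster size, queue share, and slot count below are illustrative, not our actual WLM configuration:

```python
# Illustration of how WLM divides cluster memory: each queue is allocated a
# share of total memory, and that share is split evenly among the queue's
# query slots. The numbers used in the example are hypothetical.

def memory_per_slot(total_memory_gb, queue_memory_percent, slot_count):
    """Memory (GB) available to a single query slot in one queue."""
    queue_memory_gb = total_memory_gb * queue_memory_percent / 100
    return queue_memory_gb / slot_count

# Example: a 244 GB cluster, a queue holding 40% of memory with 5 slots.
print(memory_per_slot(244, 40, 5))  # 19.52 GB per slot
```

A query that needs more memory than its slot provides spills to disk, which is why queue and slot sizing matters for performance.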
We use Amazon Redshift Dense Compute (DC2) nodes. Each node comes with 160 GB of storage.
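Given the 160 GB per-node figure above, total cluster storage is a straight multiple of node count. The node count in the example is hypothetical:

```python
# Rough capacity math for a DC2 cluster: each node contributes 160 GB of
# local storage (per the note above). The 8-node example is illustrative.

NODE_STORAGE_GB = 160

def cluster_storage_gb(node_count):
    """Total storage across the cluster."""
    return node_count * NODE_STORAGE_GB

print(cluster_storage_gb(8))  # 1280 GB for an 8-node cluster
```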
What is the ETL strategy? Is it a full or incremental load?
In Analytics, when we set up a customer for the first time, we do a full sync and copy over all master and transactional data from the CORE database to date. We support the current year minus three years of data. After the full sync completes, each ETL cycle picks up the changes made since the last successful sync and processes them in the current cycle.
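The two sync modes above can be sketched as follows. The in-memory rows, timestamps, and function names are illustrative assumptions, not our actual ETL code:

```python
# Sketch of the two sync modes: a first-time full sync copies everything from
# the retention cutoff (current year minus three) onward; later cycles pick
# up only changes since the last successful sync. Rows are hypothetical.
from datetime import date, datetime

def retention_cutoff(today):
    """First day of (current year - 3); older data is not synced."""
    return date(today.year - 3, 1, 1)

def rows_to_sync(source_rows, last_successful_sync, today):
    cutoff = retention_cutoff(today)
    if last_successful_sync is None:      # first-time full sync
        return [r for r in source_rows if r["modified"].date() >= cutoff]
    # incremental cycle: only changes since the last successful sync
    return [r for r in source_rows if r["modified"] > last_successful_sync]

rows = [
    {"id": 1, "modified": datetime(2019, 5, 1)},   # outside retention
    {"id": 2, "modified": datetime(2024, 3, 2)},
]
print(len(rows_to_sync(rows, None, date(2025, 6, 1))))                  # 1
print(len(rows_to_sync(rows, datetime(2024, 1, 1), date(2025, 6, 1)))) # 1
```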
How does it handle upserts, deletes, lost data recovery, etc.?
We don't maintain versions in Redshift; changes are always applied as a delete followed by an insert. Hard deletes are possible.
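The delete-and-insert pattern can be modeled on an in-memory table keyed by primary key. This is a minimal sketch, not the actual Redshift implementation:

```python
# Minimal sketch of the delete-and-insert pattern described above. Changed
# rows are first deleted from the target and then re-inserted; a row flagged
# as deleted at the source is simply removed (hard delete).

def apply_changes(target, changes):
    """target: dict pk -> row; changes: list of (pk, row or None)."""
    for pk, row in changes:
        target.pop(pk, None)      # delete the existing version, if any
        if row is not None:       # None marks a hard delete at the source
            target[pk] = row      # re-insert the changed row
    return target

table = {1: {"name": "old"}, 2: {"name": "gone"}}
apply_changes(table, [(1, {"name": "new"}), (2, None), (3, {"name": "added"})])
print(table)  # {1: {'name': 'new'}, 3: {'name': 'added'}}
```

Because there is no versioning, an upsert and a plain update look identical to the target: the old row disappears and the new one takes its place.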
What is stored in fact tables vs. dimension tables?
We store measure values in fact tables and master attributes in dimension tables. Each fact table links to its dimensions through WIDs; a WID is the primary key of the corresponding master table, which establishes the parent-child relationship.
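The fact-to-dimension link can be sketched as a lookup from each fact row's WID to the matching dimension row. The table contents and function name below are hypothetical:

```python
# Sketch of the WID link described above: each fact row carries a WID that
# matches the primary key (WID) of a dimension (master) row.

def join_fact_to_dim(fact_rows, dim_rows):
    """Resolve each fact row's WID against the dimension (parent) table."""
    dim_by_wid = {d["wid"]: d for d in dim_rows}
    return [{**f, "dim": dim_by_wid[f["wid"]]} for f in fact_rows]

dims = [{"wid": 10, "name": "Service A"}]
facts = [{"wid": 10, "amount": 250.0}]
print(join_fact_to_dim(facts, dims)[0]["dim"]["name"])  # Service A
```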
What is the data growth or purging strategy? What is the current capacity and how can we scale?
As of today, there is no purging strategy. This will change going forward, and we will start maintaining only the current year minus three years of data in Redshift.
We will scale out whenever storage utilization reaches 60% of cluster capacity or query performance degrades.
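The storage side of that rule can be sketched as a simple threshold check. The 160 GB per-node figure comes from the DC2 note earlier; the node count and usage numbers are illustrative:

```python
# Sketch of the scale-out rule above: flag the cluster for resize once used
# storage crosses 60% of total capacity. Usage figures are hypothetical.

NODE_STORAGE_GB = 160
SCALE_THRESHOLD = 0.60

def needs_scaling(used_gb, node_count):
    capacity_gb = node_count * NODE_STORAGE_GB
    return used_gb / capacity_gb >= SCALE_THRESHOLD

print(needs_scaling(700, 8))  # 700/1280 ~ 55%  -> False
print(needs_scaling(800, 8))  # 800/1280 ~ 62.5% -> True
```

The query-performance trigger is a manual judgment call and is not captured in this check.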
What are the sync cycles? Are they on-demand syncs or regular syncs?
There is no on-demand sync. The ETL is deployed as a service that runs continuously, except for a daily three-hour maintenance window scheduled during non-production hours. During the maintenance window we rewrite the CTs from scratch each time, and the data then needs to be sorted, distributed, and vacuumed for better query performance.
Once we start writing our CTs incrementally, the need for a maintenance window will go away. Whenever additional data needs to be synced for an existing customer, we use a batch mechanism, which ensures that the regular refresh is not delayed and that the additional data gets synced in increments.
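The batch mechanism can be sketched as splitting a backfill into fixed-size increments, one per cycle. The batch size and record list are illustrative assumptions:

```python
# Sketch of the batch mechanism described above: extra historical data for an
# existing customer is split into fixed-size batches so the regular refresh
# cycle is never blocked by one large copy. Sizes here are hypothetical.

def make_batches(records, batch_size):
    """Split a backfill into increments that are synced one per cycle."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]

backfill = list(range(10))        # stand-in for additional rows to sync
batches = make_batches(backfill, 4)
print([len(b) for b in batches])  # [4, 4, 2]
```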
Do we allow any writes beyond Zenoti data into the cluster?
We don't support this.