The versatility of Snowflake has changed how data landscapes are architected.
Now all of an organization's data can live in Snowflake's highly flexible data stores, no matter how many sources feed it. For this reason, recent solution architectures port all data into Snowflake: data is onboarded into a landing zone, centralized into a Snowflake Data Lake, and then graduated to further analytical maturity in Data Marts or Warehouses. This graduated architecture is part of many best-practice deployments.
The flexibility of Snowflake’s DevOps makes this process straightforward to implement. However, the elegance of this solution is disrupted when marketing and sales organizations need to consolidate duplicate customer records landing in the Snowflake Data Lake. The traditional remedy is to ship data out of Snowflake to another party’s compute platform. Once the duplicate data has been identified, the vendor then has to port the data back to you in order to make it useful to the rest of the organization. This is not just inconvenient; it also carries significant data security risks.
Truelty allows the Snowflake architecture to stay intact by keeping all code processing within the four walls of the customer’s Snowflake instance. The Truelty application communicates only at a control layer and has no access to actual customer data. When the code is ready to execute, it runs entirely within the customer’s Snowflake instance. This powerful approach lets clients stay within the analytical applications they are accustomed to using, without requiring integration with a third-party data transfer.
Your value as a Systems Integrator lies in developing a solution that is both gracefully extensible and future-proof. Snowflake has done a great job of handling the future-proofing, but when it comes to Identity Resolution, past market approaches have been a logistics nightmare. Truelty resolves this by generating cluster keys for unique identities right within Snowflake. The process leverages Truelty’s asynchronous deep chain computation, built specifically for Snowflake.
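Truelty's actual computation is proprietary, but the core idea of a cluster key — one shared identifier assigned to every record resolved to the same identity — can be sketched with a simple union-find over pairwise matches. This is an illustrative Python sketch only; the field names and the matching rule (shared email or phone) are hypothetical, and Truelty's production logic runs as generated code inside Snowflake rather than in application memory.

```python
# Illustrative sketch: assigning a shared cluster key to matched records.
# Matching rule (shared email or phone) and field names are hypothetical.
from collections import defaultdict

def cluster_keys(records):
    """Group records that share an email or phone; return record id -> cluster key."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Union any two records that share an identifying value.
    by_value = defaultdict(list)
    for r in records:
        for field in ("email", "phone"):
            if r.get(field):
                by_value[(field, r[field])].append(r["id"])
    for ids in by_value.values():
        for other in ids[1:]:
            union(ids[0], other)

    # The cluster key is the root id of each record's group.
    return {r["id"]: find(r["id"]) for r in records}

records = [
    {"id": 1, "email": "a@x.com", "phone": "555-0100"},
    {"id": 2, "email": "a@x.com", "phone": None},        # same email as 1
    {"id": 3, "email": "b@x.com", "phone": "555-0100"},  # same phone as 1
    {"id": 4, "email": "c@x.com", "phone": "555-0199"},  # unrelated
]
keys = cluster_keys(records)
```

Records 1, 2, and 3 end up chained into one cluster even though 2 and 3 share no field directly — this transitive chaining is what makes the computation "deep" and expensive at scale.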
What is required to get started?
Snowflake instance: Running Enterprise or higher editions.
Load data: Data from enterprise applications, 3rd-party data, cloud applications, etc., loaded into Snowflake.
Run the set-up: A Python application and Snowflake permissions. All pre-scripted; takes about an hour. This creates a secure area for Truelty to process in and avoids any conflicts with existing databases or schemas.
Landing zone or view: Load the data into the Truelty landing zone, or point Truelty to a view. This is your Snowflake instance; load new or changed data at any interval you choose.
Audit auto-mapped columns in the Control plane: Truelty runs native Snowflake profiling on the columns to match them to Truelty’s Semantic Categories. Here you can audit the output, validate any required changes, and flag columns in meaningful ways.
Schedule the resolution process to run: The control plane creates a table in Snowflake which the Truelty Python application uses to auto-execute the ID resolution code.
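The control-table pattern in the final step can be sketched in Python: the control plane writes rows describing scheduled jobs, and the application running in the customer's account polls that table and executes the referenced code. The table layout, column names, and procedure names below are illustrative assumptions, not Truelty's actual schema; in production the table lives in Snowflake and execution would call generated SQL through the Snowflake connector.

```python
# Illustrative sketch of a control-table-driven runner (hypothetical schema).
# In production the control table is a Snowflake table and "execute" would
# issue SQL via the Snowflake connector; here it is an in-memory list and
# execution just records which procedures were called.
from datetime import datetime, timezone

control_table = [
    # Each row: job name, procedure to call, when it is next due, and status.
    {"job": "id_resolution", "proc": "RUN_ID_RESOLUTION",
     "next_run": datetime(2024, 1, 1, tzinfo=timezone.utc), "status": "PENDING"},
    {"job": "profiling", "proc": "RUN_PROFILING",
     "next_run": datetime(2099, 1, 1, tzinfo=timezone.utc), "status": "PENDING"},
]

def poll_and_run(table, now, execute):
    """Run every due PENDING job, mark it DONE, and return the job names run."""
    ran = []
    for row in table:
        if row["status"] == "PENDING" and row["next_run"] <= now:
            execute(row["proc"])  # stand-in for e.g. CALL RUN_ID_RESOLUTION()
            row["status"] = "DONE"
            ran.append(row["job"])
    return ran

executed = []
ran = poll_and_run(control_table,
                   datetime(2024, 6, 1, tzinfo=timezone.utc),
                   executed.append)
```

Because the control plane only writes rows into this table while the runner and the data never leave the customer's account, the scheduling metadata is the only thing that crosses the boundary — consistent with the control-layer separation described above.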