Potential Future Options

Feature | Description | Microsoft Azure Databricks | Snowflake | Azure Synapse SQL Pools | Redshift |
--- | --- | --- | --- | --- | --- |
Billing | |||||
Compute | How compute billing works in each platform. | Pay-as-you-go and pre-purchased capacity models; pricing is based on the compute resources (Databricks Units, DBUs) consumed by the workspace. | Pay-as-you-go (minimum 60 seconds, then per-second increments) | Hourly for the serverless (non-dedicated) SQL pool. Dedicated SQL pools, which are required for T-SQL workloads, can also be billed hourly, but the typical deployment is an always-on model (similar to IaaS), so reserved instances (RIs) usually make more sense. | Hourly or monthly, depending on whether On-Demand or Reserved Instances are used |
Auto-suspend / Auto-Resume for Compute | Ability to pause and resume compute billing automatically through the platform and so save cost (illustrated below). | Yes | Yes | No. Must be controlled manually via the GUI or API; while paused, the pool is not reachable by downstream systems. | No. Manually controlled; the cluster is not reachable by downstream systems while paused. |
Concurrency isolation | Ability for multiple business users to run many queries concurrently without affecting each other or introducing a performance impact. | Yes – via workspace-level permissions, cluster access controls, job scheduling and job isolation | Yes – via separate virtual warehouses for different workloads | Yes – dedicated SQL pool; requires ongoing administration to monitor capacity limits | Yes – via a clustered approach to deployment |
Downtime during Scaling | Whether live business users are interrupted when resources are scaled up or down (i.e. is there an outage). | Yes – scaling is recommended during off-peak hours | No – seamless scaling | Yes – scaling drops all connections and rolls back in-flight transactions | No – scaling within defined parameters is automatic, but there is a lag before new capacity is available |
Administration (Manual effort intensive) | |||||
Partition strategy | Data distribution strategy across compute nodes (a maintenance overhead; avoiding it lowers TCO). The dedicated SQL pool approach is illustrated below. | Yes | Not required | Yes | Not required |
Index Maintenance | Creation and maintenance of indexes for performance tuning (a maintenance overhead; avoiding it lowers TCO). | Yes | Not required | Yes | Not required |
Materialized View Maintenance | Periodic refresh of materialized views (a maintenance overhead; avoiding it lowers TCO). | Yes | Not required | Yes | Yes |
Statistics Collection | Refresh of table statistics (a maintenance overhead; avoiding it lowers TCO). | Yes | Not required | Yes | Yes, can be automated |
Working with data | |||||
Structured data support | Support for basic tabular data with rows and columns, e.g. a traditional relational database. | Yes | Yes | Yes | Yes |
Semi-structured data support | Support for semi-structured data such as JSON or XML (illustrated below). | No | Yes | No – additional services required | Yes |
Unstructured data support | Support for complex, unstructured data, e.g. data from our buildings providing sustainability and occupancy statistics. | No | Yes | No – additional services required | No – additional services required |
Enterprise Readiness | |||||
Secure Data Sharing | Secure sharing of data with external business partners, e.g. auditors and major lease clients such as EY. | Yes – via access controls, encryption, data masking, VNet peering and private endpoints | Yes – built in (sharing is multi-cloud and multi-region, with zero-copy sharing within the same cloud region) | Yes – Azure Data Share | Yes |
Data Cloning | The ability to easily replicate production data so the business can perform scenario modelling and analysis (illustrated below). | Yes | Yes – zero-copy clone | No | No |
Time Travel | The ability to roll data back to a previous version in seconds (illustrated below). | Yes – via Delta Lake time travel, data versioning, backup and restore | Yes – up to 90 days | No | No |
Connectors & Drivers | Supported programming languages and clients (consider removal) | Python, SQL, Scala, R | Spark, Python, .NET, Kafka, ODBC, JDBC, PHP, Go and Node.js | Spark, Python, .NET, Spark SQL, ADO.NET, ODBC, PHP, and JDBC | Spark, Python, .NET, Kafka, ODBC, JDBC, PHP, Go and Node.js |
Language Support | Supported languages for in-database processing (consider removal) | Python, SQL, Scala, R | SQL, Scala/Java (DataFrames), with Python planned | SQL | SQL |
Geospatial Data Support | Native support for geospatial data such as location-based data from our field devices. | Yes – using GeoPandas and Magellan | Yes | No | Yes |
Security | |||||
Azure AD integration | Support for Federated login, Active Directory Users and Groups | Supported | Supported | Supported | Supported |
OAuth / MFA | Support for OAuth-based authentication and multi-factor authentication (MFA). | Yes | Yes | Yes | Yes |
Role Based Access Control | Ability to create a role that has access to a defined set of tables / data. | Yes | Yes | Yes | Yes |
Data Encryption | Ability to store data in an encrypted format | Yes – Default | Yes – Default | Yes – Default | Yes – Default |
Data Masking | Ability to mask/hide sensitive information such as credit card details or personally identifiable information (illustrated below). | Yes | Yes | Yes | Yes |
Data Compression | Ability to store data in a compressed format to reduce costs. | Yes – through Snappy / Gzip compression | Yes – by default | Yes – through compression codecs / file formats | Yes – through compression codecs / file formats |
Data Access Audit | Ability to look up who has historically accessed specific data within the platform | Yes | Yes | Yes | Yes |
Built in Auto Classification & Anonymisation | Ability to automatically classify and tag sensitive data attributes such as personally identifiable information and subsequently automatically anonymise data. | No – requires additional tooling | Yes – currently in private preview | No – requires additional tooling | No – requires additional tooling |
Object Tagging | Ability to add custom tags to our data, such as cost centre, or to tag sensitive data attributes (illustrated below). | Yes | Yes | No | No |
Deployment Model | How the service is consumed by us. | Software as a Service (SaaS) | Software as a Service (SaaS) | Platform as a Service (PaaS) | Platform as a Service (PaaS) |
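
The short sketches below illustrate some of the rows above. They are indicative only: all object names (warehouses, databases, tables, roles and tags) are hypothetical, and syntax should be validated against current vendor documentation.

Auto-suspend / auto-resume: a minimal Snowflake sketch showing how a virtual warehouse can be configured to stop billing when idle and resume transparently on the next query.

```sql
-- Hypothetical warehouse name; size and timeout values are illustrative only.
CREATE WAREHOUSE IF NOT EXISTS BI_WH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND   = 60      -- seconds of inactivity before compute (and billing) stops
  AUTO_RESUME    = TRUE;   -- the next query transparently restarts the warehouse

-- The same properties can be changed on an existing warehouse.
ALTER WAREHOUSE BI_WH SET AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;
```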
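
Administration overhead on an Azure Synapse dedicated SQL pool: an illustrative T-SQL sketch (hypothetical table and column names) of the distribution, index and statistics decisions the team would own and maintain.

```sql
-- Distribution strategy and index type are chosen at table creation time
-- and remain the team's responsibility thereafter.
CREATE TABLE dbo.FactLeaseEvents
(
    LeaseEventId BIGINT        NOT NULL,
    BuildingId   BIGINT        NOT NULL,
    EventAmount  DECIMAL(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(BuildingId),   -- data distribution across compute nodes
    CLUSTERED COLUMNSTORE INDEX        -- index choice to be monitored over time
);

-- Statistics and indexes are refreshed manually or via scheduled jobs.
UPDATE STATISTICS dbo.FactLeaseEvents;
ALTER INDEX ALL ON dbo.FactLeaseEvents REBUILD;
```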
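
Semi-structured data support: a Snowflake sketch (hypothetical table and JSON attributes) showing raw JSON landed in a VARIANT column and queried with path notation, without a predefined schema or any additional service.

```sql
-- Land raw building-sensor payloads as-is.
CREATE TABLE raw_building_events (payload VARIANT);

-- Attributes are addressed by path and cast to SQL types at query time.
SELECT
    payload:building:id::STRING    AS building_id,
    payload:occupancy::NUMBER      AS occupancy,
    payload:recorded_at::TIMESTAMP AS recorded_at
FROM raw_building_events;
```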
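
Data cloning: a Snowflake zero-copy clone sketch (hypothetical database and table names). No data is physically copied until either side modifies it, so a full environment can be cloned for scenario modelling in seconds.

```sql
-- Clone an entire database for a what-if exercise.
CREATE DATABASE ANALYTICS_SANDBOX CLONE ANALYTICS_PROD;

-- Clones can also be taken at schema or table level.
CREATE TABLE leases_whatif CLONE leases;
```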
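
Time travel: sketches of the two flavours referenced above, Snowflake time travel and Delta Lake time travel on Databricks (table names, offsets and versions are hypothetical).

```sql
-- Snowflake: query a table as it looked one hour ago, or recover a dropped table
-- within the retention window.
SELECT * FROM leases AT (OFFSET => -3600);
UNDROP TABLE leases;

-- Databricks / Delta Lake (Spark SQL): query an earlier version of a Delta table.
SELECT * FROM leases VERSION AS OF 12;
SELECT * FROM leases TIMESTAMP AS OF '2021-06-01';
```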
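
Data masking: illustrative column-level masking in Snowflake and in a Synapse dedicated SQL pool (role, table and column names are hypothetical).

```sql
-- Snowflake: a masking policy that reveals email addresses only to a privileged role.
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('PII_READER') THEN val ELSE '***MASKED***' END;

ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask;

-- Synapse dedicated SQL pool: built-in dynamic data masking on a column.
ALTER TABLE dbo.Customers
  ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');
```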
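
Object tagging: a Snowflake sketch (hypothetical tag name and values) attaching a cost-centre tag at table and column level.

```sql
-- Define the tag and the values it may take.
CREATE TAG cost_centre ALLOWED_VALUES 'finance', 'property', 'sustainability';

-- Attach the tag to a table and to an individual column.
ALTER TABLE leases SET TAG cost_centre = 'property';
ALTER TABLE customers MODIFY COLUMN email SET TAG cost_centre = 'finance';
```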