Potential Future Options

Feature | Description | Microsoft Azure Databricks | Snowflake | Azure Synapse SQL Pools | Redshift |
--- | --- | --- | --- | --- | --- |
Billing | |||||
Compute | How compute billing works in each platform. | Pay-as-you-go and pre-purchased capacity models; pricing is based on the compute resources (Databricks Units, DBUs) consumed by the workspace. | Pay-as-you-go (minimum 60 seconds, then per-second increments) | Hourly for the serverless (non-dedicated) SQL pool. Dedicated SQL pools, which are required for T-SQL workloads, can also be billed hourly, but the typical deployment is an always-on model (similar to IaaS), so reserved instances (RIs) usually make more sense. | Hourly or monthly, depending on whether On-Demand or Reserved Instances are used |
Auto-suspend / Auto-Resume for Compute | Ability to pause and resume compute billing automatically through the platform and so save cost (illustrated below). | Yes | Yes | No. Must be controlled manually via the GUI or API; while paused, the pool is not reachable by downstream systems. | No. Manually controlled; the cluster is not reachable by downstream systems while paused. |
Concurrency isolation | Ability for multiple business users to run many queries concurrently without affecting each other or introducing a performance impact. | Yes – via workspace-level permissions, cluster access controls, job scheduling and job isolation | Yes – via separate virtual warehouses for different workloads | Yes – dedicated SQL pool; requires ongoing administration to monitor capacity limits | Yes – via a clustered approach to deployment |
Downtime during Scaling | Whether live business users are interrupted when resources are scaled up or down (i.e. is there an outage). | Yes – scaling is recommended during off-peak hours | No – seamless scaling | Yes – scaling drops all connections and rolls back in-flight transactions | No – scaling within defined parameters is automatic, but there is a lag before new capacity is available |
Administration (Manual effort intensive) | |||||
Partition strategy | Data distribution strategy across compute nodes (a maintenance overhead; avoiding it lowers TCO). The dedicated SQL pool approach is illustrated below. | Yes | Not required | Yes | Not required |
Index Maintenance | Creation and maintenance of indexes for performance tuning (a maintenance overhead; avoiding it lowers TCO). | Yes | Not required | Yes | Not required |
Materialized View Maintenance | Periodic refresh of materialized views (a maintenance overhead; avoiding it lowers TCO). | Yes | Not required | Yes | Yes |
Statistics Collection | Refresh of table statistics (a maintenance overhead; avoiding it lowers TCO). | Yes | Not required | Yes | Yes, can be automated |
Working with data | |||||
Structured data support | Support for basic tabular data with rows and columns, e.g. a traditional relational database. | Yes | Yes | Yes | Yes |
Semi-structured data support | Support for semi-structured data such as JSON or XML (illustrated below). | No | Yes | No – additional services required | Yes |
Unstructured data support | Support for complex, unstructured data, e.g. data from our buildings providing sustainability and occupancy statistics. | No | Yes | No – additional services required | No – additional services required |
Enterprise Readiness | |||||
Secure Data Sharing | Secure sharing of data with external business partners, e.g. auditors and major lease clients such as EY. | Yes – via access controls, encryption, data masking, VNet peering and private endpoints | Yes – built in (sharing is multi-cloud and multi-region, with zero-copy sharing within the same cloud region) | Yes – Azure Data Share | Yes |
Data Cloning | The ability to easily replicate production data so the business can perform scenario modelling and analysis (illustrated below). | Yes | Yes – zero-copy clone | No | No |
Time Travel | The ability to roll data back to a previous version in seconds (illustrated below). | Yes – via Delta Lake time travel, data versioning, backup and restore | Yes – up to 90 days | No | No |
Connectors & Drivers | Supported programming languages and clients (consider removal) | Python, SQL, Scala, R | Spark, Python, .NET, Kafka, ODBC, JDBC, PHP, Go and Node.js | Spark, Python, .NET, Spark SQL, ADO.NET, ODBC, PHP, and JDBC | Spark, Python, .NET, Kafka, ODBC, JDBC, PHP, Go and Node.js |
Language Support | Supported languages for in-database processing (consider removal) | Python, SQL, Scala, R | SQL, Scala/Java (DataFrames), with Python planned | SQL | SQL |
Geospatial Data Support | Native support for geospatial data such as location-based data from our field devices. | Yes – using GeoPandas and Magellan | Yes | No | Yes |
Security | |||||
Azure AD integration | Support for Federated login, Active Directory Users and Groups | Supported | Supported | Supported | Supported |
OAuth / MFA | Support for OAuth-based authentication and multi-factor authentication (MFA). | Yes | Yes | Yes | Yes |
Role Based Access Control | Ability to create a role that has access to a defined set of tables / data. | Yes | Yes | Yes | Yes |
Data Encryption | Ability to store data in an encrypted format | Yes – Default | Yes – Default | Yes – Default | Yes – Default |
Data Masking | Ability to mask/hide sensitive information such as credit card details or personally identifiable information (illustrated below). | Yes | Yes | Yes | Yes |
Data Compression | Ability to store data in a compressed format to reduce costs. | Yes – through Snappy / Gzip compression | Yes – by default | Yes – through compression codecs / file formats | Yes – through compression codecs / file formats |
Data Access Audit | Ability to look up who has historically accessed specific data within the platform | Yes | Yes | Yes | Yes |
Built in Auto Classification & Anonymisation | Ability to automatically classify and tag sensitive data attributes such as personally identifiable information and subsequently automatically anonymise data. | No – requires additional tooling | Yes – currently in private preview | No – requires additional tooling | No – requires additional tooling |
Object Tagging | Ability to add custom tags to our data, such as cost centre, or to tag sensitive data attributes (illustrated below). | Yes | Yes | No | No |
Deployment Model | How the service is consumed by us. | Software as a Service (SaaS) | Software as a Service (SaaS) | Platform as a Service (PaaS) | Platform as a Service (PaaS) |
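
The short sketches below illustrate some of the rows above. They are indicative only: all object names (warehouses, databases, tables, roles and tags) are hypothetical, and syntax should be validated against current vendor documentation.

Auto-suspend / auto-resume: a minimal Snowflake sketch showing how a virtual warehouse can be configured to stop billing when idle and resume transparently on the next query.

```sql
-- Hypothetical warehouse name; size and timeout values are illustrative only.
CREATE WAREHOUSE IF NOT EXISTS BI_WH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND   = 60      -- seconds of inactivity before compute (and billing) stops
  AUTO_RESUME    = TRUE;   -- the next query transparently restarts the warehouse

-- The same properties can be changed on an existing warehouse.
ALTER WAREHOUSE BI_WH SET AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;
```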
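
Administration overhead on an Azure Synapse dedicated SQL pool: an illustrative T-SQL sketch (hypothetical table and column names) of the distribution, index and statistics decisions the team would own and maintain.

```sql
-- Distribution strategy and index type are chosen at table creation time
-- and remain the team's responsibility thereafter.
CREATE TABLE dbo.FactLeaseEvents
(
    LeaseEventId BIGINT        NOT NULL,
    BuildingId   BIGINT        NOT NULL,
    EventAmount  DECIMAL(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(BuildingId),   -- data distribution across compute nodes
    CLUSTERED COLUMNSTORE INDEX        -- index choice to be monitored over time
);

-- Statistics and indexes are refreshed manually or via scheduled jobs.
UPDATE STATISTICS dbo.FactLeaseEvents;
ALTER INDEX ALL ON dbo.FactLeaseEvents REBUILD;
```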
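
Semi-structured data support: a Snowflake sketch (hypothetical table and JSON attributes) showing raw JSON landed in a VARIANT column and queried with path notation, without a predefined schema or any additional service.

```sql
-- Land raw building-sensor payloads as-is.
CREATE TABLE raw_building_events (payload VARIANT);

-- Attributes are addressed by path and cast to SQL types at query time.
SELECT
    payload:building:id::STRING    AS building_id,
    payload:occupancy::NUMBER      AS occupancy,
    payload:recorded_at::TIMESTAMP AS recorded_at
FROM raw_building_events;
```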
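
Data cloning: a Snowflake zero-copy clone sketch (hypothetical database and table names). No data is physically copied until either side modifies it, so a full environment can be cloned for scenario modelling in seconds.

```sql
-- Clone an entire database for a what-if exercise.
CREATE DATABASE ANALYTICS_SANDBOX CLONE ANALYTICS_PROD;

-- Clones can also be taken at schema or table level.
CREATE TABLE leases_whatif CLONE leases;
```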
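
Time travel: sketches of the two flavours referenced above, Snowflake time travel and Delta Lake time travel on Databricks (table names, offsets and versions are hypothetical).

```sql
-- Snowflake: query a table as it looked one hour ago, or recover a dropped table
-- within the retention window.
SELECT * FROM leases AT (OFFSET => -3600);
UNDROP TABLE leases;

-- Databricks / Delta Lake (Spark SQL): query an earlier version of a Delta table.
SELECT * FROM leases VERSION AS OF 12;
SELECT * FROM leases TIMESTAMP AS OF '2021-06-01';
```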
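
Data masking: illustrative column-level masking in Snowflake and in a Synapse dedicated SQL pool (role, table and column names are hypothetical).

```sql
-- Snowflake: a masking policy that reveals email addresses only to a privileged role.
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('PII_READER') THEN val ELSE '***MASKED***' END;

ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask;

-- Synapse dedicated SQL pool: built-in dynamic data masking on a column.
ALTER TABLE dbo.Customers
  ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');
```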
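
Object tagging: a Snowflake sketch (hypothetical tag name and values) attaching a cost-centre tag at table and column level.

```sql
-- Define the tag and the values it may take.
CREATE TAG cost_centre ALLOWED_VALUES 'finance', 'property', 'sustainability';

-- Attach the tag to a table and to an individual column.
ALTER TABLE leases SET TAG cost_centre = 'property';
ALTER TABLE customers MODIFY COLUMN email SET TAG cost_centre = 'finance';
```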