Mission critical data within an organization is often decentralized, siloed, and inaccessible to those individuals charged with making strategic business decisions. Getting at data that is locked in transactional systems, or siloed in business units, can have a negative impact on the day-to-day operations of serving customers. The data warehouse was developed as the best way to aggregate your most mission-critical data, to make it actionable so that you can make better business decisions.
Cloud-based vendors such as Amazon and Azure have increasingly matured and are now offering more advanced data warehouse services. If you’re looking to start your data warehouse, or if you're looking to move your existing data warehouse to a cloud-based vendor, there are many options to choose from. We will look at four vendors who can provide the features you need for an effective data warehouse solution.
Snowflake®, www.snowflake.net, is a fully-relational SQL data warehouse that provides support for both structured and semi-structured data (JSON, Avro, XML), and implements comprehensive support for the SQL language. It allows corporate users to store and analyze data using cloud-based hardware and software. Data is stored in Amazon S3 does not rely on Hadoop. Snowflake Computing was founded in 2012 by Benoit Dageville, Thierry Cruanes and Marcin Zukowski and is based in San Mateo, California.
The company states that no administration is required and is delivered as a turn-key cloud service. It offers support for ETL and BI tools enabling developers to build modern data applications. Snowflake’s design physically separates but logically integrates storage, compute and services such as security and metadata. They call it multi-cluster, shared data and it consists of three components:
- Storage: the persistent storage layer for data stored in Snowflake;
- Compute: a collection of independent compute resources that execute data processing tasks required for queries;
- Services: a collection of system services that handle infrastructure, security, metadata, and optimization across the entire Snowflake system
The storage layer is built on Amazon S3 cloud storage, it is engineered to scale independent of compute resources. As a result, Snowflake delivers such capabilities as processing data loading or unloading without impacting running queries.
Some of Snowflake customers include CapitalOne, Adobe, and Sony.
Amazon Redshift®, aws.amazon.com/redshift, is an Internet hosting service and data warehouse product which forms part of the larger cloud-computing platform Amazon Web Services. Amazon Redshift is built on massive parallel processing (MPP) developed by ParAccel to handle large-scale data sets and database migrations. The key differences between AWS’ other hosted databases such as Amazon RDS, is the ability to handle analytics workloads on big data sets stored by a columnar database management system (DBMS).
Traditional data warehouses require significant time and resource to administer, especially for large datasets. In addition, the cost of building, maintaining, and growing self-managed, on-premise data warehouses is very high. As your data grows, you have to constantly trade-off what data to load into your data warehouse and what data to archive in storage so you can manage costs, keep ETL complexity low, and deliver good performance. Amazon says that Redshift can deliver on this promise.
Some key features that make Amazon Redshift a viable option include:
- Extensibility. Allowing you to run queries against exabytes of data.
- Scalable. Clusters can be sized up, or down, depending on needs.
- Secure. Built-in security by data encryption at rest, or in transit. Also supports Amazon’s Key Management Services (KMS) and Hardware Security Modules (HSM).
- Compatibility. Supports standard SQL; custom JDBC and ODBC drivers; and PostgreSQL JDBC and ODBC drivers.
A sampling of Amazon Redshift customer include, Pinterest, Nokia, and Johnson & Johnson.
BigQuery®, cloud.google.com/bigquery, is a RESTful web service that enables interactive analysis of massively large datasets working in conjunction with Google Storage. It is an Infrastructure as a Service (IaaS) that can be used along with MapReduce. Crunching massive datasets can be time consuming, especially without the right hardware and infrastructure to support it, Google BigQuery hopes to solve that problem by providing a no-ops data warehouse for you to move your data to.
Two major components are key to driving BigQuery. First, is Dremel the query engine, Google’s “cloud-powered massively parallel query service;” and Google’s cluster management system. These two are joined together to form a solution that can run ad hoc queries across petabyte-scale databases.
BigQuery is optimized to work well with all Google cloud services such as Google Analytics, Data Studio, and Cloud Storage. Other key features that Google highlights include:
- Serverless. Eliminates need to provision hardware;
- Scalability. Reduces need to plan for data warehouse capacity;
- Real time analytics;
- High availability;
- Data Encryption and Security.
Some customers using BigQuery include Spotify, Smyte, and Wolters Kluwer.
Azure SQL Data Warehouse
Azure SQL Data Warehouse® is a cloud-based Enterprise Data Warehouse (EDW) that leverages Massively Parallel Processing (MPP) to run complex queries across petabytes of data. Companies can import their big data into SQL Data Warehouse using PolyBase T-SQL queries, you can then leverage the power of MPP to run high-performance analytics. With the use of relational tables with columnar storage, data-storage costs can be reduced while providing improved query performance.
Some key features that Microsoft has promised with Azure SQL Data Warehouse include the following:
- Elasticity. Compute and storage components are separated so each can scale based on workload needed;
- Scalability. Horizontal/Geographical; Service-tier; Readable Secondaries(Always On Availability);
- Security. Data masking, row-level security, Always Encrypted;
- Key Vault Integration. Integration with Extensible Key Management;
- V12 Portability. Eliminates need for long, detailed data migrations strategies;
- Query Store. Database performance monitoring capabilities;
Azure customers using SQL Data Warehouse include, Toshiba, LG Electronics, and Carnival.
These four vendors are only a subset of the data warehouse technologies available today that enable you to centralize and “un-silo” your data. Once you establish your data warehouse, your most mission-critical, transactional data, will become more actionable allowing you to make better business decisions.