How much data do you need to invest in a Data Warehouse?

Published August 10, 2014   |   
Christopher Hebert

As companies collect more data and leave it stored in their source locations, whether it’s a CRM, ERP, or POS, they may reach a point where they want to consolidate all that data into a single, consistently structured location.

The question, then, is at what point is a full-blown data warehousing solution necessary?

The primary function of a data warehouse is to bring data from multiple sources into one system, either to be analyzed in that centralized location or to be transferred more efficiently between systems. The amount of data that warrants a data warehouse is less dependent on the sheer volume of data in giga, tera, or petabytes than on the number of sources that need to be integrated and managed.

Data warehouse as a central repository

You may want to start collecting all your data in one central location for several reasons, including security concerns, backup, and business intelligence.

With security concerns, bringing all of the reporting functionality of your data into one centralized location such as a data warehouse allows for greater control over who has access to what information. Data warehouses typically log the actions taken by users, which allows you to keep a closer eye on users who may, when given access to a less monitored system, perform ill-intended queries.

If the security demands of monitoring user access across various systems is spreading your information security team too thin, a data warehouse can help.

Data warehouses can also serve as a centralized backup tool, for your company’s disparate systems. By design, all modifications, additions, and removals in a data warehouse can be recorded and backed-up per your specifications. This is beneficial when the backup systems provided by your various data sources are either insufficient or too difficult to keep up individually.

In short, if the backup process for your data sources is fractured and difficult to maintain, consolidating that data in a data warehouse allows for complete backup in one location.

If new business intelligence (BI) efforts are on your horizon, it may be easier for the sake of consistent analysis to bring all your data into one format under one roof. This may also open up options when selecting business intelligence software, as it’s typically easier to integrate your BI software with one source than with your multiple, disparate sources. Many popular data warehousing tools also have prebuilt connectors that allow for easy integration with media services and CRM platforms.

If a BI tool you’re looking to use does not integrate with your various data sources, it may be cheaper and more efficient in the long run to bring all data into one reservoir, so at most only one integration is necessary, and you can change your sources without changing your BI tool.

You don’t need a data warehouse as a central repository if the number of systems between which data needs to be aggregated is a mere handful, and the quantity of data in them is only tens of gigabytes. In this case it may be more appropriate to have a simple database with the data from each system stored in tables. The volume of data is small enough to be analyzed in a single database (or even spreadsheet perhaps), and the transfer of data into that system can be done manually without the large investment of a data warehouse.

Data warehouse as a transactional platform

Some data warehouses exist not only for the purpose of having all data in one location but also to ease the transfer of data between various business systems.

The Extract, Transform, Load (ETL) process of a data warehouse means that once the data from one system is extracted, transformed into a normalized format and structure, then loaded into the data warehouse, it is consistent with all other data of its kind. This consistency allows data from different sources to be exported together out into another system.

For example, patient records from an EHR may enter a data warehouse via ETL, and then from the data warehouse to be exported to accounting software. This would be beneficial if there are multiple EHR systems in use. If one of those EHR systems were switched out for a new one (which research suggests happens often among medical organizations), then only the schema for transforming data from that EHR to the data warehouse would have to be redone. The schema for transforming data for export to the accounting software would not be affected.

A data warehouse as a transactional platform is unnecessary if there are few enough systems that the transfer of data between systems can be reasonably done either manually or with automated processes developed by analysts. Such solutions are less capital intensive than the implementation of a data warehouse.

Beyond situational descriptions of typical business needs, quantifying a minimum data requirement is difficult. Data warehouses unique to each scenario, and it’s best to thoroughly research your needs and requirements before committing to an installation.

About the author: Christopher Hebert is a content writer at TechnologyAdvice. He covers business intelligence, project management, and other emerging technologies. Connect with him on Google+.

This article recently appeared on the eleventh edition of our content partner Big Data Innovation Magazine. To download the magazine, click here.