siliconindia | August 2016 | 9

you know what questions to ask of your data; they are not so good when you want questions to emerge from the data itself, which is what machine learning and artificial intelligence initiatives require.

Figure 1: Typical data warehousing and data marts workflow

Why We Need Lambda Architecture, Data Streams, and Data Lakes

Healthcare data, as in so many other industries, is now more unstructured and varied than ever. From socio-economic data to genomics to imaging and telemedicine, traditional data warehouses cannot handle the variety, volume, or velocity of data coming into many business units and departments within a single institution, let alone between multiple institutions that might be partners or competitors.

Figure 2: Types of data that today's systems need to integrate are more varied than ever

DMW initiatives need to evolve, and that starts with understanding how Lambda Architectures work. Next-generation data streams (different sources of data) pour into data lakes (similar to a warehouse but not as structured or pre-processed). Instead of being pre-structured or pre-defined, data is usually kept in its original format and converted on the fly when it is read from the database. This is called late-binding data. Late-bound data is typically formatted, cleansed, or otherwise processed at read time, an approach sometimes referred to as "schema on read". While this might seem time-consuming or slow, modern Big Data systems can process and analyze data at high speeds.

Instead of the typical DMW-driven ETL (extract-transform-load) process, which is time-consuming and expensive, Lambda Architectures employ the ELT (extract-load-transform) approach. Data is extracted from source locations and immediately loaded into highly efficient and flexible staging and pre-processing areas, so transformations can occur only when necessary and can change more easily when analysts and data scientists discover new requirements.
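The load-first, schema-on-read idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the in-memory list standing in for a data lake, the record fields, and the function names are all hypothetical.

```python
import json
from collections import Counter

# Hypothetical "data lake": raw records stored exactly as they arrived
# (the extract and load steps of ELT), with no shared schema enforced.
RAW_LAKE = [
    '{"patient_id": "p1", "weight": "70 kg", "source": "clinic"}',
    '{"patient_id": "p2", "weight_lb": 154, "source": "telemedicine"}',
    '{"patient_id": "p3", "source": "genomics", "variant": "example"}',
]


def read_weights_kg(raw_lines):
    """Schema on read: normalize weight to kilograms while reading."""
    for line in raw_lines:
        record = json.loads(line)
        if "weight" in record:
            # Assumed format for this source: "<number> kg"
            kg = float(record["weight"].split()[0])
        elif "weight_lb" in record:
            kg = round(record["weight_lb"] * 0.45359237, 1)
        else:
            kg = None  # this source carries no weight at all
        yield record["patient_id"], kg


def count_by_source(raw_lines):
    """A later, different question asked of the same untouched raw data."""
    return Counter(json.loads(line)["source"] for line in raw_lines)


weights = list(read_weights_kg(RAW_LAKE))
sources = count_by_source(RAW_LAKE)
print(weights)
print(dict(sources))
```

Because the raw records are never rewritten, a new requirement only means writing another reader such as `count_by_source`; nothing has to be re-extracted or reloaded.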
The cost of storage is the same, but the agility in building applications, doing data science, or even producing traditional reports is often significantly improved.

Figure 3: Typical Lambda Architecture based data streams & lakes workflow

How CIOs should incorporate Lambda Architectures into their data integration roadmaps

Replacing data marts and warehouses with Lambda Architectures immediately is neither simple nor fast. However, it is not too hard to build a go-forward strategy that accommodates current data infrastructures while migrating to a more flexible one. CIOs shouldn't focus on replacing their existing architecture. Instead, the Lambda Architecture can help form a "meta architecture" that initially incorporates their marts and warehouses; over time, those DMWs can be phased out as more lakes are created. As shown in Figure 4, there is a place for existing transactional systems, existing warehouses, and existing approaches while the transition is taking place.

Figure 4: How the Lambda Architecture differs from typical data warehouse architecture