The secret to big data is using it right: you only benefit from this technology if you can derive useful conclusions from the stored information and connect it in the right ways. Many companies, however, seem unaware of this. Taking a general approach of "let's just collect everything and then see what happens," they pile up data from all possible sources – and lose all overview and control.
Often this is because the content is stored according to strict hierarchical data warehouse rules. What has made good sense for corporate data until now, however, is not necessarily the best solution for big data: under this older paradigm it is difficult, if not impossible, to uncover previously unknown connections between clusters of information.
Trying and failing for Industry 4.0
Data lakes are a new kind of data storage that can help with this dilemma. Unlike the traditional data warehouse approach, here information is handled much more freely and is also more scalable: "Effective data management needs to be much more experimental these days," says Nasry Angel of the consulting firm Forrester. He specializes in examining companies' data architectures.
"You have to be able to try and fail quickly." Data warehouses are mainly about quality, where everything must be precise to the nth decimal point, according to Angel. "That means there's only one truth." But big data requires a different approach. "Today's approach is much more scientific: You have a hypothesis and you play around with all kinds of information to test it out. And sometimes you might find something," says Angel.
This concept has been adopted by Deutsche Bahn, for example. Together with startup Zero.One.Data, DB subsidiary Systel gathers the group's data in a data lake and makes it available upon request – and for a corresponding fee – to other companies or divisions in the group. Geographic and weather information, data from rail operations sites, train numbers and other data sources are processed into a single usable format and can then be made available for analyses and for identifying new connections. This puts the same raw data at the fingertips of all DB participants while allowing it to be used in completely different ways, enabling all kinds of applications – from wear and energy-consumption analysis to vehicle availability and predictive maintenance.
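The "single usable format" step described above can be sketched as a small normalization layer: records from different sources arrive with different field names and are mapped onto one common schema before landing in the lake. The field names, sources and mapping functions below are invented for illustration and do not reflect Deutsche Bahn's actual systems.

```python
# Hedged sketch: map heterogeneous source records onto one common schema.
# All field names here are hypothetical examples, not a real DB data model.
COMMON_FIELDS = ("source", "timestamp", "location", "value")

def normalize_weather(record):
    # Weather feeds use their own field names; translate them.
    return {"source": "weather",
            "timestamp": record["time"],
            "location": record["station"],
            "value": record["temperature_c"]}

def normalize_operations(record):
    # Rail-operations records arrive in yet another shape.
    return {"source": "operations",
            "timestamp": record["ts"],
            "location": record["site"],
            "value": record["delay_min"]}

raw = [
    ({"time": "06:00", "station": "Hamburg", "temperature_c": 3.1}, normalize_weather),
    ({"ts": "06:02", "site": "Hamburg Hbf", "delay_min": 7}, normalize_operations),
]

# Every record now carries the same keys and can feed the same analyses.
unified = [normalize(record) for record, normalize in raw]
```

Once unified like this, a single query or analysis routine can consume weather and operations data side by side without per-source special cases.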
Data quality isn't everything
Data lakes use a flat architecture to store data. Every data element in the lake is assigned a unique identifier and tagged with a set of metadata. When a business question is posed, the data lake can be searched for relevant data, and the resulting smaller data set can then be analyzed separately to help answer it.
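The flat architecture described above can be sketched in a few lines: every ingested element receives a unique identifier plus free-form metadata, and an analysis starts by filtering on that metadata to carve out a small, relevant subset. This is a minimal illustration of the concept, not the API of any real data-lake product; all names are made up.

```python
import uuid

class DataLake:
    """Toy flat store: no hierarchy, just (id -> metadata + payload)."""

    def __init__(self):
        self._store = {}  # unique id -> (metadata dict, raw payload)

    def ingest(self, payload, **metadata):
        """Store any payload as-is and return its unique marker."""
        element_id = str(uuid.uuid4())
        self._store[element_id] = (metadata, payload)
        return element_id

    def query(self, **criteria):
        """Return payloads whose metadata matches all given criteria."""
        return [payload
                for metadata, payload in self._store.values()
                if all(metadata.get(k) == v for k, v in criteria.items())]

lake = DataLake()
lake.ingest({"train": "ICE 1047", "delay_min": 4},
            source="operations", region="north")
lake.ingest({"temp_c": -2.5},
            source="weather", region="north")

# A business question translates into a metadata filter,
# yielding a smaller data set for separate analysis.
northern = lake.query(region="north")
```

The point of the sketch is the contrast with a warehouse schema: nothing about the payloads is fixed up front, and structure is imposed only at query time.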
Many new products – from hybrid in-memory data warehouses to data warehouse appliances and cloud offerings – are available to address user needs. When everyone receives information in the required quality, changes in the market environment can be reflected with more agility.
Flexibility comes first
Data management thus transitions from "let's see" to "here's what we need." "The focus of the data lake is not to collect data, but to use it. The great flexibility of this concept supports the modernization of the existing analysis landscape, as well as completely new, data-based business models," says Stephan Reimann, IT Specialist Big Data at IBM.
However, this new technology by no means requires that existing data warehouses be cleared out and replaced with data lakes. On the contrary: the two technologies offer different advantages, and which system a company chooses depends largely on the intended purpose and use. What is clear is that brand-new possibilities are available to business for using big data – with or without a data lake.
New technologies such as big data, digitization and smart materials are transforming industry and trade. Learn about the new possibilities offered by Industry 4.0 in presentations, demos and conferences at HANNOVER MESSE.