I read a fascinating article today on the topic of ETL, specifically its impact on corporate agility, and it made me think of a question that frequently arises when discussing how to build analytical applications at enterprise scale: “Isn’t this mainly a job for ETL tools?” Which leads to the follow-up question: “Since I’m already doing lots of ETL work, can’t I just add this to that workload?”
Given the many surface similarities between the Lavastorm platform and traditional ETL tools, this initial reaction makes plenty of sense. But if we dig a little deeper, we find many reasons why ETL – at least as it is traditionally defined – is only part of the tool set required for building analytical applications, and why complementary approaches are needed to build analytical solutions with the speed, agility, and scope that modern businesses demand.
I think it helps to frame this topic with an analogy to which everyone can relate: Transportation, specifically mass transit for commuting to work. In most major cities, there is a robust public transportation system. Large amounts of money and time have been invested in moving loads of people from any point A or B to any point C or D. If you’ve chosen to live near one of these points, you can walk over to the train and get where you need to go – so long as you go where most other people go, and when most other people want to go there. If you’re not within walking distance, or if your schedule isn’t aligned with the masses, then things get complicated. Maybe you can muddle through and wait until they build tracks out to you, but then what happens when you want to move? So if you need some flexibility, you’re typically going to use a personal automobile to supplement the train.
A car may not be designed to transport more than a handful of people at a time, and it may follow a less direct route than train tracks to get to point X, but it can pretty much go anywhere, anytime. It’s ideal for the majority of transportation scenarios, and over time if demand coalesces around certain driving routes, perhaps public transit will be built to facilitate some of these routes at scale and right near your door.
Until then, your car is part of your solution. Maybe you drive your car to the train station each day – relying on the heavy infrastructure for part of the journey but not being locked into it. Or maybe you decide that the train isn’t right for your commute, and just drive in all the way, maybe carpooling with others to keep costs down. Whatever your solution, the point is that it’s your decision when you have a car.
Back to the topic at hand: building enterprise applications. Traditional ETL clearly corresponds to public transportation in our analogy. ETL technologies are exceedingly good at moving large quantities of data from one place to another repeatedly, reliably, and efficiently. But building the ETL infrastructure can take a long time, require considerable expertise and expense, and then be quite inflexible once it’s done. Just like train tracks.
When a new business initiative comes along, the required data are often located outside the current ETL estate, either because the data haven’t been deemed valuable enough for inclusion or because they are beyond the reach of internal ETL (e.g. cloud CRM). The solution to putting this data to work cannot be the equivalent of “we’ll get our highly specialized civil engineers to scope the project, determine whether there’s enough demand to make a build-out worthwhile, and then start laying down tracks and someday run trains on them”. But that’s often what occurs with new projects, because mindsets and skill sets often haven’t adapted to the expanded tool set now available for agile data management.
Which brings us to the personal automobile analogue in analytical application infrastructure: self-service data preparation, like Lavastorm’s native data management capabilities. Modern technologies like Lavastorm provide the equivalent of just getting in the car and going. They enable users to quickly build connectors to a wealth of data sources, prepare those data for myriad purposes, apply rich analytical logic, and produce outputs that are accurate and actionable – all of which can be automated for continuous value delivery. They don’t require you to build database schemas or master the arcana of database tuning, which means more types of users can participate in authoring sophisticated data-driven applications. And they are designed with agility in mind: when data sources and analytical routines change, applications can be modified quickly to accommodate a dynamic business environment.
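To make the pattern concrete, here is a minimal sketch of the self-service extract-prepare-output loop in plain Python. This is a generic illustration of the idea – no schemas, no heavy infrastructure – and not Lavastorm’s actual API; the feed and field names are invented for the example.

```python
import csv
import io

# Hypothetical lightweight feed (in a real case: a file export, an API pull,
# or a cloud CRM extract) -- no database schema required to start working.
RAW_FEED = """region,revenue
North,1200
South,850
North,400
West,975
"""

def extract(source: str) -> list[dict]:
    """Pull rows from a lightweight source into plain records."""
    return list(csv.DictReader(io.StringIO(source)))

def prepare(rows: list[dict]) -> dict[str, int]:
    """Apply analytical logic: total revenue per region."""
    totals: dict[str, int] = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + int(row["revenue"])
    return totals

totals = prepare(extract(RAW_FEED))
print(totals)  # {'North': 1600, 'South': 850, 'West': 975}
```

The point of the sketch is the shape of the work, not the tooling: the extract and prepare steps can be revised in minutes when the source or the logic changes, which is exactly the agility the “car” gives you before (or instead of) laying down tracks.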
Sometimes the data routes designed in Lavastorm will be folded into the core ETL infrastructure, but for many use cases the Lavastorm platform will scale up from ad hoc data discovery to production-grade operational analytics, calling out to external ETL tools only for a subset of feeds. As with owning an automobile, Lavastorm provides tremendous freedom, opening up creative uses of data that would never have been found on the one-track path of legacy ETL. With the right tool set, we can unlock new ways of working that are far more effective, far more quickly, than waiting in line behind all of the other infrastructure projects – all using the skills our people already have.
So sometimes data takes the train to work, and sometimes it takes the car. It really depends on how much flexibility is required in the data transit process. The good news is that a range of technologies is available to serve the variety of data management requirements; we just need to be deliberate in how we apply different tools throughout the journey of building analytical applications.
While I’m certainly an advocate for the nimble approach powered by Lavastorm’s technology, this isn’t to say we should abandon or replace ETL infrastructure altogether, just as we wouldn’t do away with our rail systems. There remains an essential place for ETL infrastructure in the enterprise application landscape. But we need to adopt complementary approaches that leverage extract, transform, and load capabilities without the overhead and rigidity that come along with “ETL” platforms.