ETL Technology is Central to the Big Data World

It allows organisations to efficiently manage multiple operations with limited resources and improve customer data analytics.

By IBM
Updated: Feb 14, 2014 03:21:12 PM UTC

IBM has always been a company in a state of constant renewal and reinvention. Through economic upheavals and natural disasters, tech bubbles and recessions, it has continued to engage clients, governments, local communities and universities to improve how the world works. Differentiated by its values, strengthened by collaboration and experienced through IBMers, its solutions in Cloud, Big Data, Analytics, Mobility, Predictive Intelligence and other areas are making the world smarter today. Through these blog posts, IBMers explore some essential areas of business and life that are deeply interlinked with technology, and invite everyone to share their experiences and comments as IBM continues on this journey of discovery and innovation.

How many of us have been levied a late payment charge on our credit card and ended up haggling with customer service to get it waived? I am sure many of us have.

I used to pride myself on being punctual and always paying on time. That was until the inevitable happened. I was travelling overseas when I realised that my credit card payment was due in a day. I could still pay online, so I went to the bank's site, but I could not remember my net banking password. To reset the password, the bank wanted me to visit its nearest branch, which was then 9,000 miles away. I informed the bank of my dilemma, but received only a canned reply via mail.

Cut to the present: the age of Big Data. I was narrating this experience to the CIO of a leading bank. He asked me how Big Data could help him solve such a problem and uncover other issues, so as to improve customer experience. The customer emails and phone calls that a bank receives every day capture a huge amount of valuable information, which makes them a classic example of Big Data. Today, we have the tools to extract insights from this information.

However, the CIO mentioned three requirements:

(1) The bank wants to perform deeper analytics for its high-value customers only.

(2) It wants to reduce the training costs for the people who execute Big Data projects.

(3) It wants to reuse existing business processes (for instance, those for marketing campaigns) rather than design new ones.

These points resonate across industries that can leverage Big Data. One way to meet all three requirements is through the Extract-Transform-Load (ETL) technology that banks have been investing in over the years. ETL technology is used to integrate data from multiple sources and combines three database functions into one tool: it pulls data out of one database or source, transforms it, and places it into another database or target.
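The following minimal Python sketch makes the three ETL stages concrete. It assumes a small CSV extract as the source and an in-memory SQLite database as the target. The column names and sample rows are hypothetical, and real ETL tools add scheduling, error handling and many more connectors on top of this basic shape.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from an
# operational database or a file feed rather than an inline string.
SOURCE_CSV = """customer_id,name,balance
101, Asha Rao ,250000
102,Ben Lee,1200
103, Carla Diaz,98000
"""


def extract(raw_text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: trim stray whitespace from names and convert numeric fields."""
    return [
        (int(r["customer_id"]), r["name"].strip(), int(r["balance"]))
        for r in rows
    ]


def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, balance INTEGER)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(raw_text, conn):
    """Chain the three stages: source -> transform -> target."""
    load(transform(extract(raw_text)), conn)


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")  # stand-in for the target database
    run_etl(SOURCE_CSV, target)
    for row in target.execute("SELECT * FROM customers ORDER BY customer_id"):
        print(row)
```

Each stage is a separate function, so any one of them can be swapped out (for example, a different source reader) without touching the others.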

Requirement 1
A customer is unlikely to know that he or she is a high-value customer, and hence is unlikely to mention this in email communication with the bank. The information about high-value customers resides in the data warehouse, from where it needs to be extracted and loaded into the Big Data environment. This can easily be done using ETL tools.

Once the high-value customer information is available, email and call analytics can be restricted to only those customers on the high-value customer list. The bank implemented such an approach, and one of the biggest pain points for high-value customers turned out to be the inability to reset their net banking password online!
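Restricting the analytics to high-value customers might look like the following sketch. It assumes an ETL job has already loaded the high-value customer IDs and that emails arrive as simple records. `HIGH_VALUE_IDS` and `EMAILS` are hypothetical sample data, and `TOPICS` is a naive keyword rule set standing in for real text analytics.

```python
from collections import Counter

# Hypothetical high-value customer IDs, extracted from the data warehouse
# and loaded into the Big Data environment by an ETL job.
HIGH_VALUE_IDS = {101, 103}

# Hypothetical customer emails already stored in the Big Data environment.
EMAILS = [
    {"customer_id": 101, "text": "I cannot reset my net banking password online."},
    {"customer_id": 102, "text": "Please waive the late payment charge."},
    {"customer_id": 103, "text": "Password reset requires a branch visit."},
    {"customer_id": 103, "text": "Late payment charge applied while I was abroad."},
]

# Naive keyword rules (an assumption) standing in for real text analytics.
TOPICS = {
    "password_reset": ("password",),
    "late_fee": ("late payment",),
}


def restrict_to_high_value(emails, high_value_ids):
    """Keep only emails sent by customers on the high-value list."""
    return [e for e in emails if e["customer_id"] in high_value_ids]


def count_topics(emails, topics):
    """Count how many emails mention each topic's keywords."""
    counts = Counter()
    for email in emails:
        text = email["text"].lower()
        for topic, keywords in topics.items():
            if any(keyword in text for keyword in keywords):
                counts[topic] += 1
    return counts


if __name__ == "__main__":
    focus = restrict_to_high_value(EMAILS, HIGH_VALUE_IDS)
    print(count_topics(focus, TOPICS).most_common())
```

Filtering before analysing also shrinks the volume of text that the more expensive analytics step has to process.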

Requirement 2
Fortunately, ETL tools have evolved of late and can help banks reduce training costs. ETL job developers who design data transformation workflows can use these tools to translate their workflows into Hadoop jobs. Hadoop is an open source framework used extensively to process Big Data with MapReduce, a programming model (implemented as part of Hadoop) for processing large amounts of data in parallel across a cluster. However, finding people skilled in Hadoop and MapReduce can be challenging.

Suppose an ETL developer has to find the IP addresses that have made more than a million requests on the bank's website. Traditionally, the developer would need to write a MapReduce job that processes the web-log data stored in Hadoop. With the advances in ETL technology, however, a job developer can use the standard ETL design tools to create an ETL job that reads data from multiple sources in Hadoop (files, Hive, HBase) and then joins, aggregates and filters it to answer the query on IP addresses.
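For comparison, the hand-written approach follows the MapReduce pattern. A mapper emits `(ip, 1)` for each request, the framework groups values by key, and a reducer sums them. The single-process Python sketch below only simulates those phases and does not use the Hadoop API. It assumes that the client IP is the first whitespace-separated field of each log line, as in common web-server log formats.

```python
from collections import defaultdict
from itertools import chain


def map_phase(log_line):
    """Mapper: emit (ip, 1) per request; assumes the IP is the first field."""
    fields = log_line.split()
    if fields:
        yield fields[0], 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(ip, counts):
    """Reducer: total request count for one IP address."""
    return ip, sum(counts)


def heavy_hitters(log_lines, threshold):
    """Return the IPs with strictly more than `threshold` requests, sorted."""
    pairs = chain.from_iterable(map_phase(line) for line in log_lines)
    totals = (reduce_phase(ip, values) for ip, values in shuffle(pairs).items())
    return sorted(ip for ip, total in totals if total > threshold)


if __name__ == "__main__":
    sample_logs = [  # hypothetical log lines
        "10.0.0.1 - - GET /login",
        "10.0.0.2 - - GET /home",
        "10.0.0.1 - - POST /login",
        "10.0.0.1 - - GET /reset",
    ]
    # In the scenario above the threshold would be 1_000_000.
    print(heavy_hitters(sample_logs, threshold=2))
```

On a real cluster, the map and reduce functions run on many nodes and the shuffle moves data over the network. Writing, tuning and debugging that is the skill gap the ETL tooling aims to hide.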

ETL tools have added the capability to “translate” an ETL job into a MapReduce job using Jaql technology: the ETL job is rewritten as a Jaql query, which is then executed as a MapReduce job on Hadoop. This is a key innovation that lowers the entry barriers to Big Data technology and allows ETL job developers to carry out Big Data analytics.
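The idea of translating a declarative job into MapReduce can be illustrated with a toy compiler. This is not Jaql, and it is not how any particular ETL product generates code. `EtlJobSpec`, `compile_to_mapreduce` and `run_mapreduce` are hypothetical names. In this sketch, a "group, count, filter" job description is turned into a mapper and a reducer, which a tiny in-process runner then executes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EtlJobSpec:
    """Hypothetical declarative job: group records by one field, count, filter."""
    key_field: int   # index of the whitespace-separated field to group by
    min_count: int   # keep keys whose count is strictly greater than this


def compile_to_mapreduce(spec):
    """Translate the declarative spec into a (mapper, reducer) pair."""
    def mapper(record):
        fields = record.split()
        if len(fields) > spec.key_field:
            yield fields[spec.key_field], 1

    def reducer(key, values):
        total = sum(values)
        if total > spec.min_count:  # the filter step runs after aggregation
            yield key, total

    return mapper, reducer


def run_mapreduce(records, mapper, reducer):
    """Tiny single-process runner standing in for a Hadoop cluster."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    results = []
    for key in sorted(groups):
        results.extend(reducer(key, groups[key]))
    return results


if __name__ == "__main__":
    logs = ["10.0.0.1 - - GET /", "10.0.0.1 - - GET /a", "10.0.0.2 - - GET /"]
    mapper, reducer = compile_to_mapreduce(EtlJobSpec(key_field=0, min_count=1))
    print(run_mapreduce(logs, mapper, reducer))
```

The developer only edits the `EtlJobSpec`. Changing `key_field` to group by request method instead of IP requires no new map or reduce code, which mirrors the benefit the paragraph describes.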

Requirement 3
Banks have already invested heavily in technology for running marketing campaigns, and these systems use the data warehouse as their source of information. However, the output of Big Data analytics resides in Hadoop. This is where ETL tools play a crucial role: they take the output of the analytics and move it to the data warehouse, so that the enterprise's existing business processes can act upon it.

An example: if a bank wants to identify customers who have been unhappy in their last three interactions with the bank via email or phone, it can do so using analytics on Hadoop. The output of the analytics would be a simple list of customer names. ETL tools can read this list from Hadoop and update the data warehouse. The customer retention marketing campaign can then act upon this additional insight without any change to the campaign management tools, saving significant costs for the enterprise.
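The retention example can be sketched end to end in Python. The sketch makes three assumptions:

- Each interaction already carries a sentiment label produced by earlier email or call analytics.
- Customers are keyed by ID rather than by name.
- An in-memory SQLite table with a hypothetical `retention_risk` column stands in for the data warehouse.

```python
import sqlite3
from collections import defaultdict


def unhappy_customers(interactions, window=3):
    """Return IDs of customers whose last `window` interactions were all negative.

    `interactions` is an iterable of (customer_id, timestamp, sentiment) tuples;
    the sentiment labels are assumed to come from earlier text/call analytics.
    """
    history = defaultdict(list)
    for customer_id, timestamp, sentiment in interactions:
        history[customer_id].append((timestamp, sentiment))
    unhappy = []
    for customer_id, events in history.items():
        recent = sorted(events)[-window:]  # most recent `window` interactions
        if len(recent) == window and all(s == "negative" for _, s in recent):
            unhappy.append(customer_id)
    return sorted(unhappy)


def load_into_warehouse(conn, customer_ids):
    """Flag the customers in the (hypothetical) table read by campaign tools."""
    conn.executemany(
        "UPDATE customers SET retention_risk = 1 WHERE customer_id = ?",
        [(cid,) for cid in customer_ids],
    )
    conn.commit()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, "
        "name TEXT, retention_risk INTEGER DEFAULT 0)"
    )
    warehouse.executemany(
        "INSERT INTO customers (customer_id, name) VALUES (?, ?)",
        [(101, "Asha Rao"), (102, "Ben Lee")],
    )
    sample = [(101, 1, "negative"), (101, 2, "negative"), (101, 3, "negative"),
              (102, 1, "negative"), (102, 2, "positive"), (102, 3, "negative")]
    load_into_warehouse(warehouse, unhappy_customers(sample))
    print(warehouse.execute(
        "SELECT name FROM customers WHERE retention_risk = 1").fetchall())
```

Only the warehouse table changes. The campaign tooling keeps querying the same table, which is the point the example makes about reusing existing processes.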

In summary, ETL tools are far from being passé. They are central to the Big Data ecosystem and play a crucial role in enabling data analytics.

 - By Manish Bhide, PhD, STSM (Senior Technical Staff Member) for InfoSphere Information Server at India Software Labs, IBM India

 Disclaimer: “The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or opinions.”
