Thursday, January 7, 2010

Hexaware's Data Washing Machine

Today's Eloqua Artisan post is a second guest post from colleague and friend, Amit Varshneya. In this post, he looks at a topic we've touched on very generally - the contact data washing machine - and shows specifically how he has configured it for his company, Hexaware. As part of his contact washing machine, he also uses a third-party service provider to perform some manual steps, which adds an interesting dimension to the process.

Amit is VP, Marketing at Hexaware Technologies. In this role, he oversees the company's marketing initiatives globally to create a preference and demand for Hexaware's service offerings. Amit is the driver and evangelist of the sales marketing funnel measurement process at Hexaware and is a passionate champion of Eloqua.


Aah! The unmistakably fresh scent of neatly scrubbed data!

It’s an accepted fact that good data management is a prerequisite for proper segmenting and targeting - unclean data reduces the effectiveness of marketing campaigns. Recognizing this, and taking a cue from Steve Woods’ concept of the Data Washing Machine, the marketing organization at Hexaware took a comprehensive approach to cleaning our data and then keeping it clean. And what a difference that has made! We have improved:

• Our ability to reach prospects with highly targeted campaigns
• The time it takes us to put together these customized campaigns
• Our relationship with our sales teams

What is clean data? Chris Petko explains this very well in his vlog post with the 3C framework – clean data needs to be Consistent, Complete, Correct.

The Hexaware Data Washing Machine is a mix of automated and manual steps (well, you do need to give the cuffs and collars some extra attention!) that ensures that our data is Consistent, Complete and Correct. Let me share briefly how we did this:

  • Identified a list of mandatory fields – we set a scope for ourselves: what fields do we absolutely need for good segmenting? No matter what the source of data (form submissions, list uploads, CRM integration), these fields need to be filled in. Once this is done, it becomes a lot easier to measure “completeness”.

  • Decided on a standard list of values for some of these fields - Industry, Country, Salutation, Lead Source, etc. These fields can have only one of the standard values. We then published these lists, so that when Marketing Managers request segment lists, campaigns or reports, they can check off the published values they need. Consistency.

  • Data Templates with these required mandatory fields and standardized values were also defined and published – this helped guide all incoming data uploads, as well as form creation and CRM integration activities. Consistency.

  • We then put in place an automated Program that helps us manage completeness and correctness of data. This program runs on the 4th of every month and does the following:

    • Identifies data modified or added in the last month

    • Isolates all incomplete data into a bucket. This step mattered most on the very first run; now it helps us identify any faulty imports and take corrective action.

    • Isolates competitors and ISP emails (we only correspond with corporate email addresses). This helps us keep our data fresh and relevant. These records are deleted from the database.

    • Isolates bouncebacks. These records are deleted after a check for obvious typos.

    • Isolates unsubscriptions. No action is taken on these records – however this is reported to indicate overall health of the database.

  • Once this program runs successfully, these buckets are handed over to our dedicated data desk. The data desk is manned by trained data experts and has been set up in our offshore BPO subsidiary, CaliberPoint (CaliberPoint specializes in data management processes and, as an India-based offshore operation, affords us significant cost savings). This data desk does the following:

    • Scans through records for any data consistency issues, such as data that is all uppercase or a company name that is not in its conversational form. These issues can be a substantial challenge for personalization.

    • Scans through email bounceback records for typos (e.g. any email addresses that contain a comma “,”)

    • Scans for and identifies any bad/dummy data (Mickey Mouse records, asdfs, abcs, etc.)

    • Scans through any incomplete records (as expected, the numbers in this bucket have reduced progressively with every run)
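For readers who want to see the logic of the steps above in one place, here is a minimal Python sketch of the bucketing and data desk checks. It is not how the Eloqua Program itself is built, only an illustration. The field names, domain lists, standard values and the order in which buckets are checked are all assumptions made for the example.

```python
from datetime import date, timedelta

# All field names, domains and tokens below are hypothetical placeholders.
REQUIRED_FIELDS = ("email", "first_name", "company", "industry", "country")
ISP_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com"}
COMPETITOR_DOMAINS = {"competitor.example"}
DUMMY_TOKENS = {"asdf", "abc", "mickey mouse"}


def recently_modified(record, run_date, days=31):
    """True if the record was added or modified in roughly the last month."""
    return record["modified"] >= run_date - timedelta(days=days)


def bucket(record):
    """Assign a record to one bucket. The precedence used here is an assumption."""
    domain = record.get("email", "").rpartition("@")[2].lower()
    if record.get("unsubscribed"):
        return "unsubscribed"        # reported only, no action taken
    if record.get("bounced"):
        return "bounceback"          # deleted after a check for obvious typos
    if domain in ISP_DOMAINS or domain in COMPETITOR_DOMAINS:
        return "isp_or_competitor"   # deleted from the database
    if any(not record.get(field) for field in REQUIRED_FIELDS):
        return "incomplete"          # handed to the data desk
    return "clean"


def desk_flags(record):
    """Flag records for manual review; a person still makes the final call."""
    flags = []
    first_name = record.get("first_name", "")
    company = record.get("company", "")
    # Note: legitimate acronyms in company names will also be flagged here.
    if first_name.isupper() or company.isupper():
        flags.append("all_uppercase")
    if "," in record.get("email", ""):
        flags.append("email_typo")
    if first_name.lower() in DUMMY_TOKENS or company.lower() in DUMMY_TOKENS:
        flags.append("dummy_data")
    return flags


if __name__ == "__main__":
    run_date = date(2010, 1, 4)
    record = {
        "email": "jane@acme,example", "first_name": "JANE", "company": "Acme",
        "industry": "Banking", "country": "India", "modified": date(2009, 12, 20),
    }
    if recently_modified(record, run_date):
        print(bucket(record), desk_flags(record))
```

The automated part only sorts records into buckets; the flags are meant to point the data desk at the cuffs and collars, not to replace their judgment.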

This washing machine has been in place for the last 5 months, and in that time it has tremendously improved our segmenting and targeting capability and effectiveness. We’re still learning and making adjustments to it along the way. If you have any suggestions, I would love to hear them. Thanks!