Towards a cleaner Banking environment

Date: Tuesday, August 04, 2009

Introduction
It would not be an exaggeration to say that the sector in which change has been most visible in the past few years is banking. The new paradigm of doing business has changed the rules of the game not only for what is commonly known as the new economy but also for age-old, established businesses. In most cases the drivers of this change have been regulatory requirements and advances in technology, which have made it increasingly capable of helping banks meet those requirements.

Data Quality is one of the most important aspects of a Bank’s Data Management efforts, and one of the most challenging too. Banks are modern, technology-driven entities that generate massive amounts of complex data through their multiple business channels. With the advent of ERP, CRM and SCM systems to support the line operations of organizations, data generation has reached mind-boggling levels in recent years.

Banking data is characterized by a huge volume of business transactions, the maintenance of history records and the need for multiple, disparate applications for various banking functions. The stringent statutory requirements in Banking present vastly different needs for data management than those of any other industry vertical.

The major data management challenges faced by the banking industry can be categorized under the following heads:
* Data quality
* Legacy applications
* Volume of data
* Statutory Requirements
* Security

The author, with his extensive experience and background in the banking domain, Data warehousing and Business Intelligence, draws on that knowledge to suggest solutions to the data quality problems faced by the banking industry.

The paper will help readers understand the problems faced by the banking industry in particular, and will help Data warehousing consultants suggest specific and focused solutions.

Data Quality Management in Banking
Quality of data as a challenge to better IT management is not unique to banks. However, when viewed in conjunction with other factors, e.g. legacy applications, real-time usage, volume of data and, not least, the geographic spread of operations, the challenge becomes critical.

What is Data Quality?
Data Quality can be defined as fitness or suitability of data to meet business requirements. The various attributes that collectively characterize quality of data are:
* Accuracy
* Integrity
* Consistency
* Completeness
* Validity
* Timeliness
* Accessibility

Data need not be perfect, but simply needs to meet the requirements of applications that use it.
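The idea of "fitness for use" can be made concrete by measuring a quality attribute and comparing it with what a particular application needs. The following is a minimal sketch only: the record layout, the six-digit PIN code rule and the thresholds are assumptions made for illustration, and only two of the attributes listed above (completeness and validity) are measured.

```python
import re

# Hypothetical customer records; the field names are illustrative only.
records = [
    {"customer_id": "C001", "name": "Asha Rao", "pin_code": "560001"},
    {"customer_id": "C002", "name": "", "pin_code": "56001"},
    {"customer_id": "C003", "name": "John Smith", "pin_code": None},
    {"customer_id": "C004", "name": "Mary Jones", "pin_code": "400001"},
]


def completeness(rows, field):
    """Share of rows in which the field is present and non-empty."""
    filled = sum(1 for row in rows if row.get(field))
    return filled / len(rows)


def validity(rows, field, pattern):
    """Share of non-empty values that match the expected pattern."""
    values = [row[field] for row in rows if row.get(field)]
    if not values:
        return 0.0
    matching = sum(1 for value in values if re.fullmatch(pattern, value))
    return matching / len(values)


def fit_for_use(rows, requirements):
    """Compare measured quality with the thresholds an application needs."""
    report = {}
    for field, (pattern, min_complete, min_valid) in requirements.items():
        c = completeness(rows, field)
        v = validity(rows, field, pattern)
        report[field] = {
            "completeness": c,
            "validity": v,
            "ok": c >= min_complete and v >= min_valid,
        }
    return report


# Assumed needs of a mailing application: a six-digit PIN code that is
# at least 90% complete and 95% valid. These numbers are placeholders.
mailing_requirements = {"pin_code": (r"\d{6}", 0.90, 0.95)}
report = fit_for_use(records, mailing_requirements)
```

The same data could pass for one application and fail for another simply by changing the thresholds, which is the sense in which data need not be perfect.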

Why is it required?
Unreliable data is the prime reason why a Data Quality effort is required, together with data entry errors, duplicate records and missing or incorrect data values.

What does it involve?
Data Quality involves:
a. Fixing defective data elements, records and existing incorrect values (e.g. correcting a misspelling).
b. Modifying data to conform to a corporate standard (e.g. substituting “Mr.” for “Mister”), replacing missing values, and matching and merging duplicate records that exist in the same file.
c. Filtering data, which involves deleting duplicate, missing or nonsensical data.
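The three activities above can be sketched in a few lines of code. This is an illustration under stated assumptions, not a description of any particular cleansing tool: the title mapping, the field names, the matching rule (same name and date of birth) and the default value are all hypothetical.

```python
# Assumed corporate standard for titles (illustrative mapping only).
TITLE_STANDARD = {
    "mister": "Mr.", "mr": "Mr.", "mr.": "Mr.",
    "missus": "Mrs.", "mrs": "Mrs.", "mrs.": "Mrs.",
}

# Assumed default used when a value is still missing after merging.
DEFAULTS = {"country": "UNKNOWN"}


def standardize(record):
    """Conform title and name to the standard (e.g. "Mister" becomes "Mr.")."""
    cleaned = dict(record)
    title = (cleaned.get("title") or "").strip()
    cleaned["title"] = TITLE_STANDARD.get(title.lower(), title)
    # Collapse stray whitespace; .title() is a naive capitalization rule.
    cleaned["name"] = " ".join((cleaned.get("name") or "").split()).title()
    return cleaned


def is_usable(record):
    """Filter rule: a record without a name or date of birth is discarded."""
    return bool(record["name"]) and bool(record.get("date_of_birth"))


def merge_duplicates(records):
    """Match on name and date of birth; fill gaps from later duplicates."""
    merged = {}
    for record in records:
        key = (record["name"].lower(), record["date_of_birth"])
        if key not in merged:
            merged[key] = dict(record)
            continue
        for field, value in record.items():
            if value and not merged[key].get(field):
                merged[key][field] = value
    return list(merged.values())


def fill_missing(record):
    """Replace values that are still missing with the assumed defaults."""
    filled = dict(record)
    for field, default in DEFAULTS.items():
        if not filled.get(field):
            filled[field] = default
    return filled


def cleanse(records):
    standardized = [standardize(r) for r in records]
    usable = [r for r in standardized if is_usable(r)]
    return [fill_missing(r) for r in merge_duplicates(usable)]


raw = [
    {"title": "Mister", "name": "john  smith", "date_of_birth": "1970-01-31",
     "phone": "", "country": "US"},
    {"title": "Mr.", "name": "John Smith", "date_of_birth": "1970-01-31",
     "phone": "555-0100", "country": ""},
    {"title": "", "name": "", "date_of_birth": None, "phone": "", "country": ""},
]
clean = cleanse(raw)
```

Defaults are applied only after merging, so that a placeholder never blocks a real value held by a duplicate record. Real matching rules are usually fuzzier than an exact name and date-of-birth comparison.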

Which is the best place to clean data?
The best place to clean data is in the source application system.

Other places to clean data are a) the staging area, b) the Extract, Transform and Load (ETL) process and c) the Data Warehouse itself.

Criticality of Data Quality in Banking
The various reasons why data quality is so critical in Banking:
* Increased use of business intelligence systems across companies for various Decision Support Systems
* Since Banks have to meet stringent regulatory requirements defined by the Central Bank, the reports have to be accurate and this is possible only if data is clean
* Ensuring clean Customer data and maintaining a single view of the Customer across databases is a big challenge facing Banks. Banks incur a very high cost from erroneous mailings, especially in the United States of America, and from the resulting failure to reach targeted customers
* Rapid growth of eCommerce, which has resulted in multiple entry points
* Banks are expected to collect historical data to comply with Basel II requirements. The historical data needs to be clean to meet the highest data standards prescribed by the Basel II accord.
* The focus on complying with regulations such as Sarbanes-Oxley has also forced Banks to confront the problem of finding and interpreting the required data
* Banks are increasingly deploying Risk management applications for tracking, monitoring and analysis of operational and other risks. Effective analysis would depend on how clean the data is.

Other benefits of Data Quality in Banking
The Management’s confidence in the various MIS reports generated from the Banking decision support system would increase manifold if a proper and efficient data quality program is in place.

The time spent by the Accounts department in reconciling data generated from multiple systems would be reduced drastically. Data reconciliation is required mainly because of duplicate records and unclean data.

With clean customer data, the Bank can effectively perform cross selling activity and increase profitability.

Indirectly, as discussed elsewhere, Banks would reduce the cost of communication. If proper customer householding is not done, a single customer may receive the same mail more than once, and if address details are not clean in the system, the Bank is likely to miss the Customer's attention altogether.
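Customer householding can be illustrated with a small sketch that groups customer records sharing a surname and address, so that one mail piece is sent per household. The field names and the normalization rules here are assumptions; commercial householding tools use far richer address matching.

```python
from collections import defaultdict


def household_key(customer):
    """Normalize surname and address so trivial differences do not split a household."""
    address = customer["address"].lower().replace(",", " ").replace(".", " ")
    address = " ".join(address.split())  # collapse repeated whitespace
    return (customer["surname"].strip().lower(), address, customer["postcode"].strip())


def group_households(customers):
    """Group customer records by their normalized household key."""
    households = defaultdict(list)
    for customer in customers:
        households[household_key(customer)].append(customer)
    return list(households.values())


def mailing_list(customers):
    """One mail piece per household, addressed to the first member found."""
    return [members[0] for members in group_households(customers)]


# Hypothetical records: the first two differ only in case and punctuation.
customers = [
    {"id": 1, "surname": "Smith", "address": "12 Main St.", "postcode": "10001"},
    {"id": 2, "surname": "smith", "address": "12  main st", "postcode": "10001"},
    {"id": 3, "surname": "Jones", "address": "4 Oak Ave", "postcode": "10002"},
]
mail = mailing_list(customers)
```

In this example three customer records yield two mail pieces, because the first two records normalize to the same household.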

Sources of unclean data in Banking
Unclean data in banking applications arises from:
* Integration of multiple databases across wide geographies and disparate systems
* Mergers and acquisitions in the Banking industry, which have resulted in the integration of multiple systems and databases and the likelihood of duplicate records
* Use of paper documents to input data into the system, which may result in wrong data entering the system through errors by either the writer or the data entry operator
* Slowly changing dimensions, e.g. a change in a Customer's name after marriage, a change of address etc.
* Use of home-grown data cleansing tools, which may address only predefined data quality problems but not those that creep into the system due to the dynamic banking environment
* Widespread use of Internet banking, through which Customers enter their personal details and other information directly into the Banking systems. Deliberate entry of incorrect data by Customers affects data quality.
* Operators at call centers, who may enter incomplete data to save time
* Data entering the core banking application from various sources, which affects data quality through inconsistent data formats or data errors.
* Reliance on third parties for data capture, where errors in their data files would result in bad data entering the Bank’s system

Data Quality and Decision Support Systems
With the advent of business intelligence tools specific to the Banking industry, Banks are investing heavily in this new technology for decision support. The prerequisite for deploying such tools is data warehousing. Data warehousing is implemented because Banks deploy multiple applications around their ecosystem and hence need a single version of the truth.

Unclean data in any of the source applications or systems would affect the warehouse and thereby the analytical reports. The millions spent on data warehousing and business intelligence tools would go down the drain if they are not able to give reliable output.

Decisions based on such output could mar the prospects of the Bank, especially in the areas of Risk management and Asset Liability management, which often decide the course of action a Bank may take.

Investment in a clean data campaign would certainly benefit the Bank and also justify the implementation of data warehousing and decision support systems. The money spent on such a campaign is minuscule compared to the benefit the Bank would derive in the long run.

Hence most Banks today are compelled to invest in data quality tools along with the data warehousing initiative in order to derive good value from it.

If Banks are to survive in the present competitive world, where retaining existing Customers and acquiring new ones is paramount to their sustenance and growth, it is imperative that they depend heavily on advanced analytics, business intelligence and CRM initiatives.

To achieve a rapid return on these decision support systems, data quality should be planned as a proactive initiative rather than as a reaction to problems after they occur.

Banks may spend more on ensuring operational efficiency than on data warehousing and CRM initiatives if a defined data quality program is not on their agenda.

Data Quality program in Banking
The Bank’s IT team has to draw up a data quality plan within its overall data management program. If such an initiative does not exist, it is never too late to start one, but further delay could only result in serious operational problems for the Bank.

Since Banks collect data from a multitude of sources that are non-standard or inconsistent, a strong data quality process is certainly required.

The Bank must answer the following basic questions before undertaking a data quality program:
1. Is data quality to be ensured at source or after the data is extracted into the data warehouse from multiple and disparate sources?
2. How often should data be checked for quality?
3. Which are the applications that require stricter adherence to data quality than others? The purpose is only to prioritize the applications, as taking on all of them together could hamper the progress of the initiative.
4. Which are the target systems or applications which require absolutely clean data?
5. Which are the systems or applications that contain high volume of data compared to others?

A data quality initiative should never be undertaken merely as a reaction to an unprofitable and bad marketing campaign or to losing customers.

Banks also depend heavily on third-party sources for data such as address data, customer creditworthiness information, market quotes for treasury applications etc. Inevitably, not all of this data will be clean, and it will require cleansing before it enters the Bank’s database.

Steps for a successful Data Quality program in Banking
The Bank should first conduct an audit of its database with the help of external IT consultants to establish the extent of its data quality problems.

The data quality program should be synchronized with any planned data management program such as data warehousing, data mining etc.

A team of business analysts from the IT team, together with Subject Matter Experts (SMEs), should be formed for the Discovery phase. The SMEs should be drawn from the various application/domain areas that would be covered in the data quality initiative.

Before a Data quality project is actually initiated, it is suggested that the Bank follow these steps in partnership with its technology consultants.

The author of the article is Arvind Kamath, a Business Intelligence Practice Lead with EDS, HP.