How Banks Can Get Started With Machine Learning

“Water, water, every where, nor any drop to drink” is one of the more famous lines of poetry from Samuel Taylor Coleridge, about a dehydrated sailor adrift at sea, but it could just as well describe the data challenge faced by banks.

There is no shortage of bank data. In fact, the likes of Oracle and Accenture have built whole divisions to help banks build and manage it. The challenge is that accessing and using this data is a minefield for banks because of internal organizational and regulatory compliance constraints, which makes taking advantage of new tools like machine learning all the more challenging.

So what approaches have other banks used to navigate these constraints, and what best practices can be applied to take advantage of data?

1. Standardize data

One of the primary challenges banks face is that data has piled up in different systems, making the concept of a ‘universal’ customer record something of a joke. Bank consolidation is one of the primary culprits here, though organizational structure and under-investment in back-office systems during the last eight years have also had a significant impact.

To address this, banks should start with a small data set and build up to a universal customer record over time. The data management process can be divided into three components: the database management system, data modeling, and ETL (Extract, Transform, Load). Each component requires specialized tools such as SQL Server, Teradata, and Informatica, and newer companies like Paxata and Trifacta support normalizing data specifically for machine learning tools.
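As a rough illustration of the "start small" idea, the sketch below merges rows from two systems into one record per customer. Everything in it is an assumption made for the example: the two source systems, their field names, the sample rows, and the choice of a normalized email address as the matching key. A real bank would likely need a more robust matching rule and one of the dedicated tools named above.

```python
from collections import defaultdict

# Hypothetical exports from two systems that describe the same customers
# using different field names and inconsistent formatting.
CORE_BANKING = [
    {"cust_name": "ACME BAKERY LLC ", "email": "Owner@AcmeBakery.com", "phone": "(415) 555-0100"},
    {"cust_name": "Jane Doe", "email": "jane.doe@example.com", "phone": "415.555.0111"},
]
LOAN_SYSTEM = [
    {"name": "Acme Bakery LLC", "contact_email": "owner@acmebakery.com ", "loan_balance": 25000},
]


def normalize_email(value):
    """Trim whitespace and lowercase so the same address always matches."""
    return value.strip().lower()


def normalize_phone(value):
    """Keep digits only, dropping punctuation and spaces."""
    return "".join(ch for ch in value if ch.isdigit())


def transform_core(row):
    """Map a core-banking row onto the shared field names."""
    return {
        "email": normalize_email(row["email"]),
        "name": " ".join(row["cust_name"].split()),
        "phone": normalize_phone(row["phone"]),
    }


def transform_loan(row):
    """Map a loan-system row onto the shared field names."""
    return {
        "email": normalize_email(row["contact_email"]),
        "name": " ".join(row["name"].split()),
        "loan_balance": row["loan_balance"],
    }


def build_customer_records(*sources):
    """Merge transformed rows into one record per normalized email."""
    merged = defaultdict(dict)
    for rows in sources:
        for row in rows:
            record = merged[row["email"]]
            for field, value in row.items():
                # The first system to supply a field wins; later systems fill gaps.
                record.setdefault(field, value)
    return dict(merged)


records = build_customer_records(
    [transform_core(r) for r in CORE_BANKING],
    [transform_loan(r) for r in LOAN_SYSTEM],
)
for email, record in sorted(records.items()):
    print(email, record)
```

The "first system wins" rule is only one possible way to resolve conflicts. Which system should be treated as authoritative for each field is a policy decision for the team that owns the data.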

At this step, it is also recommended that banks hire or retrain individuals on their own team to better ‘own’ the process and start to develop institutional knowledge around the data. The Data Analytics bootcamp course provided by UC Berkeley is a good place to begin for any retraining program.

2. Take advantage of off-the-shelf tools

All the large data companies are competing to get developers to use their machine learning tools and marketplaces, which is great news for banks getting started in the space: costs are negligible and each platform includes a fair amount of functionality to get started. At a high level, the data science tools from IBM, Microsoft, Google and Amazon are similar, so the choice really comes down to the use case (and data security policies) of each individual bank. We have taken IBM, Amazon and Google for test drives, and ended up choosing Google for our front end (Google Studio) and back end (Google BigQuery). There are also new startups building tools to help with discrete machine learning tasks (e.g. Heat) and tools to help analyze the data (e.g. MicroStrategy, Qlik, Tableau).

It is recommended that banks start with these off-the-shelf solutions to deliver some quick wins and build institutional support for any new program. Once you get buy-in, there is plenty of scope for future projects, including building custom machine learning algorithms and moving towards unsupervised learning!

3. Test and learn

It is easy to think the heavy lifting has been completed at this stage, but in reality the hardest part is yet to come. Now that the data is starting to provide results, there needs to be A/B testing to verify accuracy and reach a high confidence level. For example, if the data is used to extrapolate trends and start to predict behaviors, a data scientist should validate each of the variables in the regression analysis and test the predicted outcomes against a known set of data. Example companies that can help at this stage include RapidMiner and Feature Labs.
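To make "test the predicted outcomes against a known set of data" more concrete, here is a minimal sketch of hold-out validation for a one-variable regression. The data is synthetic and the relationship between the two variables is invented purely for illustration. The 80/20 split and the use of mean absolute error are also assumptions, not a recommendation from any of the vendors mentioned.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(xs, ys, predict):
    """Average absolute gap between known outcomes and predictions."""
    return sum(abs(y - predict(x)) for x, y in zip(xs, ys)) / len(xs)


# Synthetic, made-up data: an outcome that rises with a single input plus noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(0, 100)
    y = 3.0 * x + 50 + rng.gauss(0, 10)
    data.append((x, y))

# Hold back 20% of the known outcomes; the model never sees them while fitting.
rng.shuffle(data)
split = int(len(data) * 0.8)
train, holdout = data[:split], data[split:]
train_x, train_y = zip(*train)
holdout_x, holdout_y = zip(*holdout)

slope, intercept = fit_line(train_x, train_y)


def model(x):
    return slope * x + intercept


# Naive baseline: always predict the average outcome seen in training.
train_mean = sum(train_y) / len(train_y)


def baseline(x):
    return train_mean


train_mae = mean_absolute_error(train_x, train_y, model)
holdout_mae = mean_absolute_error(holdout_x, holdout_y, model)
baseline_mae = mean_absolute_error(holdout_x, holdout_y, baseline)

print(f"train MAE:    {train_mae:.2f}")
print(f"holdout MAE:  {holdout_mae:.2f}")
print(f"baseline MAE: {baseline_mae:.2f}")
```

If the hold-out error is much worse than the training error, or no better than the naive baseline, the model is probably not ready to drive decisions.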

4. Begin with an internal product first

Determining the initial use case of the machine learning product for banks can be tricky. There is a lot of interest in bots to help with customer-facing operations, since there are direct cost savings from cutting back on customer support as well as opportunities to tap into new markets. But there are also plenty of issues around consumer behavior and adoption that may derail the practical benefits. For example, Microsoft’s machine learning-powered bot Tay, which responded to tweets and chats, was turned off because it could not recognize when it was making offensive or racist statements.

Starting with an internally focused project is arguably a better approach for a bank. There are plenty of projects that could generate immediate revenue with less risk, including churn analysis and customer acquisition. For example, at acuteIQ, we are working with the data analytics teams of banks to use their current customer data to predict new business customers for SBA 7(a) loans. Mining this data has resulted in a 2x increase in SBA loan applicants.

Whether a bank chooses to start with an internal or external project, the pace of change requires that banks start to think more about how to unlock the value of their data. In a survey of 424 senior executives from financial institutions and fintech companies, Euromoney found that the majority of respondents thought risk assessment, financial analysis and portfolio management roles would be the first impacted by the introduction of AI/machine learning over the next three years. These predictions are already starting to come true: banks such as Goldman Sachs are seeing revenue decrease in areas like equity underwriting and stock trading, while new tools that use machine learning, like Robinhood (offering commission-free stock trading), are seeing soaring usage and a new entry into the Hall of Unicorns (companies with billion-dollar valuations).
