Machine learning: from skepticism to application in fraud prevention

The following blog post is a transcription of the presentation given by Jérôme Kehrli, CTO of NetGuardians, during the Innovation Leaders 2018 evening. You can also watch the full video below.

Jérôme Kehrli about machine learning applied in fraud prevention

Hello, everybody! My name is Jérôme Kehrli, and tonight I want to tell you about the way we use machine learning for fraud prevention in banking institutions at NetGuardians. I don’t pretend to give you a complete, global and absolute overview of the topic. On the contrary, I want to present NetGuardians’ perspective on the problems we encounter at our customers’ sides and the way we can solve them using artificial intelligence.

My name is Jérôme Kehrli, and I have been in this wonderful business of software engineering and data analytics for 18 years; it is still a big passion. So maybe, before I start, a few words about NetGuardians.

NetGuardians is a Swiss software vendor founded in 2008. After a pretty long incubation period, it really started to develop in 2012. It was founded by two former students of the engineering school in Yverdon, which is maybe the reason why it is still based in Yverdon. We develop a big data analytics platform that we deploy in banking institutions for one big key concern: fraud prevention on a large scale. By fraud prevention, I mean internal fraud as well as external fraud. Internal fraud is when banking employees steal money from their employer. External fraud covers e-banking fraud, credit card fraud and all the other attacks cybercriminals can imagine.

Today we have around 60 employees and 50 customers; we have doubled our sales turnover every year for the past three years and acquired a dozen new customers each year. We hope to continue doing that this year. I myself have been the CTO for 3.5 years now.


So, let’s get started! A little bit of history first

Before 2000, banking fraud detection relied mostly on manual controls: internal control, internal audits or audits performed by external auditors, with all the usual issues of this approach: internal control and auditors work by sampling, which means a lot of frauds slip through the cracks. Of course, there were a few security checks implemented here and there in the operational information system, or some business intelligence reports targeted at detecting fraud. But all in all, it was not considered a very big deal.

And you know, in the early 2000s, we are before the subprime crisis, we are before the Southern European debt crisis, margins are high, and people trust banking institutions. All in all, bankers are rather happy people. Banking fraud exists, of course, but it is an issue, not such a big deal.

In the late 2000s, the cost of fraud, the maturity of the attackers and the complexity of the attacks increased significantly. Banking institutions reacted by deploying, quite massively, specific analytics systems aimed at detecting and preventing fraud. All these systems were, at the time, rule engines, and most of them came from the Anti-Money Laundering world, not even specifically developed for fraud prevention.

NetGuardians was founded at this time, and NetGuardians’ NG|screener platform was a sort of gigantic rule engine; there is an example of what I mean by a rule at the bottom of the slide. Of course, a few papers were published in the early 2000s about how artificial intelligence and machine learning could be interesting for detecting fraud cases, but bankers and engineers were not taking that seriously. They did not want to consider an approach whose results were seen as fuzzy and blurry to interpret. Let’s just say that artificial intelligence at that time generated a lot of scepticism in banking institutions.

But the reality of fraud in financial institutions evolved dramatically.

Let me give you two examples. The first one is the Bangladesh bank heist, quite a story. In February 2016, a group of attackers successfully compromised the banking information system of the Central Bank of Bangladesh. They attacked the specific gateway used by the bank to reach the SWIFT network, and they used it to issue a set of financial transactions on the SWIFT network, transactions aimed at withdrawing money from the Bangladesh Central Bank’s account at the US Federal Reserve Bank. They successfully sent 81 million dollars to the Philippines, where the money was laundered through Philippine casinos. After the facts, the heads of all the financial institutions involved, the Central Bank of Bangladesh, the US Federal Reserve, even the Prime Minister of the Philippines, were all convinced that the money would be recovered and the cybercriminals found. Two years later, we know that the money will never be recovered. We know that the attackers are untraceable and will never be found. 81 million dollars for a team believed to number 15 to 20 people.

But of course, this is Bangladesh, right? And here we are in Europe; even better, here we are in Switzerland. Let’s say the massive security holes of the Bangladesh information system do not concern us. Right? Let me give you another example: the Retefe worm.

The Retefe worm has been developed for four years by a group of cybercriminals specifically targeting the e-banking applications of small to medium-sized Austrian and Swiss financial institutions. It is four years old, and for four years the cybercriminals have kept evolving and maintaining it to counter anti-virus software and the specific security measures banking institutions put in place. It is four years old, and even today it successfully steals money from between 10 and 19 banking sessions every day. And this is today, in Switzerland.

This is the reality to which banking institutions are confronted nowadays.

Some numbers. The big one: 81 million from the Bangladesh case. We estimate that in 2017 the total cost of fraud on a large scale was 3,000 billion dollars worldwide. Going further, Cybersecurity Ventures estimates that by 2021 the total cost of cybercrime will reach 6,000 billion dollars. And of course, a lot of it is internal fraud, right? Here in Switzerland, the maturity of the banking business, but also the security put in place within the banking information systems, make internal fraud more marginal. But external fraud is a cruel reality; think of the Retefe worm.

The consequence is that the traditional systems deployed in banking institutions to prevent fraud, rule engines, are beaten. Let me try to explain why with an example. Take the example of a customer such as myself who uses his bank account to pay his mortgage at the end of the month, his telephone bill, his taxes, etc. If suddenly a transaction is entered on the system that attempts to withdraw 20,000 CHF from my account and send it to Nigeria, it is an anomaly. It should be blocked by the system. This should not happen.

Now let’s take another example, the example of a customer who is responsible for acquisitions for a big corporation. This person travels all around the world and pays providers large amounts from the corporate account. In this case, it is, on the contrary, a small transaction sending money to a counterparty in Switzerland that should be blocked. That’s the anomaly, right?

If you want to represent all the different situations of all the banking customers with rules, you end up defining hundreds of thousands of rules in your system, which is impossible. So only the most common rules can be defined, and as a consequence a lot of frauds slip through the cracks. And you know, to catch the bigger frauds you have to set the limits in these rules sufficiently low, resulting in a huge number of alerts generated by the system and requiring banking institutions to employ an army of analysts to investigate them. All of that has huge financial impacts on the banks. Fraud is dead money, and the analysts should be able to focus on tasks with more added value. And I don’t need to explain what consequences the Bangladesh case had on the reputation of the Bangladesh Central Bank.
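To make this concrete, here is a toy sketch in Python of the kind of static rule such engines evaluate. The field names and thresholds are hypothetical, not actual NetGuardians rules; the point is how one fixed limit both misses small frauds and flags legitimate large payments:

```python
# A hypothetical static rule of the kind a rule engine evaluates:
# flag any transfer above a fixed limit to a foreign counterparty.
def rule_flags(tx):
    return tx["amount"] > 10_000 and tx["country"] != "CH"

# A 9,500 CHF fraudulent transfer from a retail account stays under
# the limit and slips through the cracks (a false negative).
retail_fraud = {"amount": 9_500, "country": "NG"}

# A routine 80,000 CHF corporate payment abroad trips the rule and
# generates an alert an analyst must investigate (a false positive).
exec_payment = {"amount": 80_000, "country": "US"}

rule_flags(retail_fraud)  # False: the fraud passes
rule_flags(exec_payment)  # True: a legitimate payment raises an alert
```

Lowering the limit to catch the small fraud would multiply the alerts on legitimate large payments, which is exactly the trade-off described above.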

Something else is required to protect financial institutions.

Artificial intelligence and machine learning provide the solution. In 2016, we started to deploy our first machine learning approaches within our system at NetGuardians. Our initial idea was to use the machine to analyse a very deep history of data, a very deep history of transactions, to learn the habits and behaviours of individuals, customers or employees, and to build dynamic profiles capturing these habits and behaviours. Then each and every transaction entered in the system, be it an internal transaction, a credit card transaction, a banking transaction, etc., is compared with the profile of the customer or the user, and the machine computes a global risk score and takes a decision based on it: either let the transaction pass through, or block the transaction and register it for further investigation by an analyst.
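As a minimal illustration of this idea, here is a toy Python sketch (the actual NetGuardians profiles and scoring are far richer; the features, weights and threshold below are invented for the example): a profile captures the customer's typical amounts and known beneficiaries, and the risk score combines deviations from both:

```python
import statistics

def build_profile(history):
    """Build a dynamic profile from a customer's transaction history."""
    amounts = [t["amount"] for t in history]
    return {"mean": statistics.mean(amounts),
            "std": statistics.pstdev(amounts) or 1.0,
            "beneficiaries": {t["beneficiary"] for t in history}}

def risk_score(profile, tx):
    """Combine an amount z-score with a new-beneficiary penalty."""
    z = abs(tx["amount"] - profile["mean"]) / profile["std"]
    new_benef = 0.0 if tx["beneficiary"] in profile["beneficiaries"] else 1.0
    return z + 5.0 * new_benef  # weights are arbitrary for the sketch

def decide(profile, tx, threshold=8.0):
    """Let the transaction through, or block it for investigation."""
    return "block" if risk_score(profile, tx) > threshold else "pass"

# Habits of a retail customer: mortgage, telephone bill, taxes...
history = [{"amount": a, "beneficiary": b}
           for a, b in [(1800, "mortgage"), (90, "telecom"), (600, "taxes"),
                        (1800, "mortgage"), (85, "telecom")]]
profile = build_profile(history)

decide(profile, {"amount": 20_000, "beneficiary": "nigeria-acct"})  # "block"
decide(profile, {"amount": 1_800, "beneficiary": "mortgage"})       # "pass"
```

The 20,000 CHF transfer to an unknown counterparty is far outside the learned habits and gets blocked, while the usual mortgage payment passes.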

With this technique, using artificial intelligence and profiling to detect suspicious transactions, we have been able to significantly improve the situation of our customers. It was a game-changing paradigm. We could drastically reduce the number of fraud cases passing through, while cutting the number of cases to be investigated by analysts to a third. Not only could we reduce the number of cases to be investigated, we could also reduce the time required to investigate one case by 80%. And finally, the number of reconfirmations asked of bank customers could be reduced to a quarter of what it was before. This has obvious benefits for the financial institutions: operational efficiency, financial gains, reputation and so on.

And then we figured we still had an issue. Let me give you an example. If tomorrow I buy a new car, an Audi, that is a 60,000 CHF transaction that leaves my account for a new beneficiary, Amag Audi Switzerland, that I have never used before. The machine will qualify it as an anomaly, since I have never used such a counterparty before and have never had such a big transaction on my account.

So my money would be blocked, and I would be annoyed.

So we figured that sometimes it is necessary to broaden the view of the artificial intelligence. The idea we had in 2017 was the following: use clustering techniques to group together individuals having the same transaction behaviours, building so-called peer groups, and of course maintaining these peer groups up to date in real time with big data technologies. If you look at the Audi example, the machine, looking at my profile alone, will of course think it is an anomaly. But if the machine looks at the 200 or 300 customers that have had the same kind of behaviour as myself, it will find out that people buy new Audis every day. It is not an anomaly. The machine will find out that it is a legitimate transaction and release it.
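Here is a minimal Python sketch of the idea, with toy data, a bare-bones k-means and an invented peer-group check (the real system uses big data technologies, real-time updates and far richer behaviour features): cluster customers by behaviour, then ask whether a flagged transaction is ordinary within the customer's peer group:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Bare-bones k-means: assign each row of X to one of k clusters."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy behaviour vector per customer: (average amount, transactions/month).
X = np.array([[500, 10], [600, 12], [550, 11],                   # retail customers
              [40_000, 3], [55_000, 4], [48_000, 2]], dtype=float)  # corporate buyers
labels = kmeans(X, k=2)  # retail and corporate end up in separate peer groups

def peers_confirm(amount, peer_amounts, tolerance=1.5):
    """Release an alert if peers routinely transact at this scale."""
    return amount <= tolerance * np.percentile(peer_amounts, 90)

# The 60,000 CHF Audi payment looks anomalous against my own profile,
# but among car-buying peers it is perfectly ordinary and gets released.
peers_confirm(60_000, [45_000, 52_000, 61_000, 58_000])  # True: released
peers_confirm(60_000, [500, 600, 550])                   # False: stays blocked
```

The design point is that the profile answers "is this normal for me?" while the peer group answers "is this normal for people like me?", and only when both say no does the alert stand.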

The big impact of this new approach is that it has helped us significantly reduce the number of cases to be investigated: the cases we call false positives, wrong alerts that still need to be generated if we want to catch the true alerts, the true positives.

And then we figured: profiling and clustering work great, but can we do something better? Because we still have a problem here: we work after the fact. To analyse a financial transaction, we need the transaction to be entered on the system before we can qualify it.

So our idea was: can we analyse each and every little trace of interaction between the individuals, customers and employees, and the banking information system, and qualify them as legitimate or potentially fraudulent, even before a transaction is entered on the system?
This has required significantly different analysis techniques. Let me give you an example, with an e-banking application.

Imagine the situation of a legitimate user such as myself logging into the e-banking platform, checking the account balance, entering the first payment, validating it, a second payment, validating it, a third payment, validating it. Then I check my pending orders to make sure I haven’t forgotten anything, and if everything is fine, I quit the application.

Now imagine that a worm successfully hijacks the e-banking session. The worm will have a completely different behaviour. It will likely go very fast from login to payment input to payment validation and log out. And here I am only focusing on transitions, but what if we also consider the user’s thinking time in front of the screen? What if we consider the speed at which the user types on the keyboard?

And our idea has been to let the machine build a probabilistic model of the “path-to-action”, using probabilistic learning to discover the usual path leading to a specific action or interaction of the banking employee or customer.
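The talk doesn't name the exact model, but a first-order Markov chain over session actions is one minimal way to sketch such a path-to-action model in Python (the sessions, action names and thresholds below are invented for the example):

```python
from collections import defaultdict
import math

def train_transitions(sessions):
    """Estimate transition probabilities between actions from legitimate sessions."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in sessions:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

def path_likelihood(model, session, floor=1e-6):
    """Average log-probability of the observed path; very low values are suspicious."""
    logp = sum(math.log(model.get(a, {}).get(b, floor))
               for a, b in zip(session, session[1:]))
    return logp / max(len(session) - 1, 1)

# Toy legitimate sessions: balance check, careful payments, pending orders, logout.
legit = [["login", "balance", "payment", "validate", "payment", "validate", "pending", "logout"],
         ["login", "balance", "pending", "logout"],
         ["login", "balance", "payment", "validate", "pending", "logout"]]
model = train_transitions(legit)

# A normal session follows well-trodden transitions; a hijacked session
# jumps straight from login to payment, a path never seen in training.
normal = path_likelihood(model, ["login", "balance", "payment", "validate", "pending", "logout"])
worm   = path_likelihood(model, ["login", "payment", "validate", "logout"])
```

Here `normal` stays close to zero while `worm` collapses towards the floor probability, so the session can be flagged before any transaction is validated. A production model would add timing and typing-speed features, as described above.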

With this new approach, we are able to detect fraud before it happens. We are able, for instance, to qualify an e-banking session as potentially fraudulent before a transaction is entered into the system. And this way we protect the privacy and the data of banking institutions’ customers.

My conclusion on all of this would be as follows. Today, the reality at our customers is the following: artificial intelligence monitors each and every little trace of interaction between the employees and customers and the banking information system, in addition to financial transactions, to secure banks and customers.

This is what AI does for us.

A little note that I cannot refrain from making is about science-fiction vs reality. Today, science-fiction is way ahead of reality when it comes to artificial intelligence. And you know, maybe because of Elon Musk, or Hawking, or Hollywood, the public imagines AI as a supercomputer that will want to rule the world.

So, let’s clarify something.

If we define weak artificial intelligence as an intelligence able to optimize a mathematical function, find a solution to a problem in a very strict and given context, or answer a question in a very strict and given context, then we can call strong artificial intelligence an intelligence able to contextualise, an intelligence able to show sensitivity or emotions.

While the progress in the world of weak artificial intelligence is today very impressive, amazing and tremendous, let’s agree that strong artificial intelligence is really science-fiction. We do not have today the slightest shred of proof that one day we will be able to build a strong artificial intelligence. This does not mean that artificial intelligence is not interesting; on the contrary, the applications are amazing.


Let me give you one perspective on that. If you consider the game of chess, for instance, artificial intelligence has been beating chess masters for quite a long time now. But today it is what we call the centaurs, sometimes average chess players, amateur players, augmented with artificial intelligence, half-man, half-machine, who win all the freestyle games on the internet. Today, this technology gives the best results not when it replaces the human decision process but when it supports it. This is called augmented intelligence. And augmented intelligence is precisely what we do at NetGuardians: providing bankers with the means to detect and prevent fraud more efficiently.

Thank you very much for listening.