Fraud Detection in Banking Data by Machine Learning Techniques

Arisetty Suresh Babu; Pilla Devi Prasanna

ABSTRACT
As technology has advanced and e-commerce services have expanded, credit cards have become one of the most popular payment methods, increasing the volume of banking transactions. The accompanying rise in fraud imposes high costs on banking transactions, so detecting fraudulent activity has become a topic of considerable interest. In this study, we use class weight-tuning hyperparameters to control the relative weight of fraudulent and legitimate transactions. In particular, we use Bayesian optimization to tune the hyperparameters while accounting for practical issues such as unbalanced data. We propose weight-tuning as a pre-processing step for unbalanced data, and we use CatBoost and XGBoost together with a voting mechanism to improve on the performance of LightGBM. Finally, to improve performance further, we use deep learning to fine-tune the hyperparameters, particularly the proposed weight-tuning one. We run experiments on real-world data to test the proposed methods. To better cover unbalanced datasets, we report recall-precision metrics in addition to the standard ROC-AUC. CatBoost, LightGBM, and XGBoost are evaluated separately using 5-fold cross-validation, and a majority-voting ensemble is used to assess the performance of the combined algorithms. According to the results, LightGBM and XGBoost achieve the best scores: ROC-AUC = 0.95, precision = 0.79, recall = 0.80, F1 score = 0.79, and MCC = 0.79. Using deep learning and Bayesian optimization to tune the hyperparameters, we also achieve ROC-AUC = 0.94, precision = 0.80, recall = 0.82, F1 score = 0.81, and MCC = 0.81. This is a significant improvement over the state-of-the-art methods we compared against.

EXISTING SYSTEM
Halvaiee and Akbari study a new model called the AIS-based fraud detection model (AFDM). They use the Immune System Inspired Algorithm (AIRS) to improve fraud detection accuracy.
The results presented in their paper show that the proposed AFDM improves accuracy by up to 25%, reduces costs by up to 85%, and reduces system response time by up to 40% compared to basic algorithms [11]. Bahnsen et al. develop a transaction aggregation strategy and create a new set of features based on an analysis of the periodic behaviour of the transaction time, using the von Mises distribution. In addition, they propose a new cost-based criterion for evaluating credit card fraud detection models and then, using a real credit card dataset, examine how different feature sets affect the results. More precisely, they extend the transaction aggregation strategy to create new offers based on an analysis of the periodic behaviour of transactions [12]. Randhawa et al. study the application of machine learning algorithms to credit card fraud detection. They first use standard models, namely Naïve Bayes, random forest, decision trees, neural networks, linear regression (LR), logistic regression, and support vector machines, to evaluate the available datasets. They then propose a hybrid method that applies AdaBoost and majority voting, and they add noise to the data samples to evaluate robustness. Their experiments on publicly available datasets show that majority voting is effective in detecting credit card fraud cases [6]. Porwal and Mukund propose an approach that uses clustering methods to detect outliers in a large dataset and is resistant to changing patterns [13]. Their approach rests on the assumption that the good behaviour of users does not change over time and that the data points representing good behaviour have a consistent spatial signature under different groupings. They show that fraudulent behaviour can be detected by identifying changes in this data, and that the area under the precision-recall curve is a better evaluation criterion than ROC [13].
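The point about precision-recall versus ROC on unbalanced data can be illustrated with a small sketch. The labels and scores below are made up for illustration, and the helper names `roc_auc` and `average_precision` are hypothetical; this is not code from any of the cited works. The average-precision helper assumes that no fraud shares a score with a legitimate transaction.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the chance that a random fraud outranks a random legitimate case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each fraud case."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy data (assumed): 98 legitimate transactions and 2 frauds.
legit_scores = [0.95, 0.85, 0.80, 0.75, 0.70] + [0.10] * 93
fraud_scores = [0.90, 0.60]
scores = legit_scores + fraud_scores
labels = [0] * len(legit_scores) + [1] * len(fraud_scores)

print(f"ROC-AUC:           {roc_auc(labels, scores):.3f}")
print(f"Average precision: {average_precision(labels, scores):.3f}")
```

On this toy data ROC-AUC is about 0.97 while average precision is about 0.39. The handful of legitimate transactions that outrank the frauds barely moves ROC-AUC, because they are a small share of the 98 negatives, but they make up most of the top-ranked alerts, which is what the precision-recall view measures.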
The authors in [14] propose a group learning framework based on partitioning and clustering of the training set. Their framework has two goals: 1) to ensure the integrity of the sample features, and 2) to address the high imbalance of the dataset. Its main feature is that every base estimator can be trained in parallel, which improves the effectiveness of the framework. Itoo et al. use three different dataset ratios and an oversampling method to deal with the problem of data imbalance. They use three machine learning algorithms: logistic regression, Naive Bayes, and K-nearest neighbor. The performance of the algorithms is measured by accuracy, sensitivity, specificity, precision, F1-score, and area under the curve. They show that the logistic regression-based model outperforms the other commonly used fraud detection algorithms considered in the paper [15]. The authors in [16] propose a framework that combines meta-learning ensemble techniques with a cost-sensitive learning paradigm for fraud detection. Their results on classifying unseen data show that the cost-sensitive ensemble classifier achieves an acceptable AUC value and is efficient compared to ordinary ensemble classifiers. Altyeb et al. propose an intelligent approach for detecting fraud in credit card transactions [17]. They use a Bayesian-based hyperparameter optimization algorithm to tune the parameters of LightGBM and perform experiments on publicly available credit card transaction datasets, which consist of fraudulent and legitimate transactions. Their evaluation results are reported in terms of accuracy, area under the receiver operating characteristic curve (ROC-AUC), precision, and F1-score. Xiong et al. propose a learning-based approach to tackle the fraud detection problem.
They use feature engineering techniques to boost the proposed model's performance. The model is trained and evaluated on the IEEE-CIS fraud dataset. Their experiments show that the model outperforms traditional machine-learning-based methods such as Bayes and SVM on this dataset [18]. Viram et al. evaluate the performance of Naive Bayes and voting classifier algorithms. They demonstrate that, in terms of the evaluated metrics and particularly accuracy, the voting classifier outperforms the Naive Bayes algorithm [19].

Disadvantages
The existing system does not use a sequential model, that is, a linear stack of layers, to construct an artificial neural network. Our model uses the dense class, a very common and frequently used layer type. The existing system also does not implement a majority voting model, which makes it less effective.

Proposed System
The proposed system is an efficient approach for detecting credit card fraud. It has been evaluated on publicly available datasets using the optimized algorithms SVM and logistic regression individually, majority-voting combinations of methods, and deep learning with hyperparameter settings. An ideal fraud detection system should detect as many fraudulent cases as possible, and it should do so with high precision, i.e., the cases it flags should be correctly detected. This builds customers' trust in the bank and, at the same time, protects the bank from losses due to incorrect detection.
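A minimal sketch of the majority-voting combination follows. The three prediction lists stand in for the outputs of three already-trained classifiers; their values and the function name `majority_vote` are illustrative assumptions, not results or code from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by hard voting."""
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    combined = []
    for votes in zip(*predictions):
        # most_common(1) yields the label that received the most votes
        label, _ = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical 0/1 predictions (1 = fraud) from three models on five transactions.
model_a = [0, 1, 1, 0, 1]
model_b = [0, 1, 0, 0, 1]
model_c = [1, 1, 0, 0, 0]

print(majority_vote([model_a, model_b, model_c]))  # [0, 1, 0, 0, 1]
```

With an odd number of binary classifiers a tie cannot occur, which is one reason three models are a convenient choice for hard voting.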
Advantages
➢ We adopt Bayesian optimization for fraud detection and propose using the weight-tuning hyperparameter as a pre-processing step to address the unbalanced data issue. We also suggest using CatBoost and XGBoost alongside LightGBM to improve performance. We use the XGBoost algorithm because of its high training speed on big data and its regularization term, which counters overfitting by measuring the complexity of the tree; it also does not require much time to set the hyperparameters. We use the CatBoost algorithm because it needs no hyperparameter adjustment for overfitting control and obtains good results without hyperparameter changes compared to other machine learning algorithms.
➢ We propose a majority-voting ensemble learning approach to combine CatBoost, XGBoost, and LightGBM, and we review the effect of the combined methods on fraud detection performance on real, unbalanced data. We also propose using deep learning to adjust and fine-tune the hyperparameters.
➢ To evaluate the performance of the proposed methods, we perform extensive experiments on real-world data. To better cover unbalanced datasets, we use recall-precision in addition to the commonly used ROC-AUC. We also evaluate performance using the F1 score and MCC metrics. According to the results, the proposed methods outperform the existing and baseline methods. For the evaluations, we use publicly available datasets and also publish the source code with public access for use by other researchers.

SYSTEM REQUIREMENTS
H/W System Configuration:
➢ Processor - Pentium IV
➢ RAM - 4 GB (min)
➢ Hard Disk - 20 GB
➢ Keyboard - Standard Windows Keyboard
➢ Mouse - Two or Three Button Mouse
➢ Monitor - SVGA

SOFTWARE REQUIREMENTS:
Operating system : Windows 7 Ultimate
Coding Language : Python
Front-End : Python
Back-End : Django-ORM
Designing : HTML, CSS, JavaScript
Data Base : MySQL (WAMP Server)
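The evaluation metrics named under Advantages (precision, recall, F1 score, and MCC) can all be derived from the confusion matrix. The sketch below uses made-up labels and a hypothetical helper `fraud_metrics`; it only illustrates the standard definitions and does not reproduce the paper's results.

```python
import math


def fraud_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, F1 and MCC for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Toy data (assumed): 4 frauds and 16 legitimate transactions.
y_true = [1, 1, 1, 1] + [0] * 16
# The classifier misses one fraud and raises one false alarm.
y_pred = [1, 1, 1, 0] + [1] + [0] * 15

for name, value in fraud_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.4f}")
```

On this toy data accuracy is 0.90, while precision, recall, and F1 are each 0.75 and MCC is 0.6875. Accuracy is inflated by the large number of legitimate transactions, which is why metrics focused on the fraud class are reported for unbalanced data.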

Published in:

UGC and ISSN approved 7.95 impact factor UGC Approved Journal no 63975
