Most of these links are in the format "domain.com/feed" or "domain.com/category/category-name/feed"
I'd need 3 things:
- find all categories on each website and get the feed urls for each individual category
- get the publishing date of the most recent item in each feed
- save the URL, category name and date to a structured file (e.g. CSV) for later processing
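
To make the three steps concrete, here is a rough sketch in Python using only the standard library. It is an illustration under assumptions, not a required approach: the function names, the CSV column names and the `SAMPLE_FEED` data are hypothetical, the feeds are assumed to be RSS 2.0 with `<pubDate>` elements, and the category slugs are assumed to be already known (discovering them is site-specific and not shown). Fetching the feed over the network is also left out.

```python
import csv
import io
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample data standing in for a downloaded RSS 2.0 feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example</title>
  <item><title>Older</title><pubDate>Fri, 13 Aug 2021 09:00:00 +0000</pubDate></item>
  <item><title>Newer</title><pubDate>Sat, 14 Aug 2021 12:37:00 +0000</pubDate></item>
</channel></rss>"""


def category_feed_url(base_url, slug):
    """Build a URL in the 'domain.com/category/category-name/feed' format."""
    return f"{base_url.rstrip('/')}/category/{slug}/feed"


def latest_pub_date(feed_xml):
    """Return the newest <pubDate> among the feed's items, or None if there is none."""
    root = ET.fromstring(feed_xml)
    dates = []
    for item in root.iter("item"):
        text = item.findtext("pubDate")
        if text:
            # RSS 2.0 dates use the RFC 822 style that this parser understands.
            dates.append(parsedate_to_datetime(text.strip()))
    return max(dates) if dates else None


def write_rows(rows, fileobj):
    """Write (feed_url, category, date) tuples as CSV with a header row."""
    writer = csv.writer(fileobj)
    writer.writerow(["feed_url", "category", "latest_pub_date"])
    for url, category, date in rows:
        writer.writerow([url, category, date.isoformat() if date else ""])


if __name__ == "__main__":
    url = category_feed_url("https://example.com", "news")
    out = io.StringIO()
    write_rows([(url, "news", latest_pub_date(SAMPLE_FEED))], out)
    print(out.getvalue())
```

Taking the maximum date rather than the first item avoids relying on the feed being sorted newest-first. Atom feeds, or feeds that mix dates with and without time zones, would need extra handling.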
For someone familiar with Scrapy or a similar tool this should be a straightforward task. Please start your reply with the word bigkahuna so that I know you read the description.
Feel free to ask any questions. Will also share some sample URLs if you like. Thanks!
Posted On: August 14, 2021 12:37 UTC
Category: Data Extraction
Skills: Web Scraper, Data Scraping, Scrapy, Python, Scraper, Data Extraction, Data Mining, Web Crawler, JavaScript
Country: Germany
Project ID: 3197884