After discussions with a variety of developers, I decided to change some of the specifications for this project for greater clarity and simplicity.
IN BRIEF:
I have a VPS web hosting account with webhostpython.com.
I have a couple of WordPress-based book websites on that hosting account.
There is a book-deals website that lists some great book deals every day.
I need a scraper that will visit that site, grab its list of books, visit each book's Amazon page, and scrape each book's details.
The scraper will then submit each book's details to my two book sites as individual posts.
I will create an account on my VPS with its own MySQL database(s) and whatever else is required to host the scraper.
I don't mind which platform is used to build the scraper/submitter (PHP, Python, etc.), as long as it scrapes and submits.
SUBMISSION STRATEGIES:
In previous discussions, the submission part of this project seemed to cause unexpected difficulties for scraping specialists. WordPress seemed to be the main obstacle: submitting directly into WordPress was the main problem for developers who are not WordPress specialists.
To make this (perhaps) easier, I would like to suggest an alternative:
Both of my sites use the Gravity Forms plugin so that authors can submit their book details manually. Gravity Forms is a sophisticated forms solution.
My current idea is to create a new hidden form that only the scraper/submitter can access. The submitter would send the book information directly to that form, and the Gravity Forms plugin would handle the WordPress side of things.
This may be a slightly less elegant solution than accessing WordPress directly, but I think it could simplify the job substantially, unless you have a better idea!
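One way a submitter could reach such a hidden form is the Gravity Forms REST API (v2). If that API is enabled on both sites, it exposes a form-submission endpoint of the form `/wp-json/gf/v2/forms/<form_id>/submissions`, which takes field values keyed as `input_<field_id>`. The minimal Python sketch below only builds that request. The form ID, the field IDs in `FIELD_MAP`, the site URL and the function name are all hypothetical and would have to match the real hidden form. Authentication and spam-protection settings are not covered.

```python
import json
import urllib.request

# Hypothetical mapping from book attributes to the hidden form's field IDs.
# The real IDs are shown in the Gravity Forms form editor.
FIELD_MAP = {"title": 1, "author": 2, "description": 3, "amazon_url": 4}


def build_submission_request(site_url: str, form_id: int, book: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the assumed Gravity Forms v2 submissions endpoint."""
    # Missing book attributes are sent as empty strings.
    payload = {f"input_{field_id}": book.get(key, "") for key, field_id in FIELD_MAP.items()}
    url = f"{site_url.rstrip('/')}/wp-json/gf/v2/forms/{form_id}/submissions"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical site URL and form ID.
req = build_submission_request("https://books-one.example", 7, {"title": "Example", "author": "A. Writer"})
```

The request could then be sent with `urllib.request.urlopen(req)`, once for each of the two sites.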
IN DETAIL:
I have a couple of WordPress-based book websites on which I publish 30-50 books every day.
I find many of these books on one particular site that lists new books every day.
So it would save me time to have an automation that scrapes this one site for new books each day and then submits them to my sites.
Project in summary:
I want a script/scraper that can visit this site, collect a list of all the books listed each day, visit the Amazon page for each book, scrape each book's details, and submit them to my sites.
(I will not post the name of the website I want to scrape here, but I can send it to you if you are interested.)
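The daily flow described above can be sketched as a single loop. This is a minimal Python sketch under stated assumptions, not a finished design. `run_daily`, `scrape_book` and the submitter callables are hypothetical names. The Amazon-page scraping is left as a pluggable function, and the record of seen links is an in-memory set that a real run would need to persist.

```python
from __future__ import annotations

from typing import Callable, Iterable


def run_daily(
    amazon_links: Iterable[str],
    seen: set[str],
    scrape_book: Callable[[str], dict],
    submitters: list[Callable[[dict], None]],
) -> list[dict]:
    """Scrape every not-yet-seen Amazon link once and submit the result to each site."""
    posted: list[dict] = []
    for link in amazon_links:
        if link in seen:
            continue  # handled on an earlier day (or earlier in this listing)
        book = scrape_book(link)  # hypothetical Amazon-page scraper
        for submit in submitters:  # one submitter per target site
            submit(book)
        seen.add(link)  # recorded only if every submitter returned without raising
        posted.append(book)
    return posted
```

In a test run, `scrape_book` and the submitters can be stubs, which keeps the Amazon and WordPress parts swappable.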
Project Details:
It is very important to understand these two elements of the project:
#1 The site I want to scrape does not have an RSS feed, but it updates daily.
Therefore, whenever the scraper visits the site, it must keep a record of ALL books it has scraped, so that it ONLY scrapes new books (today's books) each day.
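Such a record can be a single table with a unique key per book. The sketch below uses Python's built-in `sqlite3` so that it runs on its own. On the VPS, the same table could live in the MySQL database instead. The `SeenBooks` class, the `seen_books.db` file name, and keying books by Amazon ASIN are all assumptions for illustration.

```python
from __future__ import annotations

import sqlite3
from typing import Iterable


class SeenBooks:
    """Persistent record of books already scraped, keyed by an ID such as the ASIN (assumed)."""

    def __init__(self, path: str = "seen_books.db") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS seen ("
            " book_id TEXT PRIMARY KEY,"
            " first_seen TEXT DEFAULT CURRENT_TIMESTAMP)"
        )
        self.conn.commit()

    def is_new(self, book_id: str) -> bool:
        row = self.conn.execute("SELECT 1 FROM seen WHERE book_id = ?", (book_id,)).fetchone()
        return row is None

    def mark_seen(self, book_id: str) -> None:
        # INSERT OR IGNORE makes marking the same book twice harmless.
        self.conn.execute("INSERT OR IGNORE INTO seen (book_id) VALUES (?)", (book_id,))
        self.conn.commit()

    def filter_new(self, book_ids: Iterable[str]) -> list[str]:
        # dict.fromkeys drops duplicates within today's listing while keeping order.
        return [b for b in dict.fromkeys(book_ids) if self.is_new(b)]
```

A daily run would call `filter_new` on today's IDs and call `mark_seen` only after a book has been submitted to both sites.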
#2 The site is only a listing of books with links to their Amazon pages, so the scraper must go to the site, collect the Amazon links, then visit each link and scrape that Amazon page.
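An HTML parser can collect the Amazon links from the listing page. This sketch uses Python's standard `html.parser` and reduces each Amazon product link to its 10-character ASIN, which also gives a stable key for the record of scraped books. It assumes the listing links directly to Amazon product URLs that contain `/dp/` or `/gp/product/`. Shortened links (such as amzn.to) or affiliate redirects would need extra handling. The class and function names are hypothetical.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser

# Amazon product URLs usually carry the 10-character ASIN after /dp/ or /gp/product/.
ASIN_RE = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?]|$)")


class AmazonLinkCollector(HTMLParser):
    """Collect unique ASINs, in page order, from <a href> links to Amazon product pages."""

    def __init__(self) -> None:
        super().__init__()
        self.asins: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if "amazon." not in href:
            return  # ignore non-Amazon links on the listing page
        match = ASIN_RE.search(href)
        if match and match.group(1) not in self.asins:
            self.asins.append(match.group(1))


def amazon_asins(listing_html: str) -> list[str]:
    parser = AmazonLinkCollector()
    parser.feed(listing_html)
    parser.close()
    return parser.asins
```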
Additional info:
I can provide additional precise details to the person who builds this. I have worked with a few developers on this kind of scraper/submitter, so I already know most of the details and issues involved. This should therefore be a fairly simple job for someone who knows what they are doing.
Future Projects:
There are another 3-5 sites I want to scrape and submit from in the same way, so there may be further similar projects for the right worker. For that reason, the solution should be scalable.
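One way to keep the solution scalable as the other 3-5 sites are added is to describe each source site as data rather than code, so that a new source is just a new config entry. The sketch below is purely illustrative. `SourceSite`, `SOURCES`, `jobs`, the URLs and the form IDs are hypothetical placeholders. Each source would still need its own link-extraction rules if its HTML differs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSite:
    """One listing site to scrape and the WordPress sites that receive its books."""
    name: str
    listing_url: str
    # (target site base URL, Gravity Forms form ID) pairs; values are placeholders.
    targets: tuple[tuple[str, int], ...] = ()


# Hypothetical entry: adding a future source means appending another SourceSite.
SOURCES = [
    SourceSite(
        name="book-deals",
        listing_url="https://listing.example/today",
        targets=(("https://books-one.example", 7), ("https://books-two.example", 7)),
    ),
]


def jobs(sources: list[SourceSite]) -> list[tuple[str, str, int]]:
    """Expand the config into (listing_url, target_site, form_id) work items."""
    return [
        (src.listing_url, site, form_id)
        for src in sources
        for site, form_id in src.targets
    ]
```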
Please note that these are book sites and they are not high earners. I do not have large budgets for any of my work, so I am looking for low-cost solutions. But I write very good reviews for good workers!
Please write to me with an accurate estimate of how long this would take you, when you can start, and how much you will charge. I will provide as many details as possible so that the job has no unexpected surprises.
Thank you.
Budget: $70
Posted On: February 18, 2021 15:05 UTC
Category: Data Extraction
Skills: Web Scraper