It's nuts that I have to write this job post, but here goes:
A developer has been doing some work for me, and is 95% - 99% finished, but now he says he doesn't have time to finish and has left the job. We were very close to 'handover' -- there are just a couple of outstanding issues, probably an hour of work left to do (or less!).
The job is a scraper that visits site #1 (a list of books), scrapes some details, then visits Amazon and scrapes some more details, then drops all the details into a csv file on my server.
That scraper has been done, and a sample data file is in the folder on my server.
It was created with PHP. It's finished, but there are a couple of issues:
= I don't think the cron job is set up correctly, because the scraper isn't running each day
= the scrape results show that some rows (only some of them) are missing info, so the scraper is occasionally stopping partway through its scrape, which needs to be corrected
THAT'S ALL
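To help pin down the missing-info issue, here is a minimal sketch (in Python, purely as an illustration; the existing scraper is PHP) of a check that lists which rows in the csv have empty fields. The column names in the example are hypothetical and not taken from my real file.

```python
import csv
import io


def find_incomplete_rows(csv_text: str) -> list[int]:
    """Return 1-based data-row numbers (header excluded) that have an empty field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    incomplete = []
    for row_number, row in enumerate(reader, start=1):
        # A short row yields None for its missing columns; treat None and blank alike.
        if any(value is None or not str(value).strip() for value in row.values()):
            incomplete.append(row_number)
    return incomplete


# Hypothetical sample: row 2 has a blank author, row 3 is cut short.
sample = "title,author,price\nBook A,Ann,0.99\nBook B,,1.99\nBook C,Cy\n"
print(find_incomplete_rows(sample))
```

Running something like this against the sample data file should show whether the gaps cluster at the end of a run (the scraper stopped) or are scattered (individual pages failed).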
The second part of the job was to set up the WP All Import plugin to 'capture' the scrape details from the csv file. I have done 99% of it but there are a couple of things I cannot figure out:
= I can't figure out how to 'match' the imported categories to the existing categories on my site (my tests are creating new categories and I don't want that)
= I can't figure out how to make all results set today's date (the current day's date) in the date fields (I think it's just a matter of adding a small snippet of PHP to the field)
= And one other thing I can explain to the person doing this
THAT'S ALL
It's very surprising the developer dropped the job, as it really wouldn't take him much time to finish. But that's the situation.
-----------------------------------------------
ORIGINAL JOB DESCRIPTION BELOW:
I have installed WP All Import (WPAI) on my WordPress website
https://www.wpallimport.com/
My site uses Advanced Custom Fields (ACF) so I have also installed the ACF Addon for WPAI
My site is on a webhosting vps account over at webhostpython com
My website is a book deals website, which lists the best book deals every day
I find approx 1/3rd of the deals listed on a 3rd party book site: website 'x'
I need a scraper that will visit website 'x', grab its list of books (and some details), then visit each book's Amazon page, and scrape each book's details from Amazon
The scraper will then drop a csv file on my server with ALL the scraped details (from both sites)
In Summary, there are 3 steps:
#1 visit website 'x', collect book details
#2 visit Amazon pages for each book and scrape Amazon book details
#3 create updateable csv file on server
THEN
Ensure WP All Import captures details/fields correctly (see below, most of this is already done)
IN DETAIL:
#1 Website 'x' does not have an RSS feed but updates daily.
= therefore, when the scraper/script visits the site it must keep a record of ALL books already scraped, so that it ONLY scrapes new books (today's books) each day.
NOTE: Website 'x' is a privately owned website and I don't have permission to scrape it, so please be respectful: I do not want to hammer their resources or be flagged in any way.
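A minimal sketch of the 'only scrape new books' idea, assuming the record is a small JSON file of Amazon links already scraped. The file name, function names and links are hypothetical, and Python is used only for illustration; the same approach works in PHP.

```python
import json
from pathlib import Path


def load_seen(path: Path) -> set[str]:
    """Load the set of links scraped on earlier runs (empty on the first run)."""
    if path.exists():
        return set(json.loads(path.read_text(encoding="utf-8")))
    return set()


def select_new(links: list[str], seen: set[str]) -> list[str]:
    """Keep only links not scraped before, preserving order and dropping repeats."""
    new_links = []
    for link in links:
        if link not in seen and link not in new_links:
            new_links.append(link)
    return new_links


def save_seen(path: Path, seen: set[str]) -> None:
    """Write the updated record back to disk for the next daily run."""
    path.write_text(json.dumps(sorted(seen)), encoding="utf-8")
```

Each daily run would load the record, scrape only what select_new returns, then save the record with the new links added. A pause between requests (for example with time.sleep) would fit naturally in the scraping loop to keep the load on website 'x' low.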
#2 The site is only a 'listing' of books, with links to book pages on Amazon, so the scraper must go to the site, collect links to Amazon, then visit those links and scrape each Amazon page.
+ scraper will need to visit multiple pages on the site
I have manually captured the required details from BOTH site 'x' and Amazon and created a sample csv file which you can use (all columns are correctly labelled, etc.).
I will provide EXACT details of what to scrape, with marked-up screenshots. All communications will be very clear and easy to understand.
I don't mind what platform is used to build the scraper/submitter (PHP, Python, etc.) so long as it scrapes and submits.
WP All Import:
If you have experience with WP All Import, that is good, but probably not necessary. I have been able to figure out 90% of the necessary settings.
There are a couple of settings I cannot figure out, which may need a PHP function added. I would like you to check/edit the settings to ensure the import runs smoothly/correctly.
EG #1: I can't figure out how to match categories from the csv to the existing categories on my site!
EG #2: A custom field needs to display a date, and I don't know how to do that
EG #3: Some book titles need small text changes such as this title:
'Hunt for Justice Box Set: Books 1–2'
= IF a book title contains a colon ':' THEN remove the colon and ALL that follows it
So, title will now display as: 'Hunt for Justice Box Set'
ALSO
= IF a book title contains the words 'Box Set' THEN change 'Box' to 'Boxed' and surround the words with brackets
So, title will now display as: 'Hunt for Justice (Boxed Set)'
ALSO
= IF a title contains the words 'Omnibus', or 'Complete Omnibus', or 'The Complete Omnibus' THEN change to 'Omnibus' and surround with brackets
So, title will display as: 'book title (Omnibus)'
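As a rough illustration of the three title rules above, here is a minimal sketch in Python. It assumes the rules are applied in the order listed (colon first, then 'Box Set', then 'Omnibus') and that matching ignores upper/lower case; the function name is hypothetical. Inside WP All Import itself the same logic would need to be written as a PHP function, so treat this only as a statement of the expected behaviour.

```python
import re


def clean_title(title: str) -> str:
    """Apply the three title rules in order and tidy up spacing."""
    # Rule 1: remove the first colon and everything after it.
    title = title.split(":", 1)[0]
    # Rule 2: 'Box Set' becomes '(Boxed Set)'.
    title = re.sub(r"\bBox Set\b", "(Boxed Set)", title, flags=re.IGNORECASE)
    # Rule 3: 'Omnibus', 'Complete Omnibus' or 'The Complete Omnibus' becomes '(Omnibus)'.
    title = re.sub(
        r"\b(?:(?:The\s+)?Complete\s+)?Omnibus\b",
        "(Omnibus)",
        title,
        flags=re.IGNORECASE,
    )
    # Collapse any doubled spaces left behind and trim the ends.
    return re.sub(r"\s+", " ", title).strip()


print(clean_title("Hunt for Justice Box Set: Books 1–2"))
```

With the example title from above, this prints: Hunt for Justice (Boxed Set)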
Budget: $30
Posted On: March 01, 2021 11:23 UTC
Category: Data Extraction
Skills: Web Scraper, PHP, SQL, Data Scraping
Country: United Kingdom