1001 Freelance Projects
Latest Projects from Freelance Marketplaces
Today is: 05-May-2024 15:46 GMT
View Project
View this project in detail (Note: you will be redirected to external marketplace)
Project title: Fix error in Python-based web scraper with GUI
Posted by: External project from PeoplePerHour
Started: 24-Apr-2024 19:27 GMT
Description: Hello Freelancers,
I'm looking for a developer familiar with web scraping and Python to fix an existing web scraper that scrapes product data from the category pages of the Italian e-commerce website www.yeppon.it.

The script basically works, but it gives an error at certain points while scraping. I think this is caused by a slight change in the website's structure, which triggers an error when the script tries to scrape a product's text description.

The goal of this project is to fix the error so the script works as it used to: scraping product data from the given category URLs and writing it out to CSV files. I don't think this will take much effort, because it is basically this one error that needs to be located and fixed; everything else still seems to work fine. Price can be discussed.

Some facts:

1. The web scraper is written in Python and has a GUI. Its final version comes as an .exe file, so I can't attach it to the project description; I will send it via messages or the workstream.
2. It scrapes certain product data (such as product name, price, description and image links) from category links entered into the GUI. The GUI also has some other input fields; these only take fixed strings, which are written unchanged into the CSV files that contain the product data.
3. The scraper technically still works; however, it gives an error when scraping certain categories. You can reproduce this by running the tool, filling out the input fields with the data explained in the "Instructions" tab of the tool, and then starting the scrape. It will produce this error (found in the log file):

--------------------------------------------------------------------------------------------------------------
2024-04-24 12:44:24,765:ERROR:'descriptionHtml'
Traceback (most recent call last):
File "async_scraper.py", line 739, in scrape
description_html = pdata["pageProps"]["product"][
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'descriptionHtml'
2024-04-24 12:44:24,765:INFO: Finally
2024-04-24 12:44:24,765:INFO:


No data found!


------------------------------------------------------------------------------------------------------------------
4. The error seems to occur when scraping a product's text description. The text description is made up of up to three elements:
- bullet points formatted into a ul element
- a text description that is cleaned (HTML code removed or replaced)
- data scraped from a table on the website and inserted into a given HTML structure
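Since the description is assembled from up to three optional parts, one way to make it robust is to build each part independently, so that a missing source only drops that part instead of aborting the whole product. A minimal sketch using only the standard library; the function name `build_description`, the input shapes (a list of bullet strings, a raw HTML string, a list of key/value table rows) and the output HTML tags are assumptions for illustration, not the structure of the original script.

```python
import html
import re
from typing import Optional


def build_description(
    bullets: Optional[list[str]],
    raw_html: Optional[str],
    table_rows: Optional[list[tuple[str, str]]],
) -> str:
    """Combine up to three optional parts into one HTML description.

    Hypothetical helper: any part that is None or empty is skipped
    instead of raising an error.
    """
    parts: list[str] = []

    # 1. Bullet points rendered as a <ul> element.
    if bullets:
        items = "".join(f"<li>{html.escape(b)}</li>" for b in bullets)
        parts.append(f"<ul>{items}</ul>")

    # 2. Free-text description: strip tags, decode entities, collapse whitespace.
    if raw_html:
        text = re.sub(r"<[^>]+>", " ", raw_html)
        text = html.unescape(text)
        text = re.sub(r"\s+", " ", text).strip()
        if text:
            parts.append(f"<p>{html.escape(text)}</p>")

    # 3. Specification table rows placed into a fixed HTML table structure.
    if table_rows:
        rows = "".join(
            f"<tr><td>{html.escape(k)}</td><td>{html.escape(v)}</td></tr>"
            for k, v in table_rows
        )
        parts.append(f"<table>{rows}</table>")

    return "".join(parts)


# Usage with made-up data: the free-text part is missing, the rest still renders.
print(build_description(["A+++"], None, [("Capacità", "8 kg")]))
```

With these inputs the call prints `<ul><li>A+++</li></ul><table><tr><td>Capacità</td><td>8 kg</td></tr></table>`; the missing free-text part is simply left out.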



It was developed by a freelancer from PPH for a colleague of mine. Unfortunately, I haven't been able to reach my colleague for quite some time to ask for all the details or the freelancer's name, so I am posting this publicly.


Scraping some categories will result in the error mentioned above, for example:
https://www.yeppon.it/c/elettrodomestici/grandi-elettrodomestici/asciugabiancheria
or
https://www.yeppon.it/c/elettrodomestici/grandi-elettrodomestici/frigoriferi

Others work just fine, like:
https://www.yeppon.it/c/telefonia/smartphone/smart-phone
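The log above shows the failure is a `KeyError`: for products in the affected categories, the parsed page data (`pdata`) apparently no longer contains a `descriptionHtml` key where line 739 of `async_scraper.py` expects it. A common way to keep such a lookup from aborting the whole run is to replace the chained `[...]` indexing with a tolerant lookup that falls back to a default. A minimal sketch, assuming `pdata` is a plain nested dict; the helper `get_nested`, the sample payloads and the empty-string fallback are hypothetical, and because the full key path on line 739 is cut off in the log, the path used here is only illustrative.

```python
from typing import Any


def get_nested(data: Any, *keys: str, default: Any = None) -> Any:
    """Walk nested dicts key by key, returning `default` if any level is missing.

    Hypothetical helper replacing chained lookups such as
    data["pageProps"]["product"][...], which raise KeyError when a key is absent.
    """
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Made-up payloads: one with the old key, one without it.
old_style = {"pageProps": {"product": {"descriptionHtml": "<p>Frigorifero</p>"}}}
new_style = {"pageProps": {"product": {"name": "Frigorifero"}}}

print(get_nested(old_style, "pageProps", "product", "descriptionHtml", default=""))
print(repr(get_nested(new_style, "pageProps", "product", "descriptionHtml", default="")))
```

This only stops the crash; to get the description text back, the place where the site now stores it still has to be found, for example by inspecting the page data of a product from one of the failing categories.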




I will attach the files about this project that I received from my colleague. As I can't attach .exe or .rar files, I attached:

- a first version of the Python code (a beta version, not the final code; it produces another error that was fixed in the final .exe file, and is included just to give you an impression), as well as the code of the GUI and the requirements. These are async_scraper.txt, gui.txt and requirements.txt.
Project ID: 3382569
Project category:
Project budget:
Copyright © 2005-2022 1001 Freelance Projects