1001 Freelance Projects
Latest Projects from Freelance Marketplaces
Today is: 26-Apr-2024 21:28 GMT
Project title: Find Sitemap & search url's API (for external domains/sitemaps)
Posted by: External project from PeoplePerHour
Started: 08-Mar-2023 12:28 GMT
Description: I need an API that I can use to search for a URL, or part of a URL, within an external site’s sitemap.

Usage: Laravel & MySQL

A sitemap is not always in the same location, and its location is not always listed in robots.txt. So we need to save the sitemap locations we find in a MySQL table, so that we can try those locations on other domains and locate more sitemaps.

Finding a sitemap
Create a MySQL table “sitemaps” (example name) that we can use to save sitemap names (e.g. sitemap.xml, sitemap_index.php, etc.). The table has a ‘sitemap’ field and a ‘count’ field; the count field is simply a counter that goes up each time we find a sitemap with the same name.
Check whether the given domain has a robots.txt (https://example.com/robots.txt). If it does, look for the Sitemap directive:
“Sitemap: https://www.example.com/example.xml” (there can be multiple)
Save the sitemap location to the sitemaps table; if it already exists, add 1 to the count field.
If we don’t find the sitemap location in robots.txt, we try to find it using all the sitemap locations we have in our sitemaps table (the more we collect, the higher the chance we find it). Check the sitemaps with the highest counts first.
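
The lookup steps above can be sketched roughly as follows. This is only an illustration in Python, not the Laravel code the project asks for, and it uses an in-memory SQLite database as a stand-in for MySQL (in MySQL the counter update would more typically be a single INSERT ... ON DUPLICATE KEY UPDATE). The function names are hypothetical, and storing only the path part of the sitemap URL as the "name" is an assumption.

```python
import sqlite3
from urllib.parse import urlsplit


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return every URL named by a 'Sitemap:' directive (there can be several)."""
    found = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        # The directive name is matched case-insensitively.
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def open_store() -> sqlite3.Connection:
    """Create the 'sitemaps' table (SQLite here, standing in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sitemaps (sitemap TEXT PRIMARY KEY, count INTEGER NOT NULL)"
    )
    return conn


def record_sitemap(conn: sqlite3.Connection, sitemap_url: str) -> str:
    """Save the sitemap name, or add 1 to its counter if it is already known."""
    # Assumption: the "name" is the URL path, e.g. 'sitemap.xml'.
    name = urlsplit(sitemap_url).path.lstrip("/")
    cur = conn.execute(
        "UPDATE sitemaps SET count = count + 1 WHERE sitemap = ?", (name,)
    )
    if cur.rowcount == 0:
        conn.execute("INSERT INTO sitemaps (sitemap, count) VALUES (?, 1)", (name,))
    return name


def candidate_names(conn: sqlite3.Connection) -> list[str]:
    """Known sitemap names to try on a new domain, highest count first."""
    rows = conn.execute("SELECT sitemap FROM sitemaps ORDER BY count DESC, sitemap")
    return [row[0] for row in rows]
```

When robots.txt yields nothing, each name from candidate_names() would be appended to the domain (for example https://example.com/sitemap.xml) and requested in that order until one responds with a sitemap.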

Finding a URL
Once you find the sitemap(s), create an index of all URLs in the sitemap and its nested sitemaps.
Then search that index for the given search term using a MySQL query or a regex.
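
As a rough sketch of this step, the Python below walks a sitemap and any nested sitemap index files, then filters the collected URLs by a search term. It is an illustration only, not the requested Laravel implementation. The names collect_urls and find_matches are hypothetical, and fetch is an assumed callable that returns the XML text for a URL, so the HTTP client is left open. The standard-library XML parser is used for brevity; for untrusted external sitemaps a hardened parser would be a safer choice.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def _local(tag: str) -> str:
    """Strip the XML namespace from a tag: '{ns}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def collect_urls(
    sitemap_url: str,
    fetch: Callable[[str], str],
    seen: Optional[set] = None,
) -> tuple[list[str], list[str]]:
    """Return (sitemap files visited, page URLs found), following nested sitemaps."""
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # guard against index files that reference each other
        return [], []
    seen.add(sitemap_url)

    root = ET.fromstring(fetch(sitemap_url))
    # Take <loc> only from direct <url>/<sitemap> entries, not from extensions.
    locs = [
        el.text.strip()
        for entry in root
        for el in entry
        if _local(el.tag) == "loc" and el.text
    ]

    if _local(root.tag) == "sitemapindex":
        sitemaps, urls = [sitemap_url], []
        for child_url in locs:
            child_sitemaps, child_urls = collect_urls(child_url, fetch, seen)
            sitemaps += child_sitemaps
            urls += child_urls
        return sitemaps, urls
    return [sitemap_url], locs


def find_matches(urls: list[str], term: str) -> list[str]:
    """Case-insensitive substring search over the collected URLs."""
    needle = term.lower()
    return [url for url in urls if needle in url.lower()]
```

Passing fetch in as a parameter keeps the traversal testable without network access; a regex search could replace the substring check in find_matches if pattern matching is needed.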

Example request
/Sitemap?domain=example.com&search=url

Example API Response:
The API should return the matching URLs in JSON format:
{
  "search": "example",
  "domain": "domain.com",
  "statistics": {
    "sitemaps_found": 3,
    "sitemaps": {
      "1": "www.domain.com/sitemap1.xml",
      "2": "www.domain.com/sitemap453.xml",
      "3": "www.domain.com/sitemap345.xml"
    },
    "urls": 28892,
    "matches": 25
  },
  "matches": {
    "1": "www.domain.com/example/13324223",
    "2": "www.domain.com/example/94827497"
  }
}
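
For illustration, a response with this layout could be assembled along the following lines. This is a Python sketch rather than Laravel code; build_response is a hypothetical name, and numbering the entries from 1 follows the example above rather than any stated requirement.

```python
import json


def _numbered(items: list[str]) -> dict[str, str]:
    """Key the items '1', '2', ... as in the example response."""
    return {str(i): item for i, item in enumerate(items, start=1)}


def build_response(
    search: str,
    domain: str,
    sitemaps: list[str],
    urls: list[str],
    matches: list[str],
) -> dict:
    """Assemble the response body; the counts are derived from the lists."""
    return {
        "search": search,
        "domain": domain,
        "statistics": {
            "sitemaps_found": len(sitemaps),
            "sitemaps": _numbered(sitemaps),
            "urls": len(urls),
            "matches": len(matches),
        },
        "matches": _numbered(matches),
    }


if __name__ == "__main__":
    body = build_response(
        "example",
        "domain.com",
        ["www.domain.com/sitemap1.xml"],
        ["www.domain.com/example/13324223", "www.domain.com/about"],
        ["www.domain.com/example/13324223"],
    )
    print(json.dumps(body, indent=2))
```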

Discussion
We can save the sitemap files we find to our server and search within those files, or we can insert all sitemap URLs into a MySQL table and search from there. I’m not sure which is faster; let’s discuss.

Save all URLs in MySQL
Pro: Fast searching
Pro: Easy to create a cron job that deletes entries older than x hours
Pro: Easy maintenance
Con: Need to extract all URLs from the sitemap files (potentially hundreds of thousands of URLs)

Save sitemaps as files
Pro: No need to extract URLs and put them in MySQL
Con: Downloading files that might contain vulnerabilities
Con: Saving files costs more space than saving only the URLs in MySQL
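
To make the database option more concrete, here is a minimal Python sketch of storing URLs, searching them, and purging old entries. SQLite stands in for MySQL, and the table layout (domain, url, fetched_at) and the function names are assumptions, not part of the brief. It is meant only to show the shape of the queries, not to settle which option is faster.

```python
import sqlite3
import time
from typing import Optional


def open_index() -> sqlite3.Connection:
    """Create the URL table (SQLite here, standing in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE urls ("
        "domain TEXT NOT NULL, url TEXT NOT NULL, fetched_at REAL NOT NULL)"
    )
    conn.execute("CREATE INDEX urls_domain ON urls (domain)")
    return conn


def store_urls(conn, domain: str, urls: list[str], now: Optional[float] = None) -> None:
    """Insert the extracted URLs, stamped with the time they were fetched."""
    now = time.time() if now is None else now
    conn.executemany(
        "INSERT INTO urls (domain, url, fetched_at) VALUES (?, ?, ?)",
        [(domain, url, now) for url in urls],
    )


def search_urls(conn, domain: str, term: str) -> list[str]:
    """Substring search within one domain's URLs."""
    # Escape LIKE wildcards so '%' and '_' in the term match literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        # SQLite syntax; MySQL string literals treat the backslash differently.
        "SELECT url FROM urls WHERE domain = ? AND url LIKE ? ESCAPE '\\' ORDER BY url",
        (domain, f"%{escaped}%"),
    )
    return [row[0] for row in rows]


def purge_older_than(conn, hours: float, now: Optional[float] = None) -> int:
    """Delete entries older than the given age; the job a cron would run."""
    now = time.time() if now is None else now
    cur = conn.execute("DELETE FROM urls WHERE fetched_at < ?", (now - hours * 3600,))
    return cur.rowcount
```

A leading-wildcard LIKE cannot use an ordinary index on the url column, so with hundreds of thousands of rows per domain the search still scans that domain's rows; this is worth measuring before choosing between the two options.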


>>> Outside the scope of the initial task, but would be a follow-up task; do not price this in
Project ID: 3314557