Start or Schedule Website Crawling
There are two ways to start crawling a website:
Crawl Now
Schedule Crawling
Go to Germain Workspace > Left Menu > Analytics > Website Crawler
![image-20240220-144701.png](../__attachments/3819110401/image-20240220-144701.png?inst-v=baad7441-1094-48da-b263-fa99f8ae5725)
Start website crawling - Germain UX
Crawl Now
This is the quickest way to run a website crawler and get its results once it completes. The configuration settings in this option are limited but sufficient for most use cases.
Configuration Settings
URL: Starting URL for the crawler.
Stay On Domain
True: Visit only URLs on the same domain or its subdomains (e.g. google.com and drive.google.com are on the same domain)
False: Visit every URL
Store Successfully Visited URLs
True: Store all Website URL Availability facts (for both available and unavailable URLs)
False: Store only failed Website URL Availability facts (unavailable URLs only)
HTTP Failure Status Code: Any visited URL that returns an HTTP status code greater than or equal to this value is considered unavailable.
Maximum Crawling Depth: The maximum depth to which the crawler follows URLs. A null value means there is no depth limit.
Maximum URLs To Crawl: The maximum number of URLs the crawler can visit. A null value means there is no limit.
Crawler Threads: The number of independent threads used to crawl your website. More threads require more resources but shorten execution time.
Ignore URLs: URLs which should be ignored by the crawler. Regex patterns are allowed if you want to exclude an entire domain (e.g. to ignore all URLs from drive.google.com, add the value .*drive.google.com.*)
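To make the Stay On Domain, Ignore URLs, and HTTP Failure Status Code settings more concrete, here is a minimal Python sketch of how such rules could be evaluated. It is an illustration only, not Germain UX's implementation: the function names are hypothetical, the sketch assumes that "same domain" means the start URL's host or any subdomain of it, and it assumes ignore patterns must match the whole URL (which is consistent with the `.*drive.google.com.*` example above, but not confirmed).

```python
import re
from typing import Iterable
from urllib.parse import urlsplit


def on_same_domain(start_url: str, candidate_url: str) -> bool:
    """Assumed rule: the candidate host equals the start host or is a subdomain of it."""
    start_host = (urlsplit(start_url).hostname or "").lower()
    host = (urlsplit(candidate_url).hostname or "").lower()
    if not start_host or not host:
        return False
    return host == start_host or host.endswith("." + start_host)


def is_ignored(url: str, ignore_patterns: Iterable[str]) -> bool:
    """Assumed rule: a URL is ignored when any pattern matches the entire URL."""
    return any(re.fullmatch(pattern, url) for pattern in ignore_patterns)


def should_visit(start_url: str, url: str, stay_on_domain: bool,
                 ignore_patterns: Iterable[str]) -> bool:
    """Combine the Ignore URLs and Stay On Domain settings (hypothetical helper)."""
    if is_ignored(url, ignore_patterns):
        return False
    # With Stay On Domain = False, every non-ignored URL is visited.
    return on_same_domain(start_url, url) or not stay_on_domain


def is_unavailable(status_code: int, failure_status_code: int) -> bool:
    """A URL counts as unavailable when its status is >= the configured threshold."""
    return status_code >= failure_status_code
```

Under these assumptions, a threshold of 400 treats 404 and 500 responses as unavailable and 200 or 302 responses as available.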
![image-20240220-150956.png](../__attachments/3819110401/image-20240220-150956.png?inst-v=baad7441-1094-48da-b263-fa99f8ae5725)
Start website crawling (2) - Germain UX
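The effect of Maximum Crawling Depth and Maximum URLs To Crawl can be illustrated with a small breadth-first traversal over an in-memory link graph. This is a sketch under stated assumptions, not the crawler's actual algorithm: the `crawl_order` function and the `links` dictionary are hypothetical, the starting URL is assumed to be at depth 0, and `None` stands in for the null (no cap) value.

```python
from collections import deque
from typing import Dict, List, Optional


def crawl_order(links: Dict[str, List[str]], start: str,
                max_depth: Optional[int] = None,
                max_urls: Optional[int] = None) -> List[str]:
    """Return the URLs visited, in order, while honouring both caps."""
    visited: List[str] = []
    seen = {start}
    queue = deque([(start, 0)])  # (url, depth); the start URL is depth 0 (assumption)
    while queue:
        # Stop once the URL cap is reached; None means no cap.
        if max_urls is not None and len(visited) >= max_urls:
            break
        url, depth = queue.popleft()
        visited.append(url)
        # Do not follow links from pages already at the depth cap; None means no cap.
        if max_depth is not None and depth >= max_depth:
            continue
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


# Hypothetical site structure used only for illustration.
links = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/b": ["/b/1"],
    "/a/1": ["/a/1/x"],
}
unlimited = crawl_order(links, "/")
shallow = crawl_order(links, "/", max_depth=1)
capped = crawl_order(links, "/", max_urls=2)
```

With this sample graph, `shallow` contains only the start page and the pages it links to directly, while `capped` stops after two URLs regardless of depth.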
Schedule Crawling of a Website
This is a more advanced option. It allows you to:
run a website crawler on a schedule
provide more advanced settings to the crawler (e.g. custom headers, authentication settings, connection settings and more)
![image-20240220-150300.png](../__attachments/3819110401/image-20240220-150300.png?inst-v=baad7441-1094-48da-b263-fa99f8ae5725)
Start website crawling (3) - Germain UX
If you are having any difficulty, please contact us.
Service: Automation
Feature Availability: 2024.1