The Web Crawler is a Python-based tool that automatically spiders a web site. It also looks for directory indexing and, when it finds a directory with indexing enabled, crawls that directory again to list all the files in it. There is also an option to download the files found, so they can be passed to FOCA or other software to extract metadata from them.
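As a rough illustration of the directory indexing step, the sketch below checks a page for an auto-generated listing and separates its entries into subdirectories and files. It is not the tool's actual code: the "Index of /" title marker (typical of Apache-style listings) and the function names `looks_like_directory_index` and `list_index_entries` are assumptions made for this example.

```python
import re

# Assumption: an auto-generated listing is recognised by an "Index of /..."
# title, as produced by Apache-style servers. The real tool may use other checks.
INDEX_TITLE = re.compile(r"<title>\s*Index of /", re.IGNORECASE)
HREF = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def looks_like_directory_index(html):
    """Return True if the page looks like a server-generated directory listing."""
    return INDEX_TITLE.search(html) is not None


def list_index_entries(html):
    """Return (subdirectories, files) linked from a listing page."""
    dirs, files = [], []
    for link in HREF.findall(html):
        # Skip column-sort links ("?C=N;O=D"), parent/absolute paths and full URLs.
        if link.startswith(("?", "/", "..")) or "://" in link:
            continue
        if link.endswith("/"):
            dirs.append(link)   # a subdirectory that could be crawled again
        else:
            files.append(link)  # a file found inside the indexed directory
    return dirs, files


sample = (
    "<html><head><title>Index of /backup</title></head><body>"
    '<a href="?C=N;O=D">Name</a><a href="/">Parent Directory</a>'
    '<a href="old/">old/</a><a href="db.sql">db.sql</a>'
    "</body></html>"
)
if looks_like_directory_index(sample):
    subdirs, found_files = list_index_entries(sample)
```

Each subdirectory returned here would be requested and checked in the same way, which is how an indexed directory tree can be listed in full.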
The current stable version is 0.4, and its main features are:
Crawls http and https web sites.
Crawls http and https web sites that do not use the common ports.
Uses regular expressions to find ‘href’ and ‘src’ attributes in HTML tags, as well as links in the page content.
Identifies relative links.
Identifies domain related emails.
Identifies directory indexing.
Uses CTRL-C to stop the current crawler stage and continue with the next one.
Identifies file extensions (zip, swf, sql, rar, etc.)
Downloads files to a directory:
Downloads every important file (images, documents, compressed files, etc.).
Or downloads only the specified file types.
Or downloads a predefined set of files (like ‘document’ files: .doc, .xls, .pdf, .odt, .gnumeric, etc.).
Limits the maximum number of links to crawl. A default value of 5000 URLs is set.
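Several of the features above (regular-expression link extraction, relative link resolution, file extension checks and the link limit) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the crawler's own implementation: the pattern `LINK_RE`, the helper names and the example URL are hypothetical, while the 5000 default and the ‘document’ extensions come from the list above.

```python
import posixpath
import re
from urllib.parse import urljoin, urlparse

# Assumed pattern: captures the value of any href="..." or src="..." attribute.
LINK_RE = re.compile(r'(?:href|src)\s*=\s*["\']([^"\'#]+)', re.IGNORECASE)
# The 'document' set named in the feature list.
DOCUMENT_EXTS = {".doc", ".xls", ".pdf", ".odt", ".gnumeric"}
MAX_LINKS = 5000  # default limit stated in the feature list


def extract_links(base_url, html, max_links=MAX_LINKS):
    """Return unique absolute http/https links found in html, up to max_links."""
    links = []
    for raw in LINK_RE.findall(html):
        # urljoin resolves relative links against the page URL (port included).
        absolute = urljoin(base_url, raw.strip())
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # ignore mailto:, javascript:, etc.
        if absolute not in links:
            links.append(absolute)
        if len(links) >= max_links:
            break
    return links


def file_extension(url):
    """Return the lowercased extension of the URL path, e.g. '.pdf'."""
    return posixpath.splitext(urlparse(url).path)[1].lower()


def select_documents(links):
    """Keep only links whose extension is in the 'document' set."""
    return [url for url in links if file_extension(url) in DOCUMENT_EXTS]


page_url = "https://example.com:8443/docs/index.html"  # hypothetical URL
page = (
    '<a href="report.pdf">r</a><img src="/img/logo.png">'
    '<a href="mailto:a@example.com">m</a><a href="../old/data.XLS">d</a>'
)
found = extract_links(page_url, page)
documents = select_documents(found)
```

Resolving every link against the page URL keeps the scheme and any non-standard port, so sites on uncommon ports need no special handling in this sketch.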
Note: This crawler can be used with Domain Analyzer Security Tool. (See Domain Analyzer)