Web Crawler Security Tool – Beta

The Web Crawler is a Python-based tool that automatically spiders a website. It looks for directory indexing, downloads the files it finds, and crawls HTTP and HTTPS websites, including ones that do not use the common ports.
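
Supporting sites on non-standard ports mostly means keeping any explicit port from the parsed URL instead of assuming 80 or 443. A minimal sketch of that parsing with the Python standard library (the function name is hypothetical, not part of the tool):

    from urllib.parse import urlsplit

    def describe_target(url):
        # urlsplit keeps an explicit port such as :8081, so those sites are
        # handled the same way as ones on the default ports.
        parts = urlsplit(url)
        port = parts.port or (443 if parts.scheme == "https" else 80)
        return parts.scheme, parts.hostname, port

    print(describe_target("https://example.com:8443/app/"))  # ('https', 'example.com', 8443)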

Features

  • Crawls HTTP and HTTPS websites.
  • Crawls HTTP and HTTPS websites that do not use the common ports.
  • (beta) You can now provide a username and password; if the server requests a login (HTTP 401), the crawler tries to log in with them (see the authentication sketch after this list). Feedback is welcome!
  • Uses regular expressions to find ‘href’ and ‘src’ HTML attributes, as well as links embedded in page content (see the link-extraction sketch after this list).
  • Identifies relative links and resolves them against the page URL.
  • Identifies non-HTML content and does not crawl it.
  • Identifies directory indexing (both checks are sketched after this list).
  • CTRL-C stops the current crawler stage, and the tool continues with the next one (sketched after this list).
  • Identifies all kinds of files.
  • The -e option exports a list of URLs of the files found during crawling.
  • The -i option lets users interactively select which files to download.
  • It only creates an output directory for files if there is at least one file to download.
  • It runs on Windows, but does not save results there yet.
  • It extracts information commonly leaked in directories with indexing enabled. (not yet implemented in the new version)
  • It detects the framework used by the website, such as WordPress or Joomla. (not yet implemented in the new version)
  • It looks for ‘.bk’ or ‘.bak’ copies of PHP, ASP, ASPX, and JSP pages (see the backup-probe sketch after this list). (not yet implemented in the new version)
  • It identifies and counts the unique web pages crawled. (not yet implemented in the new version)
  • It identifies and counts the unique crawled pages whose URLs contain parameters. (not yet implemented in the new version)
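
A minimal sketch of how the beta login feature could behave, using only the Python standard library: request the page, and if the server answers HTTP 401, retry once with Basic authentication. The function name and arguments are illustrative assumptions, not the tool's actual interface.

    import urllib.request
    import urllib.error

    def fetch(url, username=None, password=None):
        # Fetch a URL; on HTTP 401, retry once with Basic auth credentials.
        try:
            return urllib.request.urlopen(url).read()
        except urllib.error.HTTPError as err:
            if err.code != 401 or username is None:
                raise
            # The server asked for credentials: build an opener that sends them.
            mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
            mgr.add_password(None, url, username, password)
            handler = urllib.request.HTTPBasicAuthHandler(mgr)
            return urllib.request.build_opener(handler).open(url).read()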
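
The link-extraction step can be sketched as a regular expression that captures the quoted value of every ‘href’ and ‘src’ attribute, plus urljoin to resolve relative links against the page they appeared on. The pattern below is an illustrative assumption, not the tool's actual regex.

    import re
    from urllib.parse import urljoin

    # Capture the quoted value of any href="..." or src='...' attribute.
    LINK_RE = re.compile(r'''(?:href|src)\s*=\s*["']([^"']+)["']''', re.IGNORECASE)

    def extract_links(page_url, html):
        # urljoin turns relative links such as ../img/a.png into absolute
        # URLs based on the page they were found on.
        return {urljoin(page_url, raw) for raw in LINK_RE.findall(html)}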
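
Both the non-HTML check and the directory-indexing check can work from the response itself, roughly as below. Relying on the Content-Type header and on the ‘Index of /’ title that common web servers generate are assumptions based on the feature description, not quotes of the tool's logic.

    import urllib.request

    def is_html(response):
        # Only HTML responses are worth parsing for further links.
        return response.headers.get_content_type() == "text/html"

    def has_directory_indexing(body):
        # Heuristic: Apache-style index pages are titled "Index of /...".
        return "<title>Index of /" in body

    with urllib.request.urlopen("http://example.com/images/") as resp:
        if is_html(resp):
            body = resp.read().decode("utf-8", errors="replace")
            if has_directory_indexing(body):
                print("directory indexing detected")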
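
The CTRL-C behavior amounts to catching the interrupt once per stage, so pressing CTRL-C abandons the stage that is running and the tool moves on. The stage names below are placeholders.

    def run_stages(stages):
        # stages is a list of (name, callable) pairs run in order.
        for name, stage in stages:
            try:
                stage()
            except KeyboardInterrupt:
                # CTRL-C cancels only the stage that was running.
                print("[!] %s interrupted, continuing with the next stage" % name)

    # Hypothetical usage: run_stages([("crawl", crawl), ("download", download)])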
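
The planned ‘.bk’/‘.bak’ lookup boils down to probing derived URLs, roughly as in this sketch. The suffix list and the use of HEAD requests are assumptions drawn from the feature description.

    import urllib.request
    import urllib.error

    BACKUP_SUFFIXES = (".bak", ".bk")

    def find_backups(page_url):
        # Probe for leftover backup copies, e.g. index.php -> index.php.bak
        found = []
        for suffix in BACKUP_SUFFIXES:
            candidate = page_url + suffix
            try:
                req = urllib.request.Request(candidate, method="HEAD")
                with urllib.request.urlopen(req) as resp:
                    if resp.status == 200:
                        found.append(candidate)
            except urllib.error.HTTPError:
                pass  # 404 and friends mean no backup at this URL
        return found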

Download the latest version: crawler_v1.0.tar.gz (6.5 kB)
Read more at http://webcrawler-py.sourceforge.net/