Scrapy V0.15.1 – a fast high-level screen scraping and web crawling framework for Python.

Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Features :

  • Simple : Scrapy was designed with simplicity in mind, by providing the features you need without getting in your way
  • Productive : Just write the rules to extract the data from web pages and let Scrapy crawl the entire web site for you
  • Fast : Scrapy is used in production crawlers to completely scrape more than 500 retailer sites daily, all on one server
  • Extensible : Scrapy was designed with extensibility in mind, so it provides several mechanisms to plug in new code without having to touch the framework core
  • Portable : Scrapy runs on Linux, Windows, Mac and BSD
  • Open Source and 100% Python : Scrapy is completely written in Python, which makes it very easy to hack
  • Well-tested : Scrapy has an extensive test suite with very good code coverage
  • Batteries included : Scrapy comes with lots of functionality built in. See the documentation for a list of the built-in features.
Requirements :

  • Python 2.6 or 2.7 (3.x is not yet supported)
  • Twisted 8.0 or above (Windows users: you'll need to install Zope.Interface and maybe pywin32 because of a Twisted bug)
  • lxml or libxml2 (if using libxml2, version 2.6.28 or above is highly recommended)
  • pyOpenSSL (for HTTPS support; optional, but highly recommended)
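
Before installing, it can help to confirm which of these dependencies the current interpreter can already import. The script below is only a rough sketch, not part of Scrapy: the helper names (is_importable, check_requirements, missing_requirements) are made up for illustration, and it assumes the usual import names twisted, lxml, libxml2 and OpenSSL (the module provided by pyOpenSSL). It does not check version numbers.

```python
from __future__ import print_function

import sys

# Import names to probe (assumed; these are module names, not package names).
REQUIRED = ["twisted"]
XML_BACKENDS = ["lxml", "libxml2"]  # at least one of these is needed
OPTIONAL = ["OpenSSL"]              # provided by pyOpenSSL, used for HTTPS


def is_importable(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


def check_requirements():
    """Return a dict mapping every probed module name to True or False."""
    names = REQUIRED + XML_BACKENDS + OPTIONAL
    return dict((name, is_importable(name)) for name in names)


def missing_requirements(status):
    """List the mandatory pieces that status reports as unavailable."""
    problems = [name for name in REQUIRED if not status[name]]
    if not any(status[name] for name in XML_BACKENDS):
        problems.append("lxml or libxml2")
    return problems


if __name__ == "__main__":
    print("Python %d.%d" % sys.version_info[:2])
    status = check_requirements()
    for name in sorted(status):
        print("%-8s %s" % (name, "found" if status[name] else "missing"))
    problems = missing_requirements(status)
    if problems:
        print("Still needed: " + ", ".join(problems))
```

A missing OpenSSL module is reported but not treated as a problem, since HTTPS support is optional.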

Installation :
Scrapy is distributed in two ways: a source code tarball (for Unix and Mac OS X systems) and a Windows installer (for Windows). If you downloaded the tarball, you can install it like any other Python package, using its setup.py script:

tar zxf Scrapy-X.X.X.tar.gz
cd Scrapy-X.X.X
python setup.py install

If you downloaded the Windows installer, just run it.

On Windows, you may need to add the C:\Python25\Scripts (or C:\Python26\Scripts) folder to the system path by adding that directory to the PATH environment variable from the Control Panel.
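
To see whether that folder is already on the path, a small check like the following may be enough. It is a sketch only: the function name is_on_path is hypothetical, and the directory passed in the example is just one of the folders mentioned above, so adjust it to your own Python install.

```python
from __future__ import print_function

import os


def is_on_path(directory, path_value=None, sep=os.pathsep):
    """Return True if directory is one of the entries in a PATH-style string.

    path_value defaults to the PATH environment variable of this process.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normcase(os.path.normpath(directory))
    for entry in path_value.split(sep):
        # Skip empty entries produced by stray separators.
        if entry and os.path.normcase(os.path.normpath(entry)) == target:
            return True
    return False


if __name__ == "__main__":
    scripts_dir = r"C:\Python26\Scripts"  # example folder; adjust as needed
    if is_on_path(scripts_dir):
        print(scripts_dir + " is on PATH")
    else:
        print(scripts_dir + " is not on PATH")
```

The check reads PATH as seen by the running process, so a console opened before the Control Panel change will still report the old value.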

Download :
Zipball (1.0 MB)
Tarball (857 KB)