The KNOW-Crawler is a customizable web crawling tool for the KNOW Project at the University of Washington. The concept is simple: instead of having students manually enter metadata about articles (dates, authors, keywords, etc.), we automate the process with a web crawler. Our web crawler parses news articles and compacts the metadata into one useful annotation per article, which we store in our database. The metadata includes the following information for each article: title, description, keywords, author, date, and URL. Our database of annotations could later be used to produce valuable visualization tools for analyzing news articles about international events.

The concept around which the UI is organized is also very simple. The UI is a website. The user performs specific actions by clicking links that navigate among the different pages of the website, and by checking radio buttons or checkboxes to select the desired settings. Each category has a default value, so if the user does not specify a value, the default is used for that category; as a result, the user's input is always valid. The UI is designed as an analogy to the real-world process of ordering food from a delivery website: you log in with a password, choose the day and time of the delivery, and select from the list of products the items you want delivered. The default values play the role of the standard order: by default the crawler runs once weekly at 2 am, and it crawls a default list of news sources specified by our customer. Currently our crawler works only on
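The per-article annotation described above can be pictured as a small record. The sketch below is illustrative only: the field names follow the list in the text (title, description, keywords, author, date, URL), but the class name, types, and example values are assumptions, not the project's actual schema.

```python
# Illustrative sketch of one article annotation; the class name and
# types are assumptions based on the metadata fields listed above.
from dataclasses import dataclass
from typing import List

@dataclass
class ArticleAnnotation:
    title: str
    description: str
    keywords: List[str]
    author: str
    date: str          # e.g. an ISO-8601 date string
    url: str

# A hypothetical annotation as the crawler might store it.
example = ArticleAnnotation(
    title="Example headline",
    description="Short summary of the article.",
    keywords=["international", "news"],
    author="Jane Doe",
    date="2013-05-01",
    url="http://example.com/article",
)
print(example.title)
```

One record like this per crawled article is what the database of annotations would hold.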

How to Install KNOW-Crawler
1. There is no need to install the KNOW-Crawler. The user can access it by logging into the crawler settings page with UW credentials at (under construction). Only selected members of UW’s KNOW Project will be given access to the crawler’s setting page, which will be password protected.

How to Run KNOW-Crawler
Only verified members of the UW’s KNOW project will be able to run the crawler. Once they are logged in and can access the front page, they simply need to navigate to the schedule customization or news source customization page. Each page will allow the user to make changes to the web crawler’s settings. These settings will then be submitted to the server that runs the web crawler automatically.

How to Use KNOW-Crawler
Using the KNOW-Crawler is easy. The web crawler and annotation database are already installed on a UW server, and the web crawling script is set up to run automatically there. All the user needs to do is specify a desired time schedule for when the crawler should run and a list of desired news sources to crawl. A list of international news sources is already stored in the database. Verified members of the UW's KNOW Project will have access to a customization page on the KNOW Project site, which will allow them to modify the crawling schedule and news source list. Only those verified members will be allowed to change the crawler's settings, also referred to as scheduling it.

To schedule the crawler:

Navigate to the Time Schedule page by following the link “Change Crawler’s time”.
Specify the time of day (hour and minute) at which the crawler should start searching and annotating news articles, and schedule it to run either daily or weekly. The daily option means the crawler will run 7 days per week at the specified time; choosing the weekly option will activate a list of checkboxes that, when checked, specify the days on which the crawler will run.
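Since the crawler runs automatically on the server, the form values above plausibly end up as a cron-style schedule. The sketch below shows one way to map them to a standard five-field crontab line; the function name, parameters, and the choice of Sunday for the weekly default are all illustrative assumptions, not the project's actual code.

```python
# Hypothetical mapping from the schedule form's values to a standard
# five-field crontab expression (minute hour day-of-month month
# day-of-week). Names and defaults are assumptions for illustration.
def schedule_to_cron(hour, minute, mode, days=None):
    """mode is 'daily' or 'weekly'; days is a list of cron
    day-of-week numbers (0=Sunday .. 6=Saturday), used only
    in weekly mode."""
    if mode == "daily":
        day_field = "*"  # run all 7 days at the given time
    else:
        day_field = ",".join(str(d) for d in days)
    return f"{minute} {hour} * * {day_field}"

# Default schedule from the text: once weekly at 2 am
# (the day itself is assumed here to be Sunday).
print(schedule_to_cron(2, 0, "weekly", days=[0]))  # 0 2 * * 0
print(schedule_to_cron(14, 30, "daily"))           # 30 14 * * *
```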
To view and edit news source list:

Navigate to the News Source List page by following the link “Change the crawled websites”. You will see the list of all news websites that the crawler will search for articles and annotate.
You can edit this list by adding or removing news sources.
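The add/remove editing described in the steps above can be sketched as simple list operations. Everything here is an assumption for illustration: the function names, the example source URLs, and the in-memory list stand in for whatever the project actually stores in its database.

```python
# Illustrative sketch of editing the news source list; names and
# example URLs are assumptions, not the project's actual API or data.
sources = [
    "http://www.bbc.co.uk",
    "http://www.reuters.com",
]

def add_source(source_list, url):
    """Add a news source unless it is already in the list."""
    if url not in source_list:
        source_list.append(url)

def remove_source(source_list, url):
    """Remove a news source if it is present in the list."""
    if url in source_list:
        source_list.remove(url)

add_source(sources, "http://www.aljazeera.com")
remove_source(sources, "http://www.bbc.co.uk")
print(sources)  # ['http://www.reuters.com', 'http://www.aljazeera.com']
```

In the real system these edits would be submitted through the customization page and persisted server-side rather than held in memory.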

Download & Stay Up-to-date:
Read more here: