aoaExtractor
Price: $14.99
aoaExtractor is a desktop application for crawling (spidering) the web and extracting words, phrases, email addresses, web addresses, and any other content you want from web pages.
It is a good solution for companies that need to run email advertising campaigns, generate leads, or search for specific content on the Internet.
Most similar programs use search engines to find pages by keyword and extract content from them, but they are often not smart enough to crawl member profile pages and forums: although you can see the content on the site, the program does not extract it.
aoaExtractor is designed for exactly this purpose. Although it can spider the web like other similar programs, its strength is crawling large membership sites such as forums and community sites and extracting the desired content (emails, web addresses, or anything else).
aoaExtractor can even execute JavaScript code after a page loads in the built-in browser. You can therefore use aoaExtractor to submit forms on sites, send messages, register users, and so on.
You may try or directly purchase aoaExtractor. After purchasing, you will be directed to the installation page where you will see the key to unlock the application and the install link. Just copy the key and click the install button.
During installation you may be prompted to install the Adobe AIR runtime (if it is not already installed on your system). When prompted, just click "Yes".

The program has three modes, described below. All of them extract from the pages the content that matches certain criteria, defined with regular expressions. The software has two built-in, frequently used regular expressions for extracting emails and web addresses, and you can also use your own custom regular expressions. The program is easy to work with.
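To give a feel for what such expressions look like, here is a minimal Python sketch. The two patterns below are illustrative assumptions, not the expressions built into aoaExtractor, and the helper name extract is hypothetical. They are deliberately simple and will miss some valid addresses.

```python
import re

# Illustrative patterns only (assumptions); NOT aoaExtractor's built-in expressions.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract(pattern, text):
    """Return the unique matches of pattern in text, in order of first appearance."""
    seen = []
    for match in pattern.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


sample = (
    '<a href="http://example.com/about">About</a> '
    "Contact: info@example.com, info@example.com"
)
emails = extract(EMAIL_RE, sample)  # the duplicate address is reported once
urls = extract(URL_RE, sample)      # the match stops at the closing quote
```

A custom expression works the same way: compile your own pattern and pass it to the same helper.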
The first mode is for general-purpose crawling. You enter the start page, restrict the crawler to certain pages, limit the maximum number of pages to crawl, and run it. Here is the demo
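The three settings of this mode (start page, page restriction, page limit) can be sketched in Python as follows. This is only an illustration of the idea, not how aoaExtractor is implemented. The function crawl, its parameters, and the in-memory SITE dictionary are hypothetical, and the dictionary stands in for real HTTP requests so the example runs offline.

```python
import re
from collections import deque

LINK_RE = re.compile(r'href="([^"]+)"')


def crawl(start_url, fetch, allow_pattern, max_pages):
    """Breadth-first crawl from start_url.

    fetch(url) returns the page HTML or None. Only links matching
    allow_pattern are followed, and at most max_pages pages are visited.
    The start page itself is always visited.
    """
    allowed = re.compile(allow_pattern)
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        for link in LINK_RE.findall(html):
            if allowed.search(link) and link not in visited:
                queue.append(link)
    return visited


# Hypothetical in-memory "site" used instead of the network.
SITE = {
    "http://example.com/": '<a href="http://example.com/forum/1">1</a> '
                           '<a href="http://example.com/shop">shop</a>',
    "http://example.com/forum/1": '<a href="http://example.com/forum/2">2</a>',
    "http://example.com/forum/2": '<a href="http://example.com/forum/3">3</a>',
    "http://example.com/forum/3": "",
}

pages = crawl("http://example.com/", SITE.get, r"/forum/", max_pages=3)
```

Here the shop link is skipped because it does not match the restriction, and the crawl stops after three pages even though a fourth forum page is linked.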
Sometimes simply crawling the site either takes too much time or fails to extract the desired content. In that case, you can use counters. This is probably the most useful mode for crawling profile pages or pages in "paged lists". Here is the counters demo
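One plausible way to picture counters, assuming a counter is a running number substituted into a URL pattern, is the Python sketch below. The function counter_urls, the {n} placeholder, and the example URLs are all hypothetical; the actual counter syntax in aoaExtractor may differ.

```python
def counter_urls(template, start, stop, step=1):
    """Build URLs by substituting counter values start..stop (inclusive) into template."""
    return [template.format(n=n) for n in range(start, stop + 1, step)]


# Profile pages numbered 1, 2, 3.
profile_urls = counter_urls("http://example.com/profile?id={n}", 1, 3)

# A paged list that advances 20 items per page: offsets 0, 20, 40.
paged_urls = counter_urls("http://example.com/members?offset={n}", 0, 40, step=20)
```

Generating the addresses directly like this avoids following links page by page, which is why it suits numbered profiles and paged lists.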
In the third mode, you manually prepare a list of the URLs you want crawled. This mode has one unique feature: after visiting a page, the crawler can execute JavaScript code that you prepare in advance. See the demo
- If the pages are inside a membership area and you can't view them, just stop the crawler for a while, log in, and resume.
Regular expressions are widely used in software development and are well described on the following sites:
http://www.regular-expressions.info
http://en.wikipedia.org/wiki/Regular_expression








