This is an old revision of the document!


Options - Program behavior

This will only extract so many e-mails from one website in order to not grab a high number of e-mails who possibly have nothing to do with your entered search-keyword but are listed there in addition. Please note that this counter is reset on each new URL from that host.

Some sites use javascript to hide there data from parsing programs. If you enable this option, then the program will try to analyze the javascript and extract the data anyway. This however doesn't work always but can safely be left enabled.

The downloaded html page will only be analyzed for the chosen parts. A web page consists of mainly 2 parts. A head where the web page is described and the actual content (body). In most cases you want to analyse them all.

When parsing websites, some servers store certain settings on the clients machine. This can be various things like the search keyword or a page number. It is a good idea to let this enabled. You only have to disable this in special cases where e.g. a server identifies you on a set cookie and then doesn't allow more than XX downloads an hour.

When a new URL is found that should be parsed, it is put to the “URLs in Queue” box. You can define where the program should put it.

This will define how many pages should be analyzed simultaneously. Please set this option with care. We had good results with a value of five. To high numbers will result in to much memory usage and an in-stable system.

Each browser identifies itself to a webserver with a special string when it downloads a website. If you use e.g. Internet Explorer, it is known by the webserver and in some cases it shows you a different site as when you use the Opera Browser or FireFox. You can also add new Browser-Identifications here when pressing “Edit” and let the program choose a random one when you check the “Random” box. However leave it to the default setting and it should be good in most cases.

Normally the program will run until there is nothing in the “URLs in Queue” left. In some cases it would be enough if you e.g. got one E-Mail from an URL or if you have parsed 100 URLs or if you have already parsed for like 60 minutes.

Set this to e.g. 1 to slow down the CPU usage if you have problems with it.

In case your system is kinda in-stable it would be good to save the results of the project to a file automatically so that you don't have to restart the whole progress again.

If you make use of the Automailer, then you probably don't want to send the e-mails twice. This option collects all successfully send e-mails to one file. The program will check new e-mails against this list and will skip sending them if they are in it. If you plan to send e-mails to the same people again, you have to either delete them from this list or change the name of the file. A restart of the program might be required.

When you are parsing an URL with a lot sublinks and no Item was found for like 100 links, then the all URL with the same domain get removed from the Queue.

A quick test is made if the email domain really exists and not some fake email was generated by a script.


Options - Filter

Here you can define in details what to parse and what not. New items can be added and modified when you press the right mouse button on a window. You can use the following syntax:

*something* will match if something is in the string, a “*” stands for any possible character
!badword goodwordwill match if badword is not in the string but goodword is
simpleword will match if simpleword is somewere in string but before and after it is a none alpha character.

Options - Search Engines

Here you can select the search engines you want to use. Please note that most of these search engines will also deliver result in other languages then the one in the second column. You can open a pop-up menu when clicking with the right mouse button on the list.

You can also add new search engines.


Options - Keywords

In some cases you might not be satisfied by the results of the program if you search for phone or fax numbers. This can happen when you have a different word for e.g. „phone“ in your language that is used on the page. You have to enter it here (right mouse button on the box) and you will get the results you want. The program has all English, German and French words for „phone/fax“ included though.

You can also add more place holders for the @ and . sign that is often replaced on web sites to make it harder for programs like ours to locate the email address.