Last modified: 2019-05-22 05:23 by sven

{{indexmenu_n>3}}
====== Options ======

The program is known for its many options for getting exactly the data you want. By default, everything is set up to suit most customers, but of course every setting can be adjusted to your needs.

===== Program behavior =====

{{ :email_spider:email_spider_options_behavior.png?400 |}}
  
==== Not more than XX E-Mails from the same host ====
This option extracts only a limited number of e-mails from one website, so that you don't grab a large number of addresses that are merely listed there and possibly have nothing to do with your search keyword. Please note that this counter is reset on each new URL from that host.
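Because the counter starts again for every new URL, the limit effectively applies to each downloaded page. The following Python sketch only illustrates that behavior; the function name ''extract_limited'' and the simplified e-mail pattern are assumptions, not the program's actual code.

```python
import re

# Simplified e-mail pattern, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_limited(page_html: str, max_per_url: int) -> list[str]:
    """Return at most `max_per_url` distinct e-mails from one downloaded page.

    The count starts at zero on every call, i.e. for every new URL,
    even if that URL belongs to a host that was already parsed.
    """
    if max_per_url <= 0:
        return []
    found: list[str] = []
    for match in EMAIL_RE.finditer(page_html):
        email = match.group(0)
        if email not in found:
            found.append(email)
            if len(found) >= max_per_url:
                break  # limit reached for this URL
    return found


print(extract_limited("a@x.com b@x.com a@x.com c@x.com", 2))  # ['a@x.com', 'b@x.com']
```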
  
==== Analyze JavaScript for protected E-Mails ====
Some sites use JavaScript to hide their data from parsing programs. If you enable this option, the program tries to analyze the JavaScript and extract the data anyway. This doesn't always work, but the option can safely be left enabled.
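One common trick is to build the address from several string pieces, e.g. ''document.write('info' + '@' + 'example.com')'', so that the complete address never appears in the page source. The Python sketch below shows how such a concatenation could be joined back together before searching for e-mails. It handles only this one simple pattern and is an assumption about the general technique, not the program's actual analyzer; all names are hypothetical.

```python
import re

# A single- or double-quoted JavaScript string literal (escapes not handled).
LITERAL = r"""(?:'[^']*'|"[^"]*")"""
# Two or more literals joined with "+", e.g. 'info' + '@' + 'example.com'.
CONCAT_RE = re.compile(LITERAL + r"(?:\s*\+\s*" + LITERAL + r")+")
# Captures the text inside one literal (group 1: single, group 2: double quotes).
PART_RE = re.compile(r"'([^']*)'" + "|" + r'"([^"]*)"')
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def emails_from_script(script: str) -> list[str]:
    """Join concatenated string literals, then search the result for e-mails."""
    found: list[str] = []
    for expr in CONCAT_RE.finditer(script):
        parts = PART_RE.findall(expr.group(0))
        joined = "".join(single or double for single, double in parts)
        found.extend(EMAIL_RE.findall(joined))
    return found


print(emails_from_script("document.write('info' + '@' + 'example.com');"))
# ['info@example.com']
```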
  
==== Analyze Head/Body ====
The downloaded HTML page is analyzed only in the parts you select. A web page consists mainly of two parts: the head, which describes the page, and the body, which holds the actual content. In most cases you want to analyze both.
  
==== Accept Cookies ====
When parsing websites, some servers store certain settings on the client's machine as cookies, for example the search keyword or a page number. It is a good idea to leave this option enabled. You only need to disable it in special cases, e.g. when a server identifies you by a cookie it has set and then doesn't allow more than XX downloads an hour.
  
==== Add URLs to ====
When a new URL is found that should be parsed, it is put into the “URLs in Queue” box. Here you can define where in that list the program should put it.
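Where a new URL lands in the queue decides the order in which pages are parsed. Assuming the choice is between the top and the bottom of the list (the exact entries in the dialog may differ), a Python sketch with a double-ended queue could look like this; ''add_url'' and ''to_top'' are hypothetical names.

```python
from collections import deque


def add_url(queue: deque, url: str, to_top: bool) -> None:
    """Put a newly found URL into the "URLs in Queue" list."""
    if to_top:
        # Parsed next: the program follows new links before older entries.
        queue.appendleft(url)
    else:
        # Parsed last: URLs already waiting in the queue are handled first.
        queue.append(url)


queue = deque(["http://example.com/"])
add_url(queue, "http://example.com/contact", to_top=True)
add_url(queue, "http://example.org/", to_top=False)
print(list(queue))
# ['http://example.com/contact', 'http://example.com/', 'http://example.org/']
```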
  
==== Concurrent Connections ====
This defines how many pages are analyzed simultaneously.
Please set this option with care. We had good results with a value of five. Too high a value results in too much memory usage and an unstable system.
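The setting behaves like a fixed-size pool of workers: no more than the configured number of pages are processed at the same time, and every further page waits for a free slot. The Python sketch below models this with ''ThreadPoolExecutor''; ''analyze_page'' only simulates work and, like the peak counter, is a hypothetical stand-in for the real download and parsing.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

_lock = threading.Lock()
_active = 0  # pages being analyzed right now
_peak = 0    # highest number of simultaneous pages seen


def analyze_page(url: str) -> int:
    """Hypothetical stand-in for downloading and parsing one page."""
    global _active, _peak
    with _lock:
        _active += 1
        _peak = max(_peak, _active)
    time.sleep(0.01)  # simulated network and parsing time
    with _lock:
        _active -= 1
    return len(url)


def analyze_all(urls: list[str], connections: int = 5) -> tuple[list[int], int]:
    """Analyze all URLs with at most `connections` pages in progress at once."""
    global _peak
    _peak = 0
    with ThreadPoolExecutor(max_workers=connections) as pool:
        results = list(pool.map(analyze_page, urls))  # keeps input order
    return results, _peak


results, peak = analyze_all([f"http://example.com/{i}" for i in range(20)])
print(peak)  # never more than 5
```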
  
==== Identify as ====
Each browser identifies itself to a web server with a special string when it downloads a page. If you use e.g. Internet Explorer, the web server recognizes it and in some cases shows you a different page than when you use the Opera browser or Firefox. You can add new browser identifications by pressing “Edit”, and let the program choose a random one by checking the “Random” box. In most cases, however, the default setting works well.
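The identification string is sent in the ''User-Agent'' HTTP header. The Python sketch below shows how a request could pick a random identification from a list when “Random” is checked. The agent strings are examples only and ''build_request'' is a hypothetical name; the request is built but not sent.

```python
import random
import urllib.request

# Example browser identifications, comparable to the entries behind "Edit".
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko",
]


def build_request(url: str, random_agent: bool) -> urllib.request.Request:
    """Create a request that identifies itself as one of the listed browsers."""
    agent = random.choice(USER_AGENTS) if random_agent else USER_AGENTS[0]
    return urllib.request.Request(url, headers={"User-Agent": agent})


request = build_request("http://example.com/", random_agent=True)
# urllib stores header names capitalized, hence "User-agent".
print(request.get_header("User-agent"))
```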
  
==== Stop work after XX minutes/URL/Items ====
Normally the program runs until the “URLs in Queue” list is empty. In some cases, however, it is enough to stop earlier, e.g. once you got one e-mail from a URL, once you have parsed 100 URLs, or once you have been parsing for about 60 minutes.
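Assuming each limit acts as an independent stop condition, that an unset limit is ignored, and that the item limit counts all items found in the run, the check could be sketched in Python like this. ''StopLimits'' and ''should_stop'' are hypothetical names, not part of the program.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopLimits:
    minutes: Optional[float] = None  # stop after this much working time
    urls: Optional[int] = None       # stop after this many parsed URLs
    items: Optional[int] = None      # stop after this many found items


def should_stop(limits: StopLimits, started_at: float, urls_done: int,
                items_found: int, queue_empty: bool,
                now: Optional[float] = None) -> bool:
    """Return True when the queue is empty or any enabled limit is reached."""
    now = time.monotonic() if now is None else now
    if queue_empty:
        return True  # the normal end of a run
    if limits.minutes is not None and now - started_at >= limits.minutes * 60:
        return True
    if limits.urls is not None and urls_done >= limits.urls:
        return True
    if limits.items is not None and items_found >= limits.items:
        return True
    return False


limits = StopLimits(urls=100, minutes=60)
print(should_stop(limits, started_at=0.0, urls_done=100, items_found=3,
                  queue_empty=False, now=120.0))  # True: 100 URLs parsed
```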
  
==== Seconds to wait after each Download ====
Set this to e.g. 1 to reduce the CPU usage if you have problems with it.
  
==== Backup Results every XX Minutes to file YY ====
If your system is somewhat unstable, it is a good idea to save the project's results to a file automatically, so that you don't have to restart the whole process from the beginning.
  
==== Save e-mails we could send to automatically ====
If you use the Automailer, you probably don't want to send the same e-mail twice. This option collects all successfully sent e-mails in one file. The program checks new e-mails against this list and skips sending to them if they are in it.
If you plan to send e-mails to the same people again, you have to either delete them from this list or change the file name. A restart of the program might be required.

In the latest version there is an EDIT button next to that field where you can remove individual e-mails. When creating a new project, it is wise to also change the file name there to avoid future trouble.

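Conceptually, the sent list is read before sending and extended after each successful delivery. The Python sketch below assumes a plain text file with one address per line and compares addresses case-insensitively; this file layout and all function names are assumptions, not the program's documented format.

```python
from pathlib import Path


def parse_sent_list(text: str) -> set[str]:
    """Read one e-mail address per line, ignoring blank lines and case."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def load_sent_list(path: Path) -> set[str]:
    """Load the sent list; a missing file simply means nothing was sent yet."""
    if not path.exists():
        return set()
    return parse_sent_list(path.read_text(encoding="utf-8"))


def still_to_send(candidates: list[str], sent: set[str]) -> list[str]:
    """Skip every address that is already in the sent list."""
    return [email for email in candidates if email.strip().lower() not in sent]


def record_sent(email: str, path: Path) -> None:
    """Append an address after it was sent successfully."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(email.strip() + "\n")


sent = parse_sent_list("info@example.com\nSales@Example.org\n")
print(still_to_send(["INFO@example.com", "new@example.net"], sent))  # ['new@example.net']
```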
==== Skip whole domain when no item was found for a long time ====
When you are parsing a URL with a lot of sublinks and no item was found for about 100 links, all URLs with the same domain are removed from the queue.
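The rule can be pictured as a miss counter per domain that purges the queue once a threshold is reached. The Python sketch below assumes that the counter counts consecutive pages without any item, that “domain” means the exact host name of the URL, and uses the 100 links mentioned above as the default; ''DomainSkipper'' is a hypothetical name.

```python
from collections import defaultdict, deque
from urllib.parse import urlparse


def host_of(url: str) -> str:
    return urlparse(url).hostname or ""


class DomainSkipper:
    def __init__(self, threshold: int = 100) -> None:
        self.threshold = threshold
        self.misses = defaultdict(int)  # consecutive empty pages per host

    def page_done(self, url: str, items_found: int, queue: deque) -> bool:
        """Update the counter for one parsed page; True if the domain was dropped."""
        host = host_of(url)
        if items_found:
            self.misses[host] = 0  # a hit resets the count of empty pages
            return False
        self.misses[host] += 1
        if self.misses[host] < self.threshold:
            return False
        # Remove every queued URL of this domain, keeping the order of the rest.
        remaining = [u for u in queue if host_of(u) != host]
        queue.clear()
        queue.extend(remaining)
        return True


queue = deque(["http://a.example/2", "http://b.example/1", "http://a.example/3"])
skipper = DomainSkipper(threshold=2)
skipper.page_done("http://a.example/1", 0, queue)
print(skipper.page_done("http://a.example/4", 0, queue), list(queue))
# True ['http://b.example/1']
```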
  
==== Detect and remove fake emails (e.g. generated by scripts) ====
The program quickly tests whether the domain of each e-mail really exists, to make sure the address was not just made up by a script.
  
----

===== Filter =====

{{ :email_spider:email_spider_options_filter.png |}}
----

===== Search Engines =====

{{ :email_spider:email_spider_options_search_engines.png |}}
----

===== Keywords =====

{{ :email_spider:email_spider_options_keywords.png |}}
  
You can also add more placeholders for the **@** and **.** signs, which are often replaced on websites to make it harder for programs like ours to locate the e-mail address.
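For example, a page might show an address as ''info [at] example [dot] com''. The Python sketch below illustrates how lists of such placeholders could be turned back into real **@** and **.** signs before searching for addresses. The placeholder lists are examples, not the program's built-in defaults, and the names are hypothetical.

```python
import re

# Example placeholders as they often appear on web pages.
AT_PLACEHOLDERS = ["[at]", "(at)", "{at}"]
DOT_PLACEHOLDERS = ["[dot]", "(dot)", "{dot}"]
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def normalize(text: str) -> str:
    """Replace placeholders (and surrounding spaces) with @ and ., ignoring case."""
    for placeholder in AT_PLACEHOLDERS:
        text = re.sub(r"\s*" + re.escape(placeholder) + r"\s*", "@", text,
                      flags=re.IGNORECASE)
    for placeholder in DOT_PLACEHOLDERS:
        text = re.sub(r"\s*" + re.escape(placeholder) + r"\s*", ".", text,
                      flags=re.IGNORECASE)
    return text


def find_emails(text: str) -> list[str]:
    return EMAIL_RE.findall(normalize(text))


print(find_emails("Contact: info [at] example [dot] com"))  # ['info@example.com']
```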

For phone and fax numbers you can also add the prefixes that the numbers should have (e.g. +49 482). If you uncheck everything else, this works as a filter that only accepts numbers from that region.
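Used as a filter, the prefix check compares the beginning of each found number with the configured prefixes. The Python sketch below assumes that spaces, slashes, dashes and brackets are ignored for the comparison; these normalization rules and the function names are assumptions.

```python
import re


def digits_only(number: str) -> str:
    """Keep a leading + and the digits; drop spaces, dashes, slashes, brackets."""
    number = number.strip()
    sign = "+" if number.startswith("+") else ""
    return sign + re.sub(r"\D", "", number)


def matches_prefix(number: str, prefixes: list[str]) -> bool:
    """True if the number starts with one of the configured prefixes."""
    normalized = digits_only(number)
    return any(normalized.startswith(digits_only(p)) for p in prefixes)


found = ["+49 4821 12345", "+49 (30) 1234567", "+44 20 7946 0000"]
print([n for n in found if matches_prefix(n, ["+49 482"])])  # ['+49 4821 12345']
```

In this simple sketch, a number written in national format (starting with 0 instead of +49) would not match the prefix.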
  
----

===== Extra Data =====

{{ :email_spider:email_spider_options_extra_data.png |}}
----

===== PreParser =====

{{ :email_spider:email_spider_options_preparser.png |}}
----

===== AutoMailer =====

{{ :email_spider:email_spider_options_email_message.png |}}
|{word1#word2#word3} |one of the words in the brackets gets inserted|
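
The ''{word1#word2#word3}'' syntax picks one alternative per e-mail, so that not every message is identical. A minimal Python sketch of how such a placeholder could be resolved, assuming placeholders are not nested and each alternative is equally likely; ''spin'' is a hypothetical name.

```python
import random
import re
from typing import Optional

# One {a#b#c} group that contains at least one "#" and no nested braces.
SPIN_RE = re.compile(r"\{([^{}]*#[^{}]*)\}")


def spin(text: str, rng: Optional[random.Random] = None) -> str:
    """Replace each {word1#word2#...} group with one randomly chosen word."""
    if rng is None:
        rng = random.Random()
    return SPIN_RE.sub(lambda match: rng.choice(match.group(1).split("#")), text)


print(spin("{Hello#Hi#Good day} Mr. Smith"))  # one of the three greetings
```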
  
By default the e-mail is sent as a text-only message, but you can also define an HTML message with different colors and text styles, as seen in the screenshot above. Attachments are not possible; if you want to use images in the e-mail, you have to link to an online resource. See the article "[[email_spider:creating_emails_with_images|E-Mails with Images]]" for more.
  
In the “E-Mail Settings” you can define how the e-mail is sent.
The **Pop3-Server**, the **SMTP-Server** and the **Own Email** are required to send mails normally through your e-mail provider. Of course, the **Login** and **Password** are also needed.
  
Please note that, depending on your e-mail provider, the login may differ from your e-mail address. If you don't know what to enter, look at the settings of your mail program (e.g. Outlook), search your e-mail provider's FAQs for details, or see this list of [[email_spider:email_provider_settings|email providers and their settings]].
  
When sending mails with the program, make sure that no other e-mail client is logged in, as this might result in a rejection from the mail server.
**Own Email** is the e-mail address that the customer will reply to. So if you want to get feedback, you have to enter a valid e-mail address here. You can also enter something like %random%@my-host.com. In this case, the %random% string is replaced with random characters. This is useful if you want to use a different address for each e-mail you send out.
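A minimal Python sketch of the %random% replacement, assuming lowercase letters and digits and a length of eight characters; the character set and length the program actually uses are not stated here, and ''expand_random'' is a hypothetical name.

```python
import random
import string
from typing import Optional

ALPHABET = string.ascii_lowercase + string.digits


def expand_random(address: str, length: int = 8,
                  rng: Optional[random.Random] = None) -> str:
    """Replace every %random% in the address with freshly drawn characters."""
    if rng is None:
        rng = random.Random()
    while "%random%" in address:
        token = "".join(rng.choice(ALPHABET) for _ in range(length))
        address = address.replace("%random%", token, 1)
    return address


print(expand_random("%random%@my-host.com"))
```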
  
**Simulated Mailer** lets you pretend that the e-mail was written by one of the e-mail clients in the list. We recommend “outlook”, as it is one of the most common e-mail clients. If you want to use something else, pick another one from the list or simply choose random.
  
{{ :email_spider:email_spider_automailer_enabled.png |}}
If your TEST e-mail arrived, you can switch the Automailer on as shown in the screenshot above.
  
**If you see no activity in the status column, you might have configured the e-mail message incorrectly. Make sure that the e-mail matches one of the masks you defined in the templates. Otherwise the default template is used, but only if it contains a message.**

You can use many macros here. For the login you can e.g. use %email% for the whole e-mail address or %emailuser% for the part before the @. This way you can, for example, use the spin syntax in the SMTP server field to send e-mails through a random provider/account. However, make sure that the password is the same for all of them.

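A minimal Python sketch of how the two login macros relate to an address. Which address the program substitutes is not specified here, so it is passed in explicitly; ''expand_login'' is a hypothetical name.

```python
def expand_login(template: str, email: str) -> str:
    """Fill in %email% (whole address) and %emailuser% (part before the @)."""
    user = email.split("@", 1)[0]
    # "%emailuser%" does not contain "%email%", so the order of replacements is safe.
    return template.replace("%emailuser%", user).replace("%email%", email)


print(expand_login("%emailuser%", "john.doe@example.com"))  # john.doe
print(expand_login("%email%", "john.doe@example.com"))      # john.doe@example.com
```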
----

===== Proxy =====

{{ :email_spider:email_spider_options_proxy.png|}}