Differences

This shows you the differences between two versions of the page.

email_spider:methods_to_collect_emails [2013-09-05 07:55] – created sven
email_spider:methods_to_collect_emails [2021-12-28 09:02] (current) – [Use URL as Start] sven
Line 1: Line 1:
 +{{indexmenu_n>1}}
 ====== Methods to collect E-Mails ======
- 
-{{:email_spider:email_spider_methods.png|}} 
  
 Basically, there are two methods to get E-Mails, phone and fax numbers.
 +
 +{{:email_spider:email_spider_methods.png|}}
  
 ===== Use Search Engines =====
Line 15: Line 16:
 Use this option if you already have a website that you want to extract emails from.
  
-The option „Page has to include“ will check the downloaded website for this keyword and will only continue to parse if the keyword is present on the website.
+The option „Page must include“ will check the downloaded website for this keyword and will only continue to parse if the keyword is present on the website.
  
 ==== How deep to parse this site ====
  
-This will let you chose how this URL gets parsed and how deep. All links from that same domain get seen as a certain level of sublink from that site.
+This will let you choose how this URL gets parsed and how deep. All links from the same domain are treated as a certain sublink level of that site.
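
The idea of sublink levels can be illustrated with a small sketch. It is not the program's actual crawler: the link map `LINKS`, the function names, the rule that "same domain" means "same host name", and the choice to count the start URL as level 0 are all assumptions made for this example.

```python
from collections import deque
from urllib.parse import urlsplit

def same_domain(url_a, url_b):
    """Assumption: two URLs share a domain when their host names are equal."""
    return urlsplit(url_a).hostname == urlsplit(url_b).hostname

def sublink_levels(start_url, links, max_depth):
    """Breadth-first walk that assigns each same-domain page its sublink level.

    `links` is a hypothetical in-memory map of page URL -> URLs linked from it.
    The start URL is level 0; pages deeper than max_depth are not visited.
    """
    levels = {start_url: 0}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        depth = levels[page]
        if depth == max_depth:
            continue  # do not follow links below the chosen depth
        for target in links.get(page, []):
            if target not in levels and same_domain(start_url, target):
                levels[target] = depth + 1
                queue.append(target)
    return levels

# Hypothetical site structure, used only for illustration.
LINKS = {
    "http://www.some-site.com/": ["http://www.some-site.com/a", "http://other.example/x"],
    "http://www.some-site.com/a": ["http://www.some-site.com/b"],
    "http://www.some-site.com/b": ["http://www.some-site.com/c"],
}

for url, level in sublink_levels("http://www.some-site.com/", LINKS, 2).items():
    print(level, url)
```

With a depth of 2, the page `/c` (level 3) and the external link are left out of the result.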
  
 ==== How deep to parse external sites ====
Line 30: Line 31:
 Whatever you plan to parse, you can restrict it to one or the other setting.
  
 +==== Place holders ====
 +
 +You can use special place holders (variables) in the URL field. When you enter **%nr%** or **%string%** inside the URL, you can define the range of values used to replace it.
 +
 +{{ :email_spider:email_spider_place_holders.png |}}
 +
 +If you enter e.g. %%http://www.some-site.com%%/?page=**%nr%**, the program replaces
 +**%nr%** with the numbers that you define and adds the resulting URLs to the list to parse. The same applies to the place holder **%string%**.
 +
 +**%nr%** stands for "number" and inserts a number counting from //start number// to //end number//, both of which you define, whereas **%string%** inserts a randomly generated string.
 +The result is a set of generated URLs that are all parsed by the program. In most cases you don't need **%string%**. **%nr%** is the more useful one: it can speed things up when you have found a URL that holds all the data you need and takes a numeric parameter. Usually this is a database ID.
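
The **%nr%** replacement described above can be sketched in a few lines. This is only an illustration of the idea, not the program's own code: the function name `expand_nr` is hypothetical, and the sketch assumes that the end number is included in the range.

```python
def expand_nr(url_template: str, start: int, end: int) -> list[str]:
    """Return one URL per number from start to end (end included, by assumption)."""
    placeholder = "%nr%"
    if placeholder not in url_template:
        # Nothing to replace: the URL is used as it is.
        return [url_template]
    return [url_template.replace(placeholder, str(n)) for n in range(start, end + 1)]

# Example: pages 1 to 3 of a hypothetical paginated listing.
for url in expand_nr("http://www.some-site.com/?page=%nr%", 1, 3):
    print(url)
```

Each generated URL would then be parsed like a manually entered start URL.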
 +==== Login Required Websites ====
 +
 +Some websites require you to enter a login and password to get to a certain point of interest. There are basically two types of login.
 +
 +=== Auth-Http-Header ===
 +
 +This method is a server-based authorization: your browser asks you to enter a login and password before it opens the website. If this is the case, you can use the following format when entering the URL:
 +
 +%%http://%%<color red>login</color>:<color red>password</color>@%%www.some-site.com/page.html%%
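
If the login or password contains characters such as `@` or `:`, the URL becomes ambiguous. A common convention is to percent-encode them first. The sketch below builds such a URL with Python's standard library; the function name `with_credentials` is hypothetical, the input URL is assumed to contain no credentials yet, and whether the program itself accepts percent-encoded credentials is not stated here.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def with_credentials(url: str, login: str, password: str) -> str:
    """Insert login:password@ before the host, percent-encoding both parts."""
    parts = urlsplit(url)
    # safe="" makes quote() also escape "/", "@" and ":" inside the credentials.
    userinfo = f"{quote(login, safe='')}:{quote(password, safe='')}"
    netloc = f"{userinfo}@{parts.netloc}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

print(with_credentials("http://www.some-site.com/page.html", "login", "password"))
print(with_credentials("http://www.some-site.com/page.html", "login", "p@ss:word"))
```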
 +
 +=== Website-Form ===
 +
 +If you have to open a special page, enter a login and password there and then press a button on the page, click on the label "**Requires Login**" as seen in the screenshot below.
 +
 +{{ :email_spider:email_spider_requires_login.png |}}
 +
 +A new form opens where you can simulate the login.
 +
 +{{ :email_spider:email_spider_login_form.png |}}
 +
 +  * Enter the URL and click on **1. Parse**
 +  * Choose the login form in the box below (usually detected automatically)
 +  * Fill the fields with login/password
 +  * Click on **2. Submit**
 +
 +A page should open in your browser, hopefully showing some indication that the login was successful. Usually you should see a "Logout" link or button, or a welcome message.
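
The steps above (parse the page, pick the login form, fill in the fields, submit) can be sketched as follows. This is not how the program detects forms internally. The heuristic "first form with a password field", the class and function names, and the sample page `PAGE` are assumptions made for this example, and the sketch stops at building the request body instead of sending it.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

class FormFinder(HTMLParser):
    """Collects every <form> on a page together with its named <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
                "has_password": False,
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attrs.get("name"):
            # Keep preset values (e.g. hidden tokens) so they are submitted too.
            self._current["fields"][attrs["name"]] = attrs.get("value") or ""
            if (attrs.get("type") or "").lower() == "password":
                self._current["has_password"] = True

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None

def find_login_form(html):
    """Assumed heuristic: the first form containing a password input, else None."""
    finder = FormFinder()
    finder.feed(html)
    for form in finder.forms:
        if form["has_password"]:
            return form
    return None

# Hypothetical page with a search form and a login form.
PAGE = """
<form action="/search" method="get"><input name="q"></form>
<form action="/login" method="post">
  <input type="hidden" name="token" value="abc">
  <input type="text" name="user">
  <input type="password" name="pass">
</form>
"""

form = find_login_form(PAGE)
form["fields"].update({"user": "login", "pass": "password"})
body = urlencode(form["fields"])  # what a submit would send to form["action"]
print(form["method"], form["action"], body)
```

Hidden fields such as the `token` above are kept because many login forms reject a submission that omits them.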