Differences

This shows you the differences between two versions of the page.

email_spider:methods_to_collect_emails [2013-09-05 07:55] svenemail_spider:methods_to_collect_emails [2018-10-22 16:33] sven
Line 1: Line 1:
 +{{indexmenu_n>1}}
 ====== Methods to collect E-Mails ====== ====== Methods to collect E-Mails ======
  
Line 19: Line 20:
 ==== How deep to parse this site ==== ==== How deep to parse this site ====
  
-This will let you chose how this URL gets parsed and how deep. All links from that same domain get seen as a certain level of sublink from that site. +This will let you choose how this URL gets parsed and how deep. All links from that same domain get seen as a certain level of sublink from that site. 
  
 ==== How deep to parse external sites ==== ==== How deep to parse external sites ====
Line 30: Line 31:
 Whatever you plan to parse you can restrict it to one or the other setting. Whatever you plan to parse you can restrict it to one or the other setting.
  
 +==== Place holders ====
 +
 +You can use special place holders (variables) in the URL field. When you enter **%nr%** or **%string%** in the URL, you can define the range of values that will replace it.
 +
 +{{ :email_spider:email_spider_place_holders.png |}}
 +
 +If you enter e.g. %%http://www.some-site.com%%/?page=**%nr%**, then the program will replace
 +**%nr%** with the numbers that you define and then add multiple URLs to parse. The same applies to the place holder **%string%**.
 +
 +**%nr%** stands for "number" and inserts a number counting from //start number// to //end number//, both of which you define, whereas **%string%** inserts a randomly generated string.
 +The result is a set of generated URLs that are all parsed by the program. You don't need **%string%** in most cases. **%nr%** is the more useful one, as it can speed things up when you have found a URL that holds all the data you need and uses a numeric parameter. Usually this is a database ID.
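
As a rough illustration of the **%nr%** replacement described above, the following Python sketch expands a URL template into one URL per number. The function name `expand_nr` and the inclusive handling of the end number are assumptions made for this example; it is not the program's actual implementation.

```python
def expand_nr(url_template, start, end):
    """Return one URL per number from start to end.

    Assumption for this sketch: the end number is included in the range.
    """
    placeholder = "%nr%"
    if placeholder not in url_template:
        # Nothing to replace: the URL stays a single entry.
        return [url_template]
    return [url_template.replace(placeholder, str(n))
            for n in range(start, end + 1)]


urls = expand_nr("http://www.some-site.com/?page=%nr%", 1, 3)
for url in urls:
    print(url)
```

With a start number of 1 and an end number of 3, the sketch produces three URLs, one for each value of the `page` parameter.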
 +==== Login Required Websites ====
 +
 +Some websites require you to enter a login and password to get to a certain point of interest. There are basically two types of login.
 +
 +=== Auth-Http-Header ===
 +
 +This method is a server-based authorization: your browser asks you to enter a login/password before it opens the website. If this is the case, you can use the following format when entering the URL:
 +
 +%%http://%%<color red>login</color>:<color red>password</color>@%%www.some-site.com/page.html%%
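
If the login or password contains characters such as "@" or ":", they would clash with the delimiters in the URL format above. In general URL syntax such characters are percent-encoded. The Python sketch below builds a URL this way; the helper name `url_with_credentials` is hypothetical, and whether the program itself decodes percent-encoded credentials is an assumption you should verify.

```python
from urllib.parse import quote, urlsplit


def url_with_credentials(login, password, host_and_path):
    """Build http://login:password@host/path with encoded credentials."""
    # safe="" makes quote() encode every reserved character,
    # including "@", ":" and "/".
    user = quote(login, safe="")
    secret = quote(password, safe="")
    return f"http://{user}:{secret}@{host_and_path}"


url = url_with_credentials("login", "p@ss:word", "www.some-site.com/page.html")
print(url)

# Splitting the URL again shows that the host is still recognised correctly.
parts = urlsplit(url)
print(parts.username, parts.hostname)
```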
 +
 +=== Website-Form ===
 +
 +If you have to open a special website to enter a login and password followed by pressing a button on the page, you have to click on the label "**Requires Login**" as seen in the screenshot below.
 +
 +{{ :email_spider:email_spider_requires_login.png |}}
 +
 +A new form opens where you can simulate the login.
 +
 +{{ :email_spider:email_spider_login_form.png |}}
 +
 +  * Enter the URL and click on **1. Parse**
 +  * Choose the login form in the box below (usually detected automatically)
 +  * Fill the fields with login/password
 +  * Click on **2. Submit**
 +
 +A page should open in your browser, hopefully showing some indication that the login was successful. Usually you will see a "Logout" link or button, or a welcome message.
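
The page does not say how the login form is detected automatically. One plausible heuristic, shown here purely as an assumption, is to pick the first form that contains a password field. The Python sketch below uses only the standard library; the names `LoginFormFinder`, `find_login_form` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class LoginFormFinder(HTMLParser):
    """Collect all forms on a page together with their named input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one dict per <form> element
        self._current = None   # form currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action") or "",
                             "fields": [], "has_password": False}
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            if attrs.get("name"):
                self._current["fields"].append(attrs["name"])
            if (attrs.get("type") or "").lower() == "password":
                self._current["has_password"] = True

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def find_login_form(html):
    """Return the first form with a password field, or None (assumed heuristic)."""
    finder = LoginFormFinder()
    finder.feed(html)
    for form in finder.forms:
        if form["has_password"]:
            return form
    return None


SAMPLE = """
<form action="/search"><input type="text" name="q"></form>
<form action="/login" method="post">
  <input type="text" name="user">
  <input type="password" name="pass">
  <input type="submit" value="Log in">
</form>
"""

form = find_login_form(SAMPLE)
print(form["action"], form["fields"])
```

In this sample the search form is skipped because it has no password field, which mirrors why the choice of form can still be corrected by hand in the box mentioned above.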