Differences

This shows you the differences between two versions of the page.

--- email_spider:methods_to_collect_emails [2013-09-05 07:55] – sven
+++ email_spider:methods_to_collect_emails [2024-08-06 08:27] (current) – [How deep to parse external sites] sven
@@ Line 1: / Line 1: @@
+{{indexmenu_n>1}}
 ====== Methods to collect E-Mails ======
@@ Line 15: / Line 16: @@
 Use this option if you have already a website where you want to extract emails from.
-The option „Page has to include“ will check the downloaded website for this keyword and will only continue to parse if the keyword is present on the website.
+The option „Page must include“ will check the downloaded website for this keyword and will only continue to parse if the keyword is present on the website.
 ==== How deep to parse this site ====
-This will let you chose how this URL gets parsed and how deep. All links from that same	domain get seen as a certain level of sublink from that site.
+This will let you choose how this URL gets parsed and how deep. All links from that same domain get seen as a certain level of sublink from that site.
 ==== How deep to parse external sites ====
-Each link that leaves the entered sites domain is seen as an external site. E.g. you want to parse site http://www.gsa-online.de and the program finds a link to http://www.shareware.de then this site is the first level of the external site.
+Each link that leaves the entered site's domain is seen as an external site. E.g. if you want to parse the site %%https://www.gsa-online.de%% and the program finds a link to %%https://www.google.de%%, then that site is the first level of the external site.
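The internal/external distinction described above can be sketched in a few lines of Python. This is only an illustration, not the program's actual logic: the function name `is_external` is hypothetical, and it assumes that "domain" means the full host name of the entered URL.

```python
from urllib.parse import urlsplit


def is_external(start_url, link_url):
    """Return True if link_url points to a different host than start_url.

    Assumption: hosts are compared as full, case-insensitive host names,
    so a subdomain counts as a different site in this sketch.
    """
    start_host = (urlsplit(start_url).hostname or "").lower()
    link_host = (urlsplit(link_url).hostname or "").lower()
    return link_host != start_host


if __name__ == "__main__":
    start = "https://www.gsa-online.de"
    for link in ("https://www.gsa-online.de/contact", "https://www.google.de"):
        kind = "external" if is_external(start, link) else "internal"
        print(link, "->", kind)
```

A crawler built on this idea would keep one depth counter for internal links and a separate one for external links, which is what the two "How deep" settings suggest.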
 ==== Restrict to:: Domain/SubDomain ====
@@ Line 30: / Line 31: @@
 Whatever you plan to parse you can restrict it to one or the other setting.
+==== Place holders ====
+You can use special place holders (variables) in the URL field. When you enter **%nr%** or **%string%** inside the URL, you can define a range of values to replace it with.
+{{ :email_spider:email_spider_place_holders.png |}}
+If you enter e.g. %%http://www.some-site.com%%/?page=**%nr%**, then the program will replace
+**%nr%** with the numbers that you define and will then add multiple URLs to parse. The same applies to the place holder **%string%**.
+**%nr%** stands for "number" and inserts a number counting from the //start number// to the //end number// that you define, while **%string%** inserts a randomly generated string.
+The result is a set of generated URLs that are all parsed by the program. You don't need **%string%** in most cases. Only **%nr%** is likely to be of interest, as it can speed things up when you have found a URL that holds all the data you need and uses a numeric parameter. Usually this is a database ID.
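The place holder expansion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the program's implementation: the function name `expand_placeholders`, the inclusive end number, and the length and alphabet of the random string are all hypothetical choices.

```python
import random
import string


def expand_placeholders(url_template, start=1, end=3, string_length=8, rng=None):
    """Expand %nr% and %string% in a URL template into a list of URLs.

    Assumptions: the number range is inclusive on both ends, and %string%
    becomes a random lowercase string of string_length characters.
    """
    rng = rng or random.Random()
    # Without %nr% there is nothing to count, so produce a single URL.
    numbers = range(start, end + 1) if "%nr%" in url_template else [None]
    urls = []
    for number in numbers:
        url = url_template
        if number is not None:
            url = url.replace("%nr%", str(number))
        if "%string%" in url:
            token = "".join(rng.choice(string.ascii_lowercase) for _ in range(string_length))
            url = url.replace("%string%", token)
        urls.append(url)
    return urls


if __name__ == "__main__":
    for url in expand_placeholders("http://www.some-site.com/?page=%nr%", 1, 3):
        print(url)
```

With a start number of 1 and an end number of 3, the template from the text yields three URLs ending in `?page=1`, `?page=2` and `?page=3`.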
+==== Login Required Websites ====
+Some websites require you to enter a login and password to get to a certain point of interest. There are basically two types of login.
+=== Auth-Http-Header ===
+This method is a server based authorization and your browser asks you to enter a login/password before it opens the website. If this is the case, you can use the following format when entering the URL:
+%%http://%%<color red>login</color>:<color red>password</color>@%%www.some-site.com/page.html%%
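To show what a URL in this format carries, the sketch below splits the credentials off the URL and builds the matching request header. It assumes the server uses HTTP Basic authentication, which the text does not state explicitly; the function name `split_credentials` is hypothetical.

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_credentials(url):
    """Split login:password@ off a URL.

    Returns (clean_url, authorization_header_value). The header value is
    None when the URL carries no credentials. Assumes HTTP Basic auth.
    """
    parts = urlsplit(url)
    if parts.username is None:
        return url, None
    username = unquote(parts.username)
    password = unquote(parts.password or "")
    # Rebuild the host part without the credentials (IPv6 hosts not handled).
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean_url = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return clean_url, f"Basic {token}"


if __name__ == "__main__":
    clean, header = split_credentials("http://login:password@www.some-site.com/page.html")
    print(clean)
    print("Authorization:", header)
```

Special characters such as `@` or `:` inside the login or password need to be percent-encoded in the URL; the sketch decodes them again before building the header.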
+=== Website-Form ===
+If you have to open a special website to enter a login and password followed by pressing a button on the page, you have to click on the label "**Requires Login**" as seen in the screenshot below.
+{{ :email_spider:email_spider_requires_login.png |}}
+A new form opens where you can simulate the login.
+{{ :email_spider:email_spider_login_form.png |}}
+  * Enter the URL and click on **1. Parse**
+  * Choose the login form in the box below (usually detected automatically)
+  * Fill the fields with login/password
+  * Click on **2. Submit**
+A page should open in your browser, hopefully showing some indication that the login was successful. Usually you should see a "Logout" link or button, or a welcome message.
