Differences
This shows you the differences between two versions of the page.
| email_spider:methods_to_collect_emails [2013-09-05 07:55] – sven | email_spider:methods_to_collect_emails [2024-08-06 08:27] (current) – [How deep to parse external sites] sven | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | {{indexmenu_n> | ||
| ====== Methods to collect E-Mails ====== | ====== Methods to collect E-Mails ====== | ||
| Line 15: | Line 16: | ||
| Use this option if you already have a website that you want to extract emails from. | Use this option if you already have a website that you want to extract emails from. | ||
| - | The option „Page | + | The option „Page |
| ==== How deep to parse this site ==== | ==== How deep to parse this site ==== | ||
| - | This will let you chose how this URL gets parsed and how deep. All links from that same domain get seen as a certain level of sublink from that site. | + | This will let you choose |
| ==== How deep to parse external sites ==== | ==== How deep to parse external sites ==== | ||
| - | Each link that leaves the entered sites domain is seen as an external site. E.g. you want to parse site http:// | + | Each link that leaves the entered sites domain is seen as an external site. E.g. you want to parse site %%https:// |
| ==== Restrict to:: Domain/ | ==== Restrict to:: Domain/ | ||
| Line 30: | Line 31: | ||
| Whatever you plan to parse, you can restrict it to one or the other setting. | Whatever you plan to parse, you can restrict it to one or the other setting. | ||
| + | ==== Place holders ==== | ||
| + | |||
| + | You can use special place holders (variables) in the URL field. When you enter **%nr%** or **%string%** inside the URL, you can define a range of values that will replace it. | ||
| + | |||
| + | {{ : | ||
| + | |||
| + | If you enter e.g. %%http:// | ||
| + | **%nr%** with numbers that you define. It will then add multiple URLs to parse. The same applies to the place holder **%string%**. | ||
| + | |||
| + | **%nr%** stands for " | ||
| + | The result is a set of generated URLs that are all parsed by the program. In most cases you don't need **%string%**. **%nr%** is the more useful one, as it can speed things up when you have found a URL that holds all the data you need and uses a numeric parameter. Usually this is a database ID. | ||
| + | ==== Login Required Websites ==== | ||
| + | |||
| + | Some websites require a login and password before you can reach the content you are interested in. There are basically two types of login. | ||
| + | |||
| + | === Auth-Http-Header === | ||
| + | |||
| + | This method is a server-based authorization, and your browser asks you to enter a login/ | ||
| + | |||
| + | %%http:// | ||
| + | |||
| + | === Website-Form === | ||
| + | |||
| + | If you have to open a special website to enter a login and password followed by pressing a button on the page, you have to click on the label " | ||
| + | |||
| + | {{ : | ||
| + | |||
| + | A new form opens where you can simulate the login. | ||
| + | |||
| + | {{ : | ||
| + | |||
| + | * Enter the URL and click on **1. Parse** | ||
| + | * Choose the login form in the box below (usually detected automatically) | ||
| + | * Fill the fields with login/ | ||
| + | * Click on **2. Submit** | ||
| + | |||
| + | A page should open in your browser, hopefully showing some indication that the login was successful. Usually you should see a " | ||
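
To illustrate the place holder replacement described under "Place holders", here is a minimal Python sketch of how one URL containing **%nr%** or **%string%** could turn into several URLs to parse. It is not the program's actual implementation. The function names `expand_nr` and `expand_string`, the inclusive number range, and the `example.com` address are assumptions made only for this illustration.

```python
from typing import Iterable, List


def expand_nr(url_template: str, first: int, last: int, step: int = 1) -> List[str]:
    """Return one URL per number, replacing every %nr% in the template.

    Assumption: the range is inclusive on both ends (first..last).
    """
    if "%nr%" not in url_template:
        # Nothing to replace, so there is only the one URL to parse.
        return [url_template]
    return [
        url_template.replace("%nr%", str(number))
        for number in range(first, last + 1, step)
    ]


def expand_string(url_template: str, values: Iterable[str]) -> List[str]:
    """Return one URL per value, replacing every %string% in the template."""
    if "%string%" not in url_template:
        return [url_template]
    return [url_template.replace("%string%", value) for value in values]


if __name__ == "__main__":
    # Hypothetical URL whose numeric parameter looks like a database ID.
    for url in expand_nr("https://example.com/profile?id=%nr%", 1, 3):
        print(url)
```

With the hypothetical template above and the range 1 to 3, the sketch produces three URLs ending in `id=1`, `id=2` and `id=3`, which matches the idea of walking through database IDs instead of following links.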