Differences
This shows you the differences between two versions of the page.
email_spider:methods_to_collect_emails [2013-09-05 07:55] – sven | email_spider:methods_to_collect_emails [2021-12-28 09:02] (current) – [Use URL as Start] sven | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | {{indexmenu_n> | ||
====== Methods to collect E-Mails ====== | ====== Methods to collect E-Mails ====== | ||
Line 15: | Line 16: | ||
Use this option if you already have a website from which you want to extract emails. | Use this option if you already have a website from which you want to extract emails. | ||
- | The option „Page | + | The option „Page |
==== How deep to parse this site ==== | ==== How deep to parse this site ==== | ||
- | This will let you chose how this URL gets parsed and how deep. All links from that same domain get seen as a certain level of sublink from that site. | + | This will let you choose |
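The idea of same-domain links being treated as "levels" of sublinks can be sketched as a breadth-first walk. This is only an illustration and not the program's actual implementation: the function name `link_levels`, the `links` mapping (a stand-in for real page downloads) and the `example.com` URLs are all assumptions made for the example.

```python
from collections import deque
from urllib.parse import urlparse


def link_levels(start_url, links, max_depth):
    """Assign a sublink level to each same-domain URL reachable from start_url.

    `links` is a hypothetical mapping {page URL: [URLs linked from that page]}
    that stands in for downloading and parsing real pages.
    """
    domain = urlparse(start_url).netloc
    levels = {start_url: 0}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        if levels[page] >= max_depth:
            continue  # do not follow links beyond the chosen depth
        for target in links.get(page, []):
            if urlparse(target).netloc != domain:
                continue  # external sites are handled by a separate setting
            if target not in levels:
                levels[target] = levels[page] + 1
                queue.append(target)
    return levels


# Hypothetical link structure used only for demonstration.
demo_links = {
    "http://example.com/": ["http://example.com/a", "http://other.org/x"],
    "http://example.com/a": ["http://example.com/b"],
    "http://example.com/b": ["http://example.com/c"],
}
```

With a depth of 2, the start page is level 0, `/a` is level 1 and `/b` is level 2; `/c` and the external link are not visited.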
==== How deep to parse external sites ==== | ==== How deep to parse external sites ==== | ||
Line 30: | Line 31: | ||
Whatever you plan to parse you can restrict it to one or the other setting. | Whatever you plan to parse you can restrict it to one or the other setting. | ||
+ | ==== Placeholders ==== | ||
+ | ||
+ | You can use special placeholders (variables) in the URL field. When you enter **%nr%** or **%string%** in the URL, you can define a range of values to replace it with. | ||
+ | |||
+ | {{ : | ||
+ | |||
+ | If you enter e.g. %%http:// | ||
+ | **%nr%** with numbers that you define. It then adds multiple URLs to parse. The same applies to the placeholder **%string%**. | ||
+ | |||
+ | **%nr%** stands for " | ||
+ | The result is a set of generated URLs that are all parsed by the program. In most cases you don't need **%string%**. **%nr%** is the more useful one: it can speed things up when you have found a URL that holds all the data you need and takes a numeric parameter. Usually this is a database ID. | ||
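As a rough illustration of how **%nr%** and **%string%** expand into several URLs, here is a minimal Python sketch. It is not the program's own code: the function names, the inclusive number range and the `example.com` URL are assumptions chosen for the example.

```python
def expand_nr(url_template, start, stop, step=1):
    """Return one URL per number from start to stop (inclusive, an assumption)."""
    if "%nr%" not in url_template:
        return [url_template]  # nothing to replace, keep the single URL
    return [
        url_template.replace("%nr%", str(number))
        for number in range(start, stop + 1, step)
    ]


def expand_string(url_template, values):
    """Return one URL per string value for the %string% placeholder."""
    if "%string%" not in url_template:
        return [url_template]
    return [url_template.replace("%string%", value) for value in values]


# Hypothetical URL whose numeric parameter looks like a database ID.
urls = expand_nr("http://example.com/member?id=%nr%", 1, 3)
```

Here `urls` holds three entries, one each for `id=1`, `id=2` and `id=3`, and each of them would be parsed as a separate start URL.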
+ | ==== Login Required Websites ==== | ||
+ | |||
+ | Some websites require you to enter a login and password to get to a certain point of interest. There are basically two types of login. | ||
+ | |||
+ | === Auth-Http-Header === | ||
+ | |||
+ | This method is a server-based authorization, and your browser asks you to enter a login/ | ||
+ | |||
+ | %%http:// | ||
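For background, and assuming the server uses HTTP Basic authentication (the page does not say which scheme is meant), the login and password end up in an `Authorization` request header. The helper name `basic_auth_header` below is hypothetical and only shows how that header is built:

```python
import base64


def basic_auth_header(login, password):
    """Build the Authorization header used by HTTP Basic authentication."""
    credentials = f"{login}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Example credentials taken from the HTTP Basic authentication specification.
header = basic_auth_header("Aladdin", "open sesame")
```

The credentials are only encoded, not encrypted, so this kind of login is best used over HTTPS.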
+ | |||
+ | === Website-Form === | ||
+ | |||
+ | If you have to open a special website to enter a login and password followed by pressing a button on the page, you have to click on the label " | ||
+ | |||
+ | {{ : | ||
+ | |||
+ | A new form opens where you can simulate the login. | ||
+ | |||
+ | {{ : | ||
+ | |||
+ | * Enter the URL and click on **1. Parse** | ||
+ | * Choose the login form in the box below (usually detected automatically) | ||
+ | * Fill the fields with login/ | ||
+ | * Click on **2. Submit** | ||
+ | |||
+ | A page should open in your browser and hopefully show some indication that the login was successful. Usually you should see a " | ||