Differences

This shows you the differences between two versions of the page.

proxy_scraper:options [2015-06-10 21:55] – [Proxy Timeouts / Threads] devin
proxy_scraper:options [2016-11-09 11:24] (current) – [Automatic Export] sven
Line 4: Line 4:
  
 ===== Settings ===== ===== Settings =====
 +{{ :settings.png |}}
  
 ==== Internal Proxy Server ==== ==== Internal Proxy Server ====
Line 31: Line 32:
  
 ===== Provider ===== ===== Provider =====
 +{{ :provider.png |}}
  
 A website that offers free proxy servers is called a //provider// within GSA Proxy Scraper. The software comes with over 800 providers.  A website that offers free proxy servers is called a //provider// within GSA Proxy Scraper. The software comes with over 800 providers. 
 Having them checked in the list means that the automatic searching mode will parse that provider. Having them checked in the list means that the automatic searching mode will parse that provider.
 The maximum //quality// a provider can have is 100 which would mean that all extracted proxies do work. That however is very unlikely. The maximum //quality// a provider can have is 100 which would mean that all extracted proxies do work. That however is very unlikely.
-The //Last Working// column shows the number of working proxies from extracted once on last parsing. As a //description// you can enter whatever you want.+The //Last Working// column shows the number of working proxies that were extracted from that source during the last time it was parsed. As a //description// you can enter whatever you want.
  
 There is another option to proxies called "**Parse extracted links from search-engine-tests**" There is another option to proxies called "**Parse extracted links from search-engine-tests**"
Line 42: Line 44:
 Often new proxy sources are found that way that you can later add here as well. In the proxy list it is shown as "//Proxy-Search Links - ...//". Often new proxy sources are found that way that you can later add here as well. In the proxy list it is shown as "//Proxy-Search Links - ...//".
  
 +Another option called "**Use Search Engines to locate proxy lists**" will actively use search engines (Google, Bing...) with your queries (**Other**) or the found **IP**s themselves to find new proxies.
  
-Under the list you have various options to manage the providers by adding new once or deleting some no longer working once.+ 
 +Under the list you have various options to manage the providers by adding new ones or deleting some no longer working providers.
  
   * **Import** - Will open a popup menu where you can import various proxy sources from other programs   * **Import** - Will open a popup menu where you can import various proxy sources from other programs
Line 49: Line 53:
   * **Edit** - Will open the editor to fine tune the options for each provider   * **Edit** - Will open the editor to fine tune the options for each provider
   * **Delete** - Will delete all selected providers   * **Delete** - Will delete all selected providers
-  * **Clear Cache** - This will delete all the cache files for the provider. Cache files hold information about proxies being extracted from the sources and found as being dead so further proxy crawling will no longer wast time on that. Clearing the cache might be useful only for special cases where all proxies had been tagged as being down due to some network errors.+  * **Clear Cache** - This will delete all the cache files for the provider. Cache files hold information about proxies being extracted from the sources and found as being dead so further proxy crawling will no longer waste time on that. Clearing the cache might be useful only for special cases where all proxies had been tagged as being down due to  network errors.
  
 ===== Automatic Export ===== ===== Automatic Export =====
 +{{ :export.png |}}
  
 There are many options for you to export proxies automatically. You can define an interval and different ways to export. There are many options for you to export proxies automatically. You can define an interval and different ways to export.
Line 58: Line 63:
   * **FTP** - Uploads a file to a FTP Server   * **FTP** - Uploads a file to a FTP Server
   * **E-Mail** - Sends an E-Mail with an attached file with proxies in it   * **E-Mail** - Sends an E-Mail with an attached file with proxies in it
-  * **WEB Upload** - This sends proxies to a webserver. You can define everything on how this should happen (POST/GET)+  * **WEB Upload** - This sends proxies to a webserver. You can define everything on how this should happen (POST/GET). Also see this [[http://pastebin.com/uikYgRyj|sample script in php to accept uploads]].
  
-You can of course Edit or Delete the exports here. Having it checked in the list means it is used on automatic export, else it can still be used in the Export Toolbar.+You can of course Edit or Delete your previously setup export options here. Having it checked in the list means it is used on automatic export, otherwise it can still be used in the Export Toolbar.
  
-Each Export offers Filters to be used. If you plan to create an export for //GScraper// (a famous tool for search engine parsing) you must make sure that only proxies are exported having an IP (filter option **Exclude proxies with a domain**) as //GScraper// will not import anything if there is a domain in it.+Each Export offers many different filters to be used. If you plan to create an export for //GScraper// (a famous tool for search engine parsing) you must make sure that only proxies are exported having an IP (filter option **Exclude proxies with a domain**) as //GScraper// will not import anything if there is a domain in it.
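
For readers who post-process an exported list outside the program, the effect of **Exclude proxies with a domain** can be approximated with a short script. This is only a sketch: it assumes the export is a plain list of `host:port` strings, and the function names `has_ip_host` and `exclude_domains` are made up for illustration. It does not describe how GSA Proxy Scraper implements the filter.

```python
import ipaddress


def has_ip_host(proxy: str) -> bool:
    """Return True if the host part of 'host:port' is a literal IPv4 address."""
    host, sep, _port = proxy.rpartition(":")
    if not sep:
        # No colon at all, so this is not in the assumed host:port format.
        return False
    try:
        ipaddress.IPv4Address(host)
    except ValueError:
        # Host names such as proxy.example.com end up here.
        return False
    return True


def exclude_domains(proxies):
    """Keep only proxies whose host is an IP address, preserving order."""
    return [p for p in proxies if has_ip_host(p)]


if __name__ == "__main__":
    sample = ["1.2.3.4:8080", "proxy.example.com:3128", "10.0.0.1:80"]
    print(exclude_domains(sample))
```

Running such a check before importing a list into a tool that rejects domains is cheap and keeps a single bad entry from blocking the whole import.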
 ===== Automatic Search ===== ===== Automatic Search =====
  
-Having this checked means the program will search all the providers for new proxies. You can define the **Interval** here as well as the conditions on when this should happen or stopped.+{{ :auto_search.png |}} 
 + 
 +Having this checked means the program will search all the providers for new proxies. You can define the **Interval** here as well as the conditions on when this should happen or stop.
  
 The box with the different Tests is in fact very important. Usually people need proxies to be anonymous, so you should pick some tests here that offer this. The box with the different Tests is in fact very important. Usually people need proxies to be anonymous, so you should pick some tests here that offer this.
 Also the //Google Search// test might be of interest for you. Other tests are also available like //StopForumSpam// or tests if a proxy works with //facebook.com//. Also the //Google Search// test might be of interest for you. Other tests are also available like //StopForumSpam// or tests if a proxy works with //facebook.com//.
  
-Of course you can add your own tests here clicking the **Add** button or **Delete** those not of interest for you.+Of course you can add your own tests here clicking the **Add** button or **Delete** the tests that are not of interest for you.
  
 Each found proxy is tested against the selected tests. Each found proxy is tested against the selected tests.
Line 80: Line 87:
 Anyway it is not suitable to keep dead proxies forever in the hope that they will work once again. You should not disable the option **Automatically remove proxies** in order to not waste memory and other resources. Anyway it is not suitable to keep dead proxies forever in the hope that they will work once again. You should not disable the option **Automatically remove proxies** in order to not waste memory and other resources.
  
-The option to store removed proxies in an file enables you to test them all again once you feel like doing it. Thats possible clicking **Add -> Previously Removed** on the main interface.+The option to store removed proxies in an file enables you to test them all again if you want to go back and try that at a later time. That's possible by clicking **Add -> Previously Removed** on the main interface.
  
 ===== Filter ===== ===== Filter =====
  
-I don't recommend you to use any of the filter options as you can always define filters for the automatic exports. However for those who want this, there is the possibility to do this.+{{ :filter.png |}} 
 + 
 +I don't recommend you to use any of the filter options as you can always define filters for the automatic exports. However for those who want this, the following options are available:
    
   * **Do not accept anonymous (no elite) proxies**: This discards proxies that identify themselves as proxies but do not tell the remote website what IP you have.   * **Do not accept anonymous (no elite) proxies**: This discards proxies that identify themselves as proxies but do not tell the remote website what IP you have.
   * **Do not accept transparent proxies**: This discards proxies that identify themselves as proxies AND tell the remote website what IP you have.   * **Do not accept transparent proxies**: This discards proxies that identify themselves as proxies AND tell the remote website what IP you have.
-  * **Skip suspicious proxies**: Suspicious proxies are those who probably spy on your activities on the proxy server itself. Keep in mind that many of the proxy servers you find are run by spy agencies. Anyway even if a proxy is not tagged as such, it can still spy on you. You can not trust anyone at all thees days.+  * **Skip suspicious proxies**: Suspicious proxies are those who probably spy on your activities on the proxy server itself. Keep in mind that many of the proxy servers you find are run by spy agencies. Anyway even if a proxy is not tagged as such, it can still spy on you. You can not trust anyone at all these days.
   * **Accept only if tagged as**: Keep proxies only if they match a certain test and that test tagged the proxy as such.   * **Accept only if tagged as**: Keep proxies only if they match a certain test and that test tagged the proxy as such.
-  * **Accept only the following ports** OR **Skip the following ports**: This can be usefull if your firewall does not allow you to go online by certain ports anyway or if you simply need proxies from a specific port. +  * **Accept only the following ports** OR **Skip the following ports**: This can be useful if your firewall does not allow you to go online by certain ports or if you simply need proxies from a specific port. 
-  * **Skip duplicate IPs**: Often you have proxies being on the same IP/Host but with different ports. This can be a problem if you got too many of those and do requests to search engines who only see the IP and check against that to see if you are hammering it.+  * **Skip duplicate IPs**: Often you have proxies being on the same IP/Host but with different ports. This can be a problem if you have too many and do requests to search engines who only see the IP and check against that to see if you are hammering it.
   * **Accept only the following Types**: There are different types of proxies. WEB proxies are those that work like a normal http protocol but with a bit of a modification on the request header. They are only useful for web queries. Then you have Connect proxies, socks4 and socks5 who can basically be used for any purpose, not just website parsing.   * **Accept only the following Types**: There are different types of proxies. WEB proxies are those that work like a normal http protocol but with a bit of a modification on the request header. They are only useful for web queries. Then you have Connect proxies, socks4 and socks5 who can basically be used for any purpose, not just website parsing.
-  * **Accept only the following Regions**: Proxies and there location on Earth are determinate by a small database. In some cases it is required to just take proxies from a special location which can be setup here.+  * **Accept only the following Regions**: Proxies and there location on Earth are determined by a small database. In some cases it is required to just take proxies from a special location which can be setup here.
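
The port and duplicate-IP filters in the list above can be illustrated with a small script. It is a sketch under stated assumptions, not the program's actual logic: proxies are assumed to be `host:port` strings, and the function `filter_proxies` and its parameters are hypothetical names chosen for this example.

```python
def filter_proxies(proxies, allowed_ports=None, skipped_ports=(), skip_duplicate_ips=True):
    """Filter 'host:port' strings by port and optionally keep one proxy per host.

    allowed_ports: if given, only these ports are kept ("Accept only the following ports").
    skipped_ports: ports that are always dropped ("Skip the following ports").
    skip_duplicate_ips: keep only the first accepted proxy seen for each host.
    """
    seen_hosts = set()
    kept = []
    for proxy in proxies:
        host, sep, port_text = proxy.rpartition(":")
        if not sep or not port_text.isdigit():
            continue  # not in the assumed host:port format
        port = int(port_text)
        if allowed_ports is not None and port not in allowed_ports:
            continue
        if port in skipped_ports:
            continue
        if skip_duplicate_ips:
            if host in seen_hosts:
                continue
            seen_hosts.add(host)
        kept.append(proxy)
    return kept


if __name__ == "__main__":
    sample = ["1.1.1.1:80", "1.1.1.1:8080", "2.2.2.2:3128", "3.3.3.3:25"]
    print(filter_proxies(sample, skipped_ports={25}))
```

A host is only remembered once one of its proxies has passed the port checks, so a rejected port never hides a usable proxy on the same IP.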