Differences
This shows you the differences between two versions of the page.
search_engine_ranker:script_manual [2019-08-09 08:02] – [Data Extraction] sven | search_engine_ranker:script_manual [2025-02-14 12:27] (current) – sven | ||
---|---|---|---|
Line 5: | Line 5: | ||
Here are some helpful links on engine writing and syntax highlighting: | Here are some helpful links on engine writing and syntax highlighting: | ||
+ | * [[https:// | ||
* [[http:// | * [[http:// | ||
* [[http:// | * [[http:// | ||
* [[http:// | * [[http:// | ||
+ | * [[https:// | ||
* [[http:// | * [[http:// | ||
Line 15: | Line 17: | ||
* [[http:// | * [[http:// | ||
* [[https:// | * [[https:// | ||
+ | * [[https:// | ||
===== The Structure ===== | ===== The Structure ===== | ||
Line 27: | Line 30: | ||
Nothing is case sensitive, so it does not matter whether you write **[SECTION]** or **[Section]**. There are basically two types of engines. | Nothing is case sensitive, so it does not matter whether you write **[SECTION]** or **[Section]**. There are basically two types of engines. | ||
- | * The ones that require an account and login.\\ You will have to define at least the following sections:\\ **[SETUP]**, | + | * The ones that require an account and login.\\ You will have to define at least the following sections:\\ **[SETUP]**, |
* Those that need no account and no login.\\ You just need the following sections:\\ **[SETUP]**, | * Those that need no account and no login.\\ You just need the following sections:\\ **[SETUP]**, | ||
Line 44: | Line 47: | ||
|page must have|This parameter is used to check whether the webpage is usable for this engine or not. The content of this variable has to be present in the webpage (either pure text or html source). The variable can have multiple values separated by a %%|%% where just one has to match.\\ \\ //Example: //\\ //page must have1=Powered by XYZ%%|%%XYZ Powered//\\ //page must have2=!not allowed to access this page//\\ //page must have3=Webpage%%|%%Homepage// | |page must have|This parameter is used to check whether the webpage is usable for this engine or not. The content of this variable has to be present in the webpage (either pure text or html source). The variable can have multiple values separated by a %%|%% where just one has to match.\\ \\ //Example: //\\ //page must have1=Powered by XYZ%%|%%XYZ Powered//\\ //page must have2=!not allowed to access this page//\\ //page must have3=Webpage%%|%%Homepage// | ||
|url must have|This parameter is used the same way as "page must have" but for the URL string itself and not for the website content. \\ \\ //Example: url must have1=/ | |url must have|This parameter is used the same way as "page must have" but for the URL string itself and not for the website content. \\ \\ //Example: url must have1=/ | ||
- | |fixed url|If no "// | + | |fixed url|If no "// |
|search term|This is used to search for new targets on the internet with the help of search engines like google.\\ \\ //Example: //\\ //search term=" | |search term|This is used to search for new targets on the internet with the help of search engines like google.\\ \\ //Example: //\\ //search term=" | ||
|add keyword to search|1 = Add a keyword from the project to the search query\\ 0 = Never add a keyword to the search query\\ 2 = Add a keyword only sometimes, when it seems useful (default)| | |add keyword to search|1 = Add a keyword from the project to the search query\\ 0 = Never add a keyword to the search query\\ 2 = Add a keyword only sometimes, when it seems useful (default)| | ||
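
The matching rule described for "page must have" can be sketched in Python. This is only an illustration, not the program's actual implementation. The function names `rule_matches` and `page_matches` are hypothetical. Three behaviours are assumptions rather than statements from the manual: a leading `!` means "must not be present" (inferred from the `page must have2` example), all numbered rules must hold together, and the comparison ignores case.

```python
from typing import Iterable


def rule_matches(rule: str, page: str) -> bool:
    """Check one 'page must have' value against the page text.

    Alternatives are separated by '|'; one matching alternative is enough.
    Assumption: a leading '!' negates the rule (text must NOT be present).
    """
    negate = rule.startswith("!")
    if negate:
        rule = rule[1:]
    haystack = page.lower()  # assumption: case-insensitive comparison
    alternatives = [alt.strip().lower() for alt in rule.split("|") if alt.strip()]
    found = any(alt in haystack for alt in alternatives)
    return not found if negate else found


def page_matches(rules: Iterable[str], page: str) -> bool:
    """Assumption: every numbered rule (have1, have2, ...) must be satisfied."""
    return all(rule_matches(rule, page) for rule in rules)


# The three example values from the table row above.
EXAMPLE_RULES = [
    "Powered by XYZ|XYZ Powered",
    "!not allowed to access this page",
    "Webpage|Homepage",
]
```

With these rules, a page containing "Homepage" and "Powered by XYZ" is accepted, while the same page is rejected as soon as it also contains "not allowed to access this page".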
Line 70: | Line 73: | ||
* [REGISTER_STEP*] - used to create an account | * [REGISTER_STEP*] - used to create an account | ||
+ | * [FIRSTLOGIN_STEP*] - used **only once** to log in after account registration is done and the very first login is performed | ||
* [LOGIN_STEP*] - used to log into the site with the created account | * [LOGIN_STEP*] - used to log into the site with the created account | ||
* [STEP*] - the actual submission process | * [STEP*] - the actual submission process | ||
Line 90: | Line 94: | ||
|form class|Try to find a form on the current webpage that has a class as in the variable content. Again you can use %%|%% to have multiple variations. Not many sites use a class in the < | |form class|Try to find a form on the current webpage that has a class as in the variable content. Again you can use %%|%% to have multiple variations. Not many sites use a class in the < | ||
|form name|Try to find a form on the current webpage that has a name like the variable content. If no name is used in the < | |form name|Try to find a form on the current webpage that has a name like the variable content. If no name is used in the < | ||
+ | |form method|Try to find a form on the current webpage that has a method as in the variable content (post or get).\\ \\ //Example: form method=post// | ||
|form url|Try to find a form where the submission URL would match the variable content.\\ \\ //Example: form url=*/ | |form url|Try to find a form where the submission URL would match the variable content.\\ \\ //Example: form url=*/ | ||
|form url ignore|Ignores forms where the submission URL would match the variable content.\\ \\ //Example: form url ignore=*/ | |form url ignore|Ignores forms where the submission URL would match the variable content.\\ \\ //Example: form url ignore=*/ | ||
Line 97: | Line 102: | ||
|seconds to wait before submission condition|This will only delay the submission if something in the variable content is found on the webpage.\\ \\ //Example: seconds to wait before submission condition=stop_spam_time// | |seconds to wait before submission condition|This will only delay the submission if something in the variable content is found on the webpage.\\ \\ //Example: seconds to wait before submission condition=stop_spam_time// | ||
|post data|This is rarely used; it creates custom data that is submitted to the website instead of the data from < | |post data|This is rarely used; it creates custom data that is submitted to the website instead of the data from < | ||
+ | |http header|Add additional headers to the HTTP Request Header.\\ \\ //Example: http header=X-WP-Nonce: | ||
|encode post data|1 = encode the data in a proper way as used in POST protocol\\ 2 = encode it using multipart \\ 0 = take the data as it is without encoding anything\\ 3 = encode it using json syntax| | |encode post data|1 = encode the data in a proper way as used in POST protocol\\ 2 = encode it using multipart \\ 0 = take the data as it is without encoding anything\\ 3 = encode it using json syntax| | ||
|variable must be used|A form is only submitted if certain variables have been used in that form.\\ \\ //Example: variable must be used=url, | |variable must be used|A form is only submitted if certain variables have been used in that form.\\ \\ //Example: variable must be used=url, | ||
- | |add fixed data add fixed data condition\\ \\ \\ \\ //remove fixed data//\\ \\ //remove fixed data condition// | + | |add fixed data\\ add fixed data condition\\ \\ \\ \\ //remove fixed data//\\ \\ //remove fixed data condition// |
|set unknown variable\\ \\ set unknown variable condition|If a form field is unknown as we didn't define how to fill it in our engine, we could still fill it by something you define here. The submission aborts if this is not defined and something is unable to get filled. The "set unknown variable condition" | |set unknown variable\\ \\ set unknown variable condition|If a form field is unknown as we didn't define how to fill it in our engine, we could still fill it by something you define here. The submission aborts if this is not defined and something is unable to get filled. The "set unknown variable condition" | ||
|match by option label|1 = A form with a select or radio field is filled by checking the variable content against the option labels (the one you see on the browser).\\ 0 = We will not check for a matching label\\ \\ //Example: match by option label=1//| | |match by option label|1 = A form with a select or radio field is filled by checking the variable content against the option labels (the one you see on the browser).\\ 0 = We will not check for a matching label\\ \\ //Example: match by option label=1//| | ||
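
The values of "encode post data" listed above can be illustrated with a short Python sketch. It is not the program's own code: the function name `encode_post_data` is hypothetical, the sketch assumes the post data is available as a key/value mapping, and the multipart case (value 2) is left out to keep the example short.

```python
import json
from urllib.parse import urlencode


def encode_post_data(fields: dict, mode: int) -> str:
    """Encode form fields according to an 'encode post data' style mode.

    0 = take the data as it is, 1 = standard POST (URL) encoding,
    3 = JSON syntax. Mode 2 (multipart) is intentionally not covered here.
    """
    if mode == 0:
        # No escaping at all: pairs are simply joined (assumed layout).
        return "&".join(f"{key}={value}" for key, value in fields.items())
    if mode == 1:
        # application/x-www-form-urlencoded, as browsers send it.
        return urlencode(fields)
    if mode == 3:
        return json.dumps(fields)
    raise ValueError(f"mode {mode} is not covered by this sketch")
```

For the field `title` with the value `a b`, mode 1 yields `title=a+b`, mode 0 leaves the space untouched, and mode 3 produces a JSON object.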
Line 120: | Line 126: | ||
|verify on unknown status|1 = if a submission is not detected as successful or failed it will still be taken as successful (appearing in log with " | |verify on unknown status|1 = if a submission is not detected as successful or failed it will still be taken as successful (appearing in log with " | ||
|verify submission|1 = verify the submission\\ 0 = do not verify the submission but assume that the link is submitted and will be visible there or is already (default) Even though this is the default behaviour, you should set it to " | |verify submission|1 = verify the submission\\ 0 = do not verify the submission but assume that the link is submitted and will be visible there or is already (default) Even though this is the default behaviour, you should set it to " | ||
- | |verify by\\ \\ verify search for|Defines how to verify a submission. Possible value for " | + | |verify by\\ \\ verify search for|Defines how to verify a submission. Possible value for " |
- | |verify url\\ verify url remove\\ verify url replace|If you use " | + | |verify url\\ verify url remove\\ verify url replace\\ verify url must have|If you use " |
|use original url to verify|1 = this will not use the last URL but the URL we started the whole engine with.\\ 0 = use the last URL at the end of the submission (default)\\ \\ //Example: use original url to verify=1//| | |use original url to verify|1 = this will not use the last URL but the URL we started the whole engine with.\\ 0 = use the last URL at the end of the submission (default)\\ \\ //Example: use original url to verify=1//| | ||
|verify interval|Defines in what interval in minutes this verification should take place (default 180). \\ \\ //Example: verify interval=60// | |verify interval|Defines in what interval in minutes this verification should take place (default 180). \\ \\ //Example: verify interval=60// | ||
Line 129: | Line 135: | ||
|try to continue without verification|0 = follow exact verification steps (default)\\ 1 = try to skip verification and continue\\ \\ //Example: try to continue without verification=1// | |try to continue without verification|0 = follow exact verification steps (default)\\ 1 = try to skip verification and continue\\ \\ //Example: try to continue without verification=1// | ||
|modify url|This is used to change a found URL to something else.\\ \\ //Example: modify url=%targethost% %targetpath%// | |modify url|This is used to change a found URL to something else.\\ \\ //Example: modify url=%targethost% %targetpath%// | ||
- | |modify url condition|If present it will check if the content is presnet | + | |modify url condition|If present it will check if the content is present |
|modify url remove|The same as " | |modify url remove|The same as " | ||
|modify url replace|The same as " | |modify url replace|The same as " | ||
- | |modify step\\ modify step condition|This will go to another submission step if the condition (something in last downloaded page from previous submission step) was found.\\ \\ //Example: modify step=2\\ modify step condition=*No verification required*// | + | |modify submit method|Use this to change the form submission method. Valid values are GET and POST.| |
+ | |modify step\\ modify step condition\\ modify step condition input|This will go to another submission step if the condition (something in last downloaded page from previous submission step) was found.\\ \\ //Example: modify step=2\\ modify step condition=*No verification required*//\\ The //modify step condition input// is optional and defaults to the content of the webpage.| | ||
|Download retries|Number of tries to submit or download something (default is 1).| | |Download retries|Number of tries to submit or download something (default is 1).| | ||
|Link type|Defines the type of backlink created. Can be anything you want but you might want to use the types already used in other scripts.| | |Link type|Defines the type of backlink created. Can be anything you want but you might want to use the types already used in other scripts.| | ||
Line 197: | Line 204: | ||
|html to markdown|1 = Convert html code to markdown code\\ 0 = Do not convert it\\ < | |html to markdown|1 = Convert html code to markdown code\\ 0 = Do not convert it\\ < | ||
|html to custom link format\\ custom link format|1 = convert html code to a custom format\\ 0 = Do not convert it.\\ < | |html to custom link format\\ custom link format|1 = convert html code to a custom format\\ 0 = Do not convert it.\\ < | ||
+ | |basic html only|1 = keep just basic html tags (default if custom link format is used)\\ 0 = keep html as it is| | ||
|html line break\\ \\ html line break format|Converts a normal line break to some html line break (default < | |html line break\\ \\ html line break format|Converts a normal line break to some html line break (default < | ||
|custom img format|If set, it will try to locate the html syntax for images and replace it with that new syntax.\\ \\ // | |custom img format|If set, it will try to locate the html syntax for images and replace it with that new syntax.\\ \\ // | ||
Line 214: | Line 222: | ||
|tier data|Sets the data that will overwrite the input data for tier projects. Just if the "tier data" is empty it will use the one that the user set.\\ \\ //Example: tier data=%tier_title%// | |tier data|Sets the data that will overwrite the input data for tier projects. Just if the "tier data" is empty it will use the one that the user set.\\ \\ //Example: tier data=%tier_title%// | ||
|allow data url|0 = do not allow data URLs -> gets removed\\ 1 = allow them (default)\\ \\ This will remove e.g. images from the article that are Data-URLs| | |allow data url|0 = do not allow data URLs -> gets removed\\ 1 = allow them (default)\\ \\ This will remove e.g. images from the article that are Data-URLs| | ||
+ | |resize=< | ||
+ | |filter|What to accept when type=file is used as files. Example: filter=Image files|*.jpg; | ||
Line 241: | Line 251: | ||
|%captcha% *image.php? | |%captcha% *image.php? | ||
- | |%captcha% fixed: | + | |%captcha% fixed:%%http:// |
|%captcha% overlay: | |%captcha% overlay: | ||
Line 276: | Line 286: | ||
|reverse|When using // | |reverse|When using // | ||
|base64|using // | |base64|using // | ||
+ | |all matches\\ delimiter|When using //all matches=1// together with // | ||
===== A Small Example ===== | ===== A Small Example ===== | ||