The fantomas spyFetcher(TM) Module: Automatic botBase Maintenance

SYSTEM REQUIREMENTS

INSTALLATION
- UNIX

UNINSTALLING THE PROGRAM

WORKING WITH fantomas spyFetcher(TM)

CONFIGURATION OF CRON JOBS

ERROR HANDLING

KNOWN ISSUES

UPDATES + PROGRAM CHRONOLOGY

CONTACT + SUPPORT

======================================================================

SYSTEM REQUIREMENTS
-------------------

Language
-------
Perl 5

External program
----------------
GNU Wget (a command-line download program called by the script,
not a Perl module)

More info under:
< http://www.gnu.org/software/wget/wget.html >

UNIX
----
The Unix system requires an installed web server.
Execution of CGI scripts must be enabled.
A directory for executing CGI scripts must exist.
Usually, this will be the directory /cgi-bin/.

Tested under: 
SuSE LINUX with Apache
Red Hat Linux with Apache
BSDI Unix with Apache

Browser
-------
Script is called and executed via web browser.
You will currently achieve best results under MS Internet Explorer 5+.
Netscape 4.7 may require adjustment of font size.

Tested under: IE 6+, Netscape 4.7, Netscape 7+, Opera 6+

======================================================================

INSTALLATION
------------
The following files are included:

spyfetcher-e.cgi --- (program script)
fantomas.gif     --- (logo/graphics file)
sfehelp.txt      --- (documentation in TXT format) <--- THE FILE YOU ARE READING!
fa_license-e.txt --- (License Agreement And Terms Of Usage - PLEASE READ!)

-----------------------------------
ADJUSTMENTS IN FILE
"spyfetcher-e.cgi"
(please edit in ASCII or plain text
editor like Notepad etc.)
-----------------------------------

UNIX
----
* Please check the path to the Perl interpreter.
The default path in the script is "/usr/bin/perl".
If you don't know this path, you can look it up in a Telnet
session by entering the Unix command "whereis perl".
You may have to adjust the first line of the script
"spyfetcher-e.cgi" accordingly.

* The variables in the script "spyfetcher-e.cgi": "$stats_dir",
"$robot_file", "$log_file", "$wget_cmd", "$sendmail",
"$from_mail", "$to_mail", "$subject", "$cloak_for_google", "$user"
and "$pw" may optionally be adjusted to your requirements.

A comprehensive description of these variables
can be found below in chapter
"WORKING WITH fantomas spyFetcher(TM)".

* The script "spyfetcher-e.cgi", the file "fantomas.gif" and the
help file "sfehelp.txt" must be copied into the Unix server's
CGI directory.

* The CGI directory must have the following permissions:
"chmod 755"  [drwxr-xr-x]

* Next, create the directory defined in variable "$stats_dir"
with the following permissions:
"chmod 777"  [drwxrwxrwx]
(Default name is "stats".)

* When uploading via FTP, make sure to transfer ALL files in
ASCII mode.
EXCEPTION: the graphics file "fantomas.gif"
which must be transferred in BINARY or AUTOMATIC mode.

* Required file permissions:

spyfetcher-e.cgi: "chmod 755"  [-rwxr-xr-x]
fantomas.gif:     "chmod 444"  [-r--r--r--]
sfehelp.txt:      "chmod 444"  [-r--r--r--]
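
The permissions listed above can be applied in one pass from a shell
session. The following is only a sketch: the function name
"set_spyfetcher_permissions" is made up for illustration, and it assumes
that the stats directory lives directly below the CGI directory (as in
the default setup). Adjust it if your "$stats_dir" points elsewhere.

```shell
#!/bin/sh
# Sketch: apply the permissions from this chapter inside a CGI directory.
# Usage: set_spyfetcher_permissions /path/to/cgi-bin [stats-dir-name]
# Assumption: the stats directory is a subdirectory of the CGI directory.
set_spyfetcher_permissions() {
    cgi_dir=$1
    stats_name=${2:-stats}    # "stats" is the default name per this manual

    chmod 755 "$cgi_dir" || return 1                    # drwxr-xr-x
    mkdir -p "$cgi_dir/$stats_name" || return 1         # create stats dir if missing
    chmod 777 "$cgi_dir/$stats_name" || return 1        # drwxrwxrwx
    chmod 755 "$cgi_dir/spyfetcher-e.cgi" || return 1   # -rwxr-xr-x
    chmod 444 "$cgi_dir/fantomas.gif" \
              "$cgi_dir/sfehelp.txt" || return 1        # -r--r--r--
}
```

Call it with the absolute path of your own CGI directory, then verify
the result with "ls -l".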

======================================================================

UNINSTALLING THE PROGRAM
------------------------
For complete uninstall, delete the following:

spyfetcher-e.cgi
fantomas.gif
sfehelp.txt

The directory "stats" or whatever directory you defined under
"$stats_dir" including contents.

======================================================================

WORKING WITH fantomas spyFetcher(TM)
------------------------------------

Program Description
-------------------

The fantomas spyFetcher(TM) is a script which allows you to get
the latest fantomas spiderSpy(TM) botBase as a packed archive in
.ZIP format.

The botBase will be unpacked and saved on your server in the
directory defined under "$stats_dir" with the file name defined
under "$robot_file".


---------------------------------
Customization of script variables
---------------------------------

The following variables may optionally be customized in script
"spyfetcher-e.cgi":

* $stats_dir
This variable defines the directory where the spider robots list
file shall reside. Specify it as an absolute path.
Example: "/usr/www/htdocs/yourdomain/cgi-bin/stats"

* $robot_file
This variable defines the file name of the spider robots list
file. Default file name is "spiderspy.txt".

* $log_file
This variable defines the file name of the transfer log file.
Default file name is "transfer.log".

* $wget_cmd
This variable defines the command call for wget. Default
configuration is "/usr/bin/wget".
If you don't know this path, you can check it out under telnet
by entering Unix command "whereis wget".
Else, please inquire with your system administrator.

Email Error Message
-------------------
If the script is executed in batch mode via cron job, an email
error message will be generated if the transfer of the fantomas
spiderSpy(TM) botBase fails.

For this email functionality you will need to specify 
the following variables:

* $sendmail
This variable defines the command call for the mail program.
Default configuration is "/usr/lib/sendmail -t -n -oi".
If you don't know this path, you can check it out under telnet
by entering Unix command "whereis sendmail".
Else, please inquire with your system administrator.

* $from_mail
This variable defines the email error message sender's address.

* $to_mail
This variable defines where you want the email error message to
be sent.

* $subject
This variable defines the email error message's subject line.

* $cloak_for_google
If you want to cloak for Google, please set
"$cloak_for_google = 1" and the Google spider entries in the
fantomas spiderSpy(TM) botBase will be activated.

User Authentication
-------------------
When you signed up for the spiderSpy service, you received a
user ID and password for downloading the fantomas spiderSpy(TM)
botBase.

* $user
This variable defines your user id (case sensitive).

* $pw
This variable defines your password (case sensitive).

******************** VERY IMPORTANT! *********************
If the variables "$user" and "$pw" are not correct, 
the download will fail because access is forbidden.

SO PLEASE MAKE SURE TO SPECIFY YOUR ID AND PW EXACTLY AS
ISSUED DURING SIGNUP!
******************** VERY IMPORTANT! *********************


ONLINE MODE
-----------
* The script is started by entering its URL into the web
browser's location/address field,
e.g. "http://www.yourdomain.com/cgi-bin/spyfetcher-e.cgi".

To start the download of the current version of the fantomas
spiderSpy(TM) botBase, click the "Submit!" button.

Once the botBase has been saved on your server, the next HTML
page will display the message:

"Transfer of fantomas spiderSpy(TM) botBase successful!"

BATCH MODE
----------
You can manage the transfers of fantomas spiderSpy(TM) botBase
automatically by defining a cron job.

======================================================================

CONFIGURATION OF CRON JOBS
--------------------------
Cron is a mechanism for planning and scheduling batch jobs.

The daemon "crond" is started automatically on system boot up.
It runs one check per minute to see if there are any jobs to
execute.

The list of jobs to execute is created by the program "crontab".

The following commands work from the assumption that you are
either logged in by Telnet or locally on your Unix system.

Entering the command "crontab -l" will display a list of current
entries. By default, only entries owned by the logged-in user
will be displayed.

Existing lists can be removed/deleted with command "crontab -r".

To create a new list, it is recommended to read the entries from
a file using command "crontab filename".

The following examples will show you the format of this file.
The file itself is created with an ASCII text editor.

Example:

0 12 * * * /usr/www/htdocs/yourdomain/cgi-bin/spyfetcher-e.cgi start

This entry consists of six parameters. The first five parameters
define the time schedule, whereas the sixth parameter contains
the command for executing the job.

In our example above, this command consists of:

- the full path and file name of the script
- an argument

Parameters defining the time schedule are:

minute(0-59) hour(0-23) day of month(1-31) month(1-12) day of week(0-6, 0 = Sunday)

Hence, the schedule in the sample entry above: 0 12 * * *
can be translated as:

If Minute = 0 and Hour = 12, the script will be executed.

Because the last three scheduling parameters are defined by the
wildcard character "*", the job will be executed every day.
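
To make the six-parameter layout concrete, the following shell sketch
splits one crontab entry into its five schedule fields and the command.
The function name "describe_cron_entry" is hypothetical and exists only
for illustration; cron itself does not need it.

```shell
#!/bin/sh
# Sketch: split one crontab entry into schedule fields and command.
describe_cron_entry() {
    # "read" assigns the first five words to the schedule fields;
    # everything that remains (script path plus argument) goes to "cmd".
    printf '%s\n' "$1" | {
        read -r minute hour dom month dow cmd
        printf 'minute=%s hour=%s day_of_month=%s month=%s day_of_week=%s\n' \
            "$minute" "$hour" "$dom" "$month" "$dow"
        printf 'command=%s\n' "$cmd"
    }
}

describe_cron_entry '0 12 * * * /usr/www/htdocs/yourdomain/cgi-bin/spyfetcher-e.cgi start'
# prints:
# minute=0 hour=12 day_of_month=* month=* day_of_week=*
# command=/usr/www/htdocs/yourdomain/cgi-bin/spyfetcher-e.cgi start
```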


Scheduling Week Days
--------------------
If you wish to run the script on Mondays only, the following entry will
do the trick:

0 12 * * 1 /usr/www/htdocs/yourdomain/cgi-bin/spyfetcher-e.cgi start

Scheduling Turn of Month
------------------------
To run the script at the turn of the month (midnight on the 1st),
use an entry like this:

0 0 1 * * /usr/www/htdocs/yourdomain/cgi-bin/spyfetcher-e.cgi start


To Summarize
------------
Create a text file (e.g. "crontab.txt") and write the
appropriate command on one single line.

We recommend downloading the fantomas spiderSpy(TM) botBase once
per day. 

The following syntax will generate (as explained above) a cron
job which will run once a day:

0 12 * * * /usr/www/htdocs/yourdomain/cgi-bin/spyfetcher-e.cgi start

IMPORTANT
=========
Please modify the TIME OF DAY specified for your cron job
(the minute and hour fields) to prevent all downloads from
happening at the same time. With hundreds of subscribers, this
could overload our server.
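
If you have no preferred time of day, one option is to let the shell
pick a random minute and hour once and then copy the result into your
crontab file. This is only a sketch; the function name
"random_daily_schedule" is made up for illustration, and any fixed time
other than the sample value works just as well.

```shell
#!/bin/sh
# Sketch: print the five schedule fields for a once-a-day job
# at a randomly chosen minute (0-59) and hour (0-23).
random_daily_schedule() {
    # srand() without an argument seeds awk's generator from the clock.
    awk 'BEGIN { srand(); printf "%d %d * * *\n", int(rand() * 60), int(rand() * 24) }'
}

# Print a complete entry; replace the path with the one valid on your system.
printf '%s %s\n' "$(random_daily_schedule)" \
    "/usr/www/htdocs/yourdomain/cgi-bin/spyfetcher-e.cgi start"
```

Run it once and paste the printed line into "crontab.txt"; the schedule
should stay fixed afterwards so the job still runs only once per day.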

Prevention of abuse: A maximum of six downloads of the botBase
are permitted per day. Beyond that, the downloading IP will be
blocked by our system.

Enter the absolute path for the script as valid for *YOUR*
specific system configuration.

The argument to use is "start", as shown in our example above.

Next, the command "crontab crontab.txt" will install this file
as your list of cron jobs.


IMPORTANT
=========
If crontab has already been configured for other jobs, you must
include them in the new file "crontab.txt" (example), as the
command "crontab crontab.txt" will replace all previous cron
jobs owned by the user calling crontab!
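
One way to keep existing jobs is to export the current list first and
append the new entry to it. The sketch below only builds the merged
file content; the function name "merge_crontab" and the file name
"current.txt" are assumptions for illustration. Nothing is installed
until you run "crontab crontab.txt" yourself.

```shell
#!/bin/sh
# Sketch: print the existing job list plus one new entry to stdout.
# Usage: merge_crontab existing-file 'new crontab entry'
merge_crontab() {
    existing_file=$1
    new_entry=$2
    if [ -f "$existing_file" ]; then
        # Keep every existing line except an identical copy of the new
        # entry (-x whole line, -F fixed string, so "*" is taken literally).
        # grep exits non-zero when nothing is left; that is not an error here.
        grep -v -x -F -e "$new_entry" "$existing_file" || true
    fi
    printf '%s\n' "$new_entry"
}

# Typical use (commented out so that nothing is changed by accident):
#   crontab -l > current.txt 2>/dev/null
#   merge_crontab current.txt \
#     '0 12 * * * /usr/www/htdocs/yourdomain/cgi-bin/spyfetcher-e.cgi start' > crontab.txt
#   crontab crontab.txt
```

Check the contents of "crontab.txt" before installing it, and confirm
the result afterwards with "crontab -l".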

For further documentation on your Unix system, consult one of
the following manual pages:

man crontab
man 5 crontab
man cron

======================================================================


ERROR HANDLING
--------------
This section covers individual error messages.

Stats directory
---------------
"Stats directory ... does not exist!"

Please create stats directory or adjust 
the directory name under variable "$stats_dir".

Download error
--------------
"Download of fantomas spiderSpy(TM) botBase failed!"

Possible issues:
* Call of wget is not functional.
Solution:
  Please check your system's wget functionality.
* The access data specified (user ID and password)
  for the botBase are invalid.
Solution:
  Please check your user ID and password.
* New files could not be created in the stats directory
  (defined under $stats_dir).
Solution:
  Please check permissions of directory:
  "chmod 777" i.e. [drwxrwxrwx]

Unzip error
-----------
"Unzip of fantomas spiderSpy(TM) botBase failed!"

Possible issues:
* Call of gunzip is not functional.
Solution:
  Please check your system's gunzip functionality.

Change mode error
-----------------
"Change mode of fantomas spiderSpy(TM) botBase failed!"

Possible issues:
* Call of chmod is not functional.
Solution:
  Please check your system's chmod functionality.
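
The checks suggested in this chapter can be run together from a shell
session. The following is a sketch under assumptions: the function name
"check_spyfetcher_env" is hypothetical, and the tools are looked up via
the shell's search path, whereas the script itself uses the path set in
"$wget_cmd". A tool reported as found here may therefore still need its
path adjusted in the script.

```shell
#!/bin/sh
# Sketch: report missing tools and problems with the stats directory.
# Usage: check_spyfetcher_env /absolute/path/to/stats
# Note: the writability test applies to the user running this check,
# which may differ from the user the web server runs as.
check_spyfetcher_env() {
    stats_dir=$1
    status=0

    # wget, gunzip and chmod are the commands named in the error messages.
    for tool in wget gunzip chmod; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done

    if [ ! -d "$stats_dir" ]; then
        echo "stats directory does not exist: $stats_dir"
        status=1
    elif [ ! -w "$stats_dir" ]; then
        echo "stats directory is not writable: $stats_dir"
        status=1
    fi

    return $status
}
```

Example call, using the sample path from this manual:
check_spyfetcher_env /usr/www/htdocs/yourdomain/cgi-bin/stats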

======================================================================

KNOWN ISSUES
------------

Graphics
--------
Graphics files uploaded to the CGI directory or to one of its
subdirectories may not be displayed correctly under some web server
configurations.

In this case you may create a directory outside of the cgi-bin.
You can then define the "$graphics_dir" variable in program file
spyfetcher-e.cgi accordingly.
Example:

$graphics_dir = "../graphics/";

Docs (Manual/Help files)
------------------------
If the help file is not displayed correctly, we recommend uploading it
to an alternate directory (outside of cgi-bin!) as well.
You can then define the "$doc_dir" variable in program file
spyfetcher-e.cgi accordingly.

Example:
$doc_dir = "../docs/";