RouterOS Italia

[Mikrotik] Script to monitor unexpected script failure
A Problem FIXED!
Scripting in ROS is one of the easiest ways to configure, monitor & update your system, but there is ONE major flaw: when a script terminates unexpectedly with an error, diagnosing it and taking any follow-up action is very much a manual job! Wouldn't it be great if you could have a script that monitors your running script and, if it fails, takes some predefined action? Well, now you can, because I have written it, or more accurately them, as it comes in 3 parts ;-)
The 3 scripts are as follows:
Code:
WATCHSCRIPT - The main script for this, please use this name if you just want to test this
YOURSCRIPT - The one here was called "FTPSCRIPT", please use this name if you want to test this.
YOURSCRIPTERROR - The final script that executes your 'on-failure' commands, this was called "FTPERRORSCRIPT", again use this to test the scripts.

There are also a number of global variables defined: 
Code:
SCRIPTNAME - This is the name of YOURSCRIPT, i.e. the script you want executed, and is set in YOURSCRIPT on initial execution
SCRIPTWATCH - This is the name of the section of your script that is being monitored, and is set in YOURSCRIPT at various intervals
SCRIPTWATCHDELAY - This is how often the WATCHSCRIPT checks that YOURSCRIPT is running, this can be set to whatever you need
SCRIPTCOMPLETE - This should be initially set to 'false' within YOURSCRIPT, then set to 'true' when YOURSCRIPT completes
SCRIPTEXECUTE - This should be set at the start of YOURSCRIPT, and defines what script will be executed on failure, here this is set to FTPERRORSCRIPT
SCRIPTERRORNAME - This is set BY WATCHSCRIPT, and is passed to YOURSCRIPTERROR so you know what has failed.

HOW IT WORKS - When your script is executed, it starts a background task (the WATCHSCRIPT), which uses "/system script job find script=..." to confirm that your script is still running. If it finds that your script has stopped, it checks a global variable (that your script sets) to see whether it completed normally. If it didn't, WATCHSCRIPT runs an error handling script to take action on your behalf. The demonstration script just writes to the error log, but it could send you an email, start a subsequent script, or do anything else you could do with any other script! Simple eh?
POSSIBLE PROBLEMS - As this is a background process that uses a single set of global variables, running 2 watched scripts at the same time would make them overwrite each other's values and confuse the watch scripts. This can be overcome by using multiple watch scripts, with different names and different global variable names.
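As a rough idea of what that renaming might look like, here is the start of a second watched script. Everything with a "2" on the end, plus BACKUPSCRIPT and BACKUPERRORSCRIPT, is a made-up name for illustration, and WATCHSCRIPT2 is assumed to be a copy of WATCHSCRIPT with the same renames applied throughout. I haven't run it, so treat it as a sketch only.
Code:
# Hypothetical second watched script: every global gets a "2" suffix
:global SCRIPTNAME2
:global SCRIPTWATCH2
:global SCRIPTWATCHDELAY2
:global SCRIPTCOMPLETE2
:global SCRIPTEXECUTE2

:set SCRIPTNAME2 "BACKUPSCRIPT"
:set SCRIPTCOMPLETE2 "false"
:set SCRIPTWATCHDELAY2 "1000ms"
:set SCRIPTEXECUTE2 "BACKUPERRORSCRIPT"
:set SCRIPTWATCH2 "WATCHSCRIPT2-STARTUP"

# WATCHSCRIPT2 = a copy of WATCHSCRIPT that reads the "2" globals instead
:execute "WATCHSCRIPT2"
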
Whilst I haven't tried it, I expect that using the error script to execute another 'watched' script could also cause problems. If you do this, I would suggest that you implement a check in the error script to ensure that the initial watch script has completed before it executes a subsequent one.
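A check like that could look something like the lines below, placed at the point in the error script where the next watched script would be started. This is an untested sketch: NEXTSCRIPT is a made-up name for the follow-up script, and it assumes WATCHSCRIPT is listed under "/system script job" in the same way the rest of these scripts rely on.
Code:
# Wait until no WATCHSCRIPT job is left before starting another watched script
:local watchjobs [/system script job find script="WATCHSCRIPT"]
:while ([:len $watchjobs] > 0) do={
  :delay 500ms
  :set watchjobs [/system script job find script="WATCHSCRIPT"]
}
# NEXTSCRIPT is a hypothetical name for the follow-up watched script
:execute "NEXTSCRIPT"
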
OTHER THINGS TO CONSIDER - I would advise that you don't complicate script development by implementing this from the start, certainly NOT until you are familiar with how it works. You can still add the global variable 'SCRIPTWATCH' to a script under development and set its value at each section, so the script is ready for use with the watch script, without much worry that it will cause you issues while debugging.
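For example, a script under development might carry the section markers like this, with no WATCHSCRIPT involved at all. STEP1 and STEP2 are made-up section names, and the log lines are only there as an optional debugging aid.
Code:
:global SCRIPTWATCH
# STEP1 and STEP2 are made-up section names
:set SCRIPTWATCH "STEP1"
:log info ("Entering section " . $SCRIPTWATCH)
# ... your first block of commands ...
:set SCRIPTWATCH "STEP2"
:log info ("Entering section " . $SCRIPTWATCH)
# ... your second block of commands ...
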
Obviously, once you are familiar with how this script works, it can be a great debugging aid as well as a tool for live environments. That said (and I can't imagine anybody does this), *IF* you are using a live system to develop scripts, any script that already uses this WATCHSCRIPT could be significantly affected by scripts under development that also make use of it.

And the usual disclaimer: You use this on an as-provided basis!! There is NO warranty or guarantee that it will be suitable for your needs, or is fit for any purpose, or is without defect. If you make use of these scripts, and/or code snippets from them, and you trash your system ~ DON'T BLAME ME!!!! i.e. YOU USE THESE COMPLETELY AT YOUR OWN RISK!!!!
All that said, these scripts do nothing out of the ordinary, and should NOT cause any problems ~ Your call!
A 'Thank you' note is always appreciated if you use this in your live environments - and donations of MT licenses are always welcome ;-)

WATCHSCRIPT
This script is the main code for monitoring your script. Whilst you can obviously make changes and/or use it as a basis for your own scripts, it should work as-is on any system. Just copy and paste the script below into your scripts under the name WATCHSCRIPT; you do not run it yourself, as it is your main script that executes it.

Code:
:global SCRIPTNAME
:global SCRIPTWATCH
:global SCRIPTWATCHDELAY
:global SCRIPTCOMPLETE
:global SCRIPTEXECUTE
:global SCRIPTERRORNAME
:local running ""
:local errorscript ""
:local currentwatch ""

:if ($SCRIPTNAME != "") do={
  :local LOGNAME ("WS-" . $SCRIPTNAME)
  :log info ($LOGNAME . "  -  Starting WATCHSCRIPT")
  # Fall back to defaults if the watched script left these empty
  :if ($SCRIPTWATCHDELAY = "") do={
    :set SCRIPTWATCHDELAY "500ms"
  }

  :if ($SCRIPTWATCH = "") do={
    :set SCRIPTWATCH "UNKNOWN"
  }

  # "find" returns a list of matches, so test its length rather than comparing it to ""
  :set errorscript [/system script find name=$SCRIPTEXECUTE]
  :if ([:len $errorscript] > 0) do={
    :log info ($LOGNAME . "  -  Error script found")
    :log info ($LOGNAME . " - Checking process : " . $SCRIPTNAME)

    # Poll the job list until the watched script is no longer running
    :do {
      :set running [/system script job find script=$SCRIPTNAME]
      :if ($currentwatch != $SCRIPTWATCH) do={
        :log info ($LOGNAME . "  -  Watching : " . $SCRIPTWATCH)
        :set currentwatch $SCRIPTWATCH
      }

      :if ([:len $running] > 0) do={
        :log debug ($LOGNAME . "  -  JOB : " . [:tostr $running] . "  -  Running : " . $SCRIPTWATCH . "  -  Delay : " . $SCRIPTWATCHDELAY)
        :delay $SCRIPTWATCHDELAY
      } else={
        :log info ($LOGNAME . " - NOT RUNNING")
      }
    } while=([:len $running] > 0)

    # The watched script has stopped: did it get as far as setting SCRIPTCOMPLETE?
    :if ($SCRIPTCOMPLETE != "true") do={
      :set SCRIPTERRORNAME $SCRIPTWATCH
      :log error ($LOGNAME . " - Failed whilst executing : " . $SCRIPTERRORNAME)
      :log info ($LOGNAME . " - Running : " . $SCRIPTEXECUTE)
      :set SCRIPTCOMPLETE "true"
      :set SCRIPTNAME ""
      :set SCRIPTWATCH ""
      :set SCRIPTWATCHDELAY ""
      :set errorscript [/system script find name=$SCRIPTEXECUTE]
      :if ([:len $errorscript] > 0) do={
        :execute $SCRIPTEXECUTE
      } else={
        :log error ($LOGNAME . "  -  Error script found at start, but subsequently removed - ERROR SCRIPT CANNOT RUN")
      }
    } else={
      :log info ($LOGNAME . "  -  Completed OK")
    }
  } else={
    :log error ($LOGNAME . "  -  Error script NOT found - WATCHSCRIPT TERMINATING")
  }
} else={
  :log error "COULD NOT START WATCHSCRIPT - SCRIPTNAME GLOBAL VARIABLE NOT SET"
}

YOURSCRIPT
Obviously, there need to be some changes to your script(s) to allow this to work. Below is a demonstration script (a couple of 'fetch' & 'delete' commands) that should give you a real feeling for how this works.
Code:
# ----------------------------------- ADD THIS CODE AT THE START OF THE SCRIPT YOU WANT WATCHED -----------------------------------
:global SCRIPTNAME
:global SCRIPTWATCH
:global SCRIPTWATCHDELAY
:global SCRIPTCOMPLETE
:global SCRIPTEXECUTE
:local watchscriptrunning

# Set the name of the script to be watched (i.e. THIS SCRIPT)
:set SCRIPTNAME "FTPSCRIPT"

# Set the global variable to show that the script is running
:set SCRIPTCOMPLETE "false"

# Set the global variable to set the watch timer
:set SCRIPTWATCHDELAY "1000ms"

# Set the global variable to set the errorscript name
:set SCRIPTEXECUTE "FTPERRORSCRIPT"

:set SCRIPTWATCH "WATCHSCRIPT-STARTUP"
:if ([:len [/system script find name="WATCHSCRIPT"]] = 0) do={
  :log error ($SCRIPTNAME . " - WATCHSCRIPT NOT FOUND - Script carrying on, but will not monitor errors!")
} else={
  :log info ($SCRIPTNAME . " - WATCHSCRIPT FOUND - Script will be monitored for errors!")
  :execute "WATCHSCRIPT"
  # Wait until the WATCHSCRIPT job shows up in the job list
  :do {
      :set watchscriptrunning [/system script job find script="WATCHSCRIPT"]
      :log info ($SCRIPTNAME . "  -  Waiting for WATCHSCRIPT TO START")
      :delay "500ms"
  } while=([:len $watchscriptrunning] = 0)
}

:log info ($SCRIPTNAME . "  -  WATCHSCRIPT STARTED - Executing the main script")

# You could also use the "} else={" in the IF statement above to enclose YOUR WHOLE script and therefore STOP this script from running
#    if the WATCHSCRIPT does not exist.

# ----------------------------------- START YOUR NORMAL SCRIPT HERE ----------------------------------- #

:local LOGNAME $SCRIPTNAME

# Set the name of the section we are watching - this is passed to the error script, so we know what has failed
:set SCRIPTWATCH "FTP1"
:if ([/tool fetch address=[:resolve "www.mikrotik.com"] host="www.mikrotik.com" port="80" src-path="/" dst-path="deleteme1.tmp" mode=http] != 0) do={
  :log info ($LOGNAME . " - FTP1 - File retrieved")
}

# We are now moving to the next section, so we need to update the SCRIPTWATCH global variable
:set SCRIPTWATCH "DELTMP1"
:if ([/file remove "deleteme1.tmp"] = "") do={
  :log info ($LOGNAME . " - deleteme1.tmp - File removed")
}

# We are now moving to the next section, so we need to update the SCRIPTWATCH global variable, again!
:set SCRIPTWATCH "FTP2"
:if ([/tool fetch address="www.mikrotik.com" dst-path="deleteme2.tmp" src-path="/nothere.tmp" mode=http] != 0) do={
  :log info ($LOGNAME . " - FTP2 - File retrieved")
}

# We are now moving to the next section, so we need to update the SCRIPTWATCH global variable, again!!!
:set SCRIPTWATCH "DELTMP2"
:if ([/file remove "deleteme2.tmp"] != "") do={
  :log info ($LOGNAME . " - deleteme2.tmp - File removed")
}

# ----------------------------------- END OF YOUR NORMAL SCRIPT -----------------------------------

# When we are sure the script has completed properly, we set the global variable SCRIPTCOMPLETE
:set SCRIPTCOMPLETE "true"

YOURSCRIPTERROR

Finally, you need a script to take the actions required, when (if) your main script fails:

Code:
:global SCRIPTERRORNAME
:log error "FTPSCRIPT - FTPERROR SCRIPT STARTED"

# SCRIPTERRORNAME holds the section name that WATCHSCRIPT saw last
:if ($SCRIPTERRORNAME = "FTP1") do={
  :log error "FTPSCRIPT - Failed in FTP1"
} else={
  :if ($SCRIPTERRORNAME = "FTP2") do={
    :log error "FTPSCRIPT - Failed in FTP2"
  } else={
    :if ($SCRIPTERRORNAME = "DELTMP1") do={
      :log error "FTPSCRIPT - Failed in DELTMP1"
    } else={
      :if ($SCRIPTERRORNAME = "DELTMP2") do={
        :log error "FTPSCRIPT - Failed in DELTMP2"
      }
    }
  }
}
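
If you would rather be told by email than dig through the log, an error script along these lines might do it. This is only a sketch: the address is a placeholder, and it assumes "/tool e-mail" is already configured with a working SMTP server on your router.
Code:
:global SCRIPTERRORNAME
:log error ("FTPSCRIPT - Failed in " . $SCRIPTERRORNAME)
# admin@example.com is a placeholder address - change it to your own
/tool e-mail send to="admin@example.com" subject="FTPSCRIPT failed" body=("FTPSCRIPT failed in section: " . $SCRIPTERRORNAME)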