Stop and Start the D0 Db Servers (except SAM) if crashed. 

 

Steps to restart the servers in case of failure:

 

  1. Check the status of the Server(s) at:

http://d0db-prd.fnal.gov/sam_admin/cgi/nameService

 

* This page shows which hostname/IP each server is running on.

** A RED light indicates a problem with the server.

 

IP Address for d0dbsrv4

(Primary Node for Calib Farm Servers except RCP and Trigger): 131.225.223.130

 

IP Address for d0dbsrv5

(Primary Node for Calib User Servers except RCP and Trigger): 131.225.223.41

 

To determine whether the D0 Db servers have been failed over to the Secondary Node:

 

Note: To log on to the d0dbsrv4/d0dbsrv5/d0dbsrv6/d0dbsrv2 nodes as the d0db account, you must have a root Kerberos principal, and that principal must be listed in the .k5login file of the d0db account. Send mail to css-dsg@fnal.gov to be added to the .k5login of the production node(s). The .k5login file is shared across all production nodes for the servers, so if you are added on one node you are added on all.
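A quick way to confirm that you have been added is to look for your principal in the .k5login file. The sketch below assumes the standard .k5login layout of one principal per line; the function name and the example principal are hypothetical and are not part of the D0 tools.

```shell
# Sketch: report whether a Kerberos principal is listed in a .k5login file.
# principal_in_k5login is a hypothetical helper name.
principal_in_k5login() {
    # $1 = path to the .k5login file, $2 = full principal name
    [ -r "$1" ] || return 2          # file missing or unreadable
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -Fxq -- "$2" "$1"
}

# Example (hypothetical principal, not run automatically):
#   principal_in_k5login ~d0db/.k5login "jdoe/root@FNAL.GOV" && echo "listed"
```

The function returns 0 when the principal is listed, 1 when it is not, and 2 when the file cannot be read.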

 

 

Log on as d0db on any of the nodes d0ora1/d0dbsrv4/d0dbsrv5/d0dbsrv6/d0dbsrv2.

cd to the private area (i.e. /d0ora1/home/d0db/private); the failover files below are in its log subdirectory.

 

1. To check whether the Farm Servers (Primary Node d0dbsrv4) have been failed over to the Secondary Node (d0dbsrv6):

If the file '~d0db/private/log/dbs_calib_farm_failover_kickoff' is present, they have been failed over to d0dbsrv6.

 

2. To check whether the User Servers (Primary Node d0dbsrv5) have been failed over to the Secondary Node (d0dbsrv2):

If the file '~d0db/private/log/dbs_calib_user_failover_kickoff' is present, they have been failed over to d0dbsrv2.
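The two checks above can be sketched as a small shell helper. The function name and its output wording are illustrative assumptions; only the kickoff file names come from this page. The log directory is passed explicitly so that nothing is assumed about where ~d0db resolves.

```shell
# Sketch: report whether the failover kickoff file for a server group exists.
# failover_status is a hypothetical helper, not part of the D0 tools.
failover_status() {
    # $1 = server group ("farm" or "user"), $2 = log directory
    marker="$2/dbs_calib_${1}_failover_kickoff"
    if [ -e "$marker" ]; then
        echo "$1: failover kickoff file present"
        return 0
    fi
    echo "$1: no failover kickoff file"
    return 1
}

# Example (not run automatically):
#   failover_status farm ~d0db/private/log
#   failover_status user ~d0db/private/log
```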

 

  1. Log on to the host as user d0db. The servers are located on the following machines:
    1. d0ora1 – trigger and rcp (no failover)
    2. d0dbsrv4 – other D0 Db Calib Farm Servers (failover to d0dbsrv6)
    3. d0dbsrv5 – other D0 Db Calib User Servers (failover to d0dbsrv2)

 

 

(Note: To log on to the d0dbsrv4 and d0dbsrv5 nodes as the d0db account, you must have a root Kerberos principal that is listed in the .k5login file of the d0db account.)

 

Go to the private area (i.e. /d0ora1/home/d0db/private) on the node where the servers need to be started. Again, d0dbsrv5 is the Primary Node for the D0 Db Calib User Servers and d0dbsrv4 is the Primary Node for the D0 Db Calib Farm Servers.

 

  1. If all the servers are being restarted, go to Step 6 (the ups stop db_server_bootstrap step). If you are restarting ONLY 1 server:

             Copy the bootstrap startup file, i.e. the <hostname>_server_list.txt file, to <hostname>_server_list.txt.sav

 

Note: <hostname> is the host where the server(s) are running.

 

    Example:   cp d0dbsrv4_server_list.txt d0dbsrv4_server_list.txt.sav

 

  1. In the file <hostname>_server_list.txt (e.g. d0dbsrv4_server_list.txt), delete all the D0 database server entries except the one for the server that needs to be started.
    1. Example before editing the file:

        i.   dbserverDAN d0_calibration_db_server prd v2_1 d0_config_prd.py
        ii.  dbserverDAN smt_calibration_db_server prd v2_1 smt_config_prd.py
        iii. dbserver config_db_server_old prd v1_0_7 CONFIG omni …

    2. Example after editing the file:

        i.   dbserver config_db_server_old prd v1_0_7 CONFIG omni

* Only the server you want to restart remains in the file.

 

Note: Currently both v1_0_7 and v2_2 of configDbServer are running, and some applications still use v1_0_7. Be careful to restart the correct version of configDbServer.
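The save-and-edit steps above can be sketched in shell as follows. This is only an illustration: keep_only_server is a hypothetical helper, and it assumes each entry is a single line whose second field is the server name and whose fourth field is the version, as in the example entries above. Passing the version lets you pick between two entries for the same server name.

```shell
# Sketch: save the bootstrap file, then keep only one server's entry in it.
# keep_only_server is a hypothetical helper, not part of the D0 tools.
keep_only_server() {
    # $1 = bootstrap file, $2 = server name, $3 = version (optional)
    cp -p "$1" "$1.sav" || return 1
    # Assumed layout: field 2 = server name, field 4 = version.
    awk -v name="$2" -v ver="${3:-}" \
        '$2 == name && (ver == "" || $4 == ver)' "$1.sav" > "$1" || return 1
    # An empty result means nothing matched; warn rather than guess.
    [ -s "$1" ] || echo "warning: no matching entry left in $1" >&2
}

# Example (not run automatically):
#   keep_only_server d0dbsrv4_server_list.txt config_db_server_old v1_0_7
```

Check the edited file by eye before stopping or starting anything, since a mistyped name leaves the bootstrap file empty.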

 

  1. Execute the command ups stop db_server_bootstrap. It will stop the server(s) listed in the bootstrap file.

 

Note: To stop D0 Calib Servers that have been failed over, use

              ups stop db_server_bootstrap -O <node_name>

 

e.g. if the D0 Calib User Servers (Primary on d0dbsrv5) have been failed over to the Secondary node (d0dbsrv2), then to stop these servers on the d0dbsrv2 node:

            ups stop db_server_bootstrap -O d0dbsrv5

 

 

  1. Execute the command ups start db_server_bootstrap. It will start the server(s) listed in the bootstrap file.

 

Note: To start D0 Calib Servers that have been failed over, use

              ups start db_server_bootstrap -O <node_name>

 

e.g. if the D0 Calib User Servers (Primary on d0dbsrv5) have been failed over to the Secondary node (d0dbsrv2), then to start these servers on the d0dbsrv2 node:

            ups start db_server_bootstrap -O d0dbsrv5
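The stop and start steps can be combined in a small wrapper, sketched below. The ups commands and the -O argument are the ones shown above; the function name and the UPS_CMD variable are hypothetical additions, the latter so the wrapper can be dry-run by printing the commands instead of executing them.

```shell
# Sketch: stop, then start, the servers listed in the bootstrap file.
# restart_bootstrap and UPS_CMD are hypothetical, not part of the D0 tools.
restart_bootstrap() {
    # $1 = node name to pass with -O for failed-over servers (optional)
    ups_cmd=${UPS_CMD:-ups}
    if [ -n "${1:-}" ]; then
        $ups_cmd stop db_server_bootstrap -O "$1"
        $ups_cmd start db_server_bootstrap -O "$1"
    else
        $ups_cmd stop db_server_bootstrap
        $ups_cmd start db_server_bootstrap
    fi
}

# Dry run (prints the commands only, not run automatically):
#   UPS_CMD="echo ups"; restart_bootstrap d0dbsrv5
```

The start is issued regardless of the stop's exit status, since a crashed server may have nothing left to stop.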

 

 

  1. Check the status of the server at:   

http://d0db-prd.fnal.gov/sam_admin/cgi/nameService

 

  1. Copy back <hostname>_server_list.txt.sav  to <hostname>_server_list.txt
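A minimal sketch of the copy-back step, with a guard so that a missing saved copy is reported instead of being silently ignored. The function name is hypothetical; the file naming follows the <hostname>_server_list.txt convention used above.

```shell
# Sketch: restore the full bootstrap file from its saved copy.
# restore_server_list is a hypothetical helper, not part of the D0 tools.
restore_server_list() {
    # $1 = bootstrap file, e.g. d0dbsrv4_server_list.txt
    if [ ! -f "$1.sav" ]; then
        echo "no saved copy found: $1.sav" >&2
        return 1
    fi
    cp -p "$1.sav" "$1"
}

# Example (not run automatically):
#   restore_server_list d0dbsrv4_server_list.txt
```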

 

 

Mail Inquiries to: css-dsg@fnal.gov

 

Creation Date : June 27, 2002

Last Modified:  Apr 30, 2003