Stop and Start the D0 DB Servers (except SAM) if Crashed
Steps to restart the servers in case of failure:
http://d0db-prd.fnal.gov/sam_admin/cgi/nameService
* This page shows the hostname/IP each server is running on.
** A RED light indicates a problem with the server.
IP address for d0dbsrv4 (primary node for the Calib Farm servers, except RCP and Trigger): 131.225.223.130
IP address for d0dbsrv5 (primary node for the Calib User servers, except RCP and Trigger): 131.225.223.41
To determine whether the D0 DB servers have failed over to a secondary node:
Note: to log on to the d0dbsrv4/d0dbsrv5/d0dbsrv6/d0dbsrv2 nodes as the d0db account, you must have a root Kerberos principal and be listed in the .k5login file of the d0db account. To be added to the .k5login of the production node(s), send mail to css-dsg@fnal.gov.
The .k5login file is shared across all production server nodes, so if you are added on one, you are added on all.
Log on as d0db on any of the nodes d0ora1/d0dbsrv4/d0dbsrv5/d0dbsrv6/d0dbsrv2 and cd to the private/log directory, i.e. /d0ora1/home/d0db/private/log.
1. To check whether the Farm servers (primary node d0dbsrv4) have failed over to the secondary node (d0dbsrv6):
if the file ~d0db/private/log/dbs_calib_farm_failover_kickoff is present, they have been failed over to d0dbsrv6.
2. To check whether the User servers (primary node d0dbsrv5) have failed over to the secondary node (d0dbsrv2):
if the file ~d0db/private/log/dbs_calib_user_failover_kickoff is present, they have been failed over to d0dbsrv2.
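The two marker-file checks above can be scripted. The following is a minimal POSIX sh sketch, assuming it is run as d0db and that the marker files live in ~d0db/private/log as stated above. The function name failover_status and its optional directory argument are hypothetical, added only for illustration; the function reports whether the files exist and does not contact the servers.

```shell
# Sketch: report whether the D0 calib Farm/User servers have failed over,
# based only on the kickoff marker files named in this runbook.
# failover_status is a hypothetical helper, not part of the D0 tools.

# Default marker directory (tilde-expanded for the d0db account).
D0DB_LOG_DIR=~d0db/private/log

# Usage: failover_status [log_dir]
# Prints one line per group, e.g. "farm: failed over".
failover_status() {
    fs_dir=${1:-$D0DB_LOG_DIR}
    for fs_group in farm user; do
        if [ -e "$fs_dir/dbs_calib_${fs_group}_failover_kickoff" ]; then
            echo "$fs_group: failed over"
        else
            echo "$fs_group: not failed over"
        fi
    done
}
```

Run failover_status with no argument on a production node to check the default directory.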
On the node where the servers need to be started, go to the private area (i.e. /d0ora1/home/d0db/private). Again, d0dbsrv5 is the primary node for the D0 DB Calib User servers, and d0dbsrv4 is the primary node for the D0 DB Calib Farm servers.
Copy the bootstrap startup file, <hostname>_server_list.txt, to <hostname>_server_list.txt.sav, where <hostname> is the host on which the server(s) run.
Example:
cp d0dbsrv4_server_list.txt d0dbsrv4_server_list.txt.sav
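A minimal sh sketch of this backup step, assuming the list file is in the current directory (or a directory you pass) and that `hostname -s` prints the short node name such as d0dbsrv4; on hosts where it does not, pass the hostname explicitly. The function name backup_server_list is hypothetical.

```shell
# Sketch: back up <host>_server_list.txt to <host>_server_list.txt.sav.
# backup_server_list is a hypothetical helper, not part of the D0 tools.
# Usage: backup_server_list [dir] [host]
backup_server_list() {
    bs_dir=${1:-.}
    # Assumption: 'hostname -s' yields the short name (e.g. d0dbsrv4).
    bs_host=${2:-$(hostname -s)}
    bs_file="$bs_dir/${bs_host}_server_list.txt"
    if [ ! -f "$bs_file" ]; then
        echo "no such file: $bs_file" >&2
        return 1
    fi
    # -p keeps the original timestamps and permissions on the copy.
    cp -p "$bs_file" "$bs_file.sav"
}
```

For example, on d0dbsrv4 after cd /d0ora1/home/d0db/private: backup_server_list . d0dbsrv4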
Edit <hostname>_server_list.txt so that only the server(s) you want to restart remain in the file. For example, if the file contains:
i. dbserverDAN d0_calibration_db_server prd v2_1 d0_config_prd.py …
ii. dbserverDAN smt_calibration_db_server prd v2_1 smt_config_prd.py …
iii. dbserver config_db_server_old prd v1_0_7 CONFIG omni …
and only config_db_server_old is to be restarted, the edited file contains just:
i. dbserver config_db_server_old prd v1_0_7 CONFIG omni …
Note: both v1_0_7 and v2_2 of configDbServer are currently running, because some applications still use v1_0_7. Be careful to restart the correct version of configDbServer.
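Reducing the server list by hand is error-prone when two versions of the same server are running, so a filter that matches both name and version may help. This sketch assumes, based only on the example lines above, that each entry is whitespace-separated with the server name in the second field and the version in the fourth. It rebuilds the list from the .sav copy made earlier and refuses to write an empty file. The name keep_only_server is hypothetical.

```shell
# Sketch: rewrite <list_file> from <list_file>.sav, keeping only the
# entries for one server (and, optionally, one version).
# Assumed format: field 2 = server name, field 4 = version.
# keep_only_server is a hypothetical helper, not part of the D0 tools.
# Usage: keep_only_server <list_file> <server_name> [version]
keep_only_server() {
    ko_file=$1
    ko_name=$2
    ko_version=${3:-}
    if [ ! -f "$ko_file.sav" ]; then
        echo "back up $ko_file to $ko_file.sav first" >&2
        return 1
    fi
    ko_tmp="$ko_file.tmp"
    awk -v name="$ko_name" -v ver="$ko_version" \
        '$2 == name && (ver == "" || $4 == ver)' "$ko_file.sav" > "$ko_tmp" \
        || { rm -f "$ko_tmp"; return 1; }
    # Never replace the list with an empty file.
    if [ ! -s "$ko_tmp" ]; then
        rm -f "$ko_tmp"
        echo "no entry for $ko_name $ko_version" >&2
        return 1
    fi
    mv "$ko_tmp" "$ko_file"
}
```

For example: keep_only_server d0dbsrv4_server_list.txt config_db_server_old v1_0_7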
Note: to stop the failed-over D0 calib servers, run
ups stop db_server_bootstrap -O <node_name>
where <node_name> is the servers' primary node. For example, if the D0 calib User servers (primary on d0dbsrv5) have failed over to the secondary node (d0dbsrv2), stop them on d0dbsrv2 with:
ups stop db_server_bootstrap -O d0dbsrv5
Note: to start the failed-over D0 calib servers, run
ups start db_server_bootstrap -O <node_name>
where <node_name> is the servers' primary node. For example, if the D0 calib User servers (primary on d0dbsrv5) have failed over to the secondary node (d0dbsrv2), start them on d0dbsrv2 with:
ups start db_server_bootstrap -O d0dbsrv5
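In these commands, the node argument is the primary node of the failed-over group, not the node you are logged on to. The sketch below only prints the command so it can be checked before being run on the secondary node. The User mapping (d0dbsrv5) comes from the example above; the Farm mapping (d0dbsrv4) applies the same rule by analogy and is an assumption. failover_cmd is a hypothetical helper.

```shell
# Sketch: print the ups command for the failed-over calib servers of a group.
# The -O argument is the group's PRIMARY node (User -> d0dbsrv5 per this
# runbook; Farm -> d0dbsrv4 is assumed by analogy).
# failover_cmd is a hypothetical helper, not part of the D0 tools.
# Usage: failover_cmd <start|stop> <farm|user>
failover_cmd() {
    fc_action=$1
    case $2 in
        farm) fc_primary=d0dbsrv4 ;;
        user) fc_primary=d0dbsrv5 ;;
        *) echo "group must be farm or user" >&2; return 1 ;;
    esac
    case $fc_action in
        start|stop) ;;
        *) echo "action must be start or stop" >&2; return 1 ;;
    esac
    echo "ups $fc_action db_server_bootstrap -O $fc_primary"
}
```

For example, failover_cmd stop user prints the stop command for the User servers, which can then be run on d0dbsrv2.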
Mail inquiries to: css-dsg@fnal.gov
Creation Date : June 27, 2002
Last Modified: Apr 30, 2003