Monday, November 17, 2008

VCS / SFRAC

VCS

Veritas Cluster

LLT and GAB

VCS uses two components, LLT and GAB to share data over the private networks among systems.
These components provide the performance and reliability required by VCS.
LLT LLT (Low Latency Transport) provides fast, kernel-to-kernel communications and monitors network connections. The system administrator configures LLT by creating a configuration file (llttab) that describes the systems in the cluster and the private network links among them. LLT runs at layer 2 of the network stack.
GAB GAB (Group Membership and Atomic Broadcast) provides the global message order required to maintain a synchronised state among the systems, and monitors disk communications such as those required by the VCS heartbeat utility. The system administrator configures the GAB driver by creating a configuration file (gabtab).

LLT and GAB files

/etc/llthosts


The file is a database, containing one entry per system, that links the LLT system ID with the host name. The file is identical on each server in the cluster.
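
For illustration, a two-node cluster's llthosts might look like the following (the node IDs and host names are assumed example values):

0 sun1
1 sun2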

/etc/llttab


The file contains information that is derived during installation and is used by the utility lltconfig.
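
As an illustrative sketch only (the node name, cluster ID and Solaris interface names are assumed values), an llttab might contain:

set-node sun1
set-cluster 100
link qfe0 /dev/qfe:0 - ether - -
link qfe1 /dev/qfe:1 - ether - -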

/etc/gabtab


The file contains the information needed to configure the GAB driver. This file is used by the gabconfig utility.

/etc/VRTSvcs/conf/config/main.cf


The VCS configuration file. The file contains the information that defines the cluster and its systems.
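
As an illustrative sketch only (the cluster, system and group names and the encrypted password string are assumed values), a minimal main.cf has the general shape:

include "types.cf"

cluster my_cluster (
        UserNames = { admin = hllSlkLjlFmfLfiF }
        Administrators = { admin }
        )

system sun1 ( )

system sun2 ( )

group groupw (
        SystemList = { sun1 = 1, sun2 = 2 }
        AutoStartList = { sun1 }
        )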

Gabtab Entries

/sbin/gabdiskconf -i /dev/dsk/c1t2d0s2 -s 16 -S 1123
/sbin/gabdiskconf -i /dev/dsk/c1t2d0s2 -s 144 -S 1124
/sbin/gabdiskhb -a /dev/dsk/c1t2d0s2 -s 16 -p a -S 1123
/sbin/gabdiskhb -a /dev/dsk/c1t2d0s2 -s 144 -p h -S 1124
/sbin/gabconfig -c -n2
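
Note that the gabdiskconf and gabdiskhb lines are only needed when GAB disk heartbeating is configured; on clusters without it, /etc/gabtab usually contains only the final seeding line, e.g. for a two-node cluster:

/sbin/gabconfig -c -n2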

gabdiskconf


-i Initialises the disk region
-s Start Block
-S Signature
gabdiskhb (heartbeat disks)


-a Add a gab disk heartbeat resource
-s Start Block
-p Port
-S Signature
gabconfig


-c Configure the driver for use
-n Number of systems in the cluster.

LLT and GAB Commands
Verify that links are active for LLT lltstat -n
verbose output of the lltstat command lltstat -nvv | more
display open ports for LLT lltstat -p
display the values of LLT configuration directives lltstat -c
list information about each configured LLT link lltstat -l
List all MAC addresses in the cluster lltconfig -a list
stop LLT lltconfig -U
start LLT lltconfig -c
verify that GAB is operating

gabconfig -a

Note: port a indicates that GAB is communicating, port h indicates that VCS is started
stop GAB gabconfig -U
start GAB gabconfig -c -n
override the seed values in the gabtab file gabconfig -c -x
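
As a sketch of how these commands fit together, the usual order when restarting the stack on a single node is VCS first, then GAB, then LLT on the way down, and the reverse on the way up (run as root; /etc/gabtab is assumed to contain the gabconfig seeding line shown earlier):

hastop -local
gabconfig -U
lltconfig -U
lltconfig -c
sh /etc/gabtab
hastart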

GAB Port Membership
List Membership

gabconfig -a
Unregister port f /opt/VRTS/bin/fsclustadm cfsdeinit
Port Function
a gab driver
b I/O fencing (designed to guarantee data integrity)
d ODM (Oracle Disk Manager)
f CFS (Cluster File System)
h VCS (VERITAS Cluster Server: high availability daemon)
o VCSMM driver (kernel module needed for Oracle and VCS interface)
q QuickLog daemon
v CVM (Cluster Volume Manager)
w vxconfigd (module for cvm)

Cluster daemons
High Availability Daemon had
Companion Daemon hashadow
Resource Agent daemon Agent
Web Console cluster management daemon CmdServer

Cluster Log Files
Log Directory /var/VRTSvcs/log
primary log file (engine log file) /var/VRTSvcs/log/engine_A.log

Starting and Stopping the cluster

"-stale" instructs the engine to treat the local config as stale
"-force" instructs the engine to treat a stale config as a valid one
hastart [-stale|-force]

Bring the cluster into running mode from a stale state using the configuration file from a particular server
hasys -force
stop the cluster on the local server but leave the application/s running; do not fail over the application/s hastop -local
stop the cluster on the local server but evacuate (fail over) the application/s to another node within the cluster hastop -local -evacuate

stop the cluster on all nodes but leave the application/s running
hastop -all -force

Cluster Status
display cluster summary hastatus -summary
continually monitor cluster hastatus
verify the cluster is operating hasys -display

Cluster Details
information about a cluster haclus -display
value for a specific cluster attribute haclus -value
modify a cluster attribute haclus -modify
Enable LinkMonitoring haclus -enable LinkMonitoring
Disable LinkMonitoring haclus -disable LinkMonitoring
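
For example, to read and then change the cluster name (the new name is an assumed value; the configuration must be in read/write mode to modify it):

haclus -value ClusterName
haconf -makerw
haclus -modify ClusterName my_new_cluster
haconf -dump -makero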

Users
add a user hauser -add
modify a user hauser -update
delete a user hauser -delete
display all users hauser -display

System Operations
add a system to the cluster hasys -add
delete a system from the cluster hasys -delete
Modify a system's attributes hasys -modify
list a system state hasys -state
Force a system to start hasys -force
Display a system's attributes hasys -display [-sys]
List all the systems in the cluster hasys -list
Change the load attribute of a system hasys -load
Display the value of a system's nodeid (/etc/llthosts) hasys -nodeid
Freeze a system (prevents service groups from being brought online on or switched to the system)

hasys -freeze [-persistent][-evacuate]

Note: main.cf must be in write mode
Unfreeze a system (re-enables service groups to be brought online on the system)

hasys -unfreeze [-persistent]

Note: main.cf must be in write mode
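
As a sketch of a typical maintenance sequence (the system name sun1 is assumed), persistently freeze the node, evacuate its groups, do the work, then unfreeze it:

haconf -makerw
hasys -freeze -persistent -evacuate sun1
haconf -dump -makero
... perform the maintenance ...
haconf -makerw
hasys -unfreeze -persistent sun1
haconf -dump -makero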

Dynamic Configuration

The VCS configuration must be in read/write mode in order to make changes. While in read/write mode the
configuration is considered stale and a .stale file is created in $VCS_CONF/conf/config. When the configuration is put
back into read-only mode the .stale file is removed.
Change configuration to read/write mode haconf -makerw
Change configuration to read-only mode haconf -dump -makero
Check what mode cluster is running in

haclus -display |grep -i 'readonly'

0 = write mode
1 = read only mode
Check the configuration file

hacf -verify /etc/VRTSvcs/conf/config

Note: you can point to any directory as long as it has main.cf and types.cf
convert a main.cf file into cluster commands hacf -cftocmd /etc/VRTSvcs/conf/config -dest /tmp
convert a command file into a main.cf file

hacf -cmdtocf /tmp -dest /etc/VRTSvcs/conf/config

Service Groups
add a service group haconf -makerw
hagrp -add groupw
hagrp -modify groupw SystemList sun1 1 sun2 2
hagrp -autoenable groupw -sys sun1
haconf -dump -makero
delete a service group haconf -makerw
hagrp -delete groupw
haconf -dump -makero
change a service group

haconf -makerw
hagrp -modify groupw SystemList sun1 1 sun2 2 sun3 3
haconf -dump -makero

Note: use the "hagrp -display " to list attributes
list the service groups hagrp -list
list the groups dependencies hagrp -dep
list the parameters of a group hagrp -display
display a service group's resource hagrp -resources
display the current state of the service group hagrp -state
clear a faulted non-persistent resource in a specific grp hagrp -clear [-sys]
Change the system list in a cluster

# remove the host
hagrp -modify grp_zlnrssd SystemList -delete <hostname>

# add the new host (don't forget to state its position)
hagrp -modify grp_zlnrssd SystemList -add <hostname> 1

# update the autostart list
hagrp -modify grp_zlnrssd AutoStartList <hostname> <hostname>

Service Group Operations
Start a service group and bring its resources online hagrp -online -sys
Stop a service group and takes its resources offline hagrp -offline -sys
Switch a service group from one system to another hagrp -switch -to
Enable all the resources in a group hagrp -enableresources
Disable all the resources in a group hagrp -disableresources
Freeze a service group (disable onlining and offlining)

hagrp -freeze [-persistent]

note: use the following to check "hagrp -display | grep TFrozen"
Unfreeze a service group (enable onlining and offlining)

hagrp -unfreeze [-persistent]

note: use the following to check "hagrp -display | grep TFrozen"
Enable a service group. Only enabled groups can be brought online

haconf -makerw
hagrp -enable [-sys]
haconf -dump -makero

Note: to check, run "hagrp -display | grep Enabled"
Disable a service group. Disabled groups cannot be brought online

haconf -makerw
hagrp -disable [-sys]
haconf -dump -makero

Note: to check, run "hagrp -display | grep Enabled"
Flush a service group and enable corrective action. hagrp -flush -sys
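
For example, using the group and system names from the earlier examples (groupw, sun1 and sun2 are assumed values):

hagrp -online groupw -sys sun1
hagrp -state groupw
hagrp -switch groupw -to sun2
hagrp -offline groupw -sys sun2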

Resources
add a resource haconf -makerw
hares -add appDG DiskGroup groupw
hares -modify appDG Enabled 1
hares -modify appDG DiskGroup appdg
hares -modify appDG StartVolumes 0
haconf -dump -makero
delete a resource haconf -makerw
hares -delete
haconf -dump -makero
change a resource

haconf -makerw
hares -modify appDG Enabled 1
haconf -dump -makero

Note: list the parameters with "hares -display"
make a resource attribute value global (cluster-wide) hares -global
make a resource attribute value local to each system hares -local
list the parameters of a resource hares -display
list the resources hares -list
list the resource dependencies hares -dep
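
As a further sketch building on the appDG example above (the Mount resource name, mount point and volume path are assumed values), a file system resource could be added and made dependent on the disk group:

haconf -makerw
hares -add appMNT Mount groupw
hares -modify appMNT MountPoint "/apps"
hares -modify appMNT BlockDevice "/dev/vx/dsk/appdg/appvol"
hares -modify appMNT FSType vxfs
hares -modify appMNT FsckOpt "-y"
hares -modify appMNT Enabled 1
hares -link appMNT appDG
haconf -dump -makero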

Resource Operations
Online a resource hares -online [-sys]
Offline a resource hares -offline [-sys]
display the state of a resource (offline, online, etc.) hares -state
display the parameters of a resource hares -display
Offline a resource and propagate the command to its children hares -offprop -sys
Cause a resource agent to immediately monitor the resource hares -probe -sys
Clearing a resource (automatically initiates the onlining) hares -clear [-sys]

Resource Types
Add a resource type hatype -add
Remove a resource type hatype -delete
List all resource types hatype -list
Display a resource type hatype -display
List the resources of a particular resource type hatype -resources
Change a particular resource type's attributes hatype -modify
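
For example, to inspect and tune an attribute of the bundled DiskGroup resource type (the new MonitorInterval value is an assumed example):

hatype -display DiskGroup
hatype -value DiskGroup MonitorInterval
haconf -makerw
hatype -modify DiskGroup MonitorInterval 120
haconf -dump -makero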

Resource Agents
add an agent pkgadd -d .
remove an agent pkgrm
change an agent n/a
list all ha agents haagent -list
Display an agent's run-time information, i.e. has it started, is it running? haagent -display
Display an agent's faults haagent -display | grep Faults

Resource Agent Operations
Start an agent haagent -start [-sys]
Stop an agent haagent -stop [-sys]
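
For example, to check on and start the DiskGroup agent on one node (the system name sun1 is assumed):

haagent -display DiskGroup
haagent -start DiskGroup -sys sun1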



SFRAC Setup

== Configure IO Fencing (VXFEN) ==

==== Setup the coordg Diskgroup ====
NOTE: You must have an odd number of VxVM controlled devices in the Coordinator Disk Group or VxFEN will not start:

VxFEN Error Message: VXFEN vxfenconfig ERROR V-11-2-1004 There must be an odd number of coordinator disks defined





If for some reason you have not initialized your LUNs for use with Veritas, you will need to do that with vxdisksetup before you can proceed with this step.

# for x in `vxdisk list | grep FAS | awk '{print $1}'`; do vxdisksetup -fi ${x}; done


Create a diskgroup to house the coordinator disks and add at least 3 luns. When choosing disks to use as coordinator disks, you want to distribute them across both heads on your filer.



# vxdg init coordg disk0=FAS9600_6
# vxdg -g coordg adddisk disk1=FAS9600_7
# vxdg -g coordg adddisk disk2=FAS9601_6
# vxdg deport coordg

==== Setup the VxFEN Configuration Files ====

Run the following on all nodes:

# echo coordg > /etc/vxfentab
# echo coordg > /etc/vxfendg

NOTE: For 5.0, you must set up the /etc/vxfenmode file. This is not required in 4.1.

# cp /etc/vxfen.d/vxfenmode_scsi3_dmp /etc/vxfenmode


==== Start Up Fencing ====

Run the following on all nodes:

# vxdctl enable
# /etc/init.d/vxfen start


==== Verify Fencing Has Started ====

* Run gabconfig -a. You should now see port b as well as port a. You should have an entry for each node of the cluster.

# gabconfig -a

GAB Port Memberships
===============================================================
Port a gen 9d4d01 membership 012
Port b gen 9d4d04 membership 012



* Run vxfenadm -d to show the cluster wide fencing status. You should see an entry for each node in the cluster. If you do not, you should investigate why it didn't start by reviewing the log file on the host at /var/VRTSvcs/log/vxfen.log. NOTE: The output below is from 5.0MP1. 4.1MP1 will not show the lines that start with "Fencing".

# /sbin/vxfenadm -d


I/O Fencing Cluster Information:
================================

Fencing Protocol Version: 201
Fencing Mode: SCSI3
Fencing SCSI3 Disk Policy: dmp
Cluster Members:

* 0 (sunv440-shu04)
1 (sunv440-shu05)
2 (sunv440-shu06)

RFSM State Information:
node 0 in state 8 (running)
node 1 in state 8 (running)
node 2 in state 8 (running)




* Run vxfenadm -G all -f /etc/vxfentab. This will show you the keys for each node. The example below is shortened, but you should see keys for each node on each disk.

# vxfenadm -G all -f /etc/vxfentab



Device Name: /dev/vx/rdmp/c0t0d161s2
Total Number Of Keys: 8
key[0]:
Node ID: 1 Node Name: sunv440-shu05
Key Value:
key[1]:
Node ID: 1 Node Name: sunv440-shu05
Key Value:
key[2]:
Node ID: 1 Node Name: sunv440-shu05
Key Value:
key[3]:
Node ID: 1 Node Name: sunv440-shu05
Key Value:
key[4]:
Node ID: 2 Node Name: sunv440-shu06
Key Value:






== Configure The Cluster To Use SCSI3 Fencing and Start The Cluster ==

The cluster should currently be down. GAB, LLT, and Fencing should be running.

==== Modify the main.cf to add UseFence = SCSI3 ====

# vi /etc/VRTSvcs/conf/config/main.cf

include "types.cf"
include "CFSTypes.cf"
include "CVMTypes.cf"

cluster NAME_OF_CLUSTER (
UserNames = { admin = hllSlkLjlFmfLfiF }
Administrators = { admin }
UseFence = SCSI3 (ADD THIS LINE TO main.cf)
)


==== Distribute the main.cf to the other nodes ====

Do the following for the remaining nodes:

# rcp /etc/VRTSvcs/conf/config/main.cf <node>:/etc/VRTSvcs/conf/config


==== Start the cluster ====

Run the following on each node in the cluster:

# hastart


==== Verify That Everything Started ====

# hastatus -sum

-- SYSTEM STATE
-- System State Frozen

A sunv440-shu04 RUNNING 0
A sunv440-shu05 RUNNING 0
A sunv440-shu06 RUNNING 0

# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 9d4d01 membership 012
Port b gen 9d4d04 membership 012
Port h gen 9d4d07 membership 012


==== Verify That The Cluster Is Using SCSI3 Fencing ====

# haclus -display | grep UseFence

UseFence SCSI3


== Configure The RAC Portion of the Cluster ==

==== Initialize the CVM configuration ====

# cfscluster config

==== Check Cluster Status ====

# cfscluster status

Node : sunv440-shu04
Cluster Manager : running
CVM state : not-running
No mount point registered with cluster configuration


Node : sunv440-shu05
Cluster Manager : running
CVM state : not-running
No mount point registered with cluster configuration


Node : sunv440-shu06
Cluster Manager : running
CVM state : not-running
No mount point registered with cluster configuration


== Setup the SFCFS Diskgroup and Volume ==

==== Configure the SFCFS Diskgroup ====

NOTE: This step assumes that you have already initialized the desired luns. The number of luns used in this step is up to the user and test requirements.

# vxdg init SFCFS disk0=<disk_access_name>
# vxdg -g SFCFS adddisk disk1=<disk_access_name>
...
...

# vxdg deport SFCFS

==== Import the SFCFS Diskgroup As Shared ====

Determine the master node:

# vxdctl -c mode

mode: enabled: cluster active - MASTER
master: sunv440-shu04
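
On the master node, the deported disk group can then be imported as a shared group (the usual next step; run on the master only):

# vxdg -s import SFCFS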
