Highly Available Network File System

Standard

In the previous article, we discussed the way to create a highly available block device by replication.  We continue and attempt making a network file system (NFS) on top of it.  We first discuss the procedures to start and stop the service.  Then we have the script…  Some parts are deliberately missing due to my conflict of interest.

NFS Configuration

Since it is not our goal here, we only do minimal NFS configuration in this example.  In short, the export(5) file “/etc/exports” is being modified like as follows.  This implies the directory “/nfs” is shared with the given two IP subnets.

/nfs -network=10.65.10.0/24
/nfs -network=127.0.0.0/24

Unlike previous setting, we do not use the “/etc/rc.conf” file to start the service.  This is because we like to control when a service is started, instead of blindly just after boot.  In FreeBSD, services can be started with the “onestart” command.

Firewall Configuration

Configuring NFS for a tight firewall is tricky, because it uses random ports.  For convenience, a simple IP address-based whitelist can be implemented.  In this example, we have the server IP 10.65.10.13 (see later), and the client IP 10.65.10.21.  If you simply do not have a firewall, skip this part.  On the server side, the PF can be configured with:

pass in quick on vtnet1 from 10.65.10.21 to 10.65.10.13 keep state

On the client side, the PF can be configured with:

pass in quick on vtnet1 from 10.65.10.13 to 10.65.10.21 keep state

Starting the Service

When we start the service, we want the following to happen:

  1. Acquire the IP address, say 10.65.10.13, regardless which machine it is running.
  2. Activate the HAST subsystem so to become the primary role.
  3. Wait for the HAST device to be available.  If the device is in secondary role, the device file in “/dev/hast” will not appear so we can go to sleep a while.
  4. Run the file system check just in case the file system was corrupted in the last unmount.
  5. Mount the file system for use (in this example, “/nfs”)
  6. Start the NFS-related services in order: the remote procedural call binding daemon, the mount daemon, the network file system daemon, the statistic daemon, and the lock daemon.
  7. Once the step 5 completes, the service is available to the given clients as instructed to the NFS and allowed by the firewall.

For the inpatient, one can jump to the second last section for the actual source code.

Stopping the Service

Stopping the service is the reverse of starting, except some steps can be less serious.

  1. Stop the NFS-related services in order: the lock daemon, the statistic daemon, the network file system daemon, the mount daemon, and finally the remote procedural call binding daemon.
  2. Unmount the file system.
  3. Make the HAST device in secondary role.
  4. Release the iP address so neighbours can reuse.

Also, one can jump to the second last section for the actual source code.

Service Check Script

There are two types of checking.  The first one ensures all the components (like the IP address, mount point service, etc) are present and valid.  The procedure returns success (zero) only when the components are all turned on.  Whenever a component is missing, it will be reported as a failure (non-zero return code).

The second one ensures all the components are simply turned off, so that the service can be started on elsewhere.  The procedure returns success (zero) only when all the components are turned off.  Whenever a component is present, it will be reported as a failure (non-zero return code).

What is Missing

Once we master how to start and stop the service on one node, we need the mechanism to automatically start and stop the service as appropriate.  In particular, it is utmost important not to run the service concurrently on two hosts, as this may damage the file system and confuse the TCP/IP network.  This part should be done out of the routine script.

The Script

Finally, the script is as follows…

#!/bin/sh -x

start() {
  ifconfig vtnet1 add 10.65.10.13/24
  hastctl role primary nfs_block
  while [ ! -e /dev/hast/nfs_block ]
  do
    sleep 1
  done
  fsck -t ufs /dev/hast/nfs_block
  mount /dev/hast/nfs_block /nfs
  service rpcbind onestart
  service mountd onestart
  service nfsd onestart
  service statd onestart
  service lockd onestart
}

stop() {
  service lockd onestop
  service statd onestop
  service nfsd onestop
  service mountd onestop
  service rpcbind onestop
  umount /nfs
  hastctl role secondary nfs_block
  ifconfig vtnet1 delete 10.65.10.13
}

status() {
  ifconfig vtnet1 | grep 10.65.10.13 && \
  service rpcbind onestatus && \
  showmount -e | grep /nfs && \
  mount | grep /nfs && \
  ls /dev/hast/nfs_block
}

residue() {
  ifconfig vtnet1 | grep 10.65.10.13 || \
  (service rpcbind onestatus && showmount -e | grep /nfs) || \
  mount | grep /nfs || \
  ls /dev/hast/nfs_block
}

clean() {
  residue
  if [ $? -ne 0 ]
  then
    exit 0
  fi
  exit 1
}

if [ "$1" == "start" ]
then
  start
elif [ "$1" == "stop" ]
then
  stop
elif [ "$1" == "status" ]
then
  status
elif [ "$1" == "clean" ]
then
  clean
fi

Testing

To test, fine the designated computer and mount the file system.  Assume the file system has been running on the host “store1”, make a manual failover to see…  The file client does not need to explicitly remount the file system; it cab be remounted automatically.

client# mount 10.65.10.13:/nfs /mnt
client# ls /mnt
.snap
client# touch /mnt/helloworld
store1# ./nfs_service.sh stop
store2# ./nfs_service.sh start
client# ls /mnt
.snap helloworld
Advertisements

3 thoughts on “Highly Available Network File System

  1. Pingback: Highly Available Storage Target on FreeBSD | Virtualisation Works

  2. Pingback: Installing FreeBSD from Scratch and Reinstalling the Boot Loader | Virtualisation Works

  3. Pingback: Highly Available MySQL Server | Virtualisation Works

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s