In the previous article, we discussed how to create a highly available block device through replication. Here we continue by building a network file system (NFS) service on top of it. We first discuss the procedures to start and stop the service, and then present the script… Some parts are deliberately missing due to my conflict of interest.
Since NFS tuning is not our goal here, we only do a minimal NFS configuration in this example. In short, the exports(5) file “/etc/exports” is modified so that the directory “/nfs” is shared with the given two IP subnets.
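As a sketch, the exports(5) entries could look like the following. The exact subnets are assumptions for illustration; substitute your own networks.

```
/nfs -network 10.65.10.0 -mask 255.255.255.0
/nfs -network 10.65.20.0 -mask 255.255.255.0
```

One line per network is the exports(5) convention for sharing the same file system with multiple subnets.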
Unlike the previous setup, we do not use the “/etc/rc.conf” file to start the service, because we want to control when a service starts instead of starting it blindly right after boot. In FreeBSD, a service that is not enabled in “/etc/rc.conf” can still be started on demand with the “onestart” command.
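For contrast, the persistent configuration we deliberately avoid would enable the daemons in “/etc/rc.conf” like this, causing them to start on every boot regardless of which node should own the service:

```
# /etc/rc.conf (NOT used in this setup)
rpcbind_enable="YES"
mountd_enable="YES"
nfs_server_enable="YES"
```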
Configuring NFS behind a tight firewall is tricky, because NFS uses random ports. For convenience, a simple IP address-based whitelist can be implemented instead. (It is possible to pin the ports down if you are willing to take extra steps.) In this example, the server has the IP 10.65.10.13 (see later), and the client has the IP 10.65.10.21. If you simply do not have a firewall, skip this part. On the server side, PF can be configured with:
pass in quick on vtnet1 from 10.65.10.21 to 10.65.10.13 keep state
On the client side, PF can be configured with:
pass in quick on vtnet1 from 10.65.10.13 to 10.65.10.21 keep state
Starting the Service
When we start the service, we want the following to happen:
- Acquire the IP address, say 10.65.10.13, regardless of which machine the service is running on.
- Activate the HAST subsystem so that this node takes the primary role.
- Wait for the HAST device to become available. While the device is in the secondary role, the device file under “/dev/hast” does not appear, so we sleep for a while and check again.
- Run a file system check in case the file system was not cleanly unmounted last time.
- Mount the file system for use (in this example, at “/nfs”).
- Start the NFS-related services in order: the remote procedure call binding daemon, the mount daemon, the network file system daemon, the status monitoring daemon, and the lock daemon.
- Once the previous step completes, the service is available to the clients listed in the exports file and allowed by the firewall.
For the impatient, one can jump to the second-last section for the actual source code.
Stopping the Service
Stopping the service is the reverse of starting it, except that some steps are less critical.
- Stop the NFS-related services in reverse order: the lock daemon, the status monitoring daemon, the network file system daemon, the mount daemon, and finally the remote procedure call binding daemon.
- Unmount the file system.
- Demote the HAST device to the secondary role.
- Release the IP address so that the other node can reuse it.
Again, one can jump to the second-last section for the actual source code.
Service Check Script
There are two types of checks. The first ensures that all the components (the IP address, the mount point, the services, etc.) are present and valid. The procedure returns success (zero) only when all the components are turned on; whenever a component is missing, it reports a failure (non-zero return code).
The second ensures that all the components are turned off, so that the service can be started elsewhere. The procedure returns success (zero) only when all the components are turned off; whenever a component is still present, it reports a failure (non-zero return code).
What is Missing
Once we know how to start and stop the service on one node, we need a mechanism to automatically start and stop it as appropriate. In particular, it is of utmost importance not to run the service concurrently on two hosts, as this may damage the file system and confuse the TCP/IP network. This part should be handled outside the routine script.
Finally, the script is as follows…
#!/bin/sh
# nfs_service.sh: start, stop, and check the HAST-backed NFS service.

nfs_start() {
	ifconfig vtnet1 add 10.65.10.13/24
	hastctl role primary nfs_block
	# The device file only appears once the primary role is active.
	while [ ! -e /dev/hast/nfs_block ]; do sleep 1; done
	fsck -t ufs /dev/hast/nfs_block
	mount /dev/hast/nfs_block /nfs
	service rpcbind onestart
	service mountd onestart
	service nfsd onestart
	service statd onestart
	service lockd onestart
}

nfs_stop() {
	service lockd onestop
	service statd onestop
	service nfsd onestop
	service mountd onestop
	service rpcbind onestop
	umount /nfs
	hastctl role secondary nfs_block
	ifconfig vtnet1 delete 10.65.10.13
}

nfs_status() {
	# Success (0) only when every component is on.
	ifconfig vtnet1 | grep 10.65.10.13 && \
	service rpcbind onestatus && \
	showmount -e | grep /nfs && \
	mount | grep /nfs
}

nfs_clean() {
	# The chain succeeds when any component is still on, so the
	# result is negated: success (0) only when everything is off.
	ifconfig vtnet1 | grep 10.65.10.13 || \
	(service rpcbind onestatus && showmount -e | grep /nfs) || \
	mount | grep /nfs
	if [ $? -ne 0 ]; then
		return 0
	else
		return 1
	fi
}

if [ "$1" = "start" ]; then
	nfs_start
elif [ "$1" = "stop" ]; then
	nfs_stop
elif [ "$1" = "status" ]; then
	nfs_status
elif [ "$1" = "clean" ]; then
	nfs_clean
fi
To test, find the designated computer and mount the file system. Assume the service has been running on the host “store1”; perform a manual failover to see… The NFS client does not need to explicitly remount the file system; it can be remounted automatically.
client# mount 10.65.10.13:/nfs /mnt
client# ls /mnt
client# touch /mnt/helloworld
store1# ./nfs_service.sh stop
store2# ./nfs_service.sh start
client# ls /mnt