Network Booting FreeBSD

Standard

Installing an operating system is easy for a computer or two.  What about installing for multiples of them?  In some situations, one wants to save money and time from buying that many boot devices and maintain the disk images separately.  Network boot comes handy.  It is relatively easy with FreeBSD, especially since we do not need whatever RAM disk image just to load whatever drivers.

The idea is, we have one computer holding the disk image, and let other computers obtain the image from there and start their journeys.  We first let these computers obtain their network information.  Together with the network configuration, these computers also get the location to download the network boot loader.  With the network boot loader, they will able to mount a network file system.

I tried to avoid GPL software.  Otherwise I would have tried something more integrated.  Anyway, that software piece got discovered huge vulnerabilities recently…

Assumptions

I will assume you have your cloud environment, with a network isolated from elsewhere.  For example, in Virtual Box, create a “host-only” network and plug in your computers (vtnet1, 10.0.250.0/24).  To make dynamic IP address allocation possible, one will want to enable promiscuous mode of the interfaces.  Of course, in addition, one will need a out-going “NAT” network for internet access (vtnet0).  I leave other details as readers’ exercise.

Operating System Files

Similar to setting up a full-blown jail or installing without an installer, one needs to deploy a directory to hold the operating system files.  In this exercise, I randomly picked a directory “/compute”.

# mkdir /compute
# tar Jxf base.txz -C /compute
# tar Jxf kernel.txz -C /compute

Dynamic Host Configuration

Typical computers nowadays can be configured to boot from the network in the BIOS.  The first thing such computers do is to of course obtain network configuration.  Therefore, the dynamic host configuration server is the first thing we need to set.  In this practice, I used the OpenBSD DHCP Daemon.  The usage is similar to that described in the handbook.

# pkg install dhcpd

The file dhcpd.conf(5) “/usr/local/etc/dhcpd.conf” is as follows.  In short, it allows IP addresses be allocated for 7200 seconds per lease.  And for the subnet 10.0.250.0/24, the addresses 10.0.250.10 to 10.0.250.250 can be allocated to the computers dynamically.  The default gateway will be 10.0.250.1.  More importantly, when the computer boots, it should obtain the file “/boot/pxeboot” (from the DHCP server, by default), and mount the NFS location “10.0.250.1:/compute” as root file system.

default lease-time 7200;
max-lease-time 7200;
subnet 10.0.250.0 netmask 255.255.255.0 {
  range 10.0.250.10 10.0.250.250;
  option routers 10.0.250.1;
  option rootpath "10.0.250.1:/compute"
  filename "/boot/pxeboot"
}

And the rc.conf(5) file “/etc/rc.conf” appended as follows.  As you guess it right, just one statement to enable the DHCP server.

dhcpd_enable="YES"

Trivial File Transfer

By no means I say the file transfer protocol is trivial.  The so-called “trivial file transfer” is yet another protocol for transferring file without complicated handshakes.  Here I used the package “tftp-hpa”.

# pkg install tftp-hpa

By default, it uses the directory “/usr/local/tftp”.  But we can act lazy.  The rc.conf(5) gets appended as follows:

tftpd_enable="YES"
tftpd_flags="-s /compute"

If you concern the security, you should make a single directory (like the default /usr/local/tftp) and copy the file “/compute/boot/pxeboot” there.  Right, one file will do.

Network File System

To let the system boot, we prepare a network file system, and make the /compute accessible.  The exports(5) file “/etc/exports” look as follows:

/compute -network 10.0.250.0/24 -alldirs -maproot=root

And the rc.conf(5) is appended as follows.  You may want to refer here if you want to run it behind a firewall.

nfs_server_enable="YES"

Starting the Services

As a friendly reminder, you will need to enable and start the services – the DHCP daemon, the TFTP daemon, and the network file system.  You will need proper firewall rules to allow network traffic.  In particular, you will need UDP port 67, 68, and 69 for DHCP and TFTP to work.

How it Works

DHCP Requests: At the beginning, the computer tries to obtain network configuration by broadcasting requests.  If you are stuck here, try to fix the DHCP daemon.

螢幕快照 2017-10-20 上午12.21.16

TFTP for Boot Loader: After obtaining the network configuration, the computer tries to obtain the boot loader with TFTP.  If you are stuck here, fix the TFTP configurations.

螢幕快照 2017-10-20 上午12.23.35

Network File System: With the first stage boot loader ready, it tries to mount the network file system.  If you are stuck here, check the network file system configurations.

螢幕快照 2017-10-20 上午12.22.14

The Loader: When you see this screen, most of the services, DHCP, FTFP, and NFS, are already used.  If you are stuck beyond this point, read the error message, and good luck…

螢幕快照 2017-10-20 上午12.23.52

Advertisements

Network File System with Firewall in FreeBSD

Standard

Network file system nfs(8) in FreeBSD is built on top of rpc(3) infrastructure where rpcbind(8) daemon is responsible binding the services for the clients.  Together with the companion services rpc.lockd(8)rpc.statd(8), and mountd(8), providing the total service with a firewall is tricky since the ports are different all the time.  In addition, sometimes the services are better to be available to the intranet only, not the internet.

Server Flags

Here is how to lock these services to the non-changing ports and non-wildcard addresses in the rc.conf(5).  In the example, I assumed the IP address to provide the service as “10.0.250.1”:

rpcbind_enable="YES"
rpcbind_flags="-h 10.0.250.1"
rpc_lockd_enable="YES"
rpc_lockd_flags="-h 10.0.250.1 -p 2632"
rpc_statd_enable="YES"
rpc_statd_flags="-h 10.0.250.1 -p 2633"
mountd_enable="YES"
mountd_flags="-h 10.0.250.1 -p 2634"
nfs_server_enable="YES"
nfs_server_flags="-h 10.0.250.1 -t -u"

Network Ports to Open

Here are the ports to open for the aforementioned services, both UDP and TCP.  The firewall can be adjusted accordingly:

Highly Available Network File System

Standard

In the previous article, we discussed the way to create a highly available block device by replication.  We continue and attempt making a network file system (NFS) on top of it.  We first discuss the procedures to start and stop the service.  Then we have the script…  Some parts are deliberately missing due to my conflict of interest.

NFS Configuration

Since it is not our goal here, we only do minimal NFS configuration in this example.  In short, the export(5) file “/etc/exports” is being modified like as follows.  This implies the directory “/nfs” is shared with the given two IP subnets.

/nfs -network=10.65.10.0/24
/nfs -network=127.0.0.0/24

Unlike previous setting, we do not use the “/etc/rc.conf” file to start the service.  This is because we like to control when a service is started, instead of blindly just after boot.  In FreeBSD, services can be started with the “onestart” command.

Firewall Configuration

Configuring NFS for a tight firewall is tricky, because it uses random ports.  For convenience, a simple IP address-based whitelist can be implemented.  (It is possible to fix the ports if you are willing to take extra steps).  In this example, we have the server IP 10.65.10.13 (see later), and the client IP 10.65.10.21.  If you simply do not have a firewall, skip this part.  On the server side, the PF can be configured with:

pass in quick on vtnet1 from 10.65.10.21 to 10.65.10.13 keep state

On the client side, the PF can be configured with:

pass in quick on vtnet1 from 10.65.10.13 to 10.65.10.21 keep state

Starting the Service

When we start the service, we want the following to happen:

  1. Acquire the IP address, say 10.65.10.13, regardless which machine it is running.
  2. Activate the HAST subsystem so to become the primary role.
  3. Wait for the HAST device to be available.  If the device is in secondary role, the device file in “/dev/hast” will not appear so we can go to sleep a while.
  4. Run the file system check just in case the file system was corrupted in the last unmount.
  5. Mount the file system for use (in this example, “/nfs”)
  6. Start the NFS-related services in order: the remote procedural call binding daemon, the mount daemon, the network file system daemon, the statistic daemon, and the lock daemon.
  7. Once the step 5 completes, the service is available to the given clients as instructed to the NFS and allowed by the firewall.

For the inpatient, one can jump to the second last section for the actual source code.

Stopping the Service

Stopping the service is the reverse of starting, except some steps can be less serious.

  1. Stop the NFS-related services in order: the lock daemon, the statistic daemon, the network file system daemon, the mount daemon, and finally the remote procedural call binding daemon.
  2. Unmount the file system.
  3. Make the HAST device in secondary role.
  4. Release the iP address so neighbours can reuse.

Also, one can jump to the second last section for the actual source code.

Service Check Script

There are two types of checking.  The first one ensures all the components (like the IP address, mount point service, etc) are present and valid.  The procedure returns success (zero) only when the components are all turned on.  Whenever a component is missing, it will be reported as a failure (non-zero return code).

The second one ensures all the components are simply turned off, so that the service can be started on elsewhere.  The procedure returns success (zero) only when all the components are turned off.  Whenever a component is present, it will be reported as a failure (non-zero return code).

What is Missing

Once we master how to start and stop the service on one node, we need the mechanism to automatically start and stop the service as appropriate.  In particular, it is utmost important not to run the service concurrently on two hosts, as this may damage the file system and confuse the TCP/IP network.  This part should be done out of the routine script.

The Script

Finally, the script is as follows…

#!/bin/sh -x

start() {
  ifconfig vtnet1 add 10.65.10.13/24
  hastctl role primary nfs_block
  while [ ! -e /dev/hast/nfs_block ]
  do
    sleep 1
  done
  fsck -t ufs /dev/hast/nfs_block
  mount /dev/hast/nfs_block /nfs
  service rpcbind onestart
  service mountd onestart
  service nfsd onestart
  service statd onestart
  service lockd onestart
}

stop() {
  service lockd onestop
  service statd onestop
  service nfsd onestop
  service mountd onestop
  service rpcbind onestop
  umount /nfs
  hastctl role secondary nfs_block
  ifconfig vtnet1 delete 10.65.10.13
}

status() {
  ifconfig vtnet1 | grep 10.65.10.13 && \
  service rpcbind onestatus && \
  showmount -e | grep /nfs && \
  mount | grep /nfs && \
  ls /dev/hast/nfs_block
}

residue() {
  ifconfig vtnet1 | grep 10.65.10.13 || \
  (service rpcbind onestatus && showmount -e | grep /nfs) || \
  mount | grep /nfs || \
  ls /dev/hast/nfs_block
}

clean() {
  residue
  if [ $? -ne 0 ]
  then
    exit 0
  fi
  exit 1
}

if [ "$1" == "start" ]
then
  start
elif [ "$1" == "stop" ]
then
  stop
elif [ "$1" == "status" ]
then
  status
elif [ "$1" == "clean" ]
then
  clean
fi

Testing

To test, fine the designated computer and mount the file system.  Assume the file system has been running on the host “store1”, make a manual failover to see…  The file client does not need to explicitly remount the file system; it cab be remounted automatically.

client# mount 10.65.10.13:/nfs /mnt
client# ls /mnt
.snap
client# touch /mnt/helloworld
store1# ./nfs_service.sh stop
store2# ./nfs_service.sh start
client# ls /mnt
.snap helloworld