Simple Experiment with Jails and Resource Control

While it has been good practice to have one computer doing one role, it has also been asked whether a computer can be split for multiple functions as if it were a group of computers.  One would immediately answer virtual machines, but they can be too heavy and complicated, say like having virtual machines inside the virtual machines you ordered in a public cloud.  Another would say containers and cgroups in L…, em, what?  In other operating systems, we have more battle-proven stuff like zones in Solaris and jails in FreeBSD.  Jails in FreeBSD had not been too attractive to me because they obviously lacked resource control, so that out-of-control processes could starve the others of resources.  Thankfully, since FreeBSD 9.0, rctl(8) has made the world a fairer place.

How simple is it to make a jail, try resource control, and gain confidence that it works?  The method suggested in the FreeBSD Handbook is to make a directory and reinstall FreeBSD there with the appropriate distribution files.  It is a generic method, but it could be more inspiring.  Let’s begin.  In this article, we will write a dummy program, run it in a jail, and fiddle with the resource control.

System Setup

First of all, in order to use the resource control, add the following line to the loader.conf(5) file, “/boot/loader.conf”.  The file may be missing on a fresh installation; create a new one if that is the case.  After this, reboot to make it effective.

kern.racct.enable="1"
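If you prefer to script this step, here is a minimal sketch of an idempotent append.  The helper name “ensure_line” is my own invention for illustration, not a FreeBSD tool; it adds the line only when an identical line is not already there, so running it twice does no harm.

```shell
#!/bin/sh
# ensure_line FILE LINE: append LINE to FILE unless an identical line exists.
# (Hypothetical helper for illustration; FILE is created if it is missing.)
ensure_line() {
  file=$1
  line=$2
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Usage (as root on the FreeBSD host):
#   ensure_line /boot/loader.conf 'kern.racct.enable="1"'
```

After the reboot, “sysctl kern.racct.enable” should report 1.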

The Program and Compilation

Let us do some programming.  We want a dummy program that consumes some memory and processing power.  Here is my randomly crafted C program.  It deliberately runs some nonsense loops and leaks some memory.  Once every 10 seconds, it allocates 16 MB of memory (malloc) and uses 1 MB of it (memset).  (The remaining 15 MB is lazily allocated and forgotten.)  By watching the lines printed, you get a sense of how fast the process runs.  The longer it runs, the more memory it consumes.

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(int argc, char** argv) {
  time_t then, now;
  int i, j, allocated;
  void* region;

  time(&then);
  time(&now);
  i = j = allocated = 0;
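  while (1) {
    /* Burn processor time for about 10 seconds, printing the clock as we go. */
    while(now - then < 10) {
      for (i = 0; i < (1 << 25); ++i)
        j *= i;
      time(&now);
      printf("now: %ld\n", (long)now);
    }
    then = now;
    /* Leak 16 MB and touch only the first 1 MB of it.  The result of
       malloc is deliberately left unchecked. */
    region = malloc(1 << 24);
    memset(region, j % 23, (1 << 20));
    allocated += 16;
    printf("allocated: %d\n", allocated);
  }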
  return 0;
}

Usually, the program can be compiled simply with the clang(1) command.  In order to compile for the jail in the laziest way, call clang with the “-static” option, so that the executable needs no shared libraries inside the jail.  The executable is significantly larger as a result.  Note down the absolute path of the directory so you can do the next step.

# ls
resource.c
# clang resource.c -o resource -static -Wall
# ls
resource resource.c
# ldd resource
ldd: resource: not a dynamic ELF executable
# pwd
/home/ssching

Running inside a Jail

Then, it’s time to start the jail.  The simplest way is to create a throwaway jail with jail(8), rooted at ssching’s home directory, and execute the program we compiled.

# jail -c -u ssching path=/home/ssching \
  exec.start='./resource' name=myjail

Open another terminal and issue the top(1) command.  Press the “j” key once and you will see a process with a non-zero jail ID (JID).  We see the process consuming 100% of a processor, and its memory usage keeps increasing.

 PID   JID  ...  SIZE  RES STATE C TIME    WCPU COMMAND
2648    14  ... 8288K 872K CPU1  1 0:09 100.67% resource

Stopping a Jail

Stopping a jail, attached or not, can be as simple as the “-r” option.  Upon execution, all the processes in the jail are killed.

# jail -r myjail

Processor Resource Control

With the jail running (start it again if you have just stopped it), open yet another terminal and issue the following command.

# rctl -a jail:myjail:pcpu:deny=50

The usage becomes somewhat tamed, although it fluctuates a lot from time to time.

 PID   JID ...  SIZE    RES STATE C TIME    WCPU COMMAND     
2648    14 ...  392M 25544K CPU3  3 3:52  50.83% resource

Memory Resource Control

Similarly, the virtual memory usage or the physical memory usage can be constrained as follows.  The virtual memory counts the total memory allocated (SIZE in the top display); the physical memory counts the memory actually used (RES in the top display).

# rctl -a jail:myjail:vmemoryuse:deny=240M
# rctl -a jail:myjail:memoryuse:deny=120M

If memory usage goes beyond what is allowed, the system refuses to allocate more memory.  Since the program above never checks whether malloc succeeded, it ends up writing through a null pointer and dies with signal 11 (segmentation fault).

now: 1505838082
now: 1505838083
jail: ./resource: exited on signal 11

Resource Control Rules

To check which resource control rules are applied, a bare “rctl” command will do.

# rctl 
jail:myjail:pcpu:deny=50
jail:myjail:vmemoryuse:deny=251658240
jail:myjail:memoryuse:deny=125829120
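The limits are listed in bytes rather than with the “M” suffix typed earlier.  As a quick sanity check, here is a small sketch of the conversion; “to_bytes” is only an illustrative name of mine.

```shell
#!/bin/sh
# to_bytes MEBIBYTES: convert a size in MiB to bytes, the unit rctl lists.
to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

to_bytes 240   # the vmemoryuse limit: prints 251658240
to_bytes 120   # the memoryuse limit:  prints 125829120
```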

Clearing Resource Rules

Clearing the existing rules can be as simple as using the “-r” option.  The filter can be as narrow or as broad as you like, for example:

# rctl -r jail:myjail:memoryuse
# rctl -r jail:myjail
# rctl -r jail:

Actual Resource Used

Last but not least, the resource usage of a jail can be listed.

# rctl -u jail:myjail
cputime=284
datasize=4096
stacksize=0
coredumpsize=0
memoryuse=37724160
memorylocked=0
maxproc=1
openfiles=0
vmemoryuse=595689472
pseudoterminals=0
swapuse=592003072
nthr=1
msgqqueued=0
msgqsize=0
nmsgq=0
nsem=0
nsemop=0
nshm=0
shmsize=0
wallclock=358
pcpu=38
readbps=0
writebps=0
readiops=0
writeiops=0
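The memory counters above are in bytes, which is hard on the eyes.  Below is a rough sketch of a filter that picks out the memory-related lines and prints them in whole MiB.  The function name “usage_mib” is made up for this article; it assumes nothing more than the “name=value” format shown above.

```shell
#!/bin/sh
# usage_mib: read "name=value" lines (the format printed by "rctl -u") on
# standard input and print the memory-related counters in whole MiB.
usage_mib() {
  while IFS='=' read -r name value; do
    case $name in
      memoryuse|vmemoryuse|swapuse)
        echo "$name: $(( value / 1048576 )) MiB"
        ;;
    esac
  done
}

# Usage on the host:
#   rctl -u jail:myjail | usage_mib
```

If you only want readable numbers and no filtering, rctl(8) also has a “-h” option for human-readable output.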

Other Thoughts

In fact, the resource control can also be applied per process, per user, per login class, etc.  The command is mostly the same; just replace “jail” with “process”, “user”, “loginclass”, etc.  Say a computer is being shared among a few friends.  Even if they do not want to be in jails (who wants?  Pun intended…), their resource usage can be constrained and accounted for with the same command.

In addition to the processor and memory usage, the resource control can also put limits on other items like disk reads and writes (hint: use the word throttle instead of deny), which is useful when IOPS are counted in a public cloud…

In the next article, we will explore how to put a complicated program, with dynamically linked libraries, into a jail.

Proxy Server with FreeBSD and Squid (Part 2)

Previously, I discussed how to configure a Squid proxy.  That proxy is opaque, in that the web browsers have to be configured to use it.  I continue by explaining how a proxy can be made transparent: when web browsers go to the Internet, the requests get intercepted and processed by the proxy.  Like before, I use the PF firewall and let it redirect the packets for me.

Step 1: Configure Network Gateway

In order to act as a network router, a machine needs two network interfaces, virtual or physical.  One of them connects to the external world (through another router, maybe).  The other connects to the intranet.  In PF, it is recommended to set up macros to name the external and internal interfaces.  An example rule set is as follows, where a Realtek card is used as the external interface and a Broadcom as the internal one.

extif="re0"
intif="bge0"
nat pass on $extif from $intif:network to any -> ($extif)
pass in quick from $intif:network to any
pass out quick

In order for a FreeBSD server to act as a router, it has to have the gateway variable enabled in /etc/rc.conf:

gateway_enable="YES"

Once these are configured, reload the firewall rules for a smoke test.  Good luck.

Step 2: Configure Network Clients

Pick a computer and route its network traffic through the router.  Technically, we change its default gateway.

Microsoft Windows: Control Panel > Network and Sharing Centre > Network Interfaces > Properties > TCP/IP Version 4 > Configure > Gateway

Mac OS X: System Preferences > Network > Gateway

FreeBSD: Update variable “defaultrouter” in /etc/rc.conf, then reboot

Everything should behave similarly, except that the traffic goes through the router.  Hopefully, the network link LEDs will give you some hints.  (Sorry for being lazy and not telling the proper way…)

Step 3: Packet Redirection and Squid

In PF configuration, add this line right after the NAT rule, and then reload:

rdr pass on $intif proto tcp from any to any port 80 -> ($intif) port 3129

In Squid configuration, add this line right after the original http port statement:

http_port 3129 intercept

I may explain what ‘intercept’ mode means in the next article…

Step 4: Testing

Use the client configured in step 2 to browse the web.  Like last time, there should be some pages cached.  But make sure you visit pages that are not encrypted (plain http, not https); otherwise the proxy will not take effect.

Step 5: To be Continued

In part 3 of this series, I will explain how to intercept HTTPS connections as well.

Highly Available Web Pages, Apache and PHP as an Example

Several weeks ago, I discussed how to have a highly available file storage and a highly available relational database.  With a robust supply of file system and database services, we can stack more services on top.  Today, I take Apache and PHP as an example of how to have a highly available web server.  As usual, I use FreeBSD for the purpose.

Why Apache and PHP!?

Quite a few people would argue that Apache and PHP are outdated.  I am not going to comment on this.  I hope you will appreciate how short this article is because of this choice.  The concepts you acquire from this example can be applied to your favourite platform.

Active / Active Web Servers

The example today is an active / active pair.  The two servers can serve web pages together without error.  In fact, one can have any number of hosts working together.  In contrast, in the previous example of NFS, only one server is active while the other passively receives updates and stands by to take over.

To make this happen, one needs to ensure that the intermediate execution state of one server is saved so that it can be picked up by the other servers.  In this example, we move this state to the highly available storage.  With this role separation, the intermediate data will not be lost no matter what happens to the web servers.  Another benefit is that web servers can be easily added (to handle more load), replaced (for system updates), or removed (for economy) without affecting any user sessions.

Session Data

When running a cluster of web hosting servers, one needs to take care of the session data.  HTTP servers are stateless by themselves.  Cookies (note to Europeans) are saved on the client side, so servers need not worry about them.  Session storage, provided by most web programming environments, expects the storage on the server to be consistent and persistent for each user session.

Session data is useful for storing information that should not be read or modified by the web users.  This could be some intermediate game states, some login privileges, etc.  It is therefore important to make sure such data is not lost when a user encounters another web server in the cluster.

Packages

At the time of writing, Apache 2.4 and PHP 7.1 are the newest.

In order to run PHP in the laziest way, install the “mod_php” package.  In addition, install some common PHP modules.  There is a moderate collection called “php71-extensions”.  I would also install “php71-mysqli” and “php71-gd”.  They are for database connections and image processing respectively.

# pkg install mod_php71 php71-extensions php71-mysqli php71-gd

What about Apache?  It is installed automatically as a dependency when you request installing “mod_php”.

Shared Directories

To build on top of the previous example, I am mounting the NFS share prepared previously on /mnt/nfs.  On the shell, every time you want to mount:

# mkdir /mnt/nfs
# mount 10.65.10.13:/nfs /mnt/nfs
#

Alternatively, in the “/etc/fstab”:

# mkdir /mnt/nfs
# cat >> /etc/fstab << EOF
10.65.10.13:/nfs    /mnt/nfs    nfs    rw    0    0
EOF
# mount /mnt/nfs
#

Configuring the Apache

Modify Apache so that the web page and script locations are in the NFS share.  The file to be modified is “/usr/local/etc/apache24/httpd.conf”.  There are originally two directories, for normal web pages and for CGI pages, in “/usr/local/www/apache24”.  We are moving them to “/mnt/nfs”.

# sed -ibak 's|/usr/local/www/apache24|/mnt/nfs|' \
  /usr/local/etc/apache24/httpd.conf
# cp -a /usr/local/www/apache24/cgi-bin /mnt/nfs/
# cp -a /usr/local/www/apache24/data /mnt/nfs/
# cat >> /etc/rc.conf << EOF
apache24_enable="YES"
EOF
#

Also, update “/usr/local/etc/apache24/httpd.conf” so that it hands PHP files to the PHP module.  Modify the following parts manually:

<IfModule dir_module>
    DirectoryIndex index.html index.php
</IfModule>

<FilesMatch "\.php$">
    SetHandler application/x-httpd-php
</FilesMatch>
<FilesMatch "\.phps$">
    SetHandler application/x-httpd-php-source
</FilesMatch>

Configuring the PHP

PHP requires only copying the default configuration, and then modifying the session path.  One can also modify the upload path similarly, but I do not think it is necessary.

# cd /usr/local/etc/
# cp php.ini-production php.ini
# vi php.ini

The line to be added is as follows.  Make sure the directory exists and is writable by the web server user:

; http://php.net/session.save-path
;session.save_path = "/tmp"
session.save_path = "/mnt/nfs/tmp"

Example Code

For the changes to take effect, restart Apache.  Here is a simple script that prints a counter every time it is loaded.  Write it to the file “/mnt/nfs/data/counter.php”:

<?php
session_start();
if (isset($_SESSION['counter']))
  $_SESSION['counter'] += 1;
else
  $_SESSION['counter'] = 1;
print($_SESSION['counter']);
session_write_close();
?>

Testing

In order to test the service, we first turn on the web servers.  Then, we point a domain name to the IP addresses of the servers alternately.  The counter code above is executed repeatedly as the mapping changes.  If successful, the counter keeps increasing.

The laziest way to change the domain name mapping is, of course, to edit the host table on the machine where the web browser runs.  For BSD and BSD-like systems, edit “/etc/hosts”.  For Windows, edit “C:\Windows\System32\Drivers\etc\hosts”.
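If you get tired of editing the file by hand between reloads, the switch can be scripted.  The following is only a sketch: “point_host” is a helper name I made up, and the host name and addresses in the usage lines are placeholders, not values from my setup.

```shell
#!/bin/sh
# point_host FILE NAME IP: make NAME resolve to IP in a hosts-format FILE.
# Any existing line whose second field is NAME is dropped first.
# (Hypothetical helper; it only understands one name per line.)
point_host() {
  file=$1
  name=$2
  ip=$3
  tmp=$(mktemp) || return 1
  awk -v n="$name" '$2 != n' "$file" > "$tmp" &&
    printf '%s\t%s\n' "$ip" "$name" >> "$tmp" &&
    cat "$tmp" > "$file"
  result=$?
  rm -f "$tmp"
  return $result
}

# Usage (as root), switching between two web servers:
#   point_host /etc/hosts www.example.test 10.65.10.31
#   point_host /etc/hosts www.example.test 10.65.10.32
```

Reload the counter page after each switch; the number should keep going up.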

Upcoming Steps

Once there are multiple web servers ready, one can consider having a load balancer or a content distribution network.  A load balancer service can be found at some value-added cloud service providers.  (Or else, you can set one up yourself.)  A content distribution network can be found anywhere around the globe.  Putting aside their fundamental differences, they both accept multiple web server addresses and route network traffic accordingly.

Troubleshoot

Getting a blank page from PHP:  If you follow my instructions, you have configured PHP in production mode.  Error messages are not displayed but saved to the Apache log, which is “/var/log/httpd-error.log” in our context.  Read it and you will get an idea.

Function “session_start” undefined:  Make sure you have installed the package “php71-session” and restarted Apache.  It should be installed as a dependency when you install the package “php71-extensions”.

Appending Distribution Files after Installing FreeBSD

Previously, it was discussed how to install FreeBSD with the installer.  In Question 4, the installer lets administrators select which distributions to install: 32-bit compatibility libraries, source code, debug symbols, etc.

Sometimes, maybe due to a mistaken omission, or maybe due to a new purpose, more distribution files have to be added.  In the good old days of FreeBSD 4.x, I could easily run “/stand/sysinstall” again and let it reconfigure the system.  The new installer since 9.x is unfamiliar to me, so I have to do it myself.

Thankfully, it is much easier than one might think.

Downloading the Files

Downloading the distribution files is relatively simple with FTP.  There is an FTP client that comes with the default minimal FreeBSD installation.  From there, we can download the distribution files.  For simplicity, I have skipped the directory listing messages.  The filenames will be self-explanatory as you encounter them.

# ftp -a ftp.freebsd.org
Connected to ftp.geo.freebsd.org.
(Output truncated)
220 This is ftp.geo.freebsd.org - hosted at ISC.org
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd pub/FreeBSD/releases
ftp> ls
150 Here comes the directory listing.
(Output truncated)
226 Directory send OK.
ftp> cd amd64
ftp> ls
150 Here comes the directory listing.
(Output truncated)
226 Directory send OK.
ftp> cd 11.0-RELEASE
ftp> ls
150 Here comes the directory listing.
(Output truncated)
226 Directory send OK.
ftp> mget kernel-dbg.txz base-dbg.txz
mget kernel-dbg.txz [anpqy?]? a
Prompting off for duration of mget
229 Entering Extended Passive Mode
150 Opening BINARY mode data connection for kernel-dbg.txz
226 Transfer complete
229 Entering Extended Passive Mode for base-dbg.txz
226 Transfer complete
ftp> exit
221 Goodbye

Installing the Files

If you want to preview what files are inside, you can use “tar tf” command directly, such as…

# tar tf kernel-dbg.txz
# tar tf base-dbg.txz

Installing the files is a simple decompression of the xz-compressed tarballs (hence the “.txz” suffix) into the root directory.  For example…

# tar xf kernel-dbg.txz -C /
# tar xf base-dbg.txz -C /

Here, the “x” stands for extract, “f” stands for filename, and “C” stands for changing to a given directory (which is the root in our case).  The tar(1) in FreeBSD detects the compression format by itself when extracting, so neither “j” (Bzip2) nor “J” (xz) is needed.
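If you are nervous about unpacking straight onto a live root, one option is to unpack into a scratch directory first and have a look.  Here is a rough sketch; “unpack_dist” and the scratch path in the usage line are names I made up, and the compression is left for tar to work out when reading.

```shell
#!/bin/sh
# unpack_dist ARCHIVE DESTDIR: check that the archive can be listed, then
# extract it under DESTDIR (created if needed).
# (Hypothetical helper for illustration.)
unpack_dist() {
  archive=$1
  dest=$2
  tar tf "$archive" > /dev/null || return 1   # refuse unreadable archives
  mkdir -p "$dest" &&
    tar xf "$archive" -C "$dest"
}

# Usage:
#   unpack_dist base-dbg.txz /tmp/preview
```

Once you are happy with what you see, repeat the extraction with the root directory as the destination.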

Updating FreeBSD

It is likely the system has been patched since the “release” installation.  To make sure the files you installed match your updated system, consider running the FreeBSD update once.  Please note that the commands have to be run on an interactive terminal.  Make backups if the system holds files that you cannot lose.

# freebsd-update fetch
# freebsd-update install

Installing without Installer?

Answering the questions of the FreeBSD installer can be boring.  Technically, installing a minimal FreeBSD can be as simple as:

  1. Boot a temporary operating system environment (like live CD)
  2. Partition the drives and install the boot loader (like Question 8 of here)
  3. Download and decompress the distribution files “kernel.txz” and “base.txz”
  4. Configure the essential config files, “/etc/fstab” and “/etc/rc.conf”
  5. Remove any temporary boot media and reboot

Will it work?  Well…

Highly Available MySQL Server

Previously, I discussed how to set up a highly available block device and also a highly available file system.  In this article, I further demonstrate how to set up a highly available MySQL database service.

Installing the Packages

As usual, one will need to install the package on two hosts and it can be easily done by:

# pkg install mysql57-server

I know there are alternatives.  Forgive my laziness.

Running for the First Time

We start MySQL once so that it generates its file structures.  Try to log in, and then press Ctrl-D to exit.

# service mysql-server onestart
# cat /root/.mysql_secret
# mysql -u root -p
Password: **********
root@localhost [(none)]> ^D
# service mysql-server onestop

It is then discovered (through an educated guess) that some directories are created in the “/var/db” directory, namely “mysql”, “mysql_tmpdir”, and “mysql_secure”.  Suppose you already have “/db” mounted (as in the previous article); move them there and make the replacement symbolic links.

# mv /var/db/mysql /db
# mv /var/db/mysql_tmpdir /db
# mv /var/db/mysql_secure /db
# ln -s /db/mysql /var/db/
# ln -s /db/mysql_tmpdir /var/db/
# ln -s /db/mysql_secure /var/db/
# ls -ld /var/db/mysql*
lrwxr-xr-x  1 root  wheel   9 Apr 19 20:37 /var/db/mysql -> /db/mysql
lrwxr-xr-x  1 root  wheel  16 Apr 19 20:37 /var/db/mysql_secure -> /db/mysql_secure
lrwxr-xr-x  1 root  wheel  16 Apr 19 20:37 /var/db/mysql_tmpdir -> /db/mysql_tmpdir

Some would ask why not change the configuration to the new paths instead.  I find it mostly a matter of taste.  If you want to make life easier for those who have memorised the default paths, do make the symbolic links.

Configurations

You will want to modify the configuration file “/usr/local/etc/mysql/my.cnf”.  For your reference, there is a sample file “my.cnf.sample”.  At minimum, you will need to modify the bind address (default 127.0.0.1) so that the service is available not just locally, but to the other computers in the same intranet.

The Script

The script for starting and stopping the MySQL server is simpler than the NFS one and is as follows.  Like last time, automatic switching is skipped due to my conflict of interest.  You will need a mechanism to call “start” and “stop” properly.

#!/bin/sh -x

# Take over the service address and the HAST device, then start MySQL.
start() {
  ifconfig vtnet1 add 10.65.10.14/24
  hastctl role primary db_block
  # The device node appears only once this host holds the primary role.
  while [ ! -e /dev/hast/db_block ]
  do
    sleep 1
  done
  fsck -t ufs /dev/hast/db_block
  mount /dev/hast/db_block /db
  service mysql-server onestart
}

# Release everything in the reverse order.
stop() {
  service mysql-server onestop
  umount /db
  hastctl role secondary db_block
  ifconfig vtnet1 delete 10.65.10.14
}

# Succeeds only when every checked component is present.
status() {
  ifconfig vtnet1 | grep 10.65.10.14 && \
  service mysql-server onestatus && \
  ls /dev/hast/db_block
}

# Succeeds when at least one component is still present.
residue() {
  ifconfig vtnet1 | grep 10.65.10.14 || \
  service mysql-server onestatus || \
  mount | grep /db || \
  ls /dev/hast/db_block
}

# Exits with 0 only when nothing is left behind.
clean() {
  residue
  if [ $? -ne 0 ]
  then
    exit 0
  fi
  exit 1
}

if [ "$1" = "start" ]
then
  start
elif [ "$1" = "stop" ]
then
  stop
elif [ "$1" = "status" ]
then
  status
elif [ "$1" = "clean" ]
then
  clean
fi

Troubleshoot

If MySQL fails to start, you can confirm that it is not running with the command “service mysql-server onestatus”.  There are also log files located in the MySQL data directory; in our context, it is “/db/mysql/<hostname>.err”.  Please note that the end of the log is most likely a graceful shutdown.  You will need to scroll upwards for the actual reason why the startup failed.

System Performance with FreeBSD (Minecraft Server as Example)

Quite some time ago, we discussed how to compile Minecraft and get it running on FreeBSD.  In this article, we take the server as an example of how we can monitor system performance.

Minecraft Memory Usage

Minecraft is a Java program.  Java programs generally consume more memory than their counterparts written in unmanaged languages.  Thankfully, Java programs run inside their own sandboxes and have their memory usage allocated and constrained.  In the previous article, we set the heap to start at 1024 megabytes (-Xms) and to expand up to 1024 megabytes only (-Xmx):

java -Xmx1024M -Xms1024M -jar spigot*.jar

It is important not to have Minecraft overrun the system memory.  As folklore has it, the heap of a Java program running on a dedicated computer should not be higher than 60% of total memory.  For example, my virtual machine has 2048 megabytes of memory and 60% of it is about 1200 megabytes.  I deducted a further 200 megabytes as a safety margin.
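The same arithmetic can be jotted down in the shell.  This is just a sketch of the folklore rule above, with “heap_mb” being a name of my own; the 60% and the 200 megabytes are rules of thumb, not hard limits.

```shell
#!/bin/sh
# heap_mb TOTAL_MB MARGIN_MB: 60% of total memory, minus a safety margin.
heap_mb() {
  echo $(( $1 * 60 / 100 - $2 ))
}

heap_mb 2048 200   # prints 1028, close to the 1024 passed to java above
```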

General Process Monitoring

FreeBSD provides the top(1) utility to check for system saturation and utilisation.  Generally speaking, a system is considered saturated if the number of threads ready to run is higher than the number of processor cores, and considered fully utilised if the utilisation numbers are near 100%.

[Screenshot: top(1) display, 6 June 2017]

In the top part, there are three decimal numbers, representing the load averages over 1, 5, and 15 minutes.  A load average is the average number of threads ready to run and is regarded as the saturation of the processors.  In the third and fourth rows, the overall processor and memory utilisation ratios / rates are shown.

In the picture, the system is not yet saturated, as on average only about 0.6 threads are ready to run.  It is under some computation load, with around 20% of the processing power and 80% of the memory utilised.  This is a healthy situation: the system is being used, with some slack to handle possible usage spikes.
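The saturation rule is easy to turn into a check.  The sketch below compares a load average against a core count, both passed in by hand; “is_saturated” is a name I made up, and the core count of 2 in the usage line is an assumption for illustration (on FreeBSD, “sysctl -n hw.ncpu” tells you the real number).

```shell
#!/bin/sh
# is_saturated LOAD NCPU: succeed (exit 0) when the load average is higher
# than the number of processor cores, fail otherwise.
is_saturated() {
  awk -v load="$1" -v ncpu="$2" 'BEGIN { exit !(load + 0 > ncpu + 0) }'
}

# Usage, with the 1-minute load average read off top(1):
if is_saturated 0.6 2; then
  echo "saturated"
else
  echo "not saturated"
fi
```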

In the bottom part, we have a detailed breakdown per process.  The fields “TIME” and “WCPU” represent the total processor time and the current share of the processor each process is using.  The actual amount of memory used is shown in the “RES” column.

In the picture, we see the Java process using 44% of the current processing power, and it has accumulated 295 hours.  Although the Java process is instructed to use only 1024 megabytes of memory, it eventually consumes 1368 megabytes.  This echoes why the folklore recommends giving only 60% of system memory to a Java virtual machine.

System Input / Output Monitoring

FreeBSD provides the systat(1) utility for system usage statistics.  Since general process monitoring can be done with the top(1) command above, systat is mostly useful for detailed input and output monitoring.  By default, it shows a “pigs” screen that refreshes every 10 seconds.  You can choose what you want to see on the command line, for example, to see network interface usage every second:

systat -ifstat 1

[Screenshot: systat -ifstat display, 6 June 2017]

The screen works like vi(1).  You can press the colon sign (:) to enter a command.  The first command you want to try is “help”.  It immediately tells you the list of available pages you can switch to.  You can then use the colon sign again and enter the page you want.

I would tell you vmstat is my favourite page.  I learned it on the very first days I was taught FreeBSD.  It shows a lot of comprehensive information from system utilisation to interrupts and disk accesses.

systat -vmstat 1

[Screenshot: systat -vmstat display, 6 June 2017]

On the top, we see the system saturation and utilisation as usual.  Immediately under it, we see a detailed breakdown of system events like context switches (Csw), traps (Trp), system calls (Sys), etc.  The system is now handling thousands of context switches per second.  It would be no good if it were a system for high performance computing, but for our context of gaming with network messages, it is absolutely normal.

Further below, we see an ASCII art of the system utilisation broken down into system (=), interrupts (+), user space applications (>) and niced user applications (-).  The system is using a mild amount of processing power and most of it is for the user space, which is a good thing.

The second section from the bottom shows the name caches of the virtual file system.  For a gaming system that uses few files, you can expect the hit rate to be near 100%.  Otherwise you may want to pay more attention to the name caches.  In the picture, we see the system rarely searches for a file, and those requests are handled by the cache perfectly.

The bottom section shows the disk utilisation.  The utilisation and bandwidth of each of the virtual disks are shown.  In the picture, we see the system rarely accesses the files.

The rightmost column shows the interrupt statistics.  The “timer” interrupts happen almost all the time in order to hint the operating system to context switch and update the system clocks.  It used to be 100 or 1000 per second per processor core; thankfully, with the advent of more advanced system clocks, systems no longer need to tick as frequently as before.  In the picture, the network card (virtio-network) requires quite some interrupt handling.  As long as the network card interrupts do not reach numbers like 50000 or 100000, they are most likely normal.

The pages available in the tool, as of today, are:

  • pigs: shows the processes which consume the most processing power
  • vmstat: (as discussed above)
  • swap: shows the system swap situation
  • zarc: shows the ZFS adaptive replacement cache (ARC) situation
  • iostat: raw disk input and output statistics
  • netstat: network socket statistics, such as buffered bytes for each of the connections
  • sctp: Stream Control Transmission Protocol statistics
  • tcp: Transmission Control Protocol statistics
  • ip: Internet Protocol statistics
  • ip6: Internet Protocol statistics for IPv6
  • icmp: Internet Control Message Protocol statistics (ping and the like)
  • icmp6: Internet Control Message Protocol statistics for IPv6
  • ifstat: raw network interface utilisation statistics

Conclusion

In this article, we went through some performance monitoring tools that come with FreeBSD.  The general process information can be listed with the top(1) command, where you can understand the system saturation and utilisation, and also list the resource-consuming processes.  More detailed resource utilisation figures, like network, disks, and hardware interrupts, can be found with the systat(1) command.  If in doubt, the “vmstat” page can be a great starting point to look for congested system resources.

Highly Available Network File System

In the previous article, we discussed the way to create a highly available block device by replication.  We continue and attempt making a network file system (NFS) on top of it.  We first discuss the procedures to start and stop the service.  Then we have the script…  Some parts are deliberately missing due to my conflict of interest.

NFS Configuration

Since it is not our goal here, we only do minimal NFS configuration in this example.  In short, the exports(5) file “/etc/exports” is modified as follows.  This implies the directory “/nfs” is shared with the given two IP subnets.

/nfs -network=10.65.10.0/24
/nfs -network=127.0.0.0/24

Unlike the previous setting, we do not use the “/etc/rc.conf” file to start the service.  This is because we like to control when the service is started, instead of blindly starting it just after boot.  In FreeBSD, services that are not enabled in rc.conf can still be started with the “onestart” command.

Firewall Configuration

Configuring NFS for a tight firewall is tricky, because it uses random ports.  For convenience, a simple IP address-based whitelist can be implemented.  In this example, we have the server IP 10.65.10.13 (see later), and the client IP 10.65.10.21.  If you simply do not have a firewall, skip this part.  On the server side, the PF can be configured with:

pass in quick on vtnet1 from 10.65.10.21 to 10.65.10.13 keep state

On the client side, the PF can be configured with:

pass in quick on vtnet1 from 10.65.10.13 to 10.65.10.21 keep state

Starting the Service

When we start the service, we want the following to happen:

  1. Acquire the IP address, say 10.65.10.13, regardless of which machine it is running on.
  2. Tell the HAST subsystem to take the primary role.
  3. Wait for the HAST device to be available.  While the device is in the secondary role, the device file in “/dev/hast” does not appear, so we sleep for a while and check again.
  4. Run the file system check just in case the file system was corrupted in the last unmount.
  5. Mount the file system for use (in this example, “/nfs”).
  6. Start the NFS-related services in order: the remote procedure call binding daemon, the mount daemon, the network file system daemon, the status daemon, and the lock daemon.
  7. Once step 6 completes, the service is available to the given clients as instructed to the NFS and allowed by the firewall.
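A plain wait in step 3 will sleep forever if the device never shows up.  As a variation, here is a sketch of a wait that gives up after a while.  The helper name “wait_for_path” and the 30 seconds in the usage line are my own assumptions, not part of the script in this article.

```shell
#!/bin/sh
# wait_for_path PATH TIMEOUT: poll once per second until PATH exists.
# Returns 0 when it appears, 1 when TIMEOUT seconds pass without it.
wait_for_path() {
  path=$1
  remaining=$2
  while [ ! -e "$path" ]; do
    if [ "$remaining" -le 0 ]; then
      return 1
    fi
    sleep 1
    remaining=$(( remaining - 1 ))
  done
  return 0
}

# Usage in a start procedure:
#   wait_for_path /dev/hast/nfs_block 30 || exit 1
```

Giving up lets whatever calls the start procedure notice the failure and try elsewhere, instead of hanging.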

For the impatient, one can jump to the second last section for the actual source code.

Stopping the Service

Stopping the service is the reverse of starting, except some steps can be less serious.

  1. Stop the NFS-related services in order: the lock daemon, the status daemon, the network file system daemon, the mount daemon, and finally the remote procedure call binding daemon.
  2. Unmount the file system.
  3. Put the HAST device in the secondary role.
  4. Release the IP address so neighbours can reuse it.

Also, one can jump to the second last section for the actual source code.

Service Check Script

There are two types of checking.  The first one ensures all the components (like the IP address, the mount point, the services, etc) are present and valid.  The procedure returns success (zero) only when the components are all turned on.  Whenever a component is missing, it will be reported as a failure (non-zero return code).

The second one ensures all the components are simply turned off, so that the service can be started elsewhere.  The procedure returns success (zero) only when all the components are turned off.  Whenever a component is present, it will be reported as a failure (non-zero return code).
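The two kinds of checking can be sketched generically as follows.  The function names “all_up” and “all_down” are made up for this illustration, and each check is assumed to be a simple command without pipes.

```shell
#!/bin/sh
# all_up CHECK...: succeed only if every CHECK command succeeds.
# Each CHECK is one string holding a simple command, split on spaces.
all_up() {
  for check in "$@"; do
    $check > /dev/null 2>&1 || return 1
  done
  return 0
}

# all_down CHECK...: succeed only if every CHECK command fails.
all_down() {
  for check in "$@"; do
    if $check > /dev/null 2>&1; then
      return 1
    fi
  done
  return 0
}

# Usage:
#   all_up   "service rpcbind onestatus" "test -e /dev/hast/nfs_block"
#   all_down "service rpcbind onestatus" "test -e /dev/hast/nfs_block"
```

Note that the two are not opposites of each other: a half-started service fails both checks, which is exactly the situation that needs attention.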

What is Missing

Once we master how to start and stop the service on one node, we need a mechanism to automatically start and stop the service as appropriate.  In particular, it is of the utmost importance not to run the service concurrently on two hosts, as this may damage the file system and confuse the TCP/IP network.  This part should be done outside the routine script.

The Script

Finally, the script is as follows…

#!/bin/sh -x

start() {
  ifconfig vtnet1 add 10.65.10.13/24
  hastctl role primary nfs_block
  while [ ! -e /dev/hast/nfs_block ]
  do
    sleep 1
  done
  fsck -t ufs /dev/hast/nfs_block
  mount /dev/hast/nfs_block /nfs
  service rpcbind onestart
  service mountd onestart
  service nfsd onestart
  service statd onestart
  service lockd onestart
}

stop() {
  service lockd onestop
  service statd onestop
  service nfsd onestop
  service mountd onestop
  service rpcbind onestop
  umount /nfs
  hastctl role secondary nfs_block
  ifconfig vtnet1 delete 10.65.10.13
}

status() {
  ifconfig vtnet1 | grep 10.65.10.13 && \
  service rpcbind onestatus && \
  showmount -e | grep /nfs && \
  mount | grep /nfs && \
  ls /dev/hast/nfs_block
}

residue() {
  ifconfig vtnet1 | grep 10.65.10.13 || \
  (service rpcbind onestatus && showmount -e | grep /nfs) || \
  mount | grep /nfs || \
  ls /dev/hast/nfs_block
}

clean() {
  residue
  if [ $? -ne 0 ]
  then
    exit 0
  fi
  exit 1
}

if [ "$1" = "start" ]
then
  start
elif [ "$1" = "stop" ]
then
  stop
elif [ "$1" = "status" ]
then
  status
elif [ "$1" = "clean" ]
then
  clean
fi

Testing

To test, find the designated client computer and mount the file system.  Assuming the service has been running on the host “store1”, make a manual failover and see what happens.  The client does not need to explicitly remount the file system; it can be remounted automatically.

client# mount 10.65.10.13:/nfs /mnt
client# ls /mnt
.snap
client# touch /mnt/helloworld
store1# ./nfs_service.sh stop
store2# ./nfs_service.sh start
client# ls /mnt
.snap helloworld