Low Bandwidth Networking

Last update: 2006-11-20 by David Pierce <chikuru@mali.geekcorps.org>

The Need for a Low Bandwidth Connection

Geekcorps' Low Bandwidth Networking project provides Internet connections to radio stations in the remote villages of Bourem Inaly, near Timbuktu, and Bourem Foghas, near Gao. The radio stations use their connections for news, weather, and email.

Currently, the only way to provide connectivity to these villages is via satellite. One option is a VSAT terminal, which allows unlimited monthly usage at rates of 128 kilobits per second or better, but costs about $4000 to set up and over $300 per month.

The Low Bandwidth Networking project instead uses Inmarsat's R-BGAN satellite modems, which cost $45 for activation, $36 per month plus $6 per megabyte of network traffic. The modems themselves can be found for around $400. The challenge is to provide useful Internet connectivity while limiting network usage to just 200 kilobytes per day for a monthly bill of $72.
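The $72 figure follows from the per-megabyte rate. The short Ruby sketch below reproduces it. It assumes a 30-day month and 1 MB = 1000 KB; with 1024 KB per MB the bill comes to about $71.

```ruby
# Monthly R-BGAN bill for a given daily traffic budget.
# Assumptions: a 30-day month and 1 MB = 1000 KB.
MONTHLY_FEE_USD  = 36
PER_MEGABYTE_USD = 6
DAYS_PER_MONTH   = 30

def monthly_bill(kilobytes_per_day)
  megabytes = kilobytes_per_day * DAYS_PER_MONTH / 1000.0
  MONTHLY_FEE_USD + PER_MEGABYTE_USD * megabytes
end

puts monthly_bill(200)   # the project's 200 KB/day target; prints 72.0
```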

200 kilobytes does not go very far. Simply accessing the Yahoo! homepage can use as much as 250 kilobytes. We have used the following strategies to make the connection both useful and affordable:

  1. Cache web pages on the client computer at the radio station.
  2. Force the client computer to communicate only with a central server.
  3. Have the server strip images and ads from web pages.
  4. Have the server email news summaries to the client.
  5. Transfer email between client and server just once per day.
  6. Strip attachments from email messages.
  7. Compress all communication between client and server.
  8. Disable web access when the daily quota is exceeded.
  9. Provide continuous feedback of daily network usage to help the user learn to efficiently use the Internet.

These strategies have been deployed at the villages on low-power VIA Desert PC computers designed by Geekcorps. The server, located in the U.S., will be the same computer that currently manages email for the Cybertigi project.

This document describes, in fairly exhaustive detail, how we configured the server and clients. Our intention is to make it possible for others to use these techniques and expand on them. The implementation of the project is based on the Requirements document. This project has built on the work of many previous Geekcorps staff and volunteers; please see the Acknowledgements section for a list.

Low Bandwidth Networking Project Components

Server-side Components

Squid Web Cache

A squid (http://www.squid-cache.org/) web cache handles HTTP requests from the client computers. When a request is received, it is either processed locally, if the request is to the local server, or is redirected to the loband filter.

Configuration file /etc/squid/squid.conf:

redirect_program /etc/squid/loband_redirector.rb
This tells squid to use the loband_redirector.rb script to rewrite URLs. This script causes requests to be processed by www.loband.org, which removes images and advertisements.
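The redirector script itself is not reproduced here. As a rough sketch of what such a script involves: a Squid 2.x redirector reads one request per line on standard input (the URL first, then the client address, ident, and method) and writes back one line per request, either a replacement URL or an empty line to leave the request unchanged. The LOBAND_PREFIX request format below is hypothetical; consult loband.org for the real one.

```ruby
require 'uri'

# Hosts whose URLs pass through untouched: the server itself, and loband,
# so that links inside already-filtered pages are not wrapped twice.
PASS_THROUGH = ['gcmserver.geekcorps.org', 'www.loband.org']

# HYPOTHETICAL: the actual loband.org request format is not given here.
LOBAND_PREFIX = 'http://www.loband.org/loband/filter.jsp?url='

# Answer for one redirector input line ("URL ip/fqdn ident method"):
# a replacement URL, or '' to leave the request unchanged.
def rewrite(line)
  url = line.split(' ').first.to_s
  host = begin
    URI.parse(url).host
  rescue URI::InvalidURIError
    nil
  end
  return '' if host.nil? || PASS_THROUGH.include?(host)
  LOBAND_PREFIX + URI.encode_www_form_component(url)
end

# Main loop. Squid waits for each answer, so output must not be buffered.
# The installed script would end with: serve(STDIN, STDOUT)
def serve(input, output)
  output.sync = true
  input.each_line { |line| output.puts(rewrite(line)) }
end
```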

#http_access deny all
This line is commented out so that access is permitted from any address. This way, we don't have to know the IP addresses of the clients.

visible_hostname gcmserver.geekcorps.org
This is the hostname that will appear to the customer in error messages.

Loband Filter

Currently, squid redirects web requests to the loband (http://www.loband.org/) filter, which fetches the requested web page, removes the ads and pictures, and returns it to the squid cache, which both caches it and sends it to the client computer.

For faster response and more control, we could install loband on the server. A drawback would be that if loband.org makes improvements to their implementation, we would not benefit.

The Postfix Mail Server

The server computer, gcmserver.geekcorps.org, is registered as the mail server for the domains used by the radio stations. For example, domain gcmclient.radio.org.ml is registered (using zoneedit.com) as a domain associated with the same IP address as gcmserver.geekcorps.org.

The mail server is implemented with the Postfix (http://www.postfix.org/) mail transfer agent. Postfix receives mail for the clients on port 25, runs each message through SpamAssassin (http://spamassassin.apache.org/), then passes it to a custom mail filter script. Postfix receives mail from the clients via UUCP and forwards it to the recipients.

Configuration file /etc/postfix/main.cf:

See the manual page for postconf(5).

myhostname = gcmserver.geekcorps.org
The fully qualified host name of the server.

relay_domains = mali.geekcorps.org, radio.org.ml, .radio.org.ml, etc
The domains that postfix serves.

Configuration file /etc/postfix/master.cf:

See the manual page for master(5). After changing this file, run "postfix reload" to have the changes take effect.

smtp inet n - n - - smtpd -o content_filter=spamchk:dummy
This tells postfix to run the smtp daemon with the spamchk filter, defined later in the file.

uucp unix - n n - - pipe flags=Fqhu user=uucp argv=/usr/bin/bzip-mail-filter.rb $sender $nexthop!rmail ($recipient)
This tells postfix to use the custom mail filter bzip-mail-filter.rb instead of uux to spool the messages. The mail filter will make the call to uux.

spamchk unix - n n - 10 pipe flags=Rq user=filter argv=/usr/local/bin/spamchk -f ${sender} -- ${recipient}
This defines the spamchk filter.

Script /usr/local/bin/spamchk:

This filter, called from postfix, runs each email message through /usr/bin/spamassassin, which adds an "X-Spam-Level" entry to the message's header. The spamchk filter reads this header and uses it to decide whether to forward the message to /usr/sbin/sendmail or move it to the /var/spool/spams folder.
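The decision spamchk makes can be sketched in Ruby as follows. SpamAssassin writes one asterisk in X-Spam-Level per point of spam score, so counting asterisks recovers the score. The threshold of 5 is an assumption (it matches SpamAssassin's default required score), and folded header lines are not handled.

```ruby
# Assumed threshold: SpamAssassin's default required score of 5.
SPAM_STARS = 5

# Number of '*' characters in the X-Spam-Level header (0 if absent).
# Only the header (everything before the first blank line) is searched.
def spam_level(message)
  header = message.split(/\r?\n\r?\n/, 2).first.to_s
  line = header.each_line.find { |l| l =~ /\AX-Spam-Level:/i }
  line ? line.split(':', 2).last.count('*') : 0
end

# Where spamchk would route the message.
def destination(message)
  spam_level(message) >= SPAM_STARS ? '/var/spool/spams' : '/usr/sbin/sendmail'
end
```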

Todo: add a cron job to remove old spam from /var/spool/spams.

Script /usr/bin/bzip-mail-filter.rb:

This filter strips attachments from each email message, compresses the body of the message using bzip, encodes the message with printable characters using /usr/bin/enc64.rb, then forwards the result to /usr/bin/uux, which writes it to the UUCP spool so that it can be picked up the next time the client connects with a uucico call.
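The compress-and-encode step, and the reverse step performed by the client-side filter, can be sketched as below. This is not the project's script: Ruby's standard library has no bzip2 binding, so zlib's deflate stands in for bzip2, base64 via Array#pack('m') stands in for enc64.rb, and attachment stripping is omitted. The uux argument list mirrors the uucp entry in Postfix's stock master.cf.

```ruby
require 'zlib'

# Split a raw message into header and body at the first blank line.
def split_message(raw)
  header, body = raw.split(/\r?\n\r?\n/, 2)
  [header.to_s, body.to_s]
end

# Server side: compress the body (zlib standing in for bzip2) and encode
# the result as printable base64 text, 60 characters per line.
def pack_body(body)
  [Zlib::Deflate.deflate(body, Zlib::BEST_COMPRESSION)].pack('m')
end

# Client side: decode and decompress.
def unpack_body(text)
  Zlib::Inflate.inflate(text.unpack('m').first)
end

# uux command line as in Postfix's stock master.cf uucp entry
# (-r: queue only, don't start uucico; -a: address for status reports;
# '-': read the message from standard input).
def uux_argv(sender, nexthop, recipient)
  ['uux', '-r', '-n', '-z', "-a#{sender}", '-', "#{nexthop}!rmail", "(#{recipient})"]
end
```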

Configuration file /etc/postfix/transport:

This file tells postfix, for each email domain, how to process emails for that domain. See the manual page for transport(5). After making changes to this file, it is necessary to run "postmap /etc/postfix/transport" to update the transport.db file, used by postfix.

gcmclient.radio.org.ml uucp:gcmclient.radio.org.ml
This tells postfix to use the uucp transport defined in the master.cf configuration file (which passes each message to bzip-mail-filter.rb) for messages sent to gcmclient.radio.org.ml.

Configuration file /etc/postfix/virtual:

See the manual page for virtual(5).

/(gcmclient)@radio.org.ml/ radio@${1}.radio.org.ml
This tells postfix to forward mail for gcmclient@radio.org.ml to radio@gcmclient.radio.org.ml, the default user account configured on the client computer.

/(feeds)@radio.org.ml/ radio@gcmclient.radio.org.ml,etc.
This defines the list of recipients for news feeds sent automatically by the news-aggregator script.

References
http://www.postfix.org/faq.html#uucp-only
http://www.postfix.org/faq.html#internet-uucp

UUCP

UUCP is used to send mail to and receive mail from the client computers. Mail is transferred in both directions when the client calls uucico to connect to the server on port 540.

Configuration file /etc/uucp/config:

nodename gcmserver.geekcorps.org
This is the UUCP system name for the computer.

Configuration file /etc/uucp/passwd:

gcmclient <password>
This password must be the same as the one in file gcmclient:/etc/uucp/call.

Configuration file /etc/uucp/port:

port TCP
type tcp

This defines the "TCP" port.

Configuration file /etc/uucp/sys:

system gcmclient.radio.org.ml
protocol igt
commands rnews rmail
time any

This tells UUCP about the client: which protocols to use (i, g, and t), which commands the client may execute on this system, and at what times to accept connections.

References

http://www.faqs.org/docs/Linux-HOWTO/UUCP-HOWTO.html

The Apache Web Server

To understand how the news aggregator works, it's necessary to first understand how the web server on gcmserver is configured.

Configuration file /etc/apache/httpd.conf:

This file defines a virtual host for gcmserver.geekcorps.org whose document root is /var/www/html/mali.geekcorps.org/htdocs/. This is the same root that is used for mali.geekcorps.org.

Configuration file /var/www/html/mali.geekcorps.org/htdocs/.htaccess:

php_flag zlib.output_compression on
php_value zlib.output_compression_level 2

This enables zlib compression, at level 2, of the output generated by PHP scripts (these are PHP settings, so static files are not affected).

The News Aggregator

Configuration file /etc/cron.d/rbgan_rss:

This tells the cron daemon when to run the news aggregator script.

News aggregator script /usr/bin/katapulte-aggregator-rbgan.rb:

This sends an email to feeds@radio.org.ml, which is an alias defined in /etc/postfix/virtual. This email contains a list of links to news articles available from server gcmserver.geekcorps.org. The news articles come from a variety of sources as defined in the configuration file /etc/kamille/katapulte_feeds.db.

Each link in the summary email is of the form http://gcmserver.geekcorps.org/rbgan/news.php?id=<id> in which the id uniquely identifies the article. The PHP script /var/www/html/mali.geekcorps.org/htdocs/rbgan/news.php calls script news_allafrica.rb in the same directory. This script fetches the requested page from web site fr.allafrica.com, removes the ads and pictures, and returns it.
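The shape of the summary email can be sketched as below. The article titles and ids would come from the feeds listed in katapulte_feeds.db; the Subject line and the plain-text layout are assumptions, not the aggregator's actual format.

```ruby
NEWS_URL = 'http://gcmserver.geekcorps.org/rbgan/news.php?id='

# Build a plain-text summary message: one title plus link per article.
# The Subject line is a hypothetical placeholder.
def summary_message(articles, to = 'feeds@radio.org.ml')
  lines = articles.map { |a| "#{a[:title]}\n#{NEWS_URL}#{a[:id]}\n" }
  "To: #{to}\nSubject: News summary\n\n" + lines.join("\n")
end

# Hand the message to the local MTA; 'sendmail -t' takes the recipients
# from the To: header.
def send_summary(articles)
  IO.popen(['/usr/sbin/sendmail', '-t'], 'w') { |io| io.write(summary_message(articles)) }
end
```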

Todo: We could greatly increase the responsiveness of the calls to news.php by caching all of the articles on the server instead of fetching and processing them on demand.

The Ruby scripts make use of several libraries, developed at Geekcorps, that have been added to folder /usr/lib/ruby/1.8: cancan.rb, homoplate.rb, melanie.rb, mimine.rb.

Configuration file /etc/cron.daily/rbgan-purge:

This tells the cron daemon to call the script /usr/bin/rbgan-purge-rss.rb once each day. This script removes, from /var/spool/uucp, news-summary emails that are more than two days old. It uses the configuration file /etc/rbgan_stations.conf to determine the set of subdirectories of /var/spool/uucp to search.
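A purge of this kind can be sketched as below. How the real script recognizes a news summary, and the format of /etc/rbgan_stations.conf, are not described here, so the news_summary? test is hypothetical and the spool directories are passed in directly. UUCP jobs consist of command (C.) and data (D.) files; a real cleanup would also need to remove the job's command file, which this sketch does not do.

```ruby
require 'find'

MAX_AGE_SECONDS = 2 * 24 * 60 * 60   # "more than two days old"

# HYPOTHETICAL test for a spooled news summary: look for the feeds alias
# in the first 4 KB of the file.
def news_summary?(path)
  File.read(path, 4096).to_s.include?('feeds@radio.org.ml')
end

# Delete old news summaries under each spool directory and return the
# deleted paths.
def purge(spool_dirs, now = Time.now)
  removed = []
  spool_dirs.each do |dir|
    next unless File.directory?(dir)
    Find.find(dir) do |path|
      next unless File.file?(path)
      next unless now - File.mtime(path) > MAX_AGE_SECONDS
      next unless news_summary?(path)
      File.delete(path)
      removed << path
    end
  end
  removed
end
```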

Client-side Components

The Computer

The client computers are VIA Desert PCs, designed for low power, high temperatures, and dusty conditions. Each has a 2 gigabyte flash drive and 1 gigabyte of RAM.

The Operating System

The client computers run the Dapper Drake release of Ubuntu Linux. Because there is plenty of RAM but not much disk space, no swap space is used. To minimize disk usage, the server version of Dapper is installed, and then a minimal set of components is installed, including those found in Geekcorps' Kunnafonix Linux distribution. These components include:

For a complete list of configuration files and scripts installed on the client computers, refer to this script: cp_cfg_to_client.sh

Networking

Configuration file /etc/network/interfaces (See manual for interfaces(5).):

During testing with a LAN, use DHCP. For use with an Inmarsat R-BGAN modem, switch to a static address and explicitly specify the DNS server addresses. According to sources at mvsnet.net, the address 192.168.128.101 must be used if one wants to be able to log into the client computer remotely using SSH.

After switching to using a static address and with the Ethernet port not connected, run "/etc/init.d/networking restart". Run "ifconfig eth0" to confirm that the address is assigned.

Configuration file /etc/resolv.conf:

nameserver 123.45.67.89
nameserver 123.45.67.90

When using the R-BGAN modem, use this file to specify the addresses of the name servers. Get these from your service provider.

Configuration file /etc/sysctl.conf:

net/core/rmem_default = 6000
net/core/rmem_max = 6000
net/core/wmem_default = 6000
net/core/wmem_max = 6000

These lines are appended to the /etc/sysctl.conf file during installation. They configure small TCP window sizes to limit the amount of data sent from server through satellite modem to client after the gatekeeper shuts down the connection. The variables in /etc/sysctl.conf are applied by init file /etc/rcS.d/S17procps.sh, which is called well before networking is started.

References

Linux Tweaking: Raising network limits for broadband under Linux

The R-BGAN Satellite Modem

The satellite modem must be configured with the proper GPS coordinates, pointed at a satellite, and configured with the correct network settings. Confirm these settings with your service provider.

The Firewall

A firewall is implemented using netfilter/iptables (http://www.netfilter.org/), a standard component of current Linux systems.

Script /etc/iptables/firewall:

This script is installed with the call

update-rc.d iptables start 37 S . start 38 0 6 .

which causes it to be run with the start option during computer startup, right before networking is started, and with the stop option during computer shutdown, right after networking is stopped. For runlevels 0 and 6 we register start links instead of stop links because that is what the /etc/init.d/networking script uses; on Debian-based systems, init runs the start links in those runlevels with the stop option. See Section 9.3 of the Debian Policy manual for a discussion of init.d scripts.

The firewall prevents external connections to the client except via SSH (to be used for remote configuration or troubleshooting, if needed). It prevents all but a limited set of outgoing connections. It implements a transparent (or "interception") proxy, redirecting HTTP traffic to the local squid cache. This way, even if the client were to use a different browser or were to turn off the explicit proxying in Firefox, the local proxy would still be used.

Todo: Without explicit proxying, DNS lookups are not cached (a benefit that squid provides). Consider adding a local DNS server to cache lookups.

When testing the firewall, use tcpdump to monitor network traffic. Below is an example of a command (all one line) to list short descriptions of TCP packets and display a running total of bytes received or sent.

tcpdump -i eth0 -n -N -l -q -e -tttt | tee tcp.dump | awk -W interactive '{ bytes += substr($8,0,length($8)) ; print "Total: ", bytes, " Current: ", $8, " ", $9, " ", $10, " ", $11, " ", $12 }'

References

http://tldp.org/HOWTO/TransparentProxy.html

The Squid Web Cache

A squid (http://www.squid-cache.org/) web cache saves requested pages locally. When a request is received, it is either ignored, if the request is to a blacklisted site, or is redirected to the Squid cache on the server. Currently, no web sites are blacklisted. After modifying file /etc/squid/squid.conf, use "/etc/init.d/squid reload" to apply the changes.

Configuration file /etc/squid/squid.conf:

cache_peer 123.45.67.89 parent 3128 0 no-query default
This tells the client squid to forward all requests to the squid on the server (see Squid FAQ 4.9: "How do I configure Squid forward all requests to another proxy?").

cache_dir ufs /var/spool/squid 72 16 256
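This reduces the size of the cache from the default of 100 MB to 72 MB. The last two numbers, the counts of first- and second-level subdirectories, are left at their defaults.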

cache_access_log none
Since no one would be reading the access log, we don't use one.

cache_store_log none
Since no one would be reading the store log, we don't use one.

http_access allow all
Because we are using transparent proxying, the REDIRECT iptables target produces packets whose source IP address is the DHCP-assigned address rather than 127.0.0.1. Since we don't want to make assumptions about the address range used by our ISP, we allow open access and use the firewall to prevent connections from outside the local host.

httpd_accel_host virtual
httpd_accel_with_proxy on
httpd_accel_uses_host_header on

These are needed for transparent proxying. See Section 4 of the Transparent Proxy with Linux and Squid mini-HOWTO.

never_direct allow all
Forward all requests only to parent proxy (see Squid FAQ 4.9).

coredump_dir /var/spool/squid
Leave coredumps in the first cache directory instead of the directory from which squid was started.

References

Squid FAQ: Concepts of Interception Caching

The Gatekeeper Daemon

See the installation makefile for the complete list of files and their locations on the client computer.

Daemon /usr/bin/gatekeeperd:

This daemon monitors network traffic. It publishes a summary of the current usage and daily limit to the file /var/gatekeeper/stats and maintains its state across reboots in file /var/gatekeeper/state. It closes the connection between the local Squid cache and the remote Squid cache on the server when the limit is exceeded. At the beginning of the day, it resets the daily limit and enables the connection to the remote Squid cache. Any bandwidth left over from one day carries over to the next.
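The daily reset with carry-over can be sketched as below. The State structure is hypothetical (the daemon's state-file format is not described here), 200 KB is taken as 200,000 bytes, and the rule that only one day's quota is added after a gap of several days is an assumption.

```ruby
require 'date'

DAILY_QUOTA_BYTES = 200_000   # the project's 200 KB/day target (assumed 1 KB = 1000 bytes)

# Hypothetical in-memory form of the daemon's state.
State = Struct.new(:day, :limit, :used)

# At the first check on a new day, start a fresh budget: today's quota
# plus whatever was left unused (never a negative carry-over).
def roll_over(state, today)
  return state if state.day == today
  leftover = [state.limit - state.used, 0].max
  State.new(today, DAILY_QUOTA_BYTES + leftover, 0)
end

# The gatekeeper would call /etc/iptables/disconnect once this is true.
def over_limit?(state)
  state.used >= state.limit
end
```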

To compute network traffic, it uses module /usr/lib/ruby/1.8/netusage.rb, which reads the receive and transmit byte counters in the /proc/net/dev system file.
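Reading /proc/net/dev is straightforward: after the interface name and a colon, the first field is received bytes and the ninth is transmitted bytes. The sketch below is not netusage.rb itself; treating a counter that goes backwards (after a reboot or a 32-bit wrap) as a restart from zero is an assumption.

```ruby
# Parse /proc/net/dev into { "eth0" => { rx: bytes, tx: bytes }, ... }.
def read_counters(text = File.read('/proc/net/dev'))
  counters = {}
  text.each_line do |line|
    name, data = line.split(':', 2)
    next if data.nil?                 # the two header lines have no colon
    fields = data.split
    next if fields.size < 9
    counters[name.strip] = { rx: fields[0].to_i, tx: fields[8].to_i }
  end
  counters
end

# Bytes used between two readings of one counter. A smaller current value
# means the counter restarted (reboot or wrap), so count it from zero.
def delta(previous, current)
  current >= previous ? current - previous : current
end

# Total traffic (received + transmitted) on one interface between samples.
def usage(before, after, iface = 'eth0')
  %i[rx tx].sum { |k| delta(before[iface][k], after[iface][k]) }
end
```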

Script /etc/iptables/connect:

This is called by the gatekeeper to open the connection to the server squid.

Script /etc/iptables/disconnect:

This is called by the gatekeeper to close the connection to the server squid. With the connection closed, email and SSH continue to function.

Script /etc/init.d/gatekeeperd:

This script is installed with the call "update-rc.d gatekeeperd start 38 S . start 37 0 6 .". This causes it to be run with the start option during computer startup right after the firewall is started. It causes the script to run with the stop option during computer shutdown right before the firewall is stopped.

The Gatekeeper Display

A custom gatekeeper widget runs in the task bar. It uses a progress bar to display the current usage for the day and the daily limit. It gives the customer immediate feedback on his network usage and helps him learn how to use the Internet efficiently.

Bonobo server /usr/lib/bonobo/servers/gatekeeper-applet.server:

This XML file registers the gatekeeper display applet with the Gnome desktop manager. It is patterned after the examples that ship with the libpanel-applet2-ruby package: /usr/share/doc/libpanel-applet2-ruby/examples/*.server

Gnome Panel Applet /usr/lib/gnome-panel/gatekeeper-applet.rb:

This applet reads the gatekeeper's stats file once per second and updates the usage and limit values displayed both as text and as a progress bar. It is patterned after the examples that ship with the libpanel-applet2-ruby package: /usr/share/doc/libpanel-applet2-ruby/examples/*.rb
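The applet's once-per-second update boils down to turning the stats file into a fraction for the progress bar and a text label. The key=value format of the stats file assumed below is hypothetical, and the GTK and panel-applet calls are omitted.

```ruby
# Parse a HYPOTHETICAL stats file of "key=value" lines into integers.
def parse_stats(text)
  values = {}
  text.each_line do |line|
    key, value = line.strip.split('=', 2)
    values[key] = value.to_i if key && value
  end
  values
end

# Progress-bar fraction (capped at 1.0) and a label such as "50k / 200k".
def display(stats)
  used, limit = stats.fetch('used', 0), stats.fetch('limit', 0)
  fraction = limit > 0 ? [used.to_f / limit, 1.0].min : 1.0
  [fraction, format('%dk / %dk', used / 1000, limit / 1000)]
end
```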

The gatekeeper display widget may be added to a Gnome panel as follows: Right-click on panel » Add a new element » XfApplet » Add » Gatekeeper » OK. In the actual installation, this widget is automatically included as part of the desktop configuration (with configuration file /etc/skel/.config/xfce4/panel/xfapplet-*.rc).

The Firefox Web Browser

Configuration file /etc/firefox/pref/firefox.js:

The Firefox web browser (http://www.mozilla.com/firefox/) is configured to use the local Squid cache: Edit » Preferences » Connection Settings » Manual Proxy Configuration:

HTTP Proxy: 127.0.0.1
Port: 3128
No Proxy for localhost, 127.0.0.1, 192.168.128.100, webmail, whatismyip.org

Though HTTP redirection is built into the firewall, we use explicit proxying since it lets the local Squid service cache DNS requests as well as HTTP requests.

After configuring one instance of Firefox, use its preference file to modify the default preferences file /etc/firefox/pref/firefox.js.

Additional configuration (after installation):

Remove the default bookmarks since they link to web sites that would consume a lot of bandwidth. The "Latest BBC Headlines" bookmark uses a lot of bandwidth just to fetch the headlines, which it likes to do every few minutes.

Bookmarks » Manage Bookmarks » Delete "Latest BBC Headlines" and "Getting Started"

View » Toolbars » uncheck "Bookmarks Toolbar"

The Apache Web Server

The Apache web server is required by the SquirrelMail reader. The following commands are executed as part of the installation to prevent apache from using disk space for its log files:

rm /var/log/apache2/{access,error}.log
ln -s /dev/null /var/log/apache2/access.log
ln -s /dev/null /var/log/apache2/error.log

The SquirrelMail Mail Reader

The customer uses SquirrelMail (http://www.squirrelmail.org/), accessible via Firefox, to handle his email. SquirrelMail requires an IMAP mail server (as opposed to a POP mail server).

Configuration file /etc/apache2/sites-enabled/01_squirrel:

This is a modified version of the file /etc/squirrelmail/apache.conf that ships with SquirrelMail. It defines the virtual host "webmail":

<VirtualHost webmail>
  DocumentRoot /usr/share/squirrelmail
  ServerName webmail
</VirtualHost>

Configuration file patch /usr/share/squirrelmail/src/compose.php.patch:

This patch modifies the SquirrelMail source file /usr/share/squirrelmail/src/compose.php so that file attachments cannot be used.

The Dovecot IMAP mail server

Dovecot (http://www.dovecot.org/) is the IMAP mail server. It is configured to manage the mail in the local mail spool.

Configuration file /etc/dovecot/dovecot.conf.

The Postfix Mail Service

A Postfix (http://www.postfix.org/) service handles mail sent by the customer. It forwards the mail to the local UUCP spool.

Configuration file /etc/cron.d/uucp-mail:

This runs /etc/init.d/uucico-auto once per hour. It determines the last time that uucico was called and calls it if it hasn't yet been called today and if it is possible to ping the server. The uucico-auto script is also called during initialization. This guarantees that uucico will be called every day the customer connects to the Internet (provided he is connected when the computer starts up or at the top of the hour).
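The once-a-day decision made by uucico-auto can be sketched as follows. Recording the last call as the modification time of a stamp file, the stamp path, and the ping options (iputils ping) are assumptions; uucico -s names the system to call.

```ruby
require 'date'
require 'fileutils'

SERVER = 'gcmserver.geekcorps.org'
STAMP  = '/var/lib/uucico-auto/last-call'   # HYPOTHETICAL stamp file

# Call only if the server answers and we have not called yet today.
def call_uucico?(last_call, now, reachable)
  return false unless reachable
  last_call.nil? || last_call.to_date < now.to_date
end

# One ping with a 5-second timeout, output discarded.
def server_reachable?
  system('ping', '-c', '1', '-W', '5', SERVER, out: File::NULL, err: File::NULL)
end

# Run from cron and at startup; records the call when uucico exits successfully.
def run_once(now = Time.now)
  last = File.exist?(STAMP) ? File.mtime(STAMP) : nil
  return unless call_uucico?(last, now, server_reachable?)
  FileUtils.mkdir_p(File.dirname(STAMP))
  FileUtils.touch(STAMP) if system('uucico', '-s', SERVER)
end
```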

The command uucico ("uu call-in/call-out") sends mail in the local UUCP spool to the server and retrieves mail from the server. Retrieved mail is processed with rmail, procmail, and a custom filter that unzips the compressed messages.

Configuration file /etc/postfix/aliases:

This empty file tells postfix not to use any aliases.

Configuration file /etc/postfix/main.cf:

mydestination = gcmclient.radio.org.ml, etc.
This tells postfix which emails to process locally.

default_transport = uucp:gcmserver.geekcorps.org
This tells postfix to use gcmserver to process all nonlocal email.

inet_interfaces = loopback-only
This tells postfix that it will not receive any email from nonlocal interfaces.

Configuration file /etc/postfix/master.cf:

This file should be the same as the one installed.

UUCP

Configuration file /etc/uucp/call:

gcmserver.geekcorps.org gcmclient <password>
This password must be the same as the one in file gcmserver:/etc/uucp/passwd. This file should only be readable by the uucp user (or group).

Configuration file /etc/uucp/config:

nodename gcmclient.radio.org.ml
This is the UUCP system name for the computer.

Configuration file /etc/uucp/port:

port TCP
type tcp

This defines a port named TCP that uses a TCP connection. Port TCP is referenced in file /etc/uucp/sys.

Configuration file /etc/uucp/sys:

protocol t
Use the t protocol, appropriate for connections over TCP.

system gcmserver.geekcorps.org
etc
Configure parameters for connections to gcmserver.

The Installation Process

Developing a robust installer is a work in progress. Since we have only one server, we have not attempted to operationalize the server installation. This section gives an overview of the steps we use to configure our client computers.

Explicitly Copied Files

During the installation process, the cp_cfg_to_client.sh script is used to copy files to the client computer from an installation directory with a hierarchical structure that mirrors the file hierarchy on the client. Additional configuration is accomplished with the configure_client.sh script.

Manual Configuration

The following edits are performed by hand after running cp_cfg_to_client.sh and before running configure_client.sh:

  1. /etc/mailname : insert the fully qualified domain name (fqdn) for the client computer.
  2. /etc/hosts : replace host name and fqdn.
  3. /etc/uucp/call : provide login information: <server fqdn> <host name> <password>
  4. /etc/uucp/config : provide fqdn: nodename <client fqdn>
  5. /etc/postfix/main.cf : use client and server fqdns
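Three of these edits follow directly from the client's host name and domain and the server's name, so they could be generated rather than typed. The sketch below is hypothetical (it is not part of cp_cfg_to_client.sh or configure_client.sh) and writes nothing; it only returns the file contents, using the formats listed above.

```ruby
# Contents for three of the hand-edited files. /etc/hosts and
# /etc/postfix/main.cf are left out because their full contents are not
# given here.
def client_files(host:, domain:, server:, password:)
  fqdn = "#{host}.#{domain}"
  {
    '/etc/mailname'    => "#{fqdn}\n",
    '/etc/uucp/call'   => "#{server} #{host} #{password}\n",
    '/etc/uucp/config' => "nodename #{fqdn}\n"
  }
end
```

For example, client_files(host: 'gcmclient', domain: 'radio.org.ml', server: 'gcmserver.geekcorps.org', password: 'secret') yields the /etc/uucp/call line "gcmserver.geekcorps.org gcmclient secret".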

Geekcorps .deb Packages

During the installation process, the following packages created at Geekcorps are installed:

libnetusage-ruby1.8_0.1-1_all.deb
ruby-gatekeeper_0.9-1_all.deb

Use "dpkg -X <deb file> <directory>" to extract the files to a directory without installing them.

Standard Packages

Other packages used in the installation are available from the standard Ubuntu repositories. The complete list of packages is given by the apt_install file.

Future Work

Installer

  1. Develop a complete installer on CDs requiring no downloads from the Internet.

Compression

  1. Upgrade the squid server when compression is supported.
  2. More effectively compress emails. Currently, only the bodies are compressed and only one at a time.

Web Content

  1. Use web pages targeted for PDAs and cell phones, e.g., http://google.com/imode.
  2. Improve or replace loband. The following table shows that loband doesn't always perform well:

    Web site             Net usage   Net usage with loband filter
    mali.geekcorps.org   105k        92k
    yahoo.fr             240k        97k
    google.com           17k         5k
    wikipedia.org        98k         15k
    hotmail.com          134k        N/A
    google.com/mail      465k        N/A
    yahoo.com            244k        300k
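The share of traffic saved by loband for each site in the table above can be computed as below; a negative value (yahoo.com) means the filtered page used more bandwidth than the original. The figures are those from the table, read as kilobytes.

```ruby
# Net usage in KB from the table: [without loband, with loband];
# nil where loband could not be used (N/A).
USAGE_KB = {
  'mali.geekcorps.org' => [105, 92],
  'yahoo.fr'           => [240, 97],
  'google.com'         => [17, 5],
  'wikipedia.org'      => [98, 15],
  'hotmail.com'        => [134, nil],
  'google.com/mail'    => [465, nil],
  'yahoo.com'          => [244, 300]
}

# Percentage of traffic saved by the filter, rounded to one decimal place.
def savings_percent(plain, filtered)
  return nil if filtered.nil?
  ((plain - filtered) * 100.0 / plain).round(1)
end

USAGE_KB.each do |site, (plain, filtered)|
  pct = savings_percent(plain, filtered)
  puts format('%-20s %s', site, pct ? "#{pct}%" : 'N/A')
end
```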

Acknowledgements

The work described in this document is heavily influenced by Ian Howard's paper Autonomous and Remote Computing Platforms for Rural Installations in Lesser Developed Countries. It builds on the work done by many past and present Geekcorps volunteers and staff members (please let us know about any lacunae, misattributions, or misspellings):

The VIA Desert PC

Ian Howard, Sebastian Henschell, Frederic Renet, Amadou Konaté, Jean-Philippe Dion

The Kunnafonix Distribution of Linux

Ian Howard, Amadou Konaté, Kasper Souren, Sebastian Henschell

The Cybertigi Project

Matt Berg, Brennan Casey, Frederic Renet, Renaud Gaudin, and Ludovic Nadjindoroum

The R-BGAN Low Bandwidth Networking Project

Ian Howard, Moussa Keita, Ibrahim Touré, Rian Aldridge, Frederic Renet, Jean-Philippe Dion, Amadou Konaté, Stephane Nicolas, Matt Berg, David Pierce, and Renaud Gaudin