Linux

IntelliJ IDEA and the ClassNotFoundException

When compiling nested Maven projects in IntelliJ IDEA, the compiler sometimes complains about a missing class file.

This occurs on several occasions, depending on which part of a project is compiled and which dependencies have been considered. In large projects this can easily happen when a specific class is compiled without the complete context being available. Besides the usual tips such as invalidating caches and the ones I found here and here, editing the build configuration of the project helps. Add a "Make Project" task and the correct class files should be compiled and available.

Hikari Connection Pooling with a MySQL Backend, Hibernate and Maven

Connection Pooling?

JDBC connection pooling is a great concept which improves the performance of database-driven applications by reusing connections. Because connections are served from a pool of already established connections, the cost of creating and closing them for every request is avoided. Database systems such as MySQL also limit the number of simultaneous connections, which is another reason why connection pools are preferable to opening and closing individual connections.
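To make the idea more tangible, here is a minimal, standalone sketch of how a HikariCP pool is used directly via JDBC. The database URL and credentials are placeholders matching the demo setup further below, not requirements:

```
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

import java.sql.Connection;
import java.sql.SQLException;

public class PoolExample {
    public static void main(String[] args) throws SQLException {
        HikariConfig config = new HikariConfig();
        // Placeholder connection settings, adapt them to your environment
        config.setJdbcUrl("jdbc:mysql://localhost:3306/TestDB?useSSL=false");
        config.setUsername("testuser");
        config.setPassword("sEcRet");
        config.setMaximumPoolSize(10);

        try (HikariDataSource dataSource = new HikariDataSource(config)) {
            // Borrow a connection from the pool
            try (Connection connection = dataSource.getConnection()) {
                System.out.println("Connection valid: " + connection.isValid(2));
            }
            // Closing the connection only returns it to the pool;
            // the physical connection stays open for reuse.
        }
    }
}
```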

Dipping into Pools

There exists a selection of JDBC-compatible connection pools which can be used more or less interchangeably; widely used examples are HikariCP, c3p0, Apache DBCP and the Tomcat JDBC pool.

Most of these pools work in a very similar way. In this tutorial, we are going to take HikariCP for a spin. It is simple to use and claims to be very fast. We are going to set up a small project using the following technologies:

  • Java 8
  • Tomcat 8
  • MySQL 5.7
  • Maven 3
  • Hibernate 5

and of course an IDE of your choice (I have become quite fond of IntelliJ IDEA Community Edition).

Project Overview

In this small demo project, we are going to write a minimalistic Web application, which simply computes a new random number for each request and stores the result in a database table. We use Java and store the data by using the Hibernate ORM framework. We also assume that you have a running Apache Tomcat Servlet container and a running MySQL instance available.

In the first step, I created a basic Web project by selecting the Maven Webapp archetype, which then creates a basic structure we can work with.

Adding the Required Libraries

After we created the initial project, we need to add the required libraries. We can achieve this easily with Maven by adding the dependency definitions to our pom.xml file. You can find these definitions at Maven Central. The build block contains the plugin for deploying the application to the Tomcat server.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>at.stefanproell</groupId>
  <artifactId>HibernateHikari</artifactId>
  <packaging>war</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>HibernateHikari Maven Webapp</name>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
      <dependency>
          <groupId>org.apache.tomcat</groupId>
          <artifactId>tomcat-servlet-api</artifactId>
          <version>7.0.50</version>
      </dependency>
      <dependency>
          <groupId>mysql</groupId>
          <artifactId>mysql-connector-java</artifactId>
          <version>5.1.39</version>
      </dependency>
      <dependency>
          <groupId>org.hibernate</groupId>
          <artifactId>hibernate-core</artifactId>
          <version>5.2.0.Final</version>
      </dependency>
      <dependency>
          <groupId>com.zaxxer</groupId>
          <artifactId>HikariCP</artifactId>
          <version>2.4.6</version>
      </dependency>
  </dependencies>
    
  <build>
    <finalName>HibernateHikari</finalName>
      <plugins>
          <plugin>
              <groupId>org.apache.tomcat.maven</groupId>
              <artifactId>tomcat7-maven-plugin</artifactId>
              <version>2.0</version>
              <configuration>
                  <path>/testapp</path>
                  <update>true</update>

                  <url>http://localhost:8080/manager/text</url>
                  <username>admin</username>
                  <password>admin</password>

              </configuration>

          </plugin>
          <plugin>
              <groupId>org.apache.maven.plugins</groupId>
              <artifactId>maven-war-plugin</artifactId>
              <version>2.4</version>

          </plugin>
      </plugins>
  </build>
</project>

Now all required libraries are available and we can begin implementing the functionality.

The Database Table

As we want to persist random numbers, we need to have a database table, which will store the data. Create the following table in MySQL and ensure that you have a test user available:

CREATE TABLE `TestDB`.`RandomNumberTable` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `randomNumber` INT NOT NULL,
  PRIMARY KEY (`id`));


## POJO Mojo: The Java Class to be Persisted

Hibernate allows us to persist Java objects in the database, by annotating the Java source code. The following Java class is used to store the random numbers that we generate.

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;
import javax.persistence.UniqueConstraint;

@Entity
@Table(name="RandomNumberTable", uniqueConstraints={@UniqueConstraint(columnNames={"id"})})
public class RandomNumberPOJO {

@Id
@GeneratedValue(strategy=GenerationType.IDENTITY)
@Column(name="id", nullable=false, unique=true, length=11)
private int id;

@Column(name="randomNumber", nullable=false)
private int randomNumber;

public int getId() {
    return id;
}

public void setId(int id) {
    this.id = id;
}

public int getRandomNumber() {
    return randomNumber;
}

public void setRandomNumber(int randomNumber) {
    this.randomNumber = randomNumber;
}

}



The code and the annotations are straightforward. Now we need to define how we connect to the database and let Hibernate handle the mapping between the Java class and the database schema we defined before.

## Hibernate Configuration

Hibernate looks for the configuration in a file called hibernate.cfg.xml by default. This file is used to provide the connection details for the database.

    <property name="hibernate.dialect">org.hibernate.dialect.MySQLDialect</property>
    <property name="hibernate.connection.provider_class">com.zaxxer.hikari.hibernate.HikariConnectionProvider</property>
    <property name="hibernate.hikari.dataSource.url">jdbc:mysql://localhost:3306/TestDB?useSSL=false</property>
    <property name="hibernate.hikari.dataSource.user">testuser</property>
    <property name="hibernate.hikari.dataSource.password">sEcRet</property>
    <property name="hibernate.hikari.dataSourceClassName">com.mysql.jdbc.jdbc2.optional.MysqlDataSource</property>
    <property name="hibernate.hikari.dataSource.cachePrepStmts">true</property>
    <property name="hibernate.hikari.dataSource.prepStmtCacheSize">250</property>
    <property name="hibernate.hikari.dataSource.prepStmtCacheSqlLimit">2048</property>
    <property name="hibernate.hikari.dataSource.useServerPrepStmts">true</property>
    <property name="hibernate.current_session_context_class">thread</property>

</session-factory>
</hibernate-configuration>

The file above contains the most essential settings. We specify the database dialect (`org.hibernate.dialect.MySQLDialect`), define the connection provider class (the HikariCP provider `com.zaxxer.hikari.hibernate.HikariConnectionProvider`) and provide the URL to our MySQL database (`jdbc:mysql://localhost:3306/TestDB?useSSL=false`) including the username and password for the database connection. Alternatively, you can define the same information in a hibernate.properties file.
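As a sketch, the equivalent hibernate.properties file would look roughly like this, assuming the same database, user and password as above:

```
hibernate.dialect=org.hibernate.dialect.MySQLDialect
hibernate.connection.provider_class=com.zaxxer.hikari.hibernate.HikariConnectionProvider
hibernate.hikari.dataSourceClassName=com.mysql.jdbc.jdbc2.optional.MysqlDataSource
hibernate.hikari.dataSource.url=jdbc:mysql://localhost:3306/TestDB?useSSL=false
hibernate.hikari.dataSource.user=testuser
hibernate.hikari.dataSource.password=sEcRet
hibernate.current_session_context_class=thread
```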

## The Session Factory

We need to have a session factory, which initializes the database connection and the connection pool as well as handles the interaction with the database server. We can use the following class, which provides the session object for these tasks.

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

import org.hibernate.SessionFactory;
import org.hibernate.boot.registry.StandardServiceRegistryBuilder;
import org.hibernate.cfg.Configuration;
import org.hibernate.service.ServiceRegistry;
import org.jboss.logging.Logger;

@WebListener
public class HibernateSessionFactoryListener implements ServletContextListener {

public final Logger logger = Logger.getLogger(HibernateSessionFactoryListener.class);

public void contextDestroyed(ServletContextEvent servletContextEvent) {
    SessionFactory sessionFactory = (SessionFactory) servletContextEvent.getServletContext().getAttribute("SessionFactory");
    if(sessionFactory != null && !sessionFactory.isClosed()){
        logger.info("Closing sessionFactory");
        sessionFactory.close();
    }
    logger.info("Released Hibernate sessionFactory resource");
}

public void contextInitialized(ServletContextEvent servletContextEvent) {
    Configuration configuration = new Configuration();
    configuration.configure("hibernate.cfg.xml");
    // Add annotated class
    configuration.addAnnotatedClass(RandomNumberPOJO.class);

    ServiceRegistry serviceRegistry = new StandardServiceRegistryBuilder().applySettings(configuration.getProperties()).build();
    logger.info("ServiceRegistry created successfully");
    SessionFactory sessionFactory = configuration
            .buildSessionFactory(serviceRegistry);
    logger.info("SessionFactory created successfully");

    servletContextEvent.getServletContext().setAttribute("SessionFactory", sessionFactory);
    logger.info("Hibernate SessionFactory Configured successfully");
}

}



This class provides two so-called context listener methods: one where the session factory gets initialized and a second one where it gets destroyed. The Tomcat Servlet container automatically calls these depending on the lifecycle of the application context. You can see that the filename of the configuration file is provided (`configuration.configure("hibernate.cfg.xml");`) and that we tell Hibernate to map our RandomNumberPOJO class (`configuration.addAnnotatedClass(RandomNumberPOJO.class);`). Now all that is missing is the Web component, which is waiting for our requests.

## The Web Component

The last part is the Web component, which we kept as simple as possible.

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

import javax.persistence.TypedQuery;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;
import java.util.Random;

public class HelloServlet extends HttpServlet {

public void doGet(HttpServletRequest req, HttpServletResponse res) throws ServletException, IOException {
    PrintWriter out = res.getWriter();
    addRandomNumber(req);
    out.println("There are " + countNumbers(req) + " random numbers");

    List<RandomNumberPOJO> numbers = getAllRandomNumbers(req,res);

    out.println("Random Numbers:");
    out.println("----------");

    for(RandomNumberPOJO record:numbers){
        out.println("ID: " + record.getId() + "\t :\t" + record.getRandomNumber());
    }

    out.close();

}

/**
 * Create a new random number and store it in the database
 * @param request
 */
private void addRandomNumber(HttpServletRequest request){
    SessionFactory sessionFactory = (SessionFactory) request.getServletContext().getAttribute("SessionFactory");

    Session session = sessionFactory.getCurrentSession();
    Transaction tx = session.beginTransaction();
    RandomNumberPOJO randomNumber = new RandomNumberPOJO();
    Random rand = new Random();
    int randomInteger = 1 + rand.nextInt((999) + 1);

    randomNumber.setRandomNumber(randomInteger);
    session.save(randomNumber);
    tx.commit();
    session.close();
}

/**
 * Get a list of all RandomNumberPOJO objects
 * @param request
 * @param response
 * @return
 */
private List<RandomNumberPOJO> getAllRandomNumbers(HttpServletRequest request, HttpServletResponse response){
    SessionFactory sessionFactory = (SessionFactory) request.getServletContext().getAttribute("SessionFactory");
    Session session = sessionFactory.getCurrentSession();
    Transaction tx = session.beginTransaction();
    TypedQuery<RandomNumberPOJO> query = session.createQuery(
            "from RandomNumberPOJO", RandomNumberPOJO.class);

    List<RandomNumberPOJO> numbers =query.getResultList();



    tx.commit();
    session.close();

    return numbers;


}

/**
 * Count records
 * @param request
 * @return
 */
private int countNumbers(HttpServletRequest request){
    SessionFactory sessionFactory = (SessionFactory) request.getServletContext().getAttribute("SessionFactory");
    Session session = sessionFactory.getCurrentSession();
    Transaction tx = session.beginTransaction();

    String count = session.createQuery("SELECT COUNT(id) FROM RandomNumberPOJO").uniqueResult().toString();

    int rowCount = Integer.parseInt(count);

    tx.commit();
    session.close();
    return rowCount;
}

}



This class provides the actual servlet and is executed whenever a user calls the web application. First, a new RandomNumberPOJO object is instantiated and persisted. We then count how many numbers already exist and fetch a list of all stored records.

The last step before we can actually run the application is the definition of the web entry points, which we define in the file called web.xml. This file is already generated by the Maven archetype and we only need to add a name for our small web service and provide a mapping for the entry class.

<display-name>HikariCP Test App</display-name>

<servlet>
    <servlet-name>hello</servlet-name>
    <servlet-class>HelloServlet</servlet-class>
</servlet>

<servlet-mapping>
    <servlet-name>hello</servlet-name>
    <url-pattern>/hello</url-pattern>
</servlet-mapping>


Compile and Run

We can then compile and deploy the application with the following command:

mvn clean install org.apache.tomcat.maven:tomcat7-maven-plugin:2.0:deploy -e

This will compile and upload the application to the Tomcat server. We can then open the URL http://localhost:8080/testapp/hello in the browser to create and persist random numbers by refreshing the page. The result will look similar to this:
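If you prefer the command line over the browser, a quick smoke test could look like this, using the context path /testapp from the plugin configuration above:

```
# Each request creates and stores a new random number and prints the current list
curl http://localhost:8080/testapp/hello
```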

Encrypt a USB Drive (or any other partition) Using LUKS

Did you ever want to feel like a secret agent, or do you really need to transport and exchange sensitive data? Encrypting your data is not much effort and can be used to protect a pen drive or any other partition and the data on it from unauthorized access. The following example shows how to create an encrypted partition on a disk. Note two things: first, if you accidentally encrypt the wrong partition, the data is lost. Forever. So be careful when entering the commands below. Secondly, the method shown below only protects the data at rest. As soon as you decrypt and mount the device, the data can be read by everyone else if you do not set correct permissions.

Preparation

Prepare a mount point for your data and change ownership.

# Create a mount point
sudo mkdir /media/cryptoUSB
# Set permissions for the owner
sudo chown stefan:stefan /media/cryptoUSB

Create an Encrypted Device

Encrypt the device with LUKS. Note that all data on the partition will be overwritten during this process.

# Create encrypted device 
sudo cryptsetup --verify-passphrase luksFormat /dev/sdX -c aes -s 256 -h sha256

# From the man page:
       --cipher, -c 
              Set the cipher specification string.
       --key-size, -s 
              Sets  key  size in bits. The argument has to be a multiple of 8.
              The possible key-sizes are limited by the cipher and mode used.
       --verify-passphrase, -y
              When interactively asking for a passphrase, ask for it twice and
              complain  if  both  inputs do not match.
       --hash, -h 
              Specifies the passphrase hash for open (for  plain  and  loopaes
              device types).

# Open the Device
sudo cryptsetup luksOpen /dev/sdX cryptoUSB
# Create a file system (ext3)
sudo mkfs -t ext3 -m 1 -O dir_index,filetype,sparse_super /dev/mapper/cryptoUSB
# Add a label
sudo tune2fs -L Crypto-USB /dev/mapper/cryptoUSB
# Close the device
sudo cryptsetup luksClose cryptoUSB
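To double-check that the partition really carries a LUKS header before you start using it, you can dump the header information. This is just a sketch; replace /dev/sdX with your actual device:

```
# Show the LUKS header, cipher and key slots of the encrypted partition
sudo cryptsetup luksDump /dev/sdX
```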

Usage

The usage is pretty simple. With a GUI you will be prompted to decrypt the device. At the command line, use the following commands to open and decrypt the device.

# Open the Device
sudo cryptsetup luksOpen /dev/sdX cryptoUSB
# Mount it
sudo mount /dev/mapper/cryptoUSB /media/cryptoUSB

When you are finished with your secret work, unmount and close the device properly.

sudo umount /media/cryptoUSB 
sudo cryptsetup luksClose cryptoUSB

Secure Automated Backups of a Linux Web Server with Rrsync and Passwordless Key Based Authentication

Backups Automated and Secure

Backing up data is an essential task, yet it can be cumbersome and requires some work. As most people are lazy and avoid tedious tasks wherever possible, automation is the key, as it allows us to deal with more interesting work instead. In this article, I describe how a Linux Web server can be backed up in a secure way by using restricted SSH access to the rsync tool. I found a great variety of useful blog posts, which I will reuse in this article.

This is what we want to achieve:

  • Secure data transfer via SSH
  • Passwordless authentication via keys
  • Restricted rsync access
  • Backup of all files by using a low privileged user

In this article, I will denote the machine which should be backed up as WebServer. The WebServer contains all the important data that we want to keep. The BackupServer is responsible for fetching the data from the WebServer in a pull manner.

On the BackupServer

On the BackupServer, we create a key pair without a password which we can use for authenticating with the WebServer. Details about passwordless authentication are given here.

# create a password less key pair
ssh-keygen -t rsa # The keys are named rsync-backup.key.public and rsync-backup.key.private
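If you want the key files to end up with exactly those names right away, a sketch like the following works as well. The empty passphrase is an assumption here; see the remarks about passwordless keys above:

```
# Generate the key pair under a custom name and rename the public part
ssh-keygen -t rsa -N "" -f ~/.ssh/rsync-backup.key.private
mv ~/.ssh/rsync-backup.key.private.pub ~/.ssh/rsync-backup.key.public
```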

On the WebServer

We are going to allow a user who authenticates with her private key to rsync sensitive data from our WebServer to the BackupServer. This user should have a low privileged account and still be able to back up data which belongs to other users. This capability comes with a few security threats which need to be mitigated. The standard way to back up data is rsync. The tool can be potentially dangerous, as it allows the user to write data to an arbitrary location if not handled correctly. In order to deal with this issue, a restricted version of rsync exists which locks the usage of the tool to a declared directory: rrsync.

Obtain Rrsync

You can obtain rrsync from the developer page or extract it from your Ubuntu/Debian distribution as described here. With the following command you can download the file from the Web page and store it as executable.

sudo wget https://ftp.samba.org/pub/unpacked/rsync/support/rrsync -O /usr/bin/rrsync
sudo chmod +x /usr/bin/rrsync

Add a Backup User

First, we create a new user and verify the permissions for the SSH directory.

sudo adduser rsync-backup # Add a new user and select a strong password
su rsync-backup # change into new account
ssh rsync-backup@localhost # ssh to some location e.g.  such that the .ssh directory is created
exit
chmod go-w ~/ # Set permissions
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys

Create a Read Only of the Data You Want to Backup

I got this concept from this blog post. As we also want to back up data from other users, our backup user (rsync-backup) needs to have read access to this data. As we do not want to change the permissions for the rsync-backup user directly in the file system, we use bindfs to create a read-only view of the data we want to back up. We will create a virtual directory containing all the other directories that we want to back up. This directory is called `/mnt/Backups-Rsync-Readonly`. Instead of copying all the data into that directory, which would be a waste of space, we bind-mount the other directories into the backup folder and then sync this folder to the BackupServer.

One Time Steps:

The following steps create the directory structure for the backup and set the links to the actual data that we want to back up. With this method, the backup itself needs neither root, sudo nor any advanced permissions. We simply create a read-only view of the data where the only user with access is rsync-backup.

sudo apt-get install acl bindfs # Install packages
sudo mkdir /mnt/Backups-Rsync-Readonly # Create the base directory
sudo chown -R rsync-backup /mnt/Backups-Rsync-Readonly # Permissions
sudo mkdir /mnt/Backups-Rsync-Readonly/VAR-WWW # Create subdirectory for /var/www data
sudo mkdir /mnt/Backups-Rsync-Readonly/MySQL-Backups # Create subdirectory for MySQL Backups
sudo setfacl -m u:rsync-backup:rx /mnt/Backups-Rsync-Readonly/ # Set Access Control List permissions for read only
sudo setfacl -m u:rsync-backup:rx /mnt/Backups-Rsync-Readonly/MySQL-Backups
sudo setfacl -m u:rsync-backup:rx /mnt/Backups-Rsync-Readonly/VAR-WWW

Testrun

In order to use these directories, we need to mount the folders. We set the permissions for bindfs and establish the link between the data and our virtual backup folders.

sudo bindfs -o perms=0000:u=rD,force-user=rsync-backup /var/www /mnt/Backups-Rsync-Readonly/VAR-WWW
sudo bindfs -o perms=0000:u=rD,force-user=rsync-backup /Backup/MySQL-Dumps /mnt/Backups-Rsync-Readonly/MySQL-Backups

These commands mount the data directories and create a view. Note that these commands are only valid until you reboot. If the above works and the rsync-backup user can access the folder, you can add the mount points to fstab to automatically mount them at boot time. Unmount the folders before you continue with `sudo umount /mnt/Backups-Rsync-Readonly/*`.

Permanently Add the Virtual Folders

You can add the folders to fstab like this:

# Backup bindfs 
/var/www    /mnt/Backups-Rsync-Readonly/VAR-WWW fuse.bindfs perms=0000:u=rD,force-user=rsync-backup 0   0
/Backups/MySQL-Dumps    /mnt/Backups-Rsync-Readonly/MySQL-Backups fuse.bindfs perms=0000:u=rD,force-user=rsync-backup 0   0

Remount the directories with `sudo mount -a`.

Adding the Keys

In the next step we add the public key from the BackupServer to the authorized_keys file of the rsync-backup user on the WebServer. On the BackupServer, cat the public key and copy the output to the clipboard.

ssh user@backupServer
cat rsync-backup.key.public

Switch to the WebServer and log in as the rsync-backup user. Then add the key to the file ~/.ssh/authorized_keys. The file now looks similar to this:

ssh-rsa AAAAB3N ............ fFiUd rsync-backup@webServer


We then prepend the key with the only command this user should be able to execute: rrsync. We add additional limitations to increase the security of this account. We can provide an IP address and limit the command execution further. The final file contains the following information:

command="/usr/bin/rrsync -ro /mnt/Backups-Rsync-Readonly",from="192.168.0.10",no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding ssh-rsa AAAAB3N ............ fFiUd rsync-backup@webServer



Now whenever the user rsync-backup connects, the only possible command is rrsync. Rrsync itself is limited to the directory provided and only has read access. We also verify the IP address and restrict the source of the command.

#### Hardening SSH

Additionally, we can force the rsync-backup user to use key-based authentication only and set an IP address restriction for this user in the sshd_config as well.

AllowUsers rsync-backup@192.168.0.10

Match User rsync-backup
    PasswordAuthentication no



## Backing Up

Last but not least we can run the backup. To start syncing, we log into the BackupServer and execute the following command. There is no need to provide paths, as the only valid path is already defined in the authorized_keys file.

rsync -e "ssh -i /home/backup/.ssh/rsync-backup.key.private" -aLP --chmod=Do+w rsync-backup@webServer: .
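To run the backup unattended, the same command can be placed in the backup user's crontab on the BackupServer. The schedule and the local target directory in this sketch are assumptions:

```
# crontab -e on the BackupServer: pull the backup every night at 03:00
0 3 * * * rsync -e "ssh -i /home/backup/.ssh/rsync-backup.key.private" -aL --chmod=Do+w rsync-backup@webServer: /home/backup/webserver-backup/
```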



# Conclusion

This article covers how a backup user can create backups of data owned by other users without having write access to the data. The backup is transferred securely via SSH and can run unattended. The backup user is restricted to using rrsync only and we included IP address verification. The backup user can only create backups of directories we defined earlier.

Add your Spotify / Streaming Account to the Pi Musicbox in a Secure Way With Device Passwords

In a recent article I wrote about the old Raspberry Pi, which serves its duty as my daily Web radio. The Pi MusicBox natively supports a bunch of streaming services, which improves the experience if you already have a streaming account, by providing your custom playlists on any HDMI capable hifi system. Unfortunately, the passwords are stored in plaintext, which is not a recommended practice for sensitive information, especially if you use your Facebook credentials for services such as Spotify.

Most streaming services offer device passwords, which are restricted accounts where you can assign a dedicated username and password. Having separate credentials in the form of API keys for your devices is good practice, as it does not give a thief your actual account password, but only read access to your playlists. Spotify also provides device passwords, but at the time of writing this article, the assignment of new passwords simply did not work. A little googling revealed that the only possible way at the moment is using Facebook and its app passwords for the service. As Spotify uses Facebook's authentication service, the services can exchange information about authorized users.

In the Facebook Settings, go to the Security panel and create a new password for apps. Name the app accordingly and provide a unique password.

Then, open the Pi MusicBox interface, add the email address you registered with Facebook and provide the newly created app password.

You can then enjoy your playlists in a secure way. You will receive a warning about the connection, which is an indicator that it worked.

A Reasonable Secure, Self-Hosted Password Database with Versioning and Remote Access

The average computer user needs to memorize at least 17 passwords for private accounts. Power users need to handle several additional accounts for work, and memorizing (good and complex) passwords quickly becomes a burden, if not impossible. To overcome the memory issue, there exists a variety of tools which allow storing passwords and associated metadata in password stores. Typically, a password manager consists of a password file, which contains the passwords and the metadata in structured form, and an application assisting the user in decrypting and encrypting the passwords. A typical example is Keepass, an open source password management application. Keepass uses a master password to encrypt the password file. An additional key file can be used to increase security by requiring a second factor to open the password database. There exists a very large variety of ports of this software, which allow opening, editing and storing passwords on virtually any platform.

As the passwords are stored in a single file, a versioning mechanism is required which tracks changes of the passwords on all devices and merges them together in order to keep them synchronized. There also exist online services which handle versioned password storage, but obviously this requires giving away sensitive information and trusting the provider to handle the passwords safely. Storing the encrypted password file in a cloud drive such as Dropbox, Google Drive or Microsoft Azure also solves the versioning issue partially, but still the data is out there on foreign servers. For this reason, the new Raspberry Pi Zero is a low cost, low power device which can be turned into a privately managed and reasonably secure, versioned password store under your own control.

What is needed?

  1. A Raspberry Pi (in fact, a Linux system of any kind, in this example we use a new Zero Pi)
  2. Power supply
  3. SD micro card
  4. USB Hub
  5. Wifi Dongle
  6. USB Keyboard

Preparing the Raspberry Pi Zero

The Raspbian operating system can easily be installed by dumping the image to the micro SD card. As the Pi Zero does not come with an integrated network interface, a Wifi dongle can be used to enable wireless networking. You can edit the wpa_supplicant configuration file directly on the SD card by opening it on a different PC with any editor, and provide the SSID and the shared secret in advance.

# File /etc/wpa_supplicant/wpa_supplicant.conf
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

network={
    ssid="MyWIFI
    psk="SECRET"
}

Then place the card in the Pi again and boot it with a keyboard and wifi dongle attached and the Pi connected to a screen. Log in with the standard credentials, which are the user name pi and the password raspberry.

sudo adduser stefan # add new user
sudo apt-get install openssh-server git-core # Install ssh server and git
passwd # change the default password
sudo adduser stefan sudo # add the new user to the sudoers

In the next step, it is recommended to use a static IP address for the Pi, as we need to configure port forwarding for a specific IP address for the router in a later step. Open the interfaces file and provide a static IP address as follows:

# File: /etc/network/interfaces
allow-hotplug wlan0
iface wlan0 inet static
    address 192.168.0.100
    netmask 255.255.255.0
    gateway 192.168.0.1
    wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf

You can then remove the HDMI cable and also the keyboard, as SSH is now available via the static IP address we just defined above. The next step covers the installation of the [Git server][1] and the configuration of public key authentication.

sudo adduser git # add a new git user
su git # change into the git account
cd /home/git # change to home directory
mkdir .ssh # create the directory for the keys
chmod 700 .ssh # secure permissions
touch .ssh/authorized_keys  # create file for authorized keys
chmod 600 .ssh/authorized_keys # secure permissions for this file


We are now ready to create a key pair consisting of a private and a public key. You can do this on your normal PC or on the Pi directly.

ssh-keygen -t rsa # Create a key pair and provide a strong password for the private key

Note that you can provide a file name during the procedure. The tool creates a key pair consisting of a private and a public key. The public key ends with the suffix pub.

# Folder ~/Passwordstore $ ll
total 32
drwxr-xr-x  2 stefan stefan 4096 Mar 13 22:40 .
drwxr-xr-x 10 stefan stefan 4096 Mar 13 22:38 ..
-rw-------  1 stefan stefan 1766 Mar 13 22:40 pi_git_rsa
-rw-r--r--  1 stefan stefan  402 Mar 13 22:40 pi_git_rsa.pub

If you created the key files on a different PC than the Pi, you need to upload the public key to the Pi. We can do this with the following command:

cat ~/Passwordstore/pi_git_rsa.pub | ssh git@192.168.0.100 "cat >> ~/.ssh/authorized_keys"


If you generated the keys directly on the Pi it is sufficient to cat the key into the file directly. After you managed this step, verify that the key has been copied correctly. If the file looks similar to the following example, it worked.

git@zeropi:~/.ssh $ cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDZ7MZYvI........wnQqchM1 stefan@pc



We can then test key-based SSH authentication with the following command.

ssh -i pi_git_rsa git@192.168.0.100 # connect with using the private key

You are then prompted to connect to the Pi by using the private key password you specified earlier. Note that this password differs from the one we created for the git user. A less secure but more convenient solution is to leave the password empty during the key pair creation process. If no password has been set, anyone who gets hold of the private key can connect to the Pi. By the way, additional interesting facts about passwords can be found here.

To increase convenience, you can add a shortcut for this connection by editing the /home/user/.ssh/config file. Simply add the following record for the password store SSH connection.

Host passwords
hostname 192.168.0.100
port 22
user git
IdentityFile    /home/stefan/passwort/pi_git_rsa

Now you can connect to the Pi by typing the command `ssh passwords`. Note that you now need to provide the password for the key file instead of the user password. Delete the pre-installed user pi from the system:

sudo userdel pi


The default Raspbian partition configuration only utilises 2 GB of your SD card. This can quickly become insufficient. There exists a convenient tool which allows you to increase the root partition to the full size of your SD card. Simply run the following command and select the appropriate menu item.

<span class="lang:default decode:true  crayon-inline ">sudo raspi-config</span>

## Prepare the Git Repository

In the following, we create an empty git repository which we will use for versioning the password database from Keepass.

git@zeropi:~ $ mkdir Password-Repository
git@zeropi:~ $ cd Password-Repository/
git@zeropi:~/Password-Repository $ git init --bare
Initialized empty Git repository in /home/git/Password-Repository/



The repository on the Pi is now ready for ingesting the passwords.

## Checkout the new Repository on your PC and add the Password File

Now that the repository is initialized, we can start versioning the password file with git. Clone the repository and add the password file to git by copying it into the cloned repository directory.

git clone passwords:/home/git/Password-Repository
cp /home/user/oldLocation/Password-Database.kdb ~/Password-Repository
cd ~/Password-Repository
git add Password-Database.kdb
git commit -m "initial commit"



The last step is to push the newly committed password file to the remote repository. You can improve the security by not adding the key file for KeePass into the repository.

git push origin master

The basic setup is now completed and you can clone this repository on any device, in order to have the latest password file available.

Checkout the Password Repository on Your Phone

There exists a variety of Git clients for Android, which can deal with identity files and private key authentication. I have good experience with Pocket Git. Clone the repository by using the URL like this:

ssh://git@pi.duckdns.org:1234/home/git/Password-Repository


### Versioning the Password File: Pull, Commit and Push

Handling versions of the password file follows the standard git procedure. The only difference is that, in contrast to the source code files git is usually used for, the encrypted password database does not allow for diffs, so you cannot see the differences between two versions of the password database. For this reason, you need to make sure that you have the latest version of the password database before you edit the file; otherwise you need to merge the file manually. In order to avoid this, follow these steps from within the repository every time you plan additions, edits or deletions of the password database.

  1. git pull
  2. # make your changes
  3. git commit -m "describe your changes"
  4. git push

## Enabling Remote Access

You can already access the Git repository locally in your own network. But in order to retrieve, edit and store passwords from anywhere, you need to enable port forwarding and Dynamic DNS. Port forwarding is pretty easy. Enter your router's Web interface, browse to the port forwarding options and specify an external and internal port which points to the IP of the Raspberry Pi.

  * IP Address 192.168.0.100
  * Internal port 22
  * External port (22100)
  * Protocol: both

Now the SSH service and therefore the Git repository becomes available via the external port 22100. As we left the internal port at the default, no changes for the SSH service are required.

For Dynamic DNS I regularly use [Duck DNS](http://www.duckdns.org), which is a free service for resolving dynamic IP addresses to a static host name. After registering for the service, you can choose a host name and download the installer. There exists an installer particularly for the [Raspberry Pi](https://www.duckdns.org/install.jsp). Follow the instructions and exchange the token and the domain name in the file to match your account. You can now use the domain you registered for accessing the service from other machines outside your network.
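If you prefer to set the update job up manually instead of using the installer, the Duck DNS update essentially boils down to a single HTTPS request that is triggered periodically; the domain and token below are placeholders:

```
# ~/duckdns/duck.sh - replace YOURDOMAIN and YOURTOKEN with your account data
echo url="https://www.duckdns.org/update?domains=YOURDOMAIN&token=YOURTOKEN&ip=" | curl -k -o ~/duckdns/duck.log -K -

# crontab entry: refresh the IP address every five minutes
*/5 * * * * ~/duckdns/duck.sh >/dev/null 2>&1
```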

## Security Improvements

The setup so far is reasonably secure, as only users having the key file and its password may authenticate as the Git repository user. It is in general good practice to disallow root from connecting via SSH and to restrict remote access. Ensure that all other users on the system can only connect via SSH if they use public key based authentication. Always use passwords for the key file, so that if someone should get hold of your keys, they still require a password.
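A sketch of the corresponding global directives in /etc/ssh/sshd_config could look like this; the exact policy is up to you, and the next snippet below then relaxes it for the local network:

```
# /etc/ssh/sshd_config - general hardening
PermitRootLogin no
PasswordAuthentication no
```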

You can also disable password login for the user git explicitly and allow passwords for local users. Add these lines in the sshd config file.

Match User git
PasswordAuthentication no

Match Address 192.168.0.0/24
    PasswordAuthentication yes



If you know the IP addresses where you will update the password file in advance, consider limiting access only to these. The git user can authenticate with the key, but still may have too many privileges and also could execute potentially harmful commands. Ensure that the git user is not in the list of superusers:

grep -Po '^sudo.+:\K.*$' /etc/group

The user git should not be in the output list. In order to limit the commands that the git user may execute, we can specify a list of allowed commands executable via SSH and utilise a specialised shell, which only permits git commands. Prepend the public key of the git user in the authorized_keys file as follows:

no-port-forwarding,no-agent-forwarding ssh-rsa AAAAB ........


In addition, we can change the default shell for the user git. Switch to a different user account with sudo privileges and issue the following command:

sudo usermod -s /usr/bin/git-shell git

This special shell is called git-shell and comes with the git installation automatically. It only permits git specific commands, such as push and pull, which is sufficient for our purpose. If you now connect to the Pi with the standard SSH command, the connection will be refused:

stefan $ ssh passwords 
Enter passphrase for key '/home/passwort/pi_git_rsa': 

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sun Mar 20 20:48:04 2016 from 192.168.0.13
fatal: Interactive git shell is not enabled.
hint: ~/git-shell-commands should exist and have read and execute access.
Connection to 192.168.0.100 closed.

Firewall

The Uncomplicated Firewall (ufw) is way less complex to set up than classic iptables and provides exactly what the name implies: a simple firewall. You can install and initialize it as follows:

sudo apt-get install ufw # Install it
sudo ufw default deny incoming # Deny all incoming traffic 
sudo ufw allow ssh # Only allow incoming SSH
sudo ufw allow out 80 # Allow outgoing port 80 for the Duck DNS request
sudo ufw enable # Switch it on
sudo ufw status verbose # Verify status


The great tutorials at [Digital Ocean](https://www.digitalocean.com/community/tutorials/how-to-setup-a-firewall-with-ufw-on-an-ubuntu-and-debian-cloud-server) provide more details.

## Conclusion

In this little tutorial, we installed a Git server on a Raspberry Pi Zero (or any other Linux machine) and created a dedicated user for connecting to the service. The user requires a private key to access the service, and the Git server requires key-based logins for users outside the local network. The git user may only use a restricted shell and cannot log in interactively. The password file is encrypted and all versions of the passwords are stored within the git repository.



<div class="twttr_buttons">
  <div class="twttr_twitter">
    <a href="http://twitter.com/share?text=A+Reasonable+Secure%2C+Self-Hosted+Password+Database+with+Versioning+and+Remote+Access" class="twitter-share-button" data-via="" data-hashtags=""  data-size="default" data-url="https://blog.stefanproell.at/2016/03/20/a-reasonable-secure-password-database-with-versioning-and-remote-access/"  data-related="" target="_blank">Tweet</a>
  </div>
  
  <div class="twttr_followme">
    <a href="https://twitter.com/@stefanproell" class="twitter-follow-button" data-show-count="true" data-size="default"  data-show-screen-name="false"  target="_blank">Follow me</a>
  </div>
</div>

 [1]: https://git-scm.com/book/en/v2/Git-on-the-Server-Setting-Up-the-Server

Connecting a Tomato Router with an existing Wifi Network

I recently moved and therefore had to get a new contract with my ISP. Unfortunately, it was not possible to get the Ethernet modem (yet) and simply hook it to my beloved Netgear WNR3500L v2 powered by TomatoUSB. Instead, the provider shipped a Connect Box, which is very convenient to install for basic use, but does not allow you to define anything more complex than port forwarding. The previous wifi router did have the option to act as a simple bridge, but the new model does not come with this option. Fortunately, the Tomato firmware is very flexible and allows connecting as a client to the existing network, providing connectivity to the Ethernet ports and also to virtual wireless networks.

Setting up the Client Mode

The good thing about this approach is that the primary router I got from my ISP does not require any modifications or changes in its settings; the ISP router remains in its original configuration. In this article, the ISP router is denoted as the primary router. All modifications are made on the TomatoUSB router, which is denoted as the secondary router.

Change the IP address of the Tomato Router

First of all, the secondary router needs to have a different IP than the primary router. Make sure that you are connected via Ethernet cable to the secondary router and log in using its original IP address, which is most likely 192.168.0.1. Change the IP from 192.168.0.1 to 192.168.0.2 in the Basic Network Settings. Turn off DHCP, because the primary router already has an active DHCP server. Of course you can also use the secondary router as DHCP server or use static IPs for all devices. The screenshot below shows the network settings.

Reboot the secondary router and log in again using the new IP address 192.168.0.2. Open the same page again and set the gateway and DNS address to the IP of the primary router, as depicted above.

Connect the Secondary Router as Wireless Client with the Primary Router

Remain on the Basic Networking page of the secondary router and set the Wireless Interface to Wireless Ethernet Bridge. This way, the secondary router creates a bridge to the primary router and connects clients on the Ethernet ports to the Internet. Enter the SSID of the primary router and provide the password. If you do not know the SSID by heart, you can use the Wireless Survey in the Tools menu. The screenshot below depicts the settings.

The Routing

The last step ensures that the routing of the traffic works. Change the mode of the secondary device from Gateway to Router and save the settings. The Advanced Routing menu should now look similar to what is shown below.

Virtual Wireless Networking

In the last optional step, you can create additional wifi networks from the secondary router which are accessible for other clients. This can be useful if you need to change the SSID on the primary router or are too lazy to change a lot of settings of your smart TV and other devices which are still looking for the old SSID. Open the Virtual Wireless page in the Advanced menu. You will see the connection with the primary router there and you can add additional networks by adding new virtual interfaces. Make sure that the mode of the wireless network is Access Point. This way you can create additional wifi networks and separate the devices into different zones.

Sources: Youtube

Add Self-Signed Certificates to your local trusted CA

When you use self-hosted services, such as your own Gitlab instance, you probably use self-signed certificates to secure the connection. Self-signed certificates have the disadvantage that browsers and other applications do not trust them and therefore either display error messages or refuse to connect at all. The IntelliJ IDEA IDE, for instance, does not clone from repositories which are protected by self-signed certificates. Instead of disabling SSL certificate verification completely, it is recommended to add your self-signed certificate to the local certificate store.

Ubuntu and other Linux distributions store certification authorities in the file /etc/ssl/certs/ca-certificates.crt. This file is a list of trusted public keys, which is compiled by appending all certificates with the file ending .crt found in the configured certificate directories. Once a CA is in this list, the operating system trusts all certificates which have been signed by this CA. You could simply add the public part of the server certificate to this file, but it might get overwritten once the root CAs are updated. Therefore it is better practice to store your self-signed certificate in the directory /usr/local/share/ca-certificates (as done by the command below) and trigger the re-creation of the ca-certificates.crt file manually.

Fetch the Certificate from your Server

First, retrieve the certificate from your server. The command below trims away the information that is not needed and writes the certificate public key into a new file within the certificate directory. Run this command as root.

echo | openssl s_client -connect $SERVERADDRESS:443 2>/dev/null | openssl x509 > /usr/local/share/ca-certificates/$SERVERADDRESS.crt

Update the Certificate Authority File

Then, by simply running the command `sudo update-ca-certificates`, the certificate will be added and trusted, as a link is created within the directory /etc/ssl/certs. The output of the command should mention that one certificate was added. Note that the command creates a new symbolic link from the file `$SERVERADDRESS.crt` to `$SERVERADDRESS.pem`. This certificate will then be accepted by all applications which utilise the ca-certificates file.
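To verify that the certificate is now actually trusted, a quick check with curl (which uses the ca-certificates bundle) should succeed without certificate warnings:

```
# Fails with a certificate error if the CA file does not contain the certificate
curl -sS https://$SERVERADDRESS > /dev/null && echo "certificate is trusted"
```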

Tracking Changes in Latex – The Easy Way

Latex has many advantages in comparison to WYSIWYG word processors such as Microsoft Word or Libre Office. Being a text-based markup language, Latex allows distraction-free writing in a large selection of editors, it scales easily to large projects with hundreds or thousands of references and images within the text, and it produces very clean layouts which can be exchanged simply by using different document classes. If you use a version control system such as git or svn, versioning also becomes very convenient, as every change can be seen in the text file as a simple diff.

Although I am a big fan of Latex, there are some things where other word processors such as Word do have their advantages as well. Tracking changes is such a feature, which Word solves very nicely and intuitively. Source code management tools such as git and svn can produce a diff of the file and highlight changes, but the changes are not rendered in the document. Tracing changes in a Latex source file is not comfortable, especially for people who are not familiar with the Latex source but should be able to see where changes have been made in a document.

Installing latexdiff

The Perl tool latexdiff provides this missing feature and creates a Latex file as output, which contains the changes made between two revisions of a file. In addition the changes are highlighted in a similar fashion as in graphical text processors. Install the tool with the following apt command:

sudo apt-get install latexdiff

This provides the following tools for various source code management systems, as indicated by the suffix.

latexdiff latexdiff-cvs latexdiff-fast latexdiff-git latexdiff-rcs latexdiff-svn latexdiff-vc

Tracing Changes

After installing the software package, you can start comparing different revisions of the same file. Simply use the following command to highlight changes between two files. Note that this command creates a new Latex document as output, which you then need to compile in order to have a readable PDF with the changes.

# Generate diff with tracked changes
latexdiff Document_Version1.tex Document_Version2.tex > Document_Changes_V1_v2.tex
# Compile the Latex document
pdflatex Document_Changes_V1_v2.tex

This gives the following output, where the blue content has been added from one version to another and the red content has been deleted.

Using Revision Control

If you manage your Latex source with git or any other SCM tool anyways, you can use the same tool for comparing different revisions. Lets have a look at previous revisions in the git log:

stefan@stefan-Linux:~/Gitlab-Workspace/LatexTest$ git log
commit fc894678144569ace27776f82f230cfce0f1f017
Author: Stefan Pröll <sproell@sba-research.org>
Date:   Wed Feb 17 18:40:37 2016 +0100

    revision 2

commit 76bacf74d21f486daa4404a9bf16d2bfd634c38e
Author: Stefan Pröll <sproell@sba-research.org>
Date:   Wed Feb 17 18:36:41 2016 +0100

    revision 1

commit 641053bee12b9e9ecd8312c82925f5a962e5d65a
Author: Stefan Pröll <sproell@sba-research.org>
Date:   Wed Feb 17 17:45:17 2016 +0100

    mandrill init

Suppose we want to track the changes between revision 1 and revision 2; we can use the appropriate latexdiff variant to compare the revisions with each other.

latexdiff-git -r fc894678144569ace27776f82f230cfce0f1f017 -r 76bacf74d21f486daa4404a9bf16d2bfd634c38e Document.tex

Note that in this case we provide the commit hashes for git. The tool automatically writes the output into a new file named after the two commit hashes that we provided. In the case of Subversion, you can also provide the revision id.
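With Subversion, a comparable call could look like this; the revision numbers are placeholders:

```
# Compare revisions 100 and 101 of the document under svn control
latexdiff-svn -r 100 -r 101 Document.tex
```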

Limitations

It is important to understand that latexdiff only considers changes in the printable text. This entails that if you change a picture or even replace an image with a new file (having a new file name), this will not be highlighted.

Data Wrangling with csvkit and SQLite

As mentioned earlier, csvkit is a very convenient tool for handling comma separated text files, especially when they are too large to be processed with conventional spreadsheet applications like Excel or Libre Office Calc. The limits of Office programs can be reached rather easily, especially when dealing with scientific data. Open Office Calc has the following limits:

  • maximum number of rows: 1,048,576
  • maximum number of columns: 1,024
  • maximum number of sheets: 256

Excel also offers 1,048,576 rows but provides 16,384 columns. SQLite in contrast allows 2,000 columns by default and supports, if really needed, up to 32,767 columns when compiled with a specific setting. In terms of row storage, SQLite provides a theoretical maximum of 2^64 (18,446,744,073,709,551,616) rows. This limit is unreachable since the maximum database size of 140 terabytes will be reached first.

The limits we discussed will not be hit by our example of air traffic data, which we obtain from ourairports.com. You can download the sample file, currently describing 47,409 airports in CSV format, from the linked web page.

$: csvstat airports.csv 
  1. id
	<type 'int'>
	Nulls: False
	Min: 2
	Max: 316827
	Sum: 2112054844
	Mean: 44549.6602755
	Median: 23847
	Standard Deviation: 77259.1794792
	Unique values: 47409
  2. ident
	<type 'unicode'>
	Nulls: False
	Unique values: 47409
	Max length: 7
  3. type
	<type 'unicode'>
	Nulls: False
	Unique values: 7
	5 most frequent values:
		small_airport:	30635
		heliport:	9098
		medium_airport:	4536
		closed:	1623
		seaplane_base:	927
	Max length: 14
......

This little command provides us with statistics about the columns in the file. We see that the file offers 18 columns, and we can also immediately see the column types, whether there are null values, and what the 5 most frequent values are. If we are interested in a list of the columns only, we can print them with the following command.

$: csvcut -n  airports.csv 
  1: id
  2: ident
  3: type
  4: name
  5: latitude_deg
  6: longitude_deg
  7: elevation_ft
  8: continent
  9: iso_country
 10: iso_region
 11: municipality
 12: scheduled_service
 13: gps_code
 14: iata_code
 15: local_code
 16: home_link
 17: wikipedia_link
 18: keywords

We can also use the csvcut command for – you guessed it already – cutting specific columns from the CSV file, in order to reduce the size of the file and only retrieve the columns that we are interested in. Imagine you would like to create a list of all airports per region. Simply cut the columns you need and redirect the output into a new file. The tool csvlook provides us with a MySQL-style preview of the data.

$: csvcut --columns=name,iso_country,iso_region  airports.csv > airports_country_region.csv
$: csvlook airports_country_region.csv | head -n 15
|--------------------------------------------------------------------------------+-------------+-------------|
|  name                                                                          | iso_country | iso_region  |
|--------------------------------------------------------------------------------+-------------+-------------|
|  Total Rf Heliport                                                             | US          | US-PA       |
|  Lowell Field                                                                  | US          | US-AK       |
|  Epps Airpark                                                                  | US          | US-AL       |
|  Newport Hospital & Clinic Heliport                                            | US          | US-AR       |
|  Cordes Airport                                                                | US          | US-AZ       |
|  Goldstone /Gts/ Airport                                                       | US          | US-CA       |
|  Cass Field                                                                    | US          | US-CO       |
|  Grass Patch Airport                                                           | US          | US-FL       |
|  Ringhaver Heliport                                                            | US          | US-FL       |
|  River Oak Airport                                                             | US          | US-FL       |
|  Lt World Airport                                                              | US          | US-GA       |
|  Caffrey Heliport                                                              | US          | US-GA       |

We could then sort the list of airports alphabetically in reverse order and write the new list into a new file. We specify the name of the column we want to sort the file by, and we measure the execution time needed by prepending the command time.

$: time csvsort -c "name" --delimiter="," --reverse airports_country_region.csv > airports_country_region_sorted.csv

real    0m1.177s
user    0m1.135s
sys    0m0.042s

A nice feature of csvkit is its option to query CSV files with SQL. You can formulate SELECT queries and it even supports joins and other tricks. Thus you can achieve the same result with just one SQL query.

$: time csvsql -d ',' --query="SELECT name,iso_country,iso_region FROM airports ORDER BY name DESC" airports.csv > sql_airports_country_region.csv

real	0m11.626s
user	0m11.532s
sys	0m0.090s

Obviously, this is not the fastest possibility and may not be suitable for larger data sets. But csvkit offers more: You can create SQL tables automatically by letting csvkit browse through your CSV files. It will try to guess the column type, the appropriate field length and even constraints.

$: csvsql -i sqlite -d ',' --db-schema AirportDB --table Airports airports.csv 
CREATE TABLE "Airports" (
	id INTEGER NOT NULL, 
	ident VARCHAR(7) NOT NULL, 
	type VARCHAR(14) NOT NULL, 
	name VARCHAR(77) NOT NULL, 
	latitude_deg FLOAT NOT NULL, 
	longitude_deg FLOAT NOT NULL, 
	elevation_ft INTEGER, 
	continent VARCHAR(4), 
	iso_country VARCHAR(4), 
	iso_region VARCHAR(7) NOT NULL, 
	municipality VARCHAR(60), 
	scheduled_service BOOLEAN NOT NULL, 
	gps_code VARCHAR(4), 
	iata_code VARCHAR(4), 
	local_code VARCHAR(7), 
	home_link VARCHAR(128), 
	wikipedia_link VARCHAR(128), 
	keywords VARCHAR(173), 
	CHECK (scheduled_service IN (0, 1))
);

$: csvsql -i sqlite -d ',' --db-schema AirportDB --table Airports -u 0 airports.csv > airport_schema.sql

The second command in the listing above simply stores the CREATE TABLE statement in a separate file. We can import this statement by reading the file in SQLite. Change to the folder where you downloaded SQLite3 and create a new database called AirportDB. The following listing contains SQL style comments (starting with --) in order to improve readability.

./sqlite3 AirportDB.sqlite


SQLite version 3.9.2 2015-11-02 18:31:45
Enter ".help" for usage hints.
-- change the column separator from its default | to ;
sqlite> .separator ;
-- read the SQL file we created before
sqlite> .read /home/stefan/datawrangling/airport_schema.sql
-- list all tables
sqlite> .tables
Airports
-- print table schema

sqlite> .schema Airports
CREATE TABLE "Airports" (
	id INTEGER NOT NULL, 
	ident VARCHAR(7) NOT NULL, 
	type VARCHAR(14) NOT NULL, 
	name VARCHAR(77) NOT NULL, 
	latitude_deg FLOAT NOT NULL, 
	longitude_deg FLOAT NOT NULL, 
	elevation_ft INTEGER, 
	continent VARCHAR(4), 
	iso_country VARCHAR(4), 
	iso_region VARCHAR(7) NOT NULL, 
	municipality VARCHAR(60), 
	scheduled_service BOOLEAN NOT NULL, 
	gps_code VARCHAR(4), 
	iata_code VARCHAR(4), 
	local_code VARCHAR(7), 
	home_link VARCHAR(128), 
	wikipedia_link VARCHAR(128), 
	keywords VARCHAR(173), 
	CHECK (scheduled_service IN (0, 1))
);

You can also achieve the same results directly from Bash, simply by piping the SQL file to the database.

$: cat ~/datawrangling/airport_schema.sql | ./sqlite3 AirportDB.sqlite
$: ./sqlite3 AirportDB.sqlite ".tables"
Airports
$: ./sqlite3 AirportDB.sqlite ".schema Airports"
CREATE TABLE "Airports" (
	id INTEGER NOT NULL, 
	ident VARCHAR(7) NOT NULL, 
	type VARCHAR(14) NOT NULL, 
	name VARCHAR(77) NOT NULL, 
	latitude_deg FLOAT NOT NULL, 
	longitude_deg FLOAT NOT NULL, 
	elevation_ft INTEGER, 
	continent VARCHAR(4), 
	iso_country VARCHAR(4), 
	iso_region VARCHAR(7) NOT NULL, 
	municipality VARCHAR(60), 
	scheduled_service BOOLEAN NOT NULL, 
	gps_code VARCHAR(4), 
	iata_code VARCHAR(4), 
	local_code VARCHAR(7), 
	home_link VARCHAR(128), 
	wikipedia_link VARCHAR(128), 
	keywords VARCHAR(173), 
	CHECK (scheduled_service IN (0, 1))
);

We created a complex SQL table definition by automatically parsing a CSV file. This opens up a lot of opportunities, also for Excel spreadsheets and other data available as CSV. The great thing about csvkit is that it supports a large variety of database dialects. You can use the same command by adapting the -i parameter for the following database systems:

  • access
  • sybase
  • sqlite
  • informix
  • firebird
  • mysql
  • oracle
  • maxdb
  • postgresql
  • mssql
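
For instance, generating the same schema in MySQL syntax only requires swapping the dialect parameter (a sketch; everything except the -i flag is unchanged from the command above):

$: csvsql -i mysql -d ',' --db-schema AirportDB --table Airports airports.csv

This prints a MySQL flavoured CREATE TABLE statement, which can be redirected into a file just like the SQLite variant.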

All major systems are supported, which is a great benefit. Now that we have the schema ready, we need to import the data into the SQLite database. We can use the SQLite client to import the CSV file into the database, but suddenly we run into a problem! The 12th column contains boolean values, as correctly identified by the csvkit tool. When we inspect the file again with csvlook, we can see that the column contains ‘yes’ and ‘no’ values. Unfortunately, SQLite does not understand this particular notion of boolean values, but rather expects 0 for false and 1 for true, as described in the data types documentation. We have two options: we could replace the yes and no values with their corresponding integers, for instance with awk:

$: awk -F, 'NR>1 { $12 = ($12 == "\"no\"" ? 0 : $12) } 1' OFS=, airports.csv > airports_no.csv
$: awk -F, 'NR>1 { $12 = ($12 == "\"yes\"" ? 1 : $12) } 1' OFS=, airports_no.csv > airports_yes.csv

Or, much more comfortably, we could again use csvkit, which helps us out and replaces the values automatically. The following command imports the data into our database. As we already created the table in advance, we can skip the table creation step with the appropriate flag.

$: time csvsql --db "sqlite:///home/stefan/datawrangling/AirportDB.sqlite" --table "Airports" --insert airports.csv --no-create

real    0m11.161s
user    0m10.743s
sys    0m0.169s

This takes a little while, but after a few seconds, we have the data ready. We can then open the database and query our Airport data set.

sqlite> SELECT name, iso_country, iso_region FROM Airports ORDER BY name DESC;
Run Time: real 3.746 user 0.087215 sys 0.167298

You can now work with the data in more advanced ways and also utilise database features such as indices in order to speed up processing. If we compare the execution of the same query on the CSV file and within SQLite once more, the advantage becomes much more obvious when we omit the command line output, for instance by only querying the COUNT of the airport names.

$: time csvsql -d ',' --query="SELECT COUNT(name) FROM airports" airports.csv
COUNT(name)
47409

real	0m11.068s
user	0m10.977s
sys	0m0.086s

-- SQLite
sqlite> .timer on
sqlite> SELECT COUNT(name) FROM Airports;
47409
Run Time: real 0.014 user 0.012176 sys 0.001273
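
To illustrate the index idea mentioned above, here is a minimal sketch (the index name idx_airport_name is made up for this example):

sqlite> CREATE INDEX idx_airport_name ON Airports(name);
sqlite> SELECT name, iso_country, iso_region FROM Airports ORDER BY name DESC;

With such an index in place, SQLite can read the names in sorted order directly from the index instead of sorting the whole table for every query.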

Create an ER Diagram of an Existing SQLite Database (or many other RDBMS)

Visualisation helps solving problems and is therefore an important tool in database design. Many database providers offer product specific tools for re-engineering existing schemata, but self-contained, serverless, embedded relational database management systems (RDBMS) such as SQLite often come with little tooling support. The extremely small footprint of SQLite provides a very powerful tool for implementing database driven applications without the hassle of database administration, user privilege management and other demanding tasks that come with more complex systems. There is no workbench-like tool for SQLite, but we can use the open source SchemaCrawler for analysing database schemata and table relationships. The tool provides a plethora of commands and options; in this post we will only cover the diagramming part, which allows creating ER diagrams of the tables.

After downloading and extracting the tool to your local drive, you can find a lot of examples included. The tool can handle SQLite, Oracle, MS SQL Server, IBM DB2, MySQL, MariaDB, Postgres and Sybase database servers and is therefore very versatile. You will need Java 8 in order to run it. Have a look at the script below, which creates a PNG image of the database schema of the Chinook test database.

#!/bin/bash
# The path of the unzipped SchemaCrawler directory
SchemaCrawlerPATH=/media/stefan/Datenbank/schemacrawler-14.05.04-main
# The path of the SQLite database
SQLiteDatabaseFILE=/media/stefan/Datenbank/ChinookDatabase1.4_CompleteVersion/Chinook_Sqlite.sqlite
# The type of the database system.
RDBMS=sqlite
# Where to store the image
OutputPATH=/media/stefan/Datenbank/ER.png
# Username and password need to be empty for SQLite
USER=
PASSWORD=

java -classpath $(echo ${SchemaCrawlerPATH}/_schemacrawler/lib/*.jar | tr ' ' ':') schemacrawler.Main -server=${RDBMS} -database=${SQLiteDatabaseFILE} -outputformat=png -outputfile=${OutputPATH} -command=graph -infolevel=maximum -user=${USER} -password=${PASSWORD}

The SchemaCrawlerPATH variable contains the path to the directory where we unzipped the SchemaCrawler files to. This is needed in order to load all the required libraries into the classpath below. We then specify the SQLite database file, define the RDBMS and provide an output path where we store the image. Additionally, we provide an empty user name and password combination. SQLite does not provide user authentication, thus those two parameters need to be empty, SchemaCrawler simply ignores them. Then we can execute the command and the tool generates the PNG of the ER diagram for us.

You can also find a lot of examples online, which give you a good overview of the features of this tool. One of the main purposes of SchemaCrawler is to generate diffable text outputs of database schemata. In combination with a source code version management tool such as Git or Subversion, you can create clean and usable reports of your databases and keep track of the changes there. You can retrieve an overview of the options with the following command.

java -classpath $(echo ${SchemaCrawlerPATH}/_schemacrawler/lib/*.jar | tr ' ' ':') schemacrawler.Main -?

You can also generate HTML reports with the following command:

java -classpath $(echo ${SchemaCrawlerPATH}/_schemacrawler/lib/*.jar | tr ' ' ':') schemacrawler.Main -server=${RDBMS} -database=${SQLiteDatabaseFILE} -outputformat=html -outputfile=report.html -command=details -infolevel=maximum -user=${USER} -password=${PASSWORD}

Other available output formats are plain text, CSV or JSON.
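
Since diffable text reports were mentioned above, here is a sketch of how such a report could be generated for version control (reusing the variables from the script above; the output file name and the chosen infolevel are arbitrary):

java -classpath $(echo ${SchemaCrawlerPATH}/_schemacrawler/lib/*.jar | tr ' ' ':') schemacrawler.Main -server=${RDBMS} -database=${SQLiteDatabaseFILE} -outputformat=text -outputfile=schema_report.txt -command=schema -infolevel=standard -user=${USER} -password=${PASSWORD}

The resulting plain text file can be committed to Git or Subversion, so that later runs produce clean diffs of any schema changes.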

Importing your Repositories from Github to your Gitlab Instance

Github is a great service and offers a free micro account for students, which includes five private repositories. Still, this might not be enough private repositories for some of us, especially if you start a lot of side projects (which hardly ever get finished :-)) and want to keep your code and data organised, versioned and collaborative, but non-public (because it is not finished yet :-)). For this and other reasons I am also running a private Gitlab instance, which basically offers the same features as Github and can be hosted on your own server.

I still use Github for some projects, which means that I continuously run out of private repositories there as only five are included in my plan. Now Gitlab (the private instance) offers a great feature, which allows one to hook the Github account into the Gitlab instance and import code hosted in private repositories at Github into your Gitlab instance, where no limits for private repositories apply. Obviously the repositories could also be transferred manually, but the import feature via the Web interface is very convenient.

Installing the oAuth feature for Github and other services is very simple and is described here for Github and there in general. Read both documentation pages carefully, as using the wrong settings can lead to serious security issues. Even after following the instructions carefully, the warning that oAuth needs to be configured kept popping up in the Gitlab dashboard:

To enable importing projects from GitHub, as administrator you need to configure OAuth integration.

The final hint was given in this SO post here. The official documentation failed to mention that changes to the configuration file must be applied by reconfiguring Gitlab. The following list of steps should lead to success:

  1. Login into Github and open your account settings
  2. Select Applications on the left and choose the Developer Applications tab
  3. Register a new application by providing the name (e.g. Gitlab Access), a homepage (can be anything), a description (optional) and the callback URL (e.g. https://gitlab.example.org).
  4. Add the initial OmniAuth configuration as described in the official documentation.
  5. Provide the settings for Github (a sketch of the relevant configuration block is shown after this list).
  6. Save the file and execute the following two commands.
sudo gitlab-ctl reconfigure
sudo gitlab-ctl restart
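
For step 5, the relevant block in /etc/gitlab/gitlab.rb could look roughly like the following sketch (the app_id and app_secret placeholders stand for the values Github generated for your application; check the official OmniAuth documentation for the exact keys of your Gitlab version):

# Enable OmniAuth and register Github as a provider
gitlab_rails['omniauth_enabled'] = true
gitlab_rails['omniauth_providers'] = [
  {
    "name" => "github",
    "app_id" => "YOUR_GITHUB_APP_ID",
    "app_secret" => "YOUR_GITHUB_APP_SECRET",
    "args" => { "scope" => "user:email" }
  }
]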

The first time you select “Import project from Github”, you will be redirected to the Github authentication page, where you need to grant permission for the Gitlab user to retrieve the repositories. After you imported the private repositories into Gitlab, you can delete them from Github and reclaim some free space.

A quick fix for the ‘ascii’ codec can’t encode character error in Python and the csvkit tool

For a current project I need to migrate large volumes of CSV data into a relational database management system. The Python driven csvkit is the swiss army knife of CSV tools and very handy for this purpose. However, a few CSV files caused trouble when I tried to pipe the SQL CREATE statements generated with the csvkit tool into a file.

csvsql -i sqlite -d ';' -e 'utf8' --db-schema test_schema --table test_table inputfile.csv > output.sql
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 46: ordinal not in range(128)


Adding the --verbose flag gives more clarity:

csvsql -i sqlite -d ';' -e 'utf8' --db-schema test_schema --table test_table inputfile.csv > output.sql
Traceback (most recent call last):
  File "/usr/local/bin/csvsql", line 9, in <module>
    load_entry_point('csvkit==0.9.1', 'console_scripts', 'csvsql')()
  File "/usr/local/lib/python2.7/dist-packages/csvkit/utilities/csvsql.py", line 161, in launch_new_instance
    utility.main()
  File "/usr/local/lib/python2.7/dist-packages/csvkit/utilities/csvsql.py", line 134, in main
    self.output_file.write('%s\n' % sql.make_create_table_statement(sql_table, dialect=self.args.dialect))
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 46: ordinal not in range(128)

Although the input file is already UTF-8 encoded, the output is written with the default ASCII codec, which causes an error as soon as umlauts or other non-ASCII characters are included.
Adding the following code after the import statements at the top of the csvsql.py file sets the correct default encoding for the output in this Python 2 script.

# Force Python 2 to use UTF-8 instead of ASCII as its default string encoding
reload(sys)
sys.setdefaultencoding('utf-8')

This is rather a quick fix than an elegant solution, but it allows me to continue with my work.

Replacing the UPC Wireless Cable Router with Your Own WLAN Router

After the power supply of my UPC modem gave up the ghost, I made the pilgrimage to the UPC store at Keplerplatz and had the whole device exchanged. The exchange itself was straightforward, but unfortunately, instead of the traditional modem, I finally ended up with a Wifi router, something I had always resisted. After all, I run a lovingly configured Netgear WNR 3500L with the alternative firmware TomatoUSB and had no desire to recreate my setup on the UPC device and give up features. Using your own router is not a problem as such and is also described in the UPC FAQ.

The Switch Variant

The UPC router can also be used as a switch. In this mode a bridge can be established between the two devices, and the UPC device forwards all requests directly to your own router. Explicit port forwarding on the UPC device is not necessary in this variant; it is only configured on your own router. To activate this mode, we select the corresponding option in the UPC system settings and set it to bridge mode. Afterwards we restart the UPC router, which then receives the IP address 192.168.100.1. We now enter this IP address for the modem in the TomatoVPN interface, as the following figure shows:

After you have activated bridge mode, you can no longer reach the UPC router from the network. The administration interface is only accessible from a PC connected directly to the UPC router. More information about bridge mode can be found here.

The Router Variant

By default the UPC device works as a router. In order to use two routers, all ports also have to be forwarded between the two routers. First, the WLAN module has to be deactivated as described. Then we turn to the local network settings. Unfortunately, the instructions are not entirely complete: they forget to mention that the DNS server entry also has to be set to the class A IP address 10.0.0.1 (i.e. the IP of the UPC router). With a wrong DNS server IP address, host names obviously cannot be resolved and surfing is only fun for those with an excellent memory for IP addresses. The following screenshot shows the working settings.

The diagram of the small home network is shown in the following listing; I created it with ASCII-Flow.

                                                                          +------------+
                                                                          |            |
                                                                        +-+  Android   |
+-------------------+    +------------------+       +---------------+   | |            |
|                   |    |    UPC Router    |       |Netgear Router |   | +------------+
|      Internet     +----+   IP 10.0.0.1    +-------+IP 192.168.0.1 +---+
|                   |    |WAN 234.124.122.13|       | WAN 10.0.0.2  |   | +------------+
+-------------------+    +------------------+       +---------------+   | |            |
                                                                        +-+   Desktop  |
                                                                          |            |
                                                                          +------------+

If you have opened ports via port forwarding, you of course have to forward them from the UPC router to your own router first, e.g. port 443 to the target machine 10.0.0.2.

Migrating Markup Languages

There exists a broad range of different markup languages which allow you to structure, highlight and format your documents nicely. Different wiki platforms utilise various markup languages for rendering texts; Github and other code sharing platforms use their own dedicated languages for their documentation files. These markup languages, such as Textile or Markdown, work rather similarly, as they provide users with a basic set of formats and structural elements for text. In order to avoid vendor lock-in, or when you need to move to a different system, you will need to extract the information from the old markup and make it available in a different language. Copying the rendered text is insufficient, as the semantics contained in the markup are lost. Therefore you need to apply a tool such as pandoc for this task. The pandoc website says that the tool

can read Markdown, CommonMark, PHP Markdown Extra, GitHub-Flavored Markdown, and (subsets of) Textile, reStructuredText, HTML, LaTeX, MediaWiki markup, TWiki markup, Haddock markup, OPML, Emacs Org mode, DocBook, txt2tags, EPUB, ODT and Word docx; and it can write plain text, Markdown, CommonMark, PHP Markdown Extra, GitHub-Flavored Markdown, reStructuredText, XHTML, HTML5, LaTeX (including beamer slide shows), ConTeXt, RTF, OPML, DocBook, OpenDocument, ODT, Word docx, GNU Texinfo, MediaWiki markup, DokuWiki markup, Haddock markup, EPUB (v2 or v3), FictionBook2, Textile, groff man pages, Emacs Org mode, AsciiDoc, InDesign ICML, and Slidy, Slideous, DZSlides, reveal.js or S5 HTML slide shows. It can also produce PDF output on systems where LaTeX or ConTeXt is installed.

Pandoc’s enhanced version of Markdown includes syntax for footnotes, tables, flexible ordered lists, definition lists, fenced code blocks, superscripts and subscripts, strikeout, metadata blocks, automatic tables of contents, embedded LaTeX math, citations, and Markdown inside HTML block elements.

Install the tool from the Ubuntu repositories:

sudo apt-get install pandoc

You then can convert or migrate between the available formats, for instance from markdown to textile (i.e. Github to Redmine):

pandoc -f markdown_github -t textile inputFile.markdown -o outputFile.textile

For migrating your Github wiki pages, for instance, you can check out the wiki from Github and convert all markdown documents in a loop:

# clone repository
git clone https://gitlab.example.org/user/project.wiki.git
cd project.wiki

for fileName in *.markdown; do
    # Remove the extension markdown and replace it with textile
    pandoc -f markdown_github -t textile "$fileName" -o "${fileName%.markdown}.textile"
done