Chapter 2. Tigase DB Schema Explained

Artur Hefczyc <> v2.0, June 2014: Reformatted for AsciiDoc. :toc: :numbered: :website: :Date: 2011-07-12 00:56

The schema basics, how it looks like and brief explanation to all rows can be found in the schema creation script. However, this is hardly enough to understand how it works and how access all the data. There are only 3 basic tables which actually keep all the Tigase server users' data: tig_users, tig_nodes and tig_pairs. Therefore it is not clear at first how the Tigase data is organised.

Before you can understand the Tigase XMPP Server database schema, how it works and how to use it, is it essential to know what were the goals and why it works that way. Let’s start with the API as this gives you the best introduction.

Simplified access can be got through methods:

void setData(BareJID user, String key, String value);
String getData(BareJID user, String key);

And more complex version:

void setData(BareJID user, String subnode, String key, String value);
String getData(BareJID user, String subnode, String key, String def);

Even though, the API contanins more methods, the rest is more or less variation of presented above. Complete API description for all access methods is available in JavaDoc documentation to UserRepository interface. So we are not going into much details here except the main idea.

We are more or less operate on <*key*, value> pairs for the particular user. The idea befind this was to make the API very simple and also at the same time very flexible, so adding a new plugin or component would not require database schema change, adding new tables, conversion of the DB schema to a new version, etc…​.

As a result UserRepository interface is exposed to all the Tigase code, mainly components and plugins (let’s call all of them modules), and these modules simply call set/get methods to store or access module specific data.

As plugins or components are developed independently it may easily happen that developer choses the same key name to store some information. To avoid key name conclicts in the database a node concept has been introduced. Therefore, most modules when set/get key value they also provide a subnode part, which in most cases is just XMLNS or some other unique string.

The node thing is a little bit like directory in a filesystem, it may contain subnodes which makes the Tigase database kind of hierarchical structure. And the notation is also similar to filesystem. You use just '/' to separate node levels. In practice you can have database organised like this:

user-name@domain  --> (key, value) pairs
               roster -->
                     item1 --> (key1, value1) pairs.
                     item2 --> (key1, value1) pairs.

So to access item’s 1 data from the roster you could call method like this:

getData("user-name@domain", "roster/item1", key1, def1);

This is huge convenience for the developer, as he can focus ont he module logic instead of worrying about data storage implementation and organisation. Especially at prototypic phase it speeds development up and allows for a quick experiments with differnent solutions. In practice, accessing user’s roster in such a way would be highly inefficient so the roster is stored a bit differently but you get the idea. Also there is a more complex API used in some places allowing for more direct access to the database and store data in any format optimised for the particular use case.

Right now such a hierarchical structure is implemented on top of SQL databases but initially Tigase’s database was implemented as XML structure, so it was natural and simple.

In the SQL database we simulate hierarchical structure with three tables:

  1. tig_users - with main users data, user id (JID), optional password, active flag, creation time and some other basic properties of the account. All of them could be actually stored in tig_pairs but for performance reasons they are in one place to quickly access them with single, simple query.
  2. tig_nodes - is a table where the hierarchy is implemented. When Tigase was storing data in XML database the hierarchy was quite complex. However, in SQL database it resulted in a very slow access to the data and now more flat structure is used by most components. Please note, every user’s entry has something called root node, which is represented by root string;
  3. tig_pairs - this is the table where all the user’s information is stored in form of the <key, value> pairs.

Ok, so we now know how the data is organised. Now we are going to learn how to access the data directly in the database using SQL queries.

Let’s assume we have a user admin@test-d for whom we want to retrieve the roster. We could simply execute query:

select pval
  from tig_users, tig_pairs
  where user_id = 'admin@test-d' and
        tig_users.uid = tig_pairs.uid and
        pkey = 'roster';

However, if multiple modules store data under the key roster for a single user, we would receive mutliple results. To access the correct roster we have to know also node hierarchy for this particular key. The main user’s roster is stored under the root node, so the query would look like:

select pval
  from tig_users, tig_nodes, tig_pairs
  where user_id = 'admin@test-d' and
            tig_users.uid = tig_nodes.uid and
            node = 'root' and
            tig_users.uid = tig_pairs.uid and
           pkey = 'roster';

How exactly the information is stored in the tig_pairs table depends on the particular module. For the roster it looks a bit like XML content:

<contact jid="all-xmpp-test@test-d" subs="none" preped="simple" name="all-xmpp-test"/>