Home      --      UNIX

How to use MySQL to keep track of your local files

Obtaining MySQL -- Setting up the server -- Starting the server -- Configuring permissions -- The file database

Having downloaded plenty of interesting and informative stuff from the web over the years, one day I realised I had no idea what I had. I kept on finding again, after some searching, web pages which I later saw I had downloaded earlier. I do have a systematic directory structure for the files I download - but not all fit neatly into one category, and categorising and sub-categorising gives you only a one-dimensional criterion for finding them. The solution: I set up an SQL database containing the names and location of the files as well as several keywords which I can search for. This page describes how to do this.

Obtaining the software

A free SQL implementation exists in MySQL (www.mysql.com), which is already included in some Linux distributions. To be able to use my Perl scripts (see below), you also need the Perl package DBD-MySQL, available from www.cpan.org.

Setting up the SQL server

To have a database which only one user needs and is allowed to access is something of an abuse of MySQL. Therefore the steps to set up a database are a little different than is described in the MySQL documentation.

As a first step it is recommendable to create a new folder in which the database (and auxiliary files) will reside.

mkdir mysql

Before you can start the server, you have to create the grant tables, tables in database which tell the server who (that is, which MySQL user) is allowed to do what. You do that with the following command:

mysql_install_db --user=out --datadir=/home/out/mysql/

"out" is the (Linux) user with which I am running this example; you have to replace it by your own user ID. Also replace the mysql directory if you named it differently or placed it somewhere else. Then you start the server with the command mysqld_safe:

mysqld_safe --user=out --datadir=/home/out/mysql/ --skip-networking \
	    --socket=/home/out/mysql/mysqld.sock  &

mysqld_safe is a wrapper for mysqld which does some additional logging and adds some safety features. The option --skip-networking disables all network access to the database, which you don't need if you are only using it locally. Finally, the --socket option gives the filename of the UNIX socket through which the database can be accessed. This should also be in your local SQL directory, as the default location is in the global /tmp directory and might clash with any other SQL server possibly running on the same machine.

Starting the server automatically

Before we proceed to configuring MySQL users and actually creating the tables in the database, let's make a short detour which will make our life easier in the future: You wouldn't want to start the MySQL daemon by hand every time you want to use the database; rather, you want it to be started before you get round to using it. If you are the system administrator, you could put the command mysqld_safe into one of the init scripts; but if you are not, you can still make the server daemon start automatically when you login. Just put the following lines into your .bashrc or .xsession:

if ps -u out | grep mysqld &> /dev/null ; then
  echo > /dev/null
else
  mysqld_safe --user=out --datadir=/home/out/info/infodb --skip-networking \
                --socket=/home/out/info/infodb/mysqld.sock &> /dev/null &
fi

The if command checks whether there is already a daemon running under our user; if not, it is started.

Putting the code into your .bashrc has the advantage that you have access to the database even from the text consoles. However, if some graphical application should be able to access it, the server should be started in the .xsession, or it won't be available before you open an xterm. If your .xsession opens an xterm anyway, you have the best of both worlds if you put the server startup code into the .bashrc.

Configuring users and permissions

At this point, anybody could connect to your SQL server and do nasty things with your data - erase everything, deny you access, anything. The first order of the day is to change the password of the MySQL root user. As on UNIX/Linux systems generally, root is the administrator, but the MySQL root user is distinct from your system administrator account - you can administer your privately created database without having the rights to administer the system. The easiest way to change the MySQL root password is the following:

mysqladmin -u root -S /home/out/mysql/mysqld.sock flush-privileges password <password>

Now you can connect to the MySQL server as root only with the password:

mysql --safe-updates -u root -S /home/out/mysql/mysqld.sock -p

If you enter the command as given, you are prompted for the password. You could also have given it on the command line, immediately after the -p, without a space. The option --safe-updates is highly recommended unless you use SQL daily; it blocks wholesale deletions and updates by requiring you specify a row uniquely for such operations.

After logging in, you see the MySQL prompt. You can now navigate around the databases your server handles with SQL commands. As the full syntax of SQL is beyond the scope of this web page, you have to go elsewhere to learn all of it. For instance, you can consult the MySQL documentation. We will need only a few commands here. The most important thing to remember is that (almost) all of them have to be concluded by a semicolon. MySQL uses the readline library (or a close copycat), so you can use the same advanced keycodes as on the shell command line.

As I already hinted, a MySQL server can handle several databases, all of which can contain several tables. You can list the databases with the SQL command:

show databases;

Don't forget the semicolon; if you omit it, MySQL will keep on prompting you for it. The resulting printout, which I do not give here, contains a list of tables in a nice ASCII frame notation. The database called "mysql" contains administrational information in the so-called grant tables. Before you can do anything with a database, you have to tell the server that you want to use it. Then you can display a list of its tables with the show command.

use mysql;
show tables;

The table named "user" contains MySQL users, their passwords and their permissions. How do you display it? There is no command called "show entries". Rather, the select command allows you to tell the server which entries you want to display. To display the whole "user" table, enter the following:

select * from user;

You will notice immediately that this was a bad table to try this out on: It has more than 30 columns, and the nice ASCII frame printout of MySQL will not help you unless you have two 1920 by 1280 displays side by side and stretch your xterm across both of them. To selectively print some columns, replace the asterisk in the above command by a comma-separated list of column names. These are given in the title row of the previous printout and are case insensitive.

select host, user, password from user;

Now this is much more readable. There are two pairs of rows: One for the root user and one for a user with an empty user name on the host "localhost", and the same again for the actual hostname I set for my box. The "empty" user serves as an anonymous user account, which allows anybody to connect anonymously, and in this case without a password. Though these anonymous accounts have no permissions, we can remove them and create a named user account instead:

delete from user where user='' and host='localhost';
delete from user where user='' and host='<hostname>.<domainname>';
flush privileges;
grant select, insert, update, delete on *.* to 'out'@'%' identified by '<password>';

In these statements, <hostname> should be replaced by your computer's hostname, <domainname> (so that <hostname>.<domainname> is the host field entry of the second anonymous user account). The command flush privileges tells the server to reread the grant tables which were just modified. The last command creates a new user account by granting that user privileges to execute the commands select, insert, update and delete. out should be replaced by your user name of choice (your system user ID might be simplest), and <password> by your password. By giving % as the host name, the new user is allowed to connect from any host, but as we started the MySQL server with the option --skip-networking, all connection are restricted to the local machine anyway. One might equally have given localhost as the host. The grant statement requires no "flush privileges".

Which permissions the non-root user should be granted in our case is debatable: As this is your private SQL server anyway, the only distinction between root and the other user is that the latter will be used by scripts which allow you to comfortably access the database. Otherwise, it would make sense to give the user permission to create, alter and perhaps drop tables. See the MySQL documentation about that.

There is one last thing to do, namely to set the password for the second root account, the host of which is set to our explicit host name. At least on my system, its password was not set by our mysql_admin command above. We do this by explicitly changing the user table:

update user set password = password('<password>') where user = 'root' and host = '<hostname>.<domainname>';
flush privileges;

Building a database of local files

Creating the database and table

Now we can go about the task we set up the MySQL server for: Creating a database of certain local files, with keywords indicating their contents. First we have to create a new database and a table containing the information we want to store. As I didn't give our normal user the permission to create databases and tables, we have to connect to the MySQL server as root:

mysql --safe-updates -u out -S /home/out/mysql/mysqld.sock -p

Enter the password when prompted. Then you can create the database and a table in the new database with the following SQL commands:

create database infodb;
use infodb;
create table files (name varchar(80), path varchar(255), keywords varchar(255));

These commands create a new database called "infodb", tell the server to interpret all following commands as pertaining to this database, and to create a new table named "files" in that database. The list in parentheses after the name of the new table contains the column titles and types. A varchar(..) is a character string of variable length. To view the result of these commands, type:

show databases;
show tables;
describe files;

The first two commands we have encountered already, and the last outputs the column names and types of the table "files". We can now disconnect from the MySQl server, as we do not need the root user for adding, changing or deleting data in the table. (In fact, the last three commands could have been executed as any user.)

Filling the table by hand

To fill our new table with entries, we can act as the "normal", non-root MySQL user. First we connect to the server and select the right database using one shell command:

mysql --safe-updates -u out -S /home/out/mysql/mysqld.sock -p infodb

Note that infodb is not the password, but the name of the database to use. If you want to give the password on the command line, it has to follow -p immediately without a space. (This is not perfectly safe, but safer than it may seem: mysql overwrites the password in the command line after using it, so there is only a short time during which it is visible to programs listing running processes like ps.)

From the MySQL prompt you can fill a table row by row with the insert command:

insert into files values ('information.html', '/home/out/some/path', 'very, intersting, info');
select * from files;

The second command prints out the table including the new row.

Now if you had to fill the whole table in this way, you would be busy for weeks if you have a fair number of files to categorise. The load data command (see the MySQL documentation) won't help you either, unless you already have a table containing the information you want to store in the database. Wouldn't it be nice if you could just work from the shell and put a file into the database by giving its name relative to the current working directory and some keywords to go with it? That is what the Perl scripts down below do.

Helper scripts

So far we have set up an SQL server and created a database and table to store file names and paths together with keywords indicating their content. Let us recapitulate what we want to do with it. After downloading a web page or document about an interesting topic, we want to add it to the database so we can find it later more easily. We want to be able to search for files about a certain topic by listing all files in the database associated with a certain keyword. We might also want to add more keywords later. At some point we will delete files which no longer interest us and want to remove them from the database. And we might want to reorganise the directories where we store downloaded documents and files and want the file paths in the database reflect the fact that we moved files and directories around.

I wrote a couple of Perl scripts which perform all these tasks. They are described in the following paragraphs. In order to make them easier to adapt to your system and directory structure, they have a number of parameters which are assigned to local variables at the top of the scripts: $basedir is the top directory inside which all the files handled by the database reside. The paths in the database start at $basedir, which keeps paths from getting longer than necessary. Attempts to apply the scripts to files outside $basedir will be rejected. The variables $socket, $database and $table contain the location of the MySQL server socket, the name of the database and the table, respectively. The variable $sqluser is assigned the user ID of your system account, under the assumption that the non-root SQL user has the same name. If not, you have to change that assignment. The $pass variable contains your MySQL password. After downloading the scripts, you should deny read access to them by chmod go-rx *idb, or others will be able to read the SQL user name and password.

Now here is a brief description of the scripts: addidb adds a file name, path and keywords to the "files" table in the database infodb. Its arguments are a file name and one or several keywords. It converts the file name into an absolute path if necessary, splits path and file name and puts these and a comma-separated list of the keywords given into the table of the SQL database. If the option -f is given (before the file name), the keywords are replaced if an entry for that file already exists, otherwise they are added to those present. Directories can also be added to the database, and get a slash "/" as a filename.

The second script, searchidb, searches the "files" table for files associated with given keywords. Strictly speaking, the keyword column of the table is searched using regular expressions, which means (1) that you can give regular expressions as arguments to searchidb and (2) that files whose keywords contain a word you give will also be found. If multiple keywords are passed to searchidb, only files which match all of them are found.

The remaining two scripts are delidb and mvidb. They delete and move files in the database, respectively. Both expect the respective action already to have occurred in the filesystem, though delidb only warns if the given file still exists. All the scripts output a brief usage message when called without arguments.

These scripts, and the structure of the SQL table we created above, are of course only one way of solving our problem. If you have different needs or preferences, or want to do something quite different with MySQL, feel free to take and modify my scripts.