Tuesday, May 14, 2013

Semantic Desktop: Akonadi and Nepomuk

The idea of taking the myriad kinds of information stored on a computer and finding the relationships among them, so that the information becomes more usable, has been around for a long time. "Semantics", the dictionary tells us, "is the study of meaning". The goal of a "semantic desktop" is to take all the bits and pieces of information we as users collect over time and make them more meaningful, and ultimately more useful.

Akonadi

The Akonadi Framework was created as one piece in an effort to realize a semantic desktop. It's basically a service for collecting, storing and retrieving personal information management (PIM) data. This is actually harder than it would seem. Most PIM applications, like calendars, e-mail, address books, journals, notebooks and the like, traditionally use their own file types to store their information. Compounding the problem are all the web-based PIM applications people use today. And if that weren't difficult enough, there are also the constantly changing APIs (Application Programming Interfaces) used by some of these. Google is a perfect example of this: they've recently announced they're dropping support for CalDAV. It's one of the reasons Akonadi has received so much unfair criticism: it's extremely hard to create something as complex as Akonadi on the constantly shifting sands of APIs.
Despite these hurdles, Akonadi has come a long way in its development. Most of the KDE Plasma Desktop PIM applications now make use of Akonadi, as do several of the Plasma Desktop widgets. Contrary to a common misconception, PIM data is kept in its native file formats, or on the remote server in the case of web-based and groupware data. Akonadi only pulls the important, frequently used bits out and places them in a database cache for quick, unified access. This information is also handed off to Nepomuk for the real semantic work.
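To make the caching idea a little more concrete, here's a toy sketch in Python. It is not Akonadi's actual code or API; the class names, the sources and the sample entries are all made up for illustration. The point is only that the originals stay where they are while a small, uniform summary of each item lands in one searchable place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedItem:
    """One cached summary; the original data stays at `source` (hypothetical)."""
    source: str   # where the original lives (file or remote server)
    kind: str     # "event", "mail", "contact", ...
    summary: str  # the frequently used bit pulled out for quick access


class PimCache:
    """Toy stand-in for a unified cache; not the real Akonadi interface."""

    def __init__(self):
        self._items = []

    def add(self, source, kind, summary):
        self._items.append(CachedItem(source, kind, summary))

    def search(self, text):
        # Case-insensitive substring match across every kind of item.
        needle = text.lower()
        return [item for item in self._items if needle in item.summary.lower()]


cache = PimCache()
cache.add("calendar.ics", "event", "Budget meeting with John Doe")
cache.add("imap://mail.example.org/INBOX", "mail", "Follow-up from John Doe")
cache.add("contacts.vcf", "contact", "Jane Roe")

hits = cache.search("john doe")
kinds = sorted(item.kind for item in hits)
```

One search reaches the calendar entry and the e-mail alike, even though they come from completely different places.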
The advantages of Akonadi for PIM users are great. Let's say you want to know when you last had a meeting with John Doe. You could of course open your calendar app and search for John Doe. Akonadi will enable you to do much more with less effort. Instead of opening your calendar application, you'll just open KRunner and search for John Doe, which will return not only your last meeting with him, but also any other meetings, any mails sent between you, any to-dos you might have had concerning him, and his contact information (IM, e-mail, address, phone, etc.): essentially anything you have in your PIM concerning John Doe, as well as any documents and other files that mention him. By searching for when you last had a meeting with John Doe, you just may be reminded that he sent you a follow-up e-mail you haven't read yet. Uh-oh, good thing you didn't just search your calendar. Akonadi's not quite there yet, but it's really close.
In our hypothetical example above, Akonadi was responsible for pulling together all the information from the various places John Doe is referenced in the PIM. It also allows the various PIM applications, as well as some Plasma Desktop widgets, to share this information. Creating the relationships between the data and returning the search results to KRunner, though, was the work of the other half of the semantic desktop duo, Nepomuk.

Nepomuk

I have a habit, as I think most of us do, of putting files in "relatable" folders. Got pictures from the fishing trip last year? Create a folder called something like "Fishing Trip 2012" and put all the pictures there. Got a project with lots of files? I'd create a folder with a short but (hopefully) meaningful name. This is what people have done since the creation of computers with disks. It works... kind of. But it has a couple of major drawbacks. With modern digital cameras, video recorders, phones and other devices, and the abundance of excellent media applications available, we're storing a lot of files with names like "DSC00023.jpg". That's not a lot of help when you're writing to a friend, bragging about a whale-sized trout you caught, and want to include the picture. Since it was only last year, I'd remember which folder it was in, and after browsing with another program for a bit I'd find it. But that took me away from my e-mail and possibly my train of thought: I had to open another application, navigate to the needed folder, etc. And what if the picture I was looking for was taken years ago? I'd probably be looking through folders for quite a while.
Add to that the fact that many of us also copy lots of information from the web - text clippings, media files, PDFs, whole web pages, and even more images. It's a lot of files to try to keep track of. The old folder naming and hierarchy strategy I've used becomes too difficult to manage, and finding that picture of the whale-sized trout, or the recipe for apple turnovers clipped from the web, becomes difficult and time-consuming.
Nepomuk was created as an answer to this problem, and more. Nepomuk, with its associated libraries and utilities, pulls information in the form of metadata from files and creates a searchable database of that information. Added to that is the ability for users to create their own metadata in the form of tags, ratings and comments. With Nepomuk, what a file is (its contents and its metadata) is more pertinent than where it is. With Nepomuk it's feasible to place all of a user's files in one folder, like Documents, and still find them quickly when needed, based simply on some known trait like file type, date, tag, comment, rating, contents or a metadata value like a camera model, video length or document creator.
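Here's a small Python sketch of what "finding by trait instead of by folder" can look like. It has nothing to do with Nepomuk's real libraries; the record fields, the `find` helper and the sample files are assumptions invented for the example. Every file sits in the same Documents folder, and the search only looks at tags, ratings and metadata values.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical index entry: extracted metadata plus user-supplied tags."""
    path: str
    mimetype: str
    tags: set = field(default_factory=set)
    rating: int = 0
    metadata: dict = field(default_factory=dict)


def find(records, tag=None, min_rating=0, **metadata):
    """Return paths of records matching every given trait, ignoring location."""
    results = []
    for record in records:
        if tag is not None and tag not in record.tags:
            continue
        if record.rating < min_rating:
            continue
        if any(record.metadata.get(key) != value for key, value in metadata.items()):
            continue
        results.append(record.path)
    return results


records = [
    FileRecord("Documents/DSC00023.jpg", "image/jpeg", {"fishing", "trout"}, 5,
               {"camera_model": "ExampleCam 100"}),
    FileRecord("Documents/DSC00024.jpg", "image/jpeg", {"fishing"}, 2,
               {"camera_model": "ExampleCam 100"}),
    FileRecord("Documents/turnovers.pdf", "application/pdf", {"recipe"}, 4),
]

trout_photos = find(records, tag="trout", min_rating=4)
same_camera = find(records, camera_model="ExampleCam 100")
```

The unhelpful name "DSC00023.jpg" no longer matters: asking for highly rated files tagged "trout" is enough to land on the right picture.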
However, Nepomuk's database isn't of the kind that programs like Beagle, Tracker or Recoll create and use. Those programs use ones that are much like what we think of when we hear "database": information gathered and stored in a searchable table. Nepomuk handles and stores its data based on "ontologies". In a nutshell, what ontologies do is create relationships between data, very much like the human brain does. This is a far more complex problem than creating a simple database, and the difficulty of solving it is one of the reasons Nepomuk gained a bad, though mostly deserved, reputation. "Memory-hungry" and "CPU-intensive" were the usual labels assigned by those users who were willing to try it. That's not the case anymore.
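A rough way to picture the difference from a plain table is to store small statements of the form "this thing, related in this way, to that thing". The Python sketch below is only an illustration of that general idea; the relationship names ("depicts", "sentBy" and so on) and the sample data are invented here and are not terms from Nepomuk's actual ontologies.

```python
# Each statement is (subject, relationship, object). All names are made up.
triples = {
    ("DSC00023.jpg", "depicts", "John Doe"),
    ("DSC00023.jpg", "takenDuring", "Fishing Trip 2012"),
    ("mail-42", "sentBy", "John Doe"),
    ("mail-42", "mentions", "Fishing Trip 2012"),
    ("John Doe", "hasEmail", "john@example.org"),
}


def neighbours(resource):
    """Return everything directly connected to the resource, in either direction."""
    found = set()
    for subject, _relationship, obj in triples:
        if subject == resource:
            found.add(obj)
        elif obj == resource:
            found.add(subject)
    return found


about_john = neighbours("John Doe")
about_trip = neighbours("Fishing Trip 2012")
```

Starting from "John Doe" you reach a photo, a mail and an e-mail address in one step, and from the photo you can hop on to the fishing trip. Following links like these, rather than scanning rows in a table, is the flavor of what relationship-based storage makes possible.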
Through KDE 4.9, Nepomuk used a file indexing program called Strigi. While Strigi is a good, fairly lightweight indexer, it was difficult to get it to work with Nepomuk in the way needed. But with the release of KDE Plasma Desktop 4.10, Nepomuk is Strigi-less. Thanks largely to the hard work of a talented developer named Vishesh Handa, Nepomuk was reworked from the ground up, and even has its own indexer now. The difference in performance and speed is extremely noticeable. That, and well, it works!
If you use a PIM, or are thinking you may want to try one to organize your digital life, give Kontact, the KDE Plasma Desktop PIM application, a try. Akonadi, the back-end for Kontact, has come a long way and just may surprise you.
If you're a Plasma Desktop 4.10 user and don't have Nepomuk indexing turned on in your System Settings because of past bad experiences, give it a try. But note - if you have an older Nepomuk database, run Nepomuk Cleaner first. When that's done, turn on Nepomuk's file indexing. Initially Nepomuk will just index file names and mimetypes, the basic stuff needed to work with the file manager. It will then wait for the computer to be idle for a while before doing a deeper metadata and content index. How long this takes depends on how many files you have. But don't worry, the new Nepomuk is very well-behaved. It will nicely move out of your way if you start using your computer again. Nepomuk is starting to live up to its original vision. The semantic desktop envisioned years ago is becoming a reality.

Reference

http://www.muktware.com/5417/semantic-desktop-akonadi-and-nepomuk
