Pimping your ride on the semantic web

Yesterday, I wrote about how I’d marked up my home page to create a semantic profile of myself that is both auto-discoverable and portable. A place where my identity on the web can be aggregated; not a hole I’ve dug for myself, but an identity that reaches out across the web but always leads back home.

While I enjoy polishing my text editor regularly and hand-crafting beautifully formed, structured data, we all know it’s a fool’s game and that the semantic web is about machines doing all the work for us. So here’s a quick and dirty run down of how to pimp your ride on the semantic web with WordPress and a few plugins.

You’ll need a self-hosted WordPress site that allows you to install plugins. I’ve got one on Dreamhost that costs me $6 a month. Next, you’ll want to install some plugins. I’ll explain what they do afterwards. One thing to note here is that I’m using plugins from the official plugin repository whenever possible. It means that you can install them from the WordPress Dashboard and you’ll get automatic updates (and they’re all GPL compatible). In no particular order…

I think that’s quite enough. All but the SIOC plugin are available from the official WordPress plugin repository. Here’s what they provide:

APML: Attention Profile Markup Language

APML (Attention Profiling Mark-up Language) is an XML-based format for capturing a person’s interests and dislikes. APML allows people to share their own personal attention profile in much the same way that OPML allows the exchange of reading lists between news readers.

The plugin creates an XML file like this one that marks up and weighs your WordPress tags as a measure of your interests. It also lists your blogroll/links and any embedded feeds.

Extended Profile

This plugin adds additional fields in your user profile which is encoded with hCard semantic microformat markup and can then be displayed in a page or as a sidebar widget. You can import hCard data, too. There might also be another use for this, too. (see below)

Micro Anywhere

Provides a couple of additional editor functions that allow you to create an hCard or hCalendar events page. Here’s an example.

OpenID

This plugin allows users to login to their local WordPress account using an OpenID, as well as enabling commenters to leave authenticated comments with OpenID. The plugin also includes an OpenID provider, enabling users to login to OpenID-enabled sites using their own personal WordPress account. XRDS-Simple is required for the OpenID Provider and some features of the OpenID Consumer.

This is key to your identity. You can use your blog URL as your OpenID or delegate a third-party service, such as MyOpenID or ClaimID. In fact, you’ve almost certainly got an OpenID already if you have a Yahoo!, Google, MySpace or AIM account. It’s up to you which one you choose to use as your persistent ID. Read more about OpenID here. It’s important and so are the issues it addresses.

XRDS-Simple

This is required to add further functionality to the OpenID plugin. It adds Attribute Exchange (AX) to your OpenID which basically means that certain profile information can be passed to third-party services (less form filling for you!) Like a lot of these plugins, install it and forget about it.

SIOC

Provides auto-discoverable SIOC metadata. “A SIOC profile describes the structure and contents of a weblog in a machine readable form.”

wp-RDFa

Provides an auto-discoverable FOAF (Friend of a Friend) profile, based on the members of your blog. I’ve been in touch with the author of this plugin and suggested that the extended profile information could also be pulled into the FOAF profile. This is largely dependent on the FOAF specification being finalised, but expect this plugin to do more as FOAF develops.

OAI-ORE Map

Provides an auto-discoverable OAI-ORE resource map of your blog. It conforms to version 0.9 of the specification, which recently made it to v1.0, so I imagine it will be updated in the near future. OAI-ORE metadata describes aggregated resources, so instead of seeing your blog post permalink as the single identifier for, say, a collection of text and multimedia, it creates a map of those resources and links them.

LinkedIn hResume

LinkedIn hResume for WordPress grabs the hResume microformat block from your LinkedIn public profile page allowing you to add it to any WordPress page and apply your own styles to it.

I like this plugin because you benefit from all the features of LinkedIn, but can bring your profile home. Ideal for students or anyone who wants to create a portfolio of work and offer their resume/CV on a single site. Depending on the theme you use, it does require some additional styling.

Get_OPML

This is a nice way to create an OPML file of your sidebar links. If, like on my personal blog, your links point to resources related to you, you can easily create an OPML file like this one. There’s a couple of things to note about this plugin though. The instructions mention a Technorati API key. I didn’t bother with this. When you create your links, just scroll down the page to the ‘advanced’ section and add the RSS feed there. Secondly, the plugin author has, for some stupid reason, hard-coded the feed to their own site into the plugin. Assuming you don’t want this spamming your personal OPML file, download a modified version from here or comment out line 101 in get-opml.php. I guess the plugin author thinks that you’ll be using this to import the OPML into a feed reader and from there, you can delete his feed. That’s no good to us though. Finally, you’ll want to make your OPML file auto-discoverable. You can do this by adding a line of html in your header, using the Header-Footer plugin below.

Header-Footer

This simply allows you to add code to the header and footer of your blog. In our case, you can use it to add an auto-discovery link to the header of every page of your blog.


<link rel="outline" type="text/xml+opml" title="ADD YOUR TITLE HERE" href="http://YOUR_BLOG_ADDRESS/opml.xml" />

WP Calais * + tagaroo

These three plugins use the OpenCalais API to examine your blog posts and return a bunch of semantic tags. I’ve written about this in more detail here (towards the end).

The Calais Web Service automatically creates rich semantic metadata for the content you submit – in well under a second. Using natural language processing, machine learning and other methods, Calais analyzes your document and finds the entities within it. But, Calais goes well beyond classic entity identification and returns the facts and events hidden within your text as well.

It’s an easy way to add relevant tags to your content and broadcast your content for indexing by OpenCalais. They place an additional link in your header that lists the tags for web crawlers and, I guess, improves the SEO for your site.

Extra Feed Links

I’ve written about this plugin previously, too. It adds additional autodiscovery links to your blog for author, category and tag feeds. WordPress feed functionality is very powerful and this plugin makes it especially easy to make those feeds visible.

Lifestream

This isn’t a semantic web plugin, but is a powerful way of aggregating all of your activity across the web into a single activity stream. See my example, here. It also produces a single RSS feed from your aggregated activity. Nice 😉

Wrapping things up

If you set all of this up, you’ll have a WordPress site that can act as your primary identity across the web, aggregates much of your activity on the web into a single site and also offers multiple ways for people to discover and read your site. You also get a ‘well-formed’ portfolio that is enriched with semantic markup and links you to the wider online community in a way that you control.

Bear in mind that some of these plugins might not appear to do anything at all. The semantic web is about machines being able to read and link data, right? If you look closely in the source of your home page, you’ll see a few lines that speak volumes about you in machine talk.


<link rel="meta" href="./wp-content/plugins/wp-rdfa/foaf.php"type="application/rdf+xml" title="FOAF"/>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<link rel="meta" type="text/xml" title="APML" href="http://blog.josswinn.org/apml/" />
<link rel="alternate" type="application/rss+xml" title="NoteStream RSS Feed" href="http://blog.josswinn.org/feed/" />
<link rel="resourcemap" type="application/atom+xml" href="http://blog.josswinn.org/wp-content/plugins/oai-ore/rem.php"/>

If you do want a way to view the data, I recommend the following Firefox add-ons

Operator: Auto-discovers any embedded microformats and provides useful ways to search for similar data via third-party services elsewhere on the web.

OPML Reader: Auto-discovers an OPML file if you have one linked in your header. Allows you to either download the file or read it on Grazr.

Semantic Radar: Auto-discovers embedded RDF data. Displays custom icons to indicate the presence of FOAF, SIOC, DOAP and RDFa formats.

The Tabulator Extension: Auto-discovers and provides a table-based display for RDF data on the Semantic Web. Makes RDF data readable to the average person and shows how data are linked together across different sites.

As always, please let me know how this overview could be improved or if you know of other ways to add semantic functionality to your WordPress blog. Thanks.

A few notes on data portability

I had a bit of fun over the weekend looking at how I could both aggregate my online presence and make it portable, all under my own domain name. I ended up touching on a bunch of interesting initiatives revolving around web and data standards. The minor output of this is over on my personal ‘home page’ at http://josswinn.org

You’ll see that there’s an Attention Profile (APML), Friend of a Friend document (FOAF), hCard generated from my contact details, an OPML file of the significant feeds I have spotted around the web (Delicious, this blog, Twitter, Last.fm, etc), an aggregated feed of my OPML file, and a link to my LinkedIn profile, which I happily learned includes hResume microformat markup. My OPML, FOAF profile and RSS feed are all auto-discoverable.

All links on the page are marked up using the XFN markup rel=”me” tag, which should help consolidated my identity on the web. There’s an interesting discussion over on Marshall Kirkpatrick’s blog about how our Twitter profiles are starting to rank higher in search engines than our personal blogs or home pages because Twitter is using the rel=”me” tag. Marshall suggests that we start using rel=”me” somewhere on our own sites to counteract that.

To add to the fun, I also tried to get the page to validate as HTML5, but in doing so, I had to remove the meta tag that provides OpenID Attribute Exchange via my OpenID Service Provider. I get the error:

Bad value X-XRDS-Location for attribute http-equiv on element meta.

Apparently the draft HTML5 spec currently disallows values for httpequiv. OpenID AX is a good thing if you want to consolidate your identity while at the same time ensure it is portable. It’s certainly more useful to me than validating as HTML5.

In addition to this, I added a Google Friend Connect (OpenSocial) widget and integrated Apture. I thought about adding the ability to leave comments via Disqus, the advantage being that comment authors could retain control over their own comments. But to be honest, I don’t think you or I need yet another method of communicating with each other. There are plenty of ways to do that already.

Other than providing a playground for fun, what this bit of tinkering on my home page has taught me is that microformats and the ethos of data portability is being embraced quite widely on the web and although I spent my time hand-crafting my new home page, there are opportunities to do much the same, quite easily, through the use of a WordPress blog and a bunch of third-party services. More on that later…

Addicted to feeds

I’ve been a long time consumer of news feeds and spend a lot of time reading the web via 200+ feeds in Google Reader. More recently, and largely as a result of working on WriteToReply, I’ve become just as addicted to publishing feeds from any data end point I can find.

WordPress makes this quite easy for developers, providing a whole load of functions and template tags for feeds. For the rest of us, there’s also documentation which is useful is you’re wondering what kinds of feeds can be generated from a basic WordPress site.

All the examples below, assume you’re using ‘pretty URLs’. If your URLs are something like http://example.com/?p=123 then the same principles apply but you’ll use the format /?feed=feed_type i.e. http://example.com/?feed=rss2 The documentation shows full examples.

So, here are the basic content feeds. RSS is RSS version 0.92 and RDF is RSS version 1.0, if you were wondering.

http://example.com/feed/
http://example.com/feed/rss/
http://example.com/feed/rss2/
http://example.com/feed/rdf/
http://example.com/feed/atom/

It’s also pretty straightforward to create a feed from a category or tag

http://example.com/category/my_category/feed/
http://example.com/tag/my_tag/feed/

You can also create feeds from combined tags

http://example.com/tag/tag1+tag2+tag3/feed/

And we know that a feed is available for site comments

http://example.com/comments/feed/

and it’s simple to grab a feed of comments from a single post by appending /feed/ to the end of the post permalink.

http://example.com/2009/01/01/my-latest-post/feed

You can also create a feed of a single post itself, by appending '?withoutcomments=1' to the end of the URL

http://example.com/2009/01/01/my-latest-post/feed/?withoutcomments=1

There is a feed for each author of the blog

http://example.com/author/joss/feed

but alas, as far as I know, no feed for the comments by any particular person.

You can also do something fancy with dates

http://example.com/2009/feed
http://example.com/2009/01/feed
http://example.com/2009/01/15/feed

and one of my favourite types of feed is from a search

http://example.com/?s=search_term&feed=rss2

Now all of this is well and good, but how many readers are going to know or care about constructing the various types of feeds available? Fortunately, it’s possible to make many of these feeds auto-discoverable either by adding some simple code to your theme’s header.php or installing a plugin.

By default, two feeds are auto-discoverable on your WordPress site: An atom and rss2 feed of your posts.

By using the Extra Feed Links plugin, you can make your comments, category, tag, author and search feeds autodiscoverable.

It’s also got a useful template tag that allows you to show the feed links in your theme, making the discovery of feeds even easier.  I created a simple widget for the plugin to display the feed and an RSS icon in the sidebar

Here’s the code. Let me know where it could be improved as I just hacked it together from looking at other widgets.

<?php
function widget_extrafeeds_register() {
function widget_extrafeeds($args) {
extract($args);
?>
<br />
<?php echo $before_widget;
echo $before_title;
echo $widget_name;
echo $after_title; ?>
<ul class="sidebarList">
<?php extra_feed_link(); ?> <?php extra_feed_link('http://path/to/your/feed/icon/feed.png'); ?>
</ul>
<?php echo $after_widget; ?>
<?php
}
register_sidebar_widget('Extra Feeds',
'widget_extrafeeds');
register_sidebar_widget('Extra Feeds','widget_extrafeeds');}
add_action('init', widget_extrafeeds_register);
?>

To get this to work with the plugin, you need to add this to the very bottom of the plugin’s main.php file

// widget support
require(dirname(__FILE__) . '/widget.php');

Like I said, if anyone can improve on this, do let me know. Also note that you’ll need to point the URL in the widget to a feed icon. A lot of themes include them in their /images/ directory, which makes it easy.

By using the widget or template tag, you can have these appearing on the relevant pages.

Try it by using http://writetoreply.org/tags 🙂

If you’re interested in how to add category and tag auto-discovery feeds to your theme’s source code, try adding this to your header.php

<?php if (is_category()) { ?>
<link rel="alternate" type="application/atom+xml" title="<?php bloginfo('name'); ?> &amp;raquo; <?php single_cat_title(''); ?> Atom Feed" href="<?php echo
get_category_feed_link(get_query_var('cat'), 'atom'); ?>" />
<?php } ?>

<?php if (is_tag()) { ?>
<link rel="alternate" type="application/atom+xml" title="<?php bloginfo('name'); ?> &amp;raquo; <?php single_tag_title(''); ?> Atom Feed" href="<?php echo
get_tag_feed_link(get_query_var('tag_id'), 'atom'); ?>" />
<?php } ?>

<?php if (is_category()) { ?>
<link rel="alternate" type="application/rss+xml" title="<?php bloginfo('name'); ?> &amp;raquo; <?php single_cat_title(''); ?> RSS2 Feed" href="<?php echo
get_category_feed_link(get_query_var('cat'), 'rss2'); ?>" />
<?php } ?>

<?php if (is_tag()) { ?>
<link rel="alternate" type="application/rss+xml" title="<?php bloginfo('name'); ?> &amp;raquo; <?php single_tag_title(''); ?> RSS2 Feed" href="<?php echo
get_tag_feed_link(get_query_var('tag_id'), 'rss2'); ?>" />
<?php } ?>

<?php if (is_category()) { ?>
<link rel="alternate" type="application/rdf+xml" title="<?php bloginfo('name'); ?> &amp;raquo; <?php single_cat_title(''); ?> as RDF data" href="<?php echo
get_category_feed_link(get_query_var('cat'), 'rdf'); ?>" />
<?php } ?>

<?php if (is_tag()) { ?>
<link rel="alternate" type="application/rdf+xml" title="<?php bloginfo('name'); ?> &amp;raquo; <?php single_tag_title(''); ?> as RDF data" href="<?php echo
get_tag_feed_link(get_query_var('tag_id'), 'rdf'); ?>" />
<?php } ?>

I learned this from the author of this related plugin, which is similar but not quite as powerful as the Extra Feed Links plugin.

Finally, if you use FeedBurner, beware that it breaks some of the above feeds. One fix for ensuring that tag and category feeds continue to work as they should is to modify the FeedBurnerFeedSmith plugin as noted here. Simply change the line

is_feed() &amp;&amp; $feed != 'comments-rss2' &amp;&amp; !is_single() &amp;&amp;

to read

is_feed() &amp;&amp; $feed != 'comments-rss2' &amp;&amp; !is_single() &amp;&amp; !is_tag() &amp;&amp;

That’ll do for now. I intend to learn more about the RSS and Atom specifications over the next few weeks and will post anything I think relevant here. If you can add anything to this post, please do leave a comment. Thanks.

CommentPress

CommentPress is, for educators, one of the most important developments to come out of the WordPress community and one of the most significant innovations that I know of in online publishing. I first learned about it when I saw that Yale University Press were using it to invite comment on Yochai Benkler’s book, The Wealth of Networks. In its original form, CommentPress is a theme for WordPress that allows readers to comment on, annotate and discuss paragraphs of text. In fact, although installed as a theme, it transforms a site not only by design, but with functionality you’d normally expect from plugins. In CommentPress v1.x, form and function came as a single package. It’s worth reading about the background to CommentPress. You’ll see that it’s part of a larger course of research by the Institute for the Future of the Book.

Institute for the Future of the Book was founded in 2004 to [… stimulate] a broad rethinking—in publishing, academia and the world at large—of books as networked objects. CommentPress is a happy byproduct of this process, the result of a series of “networked book” experiments run by the Institute in 2006-7. The goal of these was to see whether a popular net-native publishing form, the blog, which, most would agree, is very good at covering the present moment in pithy, conversational bursts but lousy at handling larger, slow-developing works requiring more than chronological organization—whether this form might be refashioned to enable social interaction around long-form texts… We can imagine a number of possibilities: scholarly contexts: working papers, conferences, annotation projects, journals, collaborative glosses; educational: virtual classroom discussion around readings, study groups; journalism/public advocacy/networked democracy: social assessment and public dissection of government or corporate documents, cutting through opaque language and spin (like the Iraq Study Group Report, a presidential speech, the federal budget, a Walmart or Google press release); creative writing: workshopping story drafts, collaborative storytelling; recreational: social reading, book clubs.

You can also read about CommentPress in The Chronicle for Higher Education and The Journal of Electronic Publishing.

We have started to use CommentPress at the University of Lincoln for the discussion of internal documents and feedback from staff has been good. Many are astonished at what it makes possible. A departmental research strategy paper received over 100 comments from nine staff; something we’d never have had by emailing the document out for comment. Of course, I am keen to use it to support courses and a colleague and I have recently applied for funding to use CommentPress in a course with over 100 Criminology students, who are normally asked to critique texts and respond by emailing Word documents to their tutor. Using CommentPress allows for transparent and open, formative feedback and assessment by both staff and student peers.

Outside of my work for the university, I’ve been developing WriteToReply, with Tony Hirst from the Open University. You can read about how we started WriteToReply and you’ll see that CommentPress is fundamental to what we’re trying to achieve and we’re using it for networked democracy, as suggested above. CommentPress is in fact, a comment engine for each document site. Two things make this possible. First, and most obvious, is the fact that readers on a document site can direct comments to specific paragraphs of text. Readers can also respond to other readers’ comments and a happy by-product of our re-publication of the Digital Britain – Interim Report, is that the discussion still continues, despite the consultation period being over. So CommentPress is an engine for on-site comment and discussion. Texts are dissected but remain whole; they also become social objects.

The second important contribution CommentPress has made is the provision of permalinks for each paragraph in the text. This provides a unique URI or URL for each paragraph of text, making linked references from third-party web sites possible. Combined with the trackback/pingback system built into decent web publishing platforms, CommentPress makes remote commenting on text possible, as Tony explains on his blog.

What this means is that the paragraph, action point, section or whatever can become a linked resource, or linked context, and can support remote commenting. And in turn, the remark made on the third party site can become a linked annotation to the corresponding part of the original report… How? Well through the judicious use of trackbacks… So even if you don’t want to comment on the Digital Britain Interim report on the WriteToReply site, but you do care, why not post your thoughts on your own blog, and link your thoughts directly back to the appropriate part of the report on WriteToReply?

It’s this feature, so easily missed, which makes CommentPress a comment engine. An engine suggests an underlying technology that drives something greater. By introducing paragraph permalinks, text can now be linked at a much more accurate and deeper level than was previous possible. Texts are transformed into uniquely identifiable resources of data. Academics can now reference paragraphs rather than page numbers and readers can reflect, comment and participate in the analysis of texts from their own site. For the reader, CommentPress provides a fluid interface to the document as a whole but at a technical level, explodes it across the Internet.

In the running of WriteToReply, we’ve tested CommentPress quite hard and found it to be a complex and fragile tool. Until recently, it hasn’t been updated to reflect the fast changing development of WordPress and because of its extensive use of Javascript, it clashes with other plugins, so while it transforms a WordPress site, it also restricts functionality otherwise possible. Fortunately, CommentPress 2 is being actively worked on and I’ve been helping to test it with Eddie Tejeda, the original developer. It’s currently in beta, but Eddie is responding to my feedback and fixing issues rapidly. There is a mailing list for CommentPress and the code is publicly accessible.

CommentPress 2.2 Beta
CommentPress 2.2 Beta

If you test CommentPress 2, you’ll immediately see that it’s been split into a suite of plugins and themes and that it’s now much more flexible in terms of compatibility with other WordPress plugins and in being able to select different components, options and themes.  Notably, paragraph permalinks are available as a separate plugin, which means that any WordPress blog will be able to have paragraph-level URIs, without necessarily supporting paragraph level commenting. My test site is on WriteToReply. Feel free to have a look and post comments, if you wish. As I write, it’s not quite ready for everyday use, but at the speed which Eddie has been working over the last few days, I’m confident that I’ll be able to use it here at the university and on WriteToReply before the month’s out. If you’re used to using v1.4.1, you’ll notice a lot of change. Remember that it’s still beta software and that not all of the features have been fully implemented yet. It would be great if other people could help test it across various browsers and with different documents. Multimedia is not something I’ve yet been able to throw at it, for example.

Finally, CommentPress needs continued support in terms of testing, reporting issues, bug fxes and feature development. This can be done voluntarily, but given it’s potential to support education, business and government consultations, I for one, will be looking for ways to raise funding to help support all of this. If you know of any possible funding opportunities within UK Higher Education, please do let me know.

Mozilla & Creative Commons Open Education Course /1

Yesterday, I started a six week course on Open Education, organised by the Mozilla FoundationCreative Commons and The Peer 2 Peer University. Like all 26 participants, I’ll be blogging about the course with the tags ‘MozOpenEd’ or ‘MozOpenEdCourse’ and it looks like we’ll be aggregating as much online activity as possible into a blog I’m setting up here.

The course outline looks really interesting and yesterday’s first online session got off to a good start. Working with people both synchronously and asynchronously online is always interesting and the enthusiasm and democracy within the group is going to be quite infectious.  Some good Case Studies have been provided for discussion and we’ve each been asked to work on a blueprint project over the next six weeks. I’ll be working on student identity, data portability, portfolios and the transition to and from university, using this WordPress/BuddyPress platform I run at the University of Lincoln. My starting point is that university life should not necessarily encumber students with a further digital identity, should build on their existing digital identity and portfolio and that all digital products of student life should be portable as they are increasingly becoming elsewhere.

My thinking is not at all clear right now, but I’ve got a strong gut feeling that this topic is going to both interest me and be practically implementable. I’m working on a project with a colleague who teaches animation and it may be that I can combine these projects to the benefit of all. I signed up for the course so that I would be forced to clarify my thinking and hopefully benefit the work I’m already doing.

Anyway, more on all of this next week.

Open Calais + site-wide tags = semantic site architecture

Preamble about people

Over the last month, we’ve I’ve started to grow an embryonic social web publishing platform that can be many things but fundamentally offers a personalised and collaborative environment for research, teaching and learning. (Where? You’re looking at it!). There are a few active blogs (currently fewer than on the pilot Learning Lab blogs), nearly 70 users and the word is starting to get out at a pace that I can manage. So, now it’s time to look to the future…

By running BuddyPress, the connections between people are pretty much taken care of. Sign in to http://dev.lincoln.ac.uk with a Lincoln username and password and you’ve joined a community that, as it grows, will increasingly and effortlessly connect people through the information they choose to add to their profile. Staff and students can click on a link and find other people who have similarly tagged their profile.

Notice the comma seprated hyper-linked data
Notice the comma-separated hyper-linked data

What is of equal interest to me, and potentially very useful to the university community, is how we link the content that is being generated by staff and students and make those links accessible. It is not difficult to appreciate what the potential is when you have a revolving community of 10,000 people who, over time, document their work, their research, teaching and learning using cutting edge web publishing tools, but I’m writing this post to try and understand and sketch out how I might evolve what I have begun.

Put simply, WordPress Multi-User (WPMU) allows one person (me) to provide and manage multiple web sites which other people (staff and students) take ownership of. Typically, every action, every new user and every new page and post on every site, is recorded and held in a shared database(s). Although at this low level, the data is relational, on the surface, when you look at one of the sites, they pretty much stand alone and so they should. We’re not talking about a single website with lots of users, we’re talking about lots of websites with lots of users. They might be working collaboratively with others, but they’re working as individuals or in distinct groups that benefit from a distinct online identity. BuddyPress helps bring things together by aggregating people’s actions (i.e. posting blog updates, making friends, joining groups, posting messages) but the visibility of those connections is transient. Social networks display our actions along a timeline and the connections between people are, for the most part, buried until the next time person A interacts with person Y.

Enough about connecting people.

Site-wide content aggregation

Site content is a mixture of text, multimedia and metadata. The last thing I’ll do when completing this blog post is to categorise and tag it. Each time I write, I publish text, (sometimes images) and metadata which summarises and categorises the full text. Why am I telling you this? You know it already. What you may not know is that each post created on our university WPMU installation, by any person, providing their blog is public, is aggregated into a single site and re-published a second time. So this post exists here on this site and there, on the Community Posts site. Notice how the Community Posts version links back to the original post. We’re not creating a whole new resource, we’re creating a powerful linked resource that allows others to search, filter, browse and discover content held across multiple sites. With only a few sites up and running here at the moment, the opportunity to discover varied content is limited, but over time that will change. Look at wordpress.com, where there are 5 million sites:

Browse by user-generated metadata

Search over 5 million sites
Search over 5 million sites

On the university blogs, this is made possible through the use of the site-wide-tags plugin, which was developed by @donncha, the same person that develops WPMU and the wordpress.com site. By using this plugin, a WPMU installation can share similar functionality to what you see on wordpress.com. I say ‘similar’ because, as I’ll mention later, designing how people discover content is key to all of this and something I, or we as a community, would benefit from thinking about and acting on collectively.

Community Posts
Community Posts

On the Community Posts site, you can search the full-text of every post, filter resources by category and tag, and subscribe to feeds from any combination of tag or category. Any search can be turned into a feed by appending ‘&feed=rss’ to the end of the resulting URL.

i.e. http://tags.dev.lincoln.ac.uk/?s=gaming&feed=rss

To create a feed from a tag or category, just click on a tag or category and append ‘/feed’ to the end of the URL.

i.e. http://tags.dev.lincoln.ac.uk/tag/games/

You can combine tags with ‘+’, too:

http://tags.dev.lincoln.ac.uk/tag/games+development/

You can also specify the type of feed you want by appending:

/feed/rss/
/feed/rss2/
/feed/rdf/
/feed/atom/

Mixing categories and tags is currently broken by a bug but is due to be fixed in the next version of WordPress.

So it’s not difficult to imagine, over time, an active community of thousands of university web publishers, having their content aggregated into a site-wide resource that allows full text searching, browsing and filtering with a choice of feeds to syndicate that content elsewhere. See how it’s happening at the University of Mary Washington, where over 2400 sites have been created in under three years.

Semantic technology

Yesterday, I discovered OpenCalais. It’s a semantic technology that’s been around since January 2008, so you might be tired of hearing about it, but if not, ‘Welcome to Web 3.0!’

The Calais Web Service automatically creates rich semantic metadata for the content you submit – in well under a second. Using natural language processing, machine learning and other methods, Calais analyzes your document and finds the entities within it. But, Calais goes well beyond classic entity identification and returns the facts and events hidden within your text as well.

Nice. And it’s installed on this site. There are three Calais plugins available for WordPress. This one, allows writers to submit their blog posts to the OpenCalais web service API and fetch back a number of auto-generated tags based on the content of their post. The longer the post, the more tags are returned. Tags are returned in just seconds. Those tags can be added to the post in their entirety or used selectively (actually, you have to add them all and then remove those you don’t want to include – a minor irritation). This next plugin, allows you to automatically go through every post you’ve written and tags them using the Calais web service. It’s all or nothing, but following the auto-tagging of archive content, you can then go to the ‘tags’ menu and delete any tags you don’t want to use. I’ve done that to this site and to the Community Posts site. Calais looks for names, facts and events and the API allows for up to 40,000 transactions a day and up to four per second. It returns some predictable tags and a few odd ones, but on the whole is fast and works like magic.

The third plugin also allows blog authors to fetch tags for the post they are writing and, in addition, it also suggests Creative Commons licensed images based on a dynamic evaluation of the chosen or suggested tags.

The tagaroo interface
The tagaroo interface

Image suggestion is a nice idea, but tends to return some fairly generic images.

Having used OpenCalais to auto-tag the Community Posts site, a whole new and richer set of semantic metadata has been added with barely any effort. The challenge now is to figure out how to 1) automate this as a scheduled process, so that the Calais plugin looks for new content every hour, say, and tags whatever has been recently introduced (a cron job that calls the plugin and a modification to the plugin to look at the timestamp of the post and ignore anything older than when it was last run?); 2) present the semantic data in an accessible way and this mostly, I think, comes down to appropriate site design.  The wordpress.com screenshots above show one way of doing it. A del.icio.us style approach is a more powerful and versatile model of tag filtering. Until then, it’s a matter of constructing filters, searches and feeds in the way I’ve outlined above.

So how might all of this semantically structured data be used? It seems to me that most of the advantages are proportional to the quantity of information available. For teaching and learning, it could be used by students and staff who want to find and re-use material that has been posted in the past for a specific course or subject area. Great for new students who want to measure the type and quality of work produced by students in previous years. In a similar way, it could be used by staff looking for posts by colleagues on subjects they might be teaching, and because searches and tags can be turned into feeds, past content could be aggregated into a new course site. A widely adopted, semantically tagged WPMU installation could also reveal trends in the type of work occurring at the university and, by tagging names of people, queries against references to Prof. X’s work could be made (I also wonder whether through the use of feeds, content from the institutional repository could be joined up with all of this, too – but it’s late in the day and I can’t think straight).

You’ll see from the image below that using Calais on the Community Posts site, resulted in a much richer variety of tags than would have appeared if we relied on user-generated tagging alone (136 posts now have 558 tags). Some people don’t even bother to tag their work… Shame on them! Notice too, that with the Firefox Operator plugin, you can take a tag on the site and use it to find related resources elsewhere. So if you’re looking at work tagged ‘client-applications’ on WPMU, you can conveniently hop over to delicious and find further web resources or, on a whim, look at what books on this subject are available on Amazon.

Operator provides a way to use tags on one site to discover related resources on another site
Use tags on one site to discover related resources on another site

Anyway, if you’re still reading, you might remember from the title of this post that my overriding interest in all of this is how it can be understood as and developed into a site-wide ‘architecture’. Again, I’m thinking how user-generated tags have determined the way delicious is designed for navigation and searching of resources. I need to learn more about how WordPress themes are constructed and consider how available functions can be best exploited and usefully presented on this type of site. If you have any ideas or want to work on a specific theme to get the most out of the site-wide-tags plugin, please do leave a comment or get in touch on Twitter @josswinn

OAuth, OpenID, XMPP with WordPress

Automattic, the company behind WordPress, released an update to Prologue, their theme for group discussion, today. I read about this, minutes after reading about the new OAuth features in WordPress 2.8 and an hour or so after reading about a new Facebook Connect plugin for BuddyPress, the social networking layer for WordPress. All this stimulation proved a bit too much for me, so this post is an attempt to plot what’s happening here and what might be possible in just a few months from now…

So, I have the BuddyPress Facebook Connect plugin working on a my test installation…

BuddyPress Facebook Connect

Nothing fancy going on there. Basically, new users to the site can register using their Facebook credentials. The plugin doesn’t do anything for existing users on the site. They just login with their local account as usual. For a first release, the plugin is a good proof of concept and with a bit more integration work will make it easy for Facebook users to join BuddyPress sites.

The new Prologue theme, P2, is impressive, too…

P2 on wordpress.com

It takes advantage of the new threaded comments feature in WordPress 2.7+ , has ‘realtime’ notifications (unless I’ve missed something, the use of the term ‘realtime’ is a stretch – see below) and has some nice keyboard shortcuts…

Keyboard shortcuts

One thing that’s lacking is a Twitter-like realtime notification that a new post has been made and you should refresh your bowser. Twitter doesn’t use it for the user home page, but they do on their search page and I like it.

Twitter notifications

Moving on, OAuth functionality for WordPress is still in development but the latest code from the SVN trunks of both the DiSo plugin and WordPress does appear to work…

OAuth options

Be warned that it does not run on a server where PHP runs as a CGI. I tried to run it first on Dreamhost, but it gave an error showing that getallheaders() is an undefined function.

I need to spend more time with the OAuth plugin to see how it will actually work in practice. One of the first use-cases for it is to allow client applications like the iPhone app, to be able to post remotely without sending a password using XML-RPC. If anyone has any ideas and wants to test it with me, please leave a comment. As I understand from the announcement, it’s working but it’s still early days… For more information, see Will Norris’ presentation from last August.

Finally, there’s mnw, a new plugin for WordPress that provides support for the OpenMicroBlogging specification. With this, users from other sites using the specification, such as identi.ca and other Laconica-based services, can subscribe to your blog/omb site and receive updates whenever you publish a new post or page. So this…

WP OMB…ends up here…

WP posts on identica

mnw is still a bit rough around the edges but it was only released as V0.1 a month ago, so that’s to be expected. Note that mnw only seems to work on single WP installations (WPMU produces a familiar error message which I think is wp_nonce related) and does not work on WP 2.8 trunk. Also, identi.ca complained of my avatar image being the wrong size. In the example above, I’d removed my avatar from the mnw settings, but I’ve since found that a .png of 96px seems to work OK.

What does it mean for me and you?

So, what does all this mean? In terms of wordpress.com, we might speculate that before too long, they will add the BuddyPress layer to their 4.5m blogs to create a sizeable social network. The P2 theme shows posts in realtime, they’re already offering an XMPP firehose of blog posts and there are plugins that offer XMPP functionality for WordPress, so remote real-time updates aren’t far away and realtime remote publishing already exist using XML-RPC. With the P2 theme, anyone can create a Twitter-like site that any number of registered users can post to and anyone can comment on. Add OpenID authentication and OAuth authorisation and you’ve got a large, mature and open social (micro)blogging service.

For self-hosted WordPress users, it’s even closer to being a reality. I’ve had a site running today that accepts new user registrations via the DiSo OpenID plugin and those users can then post updates to the Prologue themed site and join a threaded group discussion. If I enabled XML-RPC posting, users could post in ‘realtime’ to the group site from their iPhone or other other client app. With OAuth support, this would be possible from desktop and mobile applications as well as other sites such as Flickr, without exchanging protected user data such as a password. Those updates could also be broadcast via XMPP in realtime, which I’ve done on another blog I was testing.

WordPress Flickr account setup

Things are a bit different for WordPressMU/BuddyPress installations. As you’ve seen above, I’ve got a BuddyPress site running that accepts users joining via Facebook connect.  Functionality is limited to social networking and it still has some issues that need working on before it’s ready for every-day use (I’ve noted them on the BP forum). WPMU blogs (by which I mean blogs not the overall site) don’t allow new-user registrations so the blog adminstrator needs to sign up new users. Users registered via Facebook don’t have an email address associated with their account, so blog admins can’t add these types of users as the process requires a username and email address of a new or existing user.

However, by activating the right plugins, registered WPMU users (I’m thinking university staff and students) could participate in a group microblog using the P2 theme, LDAP and/or OpenID for login and XML-RPC and XMPP for remote publishing and receiving posts. It won’t be too long before you can send and receive WordPress posts via your GMail or Jabber account (on your iPhone/iPod) in realtime (hopefully with support for tagging), and all of that data is simply WordPress data and has RSS feeds hanging off every tag and wrapped around every post.

Just a thought.

HEFCE HE Grant Allocations 2009-10 Visualised

In our weekly team meeting, I mentioned that I’d created some visualisations of the RAE research funding allocations. I also mentioned that Tony Hirst had previously done the same for the HEFCE teaching funding allocations. I offered to send everyone links to these, but before do so, I thought I’d have a go at re-creating the HEFCE visualisations myself to get a bit more practice in with IBM’s Many Eyes Wikified. So this is a companion piece to my previous post. All credit to Tony for opening my eyes to this stuff.

So, HEFCE have announced the 2009/10 grant allocations for UK Higher and Further Education institutions and provided full spreadsheets of the figures. I’ve imported the data into Google Spreadsheets and made the three tables publicly accessible as CSV files (1), (2), (3). Note that I’ve stripped out all data relating to FE grant allocations, which is included in the original spreadsheets.

Next, I’ve imported the CSV files into IBM’s Many Eyes Wikified (1), (2), (3), and these wikified tables are now the data sources for the following visualisations.

Recurrent grant for academic year 2009-10

The Pie

Pie Chart of HEFCE Funding

Bar Chart

HEFCE Funding Bar Chart

Matrix

HEFCE Funding Matrix

Bubble Chart

HEFCE Funding Bubble Chart

Comparison with 2008-09 academic year recurrent grant

The Pie

HEFCE Recurrent Funding Pie ChartBar Chart

HEFCE Recurrent Funding Bar ChartMatrix

HEFCE Recurrent Funding MatrixBubble

HEFCE Recurrent Funding Bubble ChartNon-recurrent funding for 2009-10

The Pie

HEFCE Non-recurrent Funding Pie Chart

Bar Chart

HEFCE Non-recurrent Funding Bar Chart

Matrix

HEFCE Non-recurrent Funding Matrix

Bubble ChartHEFCE Non-recurrent Funding