Standards & Specs – Page 3

Triplify: Make your blog mashable

Joss Winn / April 27, 2009August 30, 2009 / Commons, Data, Fun, Mashups, Standards & Specs, Web

Last week, I wrote about how it is relatively simple to ‘pimp your ride on the semantic web‘. Over the weekend, I stumbled upon Triplify, a small ‘plugin’ for pretty much any web publishing platform, that “reveals the semantic structures encoded in relational databases by making database content available as RDF, JSON or Linked Data.” What is so appealing about Triplify is how easy it is to implement, especially alongside a WordPress site.

I can confirm that the three-step installation process is all it takes, although I wouldn’t undertake implementing this blindly as you are, literally, exposing a semantic representation of your database content. In other words, you should look at the configuration file you’re using and check that it’s going to expose the right data and not clear text passwords and unpublished posts and comments. Before I implemented it, I realised that it would expose comments on a bunch of posts that I have since made private (they were imported from an old, private blog), so I had to ‘unapprove’ those comments so the script didn’t expose them to the public. A five minute job. Alternatively, the script could probably be modified to work around my problem, by only exposing comments after a certain date, for example.

The end result is that, with a WordPress site, you expose a semantic representation of your users, posts, pages, tags, categories, comments and attachments in RDF (N-Triples) and JSON formatted data (for JSON, just add ‘?t-output=json’ to the end of the URI). Like I said though, it could be used on any database driven web application. Here’s what you get when you expose the high level links to your content:


&lt;http://blog.josswinn.org/triplify/&gt; &lt;http://www.w3.org/2000/01/rdf-schema#comment&gt; "Generated by Triplify V0.5 (http://Triplify.org)" .
&lt;http://blog.josswinn.org/triplify/&gt; &lt;http://creativecommons.org/ns#license&gt; &lt;http://creativecommons.org/licenses/by/2.0/uk/&gt; .
&lt;http://blog.josswinn.org/triplify/post&gt; &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&gt; &lt;http://www.w3.org/2002/07/owl#Class&gt; .
&lt;http://blog.josswinn.org/triplify/attachment&gt; &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&gt; &lt;http://www.w3.org/2002/07/owl#Class&gt; .
&lt;http://blog.josswinn.org/triplify/tag&gt; &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&gt; &lt;http://www.w3.org/2002/07/owl#Class&gt; .
&lt;http://blog.josswinn.org/triplify/category&gt; &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&gt; &lt;http://www.w3.org/2002/07/owl#Class&gt; .
&lt;http://blog.josswinn.org/triplify/user&gt; &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&gt; &lt;http://www.w3.org/2002/07/owl#Class&gt; .
&lt;http://blog.josswinn.org/triplify/comment&gt; &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&gt; &lt;http://www.w3.org/2002/07/owl#Class&gt; .

Here’s an example of what you get when you expose the full content:


&lt;http://blog.josswinn.org/triplify/post/154&gt; &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&gt; &lt;http://rdfs.org/sioc/ns#Post&gt; .
&lt;http://blog.josswinn.org/triplify/post/154&gt; &lt;http://rdfs.org/sioc/ns#has_creator&gt; &lt;http://blog.josswinn.org/triplify/user/1&gt; .
&lt;http://blog.josswinn.org/triplify/post/154&gt; &lt;http://purl.org/dc/terms/created&gt; "2008-10-06T05:55:25"^^&lt;http://www.w3.org/2001/XMLSchema#dateTime&gt; .
&lt;http://blog.josswinn.org/triplify/post/154&gt; &lt;http://rdfs.org/sioc/ns#content&gt; "Up early to go to Sheffield for LPI exams. The last week has left me underprepared. Never mind." .
&lt;http://blog.josswinn.org/triplify/post/154&gt; &lt;http://purl.org/dc/terms/modified&gt; "2008-10-06T20:12:15"^^&lt;http://www.w3.org/2001/XMLSchema#dateTime&gt; .

...

&lt;http://blog.josswinn.org/triplify/post/154&gt; &lt;http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag&gt; &lt;http://blog.josswinn.org/triplify/tag/27&gt; .

...

&lt;http://blog.josswinn.org/triplify/post/154&gt; &lt;http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag&gt; &lt;http://blog.josswinn.org/triplify/tag/41&gt; .
&lt;http://blog.josswinn.org/triplify/post/154&gt; &lt;http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag&gt; &lt;http://blog.josswinn.org/triplify/tag/42&gt; .

...

&lt;http://blog.josswinn.org/triplify/post/154&gt; &lt;http://sdp.iasi.rdsnet.ro/semantic-wordpress/vocabulary/belongsToCategory&gt; &lt;http://blog.josswinn.org/triplify/category/22&gt; .

...

&lt;http://blog.josswinn.org/triplify/tag/154&gt; &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&gt; &lt;http://www.holygoat.co.uk/owl/redwood/0.1/tags/Tag&gt; .
&lt;http://blog.josswinn.org/triplify/tag/154&gt; &lt;http://www.holygoat.co.uk/owl/redwood/0.1/tags/tagName&gt; "valentine" .

You can choose to expose different levels of information in your HTML source. If you have more than a moderate amount of content, you’ll probably want to just expose the top level links as in the first example and let the users of your data dig deeper. You’ll also note that you can (and should) attach a license to your data.

A number of namespaces are recognised as well as a WordPress vocabulary.


$triplify['namespaces']=array(
'vocabulary'=&gt;'http://sdp.iasi.rdsnet.ro/semantic-wordpress/vocabulary/',
'rdf'=&gt;'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
'rdfs'=&gt;'http://www.w3.org/2000/01/rdf-schema#',
'owl'=&gt;'http://www.w3.org/2002/07/owl#',
'foaf'=&gt;'http://xmlns.com/foaf/0.1/',
'sioc'=&gt;'http://rdfs.org/sioc/ns#',
'sioctypes'=&gt;'http://rdfs.org/sioc/types#',
'dc'=&gt;'http://purl.org/dc/elements/1.1/',
'dcterms'=&gt;'http://purl.org/dc/terms/',
'skos'=&gt;'http://www.w3.org/2004/02/skos/core#',
'tag'=&gt;'http://www.holygoat.co.uk/owl/redwood/0.1/tags/',
'xsd'=&gt;'http://www.w3.org/2001/XMLSchema#',
'update'=&gt;'http://triplify.org/vocabulary/update#',
);

So, what’s the point in doing this? Well, it’s fairly trivial and if you think that structured, linked, machine-readable licensed data is a Good Thing, why not? The Triplify website lists an number of advantages:

Such a triplification of your Web application has tremendous advantages:

The installations of the Web application are better found and search engines can better evaluate the content.

Different installations of the Web application can easily syndicate arbitrary content without the need to adopt interfaces, content representations or protocols, even when the content structures change.

It is possible to create custom tailored search engines targeted at a certain niche. Imagine a search engine for products, which can be queried for digital cameras with high resolution and large zoom.

Ultimately, a triplification will counteract the centralization we faced through Google, YouTube and Facebook and lead to an increased democratization of the Web

The vision of the semantic web and semantic publishing is one of meaningfully identifying objects (and people) on the Internet and showing their relationships. This should improve searches for things on the web, but also improve how we exchange knowledge, re-use information and help clarify our identity on the web, too. It’s an ambitious task, but made easier with tools like Triplify. The semantic web also raises questions over individual privacy and, if data is well formed and accessible, it may be easier to control and therefore censor. The creator of Triplify recently gave a technical presentation on Triplify and how it is being used to publish data collected by the OpenStreetMap project. It shows how geodata exposed in this way can result in mashup applications that directly benefit you and me.

Pimping your ride on the semantic web

Joss Winn / April 21, 2009August 30, 2009 / Fun, Identity, Standards & Specs, Tips, Web

Yesterday, I wrote about how I’d marked up my home page to create a semantic profile of myself that is both auto-discoverable and portable. A place where my identity on the web can be aggregated; not a hole I’ve dug for myself, but an identity that reaches out across the web but always leads back home.

While I enjoy polishing my text editor regularly and hand-crafting beautifully formed, structured data, we all know it’s a fool’s game and that the semantic web is about machines doing all the work for us. So here’s a quick and dirty run down of how to pimp your ride on the semantic web with WordPress and a few plugins.

You’ll need a self-hosted WordPress site that allows you to install plugins. I’ve got one on Dreamhost that costs me $6 a month. Next, you’ll want to install some plugins. I’ll explain what they do afterwards. One thing to note here is that I’m using plugins from the official plugin repository whenever possible. It means that you can install them from the WordPress Dashboard and you’ll get automatic updates (and they’re all GPL compatible). In no particular order…

I think that’s quite enough. All but the SIOC plugin are available from the official WordPress plugin repository. Here’s what they provide:

APML: Attention Profile Markup Language

APML (Attention Profiling Mark-up Language) is an XML-based format for capturing a person’s interests and dislikes. APML allows people to share their own personal attention profile in much the same way that OPML allows the exchange of reading lists between news readers.

The plugin creates an XML file like this one that marks up and weighs your WordPress tags as a measure of your interests. It also lists your blogroll/links and any embedded feeds.

Extended Profile

This plugin adds additional fields in your user profile which is encoded with hCard semantic microformat markup and can then be displayed in a page or as a sidebar widget. You can import hCard data, too. There might also be another use for this, too. (see below)

Micro Anywhere

Provides a couple of additional editor functions that allow you to create an hCard or hCalendar events page. Here’s an example.

OpenID

This plugin allows users to login to their local WordPress account using an OpenID, as well as enabling commenters to leave authenticated comments with OpenID. The plugin also includes an OpenID provider, enabling users to login to OpenID-enabled sites using their own personal WordPress account. XRDS-Simple is required for the OpenID Provider and some features of the OpenID Consumer.

This is key to your identity. You can use your blog URL as your OpenID or delegate a third-party service, such as MyOpenID or ClaimID. In fact, you’ve almost certainly got an OpenID already if you have a Yahoo!, Google, MySpace or AIM account. It’s up to you which one you choose to use as your persistent ID. Read more about OpenID here. It’s important and so are the issues it addresses.

XRDS-Simple

This is required to add further functionality to the OpenID plugin. It adds Attribute Exchange (AX) to your OpenID which basically means that certain profile information can be passed to third-party services (less form filling for you!) Like a lot of these plugins, install it and forget about it.

SIOC

Provides auto-discoverable SIOC metadata. “A SIOC profile describes the structure and contents of a weblog in a machine readable form.”

wp-RDFa

Provides an auto-discoverable FOAF (Friend of a Friend) profile, based on the members of your blog. I’ve been in touch with the author of this plugin and suggested that the extended profile information could also be pulled into the FOAF profile. This is largely dependent on the FOAF specification being finalised, but expect this plugin to do more as FOAF develops.

OAI-ORE Map

Provides an auto-discoverable OAI-ORE resource map of your blog. It conforms to version 0.9 of the specification, which recently made it to v1.0, so I imagine it will be updated in the near future. OAI-ORE metadata describes aggregated resources, so instead of seeing your blog post permalink as the single identifier for, say, a collection of text and multimedia, it creates a map of those resources and links them.

LinkedIn hResume

LinkedIn hResume for WordPress grabs the hResume microformat block from your LinkedIn public profile page allowing you to add it to any WordPress page and apply your own styles to it.

I like this plugin because you benefit from all the features of LinkedIn, but can bring your profile home. Ideal for students or anyone who wants to create a portfolio of work and offer their resume/CV on a single site. Depending on the theme you use, it does require some additional styling.

Get_OPML

This is a nice way to create an OPML file of your sidebar links. If, like on my personal blog, your links point to resources related to you, you can easily create an OPML file like this one. There’s a couple of things to note about this plugin though. The instructions mention a Technorati API key. I didn’t bother with this. When you create your links, just scroll down the page to the ‘advanced’ section and add the RSS feed there. Secondly, the plugin author has, for some stupid reason, hard-coded the feed to their own site into the plugin. Assuming you don’t want this spamming your personal OPML file, download a modified version from here or comment out line 101 in get-opml.php. I guess the plugin author thinks that you’ll be using this to import the OPML into a feed reader and from there, you can delete his feed. That’s no good to us though. Finally, you’ll want to make your OPML file auto-discoverable. You can do this by adding a line of html in your header, using the Header-Footer plugin below.

Header-Footer

This simply allows you to add code to the header and footer of your blog. In our case, you can use it to add an auto-discovery link to the header of every page of your blog.


<link rel="outline" type="text/xml+opml" title="ADD YOUR TITLE HERE" href="http://YOUR_BLOG_ADDRESS/opml.xml" />

WP Calais * + tagaroo

These three plugins use the OpenCalais API to examine your blog posts and return a bunch of semantic tags. I’ve written about this in more detail here (towards the end).

The Calais Web Service automatically creates rich semantic metadata for the content you submit – in well under a second. Using natural language processing, machine learning and other methods, Calais analyzes your document and finds the entities within it. But, Calais goes well beyond classic entity identification and returns the facts and events hidden within your text as well.

It’s an easy way to add relevant tags to your content and broadcast your content for indexing by OpenCalais. They place an additional link in your header that lists the tags for web crawlers and, I guess, improves the SEO for your site.

Extra Feed Links

I’ve written about this plugin previously, too. It adds additional autodiscovery links to your blog for author, category and tag feeds. WordPress feed functionality is very powerful and this plugin makes it especially easy to make those feeds visible.

Lifestream

This isn’t a semantic web plugin, but is a powerful way of aggregating all of your activity across the web into a single activity stream. See my example, here. It also produces a single RSS feed from your aggregated activity. Nice 😉

Wrapping things up

If you set all of this up, you’ll have a WordPress site that can act as your primary identity across the web, aggregates much of your activity on the web into a single site and also offers multiple ways for people to discover and read your site. You also get a ‘well-formed’ portfolio that is enriched with semantic markup and links you to the wider online community in a way that you control.

Bear in mind that some of these plugins might not appear to do anything at all. The semantic web is about machines being able to read and link data, right? If you look closely in the source of your home page, you’ll see a few lines that speak volumes about you in machine talk.


<link rel="meta" href="./wp-content/plugins/wp-rdfa/foaf.php"type="application/rdf+xml" title="FOAF"/>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<link rel="meta" type="text/xml" title="APML" href="http://blog.josswinn.org/apml/" />
<link rel="alternate" type="application/rss+xml" title="NoteStream RSS Feed" href="http://blog.josswinn.org/feed/" />
<link rel="resourcemap" type="application/atom+xml" href="http://blog.josswinn.org/wp-content/plugins/oai-ore/rem.php"/>

If you do want a way to view the data, I recommend the following Firefox add-ons

Operator: Auto-discovers any embedded microformats and provides useful ways to search for similar data via third-party services elsewhere on the web.

OPML Reader: Auto-discovers an OPML file if you have one linked in your header. Allows you to either download the file or read it on Grazr.

Semantic Radar: Auto-discovers embedded RDF data. Displays custom icons to indicate the presence of FOAF, SIOC, DOAP and RDFa formats.

The Tabulator Extension: Auto-discovers and provides a table-based display for RDF data on the Semantic Web. Makes RDF data readable to the average person and shows how data are linked together across different sites.

As always, please let me know how this overview could be improved or if you know of other ways to add semantic functionality to your WordPress blog. Thanks.

A few notes on data portability

Joss Winn / April 20, 2009April 20, 2009 / Fun, Identity, Standards & Specs, Web

I had a bit of fun over the weekend looking at how I could both aggregate my online presence and make it portable, all under my own domain name. I ended up touching on a bunch of interesting initiatives revolving around web and data standards. The minor output of this is over on my personal ‘home page’ at http://josswinn.org

You’ll see that there’s an Attention Profile (APML), Friend of a Friend document (FOAF), hCard generated from my contact details, an OPML file of the significant feeds I have spotted around the web (Delicious, this blog, Twitter, Last.fm, etc), an aggregated feed of my OPML file, and a link to my LinkedIn profile, which I happily learned includes hResume microformat markup. My OPML, FOAF profile and RSS feed are all auto-discoverable.

All links on the page are marked up using the XFN markup rel=”me” tag, which should help consolidated my identity on the web. There’s an interesting discussion over on Marshall Kirkpatrick’s blog about how our Twitter profiles are starting to rank higher in search engines than our personal blogs or home pages because Twitter is using the rel=”me” tag. Marshall suggests that we start using rel=”me” somewhere on our own sites to counteract that.

To add to the fun, I also tried to get the page to validate as HTML5, but in doing so, I had to remove the meta tag that provides OpenID Attribute Exchange via my OpenID Service Provider. I get the error:

Bad value X-XRDS-Location for attribute http-equiv on element meta.

Apparently the draft HTML5 spec currently disallows values for http–equiv. OpenID AX is a good thing if you want to consolidate your identity while at the same time ensure it is portable. It’s certainly more useful to me than validating as HTML5.

In addition to this, I added a Google Friend Connect (OpenSocial) widget and integrated Apture. I thought about adding the ability to leave comments via Disqus, the advantage being that comment authors could retain control over their own comments. But to be honest, I don’t think you or I need yet another method of communicating with each other. There are plenty of ways to do that already.

Other than providing a playground for fun, what this bit of tinkering on my home page has taught me is that microformats and the ethos of data portability is being embraced quite widely on the web and although I spent my time hand-crafting my new home page, there are opportunities to do much the same, quite easily, through the use of a WordPress blog and a bunch of third-party services. More on that later…

Addicted to feeds

Joss Winn / April 15, 2009February 24, 2010 / Data, Fun, Mashups, Standards & Specs

I’ve been a long time consumer of news feeds and spend a lot of time reading the web via 200+ feeds in Google Reader. More recently, and largely as a result of working on WriteToReply, I’ve become just as addicted to publishing feeds from any data end point I can find.

WordPress makes this quite easy for developers, providing a whole load of functions and template tags for feeds. For the rest of us, there’s also documentation which is useful is you’re wondering what kinds of feeds can be generated from a basic WordPress site.

All the examples below, assume you’re using ‘pretty URLs’. If your URLs are something like http://example.com/?p=123 then the same principles apply but you’ll use the format /?feed=feed_type i.e. http://example.com/?feed=rss2 The documentation shows full examples.

So, here are the basic content feeds. RSS is RSS version 0.92 and RDF is RSS version 1.0, if you were wondering.

http://example.com/feed/

http://example.com/feed/rss/

http://example.com/feed/rss2/

http://example.com/feed/rdf/

http://example.com/feed/atom/

It’s also pretty straightforward to create a feed from a category or tag

http://example.com/category/my_category/feed/

http://example.com/tag/my_tag/feed/

You can also create feeds from combined tags

http://example.com/tag/tag1+tag2+tag3/feed/

And we know that a feed is available for site comments

http://example.com/comments/feed/

and it’s simple to grab a feed of comments from a single post by appending /feed/ to the end of the post permalink.

http://example.com/2009/01/01/my-latest-post/feed

You can also create a feed of a single post itself, by appending '?withoutcomments=1' to the end of the URL

http://example.com/2009/01/01/my-latest-post/feed/?withoutcomments=1

There is a feed for each author of the blog

http://example.com/author/joss/feed

but alas, as far as I know, no feed for the comments by any particular person.

You can also do something fancy with dates

http://example.com/2009/feed

http://example.com/2009/01/feed

http://example.com/2009/01/15/feed

and one of my favourite types of feed is from a search

http://example.com/?s=search_term&feed=rss2

Now all of this is well and good, but how many readers are going to know or care about constructing the various types of feeds available? Fortunately, it’s possible to make many of these feeds auto-discoverable either by adding some simple code to your theme’s header.php or installing a plugin.

By default, two feeds are auto-discoverable on your WordPress site: An atom and rss2 feed of your posts.

By using the Extra Feed Links plugin, you can make your comments, category, tag, author and search feeds autodiscoverable.

It’s also got a useful template tag that allows you to show the feed links in your theme, making the discovery of feeds even easier. I created a simple widget for the plugin to display the feed and an RSS icon in the sidebar

Here’s the code. Let me know where it could be improved as I just hacked it together from looking at other widgets.

<?php
function widget_extrafeeds_register() {
function widget_extrafeeds($args) {
extract($args);
?>
<br />
<?php echo $before_widget;
echo $before_title;
echo $widget_name;
echo $after_title; ?>
<ul class="sidebarList">
<?php extra_feed_link(); ?> <?php extra_feed_link('http://path/to/your/feed/icon/feed.png'); ?>
</ul>
<?php echo $after_widget; ?>
<?php
}
register_sidebar_widget('Extra Feeds',
'widget_extrafeeds');
register_sidebar_widget('Extra Feeds','widget_extrafeeds');}
add_action('init', widget_extrafeeds_register);
?>

To get this to work with the plugin, you need to add this to the very bottom of the plugin’s main.php file

// widget support
require(dirname(__FILE__) . '/widget.php');

Like I said, if anyone can improve on this, do let me know. Also note that you’ll need to point the URL in the widget to a feed icon. A lot of themes include them in their /images/ directory, which makes it easy.

By using the widget or template tag, you can have these appearing on the relevant pages.

Try it by using http://writetoreply.org/tags 🙂

If you’re interested in how to add category and tag auto-discovery feeds to your theme’s source code, try adding this to your header.php

<?php if (is_category()) { ?>
<link rel="alternate" type="application/atom+xml" title="<?php bloginfo('name'); ?> &amp;raquo; <?php single_cat_title(''); ?> Atom Feed" href="<?php echo
get_category_feed_link(get_query_var('cat'), 'atom'); ?>" />
<?php } ?>

<?php if (is_tag()) { ?>
<link rel="alternate" type="application/atom+xml" title="<?php bloginfo('name'); ?> &amp;raquo; <?php single_tag_title(''); ?> Atom Feed" href="<?php echo
get_tag_feed_link(get_query_var('tag_id'), 'atom'); ?>" />
<?php } ?>

<?php if (is_category()) { ?>
<link rel="alternate" type="application/rss+xml" title="<?php bloginfo('name'); ?> &amp;raquo; <?php single_cat_title(''); ?> RSS2 Feed" href="<?php echo
get_category_feed_link(get_query_var('cat'), 'rss2'); ?>" />
<?php } ?>

<?php if (is_tag()) { ?>
<link rel="alternate" type="application/rss+xml" title="<?php bloginfo('name'); ?> &amp;raquo; <?php single_tag_title(''); ?> RSS2 Feed" href="<?php echo
get_tag_feed_link(get_query_var('tag_id'), 'rss2'); ?>" />
<?php } ?>

<?php if (is_category()) { ?>
<link rel="alternate" type="application/rdf+xml" title="<?php bloginfo('name'); ?> &amp;raquo; <?php single_cat_title(''); ?> as RDF data" href="<?php echo
get_category_feed_link(get_query_var('cat'), 'rdf'); ?>" />
<?php } ?>

<?php if (is_tag()) { ?>
<link rel="alternate" type="application/rdf+xml" title="<?php bloginfo('name'); ?> &amp;raquo; <?php single_tag_title(''); ?> as RDF data" href="<?php echo
get_tag_feed_link(get_query_var('tag_id'), 'rdf'); ?>" />
<?php } ?>

I learned this from the author of this related plugin, which is similar but not quite as powerful as the Extra Feed Links plugin.

Finally, if you use FeedBurner, beware that it breaks some of the above feeds. One fix for ensuring that tag and category feeds continue to work as they should is to modify the FeedBurnerFeedSmith plugin as noted here. Simply change the line

is_feed() &amp;&amp; $feed != 'comments-rss2' &amp;&amp; !is_single() &amp;&amp;

to read

is_feed() &amp;&amp; $feed != 'comments-rss2' &amp;&amp; !is_single() &amp;&amp; !is_tag() &amp;&amp;

That’ll do for now. I intend to learn more about the RSS and Atom specifications over the next few weeks and will post anything I think relevant here. If you can add anything to this post, please do leave a comment. Thanks.

Open Calais + site-wide tags = semantic site architecture

Joss Winn / March 25, 2009August 30, 2009 / Data, Identity, Standards & Specs, Web

Preamble about people

Over the last month, we’ve I’ve started to grow an embryonic social web publishing platform that can be many things but fundamentally offers a personalised and collaborative environment for research, teaching and learning. (Where? You’re looking at it!). There are a few active blogs (currently fewer than on the pilot Learning Lab blogs), nearly 70 users and the word is starting to get out at a pace that I can manage. So, now it’s time to look to the future…

By running BuddyPress, the connections between people are pretty much taken care of. Sign in to http://dev.lincoln.ac.uk with a Lincoln username and password and you’ve joined a community that, as it grows, will increasingly and effortlessly connect people through the information they choose to add to their profile. Staff and students can click on a link and find other people who have similarly tagged their profile.

Notice the comma seprated hyper-linked data — Notice the comma-separated hyper-linked data

What is of equal interest to me, and potentially very useful to the university community, is how we link the content that is being generated by staff and students and make those links accessible. It is not difficult to appreciate what the potential is when you have a revolving community of 10,000 people who, over time, document their work, their research, teaching and learning using cutting edge web publishing tools, but I’m writing this post to try and understand and sketch out how I might evolve what I have begun.

Put simply, WordPress Multi-User (WPMU) allows one person (me) to provide and manage multiple web sites which other people (staff and students) take ownership of. Typically, every action, every new user and every new page and post on every site, is recorded and held in a shared database(s). Although at this low level, the data is relational, on the surface, when you look at one of the sites, they pretty much stand alone and so they should. We’re not talking about a single website with lots of users, we’re talking about lots of websites with lots of users. They might be working collaboratively with others, but they’re working as individuals or in distinct groups that benefit from a distinct online identity. BuddyPress helps bring things together by aggregating people’s actions (i.e. posting blog updates, making friends, joining groups, posting messages) but the visibility of those connections is transient. Social networks display our actions along a timeline and the connections between people are, for the most part, buried until the next time person A interacts with person Y.

Enough about connecting people.

Site-wide content aggregation

Site content is a mixture of text, multimedia and metadata. The last thing I’ll do when completing this blog post is to categorise and tag it. Each time I write, I publish text, (sometimes images) and metadata which summarises and categorises the full text. Why am I telling you this? You know it already. What you may not know is that each post created on our university WPMU installation, by any person, providing their blog is public, is aggregated into a single site and re-published a second time. So this post exists here on this site and there, on the Community Posts site. Notice how the Community Posts version links back to the original post. We’re not creating a whole new resource, we’re creating a powerful linked resource that allows others to search, filter, browse and discover content held across multiple sites. With only a few sites up and running here at the moment, the opportunity to discover varied content is limited, but over time that will change. Look at wordpress.com, where there are 5 million sites:

wordpress.com site wide tags — Browse by user-generated metadata

On the university blogs, this is made possible through the use of the site-wide-tags plugin, which was developed by @donncha, the same person that develops WPMU and the wordpress.com site. By using this plugin, a WPMU installation can share similar functionality to what you see on wordpress.com. I say ‘similar’ because, as I’ll mention later, designing how people discover content is key to all of this and something I, or we as a community, would benefit from thinking about and acting on collectively.

On the Community Posts site, you can search the full-text of every post, filter resources by category and tag, and subscribe to feeds from any combination of tag or category. Any search can be turned into a feed by appending ‘&feed=rss’ to the end of the resulting URL.

i.e. http://tags.dev.lincoln.ac.uk/?s=gaming&feed=rss

To create a feed from a tag or category, just click on a tag or category and append ‘/feed’ to the end of the URL.

i.e. http://tags.dev.lincoln.ac.uk/tag/games/

You can combine tags with ‘+’, too:

http://tags.dev.lincoln.ac.uk/tag/games+development/

You can also specify the type of feed you want by appending:

/feed/rss/
/feed/rss2/
/feed/rdf/
/feed/atom/

Mixing categories and tags is currently broken by a bug but is due to be fixed in the next version of WordPress.

So it’s not difficult to imagine, over time, an active community of thousands of university web publishers, having their content aggregated into a site-wide resource that allows full text searching, browsing and filtering with a choice of feeds to syndicate that content elsewhere. See how it’s happening at the University of Mary Washington, where over 2400 sites have been created in under three years.

Semantic technology

Yesterday, I discovered OpenCalais. It’s a semantic technology that’s been around since January 2008, so you might be tired of hearing about it, but if not, ‘Welcome to Web 3.0!’

The Calais Web Service automatically creates rich semantic metadata for the content you submit – in well under a second. Using natural language processing, machine learning and other methods, Calais analyzes your document and finds the entities within it. But, Calais goes well beyond classic entity identification and returns the facts and events hidden within your text as well.

Nice. And it’s installed on this site. There are three Calais plugins available for WordPress. This one, allows writers to submit their blog posts to the OpenCalais web service API and fetch back a number of auto-generated tags based on the content of their post. The longer the post, the more tags are returned. Tags are returned in just seconds. Those tags can be added to the post in their entirety or used selectively (actually, you have to add them all and then remove those you don’t want to include – a minor irritation). This next plugin, allows you to automatically go through every post you’ve written and tags them using the Calais web service. It’s all or nothing, but following the auto-tagging of archive content, you can then go to the ‘tags’ menu and delete any tags you don’t want to use. I’ve done that to this site and to the Community Posts site. Calais looks for names, facts and events and the API allows for up to 40,000 transactions a day and up to four per second. It returns some predictable tags and a few odd ones, but on the whole is fast and works like magic.

The third plugin also allows blog authors to fetch tags for the post they are writing and, in addition, it also suggests Creative Commons licensed images based on a dynamic evaluation of the chosen or suggested tags.

Image suggestion is a nice idea, but tends to return some fairly generic images.

Having used OpenCalais to auto-tag the Community Posts site, a whole new and richer set of semantic metadata has been added with barely any effort. The challenge now is to figure out how to 1) automate this as a scheduled process, so that the Calais plugin looks for new content every hour, say, and tags whatever has been recently introduced (a cron job that calls the plugin and a modification to the plugin to look at the timestamp of the post and ignore anything older than when it was last run?); 2) present the semantic data in an accessible way and this mostly, I think, comes down to appropriate site design. The wordpress.com screenshots above show one way of doing it. A del.icio.us style approach is a more powerful and versatile model of tag filtering. Until then, it’s a matter of constructing filters, searches and feeds in the way I’ve outlined above.

So how might all of this semantically structured data be used? It seems to me that most of the advantages are proportional to the quantity of information available. For teaching and learning, it could be used by students and staff who want to find and re-use material that has been posted in the past for a specific course or subject area. Great for new students who want to measure the type and quality of work produced by students in previous years. In a similar way, it could be used by staff looking for posts by colleagues on subjects they might be teaching, and because searches and tags can be turned into feeds, past content could be aggregated into a new course site. A widely adopted, semantically tagged WPMU installation could also reveal trends in the type of work occurring at the university and, by tagging names of people, queries against references to Prof. X’s work could be made (I also wonder whether through the use of feeds, content from the institutional repository could be joined up with all of this, too – but it’s late in the day and I can’t think straight).

You’ll see from the image below that using Calais on the Community Posts site, resulted in a much richer variety of tags than would have appeared if we relied on user-generated tagging alone (136 posts now have 558 tags). Some people don’t even bother to tag their work… Shame on them! Notice too, that with the Firefox Operator plugin, you can take a tag on the site and use it to find related resources elsewhere. So if you’re looking at work tagged ‘client-applications’ on WPMU, you can conveniently hop over to delicious and find further web resources or, on a whim, look at what books on this subject are available on Amazon.

Operator provides a way to use tags on one site to discover related resources on another site — Use tags on one site to discover related resources on another site

Anyway, if you’re still reading, you might remember from the title of this post that my overriding interest in all of this is how it can be understood as and developed into a site-wide ‘architecture’. Again, I’m thinking how user-generated tags have determined the way delicious is designed for navigation and searching of resources. I need to learn more about how WordPress themes are constructed and consider how available functions can be best exploited and usefully presented on this type of site. If you have any ideas or want to work on a specific theme to get the most out of the site-wide-tags plugin, please do leave a comment or get in touch on Twitter @josswinn

OAuth, OpenID, XMPP with WordPress

Joss Winn / March 12, 2009August 30, 2009 / Fun, Standards & Specs, Web

Automattic, the company behind WordPress, released an update to Prologue, their theme for group discussion, today. I read about this, minutes after reading about the new OAuth features in WordPress 2.8 and an hour or so after reading about a new Facebook Connect plugin for BuddyPress, the social networking layer for WordPress. All this stimulation proved a bit too much for me, so this post is an attempt to plot what’s happening here and what might be possible in just a few months from now…

So, I have the BuddyPress Facebook Connect plugin working on a my test installation…

Nothing fancy going on there. Basically, new users to the site can register using their Facebook credentials. The plugin doesn’t do anything for existing users on the site. They just login with their local account as usual. For a first release, the plugin is a good proof of concept and with a bit more integration work will make it easy for Facebook users to join BuddyPress sites.

The new Prologue theme, P2, is impressive, too…

It takes advantage of the new threaded comments feature in WordPress 2.7+ , has ‘realtime’ notifications (unless I’ve missed something, the use of the term ‘realtime’ is a stretch – see below) and has some nice keyboard shortcuts…

One thing that’s lacking is a Twitter-like realtime notification that a new post has been made and you should refresh your bowser. Twitter doesn’t use it for the user home page, but they do on their search page and I like it.

Moving on, OAuth functionality for WordPress is still in development but the latest code from the SVN trunks of both the DiSo plugin and WordPress does appear to work…

Be warned that it does not run on a server where PHP runs as a CGI. I tried to run it first on Dreamhost, but it gave an error showing that getallheaders() is an undefined function.

I need to spend more time with the OAuth plugin to see how it will actually work in practice. One of the first use-cases for it is to allow client applications like the iPhone app, to be able to post remotely without sending a password using XML-RPC. If anyone has any ideas and wants to test it with me, please leave a comment. As I understand from the announcement, it’s working but it’s still early days… For more information, see Will Norris’ presentation from last August.

Finally, there’s mnw, a new plugin for WordPress that provides support for the OpenMicroBlogging specification. With this, users from other sites using the specification, such as identi.ca and other Laconica-based services, can subscribe to your blog/omb site and receive updates whenever you publish a new post or page. So this…

…ends up here…

mnw is still a bit rough around the edges but it was only released as V0.1 a month ago, so that’s to be expected. Note that mnw only seems to work on single WP installations (WPMU produces a familiar error message which I think is wp_nonce related) and does not work on WP 2.8 trunk. Also, identi.ca complained of my avatar image being the wrong size. In the example above, I’d removed my avatar from the mnw settings, but I’ve since found that a .png of 96px seems to work OK.

What does it mean for me and you?

So, what does all this mean? In terms of wordpress.com, we might speculate that before too long, they will add the BuddyPress layer to their 4.5m blogs to create a sizeable social network. The P2 theme shows posts in realtime, they’re already offering an XMPP firehose of blog posts and there are plugins that offer XMPP functionality for WordPress, so remote real-time updates aren’t far away and realtime remote publishing already exist using XML-RPC. With the P2 theme, anyone can create a Twitter-like site that any number of registered users can post to and anyone can comment on. Add OpenID authentication and OAuth authorisation and you’ve got a large, mature and open social (micro)blogging service.

For self-hosted WordPress users, it’s even closer to being a reality. I’ve had a site running today that accepts new user registrations via the DiSo OpenID plugin and those users can then post updates to the Prologue themed site and join a threaded group discussion. If I enabled XML-RPC posting, users could post in ‘realtime’ to the group site from their iPhone or other other client app. With OAuth support, this would be possible from desktop and mobile applications as well as other sites such as Flickr, without exchanging protected user data such as a password. Those updates could also be broadcast via XMPP in realtime, which I’ve done on another blog I was testing.

Things are a bit different for WordPressMU/BuddyPress installations. As you’ve seen above, I’ve got a BuddyPress site running that accepts users joining via Facebook connect. Functionality is limited to social networking and it still has some issues that need working on before it’s ready for every-day use (I’ve noted them on the BP forum). WPMU blogs (by which I mean blogs not the overall site) don’t allow new-user registrations so the blog adminstrator needs to sign up new users. Users registered via Facebook don’t have an email address associated with their account, so blog admins can’t add these types of users as the process requires a username and email address of a new or existing user.

However, by activating the right plugins, registered WPMU users (I’m thinking university staff and students) could participate in a group microblog using the P2 theme, LDAP and/or OpenID for login and XML-RPC and XMPP for remote publishing and receiving posts. It won’t be too long before you can send and receive WordPress posts via your GMail or Jabber account (on your iPhone/iPod) in realtime (hopefully with support for tagging), and all of that data is simply WordPress data and has RSS feeds hanging off every tag and wrapped around every post.

Just a thought.

Facebook to the repository via SWORD

Joss Winn / December 17, 2008April 9, 2009 / Open Access, Repositories, Standards & Specs, Web

A post to note that I have successfully deposited a document into our institutional repository from my Facebook account using the Facebook SWORD app, written by Stuart Lewis.

There’s a few things worth mentioning: It’s a 3.1.1 EPrints IR, hosted at our university and maintained by EPrints Services. EPrints has supported SWORD since v3.1. Originally, the FB app didn’t work for the following reasons:

The ‘Depositing on behalf of’ field has to be left empty. I was told by Seb at EPrints Services that this is ‘disabled by default’.
The repository URL needs to point at the ‘service document’, not the base URL of the IR. For us, that is http://eprints.lincoln.ac.uk/sword-app/servicedocument
We use LDAP for authentication and the IR configuration needed to be tweaked to account for this when depositing via SWORD.

Once we’d overcome these issues, my ‘test.txt’ doc was successfully deposited from my desktop to the University of Lincoln IR via Facebook:

…with a few caveats:

The app announced ‘Item Deposited!’ and gave a URL which resulted in a 404 dead link http://eprints.lincoln.ac.uk/sword-app/collections/1738/deposit. I don’t know why. I thought it was because I wasn’t logged in to the IR, but even after logging in, the link was dead.
The app (maybe it’s defined in the SWORD spec, I haven’t checked), zipped up my metadata and document, which resulted in depositing two items: My test.txt document and the original zip file were both showing in my item list. This could be because of the way our IR is configured to unpack zip folders. I don’t know.

The metadata mapping was partially successful. The referreed status didn’t map across at all and the URL reference I gave mapped to the ‘Identification number’ field in EPrints, rather than the ‘Related URLs’ field, which was what I was expecting. Maybe the SWORD app field could be renamed ‘Identification URL/DOI’ or similar? The title, abstract and my name were correctly mapped. It’s a shame that my email address wasn’t autocompleted as it would be if I were depositing through the normal EPrints workflow.

Despite these issues, it’s good to see this working in principle and I imagine that the above could be rectified quite easily. Perhaps someone can offer their solutions here?

As Stuart notes on his blog, the main value in this kind of app is the ability to broadcast to your Facebook friends that you’ve just deposited something in an IR. My main gripe, however, would be that it doesn’t make the deposit process any easier, which is what interests me about the SWORD protocol. Working this way…

I have to use two applications to make my document public, the benefit being that other people are told about what I’ve just done.
The EPrints URL that the app points to, even if it was working, points to a non-public space, so my friends don’t have a direct link to the document from within Facebook.
The metadata fields in the present version of the app, are not configurable which means that I have to add more metadata through the EPrints interface.
Finally, it does seem odd to upload a document from my desktop to Facebook only to send it to another application and finish off the process of deposit there. It would be more useful, if I could deposit files that I already hold in Facebook. I don’t use Facebook enough to really know if there are apps that allow you to create documents within Facebook, but if there were, then perhaps Facebook could be used as a (collaborative?) working space and the SWORD app used to deposit final versions to an IR.

Microformats and Firefox

Joss Winn / October 21, 2008April 9, 2009 / Fun, Standards & Specs, Web

When I have time, I like to read about new and developing web standards and specifications. Sad, you might think, but it’s a way of learning about some of the theoretical developments that eventually turn into practical functionality for all users of the Internet. Also, I am an Archivist (film, audiovisual, multimedia) by trade, and am somewhat reassured by the development of standards and specifications as a way of achieving consensus among peers and avoiding wasted time and effort in managing ‘stuff’.

So, while poking around on Wikipedia last night, I came across ‘Operator‘, an add-on for Firefox that makes part of the ‘hidden’ semantic web immediately visible and useful to everybody. If you’re using Firefox, click here to install it. It’s been available for over a year now and is mature and extensible through the use of user scripts. It’s been developed by Michael Kaply, who works on web browsers for IBM and is responsible for microformat support in Firefox.

Operator leverages microformats and other semantic data that are already available on many web pages to provide new ways to interact with web services.

In practice, Operator is a Firefox tool bar (and/or location/status bar icon) that identifies microformats and other semantic data in a web page and allows you to combine the value of that information with other web services such as search, bookmarking, mapping, etc. For example, this blog has tags. Operator identifies the tags and then offers the option of searching various services such as Amazon, YouTube, delicious and Upcoming, for a particular tag. If Operator finds geo-data, it offers the option of mapping that to Google Maps and, on this page for example, it identifies me as author and allows you to download my contact details, which are embedded in the XHTML. Because it is extensible through user-scripts, there are many other ways that the microformat data can be used.

Of particular interest to students and staff are perhaps the microformat specifications for resumes and contact details. Potentially, a website, properly marked up (and WordPress allows for some of this already), could provide a rich and useful portfolio of their work and experience which is semantically linked to other services such as Institutional Repositories or other publications databases where their work is held.

After using it for a few hours, I now find myself disappointed when a website doesn’t offer at least one piece of semantic data that is found by Operator (currently, most don’t but some do). Microformat support will be included (rather than an add-on) in Firefox 3.1 and IE 8, so we can expect to see much more widespread adoption of it. A good thing.

There’s a nice demonstration of microformats here, using the Operator plugin.