When I first read on Niall Kennedy’s blog that Yahoo had released their Term Extraction API, I immediately thought, “Cool, if I can rig up a method of inserting the extracted terms into my blog posts, I can get all of the benefits of Tags, without the added time and hassle of creating them all by hand.”
The first step for this project was looking at the Yahoo API, and more specifically, the examples of the terms that it extracts. And that’s where the experiment ended.
The problem, of course, is that the Term Extractor is just that: a term extractor, not a Meaningful Term Extractor. While I strongly believe that corpus analytics holds amazing promise (sorry, couldn’t resist the tease), Yahoo’s freebie offering is far too crude a tool to move straight to automated tag creation.
I just read (via Technorati’s David Sifry) that Jonas Luster has built the solution I only imagined (kudos, Jonas). While the results bear out my assumptions, they are nonetheless a valuable starting point.
If we assume that “meaningful” is hard to automate in the near term (it is), need we give up? Not necessarily. Rather than late-binding the extracted terms to the post (computationally expensive, as Jonas notes, and with mixed results, as above), what if we could invoke Jonas’s service before publishing a post (or, even more interestingly, dynamically, as we’re typing)?
- That would allow me (the publisher and editor) to winnow the relevant terms from the automated result set and include them as tags in the post;
- What if the winnowed terms were then automatically passed to a process that would return all of my blog posts (likely as hyperlinked headlines) that use those terms? Well, with that, I could again winnow the result set so that I could in-line a “buzzhit!’s related articles” offering (or draw examples, previous ideas and writing, etc. from the returned articles to strengthen the post under creation);
- Similarly, through another service, I could receive and in-line “articles from around the web” (and/or “from my reading list” [OPML file], and/or “from my social network” [XFN et al.]).
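To make the second bullet a bit more concrete, here is a rough sketch of the “return my posts that use those terms” step, assuming the candidate terms have already come back from some extraction service and I have hand-picked the keepers. Everything here is hypothetical: the `Post` type, the `related_posts` function, the sample archive and URLs are made up for illustration, and this is not Yahoo’s API or Jonas’s service. Matching is a naive case-insensitive substring test.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical, minimal stand-in for an archived blog post."""
    headline: str
    url: str
    body: str


def related_posts(selected_terms, archive, min_matches=1):
    """Return (headline, url, matched_terms) for archived posts that mention
    at least `min_matches` of the author-approved terms, best matches first.

    Assumption: a naive case-insensitive substring test stands in for
    whatever search/index a real service would use.
    """
    terms = [t.lower() for t in selected_terms]
    results = []
    for post in archive:
        text = post.body.lower()
        matched = [t for t in terms if t in text]
        if len(matched) >= min_matches:
            results.append((post.headline, post.url, matched))
    # Posts sharing more of the chosen terms rank first; Python's sort is
    # stable, so ties keep their original archive order.
    results.sort(key=lambda r: len(r[2]), reverse=True)
    return results


# Hypothetical archive and term lists, for illustration only.
archive = [
    Post("Tags and lazy authors", "https://example.com/tags",
         "Why tagging by hand is tedious."),
    Post("OPML reading lists", "https://example.com/opml",
         "Sharing an OPML reading list and tagging it."),
    Post("Weekend plans", "https://example.com/weekend",
         "Nothing technical here."),
]
candidates = ["tagging", "opml", "term extraction"]  # e.g. from an extractor
approved = [t for t in candidates if t in {"tagging", "opml"}]  # my winnowing

for headline, url, matched in related_posts(approved, archive):
    print(f"{headline} <{url}> matched {matched}")
```

The output is the list of hyperlinked headlines I would then winnow a second time before in-lining a related-articles block; the third bullet’s “from around the web” variant would swap the local archive for some external search service.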
Lots of potential here (and more broadly for mashups resulting in automated meta-data creation). On the latter topic, I’ve got a few ideas that I’d love to “reduce to practice” if any of my more savvy dev buddies have a few spare cycles. ;-p Drop me a note…
Agreed. Remember that this post is focused on building something useful on top of what Jonas has built out in the open (i.e., prototyping or, as is trendy, ‘hacking’) with publicly available APIs. To your point, there are definitely smarter ways of satisfying the ‘needs’ that I’m expressing.
Given that, and the stated assumption that people need to be involved in the process to get the best results, the basic thinking is that, well, “people are lazy” and “human memory is lossy”. I’m looking for a suite of services that would assist me as an author/editor by automatically recommending relevant pieces of content and meta-data during the blog post creation process. Specifically, I could use help with tag recommendations/management, and with surfacing what I and others have written about this topic/entity in the past. (Am I the only one who hates manually invoking these activities in a bunch of extra windows?!)
But hell, I’m still waiting for basic stuff, like a NOFOLLOW checkbox in the hyperlink dialog.