Screen-scraping HTML is evil. APIs are better, and can return structured XML, JSON or even SOAP (everyone laughs at the last one!).
If a page is marked up semantically using microformats, an XSL stylesheet can convert it into RDF. Put the stylesheet on a profile page and link to it from the HTML (<head profile="...">), and triplr can generate triples from it in various formats. These can then be queried using SPARQL.
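To make the idea concrete, here is a rough sketch of the two steps involved: finding the profile link in the page head, then turning microformat classes into triples. It is not the XSL route described in the talk; a few lines of plain Python stand in for the stylesheet so the example runs with the standard library alone. The sample page, the example.org URLs, the hCard-style markup and the choice of the W3C vCard namespace for predicates are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
# Assumption: the W3C vCard namespace is used for the predicates.
VCARD_NS = "http://www.w3.org/2006/vcard/ns#"

# Hypothetical page: the profile URL and the markup are invented.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://example.org/profile">
    <title>Example</title>
  </head>
  <body>
    <div class="vcard" id="tom">
      <span class="fn">Tom Morris</span>
      <a class="url" href="http://example.org/tom">home page</a>
    </div>
  </body>
</html>"""


def find_profiles(root):
    """Return the profile URIs listed on <head profile="...">."""
    head = root.find(XHTML_NS + "head")
    if head is None:
        return []
    return head.get("profile", "").split()


def has_class(elem, name):
    """True if the element's class attribute contains the given token."""
    return name in elem.get("class", "").split()


def vcard_triples(root, base):
    """Yield (subject, predicate, object) strings for each class="vcard" block.

    Literals are not escaped, so this is only suitable for simple text.
    """
    for card in root.iter():
        if not has_class(card, "vcard"):
            continue
        subject = "<%s#%s>" % (base, card.get("id", ""))
        for child in card.iter():
            if has_class(child, "fn"):
                name = "".join(child.itertext()).strip()
                yield (subject, "<%sfn>" % VCARD_NS, '"%s"' % name)
            if has_class(child, "url") and child.get("href"):
                yield (subject, "<%surl>" % VCARD_NS, "<%s>" % child.get("href"))


root = ET.fromstring(PAGE)
profiles = find_profiles(root)
triples = list(vcard_triples(root, "http://example.org/page"))

# Print the triples in an N-Triples-like form, one statement per line.
for s, p, o in triples:
    print(s, p, o, ".")
```

A real GRDDL-aware agent would fetch the profile document, follow it to the transformation and run that, rather than hard-coding the mapping as this sketch does.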
The end result is custom microformats, possibly specific to your site, or something like class="nsfw" to mark up stuff that’s not safe for work, which Tom uses on his blog.
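As an illustration of how a consumer might pick up a site-specific class like this, the sketch below collects the text of every element marked class="nsfw" using Python's built-in HTML parser. The sample markup and the ClassTextCollector name are made up for the example, and what you then do with the matches (hide them, filter a feed, and so on) is left open.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ClassTextCollector(HTMLParser):
    """Collect the text of every element carrying a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # greater than 0 while inside a matching element
        self.current = []   # text fragments of the element being collected
        self.found = []     # finished text of each matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.class_name in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.found.append("".join(self.current).strip())
            self.current = []

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


# Hypothetical markup, invented for this example.
SAMPLE = (
    '<p>Safe link.</p>'
    '<p class="nsfw">Rude <em>joke</em> here.</p>'
    '<p class="photo nsfw">Another one.</p>'
)

collector = ClassTextCollector("nsfw")
collector.feed(SAMPLE)
nsfw_items = collector.found
print(nsfw_items)
```

Because the class attribute is split on whitespace, an element with several classes (such as "photo nsfw") still matches.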
GRDDL allows
[tags]semanticweb, grddl, barcamplondon3[/tags]
2 replies on “[BarcampLondon3] Tom Morris on GRDDL”
[…] By Kerry As you may have guessed from my other postings over the past few days, I spent the weekend at BarcampLondon3, at Google’s […]