Sunday, 22 January 2012

Get Real Data from the Semantic Web - Finding Resources

In my last article, I briefly explained how to get data from a resource using Python and SPARQL. This article explains how to find the resource in the first place.
Have you ever been taught how to knit? If you have, then you'll know that you are not usually taught how to cast on (or start off) in your first lesson. That's because it is much easier to learn how to knit than it is to cast on.

So it is with the Semantic Web. Once you have a resource URL, it's reasonably easy to extract information linked to that resource, but finding the starting resource is a bit trickier.
So let's just recap how we might get the abstract description for London from DBpedia.

If we know the URL, then that's pretty straightforward:

#!/usr/bin/env python
import sys

from sparql import DBpediaEndpoint


def main():
    # Extra prefixes, added to the defaults defined in sparql.py
    s = DBpediaEndpoint({
        "resource": "http://dbpedia.org/resource/",
        "yago": "http://dbpedia.org/class/yago/",
    })
    query = """
        SELECT ?abstract WHERE {
            resource:London dbpedia-owl:abstract ?abstract .
            FILTER(langMatches(lang(?abstract), "EN"))
        }
    """
    results = s.query(query)
    # Each result row maps a variable name to a dict holding its "value"
    abstract = results[0]["abstract"]["value"]
    print(abstract)


if __name__ == '__main__':
    try:
        main()
        sys.exit(0)
    except KeyboardInterrupt:  # Ctrl-C
        raise

(If you want to follow this tutorial, then you had better copy the sparql.py file from my last article.)


RDF types for the DBpedia entry for London
If you don't know the URL, however, then you'll have to search for it. According to the DBpedia entry, London is many things, including an owl:Thing. There are a lot of Things out there, probably enough to make even the DBpedia endpoint time out, so let's choose something more restrictive, such as yago:Locations, but not something too restrictive, such as yago:BritishCapitals.

#!/usr/bin/env python
import sys

from sparql import DBpediaEndpoint


def main():
    s = DBpediaEndpoint({
        "resource": "http://dbpedia.org/resource/",
        "yago": "http://dbpedia.org/class/yago/",
    })
    # Find the page of the first location whose name matches "London"
    query = """
        SELECT ?url WHERE {
            ?subject rdf:type yago:Locations .
            ?subject foaf:page ?url .
            ?subject foaf:name ?name .
            FILTER regex(?name, "London") .
        } LIMIT 1
    """
    results = s.query(query)
    url = results[0]["url"]["value"]
    print(url)


if __name__ == '__main__':
    try:
        main()
        sys.exit(0)
    except KeyboardInterrupt:  # Ctrl-C
        raise
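
If you want to look up something other than London, the query text can be built from a name and a class. What follows is only a rough sketch: build_lookup_query and escape_literal are names I have made up for illustration, and it assumes the rdf, foaf and yago prefixes are added by the endpoint wrapper before the query is sent.

```python
def escape_literal(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_lookup_query(name, rdf_class="yago:Locations", limit=1):
    """Return the text of a lookup query for the given name and class.

    Hypothetical helper. The name is still treated as a regular
    expression by regex(), exactly as in the hand-written query; only
    the string quoting is handled here.
    """
    return """
SELECT ?url WHERE {
    ?subject rdf:type %s .
    ?subject foaf:page ?url .
    ?subject foaf:name ?name .
    FILTER regex(?name, "%s") .
} LIMIT %d
""" % (rdf_class, escape_literal(name), limit)
```

The returned string can be passed straight to s.query(), for example s.query(build_lookup_query("Paris")).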

Just to be a smart ass as I finish off, you can get both at the same time by doing this. Don't forget, though, that this will stress the SPARQL endpoint more than is probably necessary. Be kind.

#!/usr/bin/env python
import sys

from sparql import DBpediaEndpoint


def main():
    s = DBpediaEndpoint({
        "resource": "http://dbpedia.org/resource/",
        "yago": "http://dbpedia.org/class/yago/",
    })
    # Search by name and fetch the English abstract in a single query
    query = """
        SELECT * WHERE {
            ?subject rdf:type yago:Locations .
            ?subject dbpedia-owl:abstract ?abstract .
            ?subject foaf:page ?url .
            ?subject foaf:name ?name .
            FILTER regex(?name, "London") .
            FILTER(langMatches(lang(?abstract), "EN")) .
        } LIMIT 1
    """
    results = s.query(query)
    url = results[0]["url"]["value"]
    print(url)
    abstract = results[0]["abstract"]["value"]
    print(abstract)


if __name__ == '__main__':
    try:
        main()
        sys.exit(0)
    except KeyboardInterrupt:  # Ctrl-C
        raise
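
One way to be kind is to avoid asking the endpoint the same question twice. Here is a minimal sketch, assuming the endpoint object has a query method like the one in sparql.py. CachedEndpoint is a hypothetical wrapper of my own, not part of SPARQLWrapper, and it only remembers results in memory for as long as the script runs.

```python
class CachedEndpoint(object):
    """Hypothetical wrapper that remembers the results of earlier queries."""

    def __init__(self, endpoint):
        # endpoint is anything with a query(q) method
        self.endpoint = endpoint
        self.cache = {}

    def query(self, q):
        # Only ask the wrapped endpoint the first time a query string is seen
        if q not in self.cache:
            self.cache[q] = self.endpoint.query(q)
        return self.cache[q]
```

To use it, wrap the endpoint once, for example s = CachedEndpoint(DBpediaEndpoint()), and call s.query() as before.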

Thursday, 19 January 2012

Get Real Data from the Semantic Web

Semantic Web this, Semantic Web that, what actual use is the Semantic Web in the real world? I mean how can you actually use it?

If you haven't heard the term "Semantic Web" over the last couple of years then you must have been in... well, somewhere without this interweb they're all talking about.

Basically, by using metadata (see RDF), disparate bits of data floating around the web can be joined up. In other words, they stop being disparate. Better than that, theoretically you can query the connections between the data and get lots of lovely information back. This last bit is done via SPARQL, and yes, the QL does stand for Query Language.

I say theoretically because in reality it's a bit of a pain. I may be an intelligent agent capable of finding linked bits of data through the web, but how exactly would you do that in Python?

It is possible to use rdflib to find information, but it's very long-winded. It's much easier to use SPARQLWrapper and, in fact, in the simple example below I've used a SPARQLWrapperWrapper to make asking for lots of similarly sourced data, in this case from DBpedia, even easier.

# sparql.py
from SPARQLWrapper import SPARQLWrapper, JSON


class SparqlEndpoint(object):
    def __init__(self, endpoint, prefixes=None):
        self.sparql = SPARQLWrapper(endpoint)
        # Default prefixes, so queries can use short names such as foaf:name
        self.prefixes = {
            "dbpedia-owl": "http://dbpedia.org/ontology/",
            "owl": "http://www.w3.org/2002/07/owl#",
            "xsd": "http://www.w3.org/2001/XMLSchema#",
            "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
            "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
            "foaf": "http://xmlns.com/foaf/0.1/",
            "dc": "http://purl.org/dc/elements/1.1/",
            "dbpedia2": "http://dbpedia.org/property/",
            "dbpedia": "http://dbpedia.org/",
            "skos": "http://www.w3.org/2004/02/skos/core#",
        }
        # Caller-supplied prefixes are added to (or override) the defaults
        self.prefixes.update(prefixes or {})
        self.sparql.setReturnFormat(JSON)

    def query(self, q):
        # Prepend a PREFIX line for every known prefix, then run the query
        lines = ["PREFIX %s: <%s>" % (k, r) for k, r in self.prefixes.items()]
        lines.extend(q.split("\n"))
        query = "\n".join(lines)
        print(query)
        self.sparql.setQuery(query)
        results = self.sparql.query().convert()
        return results["results"]["bindings"]


class DBpediaEndpoint(SparqlEndpoint):
    def __init__(self, prefixes=None):
        endpoint = "http://dbpedia.org/sparql"
        super(DBpediaEndpoint, self).__init__(endpoint, prefixes)
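
To see what the query method actually sends, here is a small stand-alone sketch of the prefixing step. It doesn't touch the network. with_prefixes is a hypothetical function name of mine, and I sort the prefixes only so that the output is predictable.

```python
def with_prefixes(prefixes, q):
    """Return q with one PREFIX line per entry in prefixes placed in front."""
    lines = ["PREFIX %s: <%s>" % (k, v) for k, v in sorted(prefixes.items())]
    lines.append(q.strip())
    return "\n".join(lines)


text = with_prefixes(
    {
        "foaf": "http://xmlns.com/foaf/0.1/",
        "dbpedia-owl": "http://dbpedia.org/ontology/",
    },
    "SELECT ?name WHERE { ?s foaf:name ?name } LIMIT 1",
)
print(text)
# PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
# PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# SELECT ?name WHERE { ?s foaf:name ?name } LIMIT 1
```

Sending every known prefix with every query is a little wasteful, but it means the queries themselves can stay short.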

To use this, try importing the DBpediaEndpoint and feeding it some SPARQL:

#!/usr/bin/env python
import sys

from sparql import DBpediaEndpoint


def main():
    s = DBpediaEndpoint()
    resource_uri = "http://dbpedia.org/resource/Foobar"
    # Ask for the English abstract of the resource
    results = s.query("""
        SELECT ?o
        WHERE { <%s> dbpedia-owl:abstract ?o .
        FILTER(langMatches(lang(?o), "EN")) }
    """ % resource_uri)
    abstract = results[0]["o"]["value"]
    print(abstract)


if __name__ == '__main__':
    try:
        main()
        sys.exit(0)
    except KeyboardInterrupt:  # Ctrl-C
        raise
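
Note that results[0] raises an IndexError when the query matches nothing, for example when a resource has no English abstract. A small sketch of a more forgiving lookup follows. first_value is a hypothetical helper of mine, and the sample row is invented to mimic the list of bindings that the query method returns.

```python
def first_value(bindings, var, default=None):
    """Return the value of var from the first row that binds it, else default."""
    for row in bindings:
        if var in row:
            return row[var]["value"]
    return default


# Invented sample data in the shape of SPARQL JSON result bindings
sample = [
    {"o": {"type": "literal", "xml:lang": "en", "value": "An example abstract."}},
]

print(first_value(sample, "o"))                       # An example abstract.
print(first_value([], "o", default="(no abstract)"))  # (no abstract)
```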

Your homework is: how do you identify the resource_uri in the first place?

That's for another evening.

Tuesday, 17 January 2012

Github: Who needs it?

Do you ever think that you just don't want all your code on Github? I mean it's only a quick hack right?

Truth is, once you start using git you probably use it automatically for all your code, but you don't always want all your code floating around the net. What about those hard-coded email addresses and API tokens, or those references to your private net servers?

The answer is probably so simple that you have just overlooked it. You don't need to set up a local git server or hire one from Amazon. All you need to do is use Dropbox or Ubuntu One as your remote origin repository.

Here's how, using Ubuntu One on Ubuntu:

Write a short shell script something like this and save it on your path as repo.sh.

#!/usr/bin/env bash
#set -x # debugging
if [ -z "$1" ]
then
    echo "please supply a repo name!"
    exit 1
fi
# set the new repository name
proj=$1
proj_dir=~/Projects
repo_dir=~/git_on_ubuntu_one
repo=${proj}.git
# ln -s Ubuntu\ One/git/ git_on_ubuntu_one
# create origin (stop if a directory is missing rather than init in the wrong place)
cd "${repo_dir}" || exit 1
mkdir "${repo}"
cd "${repo}" || exit 1
git init --bare
# create local repo
cd "${proj_dir}" || exit 1
mkdir "${proj}"
cd "${proj}" || exit 1
git init
touch README
# do your first push back to master (assumes the default branch is called master)
git add .
git commit -am "repository ${proj} pushed to origin"
git remote add origin "${repo_dir}/${repo}"
git push origin master
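
If you would rather drive the same steps from Python, here is a rough sketch under the same assumptions as the script (a ~/Projects directory and a ~/git_on_ubuntu_one link). plan_repo and run_plan are hypothetical names of mine. plan_repo only describes the steps, and run_plan executes them. Unlike the shell script, run_plan does not complain if a directory already exists.

```python
import os
import subprocess


def plan_repo(proj, proj_dir="~/Projects", repo_dir="~/git_on_ubuntu_one"):
    """Return (working_directory, command) pairs mirroring repo.sh."""
    if not proj:
        raise ValueError("please supply a repo name!")
    origin = os.path.join(os.path.expanduser(repo_dir), proj + ".git")
    work = os.path.join(os.path.expanduser(proj_dir), proj)
    message = "repository %s pushed to origin" % proj
    return [
        (origin, ["git", "init", "--bare"]),         # create origin
        (work, ["git", "init"]),                     # create local repo
        (work, ["touch", "README"]),
        (work, ["git", "add", "."]),
        (work, ["git", "commit", "-am", message]),
        (work, ["git", "remote", "add", "origin", origin]),
        (work, ["git", "push", "origin", "master"]),  # first push
    ]


def run_plan(plan):
    """Create each working directory if needed and run its command there."""
    for cwd, cmd in plan:
        if not os.path.isdir(cwd):
            os.makedirs(cwd)
        subprocess.check_call(cmd, cwd=cwd)
```

Calling run_plan(plan_repo("myproject")) should then do roughly what the shell script does for a project called myproject.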

Now when you want to create a new repository, all you have to do is run repo.sh with the name of the new project as its only argument.

If you use Python and virtualenv you may be interested in the slightly extended script at http://pythonic-apis.blogspot.com/2012/01/using-ubuntu-one-as-git-repository.html.