An Amazon Web Services (AWS) 4.0 application in just a few lines

I have an ongoing debate with my friend Yuhong about the right way to use a Web Service. Yuhong seems to prefer SOAP; I much prefer REST. What is a REST Web Service? For the most part, a REST Web Service is really, really simple: you follow a URL and, magically, you get back an XML file that answers your query. One benefit of REST is that you can easily debug it and write quick scripts for it. What is a SOAP Web Service? I don’t know. I really don’t get it. It seems to be a very complicated way to do the same thing: instead of sending the query as a URL, you send the query as an XML file. The downside is that if it breaks, you have no immediate way to debug it: you can’t issue a SOAP request from your browser (or maybe you can, but I just don’t know how).
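To make the contrast concrete, here is a small sketch that builds the same search both ways. It is illustrative only: the helper names rest_query_url and soap_envelope are hypothetical, and the element layout inside the SOAP body is an assumption, not Amazon's actual SOAP schema. The REST endpoint and namespace URI are the ones used by the script further down.

```python
# Sketch only: contrasts a REST query (a URL) with a SOAP query (an XML
# envelope). rest_query_url, soap_envelope and the element layout inside the
# SOAP body are hypothetical; they do not reproduce Amazon's SOAP schema.
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

REST_ENDPOINT = "http://webservices.amazon.com/onca/xml"
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
OP_NS = "http://webservices.amazon.com/AWSECommerceService/2004-10-19"
ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("aws", OP_NS)


def rest_query_url(params):
    """REST: the whole query is a URL you can paste into a browser."""
    return REST_ENDPOINT + "?" + urlencode(params)


def soap_envelope(operation, params):
    """SOAP: the same query becomes an XML document, sent in an HTTP POST body."""
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    op = ET.SubElement(body, "{%s}%s" % (OP_NS, operation))
    for name, value in params.items():
        ET.SubElement(op, "{%s}%s" % (OP_NS, name)).text = value
    return ET.tostring(envelope, encoding="unicode")


query = {"Service": "AWSECommerceService", "Operation": "ItemSearch",
         "SearchIndex": "Music", "Artist": "Miles Davis"}
print(rest_query_url(query))
print(soap_envelope("ItemSearch", {"SearchIndex": "Music", "Artist": "Miles Davis"}))
```

The first string can go straight into a browser's address bar; the second has to be POSTed by a SOAP client, which is why it is harder to poke at by hand.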

Now, things never break, do they? Well, that is the problem: they break often, because either I’m being stupid, or I don’t know what I’m doing, or the people on the other side don’t know what they are doing, or they are experimenting a bit, or whatever else. I find that being able to quickly debug my code is the primary feature any IT technology should have. The last thing I want from a technology is for it to be hard to debug.

Here is the problem that I solved this weekend. I have a list of artists and I want to get a list of all their music albums so I can put it all into a relational (SQL) database. Assuming that your list of artists is in a file called artists_big.txt (one artist per line) and that you want the result to be in a file called amazonresults.sql, the following does a nice job thanks to the magic of Amazon Web Services:

Yes, the code runs past the edge of the page because I cannot let HTML wrap lines (Python allows line wrapping, but not arbitrary wrapping, since whitespace is significant in Python). I know of no way around it; suggestions with sample HTML code are welcome.

import libxml2, urllib2, urllib, sys, traceback
ID = "" # please enter your own ID here
uri = "http://webservices.amazon.com/AWSECommerceService/2004-10-19"
url = "http://webservices.amazon.com/onca/xml?Service=AWSECommerceService&SubscriptionId=%s&Operation=ItemSearch&SearchIndex=Music&Artist=%s&ItemPage=%i&ResponseGroup=Request,ItemIds,SalesRank,ItemAttributes,Reviews"
outputcontent = ["ASIN","Artist","Title","Amount","NumberOfTracks","SalesRank","AverageRating","ReleaseDate"]
input = open("artists_big.txt")
output = open("amazonresults.sql", "w")
log = open("amazonlog.txt", "w")
output.write("DROP TABLE IF EXISTS music;\nCREATE TABLE music (ASIN TEXT, Artist TEXT, Title TEXT, Amount INT, NumberOfTracks INT, SalesRank INT, AverageRating NUMERIC, ReleaseDate DATE);\n")

def getNodeContentByName(node, name):
    # walk the subtree under node depth first and return the text of
    # the first element called name, or None if there is none
    for i in node:
        if i.name == name: return i.content
    return None

for artist in input: # go through all artists, one per line
    artist = artist.strip() # drop the trailing newline before quoting
    if not artist: continue
    print "Recovering albums for artist : ", artist
    page = 1
    while True: # recover all result pages for this artist
        resturl = url % (ID, urllib.quote(artist), page)
        log.write("Issuing REST request: " + resturl + "\n")
        try:
            data = urllib2.urlopen(resturl).read()
        except urllib2.URLError: # includes HTTPError
            log.write("\n" + "".join(traceback.format_exception(*sys.exc_info())) + "\n")
            log.write("could not retrieve:\n" + resturl + "\n")
            break # skip this artist instead of retrying the same page forever
        try:
            doc = libxml2.parseDoc(data)
        except libxml2.parserError:
            log.write("\n" + "".join(traceback.format_exception(*sys.exc_info())) + "\n")
            log.write("could not parse (is it valid XML?):\n" + data + "\n")
            break
        ctxt = doc.xpathNewContext()
        ctxt.xpathRegisterNs("aws", uri)
        isvalid = (ctxt.xpathEval("//aws:Items/aws:Request/aws:IsValid")[0].content == "True")
        if not isvalid:
            log.write("The query %s failed\n" % resturl)
            for message in ctxt.xpathEval("//aws:Error/aws:Message"):
                log.write(message.content + "\n")
            ctxt.xpathFreeContext()
            doc.freeDoc()
            break
        for itemnode in ctxt.xpathEval("//aws:Items/aws:Item"):
            attr = {}
            for nodename in outputcontent:
                content = getNodeContentByName(itemnode, nodename)
                if content is not None:
                    content = content.replace("'", "''") # standard SQL escaping of quotes
                    if nodename == "SalesRank":
                        content = content.replace(",", "") # "1,234" -> "1234"
                    attr[nodename] = content
            if not attr: continue # nothing usable in this item
            columns = "(" + ",".join(attr.keys()) + ")"
            row = "(" + ",".join(["'" + value + "'" for value in attr.values()]) + ")"
            output.write("INSERT INTO music " + columns + " VALUES " + row + ";\n")
        NumberOfPages = int(ctxt.xpathEval("//aws:Items/aws:TotalPages")[0].content)
        ctxt.xpathFreeContext() # the libxml2 bindings need explicit cleanup
        doc.freeDoc()
        if page >= NumberOfPages: break
        page += 1
input.close()
output.close()
log.close()
print "You should now be able to load the file into PostgreSQL: start the client with psql, then type \\i amazonresults.sql at the psql prompt."
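
On a current Python, the libxml2 bindings and urllib2 may not be at hand. Here is a minimal sketch of the same parse-and-insert step using only xml.etree.ElementTree: the namespace dictionary plays the role of xpathRegisterNs, and a `.//` search plays the role of the depth-first walk in getNodeContentByName. The sample response is hypothetical, with placeholder values, trimmed to the element names the script looks up; it is not a real Amazon reply. The helper names first_text, sql_quote and insert_statements are also hypothetical, and quotes are escaped by doubling them, the standard SQL convention.

```python
# Sketch: the parse-and-insert step redone with Python 3's standard library.
# SAMPLE is a hypothetical, trimmed response (placeholder values), shaped only
# after the element names the script looks up; it is not a real Amazon reply.
# first_text, sql_quote and insert_statements are hypothetical helper names.
import xml.etree.ElementTree as ET

NS = {"aws": "http://webservices.amazon.com/AWSECommerceService/2004-10-19"}
FIELDS = ["ASIN", "Artist", "Title", "Amount", "NumberOfTracks",
          "SalesRank", "AverageRating", "ReleaseDate"]

SAMPLE = """<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2004-10-19">
  <Items>
    <Request><IsValid>True</IsValid></Request>
    <TotalPages>1</TotalPages>
    <Item>
      <ASIN>B000000000</ASIN>
      <SalesRank>1,234</SalesRank>
      <ItemAttributes>
        <Artist>Some Artist</Artist>
        <Title>Don't Stop</Title>
        <NumberOfTracks>5</NumberOfTracks>
      </ItemAttributes>
      <CustomerReviews><AverageRating>4.5</AverageRating></CustomerReviews>
    </Item>
  </Items>
</ItemSearchResponse>"""


def first_text(node, name):
    """Text of the first descendant element called `name`, in document order, or None."""
    found = node.find(".//aws:" + name, NS)
    return found.text if found is not None else None


def sql_quote(value):
    """Quote a value as an SQL string literal; a single quote is written twice."""
    return "'" + value.replace("'", "''") + "'"


def insert_statements(xml_text):
    """Return (INSERT statements, TotalPages) for one page of results."""
    root = ET.fromstring(xml_text)
    if root.find(".//aws:Items/aws:Request/aws:IsValid", NS).text != "True":
        raise ValueError("the service reported the query as invalid")
    statements = []
    for item in root.findall(".//aws:Items/aws:Item", NS):
        attr = {}
        for name in FIELDS:
            text = first_text(item, name)
            if text is None:
                continue  # missing fields are simply left out of the INSERT
            if name == "SalesRank":
                text = text.replace(",", "")  # "1,234" -> "1234"
            attr[name] = text
        if attr:
            statements.append("INSERT INTO music (%s) VALUES (%s);" % (
                ",".join(attr), ",".join(sql_quote(v) for v in attr.values())))
    total_pages = int(root.find(".//aws:Items/aws:TotalPages", NS).text)
    return statements, total_pages


statements, total_pages = insert_statements(SAMPLE)
for statement in statements:
    print(statement)
```

For the sample, this prints a single statement, INSERT INTO music (ASIN,Artist,Title,NumberOfTracks,SalesRank,AverageRating) VALUES ('B000000000','Some Artist','Don''t Stop','5','1234','4.5'); and reports one page. Binding values through a database driver would be safer than building SQL text by hand, but the goal here is a plain .sql file to feed to psql.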

Update: the code was updated to take into account these notes from Amazon following the move of AWS 4.0 from beta to release:

1) Service Name change: You will need to modify the Service Name parameter in your application from AWSProductData to AWSECommerceService. We realize that it may take some time to implement this change in your applications. In order to make this transition as easy as possible, we will continue supporting AWSProductData for a short time.

2) REST/WSDL Endpoints: You will need to modify your application to connect to webservices.amazon.com instead of aws-beta.amazon.com. For other locales, the new endpoints are webservices.amazon.co.uk, webservices.amazon.de and webservices.amazon.co.jp.
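The two changes amount to a different Service parameter and a different host per locale. A minimal sketch, with two loud assumptions: the locale labels ("us", "uk", "de", "jp") and the function name item_search_url are hypothetical, and the /onca/xml path used by the script is assumed to be the same on every locale's host, which the notice does not actually say.

```python
# Sketch: build an AWS 4.0 ItemSearch REST URL for a given locale, using the
# service name and hosts listed in Amazon's notice. The locale labels and the
# function name are hypothetical; the /onca/xml path on non-US hosts is an
# assumption.
from urllib.parse import urlencode

ENDPOINTS = {
    "us": "webservices.amazon.com",
    "uk": "webservices.amazon.co.uk",
    "de": "webservices.amazon.de",
    "jp": "webservices.amazon.co.jp",
}


def item_search_url(locale, subscription_id, artist, page=1):
    host = ENDPOINTS[locale]  # raises KeyError for an unknown locale
    params = {
        "Service": "AWSECommerceService",  # formerly AWSProductData
        "SubscriptionId": subscription_id,
        "Operation": "ItemSearch",
        "SearchIndex": "Music",
        "Artist": artist,
        "ItemPage": page,
    }
    return "http://%s/onca/xml?%s" % (host, urlencode(params))


print(item_search_url("de", "YOUR-ID", "Kraftwerk"))
```

Keeping the hosts in one table means a new locale is a one-line change rather than another hard-coded URL.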

Published by

Daniel Lemire

A computer science professor at the Université du Québec (TELUQ).

7 thoughts on “An Amazon Web Services (AWS) 4.0 application in just a few lines”

  1. I hope my students won’t see this code, because that is for their midterm :). I did not look into REST, but I know it is simpler. Then why SOAP? Here is something I copied from the Internet:

    You might be a Resource guy if you actually use HTTP PUT
    You might be a Get guy if you use URLs to request parameterized actions
    You might be a Message guy if you actually use XML attributes
    You might be a Procedure guy if you feel you must encode XML in order to pass it as a parameter

    REST is for the Message guy, who looks in the XML for the values. SOAP is for the Procedure guy, who sees invoking a web service as a remote procedure call. REST is like querying a URL: the results are XML in HTTP messages. SOAP is like an RPC: the parameters are encoded into XML, which is embedded in HTTP messages.

  2. If a smart student makes it to this page and uses the code to help them, then good for them. That’s how you solve real-world problems. However, because you’ll expect Java, they’ll still have to do a tremendous amount of work to port this code to Java, and they can’t do the port without deeply understanding the code. So there is nothing to worry about.

  3. The edit box I am typing in works better in Mozilla. In Explorer, I hit a bug: the edit box has one width before you start typing, and afterwards it expands to full length, behind the right bar.

    It is not the only problem. The code in the text box is too long and runs out of the box. It is the same in Explorer and Mozilla.

  4. Internet Explorer is not standards-compliant, so I don’t care about it.

    As for the code running out of the box, this is unavoidable: I use a preformatted section. I’ll see if I can force the box to have the right width, but that’s hard.

  5. Here you go: I deleted the box around the code, so now it won’t run over. As far as I know, this is the best one can do; as you know, HTML doesn’t offer enough control for tight formatting.

