Tuesday, October 21, 2008

Twitter, SearchMonkey, and Caching (Yahoo! Developer Network Blog)


Intrepid coder Bart Teeuwisse has written up an excellent technical account of creating "Tweet", a beautifully designed SearchMonkey app for Twitter. From a performance standpoint, writing a Twitter SearchMonkey app is particularly challenging, as Bart explains:

It turns out that execution speed of a SearchMonkey is key. To make it into the SearchMonkey Gallery, a presentation monkey such as Tweet has to complete within a fraction of a second. Any call to fetch 3rd-party data takes too long to satisfy this requirement, certainly a call to Twitter's API, whose fluctuating response times are all over the map.

Secondly, Twitter's profile API call takes a user ID, which first has to be extracted from Yahoo!'s indexed data. An additional data monkey can do that, and its output becomes the input to Tweet's profile-fetching data monkey. However, chaining data monkeys only makes Tweet slower.

Fortunately, Bart hit on a really clever solution: a mashup with Google App Engine, which acts as a simple proxy cache for Twitter data that SearchMonkey can then consume. The result (after also adding Bart's own FriendNet infobar app):

Example Twitter application from Bart Teeuwisse; shows profile picture, tweets, followers, and more.

Not only is the caching a nifty way to smooth out the API response times, but it also helps reduce the number of (rate-limited) API calls required. Read more about it at Bart's place.

Yahoo! 360° - Dawn Patrol - Tweet, a Yahoo! SearchMonkey application to enhance Twitter user profiles

Preamble

Tweet is a plugin for Yahoo! Search. Such plugins are called SearchMonkeys in honor of the Greasemonkey extension for the Firefox browser. Like Greasemonkey, SearchMonkey allows developers to enhance the user experience, in this case the search experience. SearchMonkeys can enhance presentation with images and additional links, or by combining Yahoo!'s search index with other structured data.

Yahoo! Search users can add SearchMonkey applications to their profile on an opt-in basis. Add Tweet to yours if you'd like much improved search results for Twitter user profiles.

Current Twitter search results

While Twitter user profiles are indexed by all major search engines, their summaries are extremely poor. Google's and Yahoo!'s results are nearly identical. Yahoo!'s summary of my profile (below) doesn't even include my full name (Bart Teeuwisse), even though it is on the page.

Standard Yahoo! Search summary of my Twitter user profile

SearchMonkey to the rescue

Luckily, with SearchMonkey you can replace standard summaries with enhanced ones. To improve Yahoo!'s search results for Twitter user profiles, I wrote a SearchMonkey application called Tweet that is triggered for all URLs matching *.twitter.com/*. Tweet calls Twitter's API to fetch user profile information that is not in the Yahoo! Search index. The result is a rich overview of a Twitter user, including the user's last message (a.k.a. tweet).

Enhanced summary of my Twitter user profile by Tweet

Tweet's challenges

Sounds simple, doesn't it? Contact Twitter's API, get the profile, present the profile. SearchMonkey's architecture splits this into two monkeys:

  • A data monkey to contact Twitter's API to get the profile, and
  • A presentation monkey (Tweet) to present said profile.

Well, not quite.

Speed, speed, speed

It turns out that execution speed of a SearchMonkey is key. To make it into the SearchMonkey Gallery, a presentation monkey such as Tweet has to complete within a fraction of a second. Any call to fetch 3rd-party data takes too long to satisfy this requirement, certainly a call to Twitter's API, whose fluctuating response times are all over the map.

Secondly, Twitter's profile API call takes a user ID, which first has to be extracted from Yahoo!'s indexed data. An additional data monkey can do that, and its output becomes the input to Tweet's profile-fetching data monkey. However, chaining data monkeys only makes Tweet slower.

Thirdly, as I mentioned earlier, Twitter's API has wildly varying response times and is by no means predictable enough to guarantee a prompt response. Furthermore, Twitter is already having scaling issues. Adding a SearchMonkey that calls Twitter's API for up to 10 search results per query could make things worse, should Tweet gain many opt-in users.

Perhaps caching can help? The SearchMonkey platform does have some caching. Unfortunately, SearchMonkey developers have no control over SearchMonkey's cache. Empirical data suggests that SearchMonkeys are cached for only a few minutes. Tweet could be cached much longer without sacrificing functionality.

SearchMonkey + AppEngine = Fast Data Monkey

To mitigate these challenges I decided to use a proxy of my own in between SearchMonkey & Twitter.

  • A proxy that could scale, should Tweet become popular.
  • A proxy where I could control my own cache.
  • And a proxy that eliminates the need for an additional data Monkey to extract the Twitter user ID from the search result.
  • And lastly a proxy that returns DataRSS to minimize (XSLT) transformations in SearchMonkey.

Why not Yahoo! Pipes?

I first turned to Yahoo! Pipes, but Pipes doesn't give me control over caching, and its only XML output format is RSS, not DataRSS. So I turned to Google App Engine instead, which satisfies all my requirements. It offers Memcache caching, is built to scale, and allows me to extract the Twitter user ID, make the Twitter API call and transform its response to DataRSS.
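To make "transform its response to DataRSS" a little more concrete, here is a minimal sketch in Python, the language the proxy was written in. It wraps a parsed profile in an XML envelope. It is an illustration only, not Bart's code. The element names (`container`, `item`, `meta`), the `rel` value and the profile keys are hypothetical placeholders, and the sketch does not reproduce the actual DataRSS schema or namespace. A missing or empty profile produces an empty container, which mirrors the empty response the proxy returns when a Twitter call fails.

```python
import xml.etree.ElementTree as ET


def to_datarss(profile):
    """Wrap a profile dict in a DataRSS-style XML string.

    Hypothetical: the element names (container/item/meta), the rel value
    and the profile keys are placeholders, not the real DataRSS schema.
    A missing or empty profile yields an empty container, like the
    proxy's response to a failed Twitter call.
    """
    root = ET.Element("container")
    if profile:
        item = ET.SubElement(root, "item", {"rel": "twitter:profile"})
        for key in sorted(profile):  # stable order for predictable output
            meta = ET.SubElement(item, "meta", {"property": key})
            meta.text = str(profile[key])
    return ET.tostring(root, encoding="unicode")
```

ElementTree escapes characters such as & and < in profile text when it serializes the tree. That is one reason to build the response with an XML library rather than by string concatenation.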

Developing for Google App Engine

Even though this is my first Python application worth mentioning, I didn't have much trouble writing it. App Engine's documentation, combined with Python's tutorials, was sufficient to answer my questions. The biggest obstacle I encountered was the lack of good XML/XSLT libraries for Python. There isn't a clear winner to begin with, and App Engine's restriction to pure-Python libraries eliminates all candidates, as I learned the hard way.

I really like the Google App Engine SDK. No hassle configuring a web server or database. No need to even be online. I developed about half the proxy while vanpooling to and from work!

How the proxy works

My proxy takes the URL of the search result as input from SearchMonkey. Given the trigger URL pattern, these are all URLs on *.twitter.com, e.g. twitter.com, explore.twitter.com or m.twitter.com. The proxy first extracts the Twitter user ID, if any. In Twitter's URL scheme, user IDs are the first segment of the URL's path, e.g. bartt in twitter.com/bartt or twitter.com/bartt/friends.
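The user-ID extraction step could look like the sketch below. It is a hypothetical helper, not the proxy's actual code. It accepts URLs with or without a scheme, rejects hosts outside twitter.com and its subdomains (mirroring the *.twitter.com/* trigger), and returns the first path segment. Two assumptions are flagged in the comments. The first is the screen-name pattern: letters, digits and underscores, at most 15 characters. The second is a hypothetical `RESERVED_PATHS` set for site pages such as /login that are not user profiles.

```python
import re
from urllib.parse import urlparse

# Assumption: Twitter screen names use letters, digits and underscores,
# at most 15 characters.
SCREEN_NAME = re.compile(r"[A-Za-z0-9_]{1,15}")

# Hypothetical: first path segments that are site pages, not user profiles.
RESERVED_PATHS = {"home", "login", "logout", "search", "help", "about"}


def extract_user_id(url):
    """Return the Twitter user ID from a result URL, or None if there is none."""
    if "://" not in url:
        url = "http://" + url  # urlparse needs a scheme to find the host
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Mirror the *.twitter.com/* trigger: twitter.com or any subdomain of it.
    if host != "twitter.com" and not host.endswith(".twitter.com"):
        return None
    segments = [s for s in parsed.path.split("/") if s]
    if not segments:
        return None  # e.g. the twitter.com home page: nothing to look up
    candidate = segments[0]
    if candidate.lower() in RESERVED_PATHS or not SCREEN_NAME.fullmatch(candidate):
        return None
    return candidate
```

For example, both twitter.com/bartt and twitter.com/bartt/friends yield `bartt`, while a bare twitter.com returns `None`. The proxy can then skip the Twitter API call entirely.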

It then checks Memcache for a profile for this ID. If it finds one, it composes the DataRSS response and exits. If it doesn't, it calls Twitter's API. Successful API calls are parsed and stored in Memcache for (currently) 2 hours before the proxy composes a DataRSS response. Failed calls return an empty DataRSS response.
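The cache-aside flow described above can be sketched as follows. App Engine's Memcache service is only available inside the App Engine runtime, so the sketch uses a small in-memory `TTLCache` class as a stand-in. On App Engine, its `get`/`set` would be replaced by the memcache module's calls. `fetch_profile` is a hypothetical callable that stands in for the Twitter API request. The 2-hour TTL matches the post. A return value of `None` tells the caller to send an empty DataRSS response.

```python
import time

CACHE_TTL_SECONDS = 2 * 60 * 60  # "currently 2 hours", as in the post


class TTLCache:
    """In-memory stand-in for Memcache: entries expire after a fixed TTL."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so expiry can be tested
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, self._clock() + ttl)


def get_profile(user_id, cache, fetch_profile):
    """Cache-aside lookup. fetch_profile is a hypothetical Twitter API call.

    Returns the profile, or None when the API call fails.
    """
    key = "profile:" + user_id
    profile = cache.get(key)
    if profile is not None:
        return profile  # cache hit: no Twitter API call at all
    try:
        profile = fetch_profile(user_id)
    except (OSError, ValueError):  # network errors, unparsable responses
        return None  # failed call: caller emits an empty DataRSS response
    if profile:
        cache.set(key, profile, CACHE_TTL_SECONDS)  # cache successes only
    return profile
```

Only successful fetches are cached. A transient Twitter failure is therefore retried on the next request instead of being pinned in the cache for two hours. Every cache hit is also one fewer call against Twitter's rate limit.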

Fast enough?

My proxy speeds up cached profiles by a factor of 3 to 10. Most of the time, that is. Despite App Engine's claim to scale, it does have performance issues from time to time. For example, App Engine had a day-long outage while I was testing my proxy.

Oddly enough, Twitter's API holds the record for the fastest response time, yet its average is many times App Engine's average response time (for cached profiles). App Engine's response time is very stable: about 200 milliseconds round trip from a West Coast data center.

This doesn't make Tweet fast enough to be included in the SearchMonkey Gallery, though. The proxy by itself is not fast enough, and SearchMonkey's XSLT processing and 'render' times come on top of its response time. Still, Tweet is now eminently more usable and shields Twitter from API overload.

Tweet & FriendNet combined

Combine Tweet with FriendNet, one of my other monkeys, for an even richer search result. FriendNet displays profiles and contacts embedded in the page. It combines hCard profiles with XFN links embedded in the page to present a social graph.
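FriendNet relies on Yahoo! Search to find these microformats on the page, so a monkey never parses HTML itself. Still, it helps to see what the underlying markup looks like to a program. The minimal sketch below uses Python's standard `html.parser` module. It counts hCard roots (elements with class `vcard`) and collects links whose `rel` attribute carries XFN 1.1 values. It illustrates the microformats and is not FriendNet's implementation. The names `MicroformatScanner` and `scan_page` are hypothetical.

```python
from html.parser import HTMLParser

# The complete set of XFN 1.1 relationship values.
XFN_VALUES = {
    "contact", "acquaintance", "friend", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse",
    "kin", "muse", "crush", "date", "sweetheart", "me",
}


class MicroformatScanner(HTMLParser):
    """Counts hCard roots and collects XFN-annotated links."""

    def __init__(self):
        super().__init__()
        self.vcard_count = 0
        self.xfn_links = []  # list of (href, sorted XFN values)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # An hCard's root element carries the class name "vcard".
        if "vcard" in (attr.get("class") or "").split():
            self.vcard_count += 1
        if tag == "a" and attr.get("href"):
            # rel is a space-separated list; keep only the XFN values.
            rels = set((attr.get("rel") or "").lower().split()) & XFN_VALUES
            if rels:
                self.xfn_links.append((attr["href"], sorted(rels)))


def scan_page(html):
    """Return (number of hCards, XFN links) found in an HTML string."""
    scanner = MicroformatScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.vcard_count, scanner.xfn_links
```

A link marked rel="me" points to another page about the same person. Links with the other values point to contacts, which is the information a social graph is built from.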

Enhanced summary of my Twitter user profile by Tweet with FriendNet below

In collapsed mode (the default), FriendNet shows the number of profiles, cards and contacts found on the page by Yahoo! Search.

Enhanced summary of my Twitter user profile by Tweet & social network by FriendNet

Expanded, FriendNet shows details of Twitter friends.

Create your own SearchMonkeys

Got your own ideas for improving Yahoo! Search? Start monkeying around! You'll find everything you need at SearchMonkey on the Yahoo! Developer Network.

Check out the SearchMonkey Gallery for more monkeys you can use. Or take my Better Amazon monkey for a spin.
