HUGUK HBase at Mendeley Presentation

Last Friday I gave a talk on how we use HBase at Mendeley, which goes into more detail on the work I’ve been doing over the last year. It’s a summary of how the datamining team started out using MySQL and how and why we moved to HBase for most of our data storage and processing. It also describes some of the work we do in the datamining team.

You can find more details of the talk and a video of the presentation here: http://lanyrd.com/2010/huguk7/sxbt/

The afternoon generally went very well, and I met many interesting people in London using Hadoop and HBase. There were also some people from California stopping over in London from a conference in Brussels. So it was great to meet people like Jonathan Gray, one of the HBase committers working at Facebook, and learn how they are using it in their new messaging service; Stack, another HBase committer from StumbleUpon; and Tom White, who started the whole Hadoop world off. It’s a great community and an interesting time to be part of it.

For more from this event you can see all the slides here: http://lanyrd.com/2010/huguk7/ and learn more about the UK Hadoop users group, which I’ll now be running: http://huguk.org/

Science Hack Day, Mendeley YQL Tables

It’s been a long time since my last post. I’ve mostly been busy down in London working at Mendeley, and the time has gone very quickly. This recently led me to the Science Hack Day last weekend, where I met some great people, learnt many things and hacked on some YQL tables for Mendeley’s new API for their research data.

YQL is quite a cool piece of technology which basically lets you treat the web as a big SQL table, so you can effectively join Twitter to a Google search, then grab some usage statistics from the Mendeley API, for example.

I’ve put the Mendeley YQL tables up online to play with in the YQL console. With these you can do queries like :-

SELECT * FROM mendeley.search WHERE query = "information retrieval"

And then you can start joining our different API calls together like this :-

SELECT title, year, stats FROM mendeley.details WHERE id IN (select id from mendeley.tags(50) where tag = "genetics") | sort (field="year", descending="true")
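
To make it clearer what that joined query is doing, here is a rough Python sketch of the same logic run over in-memory data. Everything in it is made up for illustration: the TAGS and DETAILS dictionaries, the document ids, titles and stats are hypothetical stand-ins for what the mendeley.tags and mendeley.details tables would return, and nothing here talks to YQL or the Mendeley API.

```python
# Hypothetical stand-in for mendeley.tags: tag -> list of document ids.
TAGS = {
    "genetics": ["doc-1", "doc-2", "doc-3"],
}

# Hypothetical stand-in for mendeley.details: id -> document record.
DETAILS = {
    "doc-1": {"title": "Paper A", "year": 2007, "stats": {"readers": 12}, "extra": "x"},
    "doc-2": {"title": "Paper B", "year": 2010, "stats": {"readers": 40}, "extra": "y"},
    "doc-3": {"title": "Paper C", "year": 2009, "stats": {"readers": 7}, "extra": "z"},
}


def tagged_ids(tag, limit=50):
    """Inner query: select id from mendeley.tags(50) where tag = ..."""
    return TAGS.get(tag, [])[:limit]


def details_for(ids):
    """Outer query: SELECT title, year, stats FROM mendeley.details WHERE id IN (...)"""
    return [
        {field: DETAILS[doc_id][field] for field in ("title", "year", "stats")}
        for doc_id in ids
        if doc_id in DETAILS
    ]


def run_query(tag):
    """Feed the inner result into the outer query, then apply the sort pipe."""
    rows = details_for(tagged_ids(tag))
    # Equivalent of: | sort (field="year", descending="true")
    return sorted(rows, key=lambda row: row["year"], reverse=True)


if __name__ == "__main__":
    for row in run_query("genetics"):
        print(row["year"], row["title"], row["stats"])
```

The sub-select produces a list of ids, the outer select looks each one up and keeps only the requested fields, and the sort after the pipe is applied to the finished result set rather than inside either table.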

I’m trying to get them into the community tables on GitHub soon so that they can be more easily used by everyone, and so I don’t have to keep them up on Dropbox (however amazing that is for rapid prototyping of YQL tables).

Aego 2 Speaker Innards and Repair

About 6 years ago I bought a pair of Aego 2 speakers from Acoustic Energy, who make some very good speakers. They were the best for the price at the time and still are today. After a while the left channel started cutting out intermittently, and I found, worryingly, that wiggling a jack in the front input fixed it. Eventually even this didn’t fix it reliably anymore, so I decided to take it apart and try to fix it…

Whilst taking it apart I found the speakers were very well built and surprisingly simple inside, apart from the circuits.

I found that three of the solder joints for the jack input on the front panel had broken! This was probably from the many times it had been moved around the country as I’ve lived in various places. The broken joints would have caused this problem because the input from the back goes through the front input so that it can be switched off when a jack is inserted in the front, but with the broken joints it was cutting out more than intended. I hoped that just re-soldering them would fix the problem I was having.

Once I’d re-heated the solder on each of the points and added a bit more solder, they all joined properly again and the problem had gone. So after putting it all back together I’ve got a great pair of speakers working fully again.

M.Sc. thesis and job hunting

Well, it’s been a long time since I last posted! About 12 weeks, which have mostly been spent working on my M.Sc. thesis and finding a job for when I finish in Edinburgh at the end of August. Both are done now so I’ve got a bit more free time again (which will mostly be spent enjoying the Edinburgh festivals for a week!).

The thesis changed slightly from influenza tracking to trying to forecast the belief of the population about the recent swine flu outbreak. This ended up looking at ways of extracting and aggregating information from Twitter and blog posts, then trying to forecast the value of a prediction market. Prediction markets and text mining are both very interesting, so I’ve learned a lot and enjoyed the whole project. I’m going to post about the work and what I found from it soon, summarising the interesting bits from 70 pages!

On the job hunting front I have a job at Mendeley down in London starting mid September, which I’ll hopefully blog about when I start there. The work they do is very interesting, trying to bring science up to speed on the web. They describe themselves as “Last.fm for research”, which kinda gives the scope of their goals. So I’m finally leaving academia, but not quite, as the work I’ll be doing will be very much about helping people doing research around the world.

Influenza tracking project

Not long now until I start working on the project to try and track influenza through blog posts. I’ve updated the project page to include a link to my proposal for a few more details, and I’ll hopefully update that properly once my exams are over.

The recent swine flu will also bring some interesting points to the project, like the fact that I’ve just searched for it only to check the spelling, so how does Google know that? There has been a bigger buzz in the media, on blogs, and on sites like Twitter than the actual spread of the disease warrants, which has been quite slow and infrequent compared to the size of populations. Maybe a model of media news sources will also be needed to find what proportion of the noise on the web is really a flu signal.

I also found a site called DIYcity which started a project called SickCity a few months ago, and they accelerated work on it with the outbreak of swine flu. With this they try to track the trends of a range of illnesses over cities in the world, which as they found is a very hard task! I’m going to see if I can help them with their goals a bit, as their aims overlap quite a bit with my summer project, and it would be nice to have some usable data for people at the end of it. We’ll see where things go once I start working in June, stay tuned.


About Me

I'm a student at Edinburgh University studying Artificial Intelligence. Find out more about me and my projects on my website.