Category Archives: machine learning

Ascension to Doctorhood

research complete, originally uploaded by Mister Wind-Up Bird.

Well, okay, I’m not Dr Brochu just yet. There’s a list of minor revisions and some paperwork ahead of me. But on Friday, November 26, 2010, I successfully defended my thesis and passed my doctoral exam, so the days of gut-churning anxiety are finally over.

It’s disturbing how similar this feeling is to what I felt when I finally beat Nethack. (And I thank former fellow Kommunist and Nethacker Eddy for the title of this post.)

unreasonable, effective

I don’t want to bore you with the technical, academic stuff I’ve been wrestling with lately, but there is one paper that is probably worth checking out even if you’re not a Machine Learning person. On the Unreasonable Effectiveness of Data is noteworthy because (i) it’s geared to a (science-literate but) general audience; (ii) it’s provocative; and (iii) one of the authors is Peter Norvig, Google Director of Research and one of the most prominent people in AI today.

The most interesting insight to me is that the authors come down against the kind of elegant, engineer-driven (parametric) models that are widely associated with AI, and embrace simple, data-driven (nonparametric) models. In machine translation, say, it is the difference between designing a system that “understands” the grammar and semantics of the two languages and translates from one to the other while trying to preserve them, and one that looks up words and phrases in an enormous table (which kind of reminds me of the Chinese Room thought experiment, though the point is somewhat different). It’s not exactly a new argument, but it’s great to see it so strongly and clearly expressed, and to hear how it arose from Google’s experience.
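
To make the "enormous table" idea concrete, here is a toy sketch of translation by lookup. It is not how Google's system works: the table, the function name `translate` and the greedy longest-match rule are all my own assumptions for illustration, and a real data-driven system would score many candidate phrases with statistics learned from huge corpora rather than take the first match.

```python
def translate(sentence, phrase_table, max_phrase_len=3):
    """Translate by greedy longest-match lookup in a phrase table.

    No grammar, no semantics: just a dictionary of phrases.
    Words that are not in the table pass through unchanged.
    """
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest phrase starting at position i first.
        for n in range(min(max_phrase_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in phrase_table:
                output.append(phrase_table[phrase])
                i += n
                break
        else:
            # Nothing matched, so copy the word as-is and move on.
            output.append(words[i])
            i += 1
    return " ".join(output)


# Hypothetical, tiny English-to-French table (a real one has millions of entries).
TOY_TABLE = {
    "good morning": "bonjour",
    "thank you": "merci",
    "the cat": "le chat",
    "cat": "chat",
}

print(translate("Good morning the cat", TOY_TABLE))  # bonjour le chat
```

The point of the toy is that all the "knowledge" lives in the data: make the table bigger and the output gets better, without anyone writing a single grammar rule.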

Worio Public Beta is out!


At long last, I can reveal what I’ve been working on for the past year: the public beta version of Worio!

Worio combines search and recommendation with the philosophy that user effort should be kept to a minimum. You can simply use your favourite existing search engine, or Worio’s own, and receive recommendations based on the context of your search. But if you start to save, share and tag pages, Worio learns about your preferences and recommends pages it predicts you, personally, will be interested in.

Last week we had a combined beta release/demo at the 2008 NIPS conference at the Vancouver Hyatt, complete with Worio t-shirts foisted on everyone in range. The entire Worio Machine Learning team was there, and we were pleased with the response. NIPS is one of the most important Machine Learning conferences, and I know from personal experience that people are not shy about telling you if they don’t like your work.

Even if you’ve tried Worio in its previous incarnations, you might want to check it out again. It’s still in beta, but we’ve come a long way in just a few months. Suggestions and feature requests are most welcome.

headed to SIGGRAPH

As previously promised, I am headed to SIGGRAPH! I will be presenting a poster of my research with Abhi and Nando on Monday and Wednesday (booth I03): Preference Galleries for Material Design.

This is the first time I’ve publicly presented this particular research, so it will be interesting to hear what people think of it. For those of you who might have gotten here looking for information about my work, I can offer you a selection of the following fine links today:

update (aug 7): Our poster has advanced to the final round of the Student Research Competition! Pretty cool, but very surprising (to me, at least), considering it’s a bit of an odd fit for SIGGRAPH in the first place.

update (aug 22): I only just found out that we actually won first place! Awesome.

the interweb, it is mostly not so good

For the past several weeks, I’ve been setting up Machine Learning algorithms that allow machines to learn things about web pages. This is all very well and good: the algorithms work. Now it’s training time. Training means that you have to not only give the computers the data (the web pages), but for each datum, tell it what you want it to learn: “this is a good page”, “this is a bad page”, “this is a page about Buffy the Vampire Slayer”, whatever. And machines are so damned literal. Give the machine a set of Buffy Wikipedia articles and science blogs, and it will most likely learn how to distinguish blogs from Wikipedia articles (because it’s simpler), and not physics from Buffy.

My work with Active Learning has been helpful, since it allows a computer to find pages which would be particularly useful to get training information on, but ultimately, it comes down to looking at hundreds and hundreds of random web pages and telling a computer what to learn about each one, and not making too many mistakes.
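
For the curious, here is a minimal sketch of one common Active Learning strategy, uncertainty sampling: ask the human about the pages the current model is least sure of. This is an illustration only, not the method I actually use; the function name `pick_pages_to_label`, the page ids and the probabilities are all made up.

```python
def pick_pages_to_label(predicted_probs, batch_size=3):
    """Return the page ids the model is least certain about.

    predicted_probs maps a page id to the model's estimated probability
    that the page is "good". A value near 0.5 means the model is unsure,
    so a human label for that page should be the most informative.
    """
    ranked = sorted(
        predicted_probs,
        key=lambda page: abs(predicted_probs[page] - 0.5),
    )
    return ranked[:batch_size]


# Hypothetical model outputs for a pool of unlabelled pages.
pool = {
    "page_a": 0.97,  # confidently good
    "page_b": 0.52,  # coin flip
    "page_c": 0.10,  # confidently bad
    "page_d": 0.45,  # nearly a coin flip
    "page_e": 0.70,
}

print(pick_pages_to_label(pool, batch_size=2))  # ['page_b', 'page_d']
```

In a real loop you would label the chosen pages, retrain, recompute the probabilities and repeat, which is exactly where the hundreds and hundreds of web pages come in.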

This is my job right now, 8 or 9 or 10 hours a day staring at random web pages, and it’s making me a little batty. After the first few hours of boredom, patterns emerge, the programmer hyper-focus kicks in and it starts to become kind of fascinating. Predictably, there are tons of blogs bashing Bush and/or Microsoft (and virtually none supporting either, though Hillary and Apple both get to feel the hate every once in a while). The expected number of teenagers who think they are vampires. Unexpectedly many message boards about Marxism and people’s problems with their cars and/or significant others. Surprisingly little porn — it’s out there, but it all seems to be confined to its own little ghetto.

Yes, my job involves looking for porn on the web, and no, it’s not nearly as awesome as you think it is.