
Monthly Archives: December 2006

and for my next trick, er… trip

I’ve done so much research on my trip, in fact, that I already know the route of my next trip to Southeast Asia, to take in all the things I’ll miss this time. Start in Singapore, take the train through Malaysia, stop off on the west coast of Thailand and on the islands of Koh Samui and Koh Pha Ngan, then continue on the train to Bangkok, fly to Cambodia to see Angkor Wat, take a riverboat across Cambodia to Phnom Penh and then up the Mekong River to Laos (though apparently, you can only go the whole way by river during the rainy season). Then through northern Laos to Yunnan, China.

Let’s say… 2008. Figure it will take 6 or 8 weeks. Who’s with me?


surely no harm can come of being overprepared?

The past few days, I’ve been running various errands for my trip to Southeast Asia. I went to MEC and got a bunch of travelly items, including a new bag (30L; I’ll be flying a lot, so my plan is to get everything I need into a carry-on bag). I’ll post my packing list later. Today, I got my watch fixed, confirmed some bookings and printed out all the paperwork and guides I’ll ever need. Next week, I visit the dentist, the optometrist and the immunization clinic (I still need another Hepatitis B shot).

This is by far the most-planned, most-researched, most-prepared trip I’ve ever taken. Quite different from my usual plan of ‘grab a guidebook at the airport and read it on the plane’.

the interweb, it is mostly not so good

For the past several weeks, I’ve been setting up Machine Learning algorithms that allow machines to learn things about web pages. This is all very well and good: the algorithms work. Now it’s training time. Training means that you have to not only give the computers the data (the web pages), but, for each datum, tell them what you want them to learn: “this is a good page”, “this is a bad page”, “this is a page about Buffy the Vampire Slayer”, whatever. And machines are so damned literal. Give the machine a set of Buffy Wikipedia articles and science blogs, and it will most likely learn how to distinguish blogs from Wikipedia articles (because it’s simpler), and not physics from Buffy.
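To make the “damned literal” point concrete, here’s a toy sketch in Python. Everything in it is hypothetical and made up for illustration: the two features (`is_wikipedia_layout`, `mentions_vampire`), the six training pages, and the learner (a one-feature decision stump, not the algorithms I’m actually using). Because every Buffy page in the training set is a Wikipedia article and every physics page is a blog, page format predicts the label perfectly, so the learner picks it over the noisier topic cue.

```python
from collections import Counter

# Hypothetical training pages: (binary features, label). Every Buffy page is a
# Wikipedia article and every physics page is a blog, so format and topic are
# perfectly confounded. The topic cue "mentions_vampire" is deliberately noisy.
train = [
    ({"is_wikipedia_layout": 1, "mentions_vampire": 1}, "buffy"),
    ({"is_wikipedia_layout": 1, "mentions_vampire": 1}, "buffy"),
    ({"is_wikipedia_layout": 1, "mentions_vampire": 0}, "buffy"),    # episode list, no vampire talk
    ({"is_wikipedia_layout": 0, "mentions_vampire": 0}, "physics"),
    ({"is_wikipedia_layout": 0, "mentions_vampire": 0}, "physics"),
    ({"is_wikipedia_layout": 0, "mentions_vampire": 1}, "physics"),  # post about "vampire" stars
]


def train_stump(data):
    """Pick the single binary feature whose value best predicts the label.

    Returns (feature name, {feature value: predicted label}, training accuracy).
    """
    best = None
    for feat in data[0][0]:
        rule = {}
        for value in (0, 1):
            labels = [y for x, y in data if x[feat] == value]
            # Predict the majority label seen for this feature value.
            rule[value] = Counter(labels).most_common(1)[0][0] if labels else None
        acc = sum(rule[x[feat]] == y for x, y in data) / len(data)
        if best is None or acc > best[2]:
            best = (feat, rule, acc)
    return best


feat, rule, acc = train_stump(train)
print(feat, acc)  # is_wikipedia_layout 1.0

# A Buffy fan blog: not Wikipedia, clearly about vampires.
buffy_blog = {"is_wikipedia_layout": 0, "mentions_vampire": 1}
print(rule[buffy_blog[feat]])  # physics
```

The stump scores 100% on its training data and then confidently files a Buffy fan blog under physics. Breaking the confound means labelling Buffy blogs and physics Wikipedia articles too, so that format stops predicting topic.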

My work with Active Learning has been helpful, since it allows a computer to find pages which would be particularly useful to get training information on, but ultimately, it comes down to looking at hundreds and hundreds of random web pages and telling a computer what to learn about each one, and not making too many mistakes.
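One common way to decide which pages are “particularly useful” to label is uncertainty sampling: ask the human about the pages the current classifier is least sure of. This is a minimal sketch assuming that strategy (it’s one of several Active Learning approaches), with a made-up pool of page ids and predicted probabilities standing in for the output of some already-trained classifier.

```python
import math


def binary_entropy(p):
    """Uncertainty (in bits) of a yes/no prediction with P(yes) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_queries(pool, k):
    """Return the k page ids whose predicted label is most uncertain.

    pool: dict mapping page id -> classifier's P(page is "good").
    """
    ranked = sorted(pool, key=lambda page: binary_entropy(pool[page]), reverse=True)
    return ranked[:k]


# Hypothetical unlabelled pages and the current classifier's guesses.
pool = {
    "page_a": 0.97,
    "page_b": 0.52,
    "page_c": 0.10,
    "page_d": 0.45,
    "page_e": 0.80,
}
print(select_queries(pool, 2))  # ['page_b', 'page_d']
```

Pages the classifier already rates at 0.97 or 0.10 teach it little; the ones hovering near 0.5 are where a human label moves the decision boundary most. In a full loop, you’d label the selected pages, retrain, re-score the pool and repeat.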

This is my job right now, 8 or 9 or 10 hours a day staring at random web pages, and it’s making me a little batty. After the first few hours of boredom, patterns emerge, the programmer hyper-focus kicks in and it starts to become kind of fascinating. Predictably, there are tons of blogs bashing Bush and/or Microsoft (and virtually none supporting either, though Hillary and Apple both get to feel the hate every once in a while). The expected number of teenagers who think they are vampires. Unexpectedly many message boards about Marxism and people’s problems with their cars and/or significant others. Surprisingly little porn — it’s out there, but it all seems to be confined to its own little ghetto.

Yes, my job involves looking for porn on the web, and no, it’s not nearly as awesome as you think it is.

we all like nips

This week is the Neural Information Processing Systems (NIPS) conference in Vancouver, with workshops up in Whistler; apparently, skiers and snowboarders are quite well represented in the AI and neuroscience communities. I enjoy the conference and get a lot out of it, but I do tend to find that academic conferences have a fair bit of an “angels on the head of a pin” aspect to them, and I don’t feel very much at home with large swaths of the academic Machine Learning community, who have a passion for stats that I can only admire from a distance. It’s been clear to me for some time now that, as much as I enjoy research, my path is destined to take me out of the ivory tower and into industrial research and development.

That’s not to say NIPS isn’t interesting. I snuck away from my work in Yaletown to hear Joshua Tenenbaum speak today. I think he’s doing some fascinating work, which I’ve been following for a while now.

Since about 1990, AI has been revolutionised by using probabilities, rather than rules, to model human-intelligence-type tasks. This is what powers Google, computer vision and other recent successes in AI. However, it was generally felt that this was just a hack, and that the mind processes experience through an innate structure, the most famous proponent of this view being, of course, Noam Chomsky. Josh and other people in his area are working on empirical, rather than structural, models of the brain. In a series of clever experiments, they show that, for certain problems at least, the brain really does seem to make guesses in a Bayesian probabilistic way, just like (much of) Machine Learning. Nobody knows why or how (clearly, we don’t have Bayesian solvers embedded in our heads doing Monte Carlo sampling), but the fact that the solutions are often the same opens the possibility that Machine Learning and Cognitive Science may yet be linked in a much more profound way than was thought five or ten years ago.
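For a feel of what “making guesses in a Bayesian way” means, here’s a toy concept-learning sketch: given a few example numbers, guess which rule generated them and whether a new number fits. The hypotheses, the uniform prior and the likelihood (each example assumed drawn uniformly from the true concept, which favours the smallest rule that still fits) are illustrative assumptions, not a description of any particular experiment or model from the talk.

```python
# Toy Bayesian concept learning over the numbers 1..100. The hypotheses,
# the uniform prior and the sampling assumption are illustrative choices.
N = 100
hypotheses = {
    "even numbers": {n for n in range(1, N + 1) if n % 2 == 0},
    "powers of two": {2 ** k for k in range(1, 7)},  # 2, 4, ..., 64
    "multiples of ten": set(range(10, N + 1, 10)),
    "any number": set(range(1, N + 1)),
}
prior = {h: 1 / len(hypotheses) for h in hypotheses}  # uniform prior


def posterior(examples):
    """P(hypothesis | examples), assuming each example is drawn uniformly
    at random from the members of the true concept."""
    scores = {}
    for h, members in hypotheses.items():
        if all(x in members for x in examples):
            # Likelihood (1/|h|)^n: small hypotheses that still fit the
            # data explain it better than big ones (the "size principle").
            scores[h] = prior[h] * (1 / len(members)) ** len(examples)
        else:
            scores[h] = 0.0  # hypothesis contradicted by an example
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}


def prob_in_concept(y, examples):
    """Guess whether y belongs to the concept by averaging over hypotheses."""
    post = posterior(examples)
    return sum(p for h, p in post.items() if y in hypotheses[h])


print(round(posterior([16])["powers of two"], 2))  # 0.85
print(prob_in_concept(32, [16, 8, 2, 64]) > 0.99)  # True
print(prob_in_concept(10, [16, 8, 2, 64]) < 0.01)  # True
```

After the single example 16, “powers of two” already gets about 85% of the posterior, but “even numbers” keeps about 10%; after 16, 8, 2 and 64, the narrow rule dominates and 10 is judged very unlikely to belong. That sharpening from just a handful of examples is the kind of graded generalisation these models are meant to capture.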

Anyway, if you’re a machine learning person and you want more info, you know how to get it. But if you’re not, I highly recommend you check out this article from The Economist about Josh Tenenbaum and Thomas Griffiths’ work. It’s really fascinating stuff. I linked to it before, on the previous incarnation of my blog, but that link is long gone now, and a little reposting never hurt anyone, now did it?