Last week, I went to a rather interesting talk at the LSE titled ‘How Would a Robot Read a Novel?’. I was introduced to a piece of software called Alceste, used primarily in the social sciences. (Note: this, and many of the other sites I’ve linked to in this post, are Google-translated versions of French originals; there is surprisingly little about Alceste on the web in English.) What Alceste does is look for repeated co-occurrences of words across a large volume of text in order to detect patterns. In the social sciences, it is used (still in only a few places, and in a limited number of cases at that) to detect instances of bias in surveys. Research has apparently shown that when words occur in the same pattern repeatedly, it is rarely random.
Alceste doesn’t understand meaning, and makes no pretenses about trying to do so. It was created by Max Reinert of the National Centre for Scientific Research (CNRS) in France, and is now marketed by a company called Image that holds all rights to it, from what we were given to understand.
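For the curious, here is a minimal sketch in Python of what "counting co-occurrences" can look like. To be clear, this is my own toy illustration and not Alceste's actual method, which the talk did not go into in detail. The function name `cooccurrence_counts`, the fixed `segment_size` window and the sample sentence are all assumptions made up for the example.

```python
from collections import Counter
from itertools import combinations
import re


def cooccurrence_counts(text, segment_size=10):
    """Count how often each pair of words appears in the same segment.

    Hypothetical helper: the text is cut into fixed-size runs of words,
    and every distinct pair of words inside a run counts as one
    co-occurrence. No meaning is involved, only counting.
    """
    words = re.findall(r"[a-z']+", text.lower())
    segments = [
        words[i:i + segment_size]
        for i in range(0, len(words), segment_size)
    ]
    counts = Counter()
    for segment in segments:
        # sorted(set(...)) gives each pair once, in a stable order
        for pair in combinations(sorted(set(segment)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    sample = "the club lost money the club won money"
    for pair, n in cooccurrence_counts(sample, segment_size=4).most_common(3):
        print(pair, n)
```

Pairs that keep turning up together across many segments are the raw material for spotting themes. A real tool would go much further (grouping pairs into clusters, testing whether the repetition is statistically meaningful), but the starting point is this kind of blind counting.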
Anyway, now that I’ve given you the context, let me move on to what was really interesting about the talk. Dr. Kavita Abraham, a researcher at the LSE’s Methodology Institute, used Alceste to analyse a novel called The Kilburn Social Club by Robert Hudson. It is worth noting here that when Alceste’s earlier, experimental analyses of some literary works were presented, members of the audience were easily able to identify the books as Oliver Twist and Moby Dick. With The Kilburn Social Club, Dr. Robert Hudson (a history academic-turned-author) admitted that Alceste’s analysis matched the pattern of the story he had set out to write, in that the words used were generally grouped around four themes (16% descriptive, 12% football, 22% finance and 50% relationships). So it could be used, hypothetically, during the process of writing to ensure that a book wasn’t skewed heavily in one direction or another.
Dr. Hudson clearly meant ‘hypothetically’, though, because the truth is, as we discussed after the talk, we don’t really need Alceste to tell readers about patterns in books. Why would you want to reduce a work of art to a mere jumble of statistically correlated groups of words? People read literary works FOR that element of bias (I think James is writing a post about how opinion, or bias if you must, is in fact often not given the respect it deserves in today’s world). A quote of Mark Twain’s was proffered by one of the panel members: ‘A classic is something that everyone wants to have read but no one wants to read’. I’d argue that, at a stretch, you can extend that to summarizing business books, the way Kevin Duncan does on his blog, for example. A summary is useful to time-starved people who want to be able to speak intelligently about a book and learn its distilled lessons, but who don’t have the time to wade through it in its entirety. You just can’t do that with novels, though! Here’s an example of how Alceste summarized that potboiler of potboilers, The Da Vinci Code. It’s quite a laugh.
One of the issues left simmering in my mind as I left the venue is that we are introduced to so many technologies on a daily basis that many of us perhaps never really question whether we need them, and that is probably even more common among clients. Is ‘I want a social media’ really still an accepted statement?
Google Buzz is being debated as either a highly intrusive or a potentially highly social application, while right here at Made by Many we’re arguing the benefits of using Yammer at work versus plain old Twitter. The question isn’t what we can do with a technology. In the case of Alceste, it has been accepted that it is really only useful to the social sciences, because that discipline is based on the removal of bias. The question is: do we need it at all?
(A PDF of the talk, for those interested, is now available here.)