Summary: Before we can talk about what might be a good metric for semantic similarity (or, if you prefer your glass half-empty, “semantic distance”), we had better be clear on what the term means. The key question is “what exactly are we trying to measure?” and the answer is “either properties, entities, sets, or scenarios”.
At various points in my career I’ve spent some time thinking about how to quantify the degree of similarity between two concepts – not between strings, genomes, images, graphs, or any other type of encoded data structure. I’m talking about ideas, facts, questions, answers, hypotheses, scenarios, or any other type of ‘stuff’ that forms the inputs and the outputs of cognitive processes. Let’s agree for now to refer to these as ‘concepts’.
Every once in a while I would also write some of my thoughts on this topic down, usually for technical reports or proposals. I figured that at some point I would get around to converting some of this material to postings on this blog, but that was not a high priority. Recently, however, the topic of “information distance” popped up on a forum I frequent that is dedicated to machine learning. Much of that discussion thread isn’t relevant to this posting, but what is relevant was the suggestion that Kolmogorov complexity provides a basis for a “universal cognitive similarity distance”. The origin for this idea is, apparently, a 1998 paper: Bennett, C.H.; Gács, P.; Li, M.; Vitányi, P.M.B.; Zurek, W.H., “Information distance,” IEEE Transactions on Information Theory, vol. 44, no. 4, pp. 1407–1423, July 1998.
As I read both this paper and the other material provided by those advocating this technique, I came to the conclusion that this approach does not address a number of what I view as key challenges and requirements. Explaining the basis for these misgivings is, however, a task that requires more words than can fit comfortably in a comment posted in an on-line forum. Hence, this series of blog postings was bumped to the top of my “To-Do” list.
Before delving into any algorithms or code, it would be a very good idea to set some context and review the types of entities whose similarity we may wish to measure. From that, a set of requirements and challenges can be identified. All of this is just to set a solid foundation for looking at how well various approaches satisfy the requirements.
Obviously, this is going to take more than one post so I may as well get started….
“X” is similar to “Y” but is more like “Z”. Clearly, this is a statement regarding the extent to which entity X has properties in common with entities Y and Z. It is also clear that ‘similarity’ is a quantitative measurement allowing for comparison of the degree of commonality. But what is less obvious, and therefore needs to be clearly stated, is the nature of the entities and properties being compared. My view is that there are four general categories of comparable ‘things’: properties, entities, sets, and scenarios.
Let’s start with an easy one. Obviously somebody who is 30 years old is closer in age to a 40 year old than a 10 year old. Comparing properties whose value range is a totally ordered set is, however, too easy. Let’s try a harder comparison: which concept is closer to ‘restaurant’: a ‘kitchen’ or a ‘cafe’? Note the emphasis on “concept”. I don’t care about the similarity of the spelling of the words. That’s the kind of structural similarity that a metric based on edit distance will assess. We, however, are interested in semantic similarity. To put it another way, structural similarity only looks at sequences of bits whereas semantic similarity ignores the bits and looks at the underlying meanings. It’s the latter that we care about.
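To make the structural-versus-semantic distinction concrete, here is a minimal Levenshtein edit distance in Python. Note what it reports for the words above: ‘kitchen’ comes out slightly closer to ‘restaurant’ than ‘cafe’ does, which tells you something about spelling and nothing about meaning.

```python
# Minimal Levenshtein (edit) distance -- a purely *structural* similarity
# measure over character sequences, with no notion of meaning.
def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution (free if equal)
        prev = curr
    return prev[-1]

# Structurally, 'cafe' is *farther* from 'restaurant' than 'kitchen' is --
# the opposite of the semantic intuition.
print(edit_distance("cafe", "restaurant"))     # 9
print(edit_distance("kitchen", "restaurant"))  # 8
```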
Eventually in this series of postings I’ll come back to this type of challenge and the metrics proposed to handle it. For the moment, however, the goal is only to identify the ecosystem of entities that we might want to assess the similarity of; so let’s move on to….
When working on problems relating to document similarity and information retrieval we are dealing with collections of words. That means we need algorithms able to assess the similarity of concept sets.
Take as an example a target concept: pre-20th Century construction techniques used in Asia to deal with earthquakes. To find web pages dealing with this topic you might start with the key words “earthquake”, “Asia”, etc. Let us, for the sake of argument, assume that we have successfully developed a metric for semantic similarity that can handle the type of conceptual similarity described in the previous section. Japan is a region in Asia; terms like “building regulations”, “foundation”, and “superstructure” are all related to construction. That means that any web page containing those words should have some similarity to our target concept.
Now let’s assume we have found multiple web pages containing those key concepts. Which is the ‘most similar’ to the target concept? It’s pretty safe to assume that everyone will agree that an article that is focused on a specific target is more relevant than one that mentions it in passing. For example, assuming all else is equal, an article that focuses on historical construction methods is more relevant than one that only mentions historical techniques in passing. Somehow we want to look at the overall range of concepts covered by the text and then assess the amount of coverage given to the concepts we care about.
We might identify the percentage of paragraphs that reference in some way one or more of the key concepts we are looking for. What about the position of a concept? If we’re interested in earthquake-resistant construction in Asia, seeing the words ‘earthquake’ and ‘Japan’ in a title or abstract is probably a more significant indicator than if they are just scattered through the general text.
To summarize, in order to measure the semantic similarity of two documents, it is not enough to simply identify the concepts covered in the document (i.e., map the vocabulary onto the ontology) and determine the similarity of the found concepts to our target concepts. We also need to assign a weight to each concept within the set of identified concepts and then determine the similarity of the aggregation to our target set (e.g., pre-20th Century construction techniques used in Asia to deal with earthquakes).
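One hedged way to sketch that weighting-plus-aggregation step: reduce each document to a map of concepts to weights (here an invented mix of paragraph coverage and a title/abstract boost) and compare the weighted sets with a cosine measure. The concept names, weights, and boost factor are all illustrative assumptions, not tuned values.

```python
import math

# Hypothetical weighting: a concept's weight combines its coverage (fraction
# of paragraphs mentioning it) with a positional boost for title/abstract
# occurrences. The 2.0 boost factor is a guess.
def concept_weight(paragraph_fraction: float, in_title: bool) -> float:
    return paragraph_fraction * (2.0 if in_title else 1.0)

def weighted_cosine(doc_a: dict, doc_b: dict) -> float:
    shared = set(doc_a) & set(doc_b)
    dot = sum(doc_a[c] * doc_b[c] for c in shared)
    norm = math.sqrt(sum(w * w for w in doc_a.values())) * \
           math.sqrt(sum(w * w for w in doc_b.values()))
    return dot / norm if norm else 0.0

# Target concept set: pre-20th Century construction techniques in Asia
target = {"earthquake": 1.0, "asia": 0.8, "construction": 1.0, "historical": 0.6}

# A page focused on the topic vs. one that mentions 'earthquake' in passing
focused = {"earthquake": concept_weight(0.7, True),
           "asia": concept_weight(0.5, True),
           "construction": concept_weight(0.6, False),
           "historical": concept_weight(0.4, False)}
passing = {"earthquake": concept_weight(0.1, False),
           "weather": concept_weight(0.8, True)}

print(weighted_cosine(target, focused) > weighted_cosine(target, passing))  # True
```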
The previous section touched briefly on an interesting point: what if the set in question has some sort of structure? With the document example this meant taking into account the semantic importance of words in body paragraphs versus words in a title or heading. Let’s take this to the next level and assume the structure is explicit, fixed, and of primary importance. Arguably the most important domain where this situation is encountered is that of bioinformatics. I, however, am going to use Pandora’s music recommender as my example [NOTE: I’m not about to dive into all the various techniques applicable to building a recommender. I’m simply looking at how a similarity metric may be applied in that context].
Pandora uses the Music Genome Project, “a collection of about 400 musical attributes that collectively essentially describe a song”. We are still looking at measuring the similarity of two entities (i.e., songs instead of documents). The properties, however, are no longer word counts but are instead characteristics like gender of lead vocalist or distortion level. In some ways life has gotten easier and in other ways more difficult. We now have ~400 dimensions to the data space as opposed to a word-based property set with a dimension in the thousands. On the other hand, we now need to deal with heterogeneous properties. For document similarity all that mattered was the frequency and position of words and the semantic similarity of those words to the target concept (i.e., weighted taxonomic properties). With the music genome, however, we have numeric properties, enumerated properties, and taxonomic properties.
So let’s look at what we are dealing with. We have entities (e.g., songs) that may be described in terms of a set of semantic properties defined over ranges that may be totally ordered (i.e., numeric), partially ordered (i.e., taxonomic), or unordered (i.e., enumerated). Furthermore, the feature space is not a level playing field in the sense that some properties may be more important than others (e.g., song date of publication vs. the tempo).
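Here is a hedged sketch of what a per-property combiner for such a space might look like. The attribute names, the toy genre taxonomy, and the weights are all invented for illustration; none of this is drawn from the actual Music Genome data.

```python
# Sketch: one similarity function per kind of value range, combined by weight.

def numeric_sim(a, b, lo, hi):
    """Totally ordered range: similarity decays linearly with distance."""
    return 1.0 - abs(a - b) / (hi - lo)

def enum_sim(a, b):
    """Unordered range: all-or-nothing match."""
    return 1.0 if a == b else 0.0

def taxonomic_sim(a, b, parents):
    """Partially ordered range: decay with hops to the nearest shared ancestor."""
    def ancestors(x):
        chain = [x]
        while x in parents:
            x = parents[x]
            chain.append(x)
        return chain
    anc_a, anc_b = ancestors(a), ancestors(b)
    shared = next((n for n in anc_a if n in anc_b), None)
    if shared is None:
        return 0.0
    hops = anc_a.index(shared) + anc_b.index(shared)
    return 1.0 / (1.0 + hops)

genre_tree = {"swing": "jazz", "hard-bop": "jazz", "jazz": "music", "rock": "music"}

def song_similarity(s1, s2, weights):
    sims = {
        "tempo": numeric_sim(s1["tempo"], s2["tempo"], 40, 220),  # bpm range: a guess
        "vocal": enum_sim(s1["vocal"], s2["vocal"]),
        "genre": taxonomic_sim(s1["genre"], s2["genre"], genre_tree),
    }
    return sum(weights[k] * sims[k] for k in sims) / sum(weights.values())

song1 = {"tempo": 150, "vocal": "female", "genre": "swing"}
song2 = {"tempo": 140, "vocal": "female", "genre": "hard-bop"}
print(song_similarity(song1, song2, {"tempo": 1.0, "vocal": 1.0, "genre": 3.0}))
```

The weights encode the “not a level playing field” point: here genre is (arbitrarily) deemed three times as important as tempo.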
One thing that both the document retrieval and the music recommender problems have in common is that in both cases we have an apples-to-apples comparison in that only a single class of entity is involved. What if the issue is more akin to assessing the similarity of a restaurant’s menu to the inventory of a grocery store?
To illustrate what I mean, let’s expand the music example so as to include not only songs, but bands and concert tours. Assume we are interested in quartets that mostly play swing-style jazz, that have won at least 2 Grammy awards, and that have toured most recently in either Japan or Europe. Keep in mind that this is not a database query in that we are not looking for exact matches. Rather we want to query a knowledge base and find bands with similar characteristics.
The problem is we are now dealing with an aggregation of similarity-related issues. We still have the problems of concept similarity (e.g., which is closer to swing: Afro-Cuban or hard-bop?), concept sets (e.g., assessing the similarities of a body of work rather than individual songs), and heterogeneous property ranges (e.g., ordered, partially ordered, etc.), but have added issues relating to relationships between entities of different classes (e.g., the concert tour) and logical comparisons involving order (i.e., the ‘most recent tour’ criterion) and sets (i.e., cities included on the tour).
I’m going to refer to this type of semantic structure as a scenario. The objective, therefore, is algorithms and metrics applicable to the problem of scenario similarity. If you feel that the jazz band scenario used above is contrived and unrealistically complex, consider the challenges posed by a typical counter-terrorism scenario. Assume our goal is an algorithm that, in the context of a Maritime Interdiction Operation (MIO), can assess and decide which merchant ships are to be monitored based on ‘suspicious’ behavior.
It does so by matching the observed situation to a set of ‘template’ scenarios, each of which is associated with an assessment. Factors may include a ship’s cargo, its crew, and recent ports of call. Also to be considered are aspects of the situation external to the ship (e.g., weather conditions, the type and nationality of civilian and military ship traffic in the immediate vicinity, etc.).
An algorithm generating a similarity metric for conceptual scenarios must, therefore, address multiple factors. Several of these were mentioned previously, including the need to deal with heterogeneous concept ranges, complex concepts, and concept sets. We also touched on heterogeneity of properties and the need to handle concepts defined over valuation ranges that may be totally ordered (e.g. vessel length, wind speed), partially ordered (e.g., vessel type, geopolitical region), or unordered (e.g., color of the hull). To these we can add:
Last, but certainly not least, a similarity assessment must be ‘reasonable’ in the sense that when presented with a set of candidate aggregates that have been ranked by their distance from an ‘objective’ aggregate, a majority of subject matter experts (SME) would, for the most part, be in agreement with the rankings.
As I said at the start, the impetus behind this posting was my uneasiness regarding a proposed “universal cognitive similarity distance” based on Kolmogorov complexity. The discussion up to this point has focused on identifying the requirements any approach to conceptual similarity should, in my opinion, be able to handle. My next post will use this as the starting point for assessing the validity of various solutions, including (but not limited to) the Kolmogorov-based approach.
To be continued….
NOTE: I started on this post back in April, so 8 months passed before it was actually published. Some of the delay was due to work taking up more of my time and part of the delay was due to summer activities (somehow it’s easier to sit at a computer and write when there is 3″ of snow on the ground than when it’s beach weather). A significant part of the delay was because similarity is such a broad and complex topic.
Several months back, in a brief post on the Turing Test I gave my thoughts regarding the possible irrelevance of the traditional Turing Test. Apparently I am not alone in my opinions. A recent Gigaom article brought to my attention a suggestion to replace the Turing Test with something called the Winograd Schema challenge, first proposed by Hector Levesque at the Univ of Toronto. Levesque’s argument is that the traditional Turing test is based on deceit and trickery. What is required is instead a test that assesses a computer’s ability to reason. The solution as described by Levesque is a test that takes the form of:
a small reading comprehension test involving a single binary question. Two examples will illustrate:
While I agree with Levesque as to the general nature of a desired solution, I think we can do better.
I know that Alan Turing well deserves his fame and I don’t even begrudge him being honored with his own movie and postage stamp. In my opinion, however, the person who is arguably the most responsible for laying the theoretical foundation for the modern Information Age is Claude Shannon.
Shannon made many contributions, some of which, such as the first wearable computer or an algorithm for playing the stock market, are not well known. One of his more famous works is the idea of information entropy as a way to measure the quantity of information in a message. To overly-simplify the concept, the quantity of information in a communications event (i.e., message) grows as the probability of that message being transmitted shrinks. Entropy, and therefore information content, is zero when only one possible message can be received. To put it in plain English, if you tell me something I already know then you really haven’t told me anything at all.
Let’s consider this idea in the context of a scenario in which you are trying to determine who is the best student in a class of 50 students by asking the teacher questions. You have a class roster listing the names, genders, and date of birth for each student but nothing else. If you were to ask the teacher who the best student is, the probability that the response identifies any one particular student is 2%. Another way to say this is that all answers are equally probable, so our uncertainty is maximal.
Now suppose the teacher tells us that the best student is a girl. The amount of uncertainty removed will depend on the ratio of boys to girls in the class. The equation is:

I = −log(p)

where p is the probability of the message (here, the fraction of the class that is female) and the base of the logarithm does not matter. Assuming we use base e (i.e., ln) and that the class is 50% girls, the teacher’s statement that the best student is female has an information content of −ln(0.5) ≈ 0.693 nats. If, however, the class was 60% girls then the information content drops to −ln(0.6) ≈ 0.51 nats.
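Those two numbers are easy to check:

```python
import math

# Information content ("surprisal") of a message with probability p, in nats.
def surprisal_nats(p: float) -> float:
    return -math.log(p)

print(round(surprisal_nats(0.5), 3))   # 0.693 -- class is 50% girls
print(round(surprisal_nats(0.6), 3))   # 0.511 -- class is 60% girls
print(surprisal_nats(1.0) == 0)        # True -- told something we already knew
```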
So what does all this have to do with coming up with a better alternative to the Turing Test?
Your task is to identify the best student in the class by asking the teacher a series of questions. The only restriction is that you cannot ask “who is the best student?”. You could, however, ask what month they were born, which row they sit in, who their best friend is, or anything else you can think of.
The information you start with, the questions you ask, and the sequence in which you ask them will determine how fast you can solve the problem. Some questions are clearly better than others so how you approach the problem will be an indication of your reasoning abilities.
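As a hedged illustration of “some questions are better than others”: one way to score a question is by its expected information gain over the 50 equally likely candidates. The groupings below (an even gender split, five rows of ten, a lopsided yes/no) are invented for the example.

```python
import math

# Expected information gain (in bits) of a question that partitions the
# n equally likely candidates into groups of the given sizes.
def expected_gain_bits(group_sizes):
    n = sum(group_sizes)
    return math.log2(n) - sum((k / n) * math.log2(k) for k in group_sizes)

print(expected_gain_bits([25, 25]))   # gender in an evenly split class: 1 bit
print(expected_gain_bits([10] * 5))   # "which row do you sit in?" (5 rows): ~2.32 bits
print(expected_gain_bits([49, 1]))    # a lopsided yes/no question: ~0.14 bits
```

A good questioner favors questions that split the remaining candidates as evenly as possible; a poor one burns turns on lopsided questions that remove almost no uncertainty.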
My proposal is to flip things around. Instead of asking questions that a test subject is expected to answer, let’s have the test subject do the questioning. We first create a ‘secret scenario’, then provide a test subject with some minimal amount of information, and then challenge them to resolve a question based on that scenario. For example, you are told that John was supposed to meet Mary last Tuesday but did not. You are then asked to identify why the meeting did not take place as planned.
Clearly there is a wide range of possible scenarios that might fit the minimal amount of data we start with; e.g.
The test subject is assessed on the basis of how quickly, if at all, they figure out the correct answer. A full test might consist of several scenario-based problems to solve. By administering the test to a random sampling of people we can obtain a baseline that allows us to grade a test subject’s reasoning ability relative to the general population.
The advantages of this approach to testing machine reasoning are several. Unlike the Turing test it lends itself to scoring against a metric of cognition. The same is true of the Winograd-Schema Challenge proposed by Levesque but unlike the WS test it goes beyond language comprehension or even the testing of general knowledge. Instead it incorporates goal-based information assessment and hypothesis testing. And last, but not least, it’s named after someone who in my opinion does not get nearly as much recognition in the mainstream media as they deserve.
Another story in the category “we welcome our new robot overlords”.
You’re trapped in an unfamiliar building rapidly being consumed by flames. It’s smoky and you can barely see, making it hard to find your way out. Suddenly, a robot enters the room and herds you and the other people around you toward a stairwell and, finally, out the door.
Robots could become a powerful tool in emergency situations, when they would be capable of entering burning or toxic buildings instead of endangering more human lives. But figuring out exactly how they can help is tricky. They need to be able to read their surroundings quickly and then immediately take action. There’s no time for heavy, time-intensive computing.
Help might come from a familiar source: dogs. Herding dogs instinctually know the best way to quickly and efficiently move a flock of sheep or other animal to a target, and robots could use the same system to move humans.
A team of researchers at Swansea University
In case you missed the recent news, the report of a computer program convincing judges it was a 13-year-old human has generated a great deal of discussion as to the significance of the feat. By far, I think the best response to-date is this one.
In the previous post in this series I raised the question of whether or not scientists and engineers can, and will, be replaced by AIs just as inevitably as accountants or typists have been. The trigger for my asking this was the announcement of DARPA’s new program called Big Mechanism. Before digging deeper into what this effort might lead to, I think it makes sense to examine the antecedent work that might have given the folks at DARPA reason to think this was a worthwhile avenue of investigation.
Back in 2004 an interdisciplinary team of computer scientists, chemists, and biologists published a paper in Nature describing Functional genomic hypothesis generation and experimentation by a robot scientist. This robot, named Adam, has been described as the first machine in history to have discovered new scientific knowledge independently of its human creators. A 2010 article in Scientific American by the paper’s lead author, Ross D. King, provides a pretty good overview of both Adam and a second robo-scientist named (no surprise) Eve.
The basic idea of Adam is pretty simple to describe. Adam has a single research interest: how the enzymes in yeast are linked to specific genes. Adam is an attempt to “automate the entire scientific process: forming hypotheses, devising and carrying out experiments to test those hypotheses, interpreting the results and repeating the cycle until new knowledge is found.” The steps are:
I’m going to ignore the bio-chemical aspects and underpinning, as well as the robotic/mechanical aspects, and focus strictly on the cognitive aspects. These appear to include:
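The hypothesize/experiment/interpret cycle described above can be rendered as a toy loop. Everything here (the gene names, the “experiment” oracle, the exhaustive single pass) is a stand-in for illustration, not Adam’s actual machinery.

```python
TRUE_LINKS = {"gene3": "enzymeA"}  # hidden ground truth only the "lab" knows

def generate_hypotheses(genes):
    # form hypotheses: candidate gene -> enzyme links
    return [(g, e) for g in genes for e in ("enzymeA", "enzymeB")]

def run_experiment(hypothesis):
    # stand-in for a wet-lab experiment on a yeast knockout strain
    gene, enzyme = hypothesis
    return TRUE_LINKS.get(gene) == enzyme

def robot_scientist(genes):
    # devise and carry out experiments, interpret results, accumulate knowledge
    knowledge = {}
    for hyp in generate_hypotheses(genes):
        if run_experiment(hyp):
            knowledge[hyp[0]] = hyp[1]
    return knowledge

print(robot_scientist(["gene1", "gene2", "gene3"]))  # {'gene3': 'enzymeA'}
```

The real system, of course, does not test hypotheses exhaustively; choosing the cheapest informative experiment is a large part of what made Adam interesting.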
So is Adam a significant step forward and a warning of things to come, or is this much ado about nothing?
When you break the pieces down and look at them individually it may not seem like such an impressive feat. Rule engines, ontologies, decision trees…. nothing new here. But when you assemble all of these into an integrated construct you start to get something that hints of impressive things to come.
I certainly am not about to claim that Adam is a full-fledged ‘scientist’. There are a lot of shortcomings and limitations to what he, or his derived spouse Eve, can do. But I will also point out that there are a lot more technologies that can be added to the aggregation that is Adam. A lot of progress is happening on many fronts of the AI war and there are, I suspect, many opportunities to enhance the capabilities of a construct like Adam. As I quoted in a previous post, it has been said that the most insidious problem in the field of AI is the mechanistic balkanization of the field. Adam is the type of project that shows what can be done when aggregation, rather than balkanization, is the motto of the day.
So where do we (i.e., the human race) seem to stand? Clearly efforts like Adam are noteworthy and mark another step toward whatever future, be it utopian or dystopian (take your pick), we are heading for. But we are not there just yet and, I suspect, the journey is far from over. I have a rough rule of thumb that I call “thirty to maturity”: it seems to take about thirty years for a new technology to go from first getting noticed by insiders at academic conferences to being a consumer-friendly, ready-for-prime-time product. That’s about how long it took to get from the first research on packet switching to teenagers surfing the web. I can think of several other examples that fit that model. Given Adam’s debut was 2004, I’m fairly comfortable prognosticating that we have another 20 years of work ahead of us before robo-scientists become ubiquitous enough to be considered business as usual.
Now I am sure most of the folks reading this will at this point be comforting themselves with the thought that their field is so complex and difficult to master that robots will never take over and will, at most, replace lab assistants. I would agree that the playing field is not level and the challenges are not uniform. So the next step should be to conduct a threat assessment and consider if, for example, biochemists are easier or harder to replace than petroleum geologists.
To be continued…
A while back a news story in the Financial Times came to my attention. The article, titled “Automation and the threat to jobs”, featured comments made by Google’s Executive Chairman Eric Schmidt while attending the 2014 World Economic Forum in Davos:
“The race is between computers and people and the people need to win,” he said. “I am clearly on that side. In this fight, it is very important that we find the things that humans are really good at….There is quite a bit of research that middle class jobs that are relatively highly skilled are being automated out.”
The article went on to state that economists are worried that “technology is starting to have a deeper impact than previous periods of technological change and may have a permanent impact on employment levels” and that “the current pace of change is too fast for employment levels to adapt.”
Another article covering the same Davos get-together, this time in the Globe and Mail, states
For all the talk of growth, though, the global economy is also in an employment morass that has the smartest people in the room humbled and anxious. The rebound is not producing jobs and pay increases to the degree that many of them expected.
So a sense of uneasiness seems to be prevalent among the business and economic elites as they view the world from high in the Swiss Alps. Meanwhile, down in the low-lands….
Now I assume that anybody reading this is no doubt aware that DARPA has been very active in advancing the state of the art in the field of robotics. Videos of self-driving cars navigating the desert and large terminator-like humanoids are easy to find and, in almost every one of these, somewhere in the background is the DARPA logo. If you are an enemy soldier, the thought of the latter of these entities coming after you cannot be very comforting. On the other hand, if you make your living driving a taxi, the idea of self-driving cars probably isn’t very comforting either. At least the computer scientists who are building these devices don’t have anything to lose sleep over. Or do they?
A few days after the Davos meeting wrapped up, DARPA announced a new program called Big Mechanism. Despite its name, it has nothing to do with hardware. It is not a robotics project but it is an AI project. Furthermore, it’s the type of AI project that might result in some job insecurity among those working on it.
Big Mechanism is the type of ambitious and revolutionary effort DARPA is known for. To quote from the announcement:
The Big Mechanism program will develop technology to read research abstracts and papers to extract fragments of causal mechanisms, assemble these fragments into more complete causal models, and reason over these models to produce explanations.
Sounds to me like an average grad student. Not yet up to the task of writing and defending a PhD thesis but possibly headed in that direction.
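At toy scale, that read/assemble/reason pipeline might look like the sketch below; the causal “fragments” are invented for illustration, not real biology.

```python
# Fragments of causal mechanisms, as if extracted from separate papers
fragments = [
    ("mutation_X", "activates", "protein_A"),
    ("protein_A", "activates", "pathway_B"),
    ("pathway_B", "drives", "cell_growth"),
]

# Assemble the fragments into one causal model (a directed graph; no cycles here)
graph = {}
for cause, _relation, effect in fragments:
    graph.setdefault(cause, []).append(effect)

# Reason over the model: chain causes together to produce an explanation
def explain(start, goal, path=None):
    path = (path or []) + [start]
    if start == goal:
        return path
    for nxt in graph.get(start, []):
        found = explain(nxt, goal, path)
        if found:
            return found
    return None

print(" -> ".join(explain("mutation_X", "cell_growth")))
# mutation_X -> protein_A -> pathway_B -> cell_growth
```

The hard parts the program calls out (deeper semantics, probabilistic and quantitative models) are exactly what this deterministic toy leaves out.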
The initial focus domain for Big Mechanism is cancer biology but the technologies to be developed have cross-domain applicability:
The Big Mechanism program will require new research and the integration of several research areas, particularly statistical and knowledge-based Natural Language Processing (NLP); curation and ontology; systems biology and mathematical biology; representation and reasoning; and quite possibly other areas such as visualization, simulation, and statistical foundations of very large causal networks. Machine reading researchers will need to develop deeper semantics to represent the causal and often kinetic models described in research papers. Deductive inference and qualitative simulation will probably not be sufficient to model the complicated dynamics of signaling pathways and will need to be augmented or replaced by probabilistic and quantitative models. Classification and prediction will continue to be important, but causal explanation is primary.
This is a pretty broad and ambitious list and, with the exception of the reference to “systems biology and mathematical biology”, everything mentioned is general in nature.
I’m not a big fan of the term ‘singularity’. I think it gets tossed around too easily and with no real consensus of what it entails. We can all agree, however, that if the Big Mechanism program is successful it moves us farther down the path to whatever the singularity is.
There are those who regard the so-called singularity as the point at which mankind becomes enslaved by our new robot overlords. This is what I call the Skynet point of view, as opposed to those who see advanced AIs as helpers and partners (what I call the R. Daneel Olivaw point of view). The folks at DARPA are clearly of the latter opinion. Certainly there is a great deal of justification for taking the optimistic point of view. The effect this technology could have on cancer research, the initial target domain, is a pretty good argument in its favor.
Even if you fall into the pessimistic camp, you can take some comfort from the thought, according to the predictions of the experts, that the tipping point is still decades away. But no matter what your opinion is regarding what the so-called singularity will look like or when it may occur, what should not be overlooked is that there are other less obvious and more immediate consequences to these continuing developments in the AI field. It’s those that have the folks at Davos feeling a bit uneasy.
The question people should therefore be asking is not “am I about to be enslaved by an omnipotent AI overlord?” but rather “am I about to lose not only my current job but also any hope of future employment?”. I suppose if the answer to this is a “yes” then you should also be wondering how soon?
So should we fear our own creations? Is the Big Mechanism program an indication that scientists and engineers can, and will, be replaced just as inevitably as accountants or typists?
“Reports that say that something hasn’t happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns — the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones. And so people who have the omniscience that they can say with high certainty that something has not happened or is not being tried… can do things I can’t do.”
Donald Rumsfeld answering a question at a press conference, Feb 12, 2002.
I know Donald got laughed at for saying this and the sound-bite certainly got replayed a lot by late-night comedians. I think, however, if you hear the full statement or, better yet, read the transcript, it makes a great deal of sense, especially when considered in the context of formal logic and the open world assumption. There is a new documentary on Rumsfeld called “The Unknown Known” so I thought this was a good time to post this particular quote.
I guess it makes sense that two of the popular memes in the tech world would eventually intersect. I am referring on one hand to those ubiquitous TED Talks and on the other to the constantly growing number of X Prizes. It seems that everybody who is anybody in the tech world makes it a priority to attend, if not speak at, TED. Likewise, there seems to be an unquestioned belief that the best way to foster innovation is via a contest or “grand challenge”. So merging these two impulses is as obvious a combo as bacon flavored ice cream!
Up until now, my favorite X Prize has been the Tricorder. I figured this had to be the brain-child of some ageing Trekkies. Now apparently, some fans of Isaac Asimov’s work have gone one better: the A.I. XPRIZE:
This is described as “a modern-day Turing test to be awarded to the first A.I. to walk or roll out on stage and present a TED Talk so compelling that it commands a standing ovation from you, the audience.”
I think this is a worthwhile step in the right direction but I think it’s possible to push the boundaries of AI even farther. The key to doing so is to focus not on the substitution of a robot presenter for a human presenter, but rather to replace the TED audience with the robots! I therefore propose the funding of the I.A. XPrize: a modern-day Turing test to be awarded to the first I.A. (Intelligent Audience) that can listen to a TED Talk and determine if it is meaningless techno-babble or a significant insight.
Unfortunately I lack the financial resources to fund this noteworthy undertaking on my own. Lucky for me, there is a third popular meme these days for promoting innovation: crowd funding! I therefore propose the creation of a crowd-funded XPrize for the first team to develop a working Intelligent Audience. It is literally impossible to think up a better stimulus to innovation in the fields of A.I. and cognitive science than this. TED plus XPrize plus crowd-funding! All in favor, say “Aye”.
A few weeks back I noted the habit of researchers in fields relating to machine intelligence to turn to examples focusing on wine. Apparently the field of alcohol-fueled research is broader than I realized. This time, however, the studies have focused on whisky, rather than wine:
I will, of course, do all that I can to contribute to this promising avenue of inquiry.