Ronan Cummins (BEng MSc PhD PgCert)




Query Topic Models Using SPUD



Query topic modelling (QTM) expands an initial query with terms that are topically related to it. The approach is more effective than the relevance modelling (RM) approach (Lavrenko, 2001). It extracts terms from the top retrieved documents of an initial retrieval run, favouring terms that are more likely to have been generated by the topical aspect of those documents than by a background model. The main reason for its improved effectiveness is that it penalises noisy terms in a theoretically consistent manner, unlike the RM3 approach. QTMs can also be used as a fast way of automatically creating topic models from a few initial seed words. For each term, the approach outputs the probability that the term was drawn from the query topic model. For example, consider the expansion terms for the query {us gun violence} below. The relevance modelling approach ranks many common words highly (e.g. have, use, also, he). Although these could be removed using a stopword list, the ranking of the remaining terms is still quite different from that of the QTM approach (and search engines nowadays use only limited stopword removal). The Lucene code is available here.


Related terms and their term-selection score on Wikipedia for the topic us gun violence

SPUD-RM3               SPUD-QTM
term       p(t|RM)     term          p(Q|t)
gun        0.0332      firearm       0.9928
have       0.0143      gun           0.9845
violence   0.0119      violence      0.9767
firearm    0.0116      control       0.8328
law        0.0091      law           0.8168
control    0.0068      weapon        0.8119
use        0.0061      ban           0.7995
from       0.0060      crime         0.7804
state      0.0060      advocate      0.7500
united     0.0055      policy        0.7227
us         0.0049      handgun       0.7172
other      0.0048      rifle         0.7096
states     0.0048      prevent       0.7001
weapon     0.0046      us            0.7000
crime      0.0045      enforcement   0.6952
which      0.0041      legislation   0.6749
also       0.0040      state         0.6442
handgun    0.0039      homicide      0.6441
he         0.0039      firearms      0.6406
rate       0.0036      criminal      0.6395
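
The contrast between the two scores, p(t|RM) and p(Q|t), can be illustrated with a small sketch. This is not the SPUD implementation or the Lucene code mentioned above: it swaps in a plain two-component multinomial mixture (topic plus fixed background) fitted with EM, which is a simpler stand-in for the idea of asking whether a term was generated by the topical aspect or by the background. The function names, the toy documents, the background probabilities, the mixing weight `mix` and the `floor` value are all assumptions made for illustration.

```python
from collections import Counter


def relevance_model_scores(feedback_docs):
    """RM-style score: average of per-document maximum-likelihood p(t|d).
    Uniform document weights are an assumption made for simplicity."""
    scores = Counter()
    for doc in feedback_docs:
        total = len(doc)
        for term, count in Counter(doc).items():
            scores[term] += count / total / len(feedback_docs)
    return dict(scores)


def topic_posteriors(feedback_docs, background, mix=0.5, iterations=30):
    """Simplified stand-in for p(Q|t): fit p(t|topic) by EM against a fixed
    background model, then return the posterior probability that each term
    was generated by the topic rather than the background."""
    counts = Counter(t for doc in feedback_docs for t in doc)
    floor = 1e-6  # hypothetical floor for terms missing from the background
    bg = {t: background.get(t, floor) for t in counts}
    total = sum(counts.values())
    topic = {t: c / total for t, c in counts.items()}  # initial estimate

    def posterior(t):
        on_topic = mix * topic[t]
        return on_topic / (on_topic + (1 - mix) * bg[t])

    for _ in range(iterations):
        post = {t: posterior(t) for t in counts}           # E-step
        norm = sum(counts[t] * post[t] for t in counts)
        topic = {t: counts[t] * post[t] / norm for t in counts}  # M-step
    return {t: posterior(t) for t in counts}


# Toy feedback documents and background probabilities (invented values).
docs = [
    "gun violence have law gun have".split(),
    "firearm control have also law gun".split(),
]
background = {
    "have": 0.05, "also": 0.03, "law": 0.001, "control": 0.001,
    "gun": 0.0005, "violence": 0.0005, "firearm": 0.0001,
}

rm = relevance_model_scores(docs)
post = topic_posteriors(docs, background)

for term in sorted(post, key=post.get, reverse=True):
    print(f"{term:10s} rm={rm[term]:.4f} posterior={post[term]:.4f}")
```

In this toy setting the frequent word "have" gets a high RM-style score simply because it occurs often in the feedback documents, whereas its topic posterior is pulled down because the background model already explains it well. Rarer, topical words such as "firearm" show the opposite pattern, which mirrors the qualitative difference between the two columns of the table.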



Copyright © Ronan Cummins. All rights reserved.