Improving Classifier Performance by Autonomously Collecting Background Knowledge from the Web

Title: Improving Classifier Performance by Autonomously Collecting Background Knowledge from the Web
Publication Type: Journal Articles
Year of Publication: 2011
Authors: Minton SN, Michelson M, See K, Macskassy S, Gazen BC, Getoor L
Journal: 2011 10th International Conference on Machine Learning and Applications Workshops
Pagination: 1 - 6
Date Published: 2011
Abstract

Many websites allow users to tag data items to make
them easier to find. In this paper we consider the problem of
classifying tagged data according to user-specified interests. We
present an approach for aggregating background knowledge
from the Web to improve the performance of a classifier. In
previous work, researchers have developed technology for
extracting knowledge, in the form of relational tables, from semi-
structured websites. In this paper we integrate this extraction
technology with generic machine learning algorithms, showing
that knowledge extracted from the Web can significantly benefit
the learning process. Specifically, the knowledge can lead to
better generalizations, reduce the number of samples required
for supervised learning, and eliminate the need to retrain the
system when the environment changes. We validate the
approach with an application that classifies tagged Flickr data.