Automatically Generated Consumer Health Metadata Using Semantic Spaces
Chen, G., Warren, J.R. and Evans, J.
The continual growth of the World Wide Web presents the
(also growing) population of health information seekers
with the challenge of finding reliable information that is
appropriate to their needs. Metadata about consumer health
websites can provide a guide for end users and
domain-specific search tools. In this paper we present and
demonstrate a method for automatically inferring a
non-trivial metadata attribute that has been encoded for
breast cancer websites: whether the site is 'medical' or
'supportive' in tone. We induce decision trees to
distinguish Medical vs. Supportive sites based on feature
vectors of word co-occurrence patterns, founded in a
semantic space model called Hyperspace Analog to
Language (HAL). We achieve 82% (95% CI: 74% to 91%)
classification accuracy. This should already be a useful
capability for human metadata coders or to support
on-the-fly queries, and it inspires us to further investigate
metadata classifiers based on HAL features.
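
As an illustration of the kind of word co-occurrence features a HAL-style semantic space provides, the following minimal sketch builds distance-weighted co-occurrence counts with a sliding window and concatenates a word's "preceding" and "following" weights into one feature vector. It is not the authors' implementation: the function names `hal_counts` and `feature_vector`, the whitespace tokenization, and the window size are assumptions made for this example only.

```python
from collections import defaultdict


def hal_counts(tokens, window=5):
    """Build HAL-style weighted co-occurrence counts (illustrative sketch).

    counts[target][context] accumulates (window - distance + 1) each time
    `context` appears `distance` positions before `target`, so nearer
    words contribute larger weights.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(tokens):
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break  # ran off the start of the text
            counts[target][tokens[j]] += window - distance + 1
    return counts


def feature_vector(counts, word, vocabulary):
    """Concatenate a word's row (preceding) and column (following) weights."""
    row = [counts.get(word, {}).get(v, 0) for v in vocabulary]
    col = [counts.get(v, {}).get(word, 0) for v in vocabulary]
    return row + col


# Hypothetical toy input; a real corpus would be the text of the websites.
tokens = "breast cancer support group".split()
vocabulary = ["breast", "cancer", "support", "group"]
counts = hal_counts(tokens, window=2)
vector = feature_vector(counts, "cancer", vocabulary)
```

Vectors of this kind, aggregated per website, could then be passed to any decision tree learner; how the features are aggregated and which words are selected are design choices this sketch does not cover.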
Cite as: Chen, G., Warren, J.R. and Evans, J. (2008). Automatically Generated Consumer Health Metadata Using Semantic Spaces. In Proc. Second Australasian Workshop on Health Data and Knowledge Management (HDKM 2008), Wollongong, NSW, Australia. CRPIT, 80. Warren, J. R., Yu, P., Yearwood, J. and Patrick, J. D., Eds. ACS. 9-15.