Extracting and Modeling the Semantic Information Content of Web Documents to Support Semantic Document Retrieval
Noah, S.A., Zakaria, L. and Alhadi, A.C.
Existing HTML mark-up indicates only the structure and
layout of documents, not the document semantics. As a
result, web documents are difficult for computer
applications to process, retrieve and explore
semantically. Existing information extraction systems
are mainly concerned with extracting important
keywords or key phrases that represent the content of the
documents. The semantic aspects of such keywords have
not been explored extensively. In this paper we propose
an approach meant to assist in extracting and modeling
the semantic information content of web documents using
natural language analysis techniques and a domain-specific
ontology. With the user's participation, the tool
gradually extracts and constructs the semantic document
model, which is represented as XML. The semantic
models representing individual documents are then
integrated to form a global semantic model. Such a model
provides users with a global knowledge model of the
domain.
Cite as: Noah, S.A., Zakaria, L. and Alhadi, A.C. (2009). Extracting and Modeling the Semantic Information Content of Web Documents to Support Semantic Document Retrieval. In Proc. Sixth Asia-Pacific Conference on Conceptual Modelling (APCCM 2009), Wellington, New Zealand. CRPIT, 96. Kirchberg, M. and Link, S., Eds. ACS. 79-86.