Method and Overview

How the System Gathers and Analyzes Web Documents

The US Election 2004 Web Monitor captures the Web sites of the Fortune 1000 (the largest US companies by revenue), environmental organizations, and international media from the US, Canada, the UK, Australia and New Zealand. From these sites, the system processes more than 500,000 documents each week, comprising about 125 million words in 11 million sentences. This volume of raw data supports three types of analysis (please refer to the technical description for additional details):

  • Attention is the number of references to a particular candidate, expressed as a percentage of all candidate references in a given week. The small percentages next to each value indicate the change from the previous week.
  • Attitude tracks the semantic association of the candidate's name with positive and negative terms taken from a tagged dictionary. While attention is always a percentage, attitude can take positive or negative values (zero represents neutral coverage).
  • Keywords identify topics associated with the presidential candidates by comparing the frequency of terms in sentences that contain a candidate's name with a reference distribution taken from the sample's complete set of documents. Keywords shown in grey originate from a small number of documents and may not be representative. Moving the mouse pointer over an underlined keyword reveals its common definition.
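
The three measures can be illustrated with a minimal Python sketch. It is not the Web Monitor's actual implementation: the tiny word lists stand in for the tagged dictionary, the candidate names "smith" and "jones" and the sample sentences are hypothetical, attitude is reduced to a count of positive minus negative terms, and the keyword score (term count times the log of its relative over-representation) is an assumed stand-in for whatever comparison the system really applies against the reference distribution.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicon standing in for the tagged dictionary.
POSITIVE = {"strong", "good", "win"}
NEGATIVE = {"weak", "bad", "scandal"}

# Hypothetical sample sentences with placeholder candidate names.
SAMPLE = [
    "Smith is strong on energy",
    "Smith talks energy",
    "Jones faces a scandal",
    "Markets were good today",
]

def tokens(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())

def attention(sentences, candidates):
    """Each candidate's references as a percentage of all candidate references."""
    counts = Counter(t for s in sentences for t in tokens(s) if t in candidates)
    total = sum(counts.values())
    return {c: (100.0 * counts[c] / total if total else 0.0) for c in candidates}

def attitude(sentences, candidate):
    """Positive minus negative terms in sentences naming the candidate (0 = neutral)."""
    score = 0
    for s in sentences:
        toks = tokens(s)
        if candidate in toks:
            score += sum(t in POSITIVE for t in toks)
            score -= sum(t in NEGATIVE for t in toks)
    return score

def keywords(sentences, candidate, top=3):
    """Rank terms by over-representation in the candidate's sentences
    relative to the reference distribution over all sentences."""
    ref = Counter(t for s in sentences for t in tokens(s))
    tgt = Counter(t for s in sentences if candidate in tokens(s) for t in tokens(s))
    ref_total, tgt_total = sum(ref.values()), sum(tgt.values())
    scores = {}
    for term, n in tgt.items():
        if term == candidate:
            continue  # the name itself is not a topic
        ratio = (n / tgt_total) / (ref[term] / ref_total)
        scores[term] = n * math.log(ratio)  # assumed scoring, see lead-in
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top]
```

Calling attention(SAMPLE, {"smith", "jones"}) gives "smith" two thirds of the attention, attitude(SAMPLE, "jones") is negative because of the single negative term, and keywords(SAMPLE, "smith", top=1) returns "energy". A production system would also need stop-word handling and a significance threshold, which corresponds to the caveat above about keywords drawn from only a few documents.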

Related Publications

  • Scharl, A. (2004): "Web Coverage of Renewable Energy", Environmental Online Communication. Ed. A. Scharl. London: Springer. 25-34.
  • Weichselbraun, A. (2004): "Ontology-based Text Classification Using Mathematical Methods", PhD Thesis, Vienna University of Economics and Business Administration.
  • Scharl, A., Pollach, I. and Bauer, C. (2003): "Determining the Semantic Orientation of Web-based Corpora", Intelligent Data Engineering and Automated Learning (Lecture Notes in Computer Science, vol. 2016). Ed. J. Liu et al. Berlin: Springer. 840-849.
  • Scharl, A. (2000): Evolutionary Web Development. London: Springer.


Summary
This project of the ECOresearch Network automatically provides a weekly snapshot of international Web coverage. The results reflect online attention and attitude towards the US presidential candidates. Keywords grouped by political party and geographic region summarize the key issues associated with each candidate.

Released under a Creative Commons license. Some rights reserved.