Technology

How the System Gathers and Analyzes Web Documents

The US Election 2008 Web Monitor provides weekly snapshots of global Web coverage. It captures the Web sites of news media from the US, Canada, the UK, Australia and New Zealand, of environmental organizations, and of the Fortune 1000 (the largest US companies by revenue), as well as 1000 popular blogs on political issues. From these sites, the system processes more than 800,000 documents each week.

Weekly Updates

  • Monday: Data Collection
  • Thursday: Publication of New Results

Types of Analysis

The abundance of crawled Web data allows several types of textual analysis (please refer to the text processing section for additional details):

  • Attention reports the number of references to a particular candidate as a percentage of all candidate references in a given week. The percentage shown next to each value indicates the change from the previous week.
  • Sentiment tracks the co-occurrence (semantic association) of a candidate's name with positive and negative terms taken from a tagged dictionary.
  • Keywords identify topics associated with the presidential candidates by comparing the frequency of terms in sentences that contain the name of a candidate with a reference distribution taken from the sample's complete set of documents.
  • Quotes represent environmental statements by or about a presidential candidate, listed together with their date of publication and a link to the original article.
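
The first three measures above can be illustrated with a minimal sketch. It is not the system's actual implementation: the candidate list, the tiny sentiment dictionary, the regular-expression tokenizer and sentence splitter, and the simple frequency ratio used to rank keywords are all assumptions chosen for brevity.

```python
from collections import Counter
import re

# Hypothetical inputs: candidate names and a tiny tagged sentiment dictionary.
CANDIDATES = ["obama", "mccain"]
SENTIMENT = {"strong": 1, "hope": 1, "weak": -1, "crisis": -1}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def attention(docs):
    """Each candidate's share of all candidate references, in percent."""
    counts = Counter()
    for doc in docs:
        for tok in tokenize(doc):
            if tok in CANDIDATES:
                counts[tok] += 1
    total = sum(counts.values())
    return {c: (100.0 * counts[c] / total if total else 0.0) for c in CANDIDATES}


def sentiment(docs, candidate):
    """Sum the dictionary values of terms sharing a sentence with the candidate."""
    score = 0
    for doc in docs:
        for sent in split_sentences(doc):
            toks = tokenize(sent)
            if candidate in toks:
                score += sum(SENTIMENT.get(t, 0) for t in toks)
    return score


def keywords(docs, candidate, top_n=3):
    """Rank terms by relative frequency in candidate sentences
    versus the reference distribution of the whole sample."""
    ref, target = Counter(), Counter()
    for doc in docs:
        for sent in split_sentences(doc):
            toks = tokenize(sent)
            ref.update(toks)
            if candidate in toks:
                target.update(t for t in toks if t != candidate)
    ref_total, target_total = sum(ref.values()), sum(target.values())
    scores = {
        term: (n / target_total) / (ref[term] / ref_total)
        for term, n in target.items()
    }
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
```

Calling `attention(docs)` on one week's documents yields the percentage shares; computing it for two consecutive weeks and subtracting gives the weekly change. A production system would presumably use proper name matching, a statistical significance test for keywords, and a far larger dictionary.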

Related Publications

  • Scharl, A. and Weichselbraun, A. (2008): "An Automated Approach to Investigating the Online Media Coverage of US Presidential Elections", Journal of Information Technology & Politics, 5(1): 121-132.
  • Scharl, A. (2007): "Towards the Geospatial Web: Media Platforms for Managing Geotagged Knowledge Repositories", The Geospatial Web - How Geobrowsers, Social Software and the Web 2.0 are Shaping the Network Society. Eds. A. Scharl and K. Tochtermann. London: Springer. 3-14.
  • Scharl, A. and Weichselbraun, A. (2006): "Web Coverage of the 2004 US Presidential Election", 2nd Web as Corpus Workshop, 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-2006). Eds. A. Kilgarriff and M. Baroni. Trento, Italy: Association for Computational Linguistics. 35-42.
  • Liu, W., Weichselbraun, A., Scharl, A. and Chang, E. (2005): "Semi-Automatic Ontology Extension Using Spreading Activation", Journal of Universal Knowledge Management, 0(1): 50-58.
  • Scharl, A. (2004): "Web Coverage of Renewable Energy", Environmental Online Communication. Ed. A. Scharl. London: Springer. 25-34.
  • Scharl, A., Pollach, I. and Bauer, C. (2003): "Determining the Semantic Orientation of Web-based Corpora", Intelligent Data Engineering and Automated Learning (Lecture Notes in Computer Science, vol. 2016). Ed. J. Liu et al. Berlin: Springer. 840-849.