APPENDIX A

Tools in Blogosphere

Several modeling tools are available for simulating the social networks embedded in the blogosphere. They help in understanding various characteristics of these networks and in conducting experiments. We describe one such tool, BlogTrackers [75], in detail and briefly mention the others.

Sociologists study the blogosphere to track socio-behavioral patterns, identify influential people in a region of interest, and follow interesting activities. They often have to eyeball the sites for useful information. Given the gamut of interests in the blogosphere, this can be a tedious and time-consuming task. BlogTrackers is a user-oriented application that alleviates this problem by helping them track and analyze the blogosphere effectively. It gives sociologists the freedom to choose the blog sites they wish to analyze and to observe interesting events and patterns, with the flexibility of drilling in. The tool consists of a number of crawling and analysis modules. It is a convenient alternative to eyeballing the blog sites and lets users concentrate their efforts on further analysis.

Most tools are generic in nature and cannot be used directly by sociologists and others with specific needs. BlogTrackers is designed for those needs: it both collects data and provides convenient visualization tools for analyzing it. Table A.1 compares BlogTrackers with some existing tools that are specifically tailored for the blogosphere: Technorati (http://technorati.com/), BlogPulse (http://www.blogpulse.com/), BlogScope [76] (http://www.blogscope.net/), GTD Explorer [77] (http://www.cs.umd.edu/hcil/gtd/gtd/intro.html), IceRocket (http://www.icerocket.com/), and Google Blog Search (http://blogsearch.google.com/). Although sites like Technorati and BlogPulse provide features similar to those of BlogTrackers, they cannot be used directly for such analysis. BlogTrackers combines these features in a unique manner to maximize the analytical capability of the individual techniques.

BlogTrackers is a Java-based desktop application that provides a unified platform for crawling and analyzing blog data. It gives the user the freedom to choose the data of interest and helps in analyzing it effectively. The data is stored in a relational database. Currently, it tracks 21 different data sources, such as Twitter, Engadget, The Unofficial Apple Weblog (TUAW), LiveJournal, Flickr, and Blogcatalog. The framework consists of two main components: Crawler and Tracker.

Crawler: BlogTrackers offers two types of crawlers. The batch crawler crawls the websites from scratch, scraping the HTML files with regular expressions and storing the parsed data in a database. The RSS (Really Simple Syndication) crawler, on the other hand, crawls the websites incrementally by retrieving information from their feeds. The RSS crawler can be scheduled to run automatically and update the database.
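The difference between the two crawling styles can be illustrated with a minimal Python sketch. It is not the BlogTrackers implementation (which is written in Java and stores data in a relational database); the sample feed, the `post-title` markup, the in-memory `store` dictionary, and all function names are hypothetical and chosen only for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; a real crawler would download this over HTTP.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://example.org/1</link>
      <guid>http://example.org/1</guid>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/2</link>
      <guid>http://example.org/2</guid>
    </item>
  </channel>
</rss>"""

# Hypothetical page markup for the batch (HTML scraping) style.
SAMPLE_HTML = '<h2 class="post-title">First post</h2><h2 class="post-title">Second post</h2>'


def scrape_titles(html):
    """Batch style: pull post titles out of raw HTML with a regular expression."""
    return re.findall(r'<h2 class="post-title">(.*?)</h2>', html, re.S)


def parse_items(feed_xml):
    """Extract one record per <item> element of an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        # Fall back to the link when a feed omits <guid>.
        post_id = item.findtext("guid") or item.findtext("link")
        items.append({"id": post_id,
                      "title": item.findtext("title"),
                      "link": item.findtext("link")})
    return items


def incremental_update(feed_xml, store):
    """RSS style: add only posts not seen before; return the newly added ones."""
    new_posts = []
    for post in parse_items(feed_xml):
        if post["id"] not in store:
            store[post["id"]] = post
            new_posts.append(post)
    return new_posts
```

Calling `incremental_update` a second time with the same feed returns an empty list, which is the property that makes a scheduled RSS crawl cheap compared with re-scraping every page from scratch.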

Table A.1: Comparison of various analysis and visualization tools for the blogosphere (based on [75]).

Tracker: The tracker component provides the user with a set of tools to analyze the data. The user chooses the blog site to be analyzed. The following are the four major tools in BlogTrackers:

1. Blog Analysis: BlogTrackers contains a blog browser that can be used to analyze individual blog posts within a time period, as shown in Figure A.1. The time window can be adjusted as required. Entire blog posts are indexed for a better viewing experience. Another tool for blog analysis is the Term Frequency Analyzer, which shows the tag cloud (a visualization that highlights terms according to their frequency by varying the font size) for all the blog posts in a given time period. This tool can be used to identify key terms associated with the blog posts during a particular time period. A traffic pattern graph can also be generated for a particular time period, as shown in Figure A.2. The bar graph shows the traffic bursts at the chosen granularity (daily, weekly, monthly, or yearly). The bursts can be analyzed individually to observe the blog posts and the tag cloud for that period.

2. Blogger Analysis: BlogTrackers can be used to search for influential bloggers at a blog site. The influential bloggers are identified as described in [44]. A user can also drill in and look at the tags and the blog posts of the influential bloggers, as shown in Figure A.3. Bloggers can be classified by their activity and influence into four categories: Active-Influential, Inactive-Influential, Active-Non Influential, and Inactive-Non Influential. These categories can be visualized as a confusion matrix, as shown in Figure A.4.

Figure A.1: Blog posts displayed in the blog browser of BlogTrackers.

3. Search: The blog sites are crawled on a daily basis, and the posts are stored and indexed using Lucene. The index is updated automatically and can be used to search the blog posts for specific queries. The search interface of BlogTrackers is shown in Figure A.5.

4. Watchlists/Alerts: BlogTrackers offers a convenient notification system. A user can specify terms in a watchlist and is then notified by e-mail whenever a new post contains one of those terms. The user can also choose terms from a system-generated list based on popularity, as shown in Figure A.6.
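Three of the operations described above (term frequencies for a tag cloud, traffic counts at a chosen granularity, and watchlist matching) can be sketched in a few lines of Python. This is an illustrative sketch only and not the BlogTrackers code; the sample posts, the stopword list, the tokenizer, and the function names are all assumptions made for the example.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical posts; a real tool would read these from its database.
POSTS = [
    {"date": date(2009, 6, 1), "text": "Apple announces a new iPhone"},
    {"date": date(2009, 6, 1), "text": "The new iPhone ships in June"},
    {"date": date(2009, 6, 9), "text": "Twitter traffic spikes after the iPhone launch"},
]

STOPWORDS = {"a", "the", "in", "after"}  # assumed, deliberately tiny


def tokenize(text):
    """Lowercase the text and split it into terms, dropping stopwords."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS]


def term_frequencies(posts, start, end):
    """Count terms over all posts dated within [start, end]; input for a tag cloud."""
    counts = Counter()
    for post in posts:
        if start <= post["date"] <= end:
            counts.update(tokenize(post["text"]))
    return counts


def traffic(posts, granularity="daily"):
    """Count posts per time bucket; the counts are the bar heights of a traffic graph."""
    def bucket(d):
        if granularity == "daily":
            return d.isoformat()
        if granularity == "weekly":
            year, week, _ = d.isocalendar()  # ISO week numbering
            return f"{year}-W{week:02d}"
        if granularity == "monthly":
            return f"{d.year}-{d.month:02d}"
        if granularity == "yearly":
            return str(d.year)
        raise ValueError(f"unknown granularity: {granularity}")
    return Counter(bucket(p["date"]) for p in posts)


def watchlist_hits(post, watchlist):
    """Return the watchlist terms that occur in a post (case-insensitive)."""
    words = set(tokenize(post["text"]))
    return sorted(t for t in watchlist if t.lower() in words)
```

In a tag cloud, the counts from `term_frequencies` would be mapped to font sizes. In an alerting setup, a non-empty result from `watchlist_hits` for a newly crawled post would trigger the e-mail notification.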

Figure A.2: Traffic pattern graph in BlogTrackers, also displaying the tag cloud of the high-activity period.

Apart from the tools that are specifically tailored for the blogosphere, there are also generic visualization tools that do not target blogs per se but can be used to analyze blog data. Pajek (http://vlado.fmf.uni-lj.si/pub/networks/pajek/) is a visualization tool that can display network data in various ways. IBM's ManyEyes (http://manyeyes.alphaworks.ibm.com/manyeyes/) is another interesting project on generic visualizations, but it suffers from scalability issues. The Prefuse visualization toolkit (http://prefuse.org/) contains a set of unique visualizations for the data. These and many others are briefly summarized below:

• NetLogo: (http://ccl.northwestern.edu/netlogo/) A multi-agent programming language and modeling environment based on the Logo programming language. Modelers can give instructions to hundreds or thousands of concurrently operating autonomous “agents”. This helps in exploring the connection between the behavior of individuals (micro-level) and the patterns that emerge from the interaction of many individuals (macro-level).

• StarLogo: (http://education.mit.edu/starlogo/) An extension of the Logo programming language. It is used to model the behavior of decentralized systems such as social networks.

Figure A.3: Influential bloggers during a specified time period, as displayed in BlogTrackers.

• Repast: (http://repast.sourceforge.net/) Recursive Porous Agent Simulation Toolkit is an agent-based social network modeling toolkit. It has libraries for genetic algorithms, neural networks, etc. and allows users to dynamically access and modify agents at run time.

• Swarm: (http://www.swarm.org/wiki/Main_Page) A multi-agent simulation package to simulate the social or biological interaction of agents and their emergent collective behavior.

• UCINet: (http://www.analytictech.com/) A comprehensive package for the analysis of social network data including centrality measures, subgroup identification, role analysis, elementary graph theory, and permutation-based statistical analysis. In addition, the package has strong matrix analysis routines, such as matrix algebra and multivariate statistics.

• Pajek: (http://vlado.fmf.uni-lj.si/pub/networks/pajek/) (Slovenian: spider) Software for analyzing and visualizing large networks such as social networks.

• Network package in “R”: (http://cran.r-project.org/src/contrib/Descriptions/network.html) The network class can represent a range of relational data types and supports arbitrary vertex/edge/graph attributes. It is used to create and modify network objects for social network analysis (SNA).

Figure A.4: Analysis of the blogger categories during a specified time period in BlogTrackers.

• InFlow: (http://www.orgnet.com/inflow3.html) Another integrated product for network analysis and visualization. It has been used in the SNA domain.

• NetMiner: (http://www.netminer.com/) A tool for exploratory network data analysis and visualization. NetMiner allows users to explore network data visually and interactively, and helps in detecting underlying patterns and structures of the network.

• SocNetV: (http://socnetv.sourceforge.net/) A Linux-based SNA and visualization utility. SocNetV can compute network and actor properties, such as distances, centralities, and diameter. Furthermore, it can create simple random networks (lattice, same degree, etc.).
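Several of the packages above compute measures such as distances, centralities, and diameter. As a rough illustration of what these measures are, the following Python sketch computes them for a small undirected network. It does not use or reproduce any of the listed tools; the toy graph and function names are hypothetical, and the code assumes a connected, unweighted graph with at least two nodes.

```python
from collections import deque

# Hypothetical undirected network as an adjacency map (e.g., bloggers linked by comments).
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def distances_from(graph, source):
    """Shortest-path (hop) distances from source to every reachable node, via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def diameter(graph):
    """Longest shortest path between any pair of nodes (graph assumed connected)."""
    return max(max(distances_from(graph, s).values()) for s in graph)
```

For the toy graph, node `c` is adjacent to every other node, so its degree centrality is 1.0, and the farthest pair (for example `a` and `d`) is two hops apart, which is the diameter.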

Figure A.5: Search feature of BlogTrackers, which helps in filtering out the relevant blog posts.

Figure A.6: Watchlist feature of BlogTrackers, which helps in monitoring future occurrences of “hot” keywords in blog posts.
