Visual analytics for software engineering data

Zhitao Hou; Hongyu Zhang; Haidong Zhang; Dongmei Zhang    Microsoft Research, Beijing, China

Abstract

Many data analysis techniques require substantial knowledge and skills and are typically performed by “data scientists”. Ordinary users may find it difficult to apply these techniques to quickly explore the data by themselves. We propose MetroEyes, a visual analytics tool for interactive data exploration. We have successfully transferred the main concepts and experiences of MetroEyes to Microsoft Power BI.

Keywords

Data analysis techniques; Visual analytics techniques; User interface; MetroEyes; Interactive data exploration

Many data analysis techniques are widely used in practice to help software practitioners explore and analyze data and obtain insightful, actionable knowledge for real tasks around software and services [1]. Although these techniques are effective, they require substantial knowledge and skills and are typically performed by “data scientists.” Ordinary users (such as programmers, marketing personnel, service operators, and managers) may find it difficult to apply these techniques to quickly explore the data and obtain insights by themselves. For example, users may have to learn SQL in order to extract data from a database, and learn statistical techniques in order to analyze it. Users often need to write programs to analyze data extracted from an Excel or text file. These requirements increase the cost of data exploration for ordinary users and create high barriers to entry.

Many approaches could help democratize data analysis in software engineering. We advocate incorporating visual analytics techniques into the analysis of software engineering data. Visual analytics is “the science of analytical reasoning facilitated by interactive visual interfaces” [2,3]. It focuses on the interactive exploration and manipulation of the data. Using visual analytics techniques, a user can perform several successive queries and view the data in a variety of formats until they identify the data in which they are most interested.

The Software Analytics group at Microsoft Research Asia developed a visual analytics tool called MetroEyes. MetroEyes can automatically import data from external sources (such as Excel files or SQL databases). Data are represented as visual objects, such as a slice in a pie chart, a bar in a bar chart, or a series legend in a line chart. MetroEyes provides an interactive graphical interface that enables users to directly click, touch, and move all of these objects. The visual objects can also be composed to form a new chart. MetroEyes interprets the intention behind each visual operation, extracts the corresponding data from the data source, and creates a new chart.

For example, assume that we have an Excel spreadsheet that contains the app sales data of a multinational software corporation. The corporation has three product teams (code named TeamA, TeamB, and TeamC) developing Game and Education apps. The data includes yearly app sales in different countries, along with the detailed sales of different teams and categories. There are five columns (“Year,” “Team,” “Category,” “Country,” “Sales”) in the spreadsheet. Some sample records are shown in the following table.

Year  Team   Category   Country        Sales (M)
2010  TeamA  Game       United States  0.5
2010  TeamC  Game       China          0.5
2010  TeamA  Education  United States  0.1
2011  TeamC  Education  China          0.4
2011  TeamB  Game       United States  0.3
2011  TeamB  Education  China          0.2


In MetroEyes, users can perform data exploration through direct manipulation of the visual objects. Say the users want to explore the sales of TeamA. As illustrated in Fig. 1, users can directly select the TeamA bar from the bar chart representing the contribution of each team to Sales, and drag and drop it onto the canvas. The tool then extracts the app sales data contributed by TeamA from the data source and displays it in a new bar chart. This data operation selects a dimension value (Team = TeamA) and retrieves its app sales, which is equivalent to the SQL query: SELECT Sales, Team FROM AppSales WHERE Team = 'TeamA'. Note that each bar in the bar chart is a visual object, which can be touched and moved around. Furthermore, each bar represents the percentage of sales each team contributes (e.g., the TeamA bar indicates the percentage of sales of TeamA). The visual operations over the object have semantic meanings and correspond to certain data operations.

Fig. 1 Explore the sales of TeamA.
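
The correspondence between the drag-and-drop operation and the SQL query can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how MetroEyes is implemented: the sample records from the table are loaded into an in-memory SQLite database, and the function name select_dimension_value is hypothetical, standing in for the act of dropping the TeamA bar onto the canvas.

```python
import sqlite3

# Sample records from the table above (Sales in millions).
ROWS = [
    (2010, "TeamA", "Game", "United States", 0.5),
    (2010, "TeamC", "Game", "China", 0.5),
    (2010, "TeamA", "Education", "United States", 0.1),
    (2011, "TeamC", "Education", "China", 0.4),
    (2011, "TeamB", "Game", "United States", 0.3),
    (2011, "TeamB", "Education", "China", 0.2),
]


def load_app_sales():
    """Create an in-memory AppSales table holding the sample records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE AppSales "
        "(Year INTEGER, Team TEXT, Category TEXT, Country TEXT, Sales REAL)"
    )
    conn.executemany("INSERT INTO AppSales VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def select_dimension_value(conn, team):
    """Hypothetical stand-in for dropping one team's bar onto the canvas.

    Runs the query quoted in the text, with the team name passed as a
    bound parameter instead of being pasted into the SQL string.
    """
    query = "SELECT Sales, Team FROM AppSales WHERE Team = ?"
    return conn.execute(query, (team,)).fetchall()


conn = load_app_sales()
team_a_rows = select_dimension_value(conn, "TeamA")
for sales, team in team_a_rows:
    print(team, sales)
```

With the six sample records, the query returns the two TeamA rows; a chart would then be drawn from these rows.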

As another example, say users want to explore the sales data by Team and Category. As illustrated in Fig. 2, users first select the Team dimension and drop it onto the canvas. MetroEyes extracts the team data from the data source and displays a chart that contains the app sales data broken down by Team. Users can then select the Category dimension and drop it into the chart. Finally, MetroEyes displays a chart that shows each team's sales data, broken down by Category. This data operation explores data along multiple dimensions (in this case, the Team and Category dimensions), which is equivalent to the SQL query: SELECT Sales, Team, Category FROM AppSales.

Fig. 2 Explore the sales data by Team and Category.
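
The two-step breakdown can be sketched in the same way. The code below is an illustration only: the function break_down_by is a hypothetical name, and summing Sales with GROUP BY is our assumption about how the rows returned by the query in the text would be aggregated before being charted. Dimension names are checked against a fixed list because SQL identifiers cannot be passed as bound parameters.

```python
import sqlite3

# Sample records from the table above (Sales in millions).
ROWS = [
    (2010, "TeamA", "Game", "United States", 0.5),
    (2010, "TeamC", "Game", "China", 0.5),
    (2010, "TeamA", "Education", "United States", 0.1),
    (2011, "TeamC", "Education", "China", 0.4),
    (2011, "TeamB", "Game", "United States", 0.3),
    (2011, "TeamB", "Education", "China", 0.2),
]

# Columns that may be used as dimensions; Sales is the measure.
DIMENSIONS = ("Year", "Team", "Category", "Country")


def load_app_sales():
    """Create an in-memory AppSales table holding the sample records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE AppSales "
        "(Year INTEGER, Team TEXT, Category TEXT, Country TEXT, Sales REAL)"
    )
    conn.executemany("INSERT INTO AppSales VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def break_down_by(conn, dimensions):
    """Hypothetical stand-in for dropping dimensions onto a chart.

    Returns one row per combination of dimension values, with the
    summed Sales as the last element of each row.
    """
    if not dimensions:
        raise ValueError("at least one dimension is required")
    for name in dimensions:
        if name not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {name}")
    columns = ", ".join(dimensions)
    query = (
        f"SELECT {columns}, SUM(Sales) FROM AppSales "
        f"GROUP BY {columns} ORDER BY {columns}"
    )
    return conn.execute(query).fetchall()


conn = load_app_sales()
by_team = break_down_by(conn, ["Team"])                           # first drop
by_team_and_category = break_down_by(conn, ["Team", "Category"])  # second drop
for row in by_team_and_category:
    print(row)
```

Dropping Team alone yields one row per team; adding Category refines each team's total into one row per team and category.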

MetroEyes also enables changes from one data visualization format to another, and supports different types of data exploration tasks such as filtering and sorting. For example, data can be sorted through the use of gestures, which is illustrated in Fig. 3.

Fig. 3 The gesture for sorting.
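
Sorting and filtering map onto data operations just as directly. In the sketch below, which is again an assumption-laden illustration and not the MetroEyes implementation, the sorting gesture is reduced to a boolean flag and a filter to an optional country value; the function name team_totals and both parameters are hypothetical.

```python
import sqlite3

# Sample records from the table above (Sales in millions).
ROWS = [
    (2010, "TeamA", "Game", "United States", 0.5),
    (2010, "TeamC", "Game", "China", 0.5),
    (2010, "TeamA", "Education", "United States", 0.1),
    (2011, "TeamC", "Education", "China", 0.4),
    (2011, "TeamB", "Game", "United States", 0.3),
    (2011, "TeamB", "Education", "China", 0.2),
]


def load_app_sales():
    """Create an in-memory AppSales table holding the sample records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE AppSales "
        "(Year INTEGER, Team TEXT, Category TEXT, Country TEXT, Sales REAL)"
    )
    conn.executemany("INSERT INTO AppSales VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def team_totals(conn, descending=True, country=None):
    """Total sales per team, optionally filtered by country, then sorted.

    `descending` plays the role of the sorting gesture and `country`
    the role of a filter; both are illustrative parameters.
    """
    direction = "DESC" if descending else "ASC"
    where, params = "", ()
    if country is not None:
        where, params = "WHERE Country = ?", (country,)
    query = (
        f"SELECT Team, SUM(Sales) AS Total FROM AppSales {where} "
        f"GROUP BY Team ORDER BY Total {direction}, Team"
    )
    return conn.execute(query, params).fetchall()


conn = load_app_sales()
descending_order = [team for team, _ in team_totals(conn)]
ascending_order = [team for team, _ in team_totals(conn, descending=False)]
china_only = [team for team, _ in team_totals(conn, country="China")]
print(descending_order, ascending_order, china_only)
```

On the sample records, TeamC has the largest total and TeamB the smallest, and only TeamB and TeamC have sales in China.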

When using MetroEyes to explore software engineering data, users do not need to write SQL queries or programs by themselves, or understand sophisticated mathematical concepts. They only need to decide what data they want and then directly manipulate the visual objects to obtain it. Although the visual operations are very simple to perform, they are able to express rich semantics for data operations. Furthermore, users are able to view the graphical representation of the data throughout the entire exploration process, which provides a much more intuitive understanding of the underlying data.

We believe that data analysis in software engineering should incorporate techniques from visual analytics, in order to help ordinary users analyze and understand their data. In this way, we lower the barrier to entry and reduce the cost of data exploration for ordinary users. MetroEyes has been used internally by Microsoft teams to perform various analytical tasks. We have successfully transferred the main concepts and experiences of MetroEyes to the Microsoft Power BI product (http://www.powerbi.com), which was officially released in 2015.

References

[1] Zhang D., Han S., Dang Y., Lou J., Zhang H., Xie T. Software analytics in practice. IEEE Softw. 2013;30(5):30–37.

[2] Kielman J., Thomas J., guest editors. Special issue: foundations and frontiers of visual analytics. Inf. Vis. 2009;8(4):239–314.

[3] Thomas J., Cook K. Illuminating the path: research and development agenda for visual analytics. National Visualization and Analytics Center; 2005. http://www.amazon.com/Illuminating-Path-Research-Development-Analytics/dp/0769523234.
