4. FACETED FRAMEWORK OF INTERACTIVE IR USER STUDIES
design, subsequent data analysis (e.g., statistical modeling), and result presentation. This is part of the basic, default phase of user study design and will fundamentally define a majority of facet values (e.g., task type, experimental system, participant features). For example, in Ong et al.'s (2017) exploration of the impacts of information scent levels and patterns on search behavior, SERP type can be considered the core independent variable, as it was intentionally manipulated to produce varying information scent levels and patterns. Search behavioral measures (e.g., query reformulation, dwell time on pages of different types, clicking behavior) were in this case treated as dependent variables and incorporated into statistical models. Then, to answer the proposed research questions, a user study was designed around these fundamental cornerstones and a series of design decisions (and compromises) were made.
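To make these variable roles concrete, the following minimal sketch tests whether a manipulated independent variable (here, a hypothetical two-condition SERP type) produces a difference in a dependent behavioral measure such as dwell time. The numbers are invented for illustration and are not data from the Ong et al. study:

```python
# Hedged sketch: independent variable = SERP condition (low vs. high
# information scent), dependent variable = dwell time. Data are invented.
from scipy import stats

# Dwell times (seconds) per participant under two manipulated SERP conditions
dwell_low_scent = [12.0, 15.5, 9.8, 14.2, 11.1, 13.4]
dwell_high_scent = [18.3, 21.0, 17.6, 22.4, 19.9, 20.1]

# Independent-samples t-test on the dependent variable across conditions
t_stat, p_value = stats.ttest_ind(dwell_low_scent, dwell_high_scent)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

In a real study the statistical model would be chosen to match the design (e.g., mixed-effects models for repeated measures), but the logic is the same: the manipulated factor enters as the predictor and the behavioral measure as the outcome.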
Similarly, researchers focusing on system and interface evaluations often manipulate system and/or interface feature(s) and test whether the variation(s) in these features lead to any statistically significant difference in search performance (e.g., higher precision or recall in search; improved performance in task completion) and search interaction experience (e.g., lower workload, higher level of search satisfaction). For instance, Ruotsalo et al. (2018) introduced an innovative assistant tool for search interaction called Interactive Intent Modeling, which can model a user's evolving search intents and visualize them as keywords for search interaction. In their system evaluation, the researchers employed a variety of search performance measures and found that the intent modeling and visualization can significantly improve retrieval effectiveness, users' task performance, breadth of information comprehension, and user experience. In this case, the introduction and adoption of the new intent modeling and visualization can be defined as an independent variable, as researchers directly manipulated the interaction interface for different groups of participants and compared search performances and experiences (i.e., dependent variables) across the groups (i.e., baseline group vs. treatment group).
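As a toy illustration of such a between-group comparison, the sketch below computes precision and recall for a hypothetical baseline and treatment interface over invented result lists (these document sets and values are assumptions for illustration, not data from the Ruotsalo et al. evaluation):

```python
# Hedged sketch: comparing retrieval effectiveness (precision/recall) of a
# baseline vs. a treatment interface. All document IDs are invented.
relevant = {"d1", "d2", "d3", "d4", "d5"}  # assumed ground-truth relevant set

def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved list against a relevant set."""
    hits = len(set(retrieved) & relevant)
    return hits / len(retrieved), hits / len(relevant)

baseline = ["d1", "d6", "d7", "d2", "d8"]   # 2 of 5 retrieved are relevant
treatment = ["d1", "d2", "d3", "d6", "d4"]  # 4 of 5 retrieved are relevant

p_base, r_base = precision_recall(baseline, relevant)
p_treat, r_treat = precision_recall(treatment, relevant)
print(p_base, r_base)    # → 0.4 0.4
print(p_treat, r_treat)  # → 0.8 0.8
```

In an actual study, such per-participant measures would then feed into the statistical comparison between the baseline and treatment groups.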
With respect to the meta-evaluation of evaluation metrics, researchers often propose and evaluate a series of new evaluation measures (i.e., online search behavioral measures, such as querying behavior, browsing and result examination behavior, eye movement patterns and attention distribution; offline self-reported data, annotations, and usefulness judgments) that capture different aspects of search interactions. Once the core measures and ground truth are defined, researchers usually conduct the meta-evaluation in two main ways: (1) examining the correlations between these measures and predefined ground truth (e.g., query-level and session-level satisfaction; relevance judgments; search success) (e.g., Chen et al., 2017); and (2) investigating the predictive power of models built upon the new evaluation metrics/features in predicting predefined ground truth values (e.g., Liu et al., 2018).
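The first of these two ways can be sketched as a rank correlation between a candidate behavioral measure and a predefined ground-truth signal. The measure (clicks per query) and the satisfaction ratings below are invented for illustration:

```python
# Hedged sketch of meta-evaluation way (1): correlate a candidate online
# behavioral measure with predefined ground truth (query-level satisfaction).
# All values are invented illustrative data.
from scipy.stats import spearmanr

clicks_per_query = [1, 3, 2, 5, 4, 6, 2, 7]   # candidate measure
satisfaction     = [1, 2, 2, 4, 3, 5, 1, 5]   # ground truth (1-5 ratings)

# Spearman's rank correlation: how well does the measure track the truth?
rho, p = spearmanr(clicks_per_query, satisfaction)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```

Way (2) extends this idea by fitting a model (e.g., a classifier over several such features) and evaluating how accurately it predicts the ground-truth labels on held-out data.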
In summary, clearly defining independent and dependent variables (i.e., the associated concepts, operational definitions, as well as the hypotheses, if applicable) is the default "step one" for user study design and is obviously crucial for most IIR research, as well as information