4. FACETED FRAMEWORK OF INTERACTIVE IR USER STUDIES
goals and assessed the effectiveness of the innovative retrieval and ranking algorithms in improving
participants’ word-learning outcomes.
User studies that manipulate system and/or interface features can offer an empirical basis for
IIR system evaluation and also shed light on the development of search systems and the design
of user interfaces in various settings (e.g., home settings, workplaces, public and school libraries). In
addition to these subfacets, system features also involve other important aspects, such as the devices
on which a search system can operate (e.g., desktop computers, tablets, smartphones), test collections,
and corpora. In particular, some IIR research focuses on the differences in user search behavior across
different types of devices and suggests that users often adjust their search strategies according to
specific interface features (e.g., search result presentation, SERP size, information scent
level) of the systems running on different devices (e.g., Harvey and Pointon, 2017; Ong et al., 2017;
Song et al., 2013). These studies can provide actionable design implications for system and interface
designers and facilitate user-centered evaluations of personalized search systems.
4.1.4 STUDY PROCEDURE AND EXPERIMENTAL DESIGN
As Figure 4.2 illustrates, the procedure of an IIR user study is largely affected by the proposed re-
search questions, externally assigned or self-generated task types, user characteristics, as well as the
system and interface features manipulated or created by researchers. Based on the proposed research
questions and the related variables, researchers need to decide on the other components and steps of the
user study and to figure out the design decisions and compromises they have to make in order to
properly address the research problems with available resources.
Given the nature of their research questions, some user studies adapt experimental
designs from psychology and apply the associated methods to studying the search behaviors
of users with different characteristics and to evaluating innovative search systems. The treatments
in IIR user studies often involve the main facet(s) or core factors, such as task facet and task type
(e.g., factual task vs. intellectual task), user group (e.g., expert group vs. novice group), and system
features (e.g., experimental system group vs. baseline system group). To create a suitable, reliable
experimental setting, researchers have to not only design and refine the treatment condition(s), but
also think carefully about how to define a reasonable baseline condition. Thus, the justification of
a user study design (especially an experimental one) should describe and explain in detail both the treat-
ment and baseline conditions. On top of that, researchers are also obligated to establish
a reasonable ground truth if the user study involves user-centered evaluation of any kind, such as
interface evaluation, new search tool evaluation, or ranking algorithm evaluation.
In addition to the basic components discussed above, a user study researcher also needs to
adjust the values of other subfacets associated with the study procedure and experimental design.
For instance, as preparatory work for many user studies, pilot studies and pre-study training or
tutorials should be properly reported and discussed in user study papers. These pre-study parts are
critical for user study evaluation, especially in cases where participants are asked to interact with
entirely new system features or a search interface and thus need more time to become familiar with
it so that they can interact with the search system in a more natural manner. In addition to these
two pre-study components, researchers also need to reflect on other dimensions of user studies, such
as the length of the study (balancing data richness against user fatigue), methods of data quality control
at different stages of the user study (i.e., pre-study control in participant recruitment, within-study
control, and post-study control and data filtering), types of experimental design (i.e., within-subjects de-
sign, between-subjects design, mixed design), and the overall study environment (e.g., field/naturalistic
settings and controlled lab settings). The choices made under each of these facets
represent distinct design decisions, and the "collaboration" among these facets
can jointly affect the richness of the data collected as well as the quality and reliability of
the nal results. For instance, some researchers choose to conduct studies in eld settings instead
of controlled laboratory environments may be because addressing the identied research problem(s)
requires researcher to extract natural tasks and to elicit natural search strategies and responses as
much as possible (e.g., He and Yilmaz, 2017). At the same time, however, as a “price of this choice,
some types of control and support over the search context may become unavailable (e.g., task type,
real-time interventions). Another example is about the facet of experimental design. In the IIR
evaluation studies in which multiple tasks and/or systems are involved, adopting a between-subjects
design can help avoid possible learning effects, which are likely to occur in studies that
employ a within-subjects design. Meanwhile, however, to facilitate model building and to maintain
the required statistical power in significance tests, researchers using a between-subjects design have to
recruit more participants (to avoid underpowered studies), which is often difficult to achieve
through offline recruitment. In addition to these two types of experimental design, adopting a mixed
design may allow researchers to apply and investigate more complicated combinations of different
facet values in various study settings (e.g., the combination of task type and interface feature).
Nevertheless, researchers still need to be mindful of the limitations of both approaches.
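The two trade-offs above can be made concrete with a small sketch. The snippet below is illustrative only: the effect size, alpha, and power are assumed example values, and the helper names are hypothetical. It estimates, via a standard normal approximation, how many participants per group a between-subjects comparison needs for a given effect size, and it generates a rotated Latin square of task orders that a within-subjects design can use to spread learning effects evenly across participants.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-sample
    comparison (normal approximation to the t-test):
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2."""
    z = NormalDist()
    return math.ceil(2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power))
                          / effect_size) ** 2)

def latin_square(conditions):
    """Rotate the condition order so that each condition appears exactly
    once in every ordinal position, distributing order/learning effects
    across participants in a within-subjects design."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)]
            for row in range(n)]

# A medium effect (Cohen's d = 0.5) at alpha = .05 and power = .80
# already requires on the order of 63 participants per condition.
print(n_per_group(0.5))

# Four rotated task orders for a within-subjects study with four
# (hypothetical) task types.
for order in latin_square(["factual", "intellectual",
                           "navigational", "exploratory"]):
    print(order)
```

The sample-size figure shows why between-subjects recruitment is costly: every additional condition multiplies the participant count, whereas the Latin square lets a within-subjects study reuse each participant across all conditions at the price of explicit order counterbalancing.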
Study procedure and experimental design are two of the main facets that shape the ways
in which the multiple facets and variables discussed above interact with each other and jointly determine
the direction and quality of data collection and analysis. In other words, these two facets jointly
serve as the bridge connecting the input part (e.g., task context design, population selection and
participant recruitment, manipulation and design of system features) and the output part (e.g.,
search behavior and experience measures, results) of IIR user studies. Therefore, by digging
into the details of user study procedures and experimental settings, researchers can better detect,
understand, and explain study design decisions and compromises made in terms of a variety of
facets and associated subfacets.