Appendix
User study annotation example A1: Moshfeghi, Triantafillou, and Pollick (2016)
Research Focus and Variables
[Research Focus] understanding search behavior and user cognition [Keywords (directly
extracted from the paper)] anomalous state of knowledge, information need, information
retrieval, fMRI study; [Independent Variable] level of information need (IN and no-IN);
dierent scenarios: QA scenario and QAS scenario; [Dependent Variable] blood oxygen-
ation level dependent (BOLD) activity in human brain;
Participant
[Recruitment] participants were recruited from the participant database at Centre for
Cognitive Neuroimaging, University of Glasgow; [Controlled lab] Yes; [Field/Large-
scale] N.A.; [Sample Size] 24 participants; [Gender Composition] 11 males, 13 females;
[Participant Occupation] no information; [Age] all under 44; 18-23 (54.1%) and
30-35 (20.8%); [Education Background/Level] no information; [Participants’ Native
Language(s)] 79.1% native English speakers; 20.9% had an advanced level of English;
[Language Used in Study] English; [Regular Incentives] 6 pounds per hour; [Extra
Incentives/Bonus] no information; [Length of Study] about 1 h (50 min to perform all
tasks, 10 min to obtain a scan of anatomical structure).
Task
[Task Source] experimenter assigned: TREC-8 and TREC-2001 QA tracks main task;
[Search Task Type] QA tasks (40 hard and 40 easy); [Search Task Topic] topics from
TREC QA tracks; [Number of Tasks] 80 tasks; [Number of tasks/Person] 80 tasks;
[Time Length/Task] 8 s for question and options presentation; 4-6 s for transition; no
time limit for choosing option; [Work Task Type] N.A.; [Did Work Task] N.A.; [An-
swer Search Task] choosing option; [Evaluation Task] N.A.
Study Procedure and Experimental Design
[Task/Session Feature Controlled] two assessors separately judged the difficulty of the
questions from the TREC QA tracks, and then selected a subset of questions (40 easy
and 40 hard) whose difficulty level both annotators agreed upon; [Task Rotation] tasks and
options within each scenario were randomized; [Pilot Study] a pilot study was performed
using two participants to confirm that the process worked correctly and smoothly; [Pre-
study Training] no information; [Actual Task Completion Time] no information;
[Quality Control/Data Filtering Criteria] no information; [Experimental Design]
within-subjects design.
System Features
[Study Interface Element Varied] N.A.; [Other System/Context Feature Varied] N.A.;
[Study Apparatus] images presented using presentation software; fMRI data collected
using a 3T Tim Trio Siemens scanner and 32-channel head coil; [Search Collection/
Corpus] TREC-8 and TREC-2001 QA tracks main task; [Ranking Algorithm] N.A.;
[Non-traditional IR System Assistance Tool] N.A.
Behavioral and Search Experience Measures
[Search Behavior Measures] response to the options; query formulation and document
examination in Scenario 2; [Instrument for Collecting Search Behavioral Data]
choosing options by pressing buttons; submitting query verbally via a noise-cancelling
microphone; [Relevance Judgment] N.A.; [Instrument for Collecting User Judgment]
N.A.; [Search and System Performance Measures] N.A.; [Neuro-physiological Mea-
sures] BOLD activities; [Instruments for Capturing Neuro-physiological Measures]
images presented using presentation software; fMRI data collected using a 3T
Tim Trio Siemens scanner and 32-channel head coil; [Offline Information Seeking
Behavior] N.A.; [Other Information Behavior] N.A.; [Data Analysis Method] multiple
linear regression (GLM) for first-level analysis; random effects analysis of variance
for second-level analysis; [Qualitative Analysis] N.A.; [Level of Analysis] task level;
[Task-independent Measures] no information; [Task/Session Perception Measures]
post-search questionnaire: the task we asked you to perform was (easy/stressful/familiar/
clear/satisfactory), rated 1-5 (level of agreement); [Search Experience and System
Evaluation Measures] no information.
Data Analysis and Results
[Statistical Test Assumption Check Reported] No; [Results] 1. Difference in brain
activity due to whether participants experienced IN or not. 2. These differences appeared
sensitive to whether or not the IN was associated with actually making a search (or simply
deciding that a search would be necessary). 3. A particular region of the brain, the posterior
cingulate, could be an essential component of IN; [Effect Size] reported.
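The two-level analysis recorded above (first-level multiple linear regression per participant, then a random-effects test on the resulting coefficients) can be sketched on synthetic data. Everything here is hypothetical: the regressor, effect size, and noise level are made up for illustration and do not come from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_scans = 24, 100

# Hypothetical first-level design matrix: one task regressor plus an intercept.
regressor = rng.standard_normal(n_scans)  # stand-in for a convolved IN regressor
X = np.column_stack([regressor, np.ones(n_scans)])

# First level: fit the GLM separately for each participant, keep the task beta.
betas = []
for _ in range(n_subjects):
    true_beta = 0.5  # simulated effect for one voxel
    y = true_beta * regressor + rng.standard_normal(n_scans)  # noisy BOLD-like signal
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    betas.append(b[0])
betas = np.array(betas)

# Second level (random effects): test the per-subject betas against zero.
t_stat, p_value = stats.ttest_1samp(betas, 0.0)
print(f"group t = {t_stat:.2f}, p = {p_value:.4f}")
```

A one-sample test on per-subject coefficients is the simplest random-effects second level for a single contrast; the study's actual second-level model was an analysis of variance over multiple conditions.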
User study annotation example A2: Jiang, He, and Allan (2017).
Research Focus and Variables
[Research Focus] meta-evaluation of relevance evaluation metrics; [Keywords (directly
extracted from the paper)] relevance judgment, search experience, implicit feedback;
[Independent Variable] in-situ relevance and usefulness judgment, multiple dimensions
of judgment; [Dependent Variable] post-search relevance/usefulness judgments, post-
search search experiences;
Participant
[Recruitment] recruited through fliers posted on the campuses of two universities in
the U.S.; [Controlled lab] Yes; [Field/Large-scale] N.A.; [Sample Size] 28 participants;
[Gender Composition] 12 male, 16 female; [Participant Occupation] students; [Age]
no information; [Education Background/Level] undergraduate and graduate; [Par-
ticipants’ Native Language(s)] English; [Language Used in Study] English; [Regular
Incentives] $15 per hour; [Extra Incentives/Bonus] no information; [Length of Study]
about 100 min.
Task
[Task Source] TREC session tracks; [Search Task Type] 4 types (product + goal): factual
specific, factual amorphous, intellectual specific, intellectual amorphous; [Search
Task Topic] 7 topics; [Number of Tasks] 28 tasks; [Number of tasks/Person] 4 tasks;
[Time Length/Task] 10 min per task; [Work Task Type] problematic situation/context
provided in TREC task descriptions; [Did Work Task] N.A.; [Answer Search Task] no
information; [Evaluation Task] post-search judgment stage: rate search experience and
finish post-session judgments on each result visited in the session.
Study Procedure and Experimental Design
[Task/Session Feature Controlled] the same 4 task types for every individual user; [Task
Rotation] Latin-square rotation; [Pilot Study] no information; [Pre-study Training] before
4 formal tasks, each participant worked on a training task (including all the steps) for 10
min; [Actual Task Completion Time] about 20 min per task; [Quality Control/Data
Filtering Criteria] only recruited native English speakers to exclude the effect of English
fluency on relevance/usefulness judgments; 9 cases of revisiting a URL (about 1% of the
data) were excluded; the researchers required the participants to take a 5-min break after
2 formal tasks to reduce fatigue, and required participants to spend at least 30 s on judging
each result in the post-search session; [Experimental Design] within-subjects design.
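The Latin-square rotation noted above can be sketched as follows. This is a minimal illustration with a simple cyclic square; the task labels and participant assignment are hypothetical, and the study's actual orderings are not reported here.

```python
# Hypothetical labels for the 4 task types (product x goal).
tasks = ["factual-specific", "factual-amorphous",
         "intellectual-specific", "intellectual-amorphous"]
n = len(tasks)

# Cyclic Latin square: row i is the task list rotated by i positions,
# so each task appears exactly once in every ordering position.
latin_square = [[tasks[(i + j) % n] for j in range(n)] for i in range(n)]

# Assign 28 participants round-robin to the 4 orderings (7 per ordering).
assignments = {p: latin_square[p % n] for p in range(28)}
for p in range(4):
    print(p, assignments[p])
```

A plain cyclic square balances position effects but not carryover between adjacent tasks; a Williams design would be used if first-order carryover also had to be balanced.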
System Features
[Study Interface Element Varied] N.A.; [Other System/Context Feature Varied]
N.A.; [Study Apparatus] desktop computer; [Search Collection/Corpus] open web:
filtered Google search results (only including the 10 blue links); [Ranking Algorithm] N.A.;
[Non-traditional IR System Assistance Tool] N.A.
Behavioral and Search Experience Measures
[Search Behavior Measures] click dwell time; follow-up query features after a target
click (URL/title/body); follow-up click features; prior-to-click features (baseline:
information available before users click on the result); [Instrument for Collecting Search
Behavioral Data] logged by an experimental system; [Relevance Judgment] in situ
relevance judgment; post-search relevance judgment (TREC web track style: focusing on
topical relevance); [Instrument for Collecting User Judgment] in situ relevance judg-
ment: recorded by pop-up pages and results logged by experimental system; post-search
judgment: post-search survey in judgment stage; [Search and System Performance
Measures] N.A.; [Neuro-physiological Measures] N.A.; [Instruments for Capturing
Neuro-physiological Measures] N.A.; [Oine Information Seeking Behavior] N.A.;
[Other Information Behavior] N.A.; [Data Analysis Method] Pearson's r, Spearman's
r, multilevel regression analysis; prediction: gradient boosted regression trees (GBRT);
[Qualitative Analysis] N.A.; [Level of Analysis] task level; page and action level (e.g.,
prior-click and post-click actions); [Task-independent Measures] pre-search survey:
gender, age, degree, search experiences (used in multilevel regression); [Task/Session
Perception Measures] pre-search survey: task topic familiarity; post-search survey: task
difficulty; [Search Experience and System Evaluation Measures] post-search survey:
satisfaction, frustration, success, total effort, helpfulness of the system.
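The correlation analyses recorded in this annotation (Pearson's r and Spearman's r between judgment dimensions) can be sketched on synthetic data. The variable names and scores below are hypothetical stand-ins, not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical per-result scores: an in-situ judgment and a post-search judgment.
in_situ = rng.integers(1, 6, size=50).astype(float)   # 1-5 scale
post_search = in_situ + rng.normal(0, 0.8, size=50)   # noisy agreement

pearson_r, pearson_p = stats.pearsonr(in_situ, post_search)     # linear association
spearman_r, spearman_p = stats.spearmanr(in_situ, post_search)  # rank association
print(f"Pearson r = {pearson_r:.2f}, Spearman r = {spearman_r:.2f}")
```

Spearman's r is the safer choice for ordinal 1-5 judgment scales, which is presumably why both are reported.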
Data Analysis and Results
[Statistical Test Assumption Check Reported] multicollinearity checked and reported
using the variance inflation factor (VIF); [Results] 1. in situ judgments do not exhibit clear
benefits over judgments collected without context (post-session judgments); 2. combining
relevance or usefulness with the four alternative judgments (four aspects) improves the
correlation with user experience measures; 3. click dwell time is able to predict some but
not all dimensions of judgments; 4. current implicit feedback methods plus post-click user
interaction can achieve better prediction for all six dimensions of judgments; [Effect Size]
no information.
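The multicollinearity check noted above follows the standard formula VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. A generic sketch on synthetic predictors (not the study's variables):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic predictors: x2 is deliberately near-collinear with x1, x3 is independent.
n = 200
x1 = rng.standard_normal(n)
x2 = 0.9 * x1 + 0.1 * rng.standard_normal(n)
x3 = rng.standard_normal(n)
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2) from regressing column j on the other columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([others, np.ones(len(y))])  # add an intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
print([f"{v:.1f}" for v in vifs])  # the collinear columns get large VIFs
```

A common rule of thumb flags VIF values above 5 or 10 as problematic; the independent predictor stays near 1.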