System Features
[Study Interface Element Varied] N.A.; [Other System/Context Feature Varied]
N.A.; [Study Apparatus] desktop computer; [Search Collection/Corpus] open web:
filtered Google search results (only the ten blue links included); [Ranking Algorithm] N.A.;
[Non-traditional IR System Assistance Tool] N.A.
Behavioral and Search Experience Measures
[Search Behavior Measures] click dwell time; follow-up query features after a target
click (URL/title/body); follow-up click features; prior-to-click features (baseline) [information
available before the user clicks on the result]; [Instrument for Collecting Search
Behavioral Data] logged by an experimental system; [Relevance Judgment] in situ
relevance judgment; post-search relevance judgment (TREC web track style: focusing on
topical relevance); [Instrument for Collecting User Judgment] in situ relevance
judgments: collected via pop-up pages and logged by the experimental system; post-search
judgments: post-search survey in the judgment stage; [Search and System Performance
Measures] N.A.; [Neuro-physiological Measures] N.A.; [Instruments for Capturing
Neuro-physiological Measures] N.A.; [Offline Information Seeking Behavior] N.A.;
[Other Information Behavior] N.A.; [Data Analysis Method] Pearson's r, Spearman's
ρ, multilevel regression analysis; prediction: gradient boosted regression trees (GBRT);
[Qualitative Analysis] N.A.; [Level of Analysis] task level; page and action level (e.g.,
prior-click and post-click actions); [Task-independent Measures] pre-search survey:
gender, age, degree, search experience (used in multilevel regression); [Task/Session
Perception Measures] pre-search survey: task topic familiarity; post-search survey: task
difficulty; [Search Experience and System Evaluation Measures] post-search survey:
satisfaction, frustration, success, total effort, helpfulness of the system.
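The record lists both Pearson's r and Spearman's rank correlation but does not say which variable pairs each was applied to. The following minimal standard-library sketch shows how the two differ. It assumes per-click dwell times paired with ordinal relevance judgments. The function names and the toy data are hypothetical, and this is not the study's analysis code. Spearman's coefficient is computed here as Pearson's r on tie-averaged ranks.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on tie-averaged ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical data: dwell time in seconds and a 1-4 relevance judgment per click.
    dwell = [12, 45, 8, 60, 30]
    judgment = [1, 3, 1, 4, 2]
    print(round(pearson_r(dwell, judgment), 3))
    print(round(spearman_rho(dwell, judgment), 3))  # ≈ 0.975
```

Rank correlation suits ordinal judgment scales, because it assumes only a monotonic relationship rather than a linear one.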
Data Analysis and Results
[Statistical Test Assumption Check Reported] multicollinearity checked and reported
using the variance inflation factor (VIF); [Results] in situ judgments do not show clear
benefits over judgments collected without context (post-session judgments); combining
relevance or usefulness with the four alternative judgments (four aspects) improves the
correlation with user experience measures; click dwell time predicts some, but
not all, dimensions of the judgments; current implicit feedback methods combined with post-click user
interaction achieve better prediction for all six judgment dimensions; [Effect Size]
no information.
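The record states only that multicollinearity was checked with the variance inflation factor. It does not name the predictors or the cut-off used. The following minimal standard-library sketch uses the standard definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the remaining predictors. It relies on the equivalent identity that VIF_j is the j-th diagonal entry of the inverse of the predictors' correlation matrix. The function names and toy columns are hypothetical, and this is not the authors' procedure.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length predictor columns."""
    def standardize(col):
        m = sum(col) / len(col)
        norm = sqrt(sum((v - m) ** 2 for v in col))
        # Unit-norm, mean-centred vector: dot products between two such vectors are correlations.
        return [(v - m) / norm for v in col]

    z = [standardize(c) for c in columns]
    k = len(z)
    return [[sum(a * b for a, b in zip(z[i], z[j])) for j in range(k)] for i in range(k)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting; raises on a singular matrix."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF for each predictor, read off the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


if __name__ == "__main__":
    # Hypothetical predictors, e.g. two user-level covariates in a multilevel model.
    x1 = [1, 2, 3, 4]
    x2 = [1, 3, 2, 4]
    print(vif([x1, x2]))  # r = 0.8, so each VIF = 1 / (1 - 0.64)
```

A VIF of 1 means a predictor is uncorrelated with the others. Thresholds such as 5 or 10 are often quoted as rules of thumb for problematic collinearity, but the record does not say which threshold, if any, was applied.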