Chapter 11

Further readings

Abstract

This chapter provides readers with further readings in areas related to this book. These areas are closely related to the problems, theories, and techniques reviewed in the book. Readers are encouraged to use the material presented in this chapter as a set of references when exploring related problems from a broader perspective. The areas reviewed in this chapter include estimation theory, data quality and trust analysis, outlier analysis and attack detection, recommender systems, and surveys and opinion polling.

Keywords

Social sensing

Further reading

Estimation theory

Data quality

Trust analysis

Outlier detection

Recommender systems

Surveys and opinions

11.1 Estimation Theory

In estimation theory, expectation maximization (EM) is a general optimization technique for finding the maximum likelihood estimation (MLE) of parameters in a statistical model where the data are “incomplete” or involve latent variables in addition to the estimation parameters and observed data [93]. That is, either some values are missing from the data, or the model can be formulated more simply by assuming the existence of some unobserved data. In many cases, EM is used for parameter estimation of mixture distributions, where the latent variables indicate which mixture component is active [136]. The general EM algorithm iterates between two main steps, the Expectation step (E-step) and the Maximization step (M-step), until the estimation converges (i.e., the likelihood function reaches its maximum). In the E-step, the algorithm computes the expectation of the log-likelihood function of the complete data (the so-called Q-function) with respect to the conditional distribution of the latent variables given the current settings of the parameters and the observed data. In the M-step, it re-estimates the parameters for the next iteration by maximizing the expectation of the log-likelihood function defined in the E-step. EM is frequently used for data clustering in data mining and machine learning because a collection of clusters can be modeled as one mixture distribution. For language modeling, EM is often used to estimate parameters of a mixed model where the exact model from which the data are generated is unobservable [137]. EM has also been used in PLSA [138, 139] and community detection [140, 141]. There are also many good tutorials on EM algorithms [142–144]. In this book, we showed that social sensing applications lend themselves nicely to an EM formulation because it is natural to think that each individual source speaks out differently about true claims as compared to untrue ones.
In other words, a source’s willingness to espouse a claim is drawn from a mixture distribution, where the possible ground truths form the mixture components. The optimal solution, in the sense of MLE, directly leads to an accurate quantification of measurement correctness as well as participant reliability.
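As a minimal illustration of the E-step/M-step iteration described above, the following sketch runs EM on the classic two-coins toy problem, a simple stand-in for the true/false claim mixtures that arise in social sensing. The data and initial parameter values here are invented for illustration, not taken from the book.

```python
import numpy as np

def em_binomial_mixture(heads, m, n_iter=200):
    """EM for a two-component binomial mixture (the classic two-coins toy).

    heads[i] = number of heads observed in session i of m tosses; the latent
    variable is which of the two coins was used. Returns (pi, p1, p2): the
    mixing weight of coin 1 and the two head probabilities.
    """
    heads = np.asarray(heads, dtype=float)
    pi, p1, p2 = 0.5, 0.4, 0.6          # initial guesses (must differ)
    for _ in range(n_iter):
        # E-step: responsibility that coin 1 produced each session,
        # i.e., the posterior over the latent variable given current params
        l1 = pi * p1**heads * (1 - p1)**(m - heads)
        l2 = (1 - pi) * p2**heads * (1 - p2)**(m - heads)
        r = l1 / (l1 + l2)
        # M-step: closed-form maximizers of the expected log-likelihood
        pi = r.mean()
        p1 = (r * heads).sum() / (r.sum() * m)
        p2 = ((1 - r) * heads).sum() / ((1 - r).sum() * m)
    return pi, p1, p2

# Sessions generated by a heavily biased coin and a weakly biased one
data = [9, 8, 9, 1, 2, 8, 9, 1, 2, 9]
pi, p1, p2 = em_binomial_mixture(data, m=10)
```

Note that EM only guarantees convergence to a local maximum of the likelihood; different initial guesses may swap the roles of the two components.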

The Cramer-Rao lower bound (CRLB) is a fundamental bound used in estimation theory to characterize the lower bound on the estimation variance of a deterministic parameter [109]. The Fisher information is defined as the second moment of the score function of the random variable with respect to the estimation parameter [111]. Intuitively, if the Fisher information is large, the distribution with the true parameter value θ0 will be well distinguished from distributions whose parameters are not close to θ0. This means we are able to estimate θ0 well (hence with small variance) from the data. If the Fisher information is small, our estimation will be worse for the analogous reason. We reviewed the basics of the CRLB and Fisher information in Chapter 3. The CRLB has been used to study the performance of estimators in different applications such as range estimation [145], sinusoidal parameter estimation [146], and bearing estimation [147]. For example, Wang et al. leveraged the CRLB to estimate the accuracy of time-based range estimation (TBRE) using orthogonal frequency-division multiplexing (OFDM) [145]. Qian et al. used the CRLB to show that a frequency-domain nonlinear least squares estimation algorithm achieves near-optimal performance for noisy damped sinusoidal signals [146]. Wang et al. analyzed the performance bounds (i.e., CRLB) of a location-penalized MLE for bearing-only target localization [147]. One of the key properties of the MLE is its asymptotic normality. This property states that the MLE estimator is asymptotically normally distributed as the data sample size increases [112]. The mean of the normal distribution is the MLE of the estimation parameter and the variance is given by the CRLB of the estimation. Asymptotic normality has recently been studied in various contexts such as stochastic blockmodels [148], maximum entropy models [149], Markov jump processes [150], and binary neural networks [151].
The EM scheme we reviewed in this book provides the MLE of source reliability for social sensing applications. We also reviewed a quantification approach that computes the confidence interval of the source reliability estimation based on both the actual and asymptotic CRLB, by leveraging the asymptotic normality of the MLE estimator.
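To make the connection between Fisher information, the CRLB, and asymptotic confidence intervals concrete, here is a toy single-parameter sketch for a Bernoulli "reliability" probability. The book's actual derivation is multivariate; the sample counts below are purely illustrative.

```python
import math

def bernoulli_crlb(theta, n):
    """CRLB for estimating a Bernoulli success probability theta from
    n i.i.d. samples: Var(theta_hat) >= 1 / (n * I(theta)), where the
    Fisher information of one sample is I(theta) = 1/(theta*(1-theta))."""
    fisher = 1.0 / (theta * (1.0 - theta))
    return 1.0 / (n * fisher)

def asymptotic_ci(successes, n, z=1.96):
    """95% confidence interval from the asymptotic normality of the MLE:
    for large n, theta_hat is approximately N(theta, CRLB)."""
    theta_hat = successes / n
    half_width = z * math.sqrt(bernoulli_crlb(theta_hat, n))
    return theta_hat - half_width, theta_hat + half_width

# A source whose claims were correct 70 times out of 100
lo, hi = asymptotic_ci(successes=70, n=100)   # roughly (0.610, 0.790)
```

Doubling the sample size to n = 200 with the same ratio of successes shrinks the half-width by a factor of sqrt(2), which is the variance behavior the CRLB predicts.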

11.2 Data Quality and Trust Analysis

Data quality and integration is a critical problem in the database community, and a number of techniques have been developed. These techniques include methods for detecting erroneous values [152, 153], entity resolution [154, 155], information extraction [156, 157], type inference [158, 159], and schema matching [160, 161]. In addition, an end-to-end data curation system (Data Tamer) has been developed to perform data cleaning and reusable transformation [162]. Direct manipulation and programming-by-demonstration (PBD) methods have been applied to specific cleaning tasks in many data cleaning applications [163–165]. Supervised learning presumes the existence of labeled data for training. Because it is difficult to collect ground-truth data, many researchers have turned to crowdsourcing to label the data. There exists a significant literature in the machine learning community on improving data quality and identifying low-quality labelers in a multi-labeler environment. In such a context, multiple non-expert sources offer cheap but noisy labels at scale for supervised modeling. Robust techniques have been developed to improve data quality when using noisy labels. Sheng et al. proposed a repeated labeling scheme that improves label quality by selectively acquiring multiple labels, and empirically compared several models that aggregate responses from multiple labelers [166]. Dekel et al. applied a classification technique to simulate aggregate labels and prune low-quality labelers in a crowd to improve the label quality of the training dataset [167]. However, all of the above approaches make explicit or implicit assumptions that are not appropriate in the social sensing context. For example, the work in [166] assumed labelers were known a priori and could be explicitly asked to label certain data points. The work in [167] assumed most labelers were reliable and that a simple aggregation of their labels would be enough to approximate the ground truth.
In contrast, participants in social sensing usually upload measurements based on their own observations, and simple aggregation techniques (e.g., majority voting) were shown to be inaccurate when participant reliability is insufficient [37]. The MLE approach reviewed in this book addresses these challenges by casting the reliable social sensing problem as an optimization problem that can be efficiently solved by the EM scheme.
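The gap between majority voting and a reliability-aware aggregation can be seen in a small hypothetical example. The log-odds weights below are what a likelihood-based treatment of independent binary sources implies; all source names and reliability values are invented for illustration.

```python
import math
from collections import Counter

def majority_vote(reports):
    """reports: list of (source, value) pairs; returns the most common value."""
    return Counter(value for _, value in reports).most_common(1)[0][0]

def weighted_vote(reports, reliability):
    """Weight each report by the log-odds of its source's reliability,
    the weight an independent-source likelihood model assigns to it."""
    scores = {}
    for source, value in reports:
        r = reliability[source]
        scores[value] = scores.get(value, 0.0) + math.log(r / (1.0 - r))
    return max(scores, key=scores.get)

# Three barely-better-than-chance sources contradict one careful source.
reports = [("s1", False), ("s2", False), ("s3", False), ("s4", True)]
reliability = {"s1": 0.55, "s2": 0.55, "s3": 0.55, "s4": 0.95}
```

Here `majority_vote` returns False while `weighted_vote` returns True: three sources at 0.55 contribute a combined weight of about 0.60, which the single 0.95 source outweighs at about 2.94. In practice the reliabilities are not given but must themselves be estimated, which is exactly what the EM scheme does.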

We reviewed several important trust analysis schemes developed to solve fact-finding problems in information networks (i.e., fact-finders) in Chapter 4. They normally depend on source-claim networks that describe “who said what” to make trust decisions. In addition to those schemes, there exists a large body of literature on trust analysis that looks into attributes of the sources as well as the lexicon, syntax, and semantics of the claims to improve analysis performance. For example, Pasternack et al. proposed a generalized fact-finding framework that incorporates a wealth of background knowledge and contextual information such as source attributes (e.g., age, educational attainment, groups), claim similarity, and the uncertainty in the information extraction of claims [44]. Amin et al. designed an extended version of the MLE-based fact-finding framework that explicitly models source bias, and showed performance improvements over state-of-the-art schemes in scenarios where source opinions are polarized [168]. Gupta et al. developed a credibility analysis scheme for Twitter to identify credible events [169]. Their scheme explored attributes/features of both sources (e.g., number of friends, followers, status updates, profile) and claims (e.g., existence of slang words, supportive URLs, use of first/second/third-person pronouns, number of named entities related to the event, sentiment analysis). Vydiswaran et al. developed a content-driven trust propagation framework that helps ascertain the veracity of free-text claims and estimate the trustworthiness of their sources. Their approach explored the evidence related to a claim, the uncertainty in the quality of that evidence, and the information network structure [170]. Finally, Castillo et al. and O’Donovan et al. investigated combinations of content, social, and behavioral features to assess the credibility of Twitter messages [171, 172].

11.3 Outlier Analysis and Attack Detection

Several previous efforts on data cleaning and outlier analysis in data mining, and on noise removal in statistics, addressed some notion of noisy data [115, 116, 173–176]. They differ in the assumptions made, the modeling approaches applied, and the targeted objectives. For example, Bayesian inference and decision tree induction techniques are applied to fill in missing values with predictions from a constructed model [173]. Binning and linear regression techniques are used to smooth noisy data, either by using bin means or by fitting the data to linear functions [174, 175]. Clustering techniques are widely used to detect outliers by organizing similar data values into clusters and identifying the values that fall outside the clusters as outliers [176]. Other approaches are used in statistics to estimate model parameters or filter noise from continuous data [115, 116, 177]. The Random Sample Consensus (RANSAC) algorithm is a widely used robust parameter estimation algorithm that can potentially deal with a large outlier contamination rate [177]. The Kalman filter is an efficient recursive filter that estimates the latent variables of a linear dynamic system from a series of noisy measurements [115]. It produces its estimates by computing a weighted average of the predicted values and the measurements based on their uncertainty. Particle filters are more sophisticated filters based on Sequential Monte Carlo methods. They are often used to determine the distribution of a latent variable that is not restricted to be Gaussian [116]. Our work is complementary to the above efforts. On one hand, an appropriately cleaned and outlier-removed dataset will likely result in a better estimation from our scheme.
On the other hand, outliers or noise may not be completely removed (or removable at all) by the data cleaning and outlier analysis techniques mentioned above, due to their own limitations (e.g., linear model assumptions, continuous data assumptions, known data distribution assumptions). The quantifiable and confident estimation our approach provides on both information sources and observed data could actually help data cleaning and outlier analysis tools do a better job.
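As a concrete instance of the uncertainty-weighted averaging described above, here is a minimal one-dimensional Kalman filter sketch under a random-walk state model. The noise variances and measurement values are illustrative assumptions, not parameters from any system in this book.

```python
def kalman_1d(measurements, meas_var, process_var, x0=0.0, p0=1.0):
    """Minimal 1-D Kalman filter for a random-walk state.

    Each step blends the prediction with the new measurement, weighted
    by their respective uncertainties via the Kalman gain k in [0, 1].
    """
    x, p = x0, p0                     # state estimate and its variance
    estimates = []
    for z in measurements:
        # Predict: random-walk model, so uncertainty grows by process_var
        p = p + process_var
        # Update: large measurement noise -> small gain -> trust prediction
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates

noisy = [1.1, 0.9, 1.2, 0.8, 1.0, 1.3, 0.7, 1.0]
smoothed = kalman_1d(noisy, meas_var=0.5, process_var=0.01)
```

As the filter's own variance p shrinks, the gain k decreases, so later measurements move the estimate less than early ones: the jitter in `noisy` is progressively damped out around the underlying level.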

In intrusion detection, one critical task is to detect (or identify) malicious nodes (or sources) accurately and confidently. Two main kinds of detection techniques exist: signature-based detection and anomaly-based detection [176, 178]. Signature-based detection takes predefined attack patterns (specified by domain experts) as signatures and monitors a node’s behavior (or network traffic) for matches in order to report anomalies [176]. Anomaly-based detection builds profiles of normal node (or network) behavior and uses the profiles to detect new patterns that deviate remarkably [178]. For the reliable social sensing problem in our work, it is not obvious what behavior patterns malicious (unreliable) sources will exhibit without knowing the correctness of their measurements. Hence, there may be no easy way to apply the intrusion detection techniques mentioned above to discover malicious sources in social sensing applications. Instead, given the MLE of participant reliability and the corresponding confidence interval provided by our scheme, we are able both to identify unreliable sources and to quantify their reliability with a certain confidence, without prior knowledge of their behavior patterns.

Since people are an indispensable element in social sensing, some popular attacks originating from human (or source) interactions are interesting to investigate. A collusion attack is carried out by a group of colluding attackers who collectively perform malicious (sometimes illegal) actions based on an agreement to defraud honest sources or obtain objectives forbidden by the system. This attack can be mitigated by monitoring the interactions or relationships among the colluding attackers or by identifying abnormal behavior of the group [179]. A Sybil attack is another related attack, carried out by a single attacker who intentionally creates a large number of pseudonymous entities and uses them to gain a disproportionately large influence on the system. This attack can be mitigated by certifying the trust of identity assignment, increasing the cost of creating identities, limiting the resources the attacker can use to create new identities, etc. [180]. By handling reports from colluding or duplicate sources in a way that accounts for source dependency, we will be able to address the above attacks to some extent. For example, by identifying duplicate sources, we can remove them along with their reports from the observed dataset, which is expected to improve estimation performance. Problems become more interesting when sources are not just duplicates but are actually linked through some orthogonal information network (e.g., a social network). Recent work has investigated theory to characterize one’s ability to identify and compensate for attacked nodes within a large-scale sensor network performing target detection [181]. These principles may provide a starting point for analyzing and characterizing sensing over social networks.

11.4 Recommender Systems

Our work is related to a type of information filtering system called a recommender system, where the goal is usually to predict a user’s rating of or preference for an item using a model built from the characteristics of the item and the behavioral pattern of the user [182]. EM has been used either in collaborative recommender systems as a clustering module [183] to mine users’ usage patterns or in content-based recommender systems as a weighting factor estimator [184] to infer the user context. However, the reliable social sensing problem targets a different goal: we try to quantify how reliable a source is and identify whether a measured variable is true or not, rather than predict how likely a user is to choose one item over another. Moreover, users in recommender systems are commonly assumed to provide reasonably good data, while the sources in social sensing are in general unreliable and the likelihood of the correctness of their measurements is unknown a priori. There appears to be no straightforward way to apply methods from the recommender systems literature to the target problem with unpredictably unreliable data. Additionally, the ratings or preferences we get from users in recommender systems are sometimes subjective [185]. For example, some people may prefer a Ford to a Toyota while others prefer exactly the opposite. It is hard to say who is right and who is wrong because there is no universal ground truth for the items being evaluated. We note that the work in this book may not be directly applicable to the above case due to the different assumptions made in models for truth finding. In social sensing applications, we aim to leverage the data contributed by common individuals and reconstruct the state of the physical world, where we usually do have a universal ground truth associated with the assertions that describe those physical states (e.g., a building is either on fire or not).
The techniques reviewed in this book make much more sense under this assumption of social sensing applications. It enables an application not only to obtain the optimal estimation (in the MLE sense) of source and information reliability, but also to assess the quality of that estimation compared to the ground truth.

11.5 Surveys and Opinion Polling

Surveys and influence analysis are often subjective [186]. They tend to survey personal facts, or individual emotions and sentiments [187], as opposed to assessing physical state that is external to the human (sensor). For example, a survey question may ask “Was the customer service representative knowledgeable?” or “Do you support the government’s decision to increase taxes?” Survey participants answer the questions with their own ideas independently, and the responses are often private [188]. Source dependency is not the main issue in these studies [189]. In contrast, in this book, it is not our goal to determine what individuals feel, think, or support, or to extract who is influential, popular, or trending. Instead of assessing humans’ own beliefs, opinions, popularity, or influence, we focus on applications concerned with the observation and state estimation of an external environment. That external state has a unique ground truth that is independent of human beliefs. Humans act merely as sensors of that state. There is therefore an objective and unambiguous notion of sensing error, leading to a clear optimization problem whose goal is to reconstruct the ground truth with minimum error from reported human observations.

The work reviewed in this book should not be confused with work from sociology and statistics on opinion polling, opinion sampling, influence analysis, and surveys. Opinion polling and sampling are usually carefully designed and engineered by experts who create appropriate questionnaires and select representative participants [190, 191]. These are often controlled experiments, and the provenance of the information is also controllable [192]. Moreover, data cleaning in that setting is domain specific and requires semantic knowledge [193]. In contrast, in the reliable sensing problem studied in this book, data collection is open to all. We assume no control over either the participants (data sources) or the measurements in their reports. The reliability of sources and their data provenance is usually unknown to the applications. The approaches reviewed in this book are designed to be general and do not require domain-specific knowledge to clean the data.

References

[37] Pasternack J, Roth D. Knowing what to believe (when you already know something). In: International Conference on Computational Linguistics (COLING); 2010.

[44] Pasternack J, Roth D. Making better informed trust decisions with generalized fact-finding. In: Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Three, ser. IJCAI’11; AAAI Press; 2011:2324–2329. [Online]. Available: http://dx.doi.org/10.5591/978-1-57735-516-8/IJCAI11-387.

[93] Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of The Royal Statistical Society, Series B. 1977;39(1):1–38.

[109] Cramer H. Mathematical Methods of Statistics. Princeton Univ. Press; 1946.

[111] Hogg RV, Craig AT. Introduction to mathematical statistics. Prentice Hall; 1995.

[112] Casella G, Berger R. Statistical Inference. Duxbury Press; 2002.

[115] Kalman RE. A New Approach to Linear Filtering and Prediction Problems. Transactions of the ASME Journal of Basic Engineering. 1960;82(Series D):35–45. [Online]. Available: http://www.cs.unc.edu/~welch/kalman/media/pdf/Kalman1960.pdf.

[116] Doucet A, De Freitas N, Gordon N, eds. Sequential Monte Carlo methods in practice. Springer; 2001. [Online]. Available: http://www.worldcatlibraries.org/wcpa/top3mset/839aaf32b6957a10a19afeb4da09e526.html.

[136] Shental N, Bar-Hillel A, Hertz T, Weinshall D. Computing gaussian mixture models with EM using equivalence constraints. Advances in neural information processing systems. 2004;16(8):465–472.

[137] Samdani R, Chang M-W, Roth D. Unified expectation maximization. In: Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Association for Computational Linguistics; 2012:688–698.

[138] Hofmann T. Probabilistic latent semantic indexing. In: Proceedings of the 22Nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; New York, NY, USA: ACM; 1999:50–57. ser. SIGIR ’99. [Online]. Available: http://doi.acm.org/10.1145/312624.312649.

[139] Masseroli M, Chicco D, Pinoli P. Probabilistic latent semantic analysis for prediction of gene ontology annotations. In: Neural Networks (IJCNN), The 2012 International Joint Conference on; IEEE; 2012:1–8.

[140] Ball B, Karrer B, Newman M. Efficient and principled method for detecting communities in networks. Physical Review E. 2011;84(3):036103.

[141] Bhattacharyya S, Bickel PJ. Community detection in networks using graph distance. arXiv preprint arXiv:1401.3915; 2014.

[142] Bilmes J. A gentle tutorial on the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report ICSI-TR-97-021, University of California, Berkeley; 1997.

[143] McLachlan GJ, Krishnan T. The EM algorithm and extensions. John Wiley and Sons, Inc.; 1997.

[144] Zhai C. A note on the expectation maximization (EM) algorithm. Department of Computer Science, University of Illinois at Urbana Champaign; 2007.

[145] Wang D, Fattouche M. OFDM transmission for time-based range estimation. Signal Processing Letters, IEEE. 2010;17(6):571–574.

[146] Qian F, Leung S, Zhu Y, Wong W, Pao D, Lau W. Damped sinusoidal signals parameter estimation in frequency domain. Signal Processing. 2012;92(2):381–391.

[147] Wang Z, Luo J-A, Zhang X-P. A novel location-penalized maximum likelihood estimator for bearing-only target localization. Signal Processing, IEEE Transactions on. 2012;60(12):6166–6181.

[148] Bickel P, Choi D, Chang X, Zhang H, et al. Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. The Annals of Statistics. 2013;41(4):1922–1943.

[149] Yan T, Zhao Y, Qin H. Asymptotic normality in the maximum entropy models on graphs with an increasing number of parameters. arXiv preprint arXiv:1308.1768; 2013.

[150] Kremer A, Weißbach R. Asymptotic normality for discretely observed Markov jump processes with an absorbing state. Statistics & Probability Letters. 2014;90:136–139.

[151] Nguyen HD, Wood IA. Asymptotic normality of the maximum pseudolikelihood estimator for fully visible Boltzmann machines. arXiv preprint arXiv:1409.8047; 2014.

[152] Hellerstein JM. Quantitative data cleaning for large databases. United Nations Economic Commission for Europe (UNECE); 2008.

[153] Hodge VJ, Austin J. A survey of outlier detection methodologies. Artificial Intelligence Review. 2004;22(2):85–126.

[154] Elmagarmid AK, Ipeirotis PG, Verykios VS. Duplicate record detection: A survey. Knowledge and Data Engineering, IEEE Transactions on. 2007;19(1):1–16.

[155] Köpcke H, Rahm E. Frameworks for entity matching: A comparison. Data & Knowledge Engineering. 2010;69(2):197–210.

[156] Arasu A, Garcia-Molina H. Extracting structured data from web pages. In: Proceedings of the 2003 ACM SIGMOD international conference on Management of data; ACM; 2003:337–348.

[157] Soderland S. Learning information extraction rules for semi-structured and free text. Machine learning. 1999;34(1–3):233–272.

[158] Fisher K, Gruber R. Pads: a domain-specific language for processing ad hoc data. ACM SIGPLAN Notices. 2005;40(6):295–304.

[159] Mandelbaum Y, Fisher K, Walker D, Fernandez M, Gleyzer A. Pads/ml: A functional data description language. ACM SIGPLAN Notices. 2007;42(1):77–83.

[160] Haas LM, Hernández MA, Ho H, Popa L, Roth M. Clio grows up: from research prototype to industrial tool. In: Proceedings of the 2005 ACM SIGMOD international conference on Management of data; ACM; 2005:805–810.

[161] Rahm E, Bernstein PA. A survey of approaches to automatic schema matching. the VLDB Journal. 2001;10(4):334–350.

[162] Stonebraker M, Bruckner D, Ilyas IF, Beskales G, Cherniack M, Zdonik SB, Pagan A, Xu S. Data curation at scale: The data tamer system. In: CIDR. 2013.

[163] Huynh DF, Miller RC, Karger DR. Potluck: semi-ontology alignment for casual users. In: The Semantic Web. Springer; 2007:903–910.

[164] Kandel S, Paepcke A, Hellerstein J, Heer J. Wrangler: Interactive visual specification of data transformation scripts. In: Proceedings of the 2011 annual conference on Human factors in computing systems; ACM; 2011:3363–3372.

[165] Lin J, Wong J, Nichols J, Cypher A, Lau TA. End-user programming of mashups with vegemite. In: Proceedings of the 14th international conference on Intelligent user interfaces; ACM; 2009:97–106.

[166] Sheng VS, Provost F, Ipeirotis PG. Get another label? Improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining; New York, NY, USA: ACM; 2008:614–622. ser. KDD ’08. [Online]. Available: http://doi.acm.org/10.1145/1401890.1401965.

[167] Dekel O, Shamir O. Vox populi: Collecting high-quality labels from a crowd. In: In Proceedings of the 22nd Annual Conference on Learning Theory; 2009.

[168] Amin MTA, Abdelzaher T, Wang D, Szymanski B. Crowd-sensing with polarized sources. In: Distributed Computing in Sensor Systems (DCOSS), 2014 IEEE International Conference on; IEEE; 2014:67–74.

[169] Gupta M, Zhao P, Han J. Evaluating event credibility on twitter. In: SDM. SIAM; 2012:153–164.

[170] Vydiswaran V, Zhai C, Roth D. Content-driven trust propagation framework. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining; ACM; 2011:974–982.

[171] Castillo C, Mendoza M, Poblete B. Information credibility on twitter. In: Proceedings of the 20th international conference on World wide web; ACM; 2011:675–684.

[172] O’Donovan J, Kang B, Meyer G, Hollerer T, Adalii S. Credibility in context: An analysis of feature distributions in twitter. In: Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom); IEEE; 2012:293–301.

[173] Duda RO, Hart PE, Stork DG. Pattern Classification (2nd Edition). 2nd ed. Wiley-Interscience; Nov. 2001. [Online]. Available: http://www.worldcat.org/isbn/0471056693.

[174] Inc UT, Staff UTI. Solving Data Mining Problems Using Pattern Recognition Software with Cdrom. 1st ed. Upper Saddle River, NJ, USA: Prentice Hall PTR; 1997.

[175] Johnson RA, Wichern DW. Applied multivariate statistical analysis. Upper Saddle River, NJ, USA: Prentice-Hall, Inc.; 2002.

[176] Han J, Kamber M, Pei J. Data Mining: Concepts and Techniques. 3rd ed. Morgan Kaufmann; 2011.

[177] Yaniv Z. Random sample consensus (ransac) algorithm, a generic implementation. Imaging. 2010.

[178] Whitman ME, Mattord HJ. Principles of Information Security. Boston, MA, United States: Course Technology Press; 2004.

[179] Lian Q, Zhang Z, Yang M, Zhao BY, Dai Y, Li X. An empirical study of collusion behavior in the Maze P2P file-sharing system. In: Proceedings of the 27th International Conference on Distributed Computing Systems; Washington, DC, USA: IEEE Computer Society; 2007:56. ser. ICDCS ’07.

[180] Yu H, Kaminsky M, Gibbons PB, Flaxman A. SybilGuard: defending against Sybil attacks via social networks. SIGCOMM Comput. Commun. Rev. August 2006;36:267–278.

[181] Zhang J, Blum RS. Distributed estimation in the presence of attacks for large scale sensor networks. In: Information Sciences and Systems (CISS), 2014 48th Annual Conference on; IEEE; 2014:1–6.

[182] Adomavicius G, Tuzhilin A. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering. 2005;17(6):734–749.

[183] Mustapha N, Jalali M, Jalali M. Expectation maximization clustering algorithm for user modeling in web usage mining systems. European Journal of Scientific Research. 2009;32(4):467–476.

[184] Pomerantz D, Dudek G. Context dependent movie recommendations using a hierarchical Bayesian model. In: Proceedings of the 22nd Canadian Conference on Artificial Intelligence: Advances in Artificial Intelligence; Berlin, Heidelberg: Springer-Verlag; 2009:98–109. ser. Canadian AI ’09.

[185] Adomavicius G, Kwon Y. New recommendation techniques for multicriteria rating systems. IEEE Intelligent Systems. May 2007;22(3):48–55. [Online]. Available: http://dx.doi.org/10.1109/MIS.2007.58.

[186] Cano AE, Mazumdar S, Ciravegna F. Social influence analysis in microblogging platforms–a topic-sensitive based approach. Semantic Web; 2011.

[187] Fu K-w, Chan C-h. Analyzing online sentiment to predict telephone poll results. Cyberpsychology, Behavior, and Social Networking. 2013.

[188] Toch E, Wang Y, Cranor LF. Personalization and privacy: A survey of privacy risks and remedies in personalization-based systems. User Modeling and User-Adapted Interaction. 2012;22(1–2):203–220.

[189] Blair J, Czaja RF, Blair EA. Designing surveys: A guide to decisions and procedures. SAGE Publications, Incorporated; 2013.

[190] Lax JR, Phillips JH. How should we estimate public opinion in the states? American Journal of Political Science. 2009;53(1):107–121. [Online]. Available: http://dx.doi.org/10.1111/j.1540-5907.2008.00360.x.

[191] Splichal S. Public opinion and opinion polling: Contradictions and controversies. In: Opinion Polls and the Media: Reflecting and Shaping Public Opinion. 2012:25.

[192] Zhu J, Wang H, Zhu M, Tsou BK, Ma M. Aspect-based opinion polling from customer reviews. IEEE Trans. Affect. Comput. Jan. 2011;2(1):37–49. [Online]. Available: http://dx.doi.org/10.1109/T-AFFC.2011.2.

[193] Gardner RM, Brown DL, Boice R. Using Amazon’s Mechanical Turk website to measure accuracy of body size estimation and body dissatisfaction. Body Image. 2012.

