
Social Network Platforms and Technologies in the Cloud Computing Era (云计算时代的社交网络平台和技术) - Google China - 张智威

Social Network Platforms and Technologies in the Cloud Computing Era (云计算时代的社交网络平台和技术)
张智威
Deputy Director, Research Institute, Google China
Professor, Department of Electrical Engineering, University of California

China Opportunity: China & U.S. in 2006-07
                          China                  U.S.
  Internet Population     180 million (↑ 25%)    208 million (↑ 3%)
  Broadband Users         60 million (↑ 90%)     60 million (↑ 29%)
  Mobile Phones           500 million            180 million
  Engineering Graduates   600 k                  72 k

Google China
- Size (~700): 200 engineers, 400 other employees, almost 100 interns
- Locations: Beijing (2005), Taipei (2006), Shanghai (2007)

Organizing the World's Information, Socially (agenda)
- Social Platform (社区平台)
- Cloud Computing (云运算)
- Concluding Remarks (结论与前瞻)

Web 1.0
[diagram: the web as a collection of documents (.htm, .jpg, .doc, .msg files)]

Web with People (2.0)
[diagram: the same documents (.htm, .jpg, .doc, .xls, .msg) now connected to people]

+ Social Platforms
[diagram: social platforms and apps (gadgets) layered on top of documents and people]

Open Social Platform (开放社区平台)
[diagram: an open social platform (社区平台) connecting communities and applications]

Social Graph
[diagram: users and the connections between them]

What Users Want?
- People care about other people:
  - care about people they know
  - connect to people they do not know
- Discover interesting information based on other people:
  - about who other people are
  - about what other people are doing

Information Overflow Challenge
- Too many people, too many choices of forums and apps
- "I soon need to hire a full-time assistant to manage my online social networks"
- Desiring a social network recommendation system

Recommendation System
- Friend recommendation
- Community/forum recommendation
- Application suggestion
- Ads matching

Organizing the World's Information, Socially (agenda)
- Social Platform (社区平台)
- Cloud Computing (云运算)
- Concluding Remarks (结论与前瞻)

Industry Trend: The Arrival of the Cloud Computing Era (业界趋势:云计算时代的到来)
(1) Data lives in the cloud: no fear of loss, no need to back up
(2) Software lives in the cloud: nothing to download, automatic upgrades
(3) Ubiquitous cloud computing: log in from any device and it becomes yours
(4) Unlimited cloud computing: unlimited storage, unlimited speed
(picture source: http://www.sis.pitt.edu)

Web Search: An Example of Cloud Computing (互联网搜索:云计算的例子)
1. The user enters the query keywords "Cloud Computing"
2. Data is preprocessed in a distributed fashion to serve the search:
   - Google infrastructure (thousands of commodity servers around the world)
   - MapReduce for mass data processing
   - Google File System
3. Search results are returned

Collaborative Filtering
- Given a matrix that "encodes" data (e.g., users x communities)
- Many applications (collaborative filtering): User-Community, User-User, Ads-User, Ads-Community, etc.

Collaborative Filtering (CF) [Breese, Heckerman and Kadie 1998]
- Memory-based
  - Given user u, find "similar" users (k nearest neighbors): bought similar items, saw similar movies, have similar profiles, etc.
  - Different similarity measures yield different techniques
  - Make predictions based on the preferences of these "similar" users
- Model-based
  - Build a model of the relationships between subject matters
  - Make predictions based on the constructed model

Memory-Based Model [Goldberg et al. 1992; Resnick et al. 1994; Konstan et al. 1997]
- Pros: simplicity, avoids a model-building stage
- Cons:
  - Memory- and time-consuming: uses the entire database every time a prediction is made
  - Cannot make a prediction if the user has no items in common with other users
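To make the memory-based recipe above concrete, here is a minimal sketch (not code from the talk): cosine-similarity k-nearest-neighbour scoring over a toy user-community membership matrix. The matrix R, the choice of cosine similarity, and k=2 are illustrative assumptions.

```python
# Memory-based CF sketch: score unseen communities for a user from the
# preferences of that user's k most similar users (cosine similarity).
import numpy as np

def predict_scores(R, user, k=2):
    """Score unseen items for `user` from the k most similar users in R."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                      # avoid division by zero
    U = R / norms                                # row-normalised profiles
    sims = U @ U[user]                           # cosine similarity to every user
    sims[user] = -np.inf                         # exclude the user themselves
    neighbours = np.argsort(sims)[-k:]           # k nearest neighbours
    weights = sims[neighbours]
    scores = weights @ R[neighbours] / (weights.sum() + 1e-12)
    scores[R[user] > 0] = -np.inf                # only recommend unseen items
    return scores

# toy 4-user x 5-community membership matrix (1 = user joined the community)
R = np.array([[1, 1, 0, 0, 1],
              [1, 0, 1, 0, 1],
              [0, 1, 1, 1, 0],
              [1, 1, 0, 1, 0]], dtype=float)
print(np.argsort(predict_scores(R, user=0))[::-1][:2])  # top-2 suggestions
```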
Model-Based Model [Breese et al. 1998; Hofmann 1999; Blei et al. 2004]
- Pros:
  - Scalability: the model is much smaller than the actual dataset
  - Faster prediction: query the model instead of the entire dataset
- Cons: model building takes time

Algorithm Selection Criteria
- Near-real-time recommendation
- Scalable training; incremental training is desirable
- Can deal with data scarcity
- Cloud computing!

Model-Based Prior Work
- Latent Semantic Analysis (LSA)
- Probabilistic LSA (PLSA)
- Latent Dirichlet Allocation (LDA)

Latent Semantic Analysis (LSA) [Deerwester et al. 1990]
- Maps high-dimensional count vectors to a lower-dimensional representation called the latent semantic space
- By SVD decomposition: A = U Σ V^T
  - A: word-document co-occurrence matrix
  - U_ij: how likely word i belongs to topic j
  - Σ_jj: how significant topic j is
  - (V^T)_ij: how likely topic i belongs to document j

Latent Semantic Analysis (cont.)
- LSA keeps the k largest singular values
  - Low-rank approximation Â of the original matrix
  - Saves space, de-noises, and reduces sparsity
- Make recommendations using:
  - Word-word similarity: Â Â^T
  - Doc-doc similarity: Â^T Â
  - Word-doc relationship: Â

Probabilistic Latent Semantic Analysis (PLSA) [Hofmann 1999; Hofmann 2004]
- A document is viewed as a bag of words
- A latent semantic layer is constructed between documents and words:
  P(w, d) = P(d) P(w|d) = P(d) Σ_z P(w|z) P(z|d)
- Probabilities deliver explicit meaning: P(w|w′), P(d|d′), P(d, w)
- Model learning via the EM algorithm
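As a hedged illustration of the EM learning step mentioned above (the same machinery is reused by the C-U and C-D baseline models later in the deck, with users or description words in place of w and communities in place of d), the sketch below runs plain EM for the factorisation P(w, d) = P(d) Σ_z P(w|z) P(z|d) on a toy count matrix. The array shapes, toy counts, and number of topics K are assumptions, not details from the talk.

```python
# PLSA-style EM sketch on a small document-word count matrix.
import numpy as np

def plsa_em(n, K=2, iters=50, seed=0):
    """n: (D docs x W words) count matrix.  Returns P(w|z) and P(z|d)."""
    rng = np.random.default_rng(seed)
    D, W = n.shape
    Pw_z = rng.random((K, W)); Pw_z /= Pw_z.sum(axis=1, keepdims=True)
    Pz_d = rng.random((D, K)); Pz_d /= Pz_d.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # E-step: responsibilities P(z|d,w), shape (D, W, K)
        joint = Pz_d[:, None, :] * Pw_z.T[None, :, :]
        Pz_dw = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
        # M-step: re-estimate P(w|z) and P(z|d) from expected counts
        expected = n[:, :, None] * Pz_dw                 # (D, W, K)
        Pw_z = expected.sum(axis=0).T                    # (K, W)
        Pw_z /= Pw_z.sum(axis=1, keepdims=True) + 1e-12
        Pz_d = expected.sum(axis=1)                      # (D, K)
        Pz_d /= Pz_d.sum(axis=1, keepdims=True) + 1e-12
    return Pw_z, Pz_d

# toy 3-document x 4-word co-occurrence counts
n = np.array([[4, 2, 0, 0],
              [3, 3, 1, 0],
              [0, 1, 4, 3]], dtype=float)
Pw_z, Pz_d = plsa_em(n)
print(Pz_d.round(2))   # topic mixture of each document
```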
PLSA Extensions
- PHITS [Cohn & Chang 2000]: models document-citation co-occurrence
- A linear combination of PLSA and PHITS [Cohn & Hofmann 2001]: models contents (words) and the inter-connectivity of documents
- LDA [Blei et al. 2003]: provides a complete generative model with a Dirichlet prior
- AT [Griffiths & Steyvers 2004]: includes authorship information; a document is categorized by authors and topics
- ART [McCallum 2004]: includes email recipients as additional information; an email is categorized by author, recipients, and topics

Combinational Collaborative Filtering (CCF)
- Fuses multiple sources of information to alleviate the information-sparsity problem
- Hybrid training scheme: Gibbs sampling as initialization for the EM algorithm
- Parallelization: achieves linear speedup with the number of machines

Notations
- Given a collection of co-occurrence data:
  - Communities: C = {c1, c2, ..., cN}
  - Users: U = {u1, u2, ..., uM}
  - Descriptions: D = {d1, d2, ..., dV}
  - Latent aspects: Z = {z1, z2, ..., zK}
- Models:
  - Baseline models: Community-User (C-U) model and Community-Description (C-D) model
  - CCF (Combinational Collaborative Filtering): combines both baseline models

Baseline Models
- Community-User (C-U) model
  - A community is viewed as a bag of users; c and u are rendered conditionally independent by introducing z
  - Generative process, for each user u: 1. a community c is chosen uniformly; 2. a topic z is selected from P(z|c); 3. a user u is generated from P(u|z)
- Community-Description (C-D) model
  - A community is viewed as a bag of words; c and d are rendered conditionally independent by introducing z
  - Generative process, for each word d: 1. a community c is chosen uniformly; 2. a topic z is selected from P(z|c); 3. a word d is generated from P(d|z)

Baseline Models (cont.)
- Community-User (C-U) model
  - Pros: personalized community suggestion
  - Cons: the C-U matrix is sparse and may suffer from the information-sparsity problem; cannot take advantage of content similarity between communities
- Community-Description (C-D) model
  - Pros: clusters communities based on community content (description words)
  - Cons: no personalized recommendation; does not consider the users shared between communities

CCF Model
- Combinational Collaborative Filtering (CCF) combines both baseline models: a community is viewed as a bag of users AND a bag of words
- By adding C-U, CCF can perform personalized recommendation, which C-D alone cannot
- By adding C-D, CCF can perform better personalized recommendation than C-U alone, which may suffer from sparsity
- Things CCF can do that C-U and C-D cannot: P(d|u) relates a user to a word, which is useful for targeting ads to users

Algorithm Requirements
- Near-real-time recommendation
- Scalable training; incremental training is desirable

Parallelizing CCF
- Details omitted

Industry Trend: The Arrival of the Cloud Computing Era (业界趋势:云计算时代的到来) (slide repeated)

Experiments on the Orkut Dataset
- Data description:
  - Collected on July 26, 2007
  - Two types of data were extracted: community-user and community-description
  - 312,385 users; 109,987 communities; 191,034 unique English words
- Tasks: community recommendation, community similarity/clustering, user similarity, speedup

Community Recommendation
- Evaluation method:
  - No ground truth and no user clicks available
  - Leave-one-out: randomly delete one community for each user and test whether the deleted community can be recovered
- Evaluation metric: precision and recall

Results
- Observations:
  - CCF outperforms C-U; for top-20 recommendations, the precision and recall of CCF are about twice those of C-U
  - The more communities a user has joined, the better CCF/C-U can predict

Runtime Speedup
- Training on the Orkut dataset enjoys roughly linear speedup with up to 100 machines
- This reduces the training time from one day to less than 14 minutes
- But what makes the speedup slow down beyond 100 machines?

Runtime Speedup (cont.)
- Training time consists of two parts: computation time (Comp) and communication time (Comm)

CCF Summary
- Combinational Collaborative Filtering fuses bag-of-users and bag-of-words information
- Hybrid training provides better initializations for EM than random seeding
- Parallelization handles large-scale datasets

China's Contributions on/to Cloud Computing
- Parallel CCF
- Parallel SVMs (kernel machines)
- Parallel SVD
- Parallel Spectral Clustering
- Parallel Expectation Maximization
- Parallel Association Mining
- Parallel LDA

Speeding up SVMs [NIPS 2007]
- Approximate matrix factorization; parallelization
- Open source at code.google.com/p/psvm; 350+ downloads since December 2007
- A task that takes 7 days on 1 machine takes 1 hour on 500 machines

Incomplete Cholesky Factorization (ICF)
- An n x n matrix is approximated by the product of an n x p factor and its p x n transpose, with p << n, to conserve storage

Matrix Product
- (p x n) x (n x p) = p x p
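The two slides above describe approximating an n x n kernel matrix by an n x p factor with p << n, so that only the small factor (and the resulting p x p product) needs to be stored. Below is a minimal sketch of pivoted incomplete Cholesky factorisation in that spirit; the RBF kernel, the toy data, and the choice of p are illustrative assumptions, and this is not code from the PSVM release.

```python
# Incomplete Cholesky factorisation (ICF) sketch with greedy pivoting:
# approximate a PSD matrix K (n x n) by G @ G.T with G of shape n x p.
import numpy as np

def icf(K, p):
    """Return G (n x p) such that G @ G.T approximates the PSD matrix K."""
    n = K.shape[0]
    G = np.zeros((n, p))
    d = np.diag(K).astype(float).copy()          # residual diagonal
    for j in range(p):
        i = int(np.argmax(d))                    # pivot with largest residual
        if d[i] <= 1e-12:                        # remaining error is negligible
            return G[:, :j]
        G[:, j] = (K[:, i] - G[:, :j] @ G[i, :j]) / np.sqrt(d[i])
        d = np.maximum(d - G[:, j] ** 2, 0.0)
    return G

# toy RBF kernel on random 2-D points (assumed example data)
rng = np.random.default_rng(0)
X = rng.random((200, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-2.0 * sq)

G = icf(K, p=30)                                 # 200x200 -> 200x30 factor
print(np.linalg.norm(K - G @ G.T) / np.linalg.norm(K))  # relative approximation error
```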
Organizing the World's Information, Socially (agenda)
- Social Platform (社区平台)
- Cloud Computing (云运算)
- Concluding Remarks (结论与前瞻)

Web With People
[diagram: web documents (.htm, .jpg, .doc, .xls, .msg) connected to people]

What Next for Web Search?
- Personalization: return query results considering personal preferences
  - Example: disambiguate an ambiguous term like "fuji"
- Oops: several have tried, and the problem is hard:
  - It is difficult to collect enough training data (for collaborative filtering)
  - It is computationally intensive to support personalization (e.g., personalizing PageRank)
  - User profiles may be incomplete or erroneous

Personal Search, Intelligent Search (个人搜索 智能搜索)
- A search for "富士" (Fuji) can return Mount Fuji (富士山), Fuji apples (富士苹果), or Fuji cameras (富士相机)
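As a purely illustrative sketch of the kind of personalization the 富士/Fuji example suggests (not the system discussed in the talk), candidate result categories can be re-ranked by their overlap with a simple user-profile term set; the profiles, candidate snippets, and scoring below are all assumptions.

```python
# Toy personalised re-ranking: prefer the "fuji" interpretation whose
# description overlaps most with the user's profile terms.
from collections import Counter

def score(profile_terms, snippet):
    """Count how often the user's profile terms appear in a candidate snippet."""
    words = Counter(snippet.lower().split())
    return sum(words[t] for t in profile_terms)

candidates = {
    "Mount Fuji":  "mount fuji japan travel hiking volcano scenery",
    "Fuji apple":  "fuji apple fruit sweet crisp grocery recipe",
    "Fuji camera": "fuji camera digital photography lens sensor review",
}

user_profile = {"photography", "lens", "travel"}     # hypothetical interests

ranked = sorted(candidates, key=lambda c: score(user_profile, candidates[c]),
                reverse=True)
print(ranked)        # e.g. ['Fuji camera', 'Mount Fuji', 'Fuji apple']
```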
Organizing the World's Information, Socially
- The web is a collection of documents and people
- Recommendation is a personalized, push model of search
- Collaborative filtering requires dense information to be effective
- Cloud computing is essential

References
[1] Alexa Internet. http://www.alexa.com/.
[2] D. M. Blei and M. I. Jordan. Variational methods for the Dirichlet process. In Proc. of the 21st International Conference on Machine Learning, pages 373-380, 2004.
[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
[4] D. Cohn and H. Chang. Learning to probabilistically identify authoritative documents. In Proc. of the 17th International Conference on Machine Learning, pages 167-174, 2000.
[5] D. Cohn and T. Hofmann. The missing link - a probabilistic model of document content and hypertext connectivity. In Advances in Neural Information Processing Systems 13, pages 430-436, 2001.
[6] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, 1990.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1):1-38, 1977.
[8] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741, 1984.
[9] T. Hofmann. Probabilistic latent semantic indexing. In Proc. of Uncertainty in Artificial Intelligence, pages 289-296, 1999.
[10] T. Hofmann. Latent semantic models for collaborative filtering. ACM Transactions on Information Systems, 22(1):89-115, 2004.
[11] A. McCallum, A. Corrada-Emmanuel, and X. Wang. The author-recipient-topic model for topic and role discovery in social networks: Experiments with Enron and academic email. Technical report, Computer Science, University of Massachusetts Amherst, 2004.
[12] D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent Dirichlet allocation. In Advances in Neural Information Processing Systems 20, 2007.
[13] M. Ramoni, P. Sebastiani, and P. Cohen. Bayesian clustering by dynamics. Machine Learning, 47(1):91-121, 2002.

References (cont.)
[14] R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted Boltzmann machines for collaborative filtering. In Proc. of the 24th International Conference on Machine Learning, pages 791-798, 2007.
[15] E. Spertus, M. Sahami, and O. Buyukkokten. Evaluating similarity measures: a large-scale study in the Orkut social network. In Proc. of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pages 678-684, 2005.
[16] M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths. Probabilistic author-topic models for information discovery. In Proc. of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 306-315, 2004.
[17] A. Strehl and J. Ghosh. Cluster ensembles - a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research (JMLR), 3:583-617, 2002.
[18] T. Zhang and V. S. Iyengar. Recommender systems using linear classifiers. Journal of Machine Learning Research, 2:313-334, 2002.
[19] S. Zhong and J. Ghosh. Generative model-based clustering of documents: a comparative study. Knowledge and Information Systems (KAIS), 8:374-384, 2005.
[20] L. Adamic and E. Adar. How to search a social network. 2004.
[21] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, pages 5228-5235, 2004.
[22] H. Kautz, B. Selman, and M. Shah. ReferralWeb: Combining social networks and collaborative filtering. Communications of the ACM, 40(3):63-65, 1997.
[23] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD Record, 22:207-216, 1993.
[24] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, 1998.
[25] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004.

References (cont.)
[26] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International World Wide Web Conference, pages 285-295, 2001.
[27] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004.
[28] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International World Wide Web Conference, pages 285-295, 2001.
[29] M. Brand. Fast online SVD revisions for lightweight recommender systems. In Proceedings of the 3rd SIAM International Conference on Data Mining, 2003.
[30] D. Goldberg, D. Nichols, B. Oki, and D. Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12):61-70, 1992.
[31] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: An open architecture for collaborative filtering of netnews. In Proceedings of the ACM Conference on Computer Supported Cooperative Work, pages 175-186, 1994.
[32] J. Konstan, et al. GroupLens: Applying collaborative filtering to Usenet news. Communications of the ACM, 40(3):77-87, 1997.
[33] U. Shardanand and P. Maes. Social information filtering: Algorithms for automating "word of mouth". In Proceedings of ACM CHI, 1:210-217, 1995.
[34] G. Linden, B. Smith, and J. York. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 7:76-80, 2003.
[35] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42(1):177-196, 2001.
[36] T. Hofmann and J. Puzicha. Latent class models for collaborative filtering. In Proceedings of the International Joint Conference on Artificial Intelligence, 1999.
[37] http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/collaborativefiltering.html
[38] E. Y. Chang, et al. Parallelizing Support Vector Machines on Distributed Computers. NIPS, 2007.
