2012-5-31 Dep. Phys., Univ. Fribourg 1
Information Filtering on
Complex Networks:
Prediction, Ranking and Recommendation
Linyuan
Lü
吕琳媛
E-mail: linyuan.lue@unifr.ch
2012-5-31 Dep. Phys., Univ. Fribourg 2
Outline
z Background & Motivation
z What is Link Predicting?
z How to Rank Influential Nodes?
z What is Personalized Recommendation?
z Real Applications
z Ongoing works
2012-5-31 Dep. Phys., Univ. Fribourg 3
Background
z Everyone can create
information
z More information are
available
Information Overload
z Organization & Reconstruction &
Prediction
Confusion
Motivation
2012-5-31 Dep. Phys., Univ. Fribourg 4
Prediction—complete partial information
Sample networks:
Prediction of existed yet unknown
links
,e.g. food webs, protein-protein
interaction networks, metabolic
networks.
Ranking—understand the
significant part based on known
information
Evolving networks
Prediction of future links,
e.g. online friendship
networks
Recommendation—
guide the information
generating
Subproblem of LP
Target User
Ask: which link will exist?—who will be friend?
Ask: who is the leadership?
Ask: who will be recommended
to user1 as friend?
2012-5-31 Dep. Phys., Univ. Fribourg 5
Why it is significant?
z Prediction——How to complete information?
z Reduce experimental costs (missing links)
Discovery of links/interactions of biological networks costs much. Instead of blindly checking all
possible links, to predict in advance and focus on the most likely existing links can sharply
reduce costs if the predictions are accurate enough.
z Online friendship recommendations (future links)
Recommend new friends to the users in web society.
z Applications in other fields
It can be applied in solving the node
classification problem in partially labeled networks,
improving recommendation for sparse systems, reconstrcucting noisy networks
z Ranking——How to identify the important parts?
z Information security: public opinions supervision, early warning and control.
z Commercial value: advertisement delivery, news spreading.
z Potentially enhance the activeness and adhesive capacity of the social network.
z Recommendation——How to find what the users like efficiently and guide the
information generating?
z E-Commercial applications
2012-5-31 Dep. Phys., Univ. Fribourg 6
CS & Physics
We are trying to explain why it works like this!
Perspectives Methodologies
2012-5-31 Dep. Phys., Univ. Fribourg 7
What is Link Prediction?
z Problem Description
z Given a network G(V, E), V is the set of nodes, E is the set of links.
z Estimating the likelihood of the existence of a link between two nodes, based on the
observed topology.
z Sort the nonexistent links in descending order according to their likelihoods, the top ones
are most likely to exist.
z Data
z Training set as known information
z Probe set for testing
z Metric-AUC (area under the receiver operating characteristic curve)
z Probability that a randomly chosen link in the probe set (PL) has higher likelihoods than
a randomly chosen nonexistent link (NL).
z Independently testing n times, and n1 PL>NL, n2 PL=NL, then
[Physica A 390 (2011) 1150]
1 20.5n nAUC
n
+=
2012-5-31 Dep. Phys., Univ. Fribourg 8
New Local Indices
z Resource Allocation (RA) [Eur. Phys. J. B 71, 623-630 (2009)]
z Node x can send some resource to y with their common neighbors
playing the role of transmitters
z Local Path (LP) [Phys. Rev. E 80, 046122 (2009)]
z where A is the adjacent matrix, is a free parameter
z Local random walk (LRW) [EPL 89, 58007 (2010)]
z the probability that after n steps the walker happen to arrived at j.
z Superposed random walk (SRW) [EPL 89, 58007 (2010)]
z Reset the initial resource to node i at each time step
( ) ( )
1
( )
RA
xy
z x y
s
k z∈Γ Γ
= ∑
∩
2 3S A Aε= +
ε
1 1 5
4 6 12xy
s = + =
( ) ( ) ( )
2 2
ji
ij ij ji
kks n n n
E E
π π= ⋅ + ⋅
1
' ( ) [ ( )]
n
ij ij
l
s n s l
=
=∑
ijπ
2012-5-31 Dep. Phys., Univ. Fribourg 9
Experimental Results
z CN: Common Neighbors
z ACT: Average Commute Time [J. Math. Chem. 12 (1993) 81]
z RWR: Random Walk with Restart [Comput. Netw. ISDN Syst. 30 (1998) 107]
z HSM: Hierarchical Structure Model [Nature 453 (2008) 98]
2012-5-31 Dep. Phys., Univ. Fribourg 10
New Network Types
z Weighted networks [EPL 89 (2010) 18001]
z Directed Networks [arXiv:1202:2709]
( ) ( )
( , ) ( , )
2
WCN
xy
z x y
w x z w z ys
α α
∈Γ ∩Γ
+= ∑
( ) ( )
( , ) ( , )
log(1 ( ))
WAA
xy
z x y
w x z w z ys
s z
α α
∈Γ ∩Γ
+= +∑
( ) ( )
( , ) ( , )
( )
WRA
xy
z x y
w x z w z ys
s z
α α
∈Γ ∩Γ
+= ∑
Consider six minimal loop-embedded subgraphs
of orders 3 and 4.
2012-5-31 Dep. Phys., Univ. Fribourg 11
{S1,S2,S3}=3-FFL, {S4}=3-Loop, {S5}=Bi-fan,{S6,S7}=Bi-parallel, {S8}=4-Loop, {S9,S10,S11,S12}=4-FFL
2012-5-31 Dep. Phys., Univ. Fribourg 12
Others
z Local naïve Bayes model (LNB) [EPL 96 (2011) 48007]
z Distinguish the different contributions to form a link from
different common neighbors.
z Classification in partially labeled networks [IJMPC 21 (2010)
813]
z Predict the labels of unlabeled nodes based on the known
network structure and labels
z Apply LP to address the sparsity problem based on the
assumption that similar nodes are more likely to be
categorized into same class.
z Application: Prediction of losing rate
2012-5-31 Dep. Phys., Univ. Fribourg 13
How to Rank Influential
Nodes?
LeaderRank
Add one ground node and N bi-directed links
1
1
( 1) ( )
N
ji
i jout
j j
A
s t s t
k
+
=
+ =∑ ( )( ) g ci i c s tS s t N= +
PageRank
if i points to j, is the number of leaders of i1ijA = outjk
Resilience to spammersEffectiveness Robustness
[PLoS One 6 (2011) e21202]
,0 ,0
1
1( 1) (1 ) [ (1 ) ] ( )out out
j j
N
ji
i jout k k
j j
A
s t c c s t
k N
δ δ
=
+ = + − − +∑
Experiments on Delicious dataset
z Ranking in directed networks
2012-5-31 Dep. Phys., Univ. Fribourg 14
A semi-local index
( )
( ) ( )
w u
Q u N w
∈Γ
= ∑
( )
( ) ( )L
u v
C v Q u
∈Γ
= ∑
N(w) is the number of the nearest and next
nearest neighbors
Centrality indices
Degree
Betweenness
Closeness
1
i
i
kD
N
= −
( )st
i
s i t st
iB σσ≠ ≠= ∑
\
1
i
ij
j V i
C
d
∈
= ∑
Small degree (4)
Large CL (14008)
Large degree (50)
Small CL (9388)
A B
A
B
[Physica A 391 (2012) 1777]
z Ranking in undirected networks
2012-5-31 Dep. Phys., Univ. Fribourg 15
z S is the set of scientists.
z P is the set of papers.
z if author i cites paper
z if i is the author of paper
p
si iout
P pa
Q
Q b
k
α
α
α∈
= ⋅∑
1p si i
i S
Q Q aα α
∈
= + ⋅∑
1iaα =
p
si iout
P pa
Q
Q b
k
α
α
α∈
= ⋅∑
1ibα =
,
0.644τ =
0.784τ =
CC=52 (38)
AP=3
CC=19 (70)
AP=158
[New J. Phys. 14 (2012) 033033]
z Ranking in bipartite networks
α
α
2012-5-31 Dep. Phys., Univ. Fribourg 16
z Paper2:
z A. Drăgulescu, Victor M.
Yakovenko, Exponential and
power-law probability
distributions of wealth and
income in the United Kingdom
and the United States,
Physica A 299 (2001) 213
z Paper1:
z G. Caldarelli, M. Marsili and
Y.-C. Zhang, A prototype
model of stock exchange,
EPL 40 (1997) 479
2012-5-31 Dep. Phys., Univ. Fribourg 17
Co-authorship network in econophysics
The size of the circle indicates the scientist’s score. The lighter color, the fewer citations of the author.
The width of the edges indicate the number of co-authorized papers.
Top-150
2012-5-31 Dep. Phys., Univ. Fribourg 18
What is Personalized
Recommendation?
z Recommender systems use the
personal information of a user (the
historical record of his activities and
possibly his profile) to uncover his habits
and to find out what the user may like.
z Recommender systems have already
been successfully applied in many e-
commerce web sites, such as
Amazon.com.
z Known information: the record of
interactions between users and objects,
the users’ profiles, the objects’
attributes, the content, the time stamps,
the user-user relationships, etc.
z Required information: whether a target
user will like an unselected object, and if
so, to what extent he/she likes it.
Basically, a personalized recommender
system should provide an ordered list of
unselected objects to every target user.
2012-5-31 Dep. Phys., Univ. Fribourg 19
Evaluation Metrics
z Accuracy
z Overall Ranking: AUC, Ranking Score, MAE,
RMSE, PCC, NDPM
z Top Recommended Objects: Precision, Recall,
F1-Measure
z Diversity
z Intra-diversity: Recommendations to a user are
diverse
z Inter-diversity: Recommendations to different
users are diverse
z Novelty
z Popularity
z Self-information
z Coverage
z Measures the percentage of objects that an
algorithm is able to recommend to users in the
system
2log ( / )U M kα α=
J. L. Herlocker et al., ACM Trans. Inf. Syst. 22 (2004) 5;
L. Lü et al., Physics Reports, doi:10.1016/j.physrep.2012.02.006
1
1( )
i
R
M
i O
N L k
ML αα= ∈
= ∑ ∑
1( ) ( , )
( 1)i
I L s o o
L L α βα β≠
= − ∑
( )
1 ijij
Q L
H
L
= −
( ) /dCOV L N N=
| |i Ti
rR
U E
α
α = −
M is the number of users
N is the number of objects
2012-5-31 Dep. Phys., Univ. Fribourg 20
Metrics for RS
z L. Lü et al., Physics Reports, doi:10.1016/j.physrep.2012.02.006
2012-5-31 Dep. Phys., Univ. Fribourg 21
Diffusion-based methods
'f Wf=
1
1 M l l
l l
a a
w
k k
α β
αβ
β =
= ∑
1
1 M l l
l l
a a
w
k k
α β
αβ
α =
= ∑
1
1
1 M l l
l l
a a
w
k k k
α β
αβ λ λ
α β
−
=
= ∑
f is the initial resource vector, W is the transfer matrix
Hybrid-PH
Accuracy
Inter-Diversity
Novelty
[T. Zhou et al., PNAS 107 (2010) 4511 ]
R
S
H
(
L
)
U
(
L
)
2012-5-31 Dep. Phys., Univ. Fribourg 22
Solving Diversity-accuracy
Dilemma via Preferential Diffusion
z At the last step (i.e., diffusing
from users to objects), the
amount of resource that an
object received is proportional
to , then the transfer matrix
is
z Where
z indicates the mean
value of over all the objects
having been collected by user
l
k εα
1
1 M l l
l
a a
w
k k
α β
αβ ε
β α
−
=
= Β∑
rk
ε
1
( )N lr r l lr rr a k k a k
ε ε
=Β = = Ε∑
( )lr ra k
εΕ
[Phys. Rev. E 83 (2011) 066119]
Accurate recommend low-degree objects
I(L)
2012-5-31 Dep. Phys., Univ. Fribourg 23
Solving the Sparsity Problem via
Information Coarse Graining
z Relevance is more important than correlation
z Negative ratings play positive role
Dense Sparse
[EPL 88 (2009) 68008]
[Physica A 390 (2011) 4486]
User1 [0,1,5]
User2 [0,5,1]
2012-5-31 Dep. Phys., Univ. Fribourg 24
Main challenges
z Diversity vs. Accuracy
z Data Sparsity
z Cold Start
z Vulnerability to Attacks
z Value of Time
z Social Filtering
z Evaluation of Recommendations
2012-5-31 Dep. Phys., Univ. Fribourg 25
Real Business!
2012-5-31 Dep. Phys., Univ. Fribourg 26
Real Applications I: Mobile Nets
z Collaborated with ZhuHai China Mobile Company (Heguang)
z 1000 target users respectively selected by LeaderRank and degree.
z Results show that LeaderRank can beat high-degree users.
2012-5-31 Dep. Phys., Univ. Fribourg 27
Book recommendation
实验组 推送人数 成功数 成功率
组1 16708 467 2.8%
组2 18415 165 0.9%
z Recommend books to users who has been inactive
for at least one month.
z Group1: Personalized
z Group2: Aggregated Analysis
Group1 Group2
Improvement 300%
2012-5-31 Dep. Phys., Univ. Fribourg 28
Real Application II: A
Recommender System in China
z Collaborated with Baifendian
Technology
Serve more than 200 e-commerce companies
www.baifendian.com
# visiting
# recommendation
# users
# sales
2012-5-31 Dep. Phys., Univ. Fribourg 29
Real Application III: Sina Blog
z Collaborated with Sina (新浪) & 智梦创科
What we can do:
i)
Recommend friends
ii)
Recommend blogs
iii)
Finding leadership for marketing
Topic-related & interested-based Social network
2012-5-31 Dep. Phys., Univ. Fribourg 30
Blog recommendations:
social-network-based & interest-based
2012-5-31 Dep. Phys., Univ. Fribourg 31
我们正在关注
信息时代多领域数据 交叉、融合及精炼
2012-5-31 Dep. Phys., Univ. Fribourg 32
Who am I?
2012-5-31 Dep. Phys., Univ. Fribourg 33
终极目标
Data 1
Data 2
Data 3
Data 4
Data 5
Data 6
分析,集成,精炼 黑匣子
复杂+复杂+复杂
不确定+不确定+不确定
1)不是所有的数据都是有价值
的(去除冗余和噪音)
2)在一个数据集上不凸显的特
征通过整合其他数据集凸现出来 黑匣子特征:◆更小的数据量
◆更优化的数据存
储格式
◆更有价值的信息
(更大的信息量)
简单
确定
清晰不清晰+不清晰+不清晰
数据的交叉融合 提供更好的服务
伟大而又艰难!
2012-5-31 Dep. Phys., Univ. Fribourg 34
交叉推荐
z 如果我们在分析名鞋库的数
据时,发现用户2和用户1
具有很强的相似性,我们就
可以把用户2在果皮网购买
的产品推荐给用户1,即使
用户1对于果皮网而言是一
个全新的用户。
用户进行交叉购物的一个示意图。
解决冷启动问题!
2012-5-31 Dep. Phys., Univ. Fribourg 35
最基本的问题
z 多个网络上用户身份识别与统一
实现匹配的重要前提:用户在不同网络上行为或兴趣的一致性!
2012-5-31 Dep. Phys., Univ. Fribourg 36
Related references
z L. Lü, T. Zhou, Link prediction in complex networks: A survey, Physica A 390, 1150 (2011).
z L. Lü, M. Medo, C. H. Yeung, Y.-C. Zhang, Z.-K. Zhang, T. Zhou, Recommender Systems, Physics Reports
(to be published). doi:10.1016/j.physrep.2012.02.006
z L. Lü, Y.-C. Zhang, C. H. Yeung, T. Zhou,Leaders in Social Networks, the delicious case, PLoS ONE 6(6): e21202 (2011).
z L. Lü, Z.-K. Zhang, T. Zhou, Zip’f Law Leads to Heaps’ Law: Analyzing Their Relation in Finite-Size Systems, PLoS ONE 5(12), e14139
(2010).
z L. Lü, D.-B. Chen, T. Zhou, Small world yields the most effective information spreading, New J. Phys. 13, 123005(2011).
z Y.-B. Zhou, L. Lü*,M. Li, Quantifying the influence of scientists and their publications: distinguishing between prestige and popularity, New J.
Phys. 14 (2012) 033033 .
z L. Lü*, W. Liu, Information filtering via preferential diffusion, Phys. Rev. E 83, 066119 (2011) .
z An Zeng, L. Lü*, Coarse Graining for Synchronization in Directed Networks, Phys. Rev. E 83, 056123 (2011).
z L. Lü, C.-H. Jin, T. Zhou, Similarity index based on local paths for link prediction of complex network, Phys. Rev. E 80, 046122 (2009).
z Z. Liu, Q.-M. Zhang, L. Lü*, T. Zhou, Link prediction in complex networks: a local naïve Bayes model, EPL 96, 48007 (2011).
z L. Lü, T. Zhou, Link Prediction in Weighted Networks: The Role of Weak Ties, EPL 89, 18001 (2010).
z W.-P. Liu, L. Lü*, Link Prediction based on Local Random walk, EPL 89, 58007 (2010).
z M.-S. Shang, L. Lü, Y.-C. Zhang, T. Zhou, Empirical analysis of web-based user-object bipartite networks, EPL 90, 48006 (2010).
z M.-S. Shang, L. Lü, W. Zeng, Y.-C. Zhang, T. Zhou, Relevance is More Significant than Correlation: Information Filtering on Sparse Data,
EPL 88, 68008 (2009).
z T. Zhou, L. Lü, Y.-C. Zhang, Predicting Missing Links via Local Information, Eur. Phys. J. B 71, 623-630 (2009).
z D. Chen, L. Lü*, M.-S. Shang, Y.-C. Zhang, T. Zhou, Identifying influential nodes in complex networks, Physica A 391(4)1777-1787 (2012).
z D. Chen, G. Cimini, M. Medo, L. Lü, Y.-C. Zhang, T. Zhou, Enhancing topology adaptation in information-sharing social networks, Phys. Rev.
E 85, 046108 (2012).
Review articles
Research articles
2012-5-31 Dep. Phys., Univ. Fribourg 37
Thank you!Thank you!
Acknowledgements to my collaborators
Information Filtering on Complex Networks: � Prediction, Ranking and Recommendation
Outline
Background
幻灯片编号 4
Why it is significant?
CS & Physics
What is Link Prediction?
New Local Indices
Experimental Results
New Network Types
幻灯片编号 11
Others
How to Rank Influential Nodes?
幻灯片编号 14
幻灯片编号 15
幻灯片编号 16
Co-authorship network in econophysics��The size of the circle indicates the scientist’s score. The lighter color, the fewer citations of the author. �The width of the edges indicate the number of co-authorized papers.
What is Personalized Recommendation?
Evaluation Metrics
Metrics for RS
Diffusion-based methods
Solving Diversity-accuracy Dilemma via Preferential Diffusion
Solving the Sparsity Problem via Information Coarse Graining
Main challenges
幻灯片编号 25
Real Applications I: Mobile Nets
Book recommendation
Real Application II: A Recommender System in China
Real Application III: Sina Blog
Blog recommendations:�social-network-based & interest-based
我们正在关注
幻灯片编号 32
终极目标
交叉推荐
最基本的问题
Related references
幻灯片编号 37