An Hypergraph Data Model for Expert Finding in Multimedia Social Networks

The widespread use of multimedia data within Online Social Networks (OSNs) has led to the birth of a new generation of OSNs, called Multimedia Social Networks (MSNs). These networks, which are of particular interest for Social Network Analysis (SNA) applications, combine information on users, belonging to one or more social communities, with all the multimedia content that can be generated and used in the related environments. In this work, we present a novel expert-finding technique exploiting a hypergraph-based data model for MSNs. In particular, user-ranking measures, computed by considering only particular useful hyperpaths, are used to evaluate the expertness degree of each user with respect to a given social topic. Several preliminary experiments on Last.fm show the effectiveness of the proposed approach and encourage future work in this direction.


Introduction
According to the 2018 Global Digital report [1], Internet users exceeded 4 billion people, i.e., more than half of the world's population is online. The advent of Online Social Networks (OSNs) has revolutionized communication, especially for the younger generations. Through social media, above all Facebook and Instagram, young people exchange information, share photos, and spread data in real time; such a speed of communication was unthinkable before the birth of social media.
Schneider et al. [2] define OSNs as user communities composed of people who share common interests, activities, backgrounds, and/or friendships and can interact with others in numerous ways, directly or by means of posted information. In [3], the authors define OSNs as a new, particular type of virtual community, while in [4] the authors associate OSNs with advanced social networking applications.
The evolution of Information and Communication Technologies has enriched OSN features, enabling users to share their interests, tastes, friendships, and behaviors by using multimedia objects, such as text, audio, video, and images. The use of such multimedia objects facilitates users' interaction with the OSN, since they allow users to express more faithfully the sentiments, comments, opinions, or feelings related to posted data. Such a new kind of OSN, where users mainly share multimedia objects, is called a Multimedia Social Network (MSN) [5,6].
In this type of network, the many relationships among heterogeneous entities can be used to improve classical Social Network Analysis (SNA) applications. For example, an ever-increasing number of commercial and business activities exploit the possibility of sharing data and information very quickly through social networks, not only to sell their products but also to pursue marketing strategies aimed at retaining an ever-growing number of customers.
The potential of social media to influence online shopping has been the subject of various sociological and marketing studies, which have highlighted that users rely on social media to exchange tips when buying a product. In particular, numerous surveys [7] report that about 75% of millennials can be influenced, when buying a product online, by the posts of their Facebook friends and, above all, by the posts of so-called influencers.
In this work, we present a novel expert-finding technique within social networks exploiting a hypergraph-based data model for MSNs. In particular, the hypergraph model permits easy representation of all the described MSN relationships for supporting different SNA applications, such as finding experts, by defining novel ranking criteria. User-ranking measures, obtained considering only particular useful hyperpaths, can be profitably adopted to evaluate the expertness degree with respect to a given social topic. A hypergraph-building strategy has also been proposed to exploit data of different OSNs such as Yelp, Flickr, Twitter, and so on.
The paper is organized as follows. Section 2 discusses the related work concerning expert-finding within social networks. Section 3 describes the adopted MSN data model together with the related ranking/centrality measures, and the hypergraph-building process. Section 4 outlines the expert-finding system architecture with some implementation details. Finally, Section 5 reports some experiments using a dataset from Last.FM and provides conclusions and future work.

Related Work
The large diffusion of OSNs has raised new challenges and opportunities for expert-finding applications. In particular, different techniques focusing on human behavior have been investigated.
Different approaches to user activity modeling and recognition have been investigated by Jin et al. [8]. From a different perspective, the authors also analyze network traffic to infer information about the user. Furthermore, a methodology named ClickStream [9], based on the analysis of user browsing activity, uses a Markov model to identify users' patterns. Another ClickStream model has been proposed in [10] using a first-order Markov chain. Similarly, Schneider et al. [2] investigate ClickStream data to unveil user patterns. However, the limited amount of ClickStream data available allows these approaches to be adopted only for monitoring purposes.
Moreover, the widespread use of OSNs has led to many security issues concerning private information, which can be stolen through different types of attack, as well as to spam and Sybils. For this reason, several approaches to anomaly detection have been proposed, using supervised learning [11,12], unsupervised learning [13,14], statistical modeling, etc. In our opinion, anomaly-detection techniques can be classified into two categories: (i) approaches based on user behavior, which analyze user actions on OSNs; and (ii) approaches based on graph topology.
Graph-based techniques rely on network topology by using different ranking measures. Social spamming on Twitter has been investigated in [15], defining a machine-learning approach based on shared URL properties. An anomaly-detection approach based on a graph data structure has been proposed in [16] for identifying anomalous users. In [17], an approach based on fuzzy techniques has been proposed for identifying anomalies in unlabeled OSNs. Furthermore, graph-traversal queries have been used for profiling researchers over the DBLP dataset [18], while in [19,20] data quality techniques have been applied to longitudinal data.
Behavior-based approaches analyze user activity on OSNs to unveil behavior patterns according to a set of signatures. Wang et al. [14] developed a tool based on ClickStream for identifying fake identities in OSNs, analyzing the difference between real users and Sybils. Another approach, using Principal Component Analysis (PCA), has been proposed in [21], demonstrating that normal user behavior is low-dimensional along a set of latent features chosen by PCA. ClickStream is also used for sequence modeling. Indeed, Ye et al. [22] propose a Markov chain model to represent temporal profiles of normal behavior. Furthermore, an architecture for anomaly detection in wireless networks exploits clustering approaches. In [23], a Markov model on a graph-based knowledge base, built on instruction traces of the target executable, has been proposed for malware detection. Two further variants of the Markov model are proposed in [10,14]. In this paper, we use an approach similar to user ClickStream, defining a more complex probabilistic model based on the concept of possible worlds to detect anomalous activities [24].
To the best of our knowledge, techniques based on action sequences in user data logs have not been properly explored for identifying experts in MSNs. Furthermore, multimedia data plays a key role in several applications, such as recommender systems [25]. The proposed approach exploits graph-based formalisms to model normal human behavior and advanced reasoning techniques for detecting anomalies, which are closest to the approaches used for video surveillance [26][27][28].

Modeling MSNs
In this section, we describe the proposed MSN model, which is composed of heterogeneous entities and several relationships between them. In particular, we can identify the following three entities: Users, i.e., people or organizations described by information about profile, interests, preferences, and so on; Objects, i.e., multimedia content that can be described by low- and high-level features; and Topics, i.e., the main words or phrases derived from comments or tags.
Furthermore, it is possible to define several types of relationships between the above-defined entities: for instance, a user can share a photo, provide comments or feedback and so on. The hypergraph data model has been proposed to deal with the high variety and intrinsic complexity of these relationships.
To better describe the proposed model, we introduce the following definitions. A hypergraph can be represented by an incidence matrix $H$, whose entries record which vertices belong to which hyperedges. In our model, we consider both vertices and hyperedges as abstract data types whose attributes are identified using "dot notation"; for instance, $e_i.time$ represents the time instant at which a given action was performed.
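As an illustration of the incidence-matrix representation and of the dot notation, the following minimal sketch builds $H$ for a toy hypergraph. It assumes the common 0/1 convention (an entry is 1 when the vertex belongs to the hyperedge and 0 otherwise), which may differ from the exact entries used in the model; the `Hyperedge` class, the vertex names, and the `time` values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hyperedge:
    """Hyperedge as an abstract data type; attributes are read with dot notation."""
    id: str
    members: frozenset  # vertices joined by the hyperedge
    time: int           # hypothetical timestamp of the action, read as e.time

# Toy data (illustrative only): two users, one object, one topic.
vertices = ["u1", "u2", "o1", "t1"]
edges = [
    Hyperedge("e1", frozenset({"u1", "u2"}), time=1),        # e.g. a friendship
    Hyperedge("e2", frozenset({"u1", "o1", "t1"}), time=2),  # e.g. an annotation
]

def incidence_matrix(vertices, edges):
    """Return H with one row per vertex and one column per hyperedge.

    Assumed convention: H[v][e] = 1 if vertex v belongs to hyperedge e, else 0.
    """
    return [[1 if v in e.members else 0 for e in edges] for v in vertices]

H = incidence_matrix(vertices, edges)
for v, row in zip(vertices, H):
    print(v, row)
```

Each column of `H` lists the vertices of one hyperedge, so a hyperedge joining three vertices has three non-zero entries in its column.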

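Definition 2 (Social Path). A Social Path between vertices $v_{s_1}$ and $v_{s_k}$ of an MSN is a sequence of distinct vertices ($v_{s_i}$) and hyperedges ($e_{s_i}$), $v_{s_1}, e_{s_1}, v_{s_2}, \ldots, e_{s_{k-1}}, v_{s_k}$, in which each hyperedge $e_{s_i}$ connects $v_{s_i}$ to $v_{s_{i+1}}$. The length of the path is $\alpha \cdot \sum_{i=1}^{k-1} \frac{1}{\omega(e_{s_i})}$, $\alpha$ being a normalizing factor. We say that a Social Path contains a vertex $v_h$ if $\exists\, e_{s_i} : v_h \in e_{s_i}$.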
The length of a Social Path is defined in terms of the weights of its hyperedges, so that it can be used to evaluate the distance between two users of a social network. In particular, a Social Path can connect two users "directly", because they are "friends", or "indirectly", because, for example, they commented on the same video. Furthermore, we derive the weight of a Social Path from its weighted length, so that the weight decreases as the number of steps required to reach a given user grows.
To this end, we define the minimum distance $d_{min}(v_k, v_j)$, the maximum distance $d_{max}(v_k, v_j)$, and the average distance $d_{avg}(v_k, v_j)$ between two vertices of an MSN as the length of the shortest hyperpath, the length of the longest hyperpath, and the average length of the hyperpaths between $v_k$ and $v_j$, respectively. In a similar manner, we define the minimum distance $d_{min}(v_k, v_j \mid v_z)$, the maximum distance $d_{max}(v_k, v_j \mid v_z)$, and the average distance $d_{avg}(v_k, v_j \mid v_z)$ between two vertices $v_k$ and $v_j$ by considering only the hyperpaths between them that contain $v_z$. It is therefore possible to define a set of neighbors of a given vertex $v_k$ according to the defined distance measures.
Definition 3 (λ-Nearest Neighbors Set). Given a vertex $v_k \in V$ of an MSN, we define the λ-Nearest Neighbors Set of $v_k$ as the subset of vertices $NN^{\lambda}_{k}$ such that $\forall v_j \in NN^{\lambda}_{k}$ we have $d_{min}(v_k, v_j) \leq \lambda$ with $v_j \in U$. Considering only the constrained hyperpaths containing a vertex $v_z$, we denote by $NN^{\lambda}_{kz}$ the set of nearest neighbors of $v_k$ such that $\forall v_j \in NN^{\lambda}_{kz}$ we have $d_{min}(v_k, v_j \mid v_z) \leq \lambda$.
In more detail, we define the λ-Nearest Users Set ($NNU^{\lambda}$) and the λ-Nearest Objects Set ($NNO^{\lambda}$) for the cases in which only users or only multimedia objects, respectively, are considered as neighbors. These sets will be used for assigning an expert score to a user according to a novel centrality measure.
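The minimum distance and the λ-Nearest Users Set can be sketched as follows. This is a minimal illustration, not the system's implementation: it assumes that a hyperedge lets a path move between any two of its member vertices, that each traversed hyperedge $e$ contributes $\alpha / \omega(e)$ to the path length, and that weights lie in $(0, 1]$. The toy hyperedges and the function names are hypothetical.

```python
import heapq

def min_distances(source, hyperedges, alpha=1.0):
    """Dijkstra over a hypergraph: d_min from `source` to every reachable vertex.

    `hyperedges` is a list of (members, weight) pairs; crossing a hyperedge
    costs alpha / weight (assumed reading of the Social Path length).
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale queue entry
        for members, weight in hyperedges:
            if v not in members:
                continue
            step = alpha / weight
            for w in members:
                if w != v and d + step < dist.get(w, float("inf")):
                    dist[w] = d + step
                    heapq.heappush(heap, (d + step, w))
    return dist

def nearest_neighbors(source, hyperedges, lam, candidates, alpha=1.0):
    """Vertices of `candidates` (e.g. the user set U) with d_min <= lam."""
    dist = min_distances(source, hyperedges, alpha)
    return {v for v in candidates
            if v != source and dist.get(v, float("inf")) <= lam}

# Toy MSN (illustrative only).
hyperedges = [
    (frozenset({"u1", "u2"}), 1.0),        # friendship, cost 1
    (frozenset({"u2", "o1", "t1"}), 0.5),  # u2 annotates o1 with t1, cost 2
    (frozenset({"u3", "o1"}), 0.5),        # u3 comments on o1, cost 2
]
users = {"u1", "u2", "u3"}
print(nearest_neighbors("u1", hyperedges, lam=1, candidates=users))
print(nearest_neighbors("u1", hyperedges, lam=5, candidates=users))
```

Passing the object set instead of `users` as `candidates` gives the corresponding λ-Nearest Objects Set under the same assumptions.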

Relationships
Several relationships can be established between MSN entities that can be classified into the following three categories: (i) User-to-User, (ii) similarity and (iii) User-to-Object.
A formal definition of each category is given below.
Definition 4 (User-to-User relationship). Let $\hat{U} \subseteq U$ be a subset of users in an MSN; we define as a user-to-user relationship each hyperedge $e_i$ that connects users of $\hat{U}$ to one another. Membership and following are typical examples of User-to-User relationships.

Definition 5 (Similarity relationship).
Let $v_k, v_j \in V$ ($k \neq j$) be two vertices of the same type of an MSN; we define as a similarity relationship each hyperedge $e_i$ with $V^{+}_{e_i} = v_k$ and $V^{-}_{e_i} = v_j$. The weight function for this relationship returns a similarity value between the two vertices.
In our view, it is possible to define a similarity function ($f_{sim} : V \times V \rightarrow \mathbb{R}$) between two users, according to their interests, profile, or preferences; between two objects, based on high- and low-level features; and between two annotation assets, based on ontologies or vocabularies. A threshold $\theta$ can be chosen so that a similarity hyperedge is generated only if $\omega(e_i) \geq \theta$.
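The thresholding step can be illustrated with a short sketch. The choice of Jaccard similarity over tag sets as $f_{sim}$ is an assumption made only for this example; the function names and the toy feature sets are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Assumed f_sim: Jaccard similarity between two feature (tag) sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_hyperedges(features, theta, f_sim=jaccard):
    """Return (v_k, v_j, weight) for each pair of same-type vertices
    whose similarity reaches the threshold theta."""
    edges = []
    for (vk, fk), (vj, fj) in combinations(sorted(features.items()), 2):
        weight = f_sim(fk, fj)
        if weight >= theta:  # generate the hyperedge only if omega(e_i) >= theta
            edges.append((vk, vj, weight))
    return edges

# Toy multimedia objects described by tags (illustrative only).
objects = {
    "o1": {"rock", "guitar"},
    "o2": {"rock", "guitar", "live"},
    "o3": {"rap"},
}
print(similarity_hyperedges(objects, theta=0.5))
```

Because the assumed similarity is symmetric, each pair is emitted once; a directed model would add the reverse hyperedge with the same weight.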
Definition 6 (User-to-Object relationship). Let $\hat{U} \subseteq U$ be a set of users in an MSN and $\hat{O} \subseteq O$ a set of objects; we define as a user-to-object relationship each hyperedge $e_i$ that links users of $\hat{U}$ to objects of $\hat{O}$. Note that the set $V^{-}_{e_i}$ can also contain one or more topics, as in annotation, review, and comment relationships. Other examples of this category are the publishing and reaction relationships.

Hypergraph-Building
The proposed approach for MSN building is made up of three steps: (i) hypergraph structure construction; (ii) topic distribution; (iii) similarity learning. Nodes and hyperedges of the proposed model are built from the crawled information about users, objects, and annotations. Furthermore, a Latent Dirichlet Allocation (LDA) approach [29] is used for learning and inferring the most important topics to build user-to-object relationships. In particular, we discover topics by analyzing the tags used in the annotation of multimedia objects, combining statistical (co-occurrence values) and semantic (general-purpose or domain-specific lexical databases) information.
Different strategies [30] can then be used to generate similarity hyperedges between users, multimedia objects, and topics.
Finally, the global and topic-sensitive ranking of the hypergraph is performed with respect to the discovered topics.
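The first two steps can be outlined with a small sketch that turns crawled annotation records into typed vertices and annotation hyperedges. The record layout, the field names, and the unit weight are assumptions made for illustration. Topic inference is replaced here by a trivial most-frequent-tag stand-in, which is not LDA and only marks the place where a trained topic model would be plugged in; step (iii) is omitted.

```python
from collections import Counter

def infer_topic(tags):
    """Stand-in for topic inference: returns the most frequent tag.

    This is NOT LDA; a trained topic model would replace this function.
    """
    return Counter(tags).most_common(1)[0][0]

def build_hypergraph(records, topic_of=infer_topic):
    """Steps (i) and (ii): typed vertices plus one annotation hyperedge per record."""
    vertices = {"users": set(), "objects": set(), "topics": set()}
    hyperedges = []
    for rec in records:
        topic = topic_of(rec["tags"])
        vertices["users"].add(rec["user"])
        vertices["objects"].add(rec["object"])
        vertices["topics"].add(topic)
        hyperedges.append({
            "type": "annotation",
            "source": {rec["user"]},           # the annotating user
            "target": {rec["object"], topic},  # the object and its topic
            "weight": 1.0,                     # placeholder weight (assumption)
        })
    return vertices, hyperedges

# Hypothetical crawled records.
records = [
    {"user": "u1", "object": "o1", "tags": ["rock", "rock", "live"]},
    {"user": "u2", "object": "o1", "tags": ["rap", "rap"]},
]
vertices, hyperedges = build_hypergraph(records)
print(vertices["topics"], len(hyperedges))
```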

Centrality Measures for Expert-Finding
Centrality measures are a key tool in Social Network Analysis for ranking the user nodes of an MSN. Specifically, the "relevance" of a user in a given community can be represented by centrality measures and is useful for different applications.
Although different centrality measures have been proposed in the literature, in this paper we propose two novel ones for identifying experts on a given topic or domain, based on the analysis of the information in the MSN. Our idea is to relate the rank of a given node to the concept of influence, measured by the number of user nodes that are "reachable" within a certain number of steps using any hyperpath, with respect to a social community of users and, possibly, to a given topic of interest. As in most well-known influence-diffusion models, the influence of a node decays with the path distance needed to reach the other nodes.
In particular, we exploit the "neighborhood" concept among users, through the λ-Nearest Neighbors Set in MSNs, to define the centrality measures.

Definition 7 (Neighborhood Centrality).
Let $v_k \in V$ be a vertex of an MSN and $\lambda$ a given threshold; the neighborhood centrality of $v_k$ is defined in terms of the cardinality of $NN^{\lambda}_{k}$, $NN^{\lambda}_{k}$ being the λ-Nearest Users Set of $v_k$.
In summary, we define the neighborhood centrality of a vertex according to the number of users that can reach it within a given number of hops. It is also possible to compute local centrality measures restricted to a given community ($\hat{U} \subseteq U \subseteq V$). In this manner, centrality expresses the importance of a user within the related community. We refer to this kind of measure as user centrality.
In addition, to give more importance to user-to-content relationships when computing distances for the user neighborhood centrality, we can apply a penalty if the considered hyperpaths contain some users; in this way, all the distances can be computed as $\tilde{d}(v_k, v_j) = d(v_k, v_j) + \beta \cdot N$, $N$ being the number of user vertices in the hyperpath between $v_k$ and $v_j$ and $\beta$ a scaling factor. This strategy has been chosen because an expert, in our opinion, is characterized by their behavior on the MSN, as described by the multimedia objects and annotation assets they publish.
Finally, a topic-sensitive user neighborhood centrality has been defined by considering, in the distance computation, only the hyperpaths that contain a given topic node.

Definition 8 (Topic-sensitive user neighborhood centrality). Given a user $u_k \in U$ and a subset of users $\hat{U} \subseteq U$ ($u_k \notin \hat{U}$) of an MSN, a topic-sensitive user neighborhood centrality function (MSNTUR) is a particular function $nc(u_k \mid \hat{U}, t_z) : U \times T \rightarrow [0, 1]$ able to associate a specific rank to the user $u_k$ with respect to the community $\hat{U}$ given the topic $t_z$, $\hat{U}$ being a user community, $u_k$ a single user, and $t_z$ a given topic.
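A topic-sensitive neighborhood centrality with the user penalty can be sketched by enumerating simple hyperpaths on a small graph. This is only an illustration under several assumptions that are not stated in the text: the score is normalized by the size of the community, each traversed hyperedge costs $\alpha / \omega(e)$, $N$ counts the user vertices strictly between the two endpoints, and a hyperpath contains the topic when one of its hyperedges includes the topic vertex. Exhaustive enumeration is exponential and is used here only for clarity; all names and data are hypothetical.

```python
def topic_sensitive_centrality(u_k, community, topic, hyperedges, users,
                               lam, alpha=1.0, beta=0.0):
    """Fraction of `community` reachable from u_k within penalized distance
    `lam` through a simple hyperpath that contains `topic`."""
    reached = set()

    def visit(v, length, visited, used, has_topic, n_users):
        # n_users: user vertices strictly between u_k and v on this path.
        if (v != u_k and v in community and has_topic
                and length + beta * n_users <= lam):
            reached.add(v)
        extra = 1 if (v != u_k and v in users) else 0
        for idx, (members, weight) in enumerate(hyperedges):
            if idx in used or v not in members:
                continue
            new_length = length + alpha / weight
            if new_length > lam:  # lengths only grow, so this branch is dead
                continue
            for w in members - visited:  # keep vertices distinct
                visit(w, new_length, visited | {w}, used | {idx},
                      has_topic or topic in members, n_users + extra)

    visit(u_k, 0.0, {u_k}, frozenset(), False, 0)
    return len(reached) / len(community) if community else 0.0

# Toy MSN (illustrative only).
users = {"u1", "u2", "u3"}
hyperedges = [
    (frozenset({"u1", "o1", "rock"}), 1.0),  # u1 annotates o1 with topic rock
    (frozenset({"u2", "o1"}), 1.0),          # u2 comments on o1
    (frozenset({"u2", "u3"}), 1.0),          # friendship between u2 and u3
]
community = {"u2", "u3"}
print(topic_sensitive_centrality("u1", community, "rock", hyperedges, users, lam=2))
print(topic_sensitive_centrality("u1", community, "rock", hyperedges, users,
                                 lam=3, beta=1.0))
```

In the toy data, `u3` can be reached only through the intermediate user `u2`, so a positive `beta` pushes it beyond the threshold while leaving `u2`, reached through content, unaffected.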

System Architecture
An overview of the proposed system architecture is shown in Figure 1. It consists of three main modules: Data Ingestion, Knowledge Management, and Social Network Analysis, the last of which provides the expert-finding tools. In the first module, data coming from heterogeneous OSNs (such as Facebook, Twitter, LastFM, etc.) are crawled by using their own APIs and stored in the NoSQL columnar database Cassandra (http://cassandra.apache.org/), which is suited to storing large amounts of data.
The Knowledge Management module extracts information from the NoSQL database to build the MSN data model (Hypergraph-Building Module) and stores it in HypergraphDB (http://www.hypergraphdb.org/), a NoSQL database based on the hypergraph data structure. We chose this database because it natively supports the hypergraph data structure, allowing traversal queries to be performed directly on the proposed data model. Finally, the Social Network Analysis module is composed of two parts: the Expert-Finding module, which relies on HyperX (https://github.com/jinhuang/hyperx), a framework built upon Apache Spark (https://spark.apache.org/) for processing hypergraphs, to rank users using centrality with respect to a given topic; and the Visualization module, based on the Jung API (http://jung.sourceforge.net/), which represents the analyzed network and provides insights about it. The Jung framework has been chosen because, on the one hand, it allows the design of a property graph with labelled nodes and edges and, on the other hand, it makes it easy for end users to browse the graph.

Experimental Results
In this section, we describe the preliminary experiments for evaluating the effectiveness of the proposed approach, based on a music collection (http://carl.cs.indiana.edu/data/last.fm/) composed of a set of data extracted from Last.FM, whose details are shown in Table 1. In addition, with the help of some domain experts, we preliminarily classified the songs belonging to the dataset with respect to the related musical genre (e.g., rap, pop, rock, etc.). We built an MSN considering users, multimedia items, and topics (inferred by applying the LDA approach described in Section 3 to the songs' tags) as nodes, and friendship, membership, annotation, and user and multimedia similarity as edges; the two similarities are computed, respectively, according to Last.FM's neighborhood measurements and to Spotalike (http://www.spotalike.com/) facilities in conjunction with the Last.FM score between two songs. In particular, we use Spotalike to compute the similarity score between two songs using low-level features. Table 2 reports the main characteristics of the generated MSN.
Figure 2 shows the average values of the topic-sensitive user neighborhood centrality score for each community as λ varies. We can note that these communities have a strong degree of interconnection among users: even for low values of λ, each node rapidly reaches the highest ranking value. Thus, as Figure 2 shows, the ranking values of all users converge to the same value as λ increases.
We compare the proposed ranking method based on neighborhood centrality, choosing λ = 2, with some well-known approaches (PageRank, K-Step Markov, and Topic-Sensitive Influence Mining [31]) and with a human-generated ranking (representing the gold standard of users within the pop, rap, and pop-rap communities). Specifically, we asked a group of our students to rank user expertness regarding the different communities, considering the number and relevance of the related comments. Table 3 shows the obtained results in terms of Kendall's Tau (τ) and Spearman's Rank Correlation (ρ) coefficients. We notice that our topic-sensitive user ranking is the closest to the human ground truth, because it combines topological and semantic information for finding relevant users with respect to a given topic. In our opinion, this approach could be useful in several applications (e.g., multimedia recommendation, influence analysis, and so on) for identifying relevant users who can suggest a given item or promote a given product. The obtained results show the quality of the approach in detecting experts with respect to the human ground truth, and encourage future work in this direction.
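For reference, the two rank-correlation coefficients used in the comparison can be computed as in the following sketch, which assumes rankings without ties (so that the simple closed-form expressions apply). The user identifiers and the two toy rankings are hypothetical and are not taken from the experiments.

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau for two rankings without ties (dicts: item -> rank)."""
    items = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        sign = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(items)
    return (concordant - discordant) / (n * (n - 1) / 2)

def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two rankings without ties."""
    n = len(rank_a)
    d2 = sum((rank_a[x] - rank_b[x]) ** 2 for x in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))

# Toy rankings (illustrative only): position 1 is the top expert.
human = {"u1": 1, "u2": 2, "u3": 3, "u4": 4}
system = {"u1": 1, "u2": 3, "u3": 2, "u4": 4}
print(kendall_tau(human, system), spearman_rho(human, system))
```

Both coefficients equal 1 when the system ranking matches the human one exactly and -1 when it is fully reversed.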

Conclusions
In this paper, we proposed a novel expert-finding technique based on a hypergraph-based data model for MSNs. The results obtained on a Last.FM dataset show the effectiveness of the proposed approach.
Future work will be devoted to extending the experimentation of our system prototype to other multimedia social networks, and to using this expert-finding methodology to support applications such as multimedia recommendation, influence analysis, and so on.
Author Contributions: F.A. and G.S. conceived of the presented idea, directed the project and co-wrote the paper. G.C. developed the application prototype and, with G.S., carried out the experimental phase. All authors discussed the results and contributed to the writing of the final manuscript.
Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.