COURSE INTRODUCTION

This is an advanced course on the study of complex networks. The aim is to understand the structural properties that are commonly present across many complex networks and their implications for the underlying systems. The course will also give you the opportunity to carry out investigations on real-world complex networks.

Primary Textbook
The textbook for the course is Networks, Crowds, and Markets: Reasoning about a Highly Connected World by David Easley and Jon Kleinberg. We will cover most of the chapters from this textbook. The book is available on Flipkart and Amazon.




MARKS DISTRIBUTION

The major components of the course are as follows:

Component Percentage of Marks Allocated
Concepts   20%
All Lab Projects   30%
Final Project   50%



SYLLABUS

Please note: the items listed below do not correspond to individual 50-minute class sessions; they are simply the topics that will be covered.

Sl. No.         Topic         Reference Material
1         Introduction to Graph Theory and Python        
2         Analyzing Online Social Network Datasets        
3         Power Law and Emergent Properties        
4         Strength of Weak Ties        
5         Homophily and Social Influence        
6         Structural Balance        
7         The Structure of the Web        
8         Link Analysis and Web Search        
9         Link Prediction        
10         Information Cascades        
11         Diffusion Behavior in Networks        



PROJECTS

1         Estimating Degree Ranking Locally        
      A.  Rank Me Thou Shalln’t Compare Me!:
We are humans, and humans always compare. We measure our importance in terms of others, whether it is money, grades, profit, or loss. In complex networks such as online social networks, the WWW, and collaboration networks, a node may be interested in finding its importance by computing its ranking in the network. Many centrality measures, such as degree centrality, closeness centrality, betweenness centrality, eigenvector centrality, coreness, and PageRank, have been proposed to measure the importance of a node in different application contexts. To obtain the exact ranking of a node, however, we need the entire network, and the size of real-world complex networks is growing rapidly; it is infeasible to completely collect, store, and process them. In this project, we estimate the global ranking of a node using only its local information. The ranking can be based on any centrality measure, depending on the requirement. We focus on proposing algorithms that use the local information of the node together with other properties of real-world networks, such as power-law degree distribution, high clustering coefficient, the small-world phenomenon, assortativity, and so on. This would be a new direction for approximation algorithms in complex networks. A minimal sketch of the idea appears below.
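As a rough illustration of the local-estimation idea, here is a minimal sketch assuming a networkx graph and a simple power-law extrapolation; the function, the assumed exponent gamma, and the estimator itself are illustrative, not the project's final method.

import networkx as nx

def estimate_degree_rank(G, node, gamma=2.5):
    """Roughly estimate the global degree rank of `node` from local information.

    Assumes the degree distribution follows a power law, so that the expected
    number of nodes with degree >= d is roughly n * (d / d_min)**(1 - gamma).
    Only the node's own degree, its neighbours' degrees, and the network size
    n are used (in a fully local setting, n itself would also be estimated).
    """
    n = G.number_of_nodes()
    d = G.degree(node)
    neigh_degs = [G.degree(v) for v in G.neighbors(node)]
    d_min = max(1, min(neigh_degs + [d]))
    expected_higher = n * (d / d_min) ** (1 - gamma)
    return max(1, round(expected_higher))

# Toy usage on a synthetic scale-free graph.
G = nx.barabasi_albert_graph(10000, 3, seed=42)
node = max(G.nodes, key=G.degree)      # pick the highest-degree node
print("estimated rank:", estimate_degree_rank(G, node))
true_rank = sorted(dict(G.degree()).values(), reverse=True).index(G.degree(node)) + 1
print("true rank:", true_rank)

The point of the sketch is only that the node never looks beyond its immediate neighbourhood, yet still obtains a usable rank estimate.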

2         Privacy Preserving Analysis of Social Networks
      A.  The Truth or Falsehood of Zodiac Sign Compatibility:
The term zodiac is derived from the Latin word "zodiacus", which translates to "the circle of animals". The concept of zodiac signs is found across Western, Hindu, and Chinese astrology, and each tradition advocates the compatibility strengths of the various zodiac signs. For example, you might not be surprised to hear that arranged marriages across India follow Hindu astrology to check the zodiac sign compatibility of the couple prior to marriage. Although the practice of astrology dates back to the 2nd millennium BCE, many objections have been raised against it.
The aim of this project is to validate or debunk the zodiac sign compatibility claims. In order to do so, we study the social network of a large group of individuals (e.g., IIT Ropar students), with the intention of analysing the strength of the friendship/enmity relationships among them. Gathering such genuine relationship data from the real world, while preserving privacy, would allow us to comment on the truth or falsehood of the astrological claims. Hence, the project is twofold, consisting of data collection and social network analysis; a rough sketch of the analysis step appears below.
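As a rough sketch of the analysis step (the edge list and sign convention here are hypothetical), one could compare the fraction of positive ties for each pair of zodiac signs against the overall baseline:

from collections import defaultdict

# Hypothetical edge list: (sign of person A, sign of person B, tie strength),
# where a positive strength denotes friendship and a negative one enmity.
edges = [
    ("Aries", "Leo", +1), ("Aries", "Cancer", -1),
    ("Leo", "Sagittarius", +1), ("Cancer", "Leo", -1),
]

pos = defaultdict(int)
total = defaultdict(int)
for a, b, w in edges:
    pair = tuple(sorted((a, b)))
    total[pair] += 1
    if w > 0:
        pos[pair] += 1

baseline = sum(pos.values()) / sum(total.values())
for pair, t in total.items():
    print(pair, "positive fraction:", pos[pair] / t, "baseline:", round(baseline, 2))

Zodiac pairings whose positive fraction differs markedly from the baseline (on real data, with proper significance testing) would be the interesting cases.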

      B.  Secure Social Network Analysis:
Social network analysis has been applied to a diverse set of fields, including organizational behavior, sociology, economics, and biology. However, for sensitive networks such as hate networks, trust networks, and sexual networks, these techniques have been used only sparsely. This is largely attributed to the unavailability of network data. Anonymization is the most commonly used technique for performing privacy-preserving network analysis. The process requires a trusted third party, who knows the complete network and releases a sanitized version of it. The literature points to several drawbacks of this technique.
The current project looks at an alternative solution in which the desired analysis is performed in a distributed setting such that (a) no central third party is required and (b) the topology of the underlying network is kept hidden. The key is to design data-oblivious algorithms for the various social network analysis methods, which can then be translated into secure multiparty computation protocols that achieve the same. A toy sketch of the flavour of such protocols follows.
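To give the flavour of such protocols, here is a toy sketch using additive secret sharing to compute an aggregate (the total number of edges held by several parties) without any party revealing its own value. This is only an illustration of the idea, not a full secure multiparty computation protocol: it assumes honest parties and secure channels, and the per-party values are hypothetical.

import random

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(secret, n_parties):
    """Split `secret` into n additive shares that sum to it modulo PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

# Each party holds the number of edges in its own (private) portion of the network.
private_edge_counts = [120, 87, 301]
n = len(private_edge_counts)

# Every party secret-shares its value with the others...
all_shares = [share(v, n) for v in private_edge_counts]
# ...each party locally sums the shares it receives...
partial_sums = [sum(all_shares[p][i] for p in range(n)) % PRIME for i in range(n)]
# ...and only the combined total is revealed.
total = sum(partial_sums) % PRIME
print("total edges across all parties:", total)   # 508; individual values stay hidden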

3         Understanding Knowledge Building Dynamics
      A.  Predicting the Completeness of a given Wikipedia Article
This project is about analyzing the publicly available data dump of Wikipedia. Past literature looks into the completeness of a crowdsourced enumeration query, where different users in the crowd answer a query whose answer takes the form of a list. For example, suppose I ask 10 people to list the names of restaurants in Ropar; I keep track of the names these people give me and keep updating a list of unique names. The question is: if I have no prior knowledge about Ropar, when should I stop taking answers to be confident that the list is, say, 80 percent complete? The proposed project exploits a similar concept to measure the completeness of Wikipedia articles by analyzing the frequency of edits made to them. A minimal sketch of such an estimate appears below.
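A minimal sketch of one way such completeness could be estimated, using the Good-Turing idea that the fraction of items seen exactly once approximates the chance that the next contribution is something new; the data and the estimator are illustrative, not the project's final method.

from collections import Counter

# Hypothetical stream of answers (e.g., restaurant names) collected from the crowd.
answers = ["A", "B", "A", "C", "B", "A", "D", "E", "C", "A"]

counts = Counter(answers)
n = len(answers)                                  # total answers collected
f1 = sum(1 for c in counts.values() if c == 1)    # items seen exactly once

# Good-Turing estimate of the probability that the next answer is a new item.
p_unseen = f1 / n
estimated_completeness = 1 - p_unseen
print(f"observed {len(counts)} unique items; estimated completeness ~ {estimated_completeness:.0%}")

For Wikipedia, the "answers" would instead be edits or facts added to an article, and a low estimated probability of new content would indicate a near-complete article.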

      B.  Analysis of Crowdsourced Portals in the Knowledge Building Context :
7.9 billion pageviews, 3.7 million questions asked, 4.6 million answers submitted…, these are the statistics of one of the world's most famous technical question-and-answer (Q/A) sites, Stack Exchange. Q/A sites are becoming more popular every day. People come not only to get technical knowledge but also to seek advice, satisfy their curiosity about just about anything in life, and collect different opinions. Such sites have become great platforms for people to share their knowledge with each other; examples include the Stack Exchange communities, Quora, and Wikipedia. These websites provide huge data dumps that are open for analysis. This project is about analyzing this data to answer various questions related to knowledge building, crowdsourcing, user interaction, etc.




TOPICS COVERED

                           
S. No. Date-Time Summary of Topics Covered Lecture notes/References
1 8 Aug
We had a brief introduction to the topic of social networks and discussed some recent research ideas pursued in the topic. We also discussed the privacy/security issues associated with analyzing social network data and their possible solutions.
2 9 Aug
This was mainly a doubt-clearing session, where we discussed a few doubts with respect to the NPTEL videos that were included as part of the syllabus. We also discussed the emergence of connectivity in a random graph.
3 18 Aug
(09:00-12:35)
We looked at the notion of signed graphs, in which edges may be positive or negative. We observed the structural characteristics of such graphs and defined what it means for a signed graph to be stable (balanced). We examined the balance of a complete signed graph and compared it to that of an arbitrary signed graph (a small code sketch follows this entry). Chapter 3
NCM
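A small sketch of the balance check for a complete signed graph, assuming a networkx graph with a 'sign' attribute of +1 or -1 on every edge; a triangle is balanced when the product of its edge signs is positive.

import networkx as nx
from itertools import combinations

def is_balanced(G):
    """Check structural balance of a complete signed graph.

    Every triangle must contain an even number of negative edges,
    i.e. the product of its three edge signs must be positive.
    """
    for u, v, w in combinations(G.nodes, 3):
        if G[u][v]["sign"] * G[v][w]["sign"] * G[u][w]["sign"] < 0:
            return False
    return True

# Toy example: two mutually friendly camps that dislike each other -> balanced.
G = nx.complete_graph(4)
signs = {(0, 1): +1, (2, 3): +1, (0, 2): -1, (0, 3): -1, (1, 2): -1, (1, 3): -1}
nx.set_edge_attributes(G, signs, "sign")
print(is_balanced(G))   # True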
4 19 Aug
(14:00-17:35)
We discussed the possible projects that one could take up during the course, and students were assigned to their PhD mentors. Focusing on one of the projects, we discussed the question of predicting global graph properties by observing only local properties, along with the motivation for doing so. With this question in mind, we brainstormed possible ways of predicting the degree rank of a node by observing its local properties. Project
5 1 Sept
(09:00-12:35)
As an introduction to the PageRank algorithm, we began by understanding the notion of associating the importance of a node with the sum of the importances of its friends (nodes with in-links to it). In this regard, we defined a stable assignment as a distribution of strengths such that the value of each node is precisely equal to the sum of the strengths of its in-neighbours. We then observed through examples that a repeated-improvement technique of redistributing strengths converges to the stable assignment. We also saw that repeated application of a matrix to a random vector converges in the direction of the eigenvector with the largest eigenvalue (a small code sketch follows this entry). Chapter 14
NCM
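A minimal sketch of this repeated-improvement (power-iteration) view, using a small hypothetical 3-page web and its column-stochastic transition matrix; the particular graph is chosen only for illustration.

import numpy as np

# Column-stochastic matrix M: M[i, j] is the probability of moving from page j to page i.
# Hypothetical 3-page web with links 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
M = np.array([
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
])

v = np.full(3, 1 / 3)           # start from the uniform distribution
for _ in range(100):            # repeated improvement: redistribute strengths
    v = M @ v

print("stationary importances:", np.round(v, 3))

# The same vector is the eigenvector of M for eigenvalue 1 (the largest for a stochastic matrix).
eigvals, eigvecs = np.linalg.eig(M)
principal = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
print("principal eigenvector:", np.round(principal / principal.sum(), 3))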
6 2 Sept
(14:00-17:35)
We continued the discussion on PageRank by proving the equivalence of the random-walk method and the repeated application of the Markov matrix that captures the transitions in the considered graph. We also observed that the largest eigenvalue of a Markov matrix is 1 and that it is in fact unique. Chapter 14
NCM
7 8 Sept
(09:30-13:30)
Quiz-1 was conducted on all the topics covered so far, as well as on the assigned video lectures. After the quiz, we discussed the notion of a distributed sensitive social network (DSSN), looked at several examples of DSSNs, and addressed the question of analyzing a DSSN without compromising the private data it houses. Quiz
Project - 1
8 10 Sept
(14:00-17:30)
We began with a brief overview of the vast subject of cryptography. Then we had a brainstorming session on possible applications of cryptographic tools and techniques to social computing, and listed several applications and possible research questions along these lines.
Project - 2
9 16 Sept
(09:30-13:30)
This class covered a mix of important concepts in the domain of network science: the normal distribution versus the power-law distribution, the BA model, random networks (i.e. G(n,p)), and the central limit theorem. We also noted that the degree distribution of the G(n,p) model is approximately normal (binomial), whereas the BA model produces a power-law degree distribution, and discussed the significance of these concepts for real-world situations (a small code sketch follows this entry).
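A quick sketch of this contrast using networkx: the G(n,p) graph has essentially no high-degree nodes, while the BA graph shows a long tail of hubs. The parameters are chosen only for illustration.

import networkx as nx

n = 5000
gnp = nx.gnp_random_graph(n, p=6 / n, seed=1)     # average degree ~ 6
ba = nx.barabasi_albert_graph(n, 3, seed=1)       # average degree ~ 6

for name, G in [("G(n,p)", gnp), ("BA", ba)]:
    degs = [d for _, d in G.degree()]
    print(name, "max degree:", max(degs),
          "fraction of nodes with degree > 20:",
          sum(d > 20 for d in degs) / n)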
10 22 Sept
(09:30-13:30)
We discussed the topic of crowdsourced knowledge building and brainstormed about how the current set of tools and crowdsourced portals can help us understand the way we build knowledge in general. In particular, we examined the existing work on the popular Q&A website Stack Overflow and discussed further possibilities.
Project - 3
11 23 Sept
(10:00-13:00)
We continued with the topic of Crowdsourced knowledge building and discussed how knowledge building takes place on the popular online encyclopedia Wikipedia. We also discussed the need for the existence of a Q&A facility for a knowledge repository such as Wikipedia for building a comprehensive database of knowledge.
Project - 3
12 29 Sept
(09:30-13:00)
In this class, we brainstormed about some of the features of crowdsourced knowledge building portals and discussed their implications for knowledge building. In particular, we discussed Quora and GitHub from the perspective of understanding knowledge building, and the networks that could be analysed on GitHub to understand its dynamics.
Project - 3
13 13th Oct
(10:00-13:30)
This class consisted of discussions on the research projects to be carried out by the students. Some of the projects discussed were: finding the rank of a node in a network by computing cliques, identifying the top-k nodes using the H-index, an XML-based data representation format for knowledge building portals, and privacy preservation in collaborative filtering.
14 27th Oct
(10:00-13:30)
In this class, the students gave presentations on the topics they had chosen for the projects to be carried out as part of the course. The projects presented were: a general KML format for capturing data from knowledge building portals; Multipedia, an encyclopedia containing all multimedia for a given term; and citation analysis to understand the knowledge building process.
15 28th Oct
(10:00-13:30)
This session was conducted virtually; the progress of two of the ongoing projects (Sagarika's and Prateek's) was discussed.
16 3rd Nov
(10:00-13:30)
In this class, the Knowledge Markup Language (KML) was discussed. We explored existing projects along similar lines, viz. WaterML, RoadML and VoiceXML.
17 4th Nov
(10:00-13:30)
This session consisted of a discussion on the Annotation project and the Citation project. The W3C annotation architecture was investigated, and the method of collecting data from Google Scholar was explored.
18 10th Nov
(10:00-13:30)
This session included a brainstorming on the analysis to be performed on the Wikipedia data. We discussed the Gini coefficient as a measure of inequality (a small code sketch follows this entry).
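A minimal sketch of computing the Gini coefficient, for example over per-editor edit counts of an article; the data here is hypothetical.

def gini(values):
    """Gini coefficient of a list of non-negative values (0 = perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    cum = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * cum) / (n * sum(xs)) - (n + 1) / n

# Hypothetical number of edits contributed by each editor of an article.
edits_per_editor = [1, 1, 2, 3, 5, 40]
print(round(gini(edits_per_editor), 3))   # close to 1 means a few editors dominate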
19 17th Nov
(10:00-13:30)
In this class, various machine learning approaches for estimating the global rank of a node were discussed.



ASSIGNMENTS

The following papers are assigned to be reviewed by the course students. The review instructions are available here.

Ranking Vision Article (will be uploaded soon)

Degree Ranking using Local Information

A Faster Method to Estimate Closeness Centrality Ranking

How Does Knowledge Come By?

Specialized User Behavior and its Implications on Knowledge Building over StackExchange

Ideal Composition of a Group for Maximal Knowledge Building in Crowdsourced Environments

Privacy Preserving Distributed Analysis of Social Networks (vision)

Privacy Preserving Network Analysis of Distributed Social Networks

MPC meets SNA: A Privacy Preserving Analysis of Distributed Sensitive Social Networks (Improved Pagerank and K-Shell decomposition)




Relevant Videos

1         The Emergence of Network Science