Sunday, January 4, 2015

Network Analysis of Collaborations among Scientists Working on High-Energy Theory using Gephi

I obtained collaboration data for scientists working on high-energy theory from an online repository (link), where many similar data sets are available. I chose the collaboration data set (search for “High-energy theory collaborations”) and analyzed it with Gephi (version 0.8.2).
It is a weighted, undirected network in which each node represents an author. The authors of a paper form a clique: every pair of them is connected. Suppose paper ‘A’ has only two authors, whereas paper ‘B’ has four. The authors of paper ‘A’ are more likely to know each other well than the authors of paper ‘B’, so edges arising from papers with different numbers of authors should not get equal weight. Instead, each edge resulting from a paper with ‘n’ authors carries a weight of 1/(n-1).
For example, suppose two authors, ‘X’ and ‘Y’, co-authored 5 papers together. The first three papers have only these two authors; the last two have four and five authors, respectively. The edges from the first three papers each get weight 1, the edge from the fourth paper gets weight 1/3, and the edge from the fifth paper gets weight 1/4.
The cumulative weight of the edge between ‘X’ and ‘Y’ is therefore 1 + 1 + 1 + 1/3 + 1/4 ≈ 3.583.
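To make the weighting concrete, here is a minimal Python sketch of how such weights could be accumulated from a list of papers. It is my own illustration, not the code used to prepare the data set; the input format (one list of author names per paper), the function name and the extra authors ‘P’, ‘Q’ and ‘R’ are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_weighted_edges(papers):
    """Accumulate co-authorship edge weights (hypothetical helper).

    Each paper is a list of author names. A paper with n authors links
    every pair of its authors (a clique) with weight 1/(n-1); weights
    coming from different papers are summed.
    """
    weights = defaultdict(float)
    for authors in papers:
        authors = sorted(set(authors))  # drop duplicates, fix pair order
        n = len(authors)
        if n < 2:
            continue  # single-author papers create no edges
        w = 1.0 / (n - 1)
        for a, b in combinations(authors, 2):
            weights[(a, b)] += w
    return dict(weights)


# The X/Y example from the text; P, Q and R are made-up co-authors.
papers = [
    ["X", "Y"],
    ["X", "Y"],
    ["X", "Y"],
    ["X", "Y", "P", "Q"],       # 4 authors -> 1/3 per edge
    ["X", "Y", "P", "Q", "R"],  # 5 authors -> 1/4 per edge
]
edges = build_weighted_edges(papers)
print(round(edges[("X", "Y")], 3))  # 3.583
```

Sorting the author names keeps each pair in a canonical order, so ('X', 'Y') and ('Y', 'X') accumulate into the same undirected edge.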
My analysis of the network gives the following empirical results.
Nodes: 8361
Edges: 15751
The average degree can be found by clicking “Run” next to “Average Degree” under “Statistics” in Gephi.
Average Degree: 3.768
The maximum degree can be found by sorting the “Degree” column in the “Data Laboratory” in Gephi. Note that the Degree column is only populated after the “Average Degree” statistic has been run.
Maximum Degree: 50
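As a sanity check, the average degree of a simple undirected graph is 2E/N, which reproduces Gephi's figure from the node and edge counts above. The sketch below computes both degree statistics from a plain edge list; the function name and toy graph are hypothetical, and edge weights are ignored, which I assume matches Gephi's (unweighted) degree.

```python
from collections import Counter


def degree_stats(edges):
    """Return (average degree, maximum degree) for an undirected edge list.

    Degree counts edges per node and ignores weights, so the average is
    2E/N. Isolated nodes do not appear in an edge list and would have to
    be added to N separately.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return 2 * len(edges) / len(deg), max(deg.values())


# Hypothetical toy graph: a triangle a-b-c plus a pendant node d.
toy = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(degree_stats(toy))  # (2.0, 3)

# Recompute the reported average degree from the raw counts: 2E/N.
print(round(2 * 15751 / 8361, 3))  # 3.768
```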
The diameter and average path length can be found by clicking “Run” next to “Network Diameter” under “Statistics” in Gephi.
Diameter (longest shortest path): 19
Average Path Length: 7.025
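Both statistics come from shortest paths between all pairs of nodes. Below is a minimal sketch using breadth-first search from every node. It assumes hop-count distances (edge weights ignored) and averages only over pairs that can actually reach each other; I believe this is how Gephi handles a disconnected graph, but I have not verified its internals. The function name and toy graph are hypothetical, and on the full network this pure-Python loop will be slow but workable.

```python
from collections import defaultdict, deque


def path_stats(edges):
    """Diameter and average shortest-path length via BFS from every node.

    Distances are counted in hops. Unreachable pairs are skipped, since
    in a network with many components most pairs are disconnected.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    diameter, total, pairs = 0, 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        for d in dist.values():
            if d > 0:  # skip the source itself
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return diameter, (total / pairs if pairs else 0.0)


# Hypothetical toy graph: a path a-b-c-d plus a separate edge e-f.
toy = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]
print(path_stats(toy))
```

For the toy graph the diameter is 3 (from a to d), and the average is 22/14 over the 14 ordered pairs of connected nodes.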
The average clustering coefficient and the total number of triangles can be found by clicking “Run” next to “Average Clustering Coefficient” under “Statistics” in Gephi.
Average Clustering Coefficient: 0.636
Total triangles: 13302
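The local clustering coefficient of a node is the fraction of pairs of its neighbours that are themselves connected, and the average is taken over all nodes. A minimal sketch follows; it gives nodes with fewer than two neighbours a coefficient of 0 (the NetworkX convention, which may differ from Gephi's), and it counts each triangle once. I have not checked exactly how Gephi counts triangles, so treat this as an illustration rather than a reproduction of its numbers. The function name and toy graph are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def clustering_and_triangles(edges):
    """Average local clustering coefficient and total triangle count.

    For a node v with k >= 2 neighbours, C(v) = (links among neighbours)
    / (k(k-1)/2). Each triangle is seen once from each of its three
    corners, so the corner count is divided by 3.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    local, corners = [], 0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            local.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        corners += links
        local.append(links / (k * (k - 1) / 2))
    return sum(local) / len(local), corners // 3


# Hypothetical toy graph: triangle a-b-c plus a pendant node d.
toy = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(clustering_and_triangles(toy))
```

In the toy graph, a and b have coefficient 1, c has 1/3 and d has 0, so the average is 7/12 and there is exactly one triangle.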
Modularity and the number of communities can be found by clicking “Run” next to “Modularity” under “Statistics” in Gephi.
Modularity: 0.867
Number of Communities: 1382
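Gephi's Modularity statistic uses the Louvain method to search for a partition with high modularity Q. The sketch below does not reproduce that search; it only scores a given partition with the standard weighted modularity formula, which helps in reading a value such as 0.867 (1 would mean all edge weight sits inside communities, far beyond what a random assignment gives). Whether Gephi used the edge weights here depends on its “Use weights” setting, which I am assuming rather than reporting. The function name and toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a given partition of a weighted undirected graph.

    edges: list of (u, v, weight); community: dict node -> community id.
    Q = sum over communities c of [W_c / W - (S_c / (2W))**2], where W is
    the total edge weight, W_c the weight inside c, and S_c the summed
    strength (weighted degree) of the nodes in c.
    """
    total = sum(w for _, _, w in edges)
    inside = defaultdict(float)
    strength = defaultdict(float)
    for u, v, w in edges:
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / total - (strength[c] / (2 * total)) ** 2
               for c in strength)


# Hypothetical toy graph: two triangles joined by the single edge c-d.
edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
         ("d", "e", 1), ("e", "f", 1), ("d", "f", 1), ("c", "d", 1)]
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(round(modularity(edges, community), 4))  # 0.3571
```

Putting every node in one community always gives Q = 0, which is why a high Q signals a partition much better than lumping everything together.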
The number of connected components can be obtained by clicking “Run” next to “Connected Components” under “Statistics” in Gephi.
Number of Weakly Connected Components: 1332 (since the network is undirected, these are simply its connected components)
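Counting components needs nothing more than a union-find (disjoint-set) structure over the edge list. The sketch below is a hypothetical stand-in for what Gephi computes, not its implementation; isolated authors are passed in through the node list so that each one counts as its own component.

```python
def connected_components(nodes, edges):
    """Group nodes into connected components using union-find.

    Every node in `nodes` is included, so isolated nodes form
    single-node components.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Walk to the root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())


# Hypothetical toy data: a-b-c, d-e, and two isolated nodes f and g.
comps = connected_components(list("abcdefg"),
                             [("a", "b"), ("b", "c"), ("d", "e")])
print(sorted(len(c) for c in comps))  # [1, 1, 2, 3]
```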
The giant component can be extracted by applying the “Giant Component” filter under “Topology” in the “Filters” panel in Gephi.
Nodes in Giant Component: 5835 (69.79%)
Edges in Giant Component: 13815 (87.71%)
Modularity in Giant Component: 0.846
Number of Communities in Giant Component: 51
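The giant component is simply the largest connected component, together with the edges between its nodes. The following sketch (hypothetical function name and toy graph) extracts it by breadth-first search and then recomputes the two percentages reported above from the raw counts.

```python
from collections import defaultdict, deque


def giant_component(edges):
    """Return (node set, edge list) of the largest connected component.

    An edge belongs to the giant component when one endpoint does, since
    both endpoints of an edge always lie in the same component.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in comp:
                    comp.add(nb)
                    queue.append(nb)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best, [(u, v) for u, v in edges if u in best]


# Hypothetical toy graph: triangle a-b-c and a separate edge d-e.
toy = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")]
nodes, sub = giant_component(toy)
print(sorted(nodes), len(sub))  # ['a', 'b', 'c'] 3

# Shares reported above, recomputed from the raw counts.
print(f"{5835 / 8361:.2%} of nodes, {13815 / 15751:.2%} of edges")
```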
From these results, the average path length (7.025) is close to the “six degrees of separation”, while the average clustering coefficient (0.636) is high. Short paths combined with strong clustering suggest that this collaboration network behaves like a “small-world network”.
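One common way to check the small-world reading is to compare against a random (Erdős–Rényi) graph with the same number of nodes and the same average degree, for which the average path length is roughly ln(N)/ln(⟨k⟩) and the clustering coefficient roughly ⟨k⟩/N. These are standard approximations, not something computed in Gephi, and they are only rough for a network split into many components. The sketch below plugs in the numbers reported above; the function name is hypothetical.

```python
import math


def random_graph_baseline(n, avg_degree):
    """Approximate path length and clustering of a random graph.

    Uses the textbook Erdos-Renyi estimates L ~ ln(n) / ln(<k>) and
    C ~ <k> / n for a graph with n nodes and average degree <k>.
    """
    return math.log(n) / math.log(avg_degree), avg_degree / n


l_rand, c_rand = random_graph_baseline(8361, 3.768)
print(f"L_random ~ {l_rand:.2f}  vs observed 7.025")
print(f"C_random ~ {c_rand:.5f} vs observed 0.636")
```

The random-graph path length comes out near 6.8, close to the observed 7.025, while the random-graph clustering is about 0.00045, more than a thousand times lower than the observed value. That combination is what the small-world description refers to.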
Below is a visualization of the network, with each connected component drawn in its own color. Only two components stand out: the giant component (red) and a second component (blue), which is far smaller than the giant component. All other components are negligible in size.
Visualization of the Network

However, this data was collected over five years in the late 1990s and covers only scientists working on high-energy theory. Before drawing solid conclusions about the similarity between small-world networks and collaboration networks, we should repeat this analysis on collaboration networks of scientists working in other fields.