Tutorial: How to analyse the Lazega network using the Bergm package and its web interface.
Network scientists analyse the structure of networks to unearth and quantify the internal processes responsible for creating the links between the nodes. Nodes can represent computers, people, or proteins, and the links between them can represent connections, friendships, or interactions.
Interactive plot of a simple network with four actors and the friendship relationships between them.
One of the methods used to analyse relational data, such as networks, is the family of Exponential Random Graph Models (ERGMs). The likelihood of an ERGM is defined as:
\[ p(y \mid \boldsymbol\theta) = \frac{\exp\left( \sum_{i=1}^d\theta_i s_i(y) \right)}{z(\boldsymbol\theta)} \]
It is in the hands of the researcher to choose the set of \(d\) sufficient statistics \(s_i(y)\) in such a way that they capture the internal processes responsible for the creation of the links between the nodes.
The algorithms used to estimate the parameter vector \(\boldsymbol{\theta}\) of an ERGM calculate its approximate values. Different methods have been developed, but those that have gained popularity in recent years are based on Bayesian inference. These algorithms have to deal with doubly intractable posterior distributions and use MCMC sampling together with clever workarounds to provide a good approximation in a reasonable amount of time. The intractability comes from the fact that neither the normalising constant \(z(\boldsymbol{\theta})\) nor the model evidence can be calculated exactly, due to the size of the parameter space and the number of possible configurations of a network with \(n\) nodes.
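The intractability of \(z(\boldsymbol\theta)\) is easy to see when it is written out: the normalising constant sums the unnormalised likelihood over every possible network on \(n\) nodes,

\[ z(\boldsymbol\theta) = \sum_{y' \in \mathcal{Y}} \exp\left( \sum_{i=1}^d \theta_i s_i(y') \right), \]

where \(\mathcal{Y}\) is the set of all \(2^{n(n-1)/2}\) possible undirected networks. Already for \(n = 36\) this sum has \(2^{630}\), i.e. more than \(10^{189}\), terms, far beyond any hope of enumeration.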
Luckily, algorithms such as the Approximate Exchange Algorithm (Caimo and Friel 2011) or the Pseudo-Posterior Adjustment Algorithm (Bouranis, Friel, and Maire 2017) can overcome these obstacles and return posterior samples of the parameters for the selected sufficient statistics. The Bergm R package (Caimo and Friel 2014) includes implementations of these algorithms in the `bergm()` and `evidence()` functions respectively.
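As a minimal sketch of the two entry points (assuming the Bergm package and its bundled `lazega` dataset are installed; the formula and argument values here are illustrative, and argument names may differ slightly between package versions):

```r
# Load the package and the bundled Lazega network
library(Bergm)
data(lazega)

# Approximate Exchange Algorithm: posterior samples of theta
post <- bergm(lazega ~ edges + gwesp(0.2, fixed = TRUE),
              main.iters = 1000)
summary(post)

# Evidence estimation: also returns the log evidence of the model
ev <- evidence(lazega ~ edges + gwesp(0.2, fixed = TRUE),
               evidence.method = "CJ")
summary(ev)
```

Both functions return posterior samples; `evidence()` additionally estimates the marginal likelihood used later for model comparison.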
One of the most popular undirected didactic network datasets is called lazega and is included in the Bergm package. This dataset represents a network of legal professionals with the same power within the company, whose connections express professional relationships. The dataset includes nodal covariates such as `Age`, `Office`, `Practice` and others.
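A quick exploration of the dataset can be done with the network package, which Bergm loads as a dependency (a sketch; the attribute names follow the tutorial, but it is worth confirming them with `list.vertex.attributes()` on your installation):

```r
library(Bergm)  # loads the network package as a dependency
data(lazega)

network::network.size(lazega)            # number of nodes
network::network.edgecount(lazega)       # number of ties
network::list.vertex.attributes(lazega)  # nodal covariates, e.g. Age, Office, Practice
table(network::get.vertex.attribute(lazega, "Office"))  # office membership counts
```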
Interactive plot of the Lazega network (nodes can be rearranged)
In this tutorial, I will demonstrate how to choose sufficient statistics to model the network's dependencies. These dependencies can rely on the structure of the connections or on homophily of the nodal attributes. Then, I will show how to use the value of `log.evidence` returned by the `evidence()` function to obtain a more economical model. Lastly, I will illustrate how to check our model's adequacy with goodness-of-fit diagnostics.
The Bergm package uses the sufficient statistics defined in the ergm package, a building block of the statnet project. The decision on which statistics should be included in the model is in the researchers' hands and may require further study to obtain expert knowledge. The list of all available terms can be checked in R with the command `help("ergm-terms", package="ergm")`. Some sufficient statistics are designed to work only with specific types of networks, such as undirected, directed or bipartite, and should be chosen carefully. Statistics such as `gwesp`, `edges` or `nodematch` can model dependencies and clustering.
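Beyond the static help page, the ergm package also provides a keyword search over its terms (a sketch; `search.ergmTerms()` ships with recent ergm releases and its argument names may vary by version):

```r
library(ergm)

# Full reference list of all terms
help("ergm-terms", package = "ergm")

# Search the term catalogue by keyword, e.g. for homophily-related terms
search.ergmTerms(keyword = "homophily")
```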
The Bergm package requires the R environment and uses the R scripting language. However, an interactive Shiny web application has been developed (https://mrppdex.shinyapps.io/bergm-shiny-d3/). Its goal is to facilitate experimentation with networks through a pipeline that entails data exploration, model creation, model comparison and goodness-of-fit checks of the model's adequacy.
The description above the plot of the networks contains information such as the number of nodes, number of links and whether the network is directed or undirected.
In this tutorial, I will use the `evidence()` function from the Bergm package to simulate the posterior distributions of the ERGM model parameters. The bergm-shiny web application, based on the shiny framework (Chang et al. 2020), allows the practitioner to choose between the `evidence()` function, which is fast and produces good mixing of samples, and the `bergm()` method, which has good accuracy and convergence but requires many more iterations of the MCMC algorithm to get quality samples. To choose the function used in the parameter estimation stage, the user can go to the Options tab and select the `evidence` function from the dropdown menu labelled Simulate using:.
The Options tab also allows the user to change the arguments of the selected estimation method and of the goodness-of-fit (`gof()`) function. It also allows the user to turn off the summary box plots, which is a good practice during the experimentation stage.
In the first attempt, we will choose the following `ergm` terms in our model:
| term | description |
|---|---|
| `gwesp(decay=0.2, fixed=TRUE)` | The geometrically weighted edgewise shared partners statistic is a parametric sufficient statistic which can reveal clustering. |
| `edges` | This statistic depends on the density of the network and is one of the most significant factors in creating a well-fitted model. |
| `nodematch(attr="Practice", diff=FALSE)` | Adds uniform or differential homophily to the model. When `diff=FALSE`, it adds a single statistic counting the number of dyads whose actors share the same value of the covariate named in `attr`. When `diff=TRUE`, the term instead adds p statistics, where p is the number of levels of the selected covariate. In this example the `Practice` attribute takes a binary value, so `diff=FALSE` is sufficient: the statistic counts the number of connected attorneys working in the same practice. |
| `nodematch(attr="Office", diff=TRUE, levels=1:2)` | Adds two statistics, each equal to the number of connected dyads with the same value of the `Office` covariate. Because only one node has this attribute set to 3, we omit it and focus on the more frequent offices, namely 1 and 2. |
| `absdiff(attr="Age", pow=1)` | Adds one network statistic equal to the sum over dyads of the absolute difference of the nodal attribute values, raised to the specified power. In this example it may show whether the age difference is a significant factor in forming relationships. |
| `absdiff(attr="Years")` | As above, this statistic can determine whether the time spent working in the company plays a role in forming relationships. |
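Putting the table together, the first candidate model can be written as an R formula (a sketch; the term syntax follows the ergm package, and `summary()` on a formula computes the observed values of the chosen statistics):

```r
library(Bergm)
data(lazega)

# First candidate model combining structural and covariate-based terms
model1 <- lazega ~ edges +
  gwesp(decay = 0.2, fixed = TRUE) +
  nodematch("Practice", diff = FALSE) +
  nodematch("Office", diff = TRUE, levels = 1:2) +
  absdiff("Age", pow = 1) +
  absdiff("Years")

# Observed values of the chosen sufficient statistics on the Lazega network
summary(model1)
```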
First, (1) we choose the term and modify its arguments, and then (2) we add the selected term to the formula. Once the terms are selected, we can invoke the estimation method by pressing the ESTIMATE button in the top left part of the screen.
Once the estimation is complete, the Summary screen is activated. This tab includes information such as the acceptance rate of the Metropolis-Hastings algorithm, summary statistics of the model’s parameters and their visual representation in the form of the box plots.
The optimal acceptance ratio of the Metropolis-Hastings algorithm should lie in the interval 0.2 to 0.3. Acceptance ratios outside this interval may result in highly correlated samples. When the acceptance rate is too low, the resulting MCMC chain will contain flat stretches due to rejection of newly proposed values. Too high a value will make the MCMC chain look like a random walk. In this example, the acceptance ratio is too low, with a value of 0.1, resulting in poor mixing of the samples.
The easiest way to adjust it and improve the quality of the samples is to either increase or decrease the `V.proposal` parameter of the `evidence` function. This parameter controls the variance of samples obtained from the multivariate proposal function in the MCMC algorithm. It can be adjusted in the Options tab. Empirically, the optimal value for the `V.proposal` argument turns out to be around `1.0`. Once the estimation is invoked again, the acceptance rate improves to `0.22`.
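In plain R, the same adjustment amounts to passing the tuned proposal variance to `evidence()` (a sketch; `1.0` is the value found empirically above, other arguments are left at their defaults, and the name of the acceptance-rate field in the output may vary by Bergm version):

```r
library(Bergm)
data(lazega)

# Re-run the estimation with the tuned proposal variance
ev <- evidence(lazega ~ edges + gwesp(decay = 0.2, fixed = TRUE) +
                 nodematch("Practice") +
                 nodematch("Office", diff = TRUE, levels = 1:2) +
                 absdiff("Age") + absdiff("Years"),
               evidence.method = "CJ",
               V.proposal = 1.0)

ev$AR  # acceptance rate of the Metropolis-Hastings sampler
```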
The inspection of the Plots tab shows that the parameters' chains have autocorrelation close to 0 for lags around 60, and the distributions of the samples are unimodal.
The number of sufficient statistics in the ergm formula might be excessive, and some of them may not play an important role in creating the observed network. The `evidence()` function uses either Chib and Jeliazkov's (CJ) or the power posterior (PP) method to estimate the marginal likelihood and returns the `log.evidence` of the model, which can be used to compare competing models. The web application allows checking the evidence with selected terms toggled on and off. To select the terms, the user has to click them in the grey box with the formula at the top of the screen. A selected term changes its colour.
Once the terms are selected, we can compare the models with and without them by pressing the COMPARE MODELS button. After the estimation is finished, the Model comparison tab is activated, and the results can be sorted by the value of `log.evidence`. This summary allows us to determine which terms harm the model and whose removal would be beneficial.
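Outside the web application, the same comparison amounts to fitting each candidate model with `evidence()` and comparing their log evidence values (a sketch with a shortened formula for brevity; higher log evidence indicates the better-supported model):

```r
library(Bergm)
data(lazega)

# Full model vs. the same model without the Age term
ev_full  <- evidence(lazega ~ edges + gwesp(0.2, fixed = TRUE) +
                       nodematch("Practice") + absdiff("Age"),
                     evidence.method = "CJ")
ev_noage <- evidence(lazega ~ edges + gwesp(0.2, fixed = TRUE) +
                       nodematch("Practice"),
                     evidence.method = "CJ")

# Compare the estimated marginal likelihoods
c(full = ev_full$log.evidence, no_age = ev_noage$log.evidence)
```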
In this example, we can see that the log evidence of our model with all terms is equal to `-295.6970`. Removal of `absdiff(attr="Age", pow=1)` improves the model the most, indicating that the model without this term performs better. The removal of the `edges` term significantly lowers the log evidence and therefore damages the model. Because the values of the parameters might be correlated, a good practice is to remove only the worst-performing term and repeat the model comparison step.
To remove the term from the formula, click the checkbox next to it and press the REMOVE FROM FORMULA button.
After removing the worst performer, we select all the terms once again and execute the model comparison to see how the log evidence values change for the new model. It turns out that the absolute difference of the `Years` nodal attribute is damaging the model, and we want to remove it as well. After repeating the same steps to remove `absdiff(attr="Years", pow=1)` from the formula, we obtain a parsimonious (economical) model where removing any other sufficient statistic would harm the model.
Further improvement of the model is impossible, and we should press the Estimate button again to estimate the parsimonious model’s parameters before the goodness of fit checks.
Once the estimation is complete, the next step is to perform the goodness-of-fit test. It can be accomplished by pressing the GOF button.
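In script form, the Bergm package provides a Bayesian goodness-of-fit routine for this step (a sketch; `bgof()` simulates networks from the posterior and compares their degree, minimum geodesic distance and edgewise shared partner distributions with the observed network):

```r
library(Bergm)
data(lazega)

# Estimate the parsimonious model, then run Bayesian goodness of fit
fit <- bergm(lazega ~ edges + gwesp(0.2, fixed = TRUE) +
               nodematch("Practice") +
               nodematch("Office", diff = TRUE, levels = 1:2))

# Plots observed vs. simulated network statistics
bgof(fit)
```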
NOTE: Model comparison does not update the latest model. Estimation (the ESTIMATE button) has to be performed before the goodness-of-fit tests.
The goodness-of-fit tests show that the minimum geodesic distance and edgewise shared partner distributions of the networks simulated from our model fit well, meaning the chosen set of sufficient statistics captures the centrality measures and the clustering. The degree distribution, however, is not well fitted for degrees lower than three and might require further investigation.
The posteriors of the model parameters are well defined and have most of their mass below or above zero. The results show strong evidence of clustering within the network, with both the `Practice` and `Office` nodal attributes increasing the chances of forming a connection between co-workers.
The Bergm R package provides users with black-box implementations of the approximate exchange (AEA) and pseudo-posterior adjustment algorithms, allowing quick inference of ERGM model parameters. Familiarity with the most frequently used sufficient statistics is required to analyse networks and gain insight into the processes responsible for creating the links between the nodes. The bergm-shiny web application provides an interface to visually inspect the network dataset, select and configure sufficient statistics, compare models and check their adequacy. It allows the practitioner to experiment with the dataset quickly and obtain visually appealing results in a shorter amount of time.