Tutorial on community detection in biogeographical site-species networks by bipartite modularity analysis

https://loewenecology.wordpress.com/

https://github.com/loewenecology/Tut-bipartite-modularity-analysis

There are several packages and options available to construct and analyze ecological networks, just as there are many tools for finding groups of species forming recurrent communities. Here, we will use the bipartite package (Dormann et al. 2008) to plot a basic site by species network and then apply Beckett’s (2016) DIRTLPAb+ algorithm to perform community detection by bipartite modularity analysis. Once the communities are found, we will work through some visualization options. For this demonstration, we will use phytoplankton community data from Alberta, Canada.

While detailed knowledge of phytoplankton community ecology is not a prerequisite for this tutorial, readers should come with a baseline understanding and/or interest in network/graph theory and how such ideas might be used to describe general ecological patterns and organization of assemblages. I also assume some proficiency with R, including setting up projects, installing packages, and wrangling data for subsequent visualization (including mapping spatial data) and use with statistical packages.

Objectives

1. Background

2. Building the network

3. Analyzing the network

4. Visualizing the proposed module structure

5. Comparing module structure of weighted and unweighted networks

6. References

1. Background

Ecological networks take many forms, including classic examples of plant-pollinator and host-parasite networks, where different species (or groups of species) are represented as nodes with lines between them (called edges) indicating potential species interactions. Site-species connections can also be represented this way, whereby species nodes are connected to the site nodes at which they were found to occur. Such biogeographical networks hold a wealth of spatial and ecological information, which can accessed through a range of specialized network analyses.

One important method for finding commonalities among nodes is modularity analysis, which provides unsupervised learning of community structure based solely on network topology, where groups (or modules) of similar nodes are found by maximizing the number of edges within, rather than between, them (Newman & Girvan 2004). While there are a range of approaches to grouping sets of objects based on their attributes, most distance-based procedures flatten bipartite networks (those that have two sets of nodes, such as in a site-species network), reducing species information to some index of site dissimilarity. While many important ecological insights have been obtained by these traditional methods, a major advantage of modularity analysis is that it can maintain the bipartite nature of site-species networks, identifying both groups of sites that share distinct biota and sets of species that tend to co-occur.

Although modularity analysis was initially developed for the unipartite case (only one type of node; Newman & Girvan 2004), a bipartite formulation was later proposed by Barber (2007) to allow links only between nodes of opposing types. Maximizing Barber’s bipartite modularity causes both sets of nodes to be classified simultaneously and produces one-to-one node correspondence (i.e. producing the same number of modules for sites and species), associating assemblages directly to their physical locations. The resultant module structure of a site-species network can then be visualized to describe various spatial and biological patterns (such as the manifestation of bioregions), where trends can inform the development of testable hypotheses for further study.

2. Building the network

For this tutorial, we will use R software (https://www.r-project.org/). We start by opening R, setting the working directory, and loading the bipartite (Dormann et al. 2008) and dplyr packages (Wickham et al. 2021). Here, we will invoke a mix of tidyverse and base R procedures to wrangle data in preparation for analysis and subsequent visualizations.

# Load packages
library(bipartite)
library(dplyr)

Next, we read in our ecological community data. Data should be loaded or transformed to a wide format, with species as columns and sites as rows. The matrix can be binary (0s and 1s) or weighted (e.g. by abundance or biomass). For this demonstration, we will use data on phytoplankton communities in lakes and reservoirs across Alberta, Canada (see Loewen et al. 2020 for details). These data, sampled by the Alberta Government and the Alberta Lake Management Society Lakewatch Program, are presented as biomass for individual taxa at specific sites on multiple sampling dates, and are available for download from the Dryad Digital Repository (https://doi.org/10.5061/dryad.gf1vhhmk7).

Download the data into your working directory and load it into the R environment. As data from multiple sampling events are provided fore each site, we will first subset the matrix to work with records obtained in August only and remove extra fields.

# Read the species data
phyto.comm <- read.csv("Loewen_et_al._2020._Ecol_App._Species_data_-_final.csv", header = T)

# Subset data to select August sampling events
phyto.comm.aug.bio <- subset(phyto.comm, Month == "Aug")

# Set Lake.Name as row names
rownames(phyto.comm.aug.bio) <- phyto.comm.aug.bio$Lake.Name

# Remove extra fields
phyto.comm.aug.bio <- phyto.comm.aug.bio[, 5:308]

# Remove any species not found in August
i <- (colSums(phyto.comm.aug.bio) != 0)
phyto.comm.aug.bio <- phyto.comm.aug.bio[, i]

# Create binary community matrix (presence/absence) from weighted matrix
phyto.comm.aug.bin <- phyto.comm.aug.bio
phyto.comm.aug.bin[phyto.comm.aug.bin > 0] <- 1

Now that we have our data wrangled, we can visualize them as bipartite networks. Lets plot the unweighted bipartite graph first.

# Default bipartite graph plot
plotweb(phyto.comm.aug.bin, text.rot = 90)

Now plot a bipartite graph weighted by biomass.

# Default bipartite graph plot
plotweb(phyto.comm.aug.bio, text.rot = 90)

While there are too many species to make out the details in these plots, nodes at the top represent individual taxa while sites are represented along the bottom. Note that in the second (weighted) plot, thicker links between nodes indicate relatively greater biomass for species at particular sites, whereas all links are the same thickness in the first (unweighted presence/absence) plot. Node thickness also varies, indicating relative biomass in the weighted graph and relative occurrence in the unweighted graph.

3. Analyzing the network

We will use the computeModules function in the bipartite package (Dorman et al. 2008) to find groups of densely linked sites and species (i.e. modules) by maximizing Barber’s index with Beckett’s (2016) label propagation algorithm. First, we will analyze the unweighted network.

# Set random seed for reproducibility
set.seed(99)

# Conduct bipartite modularity analysis on unweighted network using Beckett's algorithm
bin.DIRT <- computeModules(phyto.comm.aug.bin, method = "Beckett")

# Determine number of modules
nrow(bin.DIRT@modules) - 1

## [1] 6

# Calculate participation coefficient
bin.DIRT.cz.higher <- czvalues(bin.DIRT, level = "higher")
bin.DIRT.cz.lower <- czvalues(bin.DIRT, level = "lower")

# Check modularity (Q) value of proposed module structure
bin.DIRT@likelihood

## [1] 0.2477575

The algorithm detected six modules with a modularity value of 0.2477575, but are these modules truly non-random?

For this, we can compare the computed value to those obtained from a series of random networks. The issue of obtaining random networks is also non-trivial, and several approaches exist. We will use the sequentially swapping curveball algorithm (Strona et al. 2014), which has been shown to efficiently produce uniformly distributed null matrices that maintain row (site) and column (species) totals for presence/absence data. Because the approach is sequential (i.e. further randomized with each step), it requires a large number of iterations as well as a burn-in period. We will use the oecosimu function in the vegan package (Oksanen et al. 2020) to run the simulations in parallel (parallel computing across multiple cores helps to reduce computation time). We will use 5,000 burn-in and thinning steps, which discard iterations at the beginning and between each sample, respectively.

# Load packages
library(parallel)
library(vegan)

# Create a function that captures the modularity statistic for a given network
func <- function(x) computeModules(x)@likelihood

# Detect and set the number of available cores to run simulations in parallel (reducing computation time)
mc.cores <- parallel::detectCores()
clus <- makeCluster(mc.cores)
clusterEvalQ(clus, library(bipartite))

# Set random seed for reproducibility
set.seed(99)

# Calculate module structure for 20 null networks, obtained using the curveball algorithm, and conduct one-sided randomization test
bin.test <- oecosimu(comm = phyto.comm.aug.bin, nestfun = func, method = "curveball", nsimul = 20, burnin = 5000, thin = 5000, statistic = "likelihood", alternative = "greater", parallel = clus)

# Stop the cluster
stopCluster(clus)

# Extract P-value
bin.test[["oecosimu"]][["pval"]]

## [1] 0.04761905

# Plot observed and simulated modularity values
plot(density(bin.test[["oecosimu"]][["simulated"]]), xlim = c(min(bin.test[["statistic"]], min(bin.test[["oecosimu"]][["simulated"]])), max(bin.test[["statistic"]], max(bin.test[["oecosimu"]][["simulated"]]))), main = "Modularity Null Comparisons", xlab = "Modularity (Q)", ylab = "Density")
abline(v = bin.test[["statistic"]], col = "red", lwd = 2)

In this plot, the red line represents the observed modularity statistic, while the density plot shows values obtained for random networks. As we can see, the proposed module structure is highly non-random, as the observed statistic is greater than any obtained for the random networks (P < 0.05). While we only generated 20 null matrices for demonstration purposes, several hundred (or thousand) should be used to obtain a null distribution for formal analyses (this can take a long time with large networks). It is also important to check that null matrices generated by sequential swapping procedures have converged on a random structure. Autocorrelation (i.e. relatedness between sequentially produced null matrices) can be avoided with large numbers of burn-in and thinning steps.

4. Visualizing the proposed module structure

Now, lets visualize the proposed module structure. We’ll start with the site modules. Here, we’ll use ggmap (Kahle & Wickham 2013) and ggplot2 (Wickhman 2016). We will also use ggsn to add in a scale bar and north arrow (Baquero 2019). We’ll extract site modules and merge these results with site coordinates (obtained from the environmental data matrix in Loewen et al. 2020) and plot the results on a terrain basemap (note that map tiles are by Stamen Design under CC BY 3.0 with data by OpenStreetMap under ODbL).

# Load packages
library(ggmap)
library(ggsn)
library(ggplot2)

# Read the environmental data
phyto.env <- read.csv("Loewen_et_al._2020._Ecol_App._Environmental_data_-_final.csv", header = T)
phyto.env.aug <- subset(phyto.env, Month == "Aug")
rownames(phyto.env.aug) <- phyto.env.aug$Lake.Name

# Extract data for site and species module structure
bin.DIRT.list <- listModuleInformation(bin.DIRT)

# Create vectors of site assignments for each module (accessed manually from nested lists as follows)
bin.DIRT.site.mod.1 <- bin.DIRT.list[[2]][[1]][[1]]
bin.DIRT.site.mod.1.vector <- as.matrix(if_else(rownames(bin.DIRT@moduleWeb) %in% bin.DIRT.site.mod.1, 1, 0))
bin.DIRT.site.mod.2 <- bin.DIRT.list[[2]][[2]][[1]]
bin.DIRT.site.mod.2.vector <- as.matrix(if_else(rownames(bin.DIRT@moduleWeb) %in% bin.DIRT.site.mod.2, 1, 0))
bin.DIRT.site.mod.3 <- bin.DIRT.list[[2]][[3]][[1]]
bin.DIRT.site.mod.3.vector <- as.matrix(if_else(rownames(bin.DIRT@moduleWeb) %in% bin.DIRT.site.mod.3, 1, 0))
bin.DIRT.site.mod.4 <- bin.DIRT.list[[2]][[4]][[1]]
bin.DIRT.site.mod.4.vector <- as.matrix(if_else(rownames(bin.DIRT@moduleWeb) %in% bin.DIRT.site.mod.4, 1, 0))
bin.DIRT.site.mod.5 <- bin.DIRT.list[[2]][[5]][[1]]
bin.DIRT.site.mod.5.vector <- as.matrix(if_else(rownames(bin.DIRT@moduleWeb) %in% bin.DIRT.site.mod.5, 1, 0))
bin.DIRT.site.mod.6 <- bin.DIRT.list[[2]][[6]][[1]]
bin.DIRT.site.mod.6.vector <- as.matrix(if_else(rownames(bin.DIRT@moduleWeb) %in% bin.DIRT.site.mod.6, 1, 0))

# Bind vectors of site module assignments
bin.DIRT.site.mods <- cbind(bin.DIRT.site.mod.1.vector, bin.DIRT.site.mod.2.vector, bin.DIRT.site.mod.3.vector, bin.DIRT.site.mod.4.vector, bin.DIRT.site.mod.5.vector, bin.DIRT.site.mod.6.vector)

# Assign lake names to site module assignments
rownames(bin.DIRT.site.mods) <- rownames(bin.DIRT@moduleWeb)

# Merge participation coefficients with site module structure 
bin.DIRT.site.mods <- merge(bin.DIRT.site.mods, bin.DIRT.cz.lower$c, by = "row.names")
rownames(bin.DIRT.site.mods) <- bin.DIRT.site.mods$Row.names
bin.DIRT.site.mods <- bin.DIRT.site.mods[, -1]

# Assign module names to site module structure
colnames(bin.DIRT.site.mods) <- c("Mod01", "Mod02", "Mod03", "Mod04", "Mod05", "Mod06", "c")

# Bind site coordinates and merge with site module structure
coords2var <- cbind(phyto.env.aug$Latitude.N, phyto.env.aug$Longitude.W)
coords2var <- as.data.frame(coords2var, row.names = row.names(phyto.env.aug))
colnames(coords2var) <- c("Latitude", "Longitude")
bin.DIRT.site.mods <- merge(bin.DIRT.site.mods, coords2var, by = "row.names")
rownames(bin.DIRT.site.mods) <- bin.DIRT.site.mods$Row.names
bin.DIRT.site.mods <- bin.DIRT.site.mods[, -1]

# Create factor indicating site module structure
bin.DIRT.site.mods['Mods'] <- NA
bin.DIRT.site.mods$Mods[bin.DIRT.site.mods$Mod01 == 1] <- "Mod01"
bin.DIRT.site.mods$Mods[bin.DIRT.site.mods$Mod02 == 1] <- "Mod02"
bin.DIRT.site.mods$Mods[bin.DIRT.site.mods$Mod03 == 1] <- "Mod03"
bin.DIRT.site.mods$Mods[bin.DIRT.site.mods$Mod04 == 1] <- "Mod04"
bin.DIRT.site.mods$Mods[bin.DIRT.site.mods$Mod05 == 1] <- "Mod05"
bin.DIRT.site.mods$Mods[bin.DIRT.site.mods$Mod06 == 1] <- "Mod06"
bin.DIRT.site.mods$Mods <- as.factor(bin.DIRT.site.mods$Mods)

# Plot binary module map
basemap <- get_stamenmap(bbox = c(left = -max(bin.DIRT.site.mods$Longitude) - 2, bottom = min(bin.DIRT.site.mods$Latitude) - 2, right = -min(bin.DIRT.site.mods$Longitude) + 2, top = max(bin.DIRT.site.mods$Latitude) + 2), zoom = 5, maptype = "terrain-background")

bin.site.mod.map <- ggmap(basemap, maprange = TRUE) +
  geom_point(data = bin.DIRT.site.mods, aes(-Longitude, Latitude, color = Mods), shape = 16, size = 3) +
  geom_point(data = bin.DIRT.site.mods, aes(-Longitude, Latitude), color = "black", shape = 1, size = 3) +
  scale_x_continuous("Longitude", breaks = c(-120, -115, -110), 
                     labels = c("-120", "-115", "-110"), limits = c(-120, -110)) +
  scale_y_continuous("Latitude", breaks = c(49, 54, 59), 
                     labels = c("49", "54", "59"), limits = c(49, 59.7)) +
  labs(color = "Modules", title = "Unweighted site modules") +
  scalebar(x.min = attr(basemap, "bb")[[2]], 
           y.min = attr(basemap, "bb")[[1]], 
           x.max = attr(basemap, "bb")[[4]], 
           y.max = attr(basemap, "bb")[[3]], 
           dist = 100, anchor = c(x = -119.6, y = 49.6), transform = T, 
           location = "bottomleft", st.size = 3, st.dist = 0.032, dist_unit = "km") +
  theme(panel.border = element_rect(colour = "black", fill = NA),
        plot.title = element_text(hjust = 0.5),
        legend.key = element_rect(fill = "white"))
north2(bin.site.mod.map, x = 0.9, y = 0.9, symbol = 16)

The six modules are distributed across the province. The lack of any clear spatial pattern suggests a potentially important role for local factors (including water quality and biotic interactions) in driving community differentiation. We can also plot the participation coefficients of each site, which is a measure of among-module connectivity (Guimerà & Amaral 2005). Generally, higher participation coefficients indicate transitional zones in spatially structured data.

# Plot binary participation coefficient map
bin.site.pc.map <- ggmap(basemap, maprange = TRUE) +
  geom_point(data = bin.DIRT.site.mods, aes(-Longitude, Latitude, color = c), shape = 16, size = 3) +
  geom_point(data = bin.DIRT.site.mods, aes(-Longitude, Latitude), color = "black", shape = 1, size = 3) +
  scale_x_continuous("Longitude", breaks = c(-120, -115, -110), 
                     labels = c("-120", "-115", "-110"), limits = c(-120, -110)) +
  scale_y_continuous("Latitude", breaks = c(49, 54, 59), 
                     labels = c("49", "54", "59"), limits = c(49, 59.7)) +
  labs(color = "Participation\ncoefficient", title = "Unweighted site modules") +
  scale_colour_gradientn(colours = terrain.colors(10)) +
  scalebar(x.min = attr(basemap, "bb")[[2]],
           y.min = attr(basemap, "bb")[[1]],
           x.max = attr(basemap, "bb")[[4]],
           y.max = attr(basemap, "bb")[[3]],
           dist = 100, anchor = c(x = -119.6, y = 49.6), transform = T, 
           location = "bottomleft", st.size = 3, st.dist = 0.032, dist_unit = "km") +
  theme(panel.border = element_rect(colour = "black", fill = NA),
        plot.title = element_text(hjust = 0.5),
        legend.key = element_rect(fill = "white"))
north2(bin.site.pc.map, x = 0.9, y = 0.9, symbol = 16)

Again, we see no clear spatial patterns, suggesting that local factors may structure community differentiation at this scale of analysis.

Next, we’ll plot the species modules. We’ll extract the results, wrangle them, and produce a hierarchical edge bundling plot using ggraph (Pedersen 2021) and igraph (Csardi & Nepusz 2006). For plot, code was adapted from Yan Holtz’s R Graph Gallery (https://www.r-graph-gallery.com/311-add-labels-to-hierarchical-edge-bundling.html), where further step-by-step details are available. We’ll also bring in some trait data to highlight groups of phytoplankton with specific attributes. Trait data are available from Supporting Information S2 in Loewen et al. (2021b) available at https://doi.org/10.1002/lno.11694.

# Load packages
library(igraph)
library(ggraph)

# Load phytoplankton trait data
phyto.traits <- read.csv("lno11694-sup-0002-supinfo02.csv", header = T)
phyto.traits$SPECIES.NAME <- gsub(" ", ".", phyto.traits$SPECIES.NAME)
phyto.traits$SPECIES.NAME <- gsub("\\.\\.", ".", phyto.traits$SPECIES.NAME)
rownames(phyto.traits) <- phyto.traits$SPECIES.NAME

# Create vectors of species assignments for each module (accessed manually from nested lists as follows)
bin.DIRT.sp.mod.1 <- bin.DIRT.list[[2]][[1]][[2]]
bin.DIRT.sp.mod.1.vector <- as.matrix(if_else(colnames(bin.DIRT@moduleWeb) %in% bin.DIRT.sp.mod.1, 1, 0))
bin.DIRT.sp.mod.2 <- bin.DIRT.list[[2]][[2]][[2]]
bin.DIRT.sp.mod.2.vector <- as.matrix(if_else(colnames(bin.DIRT@moduleWeb) %in% bin.DIRT.sp.mod.2, 1, 0))
bin.DIRT.sp.mod.3 <- bin.DIRT.list[[2]][[3]][[2]]
bin.DIRT.sp.mod.3.vector <- as.matrix(if_else(colnames(bin.DIRT@moduleWeb) %in% bin.DIRT.sp.mod.3, 1, 0))
bin.DIRT.sp.mod.4 <- bin.DIRT.list[[2]][[4]][[2]]
bin.DIRT.sp.mod.4.vector <- as.matrix(if_else(colnames(bin.DIRT@moduleWeb) %in% bin.DIRT.sp.mod.4, 1, 0))
bin.DIRT.sp.mod.5 <- bin.DIRT.list[[2]][[5]][[2]]
bin.DIRT.sp.mod.5.vector <- as.matrix(if_else(colnames(bin.DIRT@moduleWeb) %in% bin.DIRT.sp.mod.5, 1, 0))
bin.DIRT.sp.mod.6 <- bin.DIRT.list[[2]][[6]][[2]]
bin.DIRT.sp.mod.6.vector <- as.matrix(if_else(colnames(bin.DIRT@moduleWeb) %in% bin.DIRT.sp.mod.6, 1, 0))

# Bind vectors of species module assignments
bin.DIRT.sp.mods <- cbind(bin.DIRT.sp.mod.1.vector, bin.DIRT.sp.mod.2.vector, bin.DIRT.sp.mod.3.vector, bin.DIRT.sp.mod.4.vector, bin.DIRT.sp.mod.5.vector, bin.DIRT.sp.mod.6.vector)

# Assign species names to species module assignments
rownames(bin.DIRT.sp.mods) <- colnames(bin.DIRT@moduleWeb)

# Merge participation coefficients with species module structure 
bin.DIRT.sp.mods <- merge(bin.DIRT.sp.mods, bin.DIRT.cz.higher$c, by = "row.names")
rownames(bin.DIRT.sp.mods) <- bin.DIRT.sp.mods$Row.names
bin.DIRT.sp.mods <- bin.DIRT.sp.mods[, -1]

# Assign module names to species module structure
colnames(bin.DIRT.sp.mods) <- c("Mod01", "Mod02", "Mod03", "Mod04", "Mod05", "Mod06", "c")

# Create factor indicating species module structure
bin.DIRT.sp.mods['Mods'] <- NA
bin.DIRT.sp.mods$Mods[bin.DIRT.sp.mods$Mod01 == 1] <- "Mod01"
bin.DIRT.sp.mods$Mods[bin.DIRT.sp.mods$Mod02 == 1] <- "Mod02"
bin.DIRT.sp.mods$Mods[bin.DIRT.sp.mods$Mod03 == 1] <- "Mod03"
bin.DIRT.sp.mods$Mods[bin.DIRT.sp.mods$Mod04 == 1] <- "Mod04"
bin.DIRT.sp.mods$Mods[bin.DIRT.sp.mods$Mod05 == 1] <- "Mod05"
bin.DIRT.sp.mods$Mods[bin.DIRT.sp.mods$Mod06 == 1] <- "Mod06"
bin.DIRT.sp.mods$Mods <- as.factor(bin.DIRT.sp.mods$Mods)

# Create adjacency matrix from species module structure for hierarchical edge bundling plot
bin.sp.adjac <- as.one.mode(phyto.comm.aug.bin, fill = 0, project = "higher", weighted = TRUE)

# Define origins, destinations, and edges
bin.sp.origins <- data.frame(c("origin", "origin", "origin", "origin",  "origin",  "origin"), 
                             c("Mod01", "Mod02", "Mod03", "Mod04", "Mod05", "Mod06"))
names(bin.sp.origins) <- c("from", "to")
bin.sp.mods.edges <- as.data.frame(bin.DIRT.sp.mods$Mods)
colnames(bin.sp.mods.edges) <- c("from")
bin.sp.mods.edges$to <- rownames(bin.DIRT.sp.mods)
bin.sp.mods.edges <- bin.sp.mods.edges[order(bin.sp.mods.edges$from), ]
bin.sp.mods.edges <- rbind(bin.sp.origins, bin.sp.mods.edges)

# Create temporary igraph object from adjacency matrix and edge list defining connections
bin.sp.temp.graph <- graph_from_adjacency_matrix(bin.sp.adjac, mode = "undirected", weighted = TRUE, diag = FALSE)
bin.sp.temp.graph.edge.att <- E(bin.sp.temp.graph)$weight
bin.sp.temp.graph.edge.list <- get.edgelist(bin.sp.temp.graph)
bin.sp.mods.connect <- as.data.frame(cbind(bin.sp.temp.graph.edge.list, bin.sp.temp.graph.edge.att))
colnames(bin.sp.mods.connect) <- c("from", "to", "value")
bin.sp.mods.connect$value <- as.numeric(bin.sp.mods.connect$value)

# Calculate number of times each species occurs
bin.sp.mods.colsums <- as.data.frame(colSums (phyto.comm.aug.bin))
bin.sp.mods.colsums.match <- as.data.frame(bin.sp.mods.colsums$`colSums(phyto.comm.aug.bin)`[match(bin.sp.mods.edges$to, rownames(bin.sp.mods.colsums))])
bin.sp.mods.colsums.match <- rbind(c(NA), bin.sp.mods.colsums.match)
colnames(bin.sp.mods.colsums.match) <- "occur"

# Create data frame of vertices weighted by number of times species occur
bin.sp.mods.vertices <- data.frame(name = unique(c(as.character(bin.sp.mods.edges$from), as.character(bin.sp.mods.edges$to))), value = bin.sp.mods.colsums.match)
bin.sp.mods.vertices$group <- bin.sp.mods.edges$from[match(bin.sp.mods.vertices$name, bin.sp.mods.edges$to)]

# Calculate angle of labels
bin.sp.mods.vertices$id <- NA
bin.sp.mods.myleaves <- which(is.na(match(bin.sp.mods.vertices$name, bin.sp.mods.edges$from)))
bin.sp.mods.nleaves <- length(bin.sp.mods.myleaves)
bin.sp.mods.vertices$id[bin.sp.mods.myleaves] <- seq(1:bin.sp.mods.nleaves)
bin.sp.mods.vertices$angle <- 90 - 360 * (bin.sp.mods.vertices$id - 22) / bin.sp.mods.nleaves #correction required depending on number of nodes/leaves

# Calculate the alignment of labels
bin.sp.mods.vertices$hjust <- if_else((bin.sp.mods.vertices$angle < 90.01 & bin.sp.mods.vertices$angle > -90), 1, 0)

# Flip label angles depending on their alignments to make them readable
bin.sp.mods.vertices$angle <- if_else(bin.sp.mods.vertices$angle < -90 | bin.sp.mods.vertices$angle > 90, bin.sp.mods.vertices$angle + 180, bin.sp.mods.vertices$angle)

# Add vectors of trait data (requires values for each species in the plot)
bin.sp.mods.vertices <- merge(bin.sp.mods.vertices, phyto.traits["SUSPECTED_TOXIN_PRODUCING"], by.x = "name", by.y = 0, all.x = TRUE)
bin.sp.mods.vertices <- merge(bin.sp.mods.vertices, phyto.traits["SILICA_USING"], by.x = "name", by.y = 0, all.x = TRUE)
bin.sp.mods.vertices <- merge(bin.sp.mods.vertices, phyto.traits["CHL_B"], by.x = "name", by.y = 0, all.x = TRUE)
bin.sp.mods.vertices <- merge(bin.sp.mods.vertices, phyto.traits["CHL_C"], by.x = "name", by.y = 0, all.x = TRUE)

# Create final igraph object based on edges and vertices
bin.sp.mods.mygraph <- graph_from_data_frame(bin.sp.mods.edges, vertices = bin.sp.mods.vertices)

# Create connection objects defining line origins and destinations
bin.sp.mods.from <- match(bin.sp.mods.connect$from, bin.sp.mods.vertices$name)
bin.sp.mods.to <- match(bin.sp.mods.connect$to, bin.sp.mods.vertices$name)

# Plot binary hierarchical edge bundling plot
bin.sp.edge.group.plot <- ggraph(bin.sp.mods.mygraph, layout = 'dendrogram', circular = TRUE) + 
  geom_conn_bundle(data = get_con(from = bin.sp.mods.from, to = bin.sp.mods.to, value = bin.sp.mods.connect$value),
                   edge_alpha = 0.1, width = 0.3, tension = 0.8, aes(colour = value), show.legend = F) + 
  scale_edge_color_continuous(low = "cornsilk2", high = "red4") + 
  geom_node_text(aes(x = x * 1.15, y = y * 1.15, filter = leaf, label = name, angle = angle, hjust = hjust, colour = group), 
                 size = 3, alpha = 0.6) + 
  geom_node_point(aes(filter = leaf, x = x * 1.07, y = y * 1.07, fill = factor(group), size = occur), 
                  alpha = 0.6, shape = 21, stroke = 0, show.legend = F) + 
  scale_size_continuous(range = c(0.1, 10)) + 
  labs(color = "Modules", title = "Unweighted species modules") + 
  theme_void() + 
  theme(plot.title = element_text(hjust = 0.5), 
        plot.margin = unit(c(0, 0, 0, 0), "cm")) + 
  expand_limits(x = c(-1.5, 1.5), y = c(-1.5, 1.5))
bin.sp.edge.group.plot

This plot shows the module structure of phytoplankton species, with different colours reflecting community groups. Node size indicates the frequency that each species occurs in the dataset and edges indicate species co-occurrence. Darker edges reflect more instances of co-occurrence; however, patterns may be obscured in cases such as this with large numbers of co-occurring taxa.

We can see that module 6 (in pink) is the most species rich, while module 3 (in green) is the most species poor. We can also observe that module 2 (in gold) has several common taxa, including Plagioselmis nannoplanctica, Katablepharis ovalis, and species of the genus Chromulinaceae.

We can also visualize the distribution of traits among species modules. For example, using our trait data, we can highlight those taxa that are suspected toxin producers, silica users, or chlorophyll b and c producers.

# Plot identifying suspected toxin producers
bin.sp.edge.toxin.plot <- ggraph(bin.sp.mods.mygraph, layout = 'dendrogram', circular = TRUE) +
  geom_conn_bundle(data = get_con(from = bin.sp.mods.from, to = bin.sp.mods.to, value = bin.sp.mods.connect$value),
                   edge_alpha = 0.1, width = 0.3, tension = 0.8, aes(colour = value)) +
  scale_edge_color_continuous(low = "cornsilk2", high = "red4") +
  geom_node_text(aes(x = x * 1.15, y = y * 1.15, filter = leaf, label = name, angle = angle, 
                     hjust = hjust, colour = SUSPECTED_TOXIN_PRODUCING), size = 3, alpha = 0.6) +
  geom_node_point(aes(filter = leaf, x = x * 1.07, y = y * 1.07, fill = factor(group), size = occur), 
                  alpha = 0.6, shape = 21, stroke = 0) +
  scale_size_continuous(range = c(0.1, 10)) +
  labs(title = "Suspected toxin producers") +
  theme_void() +
  theme(legend.position = "none",
        plot.title = element_text(hjust=0.5),
        plot.margin = unit(c(0, 0, 0, 0), "cm")) +
  expand_limits(x = c(-1.5, 1.5), y = c(-1.5, 1.5))
bin.sp.edge.toxin.plot

# Plot identifying species with silica requirement
bin.sp.edge.silica.plot <- ggraph(bin.sp.mods.mygraph, layout = 'dendrogram', circular = TRUE) +
  geom_conn_bundle(data = get_con(from = bin.sp.mods.from, to = bin.sp.mods.to, value = bin.sp.mods.connect$value),
                   edge_alpha = 0.1, width = 0.3, tension = 0.8, aes(colour = value)) +
  scale_edge_color_continuous(low = "cornsilk2", high = "red4") +
  geom_node_text(aes(x = x * 1.15, y = y * 1.15, filter = leaf, label = name, angle = angle, 
                     hjust = hjust, colour = SILICA_USING), size = 3, alpha = 0.6) +
  geom_node_point(aes(filter = leaf, x = x * 1.07, y = y * 1.07, fill = factor(group), size = occur), 
                  alpha = 0.6, shape = 21, stroke = 0) +
  scale_size_continuous(range = c(0.1, 10)) +
  labs(title = "Silica requirement") +
  theme_void() +
  theme(legend.position = "none",
        plot.title = element_text(hjust=0.5),
        plot.margin = unit(c(0, 0, 0, 0), "cm")) +
  expand_limits(x = c(-1.5, 1.5), y = c(-1.5, 1.5))
bin.sp.edge.silica.plot

# Plot identifying chlorophyll b pigmented species
bin.sp.edge.chlb.plot <- ggraph(bin.sp.mods.mygraph, layout = 'dendrogram', circular = TRUE) +
  geom_conn_bundle(data = get_con(from = bin.sp.mods.from, to = bin.sp.mods.to, value = bin.sp.mods.connect$value),
                   edge_alpha = 0.1, width = 0.3, tension = 0.8, aes(colour = value)) +
  scale_edge_color_continuous(low = "cornsilk2", high = "red4") +
  geom_node_text(aes(x = x * 1.15, y = y * 1.15, filter = leaf, label = name, angle = angle, 
                     hjust = hjust, colour = CHL_B), size = 3, alpha = 0.6) +
  geom_node_point(aes(filter = leaf, x = x * 1.07, y = y * 1.07, fill = factor(group), size = occur), 
                  alpha = 0.6, shape = 21, stroke = 0) +
  scale_size_continuous(range = c(0.1, 10)) +
  labs(title = 'Chlorophyll'~italic(b)~'pigmented') +
  theme_void() +
  theme(legend.position = "none",
        plot.title = element_text(hjust=0.5),
        plot.margin = unit(c(0, 0, 0, 0), "cm")) +
  expand_limits(x = c(-1.5, 1.5), y = c(-1.5, 1.5))
bin.sp.edge.chlb.plot

# Plot identifying chlorophyll c pigmented species
bin.sp.edge.chlc.plot <- ggraph(bin.sp.mods.mygraph, layout = 'dendrogram', circular = TRUE) +
  geom_conn_bundle(data = get_con(from = bin.sp.mods.from, to = bin.sp.mods.to, value = bin.sp.mods.connect$value),
                   edge_alpha = 0.1, width = 0.3, tension = 0.8, aes(colour = value)) +
  scale_edge_color_continuous(low = "cornsilk2", high = "red4") +
  geom_node_text(aes(x = x * 1.15, y = y * 1.15, filter = leaf, label = name, angle = angle, 
                     hjust = hjust, colour = CHL_C), size = 3, alpha = 0.6) +
  geom_node_point(aes(filter = leaf, x = x * 1.07, y = y * 1.07, fill = factor(group), size = occur), 
                  alpha = 0.6, shape = 21, stroke = 0) +
  scale_size_continuous(range = c(0.1, 10)) +
  labs(title = 'Chlorophyll'~italic(c)~'pigmented') +
  theme_void() +
  theme(legend.position = "none",
        plot.title = element_text(hjust=0.5),
        plot.margin = unit(c(0, 0, 0, 0), "cm")) +
  expand_limits(x = c(-1.5, 1.5), y = c(-1.5, 1.5))
bin.sp.edge.chlc.plot

From these plot, we can see how different traits are dispersed across modules. For instance, suspected toxin producers occur in each of the modules except number 3 (green), including the relatively common Aphanocapsa delicatissima and Aphanocapsa elachista in module 1 (red), Aphanocapsa rivularis and Aphanizomenon flosaquae in module 2 (gold), Dolichospermum spiroides in module 4 (turquoise), Aphanothece nidulans in module 5 (blue), and Limnothrix sp. in module 6 (pink). From this, we might conclude that toxin producers (and potentially harmful algal blooms) are widely dispersed, not being limited to any particular set of communities. Species with silica requirement (i.e. diatoms) and different photosynthetic pigments were also well dispersed across groups. However, chlorophyll b pigmented algae (i.e. chloro- and euglenophytes) were generally more common in modules 5 (blue) and 6 (pink), while chlorophyll c producing species (i.e. ochrophytes, cryptomonads, and dinoflagellates) were more associated with modules 2 (gold), 3 (green), and 4 (turquoise). Whether related to these or other correlated traits, findings indicate potential existence of spatial, environmental, or biotic factors driving clustering of species with particular photosynthetic pigments. A subsequent step might involve testing these patterns to determine whether trait clusters are significantly non-random.

5. Comparing module structure of weighted and unweighted networks

Finally, we can compare the module structure of the unweighted network graph to that of a network weighted by species biomass. First, we’ll compare site modules, using the gridExtra package (Auguie 2017) to arrange the maps side by side.

# Load packages
library(gridExtra)

# Set random seed for reproducibility
set.seed(99)

# Conduct bipartite modularity analysis on weighted network using Beckett's algorithm
bio.DIRT <- computeModules(phyto.comm.aug.bio, method = "Beckett")

# Determine number of modules
nrow(bio.DIRT@modules) - 1

## [1] 9

# Calculate participation coefficient
bio.DIRT.cz.higher <- czvalues(bio.DIRT, level = "higher")
bio.DIRT.cz.lower <- czvalues(bio.DIRT, level = "lower")

# Check modularity (Q) value of proposed module structure
bio.DIRT@likelihood

## [1] 0.6277079

The algorithm detected nine modules with a modularity value of 0.6277079.

# Extract data for site and species module structure
bio.DIRT.list <- listModuleInformation(bio.DIRT)

# Create vectors of site assignments for each module (accessed manually from nested lists as follows)
bio.DIRT.site.mod.1 <- bio.DIRT.list[[2]][[1]][[1]]
bio.DIRT.site.mod.1.vector <- as.matrix(if_else(rownames(bio.DIRT@moduleWeb) %in% bio.DIRT.site.mod.1, 1, 0))
bio.DIRT.site.mod.2 <- bio.DIRT.list[[2]][[2]][[1]]
bio.DIRT.site.mod.2.vector <- as.matrix(if_else(rownames(bio.DIRT@moduleWeb) %in% bio.DIRT.site.mod.2, 1, 0))
bio.DIRT.site.mod.3 <- bio.DIRT.list[[2]][[3]][[1]]
bio.DIRT.site.mod.3.vector <- as.matrix(if_else(rownames(bio.DIRT@moduleWeb) %in% bio.DIRT.site.mod.3, 1, 0))
bio.DIRT.site.mod.4 <- bio.DIRT.list[[2]][[4]][[1]]
bio.DIRT.site.mod.4.vector <- as.matrix(if_else(rownames(bio.DIRT@moduleWeb) %in% bio.DIRT.site.mod.4, 1, 0))
bio.DIRT.site.mod.5 <- bio.DIRT.list[[2]][[5]][[1]]
bio.DIRT.site.mod.5.vector <- as.matrix(if_else(rownames(bio.DIRT@moduleWeb) %in% bio.DIRT.site.mod.5, 1, 0))
bio.DIRT.site.mod.6 <- bio.DIRT.list[[2]][[6]][[1]]
bio.DIRT.site.mod.6.vector <- as.matrix(if_else(rownames(bio.DIRT@moduleWeb) %in% bio.DIRT.site.mod.6, 1, 0))
bio.DIRT.site.mod.7 <- bio.DIRT.list[[2]][[7]][[1]]
bio.DIRT.site.mod.7.vector <- as.matrix(if_else(rownames(bio.DIRT@moduleWeb) %in% bio.DIRT.site.mod.7, 1, 0))
bio.DIRT.site.mod.8 <- bio.DIRT.list[[2]][[8]][[1]]
bio.DIRT.site.mod.8.vector <- as.matrix(if_else(rownames(bio.DIRT@moduleWeb) %in% bio.DIRT.site.mod.8, 1, 0))
bio.DIRT.site.mod.9 <- bio.DIRT.list[[2]][[9]][[1]]
bio.DIRT.site.mod.9.vector <- as.matrix(if_else(rownames(bio.DIRT@moduleWeb) %in% bio.DIRT.site.mod.9, 1, 0))

# Bind vectors of site module assignments
bio.DIRT.site.mods <- cbind(bio.DIRT.site.mod.1.vector, bio.DIRT.site.mod.2.vector, bio.DIRT.site.mod.3.vector, bio.DIRT.site.mod.4.vector, bio.DIRT.site.mod.5.vector, bio.DIRT.site.mod.6.vector, bio.DIRT.site.mod.7.vector, bio.DIRT.site.mod.8.vector, bio.DIRT.site.mod.9.vector)

# Assign lake names to site module assignments
rownames(bio.DIRT.site.mods) <- rownames(bio.DIRT@moduleWeb)

# Merge participation coefficients with site module structure 
bio.DIRT.site.mods <- merge(bio.DIRT.site.mods, bio.DIRT.cz.lower$c, by = "row.names")
rownames(bio.DIRT.site.mods) <- bio.DIRT.site.mods$Row.names
bio.DIRT.site.mods <- bio.DIRT.site.mods[, -1]

# Assign module names to site module structure
colnames(bio.DIRT.site.mods) <- c("Mod01", "Mod02", "Mod03", "Mod04", "Mod05", "Mod06", "Mod07", "Mod08", "Mod09", "c")

# Merge site coordinates with site module structure
bio.DIRT.site.mods <- merge(bio.DIRT.site.mods, coords2var, by = "row.names")
rownames(bio.DIRT.site.mods) <- bio.DIRT.site.mods$Row.names
bio.DIRT.site.mods <- bio.DIRT.site.mods[, -1]

# Create factor indicating site module structure
bio.DIRT.site.mods['Mods'] <- NA
bio.DIRT.site.mods$Mods[bio.DIRT.site.mods$Mod01 == 1] <- "Mod01"
bio.DIRT.site.mods$Mods[bio.DIRT.site.mods$Mod02 == 1] <- "Mod02"
bio.DIRT.site.mods$Mods[bio.DIRT.site.mods$Mod03 == 1] <- "Mod03"
bio.DIRT.site.mods$Mods[bio.DIRT.site.mods$Mod04 == 1] <- "Mod04"
bio.DIRT.site.mods$Mods[bio.DIRT.site.mods$Mod05 == 1] <- "Mod05"
bio.DIRT.site.mods$Mods[bio.DIRT.site.mods$Mod06 == 1] <- "Mod06"
bio.DIRT.site.mods$Mods[bio.DIRT.site.mods$Mod07 == 1] <- "Mod07"
bio.DIRT.site.mods$Mods[bio.DIRT.site.mods$Mod08 == 1] <- "Mod08"
bio.DIRT.site.mods$Mods[bio.DIRT.site.mods$Mod09 == 1] <- "Mod09"
bio.DIRT.site.mods$Mods <- as.factor(bio.DIRT.site.mods$Mods)

# Plot weighted module map
bio.site.mod.map <- ggmap(basemap, maprange = TRUE) +
  geom_point(data = bio.DIRT.site.mods, aes(-Longitude, Latitude, color = Mods), shape = 16, size = 3) +
  geom_point(data = bio.DIRT.site.mods, aes(-Longitude, Latitude), color = "black", shape = 1, size = 3) +
  scale_x_continuous("Longitude", breaks = c(-120, -115, -110), 
                     labels = c("-120", "-115", "-110"), limits = c(-120, -110)) +
  scale_y_continuous("Latitude", breaks = c(49, 54, 59), 
                     labels = c("49", "54", "59"), limits = c(49, 59.7)) +
  labs(color = "Modules", title = "Weighted site modules") +
  scalebar(x.min = attr(basemap, "bb")[[2]],
           y.min = attr(basemap, "bb")[[1]],
           x.max = attr(basemap, "bb")[[4]],
           y.max = attr(basemap, "bb")[[3]],
           dist = 100, anchor = c(x = -119.6, y = 49.6), transform = T, 
           location = "bottomleft", st.size = 3, st.dist = 0.032, dist_unit = "km") +
  theme(panel.border = element_rect(colour = "black", fill = NA),
        plot.title = element_text(hjust = 0.5),
        legend.key = element_rect(fill = "white"))

# Arrange site module maps
north2(compare.site.mod.map <- grid.arrange(bin.site.mod.map, bio.site.mod.map, nrow = 1), x = 0.95, y = 0.9, symbol = 16)

Comparing these maps, we can see that the module structure of communities is very different depending on whether or not the connections are weighted by species biomass Not only did the weighted analysis detect a larger number of modules (nine), but the spatial arrangement of modules also differed somewhat. However, as with the unweighted network, the module structure based on biomass lacks a clear spatial pattern, generating the testable hypothesis that more local processes differentiate phytoplankton communities across Alberta, Canada.

We can now look at how species modules differ between the unweighted and weighted networks.

# Create vectors of species assignments for each module (accessed manually from nested lists as follows)
bio.DIRT.sp.mod.1 <- bio.DIRT.list[[2]][[1]][[2]]
bio.DIRT.sp.mod.1.vector <- as.matrix(if_else(colnames(bio.DIRT@moduleWeb) %in% bio.DIRT.sp.mod.1, 1, 0))
bio.DIRT.sp.mod.2 <- bio.DIRT.list[[2]][[2]][[2]]
bio.DIRT.sp.mod.2.vector <- as.matrix(if_else(colnames(bio.DIRT@moduleWeb) %in% bio.DIRT.sp.mod.2, 1, 0))
bio.DIRT.sp.mod.3 <- bio.DIRT.list[[2]][[3]][[2]]
bio.DIRT.sp.mod.3.vector <- as.matrix(if_else(colnames(bio.DIRT@moduleWeb) %in% bio.DIRT.sp.mod.3, 1, 0))
bio.DIRT.sp.mod.4 <- bio.DIRT.list[[2]][[4]][[2]]
bio.DIRT.sp.mod.4.vector <- as.matrix(if_else(colnames(bio.DIRT@moduleWeb) %in% bio.DIRT.sp.mod.4, 1, 0))
bio.DIRT.sp.mod.5 <- bio.DIRT.list[[2]][[5]][[2]]
bio.DIRT.sp.mod.5.vector <- as.matrix(if_else(colnames(bio.DIRT@moduleWeb) %in% bio.DIRT.sp.mod.5, 1, 0))
bio.DIRT.sp.mod.6 <- bio.DIRT.list[[2]][[6]][[2]]
bio.DIRT.sp.mod.6.vector <- as.matrix(if_else(colnames(bio.DIRT@moduleWeb) %in% bio.DIRT.sp.mod.6, 1, 0))
bio.DIRT.sp.mod.7 <- bio.DIRT.list[[2]][[7]][[2]]
bio.DIRT.sp.mod.7.vector <- as.matrix(if_else(colnames(bio.DIRT@moduleWeb) %in% bio.DIRT.sp.mod.7, 1, 0))
bio.DIRT.sp.mod.8 <- bio.DIRT.list[[2]][[8]][[2]]
bio.DIRT.sp.mod.8.vector <- as.matrix(if_else(colnames(bio.DIRT@moduleWeb) %in% bio.DIRT.sp.mod.8, 1, 0))
bio.DIRT.sp.mod.9 <- bio.DIRT.list[[2]][[9]][[2]]
bio.DIRT.sp.mod.9.vector <- as.matrix(if_else(colnames(bio.DIRT@moduleWeb) %in% bio.DIRT.sp.mod.9, 1, 0))

# Bind vectors of species module assignments
bio.DIRT.sp.mods <- cbind(bio.DIRT.sp.mod.1.vector, bio.DIRT.sp.mod.2.vector, bio.DIRT.sp.mod.3.vector, bio.DIRT.sp.mod.4.vector, bio.DIRT.sp.mod.5.vector, bio.DIRT.sp.mod.6.vector, bio.DIRT.sp.mod.7.vector, bio.DIRT.sp.mod.8.vector, bio.DIRT.sp.mod.9.vector)

# Assign species names to species module assignments
rownames(bio.DIRT.sp.mods) <- colnames(bio.DIRT@moduleWeb)

# Merge participation coefficients with species module structure 
bio.DIRT.sp.mods <- merge(bio.DIRT.sp.mods, bio.DIRT.cz.higher$c, by = "row.names")
rownames(bio.DIRT.sp.mods) <- bio.DIRT.sp.mods$Row.names
bio.DIRT.sp.mods <- bio.DIRT.sp.mods[, -1]

# Assign module names to species module structure
colnames(bio.DIRT.sp.mods) <- c("Mod01", "Mod02", "Mod03", "Mod04", "Mod05", "Mod06", "Mod07", "Mod08", "Mod09", "c")

# Create factor indicating species module structure
bio.DIRT.sp.mods['Mods'] <- NA
bio.DIRT.sp.mods$Mods[bio.DIRT.sp.mods$Mod01 == 1] <- "Mod01"
bio.DIRT.sp.mods$Mods[bio.DIRT.sp.mods$Mod02 == 1] <- "Mod02"
bio.DIRT.sp.mods$Mods[bio.DIRT.sp.mods$Mod03 == 1] <- "Mod03"
bio.DIRT.sp.mods$Mods[bio.DIRT.sp.mods$Mod04 == 1] <- "Mod04"
bio.DIRT.sp.mods$Mods[bio.DIRT.sp.mods$Mod05 == 1] <- "Mod05"
bio.DIRT.sp.mods$Mods[bio.DIRT.sp.mods$Mod06 == 1] <- "Mod06"
bio.DIRT.sp.mods$Mods[bio.DIRT.sp.mods$Mod07 == 1] <- "Mod07"
bio.DIRT.sp.mods$Mods[bio.DIRT.sp.mods$Mod08 == 1] <- "Mod08"
bio.DIRT.sp.mods$Mods[bio.DIRT.sp.mods$Mod09 == 1] <- "Mod09"
bio.DIRT.sp.mods$Mods <- as.factor(bio.DIRT.sp.mods$Mods)

# Create adjacency matrix from species module structure for hierarchical edge bundling plot
bio.sp.adjac <- as.one.mode(phyto.comm.aug.bio, fill = 0, project = "higher", weighted = TRUE)

# Define origins, destinations, and edges
bio.sp.origins <- data.frame(c("origin", "origin", "origin", "origin",  "origin", "origin",  "origin",  "origin",  "origin"), c("Mod01", "Mod02", "Mod03", "Mod04", "Mod05", "Mod06", "Mod07", "Mod08", "Mod09"))
names(bio.sp.origins) <- c("from", "to")
bio.sp.mods.edges <- as.data.frame(bio.DIRT.sp.mods$Mods)
colnames(bio.sp.mods.edges) <- c("from")
bio.sp.mods.edges$to <- rownames(bio.DIRT.sp.mods)
bio.sp.mods.edges <- bio.sp.mods.edges[order(bio.sp.mods.edges$from), ]
bio.sp.mods.edges <- rbind(bio.sp.origins, bio.sp.mods.edges)

# Create temporary igraph object from adjacency matrix and edge list defining connections
bio.sp.temp.graph <- graph_from_adjacency_matrix(bio.sp.adjac, mode = "undirected", weighted = TRUE, diag = FALSE)
bio.sp.temp.graph.edge.att <- E(bio.sp.temp.graph)$weight
bio.sp.temp.graph.edge.list <- get.edgelist(bio.sp.temp.graph)
bio.sp.mods.connect <- as.data.frame(cbind(bio.sp.temp.graph.edge.list, bio.sp.temp.graph.edge.att))
colnames(bio.sp.mods.connect) <- c("from", "to", "value")
bio.sp.mods.connect$value <- as.numeric(bio.sp.mods.connect$value)

# Calculate number of times each species occurs
bio.sp.mods.colsums <- as.data.frame(colSums (phyto.comm.aug.bio))
bio.sp.mods.colsums.match <- as.data.frame(bio.sp.mods.colsums$`colSums(phyto.comm.aug.bio)`[match(bio.sp.mods.edges$to, rownames(bio.sp.mods.colsums))])
bio.sp.mods.colsums.match <- rbind(c(NA), bio.sp.mods.colsums.match)
colnames(bio.sp.mods.colsums.match) <- "occur"

# Create data frame of vertices weighted by number of times species occur
bio.sp.mods.vertices <- data.frame(name = unique(c(as.character(bio.sp.mods.edges$from), as.character(bio.sp.mods.edges$to))), value = bio.sp.mods.colsums.match)
bio.sp.mods.vertices$group <- bio.sp.mods.edges$from[match(bio.sp.mods.vertices$name, bio.sp.mods.edges$to)]

# Calculate angle of labels
bio.sp.mods.vertices$id <- NA
bio.sp.mods.myleaves <- which(is.na(match(bio.sp.mods.vertices$name, bio.sp.mods.edges$from)))
bio.sp.mods.nleaves <- length(bio.sp.mods.myleaves)
bio.sp.mods.vertices$id[bio.sp.mods.myleaves]<-seq(1:bio.sp.mods.nleaves)
bio.sp.mods.vertices$angle <- 90 - 360 * (bio.sp.mods.vertices$id - 22) / bio.sp.mods.nleaves #correction required depending on number of nodes/leaves

# Calculate the alignment of labels
bio.sp.mods.vertices$hjust <- if_else((bio.sp.mods.vertices$angle < 90.01 & bio.sp.mods.vertices$angle > -90), 1, 0)

# Flip label angles depending on their alignments to make them readable
bio.sp.mods.vertices$angle <- if_else(bio.sp.mods.vertices$angle < -90 | bio.sp.mods.vertices$angle > 90, bio.sp.mods.vertices$angle + 180, bio.sp.mods.vertices$angle)

# Create final igraph object based on edges and vertices
bio.sp.mods.mygraph <- graph_from_data_frame(bio.sp.mods.edges, vertices = bio.sp.mods.vertices)

# Create connection objects defining line origins and destinations
bio.sp.mods.from <- match(bio.sp.mods.connect$from, bio.sp.mods.vertices$name)
bio.sp.mods.to <- match(bio.sp.mods.connect$to, bio.sp.mods.vertices$name)

# Plot weighted hierarchical edge bundling plot
bio.sp.edge.group.plot <- ggraph(bio.sp.mods.mygraph, layout = 'dendrogram', circular = TRUE) +
  geom_conn_bundle(data = get_con(from = bio.sp.mods.from, to = bio.sp.mods.to, value = bio.sp.mods.connect$value),
                   edge_alpha = 0.1, width = 0.3, tension = 0.8, aes(colour = value), show.legend = F) +
  scale_edge_color_continuous(low = "cornsilk2", high = "red4") +
  geom_node_text(aes(x = x * 1.15, y = y * 1.15, filter = leaf, label = name, angle = angle, 
                     hjust = hjust, colour = group), size = 3, alpha = 0.6) +
  geom_node_point(aes(filter = leaf, x = x * 1.07, y = y * 1.07, fill = factor(group), size = occur), 
                  alpha = 0.6, shape = 21, stroke = 0, show.legend = F) +
  scale_size_continuous(range = c(0.1, 10)) +
  labs(color = "Modules", title = "Weighted species modules") +
  theme_void() +
  theme(plot.title = element_text(hjust = 0.5),
        plot.margin = unit(c(0, 0, 0, 0), "cm")) +
  expand_limits(x = c(-1.5, 1.5), y = c(-1.5, 1.5))

# Arrange species module plots
compare.sp.edge.group.plot <- grid.arrange(bio.sp.edge.group.plot, bin.sp.edge.group.plot, nrow = 2)

Again, we see very different module structures from the weighted and unweighted networks. Note that node sizes indicates relative biomass in the weighted network, which is concentrated in a few key species, including toxin producing Aphanizomenon gracile in module 2 (orange), Aphanizomenon flosaquae in module 4 (green), Dolichospermum spiroides in module 8 (purple), and Planktothrix agardhii in module 9 (pink). Thus, while the toxin trait is widely distributed across different types of communities, biomass of potential toxin producers (and thus risk of harmful algal blooms) is relatively more concentrated at sites associated with these modules.

As with the site module structure, species results lend themselves to further analysis to understand associations among species within and between groups, including their functional trait and phylogenetic similarities. Here, the conceptualization of multispecies distributions as site-species networks, and the subsequent analysis and visualization of their module structure, offers an valuable means of communicating biological patterns and generating novel, testable hypotheses for future investigations.

Additional reading on label propagation algorithms

(from Loewen et al. 2021a. Bioregions are predominately climatic for fishes of northern lakes. Global Ecology and Biogeography)

Modularity optimization is computationally challenging (NP-hard; Miyauchi & Sukegawa 2015), necessitating heuristics to search through solution space of larger networks. The DIRTLPAb+ algorithm of Beckett (2016) is a label propagation approach to maximize Barber’s bipartite modularity. Briefly, the label propagation algorithm (LPA) for community detection was proposed by Raghavan et al. (2007), where nodes were initialized with all unique labels, updated iteratively to adopt the most common labels of their neighbours (with ties broken uniformly), and grouped together once no further improvements could be made. The routine was subsequently modified to the bipartite case (LPAb); however, it had a tendency to become trapped in local maxima. To escape such traps, Liu & Murata (2010) incorporated a multistep greedy agglomerative algorithm (LPAb+). Their approach applied asynchronous label propagation (updating node labels of each type in turn) but added a step where resulting groups could be merged if doing so improved the solution, iterating between propagation and aggregation phases. Finally, because label propagation is inherently stochastic, and thus outcomes can vary depending on initialization, Beckett (2016) proposed that LPAb+ should be repeated under different node configurations (i.e. numbers of unique starting labels) and report the greatest ensuing score (DIRTLPAb+).

6. References

Auguie, B. (2017). gridExtra: miscellaneous functions for “grid” graphics. R package version 2.3. https://CRAN.R-project.org/package=gridExtra

Barber, M.J. (2007). Modularity and community detection in bipartite networks. Physical Review E, 76, 066102.

Baquero, O.S. (2019). ggsn: north symbols and scale bars for maps created with ‘ggplot2’ or ‘ggmap’. R package version 0.5.0. https://CRAN.R-project.org/package=ggsn

Beckett, S.J. (2016). Improved community detection in weighted bipartite networks. Royal Society Open Science, 3, 140536.

Csardi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695.

Dormann, C.F., Gruber, B., & Fründ, J. (2008). Introducing the bipartite package: analysing ecological networks. R News, 8, 8–11.

Holtz, Y. Add labels to Hierarchical Edge Bundling. R Graph Gallery. https://www.r-graph-gallery.com/311-add-labels-to-hierarchical-edge-bundling.html

Guimerà, R, & Amaral, L.A.N. (2005). Functional cartography of complex metabolic networks. Nature, 433, 895–900.

Kahle, D., & Wickha, H. (2013). ggmap: spatial Visualization with ggplot2. The R Journal, 5, 144-161.

Loewen, C.J.G., Wyatt, F.R., Mortimer, C.A., Vinebrooke, R.D., & Zurawell, R.W. (2020). Multiscale drivers of phytoplankton communities in north temperate lakes. Ecological Applications, 30, e02102.

Loewen C.J.G., Jackson D.A., Chu C., Alofs K.M., Hansen G.J.A., Honsey A.E., Minns C.K., & Wehrly K.E. (2021a). Bioregions are predominantly climatic for fishes of northern lakes. Global Ecology and Biogeography.

Loewen. C.J.G., Vinebrooke, R.D., & Zurawell R.W. (2021b). Quantifying seasonal succession of phytoplankton trait-environment associations in human-altered landscapes. Limnology and Oceanography, 66, 1409-1423.

Miyauchi, A., & Sukegawa, N. (2015). Maximizing Barber’s bipartite modularity is also hard. Optimization Letters, 9, 897–913.

Newman, M.E.J., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69, 026113.

Oksanen, J., Blanchet, F.G., Friendly, M., Kindt, R., Legendre, P., McGlinn, D., … Wagner, H. (2020). vegan: community ecology package. R package version 2.5-7. https://CRAN.R-project.org/package=vegan

Pedersen, T.L. (2021). ggraph: an implementation of grammar of graphics for graphs and networks. R package version 2.0.5. https://CRAN.R-project.org/package=ggraph

R Core Team. (2021). R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

Raghavan, U.N., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76, 036106.

Strona, G., Nappo, D., Boccacci, F., Fattorini, S., & San-Miguel-Ayanz, J. (2014). A fast and unbiased procedure to randomize ecological binary matrices with fixed row and column totals. Nature Communications, 5, 4114.

Wickham, H. (2016). ggplot2: elegant graphics for data analysis. Springer-Verlag, New York, USA.

Wickham, H., François, R., Henry, L., & Müller, K. (2021). dplyr: a grammar of data manipulation. R package version 1.0.5. https://CRAN.R-project.org/package=dplyr

Tutorial on community detection in biogeographical site-species networks by bipartite modularity analysis

Charlie Loewen

18/Nov/2021

Objectives

1. Background

2. Building the network

3. Analyzing the network

4. Visualizing the proposed module structure

5. Comparing module structure of weighted and unweighted networks

6. References

1. Background

2. Building the network

3. Analyzing the network

4. Visualizing the proposed module structure

5. Comparing module structure of weighted and unweighted networks

Additional reading on label propagation algorithms

6. References