Gene co-expression network analysis is a valuable tool for interpreting complex biological processes. Recent droplet-based single-cell technology routinely generates much larger gene expression data sets, with thousands of samples and tens of thousands of genes. To analyze gene-gene networks at this scale, remarkable progress has been made in rigorous statistical inference for the high-dimensional Gaussian graphical model (GGM). These approaches provide a formal confidence interval or p-value, rather than only a single point estimate, for the conditional dependence of a gene pair, and are therefore more desirable for identifying reliable gene networks. To promote their widespread use, we introduce an extensive and efficient R package named SILGGM (Statistical Inference of Large-scale Gaussian Graphical Model) that includes four main approaches to statistical inference in high-dimensional GGMs. Unlike existing tools, SILGGM provides statistically efficient inference both on individual gene pairs and globally on all gene pairs. It offers a novel and consistent false discovery rate (FDR) procedure across all four methodologies. With its user-friendly design, it provides outputs compatible with multiple platforms for interactive network visualization. Furthermore, comparisons in simulation illustrate that SILGGM is several orders of magnitude faster than the existing MATLAB implementation and further improves on the speed of the already very efficient R package FastGGM. Results on simulated data confirm the validity of all the approaches in SILGGM, even in a very large-scale setting with the number of variables or genes on the order of ten thousand. We have also applied our package to a novel single-cell RNA-seq data set of pan T cells. The results show that the approaches in SILGGM significantly outperform the conventional ones in a biological sense. The package is freely available via CRAN at -project.org/package=SILGGM.
Several software packages already exist for gene co-expression network analysis. For example, the popular R package WGCNA provides functions to construct a gene co-expression network based on marginal correlations. Among partial correlation-based approaches, particularly for large-scale settings, glasso and huge are two widely adopted packages for fast estimation of gene-gene conditional dependence based on the high-dimensional GGM. More recent packages include FastCLIME, flare and XMRF. In contrast to marginal correlation-based approaches and high-dimensional GGM estimation, few efficient packages or algorithms exist in practice for the aforementioned approaches to rigorous statistical inference on partial correlations, even though these approaches are expected to be more powerful in large-scale gene-gene network analysis. FastGGM is a recently developed package that provides an efficient and tuning-free implementation of B_NW_SL and has made the method computationally feasible for tens of thousands of genes. However, its algorithm contains some redundant steps that can be improved, and its outputs are available only in matrix format, which makes the package less user-friendly. Apart from FastGGM, no efficient R package has been proposed for the other related methods mentioned above, and the expensive computation of a naïve implementation remains a challenge for these approaches.
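The distinction between marginal and partial correlation drawn above can be illustrated with a small numerical sketch. The code below is in Python rather than R and does not use the API of SILGGM or of any other package named here; the function names and the three-gene precision matrix are hypothetical and chosen only for illustration. It assumes the precision matrix is known exactly, whereas the methods discussed in this paper estimate it from data and quantify the uncertainty of each entry.

```python
import numpy as np


def marginal_correlations(cov):
    """Convert a covariance matrix into a marginal correlation matrix."""
    scale = np.sqrt(np.diag(cov))
    return cov / np.outer(scale, scale)


def partial_correlations(precision):
    """Convert a precision (inverse covariance) matrix into partial correlations.

    In a GGM, the partial correlation between variables i and j given all
    others is -omega_ij / sqrt(omega_ii * omega_jj).
    """
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Hypothetical three-gene chain A - B - C: A and C are linked only through B,
# so the (A, C) entry of the precision matrix is zero.
omega = np.array([
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
])
sigma = np.linalg.inv(omega)

marg = marginal_correlations(sigma)
pcor = partial_correlations(omega)

print("marginal correlation A-C:", marg[0, 2])
print("partial correlation A-C:", pcor[0, 2])
```

In this toy chain the marginal correlation between A and C is nonzero because both depend on B, while their partial correlation is zero. A network built from marginal correlations would therefore draw an A-C edge that a GGM-based network omits.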
We focus on the high-dimensional settings with p (the number of genes) allowed to be far larger than n (the number of subjects). The SILGGM package has one main function SILGGM() with various arguments and its workflow is described in Fig 1.
The source code of the package and a complete reference manual including dependencies, usage of all package functions and associated examples are freely available via CRAN at -project.org/package=SILGGM. The details of package installation are described in S3 Appendix.
The package SILGGM is computationally efficient compared to the MATLAB implementation of GFC_L and the R package FastGGM. R is a free and publicly available platform that is more widely used in biological research than MATLAB, which is commercially licensed software and less accessible to biologists. The R-based SILGGM is therefore well placed to accelerate biological gene network studies. SILGGM is also statistically efficient in both individual and global inference, owing to the theoretical justification of the four approaches and the validation of estimation accuracy in simulation studies. The analytical results from the single-cell data with pan T cells further reflect the statistical efficiency of SILGGM, since the inferred gene networks are more reliable. Moreover, the comprehensiveness of SILGGM gives users a flexible choice of methods depending on the specific purpose of their study. Given its computational feasibility, the analytical reliability of its results and its methodological comprehensiveness, SILGGM can become a valuable and powerful tool for a wide range of biological researchers performing high-dimensional or even genome-wide co-expression network analysis.
In the future, we will add parallel computing to SILGGM so that users can use multiple clusters to analyze larger data sets, since droplet-based single-cell technology will further increase the sample size. In addition, rigorous statistical inference for multiple high-dimensional gene networks is another potential extension of our package, because differential gene network analysis among different cell types or among cells of multiple individuals is receiving increasing attention.
Total RNA was extracted from ES cells or embryoid bodies using the Qiagen RNeasy kit (Qiagen, Valencia, CA, USA). For quantitative PCR analysis, cDNA was synthesized using SuperScript III (Invitrogen, Carlsbad, CA, USA) and amplified using the SYBR green brilliant PCR amplification kit (Stratagene, La Jolla, CA, USA) and an Mx3000 thermocycler (Stratagene). For GeneChip expression analysis, RNA was amplified using the Ovation amplification and labeling kit (NuGen, San Carlos, CA, USA) and hybridized to Affymetrix Mouse Genome 430 2.0 microarrays. Expression microarray experiments were performed in biological triplicate for each analyzed time point. Arrays were scanned using the GeneChip Scanner 3000. Data analysis was carried out using the affylmGUI BioConductor package. GC Robust Multi-array Average (GCRMA) normalization was performed across all arrays, followed by linear model fitting using Limma. Differentially expressed genes after 8 hours of RA treatment were defined by ranking all probesets by the moderated t-statistic-derived P-value (adjusted for multiple testing using Benjamini and Hochberg's method) and setting thresholds of P < 0.01 and a fold-change of at least 2. All arrays were submitted to the NIH Gene Expression Omnibus (GEO) database under accession number [GEO:GSE19372].
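The selection rule described above (Benjamini-Hochberg adjustment followed by P-value and fold-change thresholds) can be sketched as follows. This is a minimal Python illustration and not the Limma or affylmGUI implementation used in the analysis. The function names and the four example probesets are hypothetical, and the sketch assumes that "a fold-change of at least 2" means an absolute log2 fold-change of at least 1, in either direction.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each adjusted value is the minimum of the scaled
    # values at its own rank and all larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def select_differential(pvalues, log2_fold_changes, p_cutoff=0.01, min_fold=2.0):
    """Flag probesets passing both the adjusted p-value and fold-change cutoffs.

    Assumption: the fold-change cutoff is applied to |log2 fold-change|,
    so both up- and down-regulated probesets can pass.
    """
    adjusted = benjamini_hochberg(pvalues)
    lfc = np.abs(np.asarray(log2_fold_changes, dtype=float))
    return (adjusted < p_cutoff) & (lfc >= np.log2(min_fold))


# Hypothetical values for four probesets, for illustration only.
raw_p = [0.0001, 0.002, 0.5, 0.0004]
log2_fc = [1.5, 0.3, 2.0, -1.2]

adjusted_p = benjamini_hochberg(raw_p)
selected = select_differential(raw_p, log2_fc)
print(adjusted_p)
print(selected)
```

With these example values, the first and fourth probesets pass both thresholds. The second fails the fold-change cutoff and the third fails the adjusted P-value cutoff.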
Note: lichee is an Allwinner project for its CPUs. The project package contains U-Boot's source code, Linux source code and various installation scripts. Generating an Android image relies on this "lichee" directory, and its name cannot be changed. "-b master-android7.0" means the "master-android7.0" branch will be used. Generating an Android image from Android 7.0's source code relies on this branch.