Guoliang's Lab

cfMethDB

Cancer is a major global health threat, and early detection is crucial for improving patient outcomes. DNA methylation in circulating cell-free DNA (cfDNA) has emerged as a promising biomarker for non-invasive cancer diagnosis. However, the integration and utilization of existing cfDNA methylation data have been limited, hindering comprehensive research efforts, particularly in the discovery of cfDNA methylation biomarkers. To address this challenge, we introduce cfMethDB, a comprehensive database dedicated to cfDNA methylation in cancer that encompasses 4828 publicly available datasets. Through standardized analysis, we identified 1,048,770 differentially methylated cytosines (DMCs) as candidate biomarkers across seven cancer types. With cfMethDB, we not only identified known cfDNA methylation biomarkers, but also discovered several genes, such as ZIC4, that could be novel biomarkers. Moreover, cfMethDB offers a suite of user-friendly tools, including biomarker evaluation, pan-cancer search, and end motif analysis. We hope that cfMethDB will serve as a valuable platform for the discovery of novel cancer cfDNA methylation biomarkers and will facilitate cancer research and clinical applications. cfMethDB is publicly available at: https://cfmethdb.hzau.edu.cn/home.

Visit cfMethDB

cfMethDB: A Comprehensive cfDNA Methylation Data Resource for Cancer Biomarkers

DinoSource

Dinoflagellates are a taxonomically diverse and ecologically significant group of phytoplankton. They are also infamous for their involvement in harmful algal blooms, which have significant ecological and economic impacts. In recent years, substantial advances have been made in the analysis of dinoflagellate genomes, including sequencing, assembly and gene annotation, alongside the accumulation of extensive multi-omics data. Despite these developments, the large size and complexity of dinoflagellate genomes present ongoing challenges. Current resources, such as SAGER, primarily focus on genomic and transcriptomic data sets for Symbiodiniaceae. In this study, we have developed the first high-precision and comprehensive genome resource database for dinoflagellates, DinoSource (http://glab.hzau.edu.cn/dinosource), which provides 21 genome assemblies for all 20 currently sequenced dinoflagellate species (including two strains of Polarella glacialis). Our database integrates 703 omics samples, which have been generated from our experiments as well as collected from public repositories such as GEO and SRA up to the present date.

Visit DinoSource

DinoSource: A comprehensive database of dinoflagellate genomic resources

MethMarkerDB

DNA methylation plays a crucial role in tumorigenesis and tumor progression, sparking substantial interest in the clinical applications of cancer DNA methylation biomarkers. Cancer-related whole-genome bisulfite sequencing (WGBS) data offers a promising approach to precisely identify these biomarkers with differentially methylated regions (DMRs). However, there is currently no dedicated resource for cancer DNA methylation biomarkers with WGBS data. Here, we developed a comprehensive cancer DNA methylation biomarker database, MethMarkerDB (https://methmarkerdb.hzau.edu.cn/), which integrates 658 WGBS datasets and incorporates 724 curated DNA methylation biomarker genes from 1425 PubMed-indexed articles. Based on WGBS data, we documented 5.4 million DMRs from 13 common types of cancer as candidate DNA methylation biomarkers. We provide search and annotation functions for these DMRs with different resources, such as enhancers and SNPs, and developed diagnostic and prognostic models for further biomarker evaluation. With the database, we not only identified known DNA methylation biomarkers, but also identified 781 hypermethylated and 5245 hypomethylated pan-cancer DMRs, corresponding to 693 and 2172 genes, respectively. These novel potential pan-cancer DNA methylation biomarkers hold significant clinical translational value. We hope that MethMarkerDB will help identify novel cancer DNA methylation biomarkers and propel the clinical application of these biomarkers.

Visit MethMarkerDB

MethMarkerDB: a comprehensive cancer DNA methylation biomarker database

AraENCODE

Here, we combine published Arabidopsis epigenomic datasets (e.g. ChIP-seq, ATAC-seq, MNase-seq, BS-seq), 3D genome datasets (e.g. Hi-C, Capture Hi-C, HiChIP, ChIA-PET), and transcriptome datasets (e.g. RNA-seq and ncRNA-seq) to construct a comprehensive Arabidopsis thaliana Encyclopedia of DNA Elements database (AraENCODE). The database contains 4,511 datasets. Arabidopsis TAIR10 is used as the reference genome for all datasets, which makes the data convenient to display and compare. AraENCODE provides seven search functions (Histone Modification, Transcriptome, Open Chromatin Region, DNA methylation, 3D Genome, Chromatin State, and Wildtype vs Mutant) that help users quickly find the epigenetic data they are interested in. It also presents the epigenetic landscape of Arabidopsis from five aspects: histone modification, transcriptional expression, open chromatin state, DNA methylation level, and mutant differences. AraENCODE is also equipped with the WashU Epigenome Browser, which displays the standardized Arabidopsis datasets more intuitively.

Visit AraENCODE

AraENCODE: a comprehensive epigenomic database of Arabidopsis thaliana.

ChromLoops

Gene transcription regulation is a complex but well-organized process in eukaryotes. Chromatin loops are an important element of chromatin structure and function. They help form interactions between regulatory elements, such as promoters, enhancers, silencers, and insulators, thus regulating spatiotemporal gene expression. The development of high-throughput techniques based on chromatin conformation capture (3C), such as Hi-C, ChIA-PET, HiChIP and PLAC-Seq, has substantially furthered our understanding of genome spatial organization and how it influences gene regulation. ChIA-PET, HiChIP and PLAC-Seq aim to capture genome-wide chromatin interactions mediated by specific proteins such as RNAPII, transcription factors (TFs), and histone proteins, resulting in highly specific and high-resolution chromatin loops.

The current version of ChromLoops integrates 1030 ChIA-PET, HiChIP and PLAC-Seq datasets (366 samples) from 13 species, and documents a total of 1,491,416,813 high-quality chromatin loops. In addition, we annotated chromatin loop anchors (genes or regions) with abundant functional information, including enhancers, silencers, SNPs, QTLs, transcription factor binding sites, alternative splicing information, circRNAs, TWAS, chromatin accessibility information and gene expression information.

Visit ChromLoops

ChromLoops: a comprehensive database for specific protein-mediated chromatin loops in diverse organisms.

ASMdb

DNA methylation plays a crucial role in most organisms. In addition, parental alleles in diploids might exhibit different methylation patterns, which can lead to different phenotypes and even different responses to therapy and drugs in disease. Moreover, related studies have revealed that allele-specific DNA methylation (ASM) can be used as an effective tumor marker and plays important roles in the development of seeds and seedlings. A comprehensive database of high-throughput BS-Seq data and ASM results from multiple species was not available prior to this study.

Here, we constructed the Allele-Specific DNA Methylation Database (ASMdb), aiming to provide a comprehensive resource and a web tool for showing DNA methylation levels and differential DNA methylation in diverse organisms. It covers 47 species, such as Homo sapiens, Mus musculus, Arabidopsis thaliana, and Oryza sativa, with 5998 high-quality GEO samples (4400 BS-Seq datasets and 1598 RNA-Seq datasets) filtered by bisulfite conversion.

Visit ASMdb

ASMdb: a comprehensive database for allele-specific DNA methylation in diverse organisms

RiceENCODE

In recent years, the application of Hi-C, ChIA-PET, and other chromosome conformation capture technologies in plants has made research on the spatial structure of plant genomes increasingly important. As plant chromosome interaction data continue to accumulate, querying epigenetic modification and interaction information efficiently has become a major problem. Here, we combine published three-dimensional interaction data (ChIA-PET, Hi-C) with epigenomic datasets (ChIP-Seq, ATAC-Seq, MNase-Seq, FAIRE-Seq, WGBS, RNA-Seq) to construct a comprehensive rice Encyclopedia of DNA Elements database (RiceENCODE). The database contains 694 datasets and is, to the best of our knowledge, the largest for rice epigenomics. It can help outline the rice epigenome comprehensively, and provides an important platform for studying rice molecular breeding, genetic mechanisms, tissue specificities and subgroup properties in epigenetic regulation.

Visit RiceENCODE

RiceENCODE: A comprehensive epigenomic database as a rice Encyclopedia of DNA Elements

RiceLncPedia

Long non‐coding RNAs (lncRNAs) are defined as RNA molecules at least 200 nucleotides (nt) long that usually have low protein‐coding potential. In plants, emerging evidence indicates that lncRNAs function as key modulators in development and stress response at the epigenetic, transcriptional and post‐transcriptional levels. Here, we developed a database, RiceLncPedia (http://3dgenome.hzau.edu.cn/RiceLncPedia), to systematically characterize rice lncRNAs with expression profiles and multi‐omic features and to facilitate the understanding and study of rice lncRNAs. It includes: (i) lncRNA expression profiles in various tissues, development stages and stress treatments; (ii) lncRNA associations with genome variations; (iii) the linkage of lncRNAs with phenotypes; (iv) the overlap information of lncRNAs and transposon elements; and (v) the lncRNAs predicted as miRNA targets or miRNA precursors.

Visit RiceLncPedia

RiceLncPedia: a comprehensive database of rice long non‐coding RNAs
