DNA methylation plays a crucial role in tumorigenesis and tumor progression, sparking substantial interest in the clinical applications of cancer DNA methylation biomarkers. Cancer-related whole-genome bisulfite sequencing (WGBS) data offers a promising approach to precisely identify these biomarkers with differentially methylated regions (DMRs). However, currently there is no dedicated resource for cancer DNA methylation biomarkers with WGBS data. Here, we developed a comprehensive cancer DNA methylation biomarker database - MethMarkerDB (https://methmarkerdb.hzau.edu.cn/), which integrated 658 WGBS datasets, incorporating 724 curated DNA methylation biomarker genes from 1425 PubMed published articles. Based on WGBS data, we documented 5.4 million DMRs from 13 common types of cancer as candidate DNA methylation biomarkers. We provided search and annotation functions for these DMRs with different resources, such as enhancers and SNPs, and developed diagnostic and prognostic models for further biomarker evaluation. With the database, we not only identified known DNA methylation biomarkers, but also identified 781 hypermethylated and 5245 hypomethylated pan-cancer DMRs, corresponding to 693 and 2172 genes, respectively. These novel potential pan-cancer DNA methylation biomarkers hold significant clinical translational value. We hope that MethMarkerDB will help identify novel cancer DNA methylation biomarkers and propel the clinical application of these biomarkers.
Here, we combine published Arabidopsis epigenomic datasets (e.g. ChIP-seq, ATAC-seq, MNase-seq, BS-seq), 3D genome datasets (e.g. Hi-C, Capture Hi-C, HiChIP, ChIA-PET), and transcriptome datasets (e.g. RNA-seq and ncRNA-seq) to construct a comprehensive Arabidopsis thaliana Encyclopedia of DNA Elements database (AraENCODE). The database contains 4,511 datasets. Arabidopsis TAIR10 is used uniformly as the reference genome, which makes the data convenient to display and compare. AraENCODE provides seven search functions (Histone Modification, Transcriptome, Open Chromatin Region, DNA Methylation, 3D Genome, Chromatin State, and Wildtype vs Mutant) that help users quickly find the epigenetic information they are looking for. It also presents the epigenetic landscape of Arabidopsis from five aspects: histone modification, transcriptional expression, open chromatin state, DNA methylation level, and differences between wild type and mutants. In addition, AraENCODE is equipped with the WashU Epigenome Browser, which displays the standardized Arabidopsis datasets more intuitively.
Gene transcription regulation is a complex but well-organized process in eukaryotes. Chromatin loops are an important element of chromatin structure and function. They help form interactions between regulatory elements, such as promoters, enhancers, silencers, and insulators, thus regulating spatiotemporal gene expression. The development of high-throughput techniques based on chromatin conformation capture (3C), such as Hi-C, ChIA-PET, HiChIP and PLAC-Seq, has substantially furthered our understanding of genome spatial organization and how it influences gene regulation. ChIA-PET, HiChIP and PLAC-Seq aim to capture genome-wide chromatin interactions mediated by specific proteins such as RNAPII, transcription factors (TFs), and histone proteins, resulting in highly specific and high-resolution chromatin loops.
The current version of ChromLoops integrates 1030 ChIA-PET, HiChIP and PLAC-Seq datasets (366 samples) from 13 species and documents a total of 1,491,416,813 high-quality chromatin loops. In addition, we annotated chromatin loop anchors (genes or regions) with abundant functional information, including enhancers, silencers, SNPs, QTLs, transcription factor binding sites, alternative splicing information, circRNAs, TWAS, chromatin accessibility information and gene expression information.
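As a rough illustration of what annotating loop anchors involves, the sketch below assigns features (for example enhancers or SNPs) to each anchor by coordinate overlap. The data layout, the function name `annotate_anchors`, and the toy coordinates are all hypothetical and are not taken from ChromLoops. This is a minimal sketch of the general idea, not the database's actual pipeline.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def annotate_anchors(loops, features):
    """Attach to each loop the names of features overlapping either anchor.

    loops:    list of (chrom, start1, end1, start2, end2), intrachromosomal loops only
    features: list of (chrom, start, end, name)
    """
    # Group features by chromosome so each loop is only compared to same-chromosome features.
    by_chrom = defaultdict(list)
    for chrom, start, end, name in features:
        by_chrom[chrom].append((start, end, name))

    annotated = []
    for chrom, s1, e1, s2, e2 in loops:
        anchor1 = [n for s, e, n in by_chrom[chrom] if overlaps(s1, e1, s, e)]
        anchor2 = [n for s, e, n in by_chrom[chrom] if overlaps(s2, e2, s, e)]
        annotated.append(
            {"loop": (chrom, s1, e1, s2, e2), "anchor1": anchor1, "anchor2": anchor2}
        )
    return annotated


# Hypothetical toy data: one loop, three features (one on another chromosome).
loops = [("chr1", 1000, 2000, 50000, 51000)]
features = [
    ("chr1", 1500, 1600, "enhancer_A"),
    ("chr1", 50500, 50501, "snp_B"),
    ("chr2", 1500, 1600, "enhancer_C"),
]
result = annotate_anchors(loops, features)
```

The linear scan per anchor is fine for a toy example. At the scale of billions of loops, an interval tree or a sorted-sweep approach would likely be needed instead.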
DNA methylation plays a crucial role in most organisms. The two parental alleles might exhibit different methylation patterns, which can lead to different phenotypes and even different responses to therapies and drugs. Moreover, related studies have revealed that allele-specific DNA methylation (ASM) can be used as an effective tumor marker and plays important roles in the development of seeds and seedlings. However, no comprehensive database of high-throughput BS-Seq data and ASM results from multiple species was available prior to this study.
Here, we constructed the Allele-Specific DNA Methylation Database (ASMdb), aiming to provide a comprehensive resource and a web tool for displaying DNA methylation levels and differential DNA methylation in diverse organisms. It covers 47 species, such as Homo sapiens, Mus musculus, Arabidopsis thaliana, and Oryza sativa, with 5998 high-quality GEO samples (4400 BS-Seq and 1598 RNA-Seq) filtered by bisulfite conversion.
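To make the notion of allele-specific methylation concrete, the sketch below tests whether methylated and unmethylated read counts differ between two alleles at one site, using a two-sided Fisher's exact test. This is one common way to assess ASM and is shown only as an assumption-laden illustration: the source does not state which test ASMdb uses, and the function name and read counts here are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Here rows are alleles and columns are (methylated, unmethylated) read counts.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    # (a small relative tolerance guards against floating-point ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical read counts at one CpG site:
# allele A: 18 methylated, 2 unmethylated; allele B: 3 methylated, 17 unmethylated.
p_asm = fisher_exact_two_sided(18, 2, 3, 17)

# A balanced site, where both alleles show the same methylation level.
p_balanced = fisher_exact_two_sided(10, 10, 10, 10)
```

A small p-value for the first site suggests allele-dependent methylation, while the balanced site gives no such evidence. In a genome-wide analysis the p-values would also need correction for multiple testing.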
The application of Hi-C, ChIA-PET, and other chromosome conformation capture technologies in plants in recent years has made research on the spatial structure of plant genomes increasingly important. With the continuous growth of plant chromosome interaction data, how to query epigenetic modification and interaction information efficiently has become a major problem. Here, we combine published three-dimensional interaction data (ChIA-PET, Hi-C) and epigenomic datasets (ChIP-Seq, ATAC-Seq, MNase-Seq, FAIRE-Seq, WGBS, RNA-Seq) to construct a comprehensive rice Encyclopedia of DNA Elements database (riceENCODE). The database contains 694 datasets and is, to the best of our knowledge, the largest resource for rice epigenomics. It can help outline the rice epigenome comprehensively and provides an important platform for studying rice molecular breeding, genetic mechanisms, tissue specificities and subgroup properties in epigenetic regulation.
Long non‐coding RNAs (lncRNAs) are defined as RNA molecules at least 200 nucleotides (nt) long that usually have low protein‐coding potential. In plants, emerging evidence indicates that lncRNAs function as key modulators in development and stress response at the epigenetic, transcriptional and post‐transcriptional levels. Here, we developed a database, RiceLncPedia (http://3dgenome.hzau.edu.cn/RiceLncPedia), to systematically characterize rice lncRNAs with expression profiles and multi‐omic features and thereby facilitate the understanding and study of rice lncRNAs. The database includes: (i) lncRNA expression profiles in various tissues, development stages and stress treatments; (ii) lncRNA associations with genome variations; (iii) the linkage of lncRNAs with phenotypes; (iv) the overlap between lncRNAs and transposable elements; and (v) the lncRNAs predicted as miRNA targets or miRNA precursors.