
EPI Prediction

EPIXplorer integrates 9 robust algorithms for enhancer-promoter interaction (EPI) prediction and provides practical guidance to help users choose a suitable one. To meet different requirements, the guidance is organized in three ways: By Model Type, By Input Type, and By Biosample.

Choose a suitable algorithm

By Model Type: the methods are grouped into unsupervised and supervised approaches. Hover over a model type to make the corresponding algorithms clickable. For example, hovering over “Distance-based” shows that “PreSTIGE” can be selected; click the algorithm name to open its tool page.
By Input Type: the methods are grouped by the input data available: 1) both 3D contact data and multi-omics data, 2) multi-omics data but no 3D contact data, 3) neither 3D contact data nor multi-omics data. Hover over an input type to make the corresponding algorithms clickable. For example, hovering over “Hi-C” shows that “JEME” and “3DPredictor” can be selected; click an algorithm name to open its tool page.
By Biosample: EPIXplorer provides pre-trained models for 97 common cell lines. Select the cell line you are interested in, then click an algorithm name to open its tool page.

Run the prediction

Example 1: On the “PreSTIGE” page, choose “Load an example” to fill the input box automatically, or enter your own settings. Then click “submit” to get your prediction results.
Example 2: On the “LoopPredictor” page, choose “Load an example” to fill the input box automatically.
Wait a few seconds. When the “Download results” button appears on the page, you can download your prediction results.
You can also upload your own files to run the prediction.

Downstream

We provide two downstream analysis tools for exploring the regulatory function of loops.

Motif analysis

Open the “Downstream” page from the navigation bar at the top, then choose “Motif analysis”.
Load an example or upload your own files to run the motif analysis, then wait a few seconds for the results.
Select the enriched motifs of interest on the web page.
Then click “start visualization” to visualize the results.

GO Enrichment

Open the “Downstream” page and choose “GO Enrichment”.
Load an example (as in the motif analysis) or upload your own files, then wait a few seconds for the results.
Select enriched GO terms on the web page to visualize them.

Loop visualization

To visualize predicted loops, make sure the prediction has finished and you have the “job-id”. To visualize your own loop sets, check the Guidance of Format section first.

Visualize your recent job

Open the “Visualization” page from the navigation bar at the top. You can load an example or paste your job-id.

Visualize your own loop file

You can load an example or upload your own loop file.
You can check the distribution of loops across the whole genome, then choose a chromosome to visualize.
Predicted loops are drawn as blue curves, and annotated regulatory elements as green and red squares. Risk SNPs are plotted as light blue lines. Scroll the mouse wheel to zoom in or out.

Guidance of Format

3D contact data

The 3D contact data file should be in .bedpe format with at least 6 tab-separated columns. The minimum columns are the chromosome name, start, and end of each anchor, as shown below. If the file contains loop counts, the count should be the 7th column. ChIA-PET and HiChIP analysis results can be converted directly to BEDPE format and used as input. For Hi-C data, users need to call loops from the original interaction matrix; many computational methods are available for loop calling, such as Mustache (https://github.com/ay-lab/mustache) and HiCExplorer (https://hicexplorer.readthedocs.io/en/latest/).
chr22	38290514	38294289	chr22	38680609	38682339
chr5	96033605	96042289	chr5	96259190	96260539
chr1	23665194	23673403	chr1	24097536	24108995
chr3	176676833	176679516	chr3	176741264	176748850
chr11	63604132	63609924	chr11	63751693	63756217
chr17	37005563	37012402	chr17	38801324	38806978
chr3	138311141	138315068	chr3	138482903	138484460
chr11	126078482	126084019	chr11	126210767	126227804
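The column rules above can be checked locally before uploading. The following is a minimal sketch, not EPIXplorer's own parser: the names Loop, parse_bedpe_line, and parse_bedpe are hypothetical, the count of 12 on the second example line is made up for illustration, and both the integer count and the start < end check are assumptions.

```python
from typing import List, NamedTuple, Optional


class Loop(NamedTuple):
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    count: Optional[int]  # optional 7th column; assumed to be an integer


def parse_bedpe_line(line: str) -> Loop:
    """Parse one tab-separated BEDPE line with at least 6 columns."""
    fields = line.strip().split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    start1, end1 = int(fields[1]), int(fields[2])
    start2, end2 = int(fields[4]), int(fields[5])
    # Assumption: each anchor must have start < end.
    if start1 >= end1 or start2 >= end2:
        raise ValueError("anchor start must be smaller than anchor end")
    count = int(fields[6]) if len(fields) > 6 else None
    return Loop(fields[0], start1, end1, fields[3], start2, end2, count)


def parse_bedpe(text: str) -> List[Loop]:
    """Parse all non-empty lines of a BEDPE file's content."""
    return [parse_bedpe_line(line) for line in text.splitlines() if line.strip()]


example = (
    "chr22\t38290514\t38294289\tchr22\t38680609\t38682339\n"
    "chr5\t96033605\t96042289\tchr5\t96259190\t96260539\t12\n"  # 12 is illustrative
)
loops = parse_bedpe(example)
```

A file that fails this check (for example, one separated by spaces rather than tabs) is worth fixing before submission.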

Multi-omics data

The multi-omics feature data should be pre-processed. For ATAC-seq and ChIP-seq/CUT&RUN data, the file should be in standard narrowPeak/broadPeak format, as shown below.
chr22	16843445	16868802	.	322	.	2.120582	13.1	-1
chr22	17024793	17024896	.	985	.	11.483429	2.8	-1
chr22	17038424	17038594	.	854	.	9.633610	5.0	-1
chr22	17050044	17050593	.	465	.	4.143174	2.0	-1
chr22	17050418	17050537	.	984	.	11.468583	4.2	-1
chr22	17066392	17067403	.	892	.	10.169340	14.8	-1
chr22	17067959	17068242	.	878	.	9.966456	13.3	-1
chr22	17068652	17068827	.	835	.	9.358364	4.9	-1
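For readers who want to inspect a peak file before uploading, here is a minimal sketch of reading one line. It assumes the ENCODE column layout (9 columns for broadPeak, with a 10th summit-offset column for narrowPeak); the function name parse_peak_line is hypothetical and this is not EPIXplorer's own code.

```python
from typing import Dict, Union

PeakRecord = Dict[str, Union[str, int, float]]


def parse_peak_line(line: str) -> PeakRecord:
    """Parse one broadPeak (9-column) or narrowPeak (10-column) line."""
    fields = line.strip().split("\t")
    if len(fields) not in (9, 10):
        raise ValueError(f"expected 9 or 10 columns, got {len(fields)}")
    record: PeakRecord = {
        "chrom": fields[0],
        "start": int(fields[1]),
        "end": int(fields[2]),
        "name": fields[3],
        "score": int(fields[4]),
        "strand": fields[5],
        "signalValue": float(fields[6]),
        "pValue": float(fields[7]),
        "qValue": float(fields[8]),
    }
    if len(fields) == 10:
        # narrowPeak only: summit offset relative to start
        record["peak"] = int(fields[9])
    return record


peak = parse_peak_line("chr22\t16843445\t16868802\t.\t322\t.\t2.120582\t13.1\t-1")
```

The example lines shown above have 9 columns, so they take the broadPeak branch of this sketch.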
For RNA-seq data, we recommend a two-column file with the gene and its count or normalized expression value, separated by a tab, as shown below.
TSPAN6	0.04
TNMD	0
DPM1	86.52
SCYL3	8.06
C1orf112	33.21
FUCA2	25
GCLC	11.72
NFYA	26.82
STPG1	2.99
NIPAL3	13.76
LAS1L	63.96
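A two-column expression file of this kind can be loaded into a gene-to-value mapping with a few lines of code. This is a minimal sketch under the assumption that every non-empty line has exactly two tab-separated fields and no header row; read_expression is a hypothetical name.

```python
from typing import Dict


def read_expression(text: str) -> Dict[str, float]:
    """Map each gene name to its count or normalized expression value."""
    expression: Dict[str, float] = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.strip().split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 columns, got {len(fields)}"
            )
        gene, value = fields
        expression[gene] = float(value)
    return expression


expression = read_expression("TSPAN6\t0.04\nTNMD\t0\nDPM1\t86.52\n")
```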
                            
For methylation data, we recommend downloading RRBS data in .bedRrbs format from ENCODE, as shown below.
chr1	1000170	1000171	K562_Rep3_RRBS	46	+	1000170	1000171	155,255,0	46	35
chr1	1000190	1000191	K562_Rep3_RRBS	46	+	1000190	1000191	105,255,0	46	15
chr1	1000191	1000192	K562_Rep3_RRBS	53	-	1000191	1000192	55,255,0	53	9
chr1	1000198	1000199	K562_Rep3_RRBS	46	+	1000198	1000199	105,255,0	46	20
chr1	1000199	1000200	K562_Rep3_RRBS	53	-	1000199	1000200	105,255,0	53	15
chr1	1000206	1000207	K562_Rep3_RRBS	53	-	1000206	1000207	155,255,0	53	26
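The sketch below reads one .bedRrbs line. It assumes the ENCODE layout in which the 10th column is the read count and the 11th is the percentage of methylated reads at that CpG; the names CpG and parse_bedrrbs_line are hypothetical.

```python
from typing import NamedTuple


class CpG(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str
    read_count: int            # column 10 (assumed: number of reads)
    percent_methylated: float  # column 11 (assumed: percent methylation)


def parse_bedrrbs_line(line: str) -> CpG:
    """Parse one tab-separated .bedRrbs line with 11 columns."""
    fields = line.strip().split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected 11 columns, got {len(fields)}")
    return CpG(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        strand=fields[5],
        read_count=int(fields[9]),
        percent_methylated=float(fields[10]),
    )


site = parse_bedrrbs_line(
    "chr1\t1000170\t1000171\tK562_Rep3_RRBS\t46\t+\t1000170\t1000171\t155,255,0\t46\t35"
)
```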

Regions of interest

The regions-of-interest file is similar to .bed format, with at least three tab-separated columns: chromosome name, start, and end, as shown below. If you want to explore loops close to active chromatin regions, a peak file from H3K27ac ChIP-seq data is a good choice. If no regions of interest are provided, the whole genome is scanned.
chr1	11869	14409	+
chr1	14404	29570	-
chr1	17369	17436	-
chr1	29554	31109	+
chr1	30366	30503	+
chr1	34554	36081	-
chr1	52473	53312	+
chr1	57598	64116	+
chr1	65419	71585	+
chr1	89295	133723	-
chr1	89551	91105	-
chr1	131025	134836	+
chr1	135141	135895	-
chr1	137682	137965	-
chr1	139790	140339	-
chr1	141474	173862	-
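One plausible reading of how such a file narrows the search is that only anchors overlapping a listed region are considered. The sketch below illustrates that idea only; it is not a description of EPIXplorer's internals. The names Region, parse_regions, overlaps, and anchor_in_regions are hypothetical, and half-open BED-style intervals are assumed.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), half-open interval assumed


def parse_regions(text: str) -> List[Region]:
    """Read the first three columns of each non-empty line; extra columns are ignored."""
    regions: List[Region] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.strip().split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 columns, got {len(fields)}")
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def overlaps(a: Region, b: Region) -> bool:
    """True when both intervals are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def anchor_in_regions(anchor: Region, regions: List[Region]) -> bool:
    """True when the anchor overlaps any region of interest."""
    return any(overlaps(anchor, region) for region in regions)


regions = parse_regions("chr1\t11869\t14409\t+\nchr1\t14404\t29570\t-\n")
inside = anchor_in_regions(("chr1", 14000, 14500), regions)
outside = anchor_in_regions(("chr1", 30000, 31000), regions)
```

The linear scan is fine for small files; a large region set would call for a sorted or indexed lookup instead.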
                                        

RNA-seq expression file for downstream

The RNA-seq expression file is used to filter for truly enriched motifs, since some detected motifs are false positives. It should be a two-column file with the gene and its count or normalized expression value, separated by a tab, as shown below.
TNMD	0
DPM1	86.52
SCYL3	8.06
C1orf112	33.21
FUCA2	25
GCLC	11.72
NFYA	26.82
STPG1	2.99
NIPAL3	13.76
LAS1L	63.96
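The filtering idea can be illustrated as follows: a motif is kept only if the gene encoding its transcription factor is expressed. This is a sketch of the general idea, not EPIXplorer's actual rule. The function name, the threshold of 1.0, the assumption that motifs are identified by gene name, and the motif list ["NFYA", "TNMD", "SOX2"] are all hypothetical.

```python
from typing import Dict, List


def filter_motifs_by_expression(
    motif_genes: List[str],
    expression: Dict[str, float],
    min_expression: float = 1.0,  # hypothetical cutoff
) -> List[str]:
    """Keep motifs whose factor gene reaches the expression cutoff.

    Genes missing from the expression table are treated as not expressed.
    """
    return [
        gene
        for gene in motif_genes
        if expression.get(gene, 0.0) >= min_expression
    ]


# Values taken from the example file above; the motif list is illustrative.
expression = {"TNMD": 0.0, "DPM1": 86.52, "NFYA": 26.82, "STPG1": 2.99}
kept = filter_motifs_by_expression(["NFYA", "TNMD", "SOX2"], expression)
```

In this example only NFYA is kept: TNMD has zero expression and SOX2 is absent from the table.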
                            

E-P interaction for visualization

Loops uploaded for visualization can be one of two types: 6-7 columns or 2-3 columns. The 6-7 column file contains 6 or 7 tab-separated columns: chrom1, start_pos1, end_1, chrom2, start_pos2, end_2, and count (optional), as shown below.
chr22	38290514	38294289	chr22	38680609	38682339
chr5	96033605	96042289	chr5	96259190	96260539
chr1	23665194	23673403	chr1	24097536	24108995
chr3	176676833	176679516	chr3	176741264	176748850
chr11	63604132	63609924	chr11	63751693	63756217
chr17	37005563	37012402	chr17	38801324	38806978
chr3	138311141	138315068	chr3	138482903	138484460
chr11	126078482	126084019	chr11	126210767	126227804
The 2-3 column file contains 2 or 3 tab-separated columns: the regulatory element name or gene name of each of the two anchors, and the loop count (optional), as shown below.
EH37E0152705	ENSG00000185220.7	0
EH37E0152707	ENSG00000185220.7	0
EH37E0152707	ENSG00000185220.7	0
EH37E0152707	ENSG00000185220.7	0
EH37E0327398	ENSG00000198824.4	0
EH37E0327404	ENSG00000198824.4	0
EH37E0357921	ENSG00000187156.4	1
EH37E0357922	ENSG00000187156.4	1
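Since the two loop file types differ only in their column count, a file's type can be checked from its first line. This is a minimal sketch; the function name detect_loop_format and the labels "coordinates" and "names" are hypothetical and are not identifiers used by EPIXplorer.

```python
def detect_loop_format(line: str) -> str:
    """Classify a loop file line by its number of tab-separated columns."""
    n_columns = len(line.strip().split("\t"))
    if n_columns in (6, 7):
        return "coordinates"  # chrom/start/end for both anchors, optional count
    if n_columns in (2, 3):
        return "names"  # element or gene name for both anchors, optional count
    raise ValueError(f"unsupported column count: {n_columns}")


coordinate_type = detect_loop_format(
    "chr22\t38290514\t38294289\tchr22\t38680609\t38682339"
)
name_type = detect_loop_format("EH37E0152705\tENSG00000185220.7\t0")
```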
                        

Pre-trained model

We provide three types of pre-trained models to fit different user requirements. Please choose at least one model according to the number of features you have.
*Accuracy was estimated by 10x10-fold cross-validation.
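If "10x10-fold" is read as 10 repetitions of 10-fold cross-validation (an interpretation, since the page does not spell it out), the splitting scheme can be sketched as below. The function name repeated_kfold and the seed are hypothetical, and this is not the code used to produce the reported accuracy.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(
    n_samples: int, n_folds: int = 10, n_repeats: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each fold of each repetition."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition per repetition
        for fold in range(n_folds):
            test = indices[fold::n_folds]  # every n_folds-th shuffled index
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield train, test


splits = list(repeated_kfold(n_samples=50))
```

Under this reading, a reported accuracy would be an average over the 100 resulting train/test splits.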

Browser compatibility

We tested browser compatibility as follows:

Copyright @ 2016-CSU-Bioinformatics Group l All Right Reserved l

鲁ICP备2021005642号-1