Frequently asked questions


The TRE-BED format consists of one line per divergent transcriptional regulatory element (TRE), each containing 6 required columns of data:
  1. chrom - name of the chromosome or scaffold. Any valid seq_region_name can be used, and chromosome names can be given with or without the 'chr' prefix.
  2. chromStart - Start position of the feature in standard chromosomal coordinates (i.e. first base is 0).
  3. chromEnd - End position of the feature in standard chromosomal coordinates
  4. Name - name of the element or a placeholder like .
  5. fwdTSS - Position of the Transcription Start Site (TSS) on the forward strand
  6. revTSS - Position of the Transcription Start Site (TSS) on the reverse strand
Please note: By default, our web portal assumes the input TREs are divergent, i.e. they have transcription activity on both the forward and reverse strands. In some cases, for example when sequencing depth is too low, your signal tracks may suggest that a TRE is unidirectional (only one peak, on either the forward or the reverse strand). In that case, put -1 in the fifth (fwdTSS) or sixth (revTSS) column, whichever strand has no observed TSS. For example, suppose you have a TRE at chr1:123456-123789 that has only been observed to have transcription activity on the forward strand (TSS at 123555). The corresponding dTRE record should be:
chr1	123456	123789	.	123555	-1
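As a rough illustration of the six-column layout described above, the following Python sketch writes and reads one TRE-BED line. The function names `format_tre_record` and `parse_tre_record` are hypothetical helpers introduced here for illustration; they are not part of the portal.

```python
def format_tre_record(chrom, start, end, fwd_tss=None, rev_tss=None, name="."):
    """Return one tab-separated TRE-BED line; a missing TSS is written as -1."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= chromStart < chromEnd")
    fields = [
        chrom, start, end, name,
        -1 if fwd_tss is None else fwd_tss,   # column 5: fwdTSS
        -1 if rev_tss is None else rev_tss,   # column 6: revTSS
    ]
    return "\t".join(str(f) for f in fields)


def parse_tre_record(line):
    """Split a TRE-BED line into a dict, mapping a TSS of -1 to None."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 6:
        raise ValueError(f"expected 6 columns, got {len(cols)}")
    chrom, start, end, name, fwd, rev = cols[:6]

    def to_tss(value):
        position = int(value)
        return None if position == -1 else position

    return {
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "name": name,
        "fwd_tss": to_tss(fwd),
        "rev_tss": to_tss(rev),
    }


# The unidirectional example from the text: forward TSS only.
line = format_tre_record("chr1", 123456, 123789, fwd_tss=123555)
record = parse_tre_record(line)
```

Here `line` reproduces the example record above, and `record["rev_tss"]` is `None` because the sixth column holds -1.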
To classify TREs into distal and proximal, you need two things:
  • A reference BED file of promoter regions (proximal regions). For example, we provide promoter annotations for protein-coding genes based on GENCODE v24, which can be downloaded from here.
  • bedtools
To obtain proximal TREs from the mixed dTRE file, you can use the following command:
bedtools intersect -a test.dtrebed -b promoters_1kb_tss_centered.bed -u > test.proximal.dtrebed
To obtain distal TREs (which also include genic TREs) from the mixed dTRE file, you can use the following command:
bedtools intersect -a test.dtrebed -b promoters_1kb_tss_centered.bed -v > test.distal.dtrebed
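To make the logic of the two commands concrete, here is a small pure-Python sketch of the same split: a TRE is proximal if it overlaps any promoter interval by at least one base (the `-u` case) and distal otherwise (the `-v` case). It assumes half-open BED coordinates and in-memory tuples; `split_proximal_distal` is a hypothetical name, the example intervals are made up, and this is an illustration of the idea rather than a replacement for bedtools.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals overlap when each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def split_proximal_distal(tres, promoters):
    """Partition TREs by whether they overlap any promoter on the same chromosome.

    Both arguments are lists of tuples whose first three items are
    (chrom, start, end); any further items are carried along untouched.
    """
    promoters_by_chrom = {}
    for chrom, start, end, *_ in promoters:
        promoters_by_chrom.setdefault(chrom, []).append((start, end))

    proximal, distal = [], []
    for tre in tres:
        chrom, start, end = tre[0], tre[1], tre[2]
        hit = any(
            overlaps(start, end, p_start, p_end)
            for p_start, p_end in promoters_by_chrom.get(chrom, [])
        )
        (proximal if hit else distal).append(tre)
    return proximal, distal


# Made-up intervals for illustration only.
promoters = [("chr1", 123000, 124000)]
tres = [
    ("chr1", 123456, 123789, ".", 123555, -1),     # inside the promoter window
    ("chr1", 124000, 124200, ".", 124100, 124050), # starts where the promoter ends
    ("chr2", 123456, 123789, ".", 123555, 123500), # different chromosome
]
proximal, distal = split_proximal_distal(tres, promoters)
```

The linear scan over promoters is fine for small inputs; for genome-scale files, bedtools remains the practical choice.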
  • Genome assembly - GRCh38
  • Gene annotation - GENCODE v24
  • Mutation - dbSNP 153 (comprehensive)
  • Transcription factor binding - JASPAR 2022
  • Epigenomic annotation - Candidate cis-Regulatory Element v3

The default annotation files downloaded from our catalog database are organized in a BED-like format. Below are the columns that you will find in these annotation files:

  1. Chromosome name
  2. Start position
  3. End position
  4. Position of the major TSS on the forward strand. Multiple major TSSs are concatenated by @ (TSS1@TSS2).
  5. Position of the major TSS on the reverse strand. Multiple major TSSs are concatenated by @ (TSS1@TSS2).
  6. Functional characterization status. If this element has been validated by functional characterization experiments (CRISPR, MPRA, STARR-seq), we write each piece of evidence in this column in the format AssayName(PubMedID), and multiple pieces of evidence are concatenated by @ (for example, MPRA(27259154)@CRISPR(31784727)); otherwise, we put NY (Not Yet) here.
  7. Epigenomic signal. If an element is enriched for DNase-seq, H3K27ac ChIP-seq, H3K4me3 ChIP-seq, or CTCF ChIP-seq signal, we put DNase, H3K27ac, H3K4me3, or CTCF in this column, respectively. If multiple epigenomic signals are enriched, we concatenate them by @, and if none of these signals is enriched, we put Other in this column.
  8. Core promoter elements. For divergent TREs, we annotate whether they have initiator sequence, TATA-box, or downstream promoter region (DPR) in each direction (+ for forward/plus strand, - for reverse/minus strand). If no core promoter element is found in this TRE, we put Other in this column.
  9. Coordinates of core promoter elements. Formatted as pl_DPR_start-pl_DPR_end;mn_DPR_start-mn_DPR_end;pl_TATA_start-pl_TATA_end;mn_TATA_start-mn_TATA_end;pl_Inr_start-pl_Inr_end;mn_Inr_start-mn_Inr_end. For any core promoter element that our portal cannot find, the start and end coordinates will be -1.
  10. Transcription factors that have JASPAR-predicted binding sites in this element. Different transcription factors are separated by ;.
  11. Summary of variants.
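The compound columns above (4 to 7, 9, and 10) can be unpacked with a few small helpers. The Python sketch below is one possible reading of the format, not the portal's own parser: all function names are hypothetical, TSS positions are assumed to be plain integers, and a missing core promoter element is assumed to appear as the literal pair -1--1.

```python
import re

EVIDENCE = re.compile(r"^(?P<assay>[^()]+)\((?P<pmid>\d+)\)$")
COORD_PAIR = re.compile(r"(-?\d+)-(-?\d+)")
CORE_KEYS = ("pl_DPR", "mn_DPR", "pl_TATA", "mn_TATA", "pl_Inr", "mn_Inr")


def parse_tss(field):
    """Columns 4 and 5: one or more TSS positions joined by '@'."""
    return [int(position) for position in field.split("@")]


def parse_evidence(field):
    """Column 6: 'NY', or AssayName(PubMedID) entries joined by '@'."""
    if field == "NY":
        return []
    evidence = []
    for item in field.split("@"):
        match = EVIDENCE.match(item)
        if match is None:
            raise ValueError(f"unexpected evidence entry: {item!r}")
        evidence.append((match["assay"], match["pmid"]))
    return evidence


def parse_signals(field):
    """Column 7: enriched signals joined by '@', or 'Other' for none."""
    return [] if field == "Other" else field.split("@")


def parse_core_promoter_coords(field):
    """Column 9: six 'start-end' pairs joined by ';' (assumes -1--1 if absent)."""
    pairs = field.split(";")
    if len(pairs) != len(CORE_KEYS):
        raise ValueError(f"expected {len(CORE_KEYS)} pairs, got {len(pairs)}")
    coords = {}
    for key, pair in zip(CORE_KEYS, pairs):
        match = COORD_PAIR.fullmatch(pair)
        if match is None:
            raise ValueError(f"unexpected coordinate pair: {pair!r}")
        start, end = int(match[1]), int(match[2])
        coords[key] = None if (start, end) == (-1, -1) else (start, end)
    return coords


def parse_tfs(field):
    """Column 10: transcription factor names separated by ';'."""
    return [name for name in field.split(";") if name]


# Made-up field values for illustration only.
tss = parse_tss("1000@1010")
evidence = parse_evidence("MPRA(27259154)@CRISPR(31784727)")
signals = parse_signals("DNase@H3K27ac")
coords = parse_core_promoter_coords("1030-1040;-1--1;970-978;-1--1;998-1004;-1--1")
```

PubMed IDs are kept as strings because they are identifiers rather than quantities.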
If you find our portal helpful, please cite https://www.nature.com/articles/s41587-022-01211-7 in your work.

Have other questions?