CBcalc — Compositional Bias calculation

Input form

Fasta sequence:

Enter a sequence or choose fasta file(s), max 50 MiB.

List of sites:

Enter sites (whitespace-delimited) or choose text file(s).

Method(s) of compositional bias calculation:


Input sequence

The input sequence should be in FASTA format. It may contain multiple entries, which are treated as separate parts of a single sequence. Any non-ACGT symbol is treated as a sequence break, except for spaces, newlines, and gap symbols ("-"), which are ignored.
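The parsing rules above can be sketched in Python as follows. This is only an illustration, not CBcalc's own code: the function name split_fasta_segments is hypothetical, and two behaviours are assumptions not stated above, namely that lowercase letters are upper-cased before checking and that each FASTA header line (starting with ">") ends the current part.

```python
def split_fasta_segments(text):
    """Return the contiguous ACGT runs found in FASTA-formatted text."""
    segments = []
    current = []

    def flush():
        # Close the run collected so far, if any.
        if current:
            segments.append("".join(current))
            current.clear()

    for line in text.splitlines():
        if line.startswith(">"):
            # Assumption: a new FASTA entry starts a new part of the sequence.
            flush()
            continue
        for ch in line.upper():  # Assumption: input is case-insensitive.
            if ch in "ACGT":
                current.append(ch)
            elif ch == "-" or ch.isspace():
                continue  # Spaces, newlines and gaps are ignored.
            else:
                flush()  # Any other symbol is a sequence break.
    flush()
    return segments
```

For example, an entry containing "ACG T-A" on one line and "NNAC" on the next yields the parts "ACGTA" and "AC": the space, gap, and newline are skipped, while the N symbols break the sequence.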

The input sequence can also be loaded from FASTA file(s). The maximum total size is 50 MiB (50×1024² bytes).

The contents of the chosen files and the text field are joined.

List of sites

CBcalc handles both continuous and bipartite sites whose length does not exceed 10 bases. A bipartite site consists of two continuous parts separated by a fixed-length spacer of N symbols. The length of a bipartite site is the sum of the lengths of its two parts. The spacer (gap) length should not exceed 16. Sites may contain any DNA nucleotide symbols (A C G T W S M K R Y B D H V N).
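A minimal sketch of how a site could be checked against these limits is shown below. It is not CBcalc's implementation: the function name split_site and the constants are hypothetical, and the rule used to locate the spacer (the longest internal run of two or more N symbols) is an assumption made for illustration only.

```python
import re

IUPAC_DNA = set("ACGTWSMKRYBDHVN")
MAX_SITE_LENGTH = 10  # total length of the continuous part(s)
MAX_GAP_LENGTH = 16   # length of the N spacer


def split_site(site):
    """Return (left, gap_length, right) for a site, or raise ValueError."""
    site = site.strip().upper()
    if not site or set(site) - IUPAC_DNA:
        raise ValueError(f"not a DNA site: {site!r}")

    # Assumption: the spacer is the longest run of 2+ Ns that has
    # non-spacer symbols on both sides; without one the site is continuous.
    runs = [m for m in re.finditer(r"N{2,}", site)
            if m.start() > 0 and m.end() < len(site)]
    if runs:
        spacer = max(runs, key=lambda m: m.end() - m.start())
        left, right = site[:spacer.start()], site[spacer.end():]
        gap = spacer.end() - spacer.start()
    else:
        left, gap, right = site, 0, ""

    if len(left) + len(right) > MAX_SITE_LENGTH:
        raise ValueError(f"site longer than {MAX_SITE_LENGTH} bases: {site!r}")
    if gap > MAX_GAP_LENGTH:
        raise ValueError(f"spacer longer than {MAX_GAP_LENGTH}: {site!r}")
    return left, gap, right
```

With this sketch, GATC is reported as a continuous site of length 4, and CACNNNNGTAY as a bipartite site with parts CAC and GTAY (total length 7) and a spacer of 4.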

The list of sites can also be loaded from plain ASCII text file(s) (not RTF, DOC, DOCX, ODT, etc.). Each file should contain one site per line. Empty lines are ignored.

The contents of the chosen files and the text field are joined.

Methods of calculation of compositional biases

Four methods are implemented: M0, MM, PBM, and BCK. M0 is based on a Bernoulli model of the genome sequence. MM uses Markov chains of the maximum applicable order (L−2, where L is the site length) as the sequence model. PBM was designed by Pevzner et al. to improve on MM. BCK was suggested by Burge et al.; it takes into account the observed frequencies of all subsites of a site.
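To make the two simplest models concrete, the sketch below computes the expected number of occurrences of an unambiguous (ACGT-only) continuous site under M0 and MM. It is an illustration under stated assumptions, not CBcalc's code: the function names are hypothetical, the M0 estimate uses base frequencies measured on the input itself, and the MM estimate uses the standard maximum-order Markov estimator N(w1..wL−1)·N(w2..wL)/N(w2..wL−1). PBM and BCK are not shown. Comparing the observed count with the expected one (for example as a ratio) is one common way to express a compositional bias; the exact measure reported by CBcalc is not specified here.

```python
from collections import Counter
from math import prod


def count_word(segments, word):
    """Count overlapping occurrences of word within each sequence part."""
    k = len(word)
    return sum(seg[i:i + k] == word
               for seg in segments
               for i in range(len(seg) - k + 1))


def expected_m0(segments, word):
    """Expected count under a Bernoulli model (independent bases)."""
    base_counts = Counter("".join(segments))
    total = sum(base_counts.values())
    # Number of positions where a word of this length can start.
    positions = sum(max(0, len(seg) - len(word) + 1) for seg in segments)
    return positions * prod(base_counts[b] / total for b in word)


def expected_mm(segments, word):
    """Expected count under a Markov model of order L-2 (needs L >= 3)."""
    if len(word) < 3:
        raise ValueError("the maximum-order estimator needs a site of length >= 3")
    inner = count_word(segments, word[1:-1])
    if inner == 0:
        return 0.0
    return (count_word(segments, word[:-1])
            * count_word(segments, word[1:]) / inner)
```

For the single part ACGTACGT and the site ACG, the observed count is 2, the M0 expectation is 6 × (1/4)³ = 0.09375, and the MM expectation is 2 × 2 / 2 = 2.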

Stand-alone version and source code

CBcalc has a stand-alone version. The source code is available at GitHub.