Getting started
We assume you have already downloaded CRAC (if not, it is not too late!). This section explains how to install and launch it, without going into details.
Installing CRAC
From the deb package
Install the package with your distribution's package manager, or by typing dpkg -i package-name (as root), where package-name must be replaced by the name of your package file. This installs crac, as well as crac-index (which creates the indexes used by CRAC), on your system.
From the source code
- Unpack the archive
- Enter the directory crac-version-number
- Type ./configure
- If everything went fine, run make
- You may want to check that everything is OK by running make check
- Finally, install the software on your system by running make install (this usually requires root privileges)
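The build steps above can also be scripted. The following Python sketch is only an illustration: build_crac and its parameters are hypothetical names, not part of CRAC. It runs the same commands in order from inside the unpacked directory and stops at the first failing step, so that make is never attempted after a failed ./configure.

```python
import subprocess
from pathlib import Path

# The build steps listed above, in order. "make install" usually
# needs root privileges when installing to the default prefix.
BUILD_STEPS = [
    ["./configure"],
    ["make"],
    ["make", "check"],
    ["make", "install"],
]


def build_crac(source_dir, run=subprocess.run, include_install=True):
    """Run the build steps inside source_dir, stopping at the first failure.

    `run` defaults to subprocess.run; it can be replaced (e.g. by a
    function that only records the commands) to get a dry run.
    Returns the list of commands that were executed.
    """
    steps = BUILD_STEPS if include_install else BUILD_STEPS[:-1]
    executed = []
    for cmd in steps:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failing ./configure prevents "make" from running at all.
        run(cmd, cwd=Path(source_dir), check=True)
        executed.append(" ".join(cmd))
    return executed
```

For instance, build_crac("crac-version-number", include_install=False) builds and checks CRAC without installing it; passing a function that only records its arguments as run gives a dry run.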
If the configure step fails, a library may be missing. In particular, zlib is required: on a Debian or Debian-like system, install the zlib1g and zlib1g-dev packages.
Using CRAC
Like Bowtie or BWA, CRAC relies on a pre-computed index of the genome. The first step is therefore to build this index, if that has not been done yet.
Indexing
If you still need to index your genome, proceed as follows.
The crac-index program creates the index of a genome. To create an index, run a command such as:
crac-index index myIndex sequence1.fa sequence2.fa sequence3.fa
The first parameter (index) specifies that we want to create an index.
The second parameter (myIndex) is the name of the index to be created.
The following parameters are FASTA or multi-FASTA files containing the sequences to be indexed.
Creating an index always generates two files: a .ssa file and a .conf file. Note that these extensions must be given neither to crac-index nor to crac.
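If you drive crac-index from a script, the following Python sketch shows one way to assemble the indexing command while respecting the no-extension rule. The helper names index_name and crac_index_command are hypothetical, not part of CRAC; the command layout (index, then the index name, then the FASTA files) is the one shown above.

```python
import shlex

# Extensions of the two files produced by "crac-index index".
INDEX_EXTENSIONS = (".ssa", ".conf")


def index_name(path):
    """Strip a trailing .ssa or .conf extension, since neither
    crac-index nor crac expects the extension."""
    for ext in INDEX_EXTENSIONS:
        if path.endswith(ext):
            return path[: -len(ext)]
    return path


def crac_index_command(index, fasta_files):
    """Build the argument list for: crac-index index <index> <fasta>..."""
    if not fasta_files:
        raise ValueError("at least one FASTA file is required")
    return ["crac-index", "index", index_name(index), *fasta_files]


cmd = crac_index_command("myIndex.ssa", ["sequence1.fa", "sequence2.fa"])
print(shlex.join(cmd))  # prints: crac-index index myIndex sequence1.fa sequence2.fa
```

The printed command can be pasted into a shell, or cmd can be passed directly to subprocess.run(cmd, check=True).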
If needed, the original sequences can be recovered from the CRAC index, so you may delete the original FASTA files to save space. To recover them, run:
crac-index get sequences.fa myIndex
This writes all the sequences indexed in myIndex.ssa to the file sequences.fa.
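Before deleting the original FASTA files, you may want to check that the recovered file really contains the same sequences. The Python sketch below is a hypothetical helper, not part of CRAC, and it rests on two assumptions: that crac-index get keeps each sequence name (compared here on the first word of the header line), and that letter case does not matter (sequences are upper-cased before comparison). If the recovered headers differ, the comparison would have to be adapted.

```python
def parse_fasta(lines):
    """Parse (multi-)FASTA lines into a dict {name: sequence}.

    The name is the first word of the header line; sequences are
    upper-cased so that soft-masked (lowercase) bases compare equal.
    """
    records = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def read_fasta(path):
    """Read a FASTA file from disk with parse_fasta."""
    with open(path) as handle:
        return parse_fasta(handle)


def missing_or_changed(original, recovered):
    """Names of original sequences that are absent or different in recovered."""
    return [name for name, seq in original.items() if recovered.get(name) != seq]
```

With the file names of the example, missing_or_changed(read_fasta("sequence1.fa"), read_fasta("sequences.fa")) should return an empty list before sequence1.fa is deleted.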
Launching CRAC
Once an index has been built, CRAC can be run against it. CRAC must always be launched with at least three parameters:
-i: the name of the index (e.g. myIndex; recall that the extension must not be provided!)
-r: the name of the FASTA or FASTQ file containing the reads (the input file may also be gzip-compressed)
-k: the length of the k-mers to be used; for better accuracy, we recommend setting k to 22 for the human genome.
The value of k is critical for the algorithm. It must not be underestimated, otherwise the results will be useless. It must be chosen so that, as far as possible, a k-mer has a very high probability of occurring only once in the genome.
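To get a feel for why k matters, the Python sketch below (not part of CRAC, and not the criterion CRAC itself uses) does two things: it measures, on a given sequence, the fraction of k-mer positions whose k-mer occurs only once, and it computes the expected number of chance occurrences of a k-mer in a uniformly random genome of length G, about G/4^k. Only the forward strand is considered, for simplicity. Real genomes are repetitive, so the random model is optimistic: in practice a larger k may be needed.

```python
from collections import Counter


def unique_kmer_fraction(sequence, k):
    """Fraction of k-mer positions in `sequence` whose k-mer occurs exactly once.

    Only the forward strand is considered.
    """
    sequence = sequence.upper()
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / len(kmers)


def expected_random_hits(genome_length, k):
    """Expected occurrences of a given k-mer in a uniformly random genome
    of this length (forward strand only): about G / 4**k."""
    return (genome_length - k + 1) / 4 ** k
```

For a genome of 3.1 billion bases (roughly the size of the human genome), expected_random_hits gives about 185 for k = 12 but about 0.0002 for k = 22, which illustrates why a too small k makes k-mers ambiguous.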
You will probably also want an output file recording what was mapped, what was not, and where. CRAC produces a SAM/BAM file whose name is given with the -o parameter; the example below uses --bam together with -o output.bam.
For example, CRAC can be launched as follows:
crac -i myIndex -k 22 -r reads_1.fastq reads_2.fastq --bam -o output.bam --nb-threads 10
In this example, CRAC is run on the genome indexed in myIndex, with 22-mers, on the paired-end reads stored in reads_1.fastq and reads_2.fastq.
The output is written to output.bam and the program runs in parallel on 10 threads.
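When CRAC is launched from a pipeline, the command line can be assembled programmatically. The Python sketch below uses only the options shown in this section (-i, -k, -r, --bam, -o, --nb-threads) and reproduces the example above; crac_command is a hypothetical helper, not part of CRAC, and CRAC may accept other options that it does not cover.

```python
import shlex


def crac_command(index, reads, k, output=None, bam=False, threads=None):
    """Build the crac argument list from the options described above.

    `reads` holds one read file, or two for paired-end reads.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if not reads:
        raise ValueError("at least one read file is required")
    if index.endswith((".ssa", ".conf")):
        raise ValueError("give the index name without its extension")
    cmd = ["crac", "-i", index, "-k", str(k), "-r", *reads]
    if bam:
        cmd.append("--bam")
    if output is not None:
        cmd += ["-o", output]
    if threads is not None:
        cmd += ["--nb-threads", str(threads)]
    return cmd


cmd = crac_command("myIndex", ["reads_1.fastq", "reads_2.fastq"], 22,
                   output="output.bam", bam=True, threads=10)
# prints: crac -i myIndex -k 22 -r reads_1.fastq reads_2.fastq --bam -o output.bam --nb-threads 10
print(shlex.join(cmd))
```

The resulting list can be run with subprocess.run(cmd, check=True). Rejecting an index name ending in .ssa or .conf catches the most common mistake early, before CRAC starts.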