Gk-arrays are provided as a simple-to-use C++ library dedicated to queries on large collections of sequences, such as those produced by high-throughput sequencers (e.g. HiSeq 2000 from Illumina, 454 from Roche).
Gk-arrays index the k-mers of the reads and answer different queries on that read collection (e.g. how many reads share this k-mer? where does this k-mer occur in the read collection?).
Gk-arrays are a space-efficient alternative to hash tables, with similar query times.
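To make the two example queries concrete, here is a minimal sketch of the hash-table baseline that Gk-arrays are compared against. It is not the Gk-arrays API and does not reflect the library's internal data structure: the class name KmerHashIndex and its methods occurrences and readCount are hypothetical names chosen for illustration, and reads are assumed to be plain strings already loaded in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical hash-table baseline (NOT the Gk-arrays API): maps each k-mer
// to the list of (read id, offset) pairs at which it occurs.
class KmerHashIndex {
public:
    using Occurrence = std::pair<std::size_t, std::size_t>; // (read id, offset)

    KmerHashIndex(const std::vector<std::string>& reads, std::size_t k) : k_(k) {
        assert(k > 0); // a k-mer must contain at least one character
        for (std::size_t r = 0; r < reads.size(); ++r) {
            const std::string& read = reads[r];
            // Reads shorter than k contribute no k-mer: the loop body never runs.
            for (std::size_t p = 0; p + k <= read.size(); ++p) {
                table_[read.substr(p, k)].emplace_back(r, p);
            }
        }
    }

    // Where does this k-mer occur in the read collection?
    std::vector<Occurrence> occurrences(const std::string& kmer) const {
        auto it = table_.find(kmer);
        if (it == table_.end()) return {};
        return it->second;
    }

    // How many distinct reads share this k-mer?
    std::size_t readCount(const std::string& kmer) const {
        std::unordered_set<std::size_t> ids;
        for (const Occurrence& occ : occurrences(kmer)) ids.insert(occ.first);
        return ids.size();
    }

    std::size_t k() const { return k_; }

private:
    std::size_t k_;
    std::unordered_map<std::string, std::vector<Occurrence>> table_;
};
```

With the reads ACGTAC, CGTACG and TTTT and k = 3, the k-mer CGT is shared by two reads, while TTT occurs twice but in a single read. This baseline stores every k-mer as a separate string key, which is the kind of memory cost a more compact index aims to avoid.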
Gk-arrays is a work by Nicolas Philippe, Mikaël Salson, Thierry Lecroq, Martine Léonard, Thérèse Commes and Éric Rivals. It has been published in the BMC Bioinformatics journal. If you use this work, please don't forget to cite this paper.
Gk-arrays source code is distributed under the GPL-compliant CeCILL-C license.
A very simple test file can be downloaded from here.
Once the library is installed, you can compile the test file using, e.g.:
g++ -Wall -pedantic -O3 testGkArrays.cpp -o testGkArrays -lGkArrays
Another test file (measuring the query time) is also included in the source code under the src directory.
The installation creates a test executable (called buildTables) and a library that can be used in any of your programs.
Note: the library usage has been simplified since version 1.0.0. If necessary, you can see the details for using previous versions.
./configure
make
make install (as an administrator)
ldconfig (as an administrator)
You can pass parameters to the configure script.
For instance, you can choose to build a static version of the library (quicker) rather than a shared version.
Typing ./configure --help lists the available options.
deb package
You just need to install the package using a dedicated program on your distribution or by typing dpkg -i package-name.
Inside the archive, under the doc directory, you will find documentation on how to use Gk-arrays in your code, with a simple example.
Full documentation of the library is available online or as a downloadable PDF.