Efficient read indexing
gkarrays::gkArrays Class Reference

Public Member Functions

 gkArrays (char *tags_file, uint threshold, bool use_bitvector=false, uint tag_length=0, bool stranded=false, uint nb_threads=1)
 gkArrays (char *tags_file1, char *tags_file2, uint threshold, bool use_bitvector=false, uint tag_length=0, bool stranded=false, uint nb_threads=1)
uintSA convertPposToQpos (uintSA i)
uintSA getEndPosOfTagNum (uint tag_num)
uintSA getGkCFA (uintSA i)
uintSA getGkCFALength ()
uintSA getGkISA (uintSA i)
uintSA getGkSA (uintSA i)
uintSA getGkSALength ()
uintSA getNbPposition (uintSA nb_reads)
uint getNbTags ()
uint getNbTagsWithFactor (uint tag_num, uint pos_factor, bool multiplicity=0)
uint getNbThreads ()
uint getPair (uint i)
uintSA getPosInCommon (uint tag_num, uint pos_factor)
readsReader * getReads ()
uintSA getStartPosOfTagNum (uint tag_num)
uintSA getStartQPosOfTagNum (uint tag_num)
uint * getSupport (uint i)
uint getSupportLength (uint i=0)
char * getTag (uint i)
uint getTagLength (uint i=0)
char * getTagFactor (uint i, uint p, uint l)
uint getTagNum (uintSA pos)
std::pair< uint, uint > getTagNumAndPosFromAbsolutePos (uintSA pos)
uint * getTagNumWithFactor (uint tag_num, uint pos_factor)
std::pair< uint, uint > * getTagsWithFactor (uint tag_num, uint pos_factor)
std::pair< uint, uint > * getTagsWithFactor (char *factor, uint factor_length, uint &nb_fact)
char * getTextFactor (uintSA pos, uint length)
uint getThreshold ()
array_type getType ()
bool isLarge ()
bool isPposition (uintSA pos)
bool isStranded ()
bool isTheFirstMemberOfPair (uint i)

Static Public Member Functions

static bool isDiscarded (uint actual_length, uint theoretical_length=0, uint k=0)

Constructor & Destructor Documentation

gkarrays::gkArrays::gkArrays ( char *  tags_file,
uint  threshold,
bool  use_bitvector = false,
uint  tag_length = 0,
bool  stranded = false,
uint  nb_threads = 1 )

Construct the read index

tags_file: name of the file containing the reads
threshold: length k of the k-mers to use
use_bitvector: true iff the array must be stored using a bit vector (slower but more space-efficient)
tag_length: length of the reads. If a shorter read is found, an error is raised. If a longer read is found, only its prefix of tag_length characters is kept. If tag_length == 0 (default), the read length is guessed.
stranded: true iff the sequenced strand is known and, therefore, a k-mer must not be considered equal to its reverse complement (for instance, AACG must not be considered equal to CGTT).
nb_threads: number of threads used to build GkSA on a multi-threaded architecture
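To illustrate what stranded controls, here is a minimal sketch of strand-independent k-mer comparison, assuming reads over {A, C, G, T}. It uses a common convention (compare the lexicographically smaller of a k-mer and its reverse complement); reverse_complement and comparison_key are hypothetical helpers, and gkArrays may implement the equivalence differently.

```cpp
#include <stdexcept>
#include <string>

// Reverse complement of a DNA string over {A, C, G, T}.
std::string reverse_complement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());  // reverse the sequence
    for (char& c : rc) {                   // then complement each base
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: throw std::invalid_argument("unexpected base");
        }
    }
    return rc;
}

// Hypothetical comparison key for a k-mer: the k-mer itself in stranded mode,
// otherwise the smaller of the k-mer and its reverse complement, so that a
// k-mer and its reverse complement get the same key.
std::string comparison_key(const std::string& kmer, bool stranded) {
    if (stranded) return kmer;
    std::string rc = reverse_complement(kmer);
    return kmer < rc ? kmer : rc;
}
```

With stranded == false, AACG and CGTT map to the same key; with stranded == true they stay distinct.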
gkarrays::gkArrays::gkArrays ( char *  tags_file1,
char *  tags_file2,
uint  threshold,
bool  use_bitvector = false,
uint  tag_length = 0,
bool  stranded = false,
uint  nb_threads = 1 )

Alternative constructor that builds the read index from paired-end reads

tags_file1: name of the file containing the first reads of the pairs
tags_file2: name of the file containing the second reads of the pairs
threshold: length k of the k-mers to use
use_bitvector: true iff the array must be stored using a bit vector (slower but more space-efficient)
tag_length: length of the reads. If a shorter read is found, an error is raised. If a longer read is found, only its prefix of tag_length characters is kept. If tag_length == 0 (default), the read length is guessed.
stranded: true iff the sequenced strand is known and, therefore, a k-mer must not be considered equal to its reverse complement (for instance, AACG must not be considered equal to CGTT).
nb_threads: number of threads used to build GkSA on a multi-threaded architecture

Member Function Documentation

uintSA gkarrays::gkArrays::convertPposToQpos ( uintSA  i)

Converts a P-position into a Q-position (see the Gk Arrays article for the definitions): a position in the concatenation of reads becomes the corresponding position in GkIFA (GkISA in this class), for example. In the article, the values of GkSA are also renumbered as Q-positions, but in practice they are not renumbered, since doing so brings little benefit.

i: a P-position
Returns: the corresponding Q-position
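For fixed-length reads the conversion reduces to arithmetic. The sketch below assumes reads of length m concatenated without separators and 1 <= k <= m; ppos_to_qpos is a hypothetical helper, not part of the gkArrays API. Each read contributes m - k + 1 P-positions, and Q-positions number them contiguously.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper. Assumes fixed-length reads of length m, concatenated
// without separators, and a k-mer length k with 1 <= k <= m.
std::uint64_t ppos_to_qpos(std::uint64_t p, std::uint64_t m, std::uint64_t k) {
    const std::uint64_t read = p / m;    // read containing position p
    const std::uint64_t offset = p % m;  // offset of p inside that read
    if (offset > m - k) {
        // p lies in the last k - 1 characters of its read
        throw std::invalid_argument("not a P-position");
    }
    // Every earlier read contributes exactly m - k + 1 P-positions.
    return read * (m - k + 1) + offset;
}
```

For m = 5 and k = 3, P-position 5 (the first character of read 1) becomes Q-position 3, because read 0 only has the three P-positions 0, 1 and 2.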
uintSA gkarrays::gkArrays::getEndPosOfTagNum ( uint  tag_num)

Gives the end position of a given read in the concatenation of reads.

tag_num: tag (read) number
Returns: the end position of read #tag_num in C_R (the concatenation of reads)
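The start and end positions of reads in C_R follow from prefix sums of the read lengths. A minimal sketch, assuming reads concatenated without separators and an inclusive end position (the exact convention of getEndPosOfTagNum is not stated here); read_spans is a hypothetical helper.

```cpp
#include <cstdint>
#include <vector>

// start[i] = first position of read i in C_R; end[i] = its last position
// (inclusive, by assumption).
struct ReadSpans {
    std::vector<std::uint64_t> start;
    std::vector<std::uint64_t> end;
};

ReadSpans read_spans(const std::vector<std::uint64_t>& lengths) {
    ReadSpans s;
    std::uint64_t pos = 0;  // running prefix sum of the lengths
    for (std::uint64_t len : lengths) {
        s.start.push_back(pos);
        s.end.push_back(pos + len - 1);  // assumes len >= 1
        pos += len;
    }
    return s;
}
```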
uintSA gkarrays::gkArrays::getGkCFA ( uintSA  i)
i: the index in the array (starting at 0)
Returns: the value of GkCFA at index i, i.e. the number of k-factors of rank i
uintSA gkarrays::gkArrays::getGkCFALength ( )
Returns: the number of elements in GkCFA, i.e. the number of distinct k-mers in the input
uintSA gkarrays::gkArrays::getGkISA ( uintSA  i)
i: the index in the array (starting at 0), i.e. a Q-position
Returns: the value of GkISA at index i, i.e. the rank of the k-factor starting at the corresponding P-position
uintSA gkarrays::gkArrays::getGkSA ( uintSA  i)
i: the index in the array (starting at 0)
Returns: the value of GkSA at index i, i.e. the P-position of the k-factor whose rank is i
uintSA gkarrays::gkArrays::getGkSALength ( )
Returns: the number of entries in GkSA (i.e. the number of P-positions)
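To make the three arrays concrete, the following naive sketch computes what this page describes for a small stranded input: GkSA as the P-positions sorted by their k-mer, GkISA indexed by Q-position, and per-rank occurrence counts (what getGkCFA is described as returning). It assumes that the rank of a k-factor is its index among the distinct k-mers in lexicographic order and that reads are concatenated without separators. It sorts explicit k-mer strings, so it is only a reference for tiny inputs, not the construction algorithm used by gkArrays; NaiveGk and build_naive_gk are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct NaiveGk {
    std::vector<std::uint64_t> sa;      // P-positions sorted by k-mer, ties by position
    std::vector<std::uint64_t> isa;     // indexed by Q-position: rank of its k-mer
    std::vector<std::uint64_t> counts;  // indexed by rank: occurrences of that k-mer
};

NaiveGk build_naive_gk(const std::vector<std::string>& reads, std::size_t k) {
    // (k-mer, P-position in C_R), listed in Q-position order.
    std::vector<std::pair<std::string, std::uint64_t>> factors;
    std::uint64_t read_start = 0;
    for (const std::string& r : reads) {
        for (std::size_t o = 0; o + k <= r.size(); ++o)  // P-positions only
            factors.emplace_back(r.substr(o, k), read_start + o);
        read_start += r.size();
    }
    auto sorted = factors;
    std::sort(sorted.begin(), sorted.end());  // by k-mer, then by position

    NaiveGk gk;
    std::vector<std::string> distinct;  // distinct k-mers in increasing order
    for (const auto& f : sorted) {
        gk.sa.push_back(f.second);
        if (distinct.empty() || distinct.back() != f.first) {
            distinct.push_back(f.first);
            gk.counts.push_back(0);
        }
        ++gk.counts.back();
    }
    for (const auto& f : factors) {  // Q-position order
        auto it = std::lower_bound(distinct.begin(), distinct.end(), f.first);
        gk.isa.push_back(static_cast<std::uint64_t>(it - distinct.begin()));
    }
    return gk;
}
```

For reads ACGT and CGTA with k = 2, C_R is ACGTCGTA; positions 3 and 7 are not P-positions, and the sketch yields GkSA = {0, 1, 4, 2, 5, 6}.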
uintSA gkarrays::gkArrays::getNbPposition ( uintSA  nb_reads)
nb_reads: number of reads
Returns: the number of P-positions in C_R for nb_reads reads (of fixed length or not). This function can be called before GkSA is built.
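When the read lengths are known, the number of P-positions is a simple sum: a read of length len contributes len - k + 1 P-positions if len >= k, and none otherwise. A sketch with a hypothetical helper count_ppositions; for n fixed-length reads of length m it reduces to n * (m - k + 1).

```cpp
#include <cstdint>
#include <vector>

// Number of P-positions in C_R given the length of every read.
std::uint64_t count_ppositions(const std::vector<std::uint64_t>& lengths, std::uint64_t k) {
    std::uint64_t total = 0;
    for (std::uint64_t len : lengths)
        if (len >= k) total += len - k + 1;  // reads shorter than k hold no k-mer
    return total;
}
```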
uint gkarrays::gkArrays::getNbTags ( )
Returns: the number of tags (reads) indexed in the Gk Arrays
uint gkarrays::gkArrays::getNbTagsWithFactor ( uint  tag_num,
uint  pos_factor,
bool  multiplicity = 0 )
tag_num: the number of the tag in the index
pos_factor: position of the factor in the tag
multiplicity: if false, a tag containing the factor several times is counted only once
Returns: the number of tags sharing the factor that starts at position pos_factor in tag tag_num. With multiplicity set to true, this is the number of elements returned by getTagsWithFactor(tag_num, pos_factor).
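A naive reference for this count helps check the role of multiplicity. The sketch below assumes a stranded index (no reverse complements) and reads held in memory; count_tags_with_factor is a hypothetical helper that scans every P-position instead of using GkSA.

```cpp
#include <cstddef>
#include <string>
#include <vector>

unsigned count_tags_with_factor(const std::vector<std::string>& reads, std::size_t k,
                                std::size_t tag_num, std::size_t pos_factor,
                                bool multiplicity) {
    const std::string factor = reads[tag_num].substr(pos_factor, k);
    unsigned count = 0;
    for (const std::string& r : reads) {
        unsigned in_read = 0;  // occurrences of the factor at P-positions of r
        for (std::size_t o = 0; o + k <= r.size(); ++o)
            if (r.compare(o, k, factor) == 0) ++in_read;
        // multiplicity == false: a read is counted at most once
        count += multiplicity ? in_read : (in_read > 0 ? 1u : 0u);
    }
    return count;
}
```

For reads ACGTAC, TTACGG and ACACAC with k = 2, the factor AC at position 0 of read 0 occurs in 3 reads and 6 times in total.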
uint gkarrays::gkArrays::getNbThreads ( )
Returns: the number of threads the Gk Arrays have been told to use; these threads are used during construction
uint gkarrays::gkArrays::getPair ( uint  i)
i: the number of the tag in the index
Returns: the tag number of the read paired with read i, or -1 (converted to uint, i.e. the largest uint value) if the reads are not paired-end
uintSA gkarrays::gkArrays::getPosInCommon ( uint  tag_num,
uint  pos_factor )
tag_num: the number of the tag in the index
pos_factor: position of the factor in the tag
Returns: the rank of the Pk-factor starting at position pos_factor in read number tag_num
readsReader * gkarrays::gkArrays::getReads ( )
Returns: the object from which a readIterator can be obtained
uintSA gkarrays::gkArrays::getStartPosOfTagNum ( uint  tag_num)

Gives the start position of a given read in the concatenation of reads.

tag_num: tag (read) number
Returns: the start position of read #tag_num in C_R (the concatenation of reads)
uintSA gkarrays::gkArrays::getStartQPosOfTagNum ( uint  tag_num)

Gives the start Q-position of a given read in GkISA.

tag_num: tag (read) number
Returns: the start Q-position of read #tag_num in GkISA
uint * gkarrays::gkArrays::getSupport ( uint  i)
i: tag number
Returns: an array of length getSupportLength(i) whose value at position p is the number of occurrences, among all the Pk-factors, of the k-factor starting at position p in read i

uint gkarrays::gkArrays::getSupportLength ( uint  i = 0)

Returns the length of the support of read i, that is:

getTagLength(i) - getThreshold() + 1
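The support can be cross-checked on small inputs with a brute-force count. This sketch assumes a stranded index and uses a hypothetical helper naive_support: it counts every k-mer over all P-positions with a std::map, then reads off the count for each position of read i, so the result has getTagLength(i) - getThreshold() + 1 entries.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

std::vector<unsigned> naive_support(const std::vector<std::string>& reads,
                                    std::size_t k, std::size_t i) {
    // Occurrences of every k-mer over all P-positions of all reads.
    std::map<std::string, unsigned> occ;
    for (const std::string& r : reads)
        for (std::size_t o = 0; o + k <= r.size(); ++o) ++occ[r.substr(o, k)];

    // Entry p: occurrences of the k-mer starting at position p of read i.
    std::vector<unsigned> support;
    const std::string& read = reads[i];
    for (std::size_t p = 0; p + k <= read.size(); ++p)
        support.push_back(occ[read.substr(p, k)]);
    return support;
}
```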
char * gkarrays::gkArrays::getTag ( uint  i)
i: the number of the read to retrieve
Returns: read number i
char * gkarrays::gkArrays::getTagFactor ( uint  i,
uint  p,
uint  l )
i: the number of the tag in the index
p: position of the factor in the tag
l: length of the factor
Returns: the factor of length l at position p in tag number i
uint gkarrays::gkArrays::getTagLength ( uint  i = 0)
i: tag number (only needed if the read length is not constant)
Returns: the length of the read
uint gkarrays::gkArrays::getTagNum ( uintSA  pos)

Gives the number of the read containing a given position

pos: a position in SA or in the concatenated sequence of reads
Returns: the number of the read in which this position lies
std::pair< uint, uint > gkarrays::gkArrays::getTagNumAndPosFromAbsolutePos ( uintSA  pos)

Returns the tag number and the relative position in that tag corresponding to a given position in the concatenation of reads.

pos: position in the concatenation of reads
Returns: a pair whose first element is the tag number and whose second element is the position in that tag
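For reads of varying length, mapping an absolute position back to a (tag, position) pair is a binary search over the read start positions. A minimal sketch, assuming the start positions are available as a sorted vector beginning at 0; tag_num_and_pos is hypothetical and does not check that pos lies before the end of the last read.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// start[i] = position in C_R of the first character of read i (strictly increasing).
std::pair<std::size_t, std::uint64_t>
tag_num_and_pos(const std::vector<std::uint64_t>& start, std::uint64_t pos) {
    // First read starting strictly after pos; the read just before it holds pos.
    auto it = std::upper_bound(start.begin(), start.end(), pos);
    if (it == start.begin()) throw std::out_of_range("position before the first read");
    const std::size_t tag = static_cast<std::size_t>(it - start.begin()) - 1;
    return {tag, pos - start[tag]};
}
```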
uint * gkarrays::gkArrays::getTagNumWithFactor ( uint  tag_num,
uint  pos_factor )
tag_num: the number of the tag in the index
pos_factor: position of the factor in the tag
Returns: a sorted array containing the number of each tag in which the factor occurs
pair< uint, uint > * gkarrays::gkArrays::getTagsWithFactor ( uint  tag_num,
uint  pos_factor )
tag_num: the number of the tag in the index
pos_factor: position of the factor in the tag
Returns: an array of (tag, pos) pairs corresponding to all the Pk-factors equal to the Pk-factor starting at position pos_factor in tag tag_num. The array is sorted by read number, then by position in the read.
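The following brute-force sketch produces the (tag, pos) pairs described for getTagsWithFactor and a sorted tag list in the spirit of getTagNumWithFactor, assuming a stranded index. Both helpers are hypothetical; in particular, listing each tag only once in tag_nums_with_factor is an assumption about getTagNumWithFactor.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using TagPos = std::pair<std::size_t, std::size_t>;  // (read number, offset)

// All P-positions whose k-mer equals the k-mer at offset pos_factor of read
// tag_num. Scanning reads and offsets in increasing order yields the pairs
// already sorted by tag, then position.
std::vector<TagPos> tags_with_factor(const std::vector<std::string>& reads, std::size_t k,
                                     std::size_t tag_num, std::size_t pos_factor) {
    const std::string factor = reads[tag_num].substr(pos_factor, k);
    std::vector<TagPos> out;
    for (std::size_t t = 0; t < reads.size(); ++t)
        for (std::size_t o = 0; o + k <= reads[t].size(); ++o)
            if (reads[t].compare(o, k, factor) == 0) out.emplace_back(t, o);
    return out;
}

// Sorted tag numbers, one entry per read (assumption).
std::vector<std::size_t> tag_nums_with_factor(const std::vector<TagPos>& hits) {
    std::vector<std::size_t> tags;
    for (const TagPos& h : hits)
        if (tags.empty() || tags.back() != h.first) tags.push_back(h.first);
    return tags;
}
```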
pair< uint, uint > * gkarrays::gkArrays::getTagsWithFactor ( char *  factor,
uint  factor_length,
uint &  nb_fact )
factor: the pattern to search for
factor_length: the length of the factor; should be <= getThreshold()
nb_fact: output parameter receiving the number of occurrences in the returned array
Returns: an array of (tag, pos) pairs corresponding to all the Pk-factors equal to the k-factor factor
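Searching for an arbitrary pattern amounts to a binary search over the k-factors in sorted order, which is the role GkSA plays. The sketch assumes that a pattern shorter than k matches every Pk-factor that begins with it, and uses an explicit sorted vector of k-mer strings rather than the library's arrays; Entry, sorted_kfactors and find_factor are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One Pk-factor and where it occurs.
struct Entry {
    std::string kmer;
    std::size_t tag;  // read number
    std::size_t pos;  // offset of the k-mer in the read
};

// Collect every Pk-factor and sort by k-mer; stable_sort keeps equal k-mers
// in (tag, pos) order.
std::vector<Entry> sorted_kfactors(const std::vector<std::string>& reads, std::size_t k) {
    std::vector<Entry> v;
    for (std::size_t t = 0; t < reads.size(); ++t)
        for (std::size_t o = 0; o + k <= reads[t].size(); ++o)
            v.push_back({reads[t].substr(o, k), t, o});
    std::stable_sort(v.begin(), v.end(),
                     [](const Entry& a, const Entry& b) { return a.kmer < b.kmer; });
    return v;
}

// All (tag, pos) whose k-mer starts with factor (factor.size() <= k assumed).
// Sorting whole k-mers also sorts their prefixes, so the matches form one
// contiguous range, found with two binary searches.
std::vector<std::pair<std::size_t, std::size_t>>
find_factor(const std::vector<Entry>& v, const std::string& factor) {
    const std::size_t n = factor.size();
    auto lo = std::lower_bound(v.begin(), v.end(), factor,
        [n](const Entry& e, const std::string& f) { return e.kmer.compare(0, n, f) < 0; });
    auto hi = std::upper_bound(lo, v.end(), factor,
        [n](const std::string& f, const Entry& e) { return e.kmer.compare(0, n, f) > 0; });
    std::vector<std::pair<std::size_t, std::size_t>> out;
    for (auto it = lo; it != hi; ++it) out.emplace_back(it->tag, it->pos);
    return out;
}
```

The sort is done once; each query then costs two binary searches plus the size of the output.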
char * gkarrays::gkArrays::getTextFactor ( uintSA  pos,
uint  length )
pos: the position from which to retrieve a text substring. The position must be given in the original text (not the filtered one).
length: the length of the substring to retrieve
Returns: the text factor of length length at position pos, as a NUL-terminated string
uint gkarrays::gkArrays::getThreshold ( )
Returns: the length of the k-factors (i.e. k)
array_type gkarrays::gkArrays::getType ( )
Returns: the array type used for building GkSA and GkISA (SMALL_ARRAY, LARGE_ARRAY or OPTIMAL_ARRAY)
bool gkarrays::gkArrays::isDiscarded ( uint  actual_length,
uint  theoretical_length = 0,
uint  k = 0 
) [static]
Returns: true iff the read is not suitable, i.e. if it is shorter than the specified read length (if any) or shorter than the specified k-mer length
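The rule can be written as a small predicate. This sketch mirrors the description above, treating 0 as "not specified" for both theoretical_length and k; is_discarded is a hypothetical stand-in, not the library's source.

```cpp
// Hypothetical predicate following the documented rule.
bool is_discarded(unsigned actual_length, unsigned theoretical_length = 0, unsigned k = 0) {
    // Shorter than the expected read length, when one is given.
    if (theoretical_length != 0 && actual_length < theoretical_length) return true;
    // Too short to contain a single k-mer (never true when k == 0).
    return actual_length < k;
}
```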
bool gkarrays::gkArrays::isLarge ( )
Returns: true if the number of P-positions is greater than 2^32
bool gkarrays::gkArrays::isPposition ( uintSA  pos)
Returns: true iff the position is a P-position, i.e. if it does not lie within the last threshold - 1 characters of a read
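For fixed-length reads the test is a single modulo operation. A sketch assuming reads of length m concatenated without separators and a k-mer length k; is_pposition is hypothetical.

```cpp
#include <cstdint>

// pos is a P-position iff a whole k-mer fits between pos and the end of its
// read, i.e. pos is not among the k - 1 last characters of that read.
bool is_pposition(std::uint64_t pos, std::uint64_t m, std::uint64_t k) {
    return pos % m + k <= m;
}
```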
bool gkarrays::gkArrays::isStranded ( )
Returns: true iff the Gk Arrays have been built as a strand-dependent index; a k-mer and its reverse complement are then not considered equal
bool gkarrays::gkArrays::isTheFirstMemberOfPair ( uint  i)
i: the number of the tag in the index
Returns: true if the tag is the first member of its pair (for paired-end files), false otherwise
