Gk-arrays
Efficient read indexing
gkarrays::gkArrays Class Reference

Public Member Functions

 gkArrays (char *tags_file, uint threshold, bool use_bitvector=false, uint tag_length=0, bool stranded=false, uint nb_threads=1)
 gkArrays (char *tags_file1, char *tags_file2, uint threshold, bool use_bitvector=false, uint tag_length=0, bool stranded=false, uint nb_threads=1)
uintSA convertPposToQpos (uintSA i)
uintSA getEndPosOfTagNum (uint tag_num)
uintSA getGkCFA (uintSA i)
uintSA getGkCFALength ()
uintSA getGkISA (uintSA i)
uintSA getGkSA (uintSA i)
uintSA getGkSALength ()
uintSA getNbPposition (uintSA nb_reads)
uint getNbTags ()
uint getNbTagsWithFactor (uint tag_num, uint pos_factor, bool multiplicity=0)
uint getNbThreads ()
uint getPair (uint i)
uintSA getPosInCommon (uint tag_num, uint pos_factor)
readsReader getReads ()
uintSA getStartPosOfTagNum (uint tag_num)
uintSA getStartQPosOfTagNum (uint tag_num)
uint * getSupport (uint i)
uint getSupportLength (uint i=0)
char * getTag (uint i)
uint getTagLength (uint i=0)
char * getTagFactor (uint i, uint p, uint l)
uint getTagNum (uintSA pos)
std::pair< uint, uint > getTagNumAndPosFromAbsolutePos (uintSA pos)
uint * getTagNumWithFactor (uint tag_num, uint pos_factor)
std::pair< uint, uint > * getTagsWithFactor (uint tag_num, uint pos_factor)
std::pair< uint, uint > * getTagsWithFactor (char *factor, uint factor_length, uint &nb_fact)
char * getTextFactor (uintSA pos, uint length)
uint getThreshold ()
array_type getType ()
bool isLarge ()
bool isPposition (uintSA pos)
bool isStranded ()
bool isTheFirstMemberOfPair (uint i)

Static Public Member Functions

static bool isDiscarded (uint actual_length, uint theoretical_length=0, uint k=0)

Constructor & Destructor Documentation

gkarrays::gkArrays::gkArrays ( char *  tags_file,
uint  threshold,
bool  use_bitvector = false,
uint  tag_length = 0,
bool  stranded = false,
uint  nb_threads = 1 
)

Construct the read index

Parameters:
tags_file: name of the file containing the reads
threshold: length of the k-mers to use
use_bitvector: true iff the array must be stored using a bit vector (slower but more space efficient)
tag_length: length of the reads. If a shorter read is found, an error is raised. If a longer read is found, only its prefix of tag_length characters is kept. If tag_length == 0 (default), the read length is guessed.
stranded: true iff we know which strand has been sequenced and therefore, for instance, AACG must not be considered equal to its reverse complement (CGTT).
nb_threads: number of threads, which allows GkSA to be built on a multi-threaded architecture
gkarrays::gkArrays::gkArrays ( char *  tags_file1,
char *  tags_file2,
uint  threshold,
bool  use_bitvector = false,
uint  tag_length = 0,
bool  stranded = false,
uint  nb_threads = 1 
)

Alternative constructor that builds the read index from paired-end reads

Parameters:
tags_file1: name of the file containing the first member of each pair
tags_file2: name of the file containing the second member of each pair
threshold: length of the k-mers to use
use_bitvector: true iff the array must be stored using a bit vector (slower but more space efficient)
tag_length: length of the reads. If a shorter read is found, an error is raised. If a longer read is found, only its prefix of tag_length characters is kept. If tag_length == 0 (default), the read length is guessed.
stranded: true iff we know which strand has been sequenced and therefore, for instance, AACG must not be considered equal to its reverse complement (CGTT).
nb_threads: number of threads, which allows GkSA to be built on a multi-threaded architecture
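
The effect of the stranded flag can be illustrated with a small self-contained sketch. The helpers below (complementBase, reverseComplement and sameKmer) are hypothetical names introduced for illustration only. They are not part of gkArrays and make no claim about how the library compares k-mers internally.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper (not part of gkArrays): complement of one nucleotide.
// Any character other than A, C, G or T is mapped to 'N'.
char complementBase(char c) {
    switch (c) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        default:  return 'N';
    }
}

// Reverse the k-mer, then complement each base.
std::string reverseComplement(const std::string& kmer) {
    std::string rc(kmer.rbegin(), kmer.rend());
    std::transform(rc.begin(), rc.end(), rc.begin(), complementBase);
    return rc;
}

// With stranded == true, only identical k-mers are considered equal.
// With stranded == false, a k-mer also equals its reverse complement.
bool sameKmer(const std::string& a, const std::string& b, bool stranded) {
    assert(a.size() == b.size());
    if (a == b) return true;
    return !stranded && reverseComplement(a) == b;
}
```

With these helpers, sameKmer("AACG", "CGTT", false) holds whereas sameKmer("AACG", "CGTT", true) does not, which mirrors the AACG/CGTT example given for the stranded parameter.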

Member Function Documentation

uintSA gkarrays::gkArrays::convertPposToQpos ( uintSA  i)

Convert a position from P-position to Q-position (if these notions are unclear, please read our article). This converts a position in the concatenation of reads into the corresponding position in GkIFA (for example). In the article, the values of GkSA are also renumbered to Q-positions, but in practice we do not renumber them because it would be of little use.

Parameters:
i: a P-position
Returns:
a Q-position
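
The renumbering can be sketched for the simplest case. This sketch assumes that all reads have the same length m and are concatenated without separators, and it uses the definition of a P-position given for isPposition (a position that does not lie in the last k - 1 characters of a read). All function names are hypothetical; the library's actual computation may differ, in particular for variable-length reads.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not part of gkArrays.
// Assumption: fixed-length reads of length m, concatenated without separators.

// A position is a P-position iff a whole k-mer starts there inside one read.
bool isPpositionSketch(std::uint64_t pos, unsigned m, unsigned k) {
    assert(k >= 1 && k <= m);
    return pos % m <= m - k;
}

// Q-positions number the P-positions consecutively: each read located
// before pos contributes k - 1 positions that are skipped.
std::uint64_t pposToQposSketch(std::uint64_t pos, unsigned m, unsigned k) {
    assert(isPpositionSketch(pos, m, k));
    return pos - (pos / m) * (k - 1);
}

// Number of P-positions for nb_reads reads of length m.
std::uint64_t nbPpositionsSketch(std::uint64_t nb_reads, unsigned m, unsigned k) {
    assert(k >= 1 && k <= m);
    return nb_reads * (m - k + 1);
}
```

For instance, with m = 5 and k = 3, the second read covers positions 5 to 9, its P-positions are 5, 6 and 7, and they map to Q-positions 3, 4 and 5.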
uintSA gkarrays::gkArrays::getEndPosOfTagNum ( uint  tag_num)

Gives the end position of a given read in the concatenation of reads.

Parameters:
tag_num: tag number
Returns:
the end position of the read #tag_num in C_R (the concatenation of reads)
uintSA gkarrays::gkArrays::getGkCFA ( uintSA  i)
Parameters:
i: the index position in the array (starting at 0).
Returns:
the value of GkCFA at the given index, i.e. the number of k-factors of rank i, where i is the requested index.
uintSA gkarrays::gkArrays::getGkCFALength ( )
Returns:
the number of elements in the GkCFA array. In other words, the number of distinct k-mers in the input.
uintSA gkarrays::gkArrays::getGkISA ( uintSA  i)
Parameters:
i: the index position in the array (starting at 0).
Returns:
the value of GkISA at the given index, i.e. the rank of the k-factor at P-position i.
uintSA gkarrays::gkArrays::getGkSA ( uintSA  i)
Parameters:
i: the index position in the array (starting at 0).
Returns:
the value of GkSA at the given index, i.e. the P-position of the k-factor whose rank is i.
uintSA gkarrays::gkArrays::getGkSALength ( )
Returns:
the number of entries in GkSA (i.e. the number of P-positions)
uintSA gkarrays::gkArrays::getNbPposition ( uintSA  nb_reads)
Returns:
the number of P-positions in C_R for a given number of reads (of fixed length or not). This function can be called before GkSA is constructed.
uint gkarrays::gkArrays::getNbTags ( )
Returns:
the number of tags (or reads) indexed in the Gk arrays
uint gkarrays::gkArrays::getNbTagsWithFactor ( uint  tag_num,
uint  pos_factor,
bool  multiplicity = 0 
)
Parameters:
tag_num: the number of the tag in the index
pos_factor: position of the factor in the tag
multiplicity: if false, a tag that contains the factor several times is counted only once
Returns:
the number of tags sharing the factor starting at position pos_factor in the tag tag_num. This is the number of elements returned by the function getTagsWithFactor().
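
The role of multiplicity can be sketched on a list of (tag, pos) occurrences such as the one described for getTagsWithFactor. The function countTagsSketch and the Occurrence alias are hypothetical and only illustrate the counting rule stated above; they assume the occurrences are sorted by tag number and then by position.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical type: (tag number, position in the tag).
using Occurrence = std::pair<unsigned, unsigned>;

// Hypothetical helper, not part of gkArrays.
// multiplicity == true : every occurrence is counted.
// multiplicity == false: a tag containing the factor several times counts once.
std::size_t countTagsSketch(const std::vector<Occurrence>& occ,
                            bool multiplicity = false) {
    // Assumption: occurrences are sorted by tag number, then by position.
    assert(std::is_sorted(occ.begin(), occ.end()));
    if (multiplicity) return occ.size();
    std::size_t distinct = 0;
    for (std::size_t i = 0; i < occ.size(); ++i) {
        if (i == 0 || occ[i].first != occ[i - 1].first) ++distinct;
    }
    return distinct;
}
```

For the occurrences (0, 0), (0, 4) and (1, 2), the sketch counts 2 tags without multiplicity and 3 with it.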
uint gkarrays::gkArrays::getNbThreads ( )
Returns:
the number of threads the Gk arrays have been told to use. The threads can be used for the construction.
uint gkarrays::gkArrays::getPair ( uint  i)
Parameters:
i: the number of the tag in the index
Returns:
the tag number of the paired-end read associated with i, or -1 (converted to the unsigned return type) if the reads are not paired-end.
uintSA gkarrays::gkArrays::getPosInCommon ( uint  tag_num,
uint  pos_factor 
)
Returns:
the rank of the Pk-factor starting at position pos_factor in the read number tag_num.
readsReader gkarrays::gkArrays::getReads ( )
Returns:
the object from which a readIterator can be obtained
uintSA gkarrays::gkArrays::getStartPosOfTagNum ( uint  tag_num)

Gives the start position of a given read in the concatenation of reads.

Parameters:
tag_num: tag number
Returns:
the start position of the read #tag_num in C_R (the concatenation of reads)
uintSA gkarrays::gkArrays::getStartQPosOfTagNum ( uint  tag_num)

Gives the start Q-position of a given read in the GkISA array.

Parameters:
tag_num: tag number
Returns:
the start Q-position of the read #tag_num in GkISA.
uint * gkarrays::gkArrays::getSupport ( uint  i)
Parameters:
i: tag number
Returns:
an array whose length is getSupportLength(i) and where the value at position p is the number of occurrences, among all the Pk-factors, of the k-factor starting at position p in the read.
uint gkarrays::gkArrays::getSupportLength ( uint  i = 0)

Return the length of the support.

Returns:
getTagLength(i) - getThreshold() + 1
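
To make the notion of support concrete, here is a naive sketch that computes it by counting every k-mer of every read in a hash map. The function naiveSupport is hypothetical and is not how the library proceeds (the Gk arrays answer this from the index). It also compares k-mers exactly, which corresponds to a stranded index, so reverse complements are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper, not part of gkArrays: support of read i computed
// naively. The result has reads[i].size() - k + 1 entries, and entry p is
// the number of occurrences, over all reads, of the k-mer starting at
// position p of read i.
std::vector<unsigned> naiveSupport(const std::vector<std::string>& reads,
                                   std::size_t i, std::size_t k) {
    assert(i < reads.size() && k >= 1 && reads[i].size() >= k);
    // Count every k-mer of every read.
    std::unordered_map<std::string, unsigned> count;
    for (const std::string& r : reads) {
        for (std::size_t p = 0; p + k <= r.size(); ++p) {
            ++count[r.substr(p, k)];
        }
    }
    // Look up the count of each k-mer of read i.
    std::vector<unsigned> support(reads[i].size() - k + 1);
    for (std::size_t p = 0; p < support.size(); ++p) {
        support[p] = count[reads[i].substr(p, k)];
    }
    return support;
}
```

With the reads ACGTA and CGTAC and k = 3, the support of the first read is {1, 2, 2}: ACG occurs once, while CGT and GTA occur in both reads.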
char * gkarrays::gkArrays::getTag ( uint  i)
Parameters:
i: the number of the read to be retrieved
Returns:
read number i.
char * gkarrays::gkArrays::getTagFactor ( uint  i,
uint  p,
uint  l 
)
Parameters:
i: the number of the tag in the index
p: position of the factor in the tag
l: the length of the factor
Returns:
the factor at position p in the tag number i
uint gkarrays::gkArrays::getTagLength ( uint  i = 0)
Parameters:
i: tag number (if the length is not constant)
Returns:
the length of the read.
uint gkarrays::gkArrays::getTagNum ( uintSA  pos)

Gives the number of a read

Parameters:
pos: a position in SA or in the concatenated sequence of reads
Returns:
the read number where this position lies
std::pair< uint, uint > gkarrays::gkArrays::getTagNumAndPosFromAbsolutePos ( uintSA  pos)

Return the number of tag and the relative position in that tag corresponding to a given position in the concatenation of reads

Parameters:
pos: position in the concatenation of reads
Returns:
a pair whose first element is the tag number and whose second element is the position in the tag.
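
One possible way to obtain such a pair, sketched under the assumption that the start position of every read in the concatenation is available in a sorted vector, is a binary search. The function tagNumAndPosSketch and the starts vector are hypothetical; the library may store and query this information differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper, not part of gkArrays.
// starts[t] is the position in the concatenation where read t begins;
// starts is sorted in increasing order and starts[0] == 0.
std::pair<unsigned, unsigned> tagNumAndPosSketch(
        const std::vector<std::uint64_t>& starts, std::uint64_t pos) {
    assert(!starts.empty() && starts.front() == 0);
    // First read starting strictly after pos; the read before it contains pos.
    auto it = std::upper_bound(starts.begin(), starts.end(), pos);
    unsigned tag = static_cast<unsigned>(it - starts.begin() - 1);
    return {tag, static_cast<unsigned>(pos - starts[tag])};
}
```

With reads starting at positions 0, 5 and 12, absolute position 13 falls in read 2 at relative position 1.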
uint * gkarrays::gkArrays::getTagNumWithFactor ( uint  tag_num,
uint  pos_factor 
)
Parameters:
tag_num: the number of the tag in the index
pos_factor: position of the factor in the tag
Returns:
an array containing the number of each tag in which the factor occurs.
Postcondition:
The array is sorted
pair< uint, uint > * gkarrays::gkArrays::getTagsWithFactor ( uint  tag_num,
uint  pos_factor 
)
Parameters:
tag_num: the number of the tag in the index
pos_factor: position of the factor in the tag
Returns:
an array composed of pairs (tag, pos) corresponding to all the Pk-factors equal to the Pk-factor starting at position pos_factor in the tag tag_num.
Postcondition:
The array is sorted according to read number and read position
pair< uint, uint > * gkarrays::gkArrays::getTagsWithFactor ( char *  factor,
uint  factor_length,
uint &  nb_fact 
)
Parameters:
factor: the pattern to be searched.
factor_length: the length of the factor, which should be <= getThreshold()
nb_fact: output parameter that receives the number of occurrences in the returned array.
Returns:
an array composed of pairs (tag, pos) corresponding to all the Pk-factors equal to the k-factor factor
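
The expected result of such a query can be checked against a naive scan of the reads. The sketch below is only a reference for small inputs, with a hypothetical function name. It is not the library's index-based search, it ignores reverse complements, and it reports every occurrence without restricting the result to P-positions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of gkArrays: all (tag, pos) pairs at which
// factor occurs, found by scanning each read. The result is sorted by tag
// number and then by position because reads and positions are visited in
// increasing order.
std::vector<std::pair<unsigned, unsigned>> naiveTagsWithFactor(
        const std::vector<std::string>& reads, const std::string& factor) {
    assert(!factor.empty());
    std::vector<std::pair<unsigned, unsigned>> result;
    for (std::size_t t = 0; t < reads.size(); ++t) {
        // Restart the search one character further to catch overlaps.
        for (std::size_t pos = reads[t].find(factor);
             pos != std::string::npos;
             pos = reads[t].find(factor, pos + 1)) {
            result.emplace_back(static_cast<unsigned>(t),
                                static_cast<unsigned>(pos));
        }
    }
    return result;
}
```

For the reads ACGTACG and TTACG and the factor ACG, the scan yields (0, 0), (0, 4) and (1, 2); the size of the returned vector plays the role of nb_fact.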
char * gkarrays::gkArrays::getTextFactor ( uintSA  pos,
uint  length 
)
Parameters:
pos: the position from which to retrieve a text substring. The position must be given in the original text (not the filtered one).
length: the length of the substring to be retrieved.
Returns:
the text factor of length length at position pos. The returned string is null-terminated.
uint gkarrays::gkArrays::getThreshold ( )
Returns:
the length of the k-factors (i.e. k).
array_type gkarrays::gkArrays::getType ( )
Returns:
the array type used for building GkSA and GkISA (either SMALL_ARRAY, LARGE_ARRAY or OPTIMAL_ARRAY).
bool gkarrays::gkArrays::isDiscarded ( uint  actual_length,
uint  theoretical_length = 0,
uint  k = 0 
) [static]
Returns:
true iff the read is not suitable, i.e. if it is shorter than the specified length (if any) or shorter than the specified k-mer length.
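
Read literally, the rule above can be sketched as follows. Both functions are hypothetical re-implementations written from the descriptions in this reference (the default value 0 is taken to mean "no length specified"), and they are not the library's code. The second helper illustrates the constructor's statement that only a prefix of tag_length characters is kept for longer reads.

```cpp
#include <cassert>

// Hypothetical sketch of the rule described for isDiscarded.
// theoretical_length == 0 is assumed to mean that no length was specified.
bool isDiscardedSketch(unsigned actual_length,
                       unsigned theoretical_length = 0,
                       unsigned k = 0) {
    if (theoretical_length != 0 && actual_length < theoretical_length) {
        return true;  // shorter than the specified read length
    }
    return actual_length < k;  // too short to contain a single k-mer
}

// Hypothetical helper: number of characters kept for a suitable read.
// A read longer than tag_length is truncated to its prefix.
unsigned keptLengthSketch(unsigned actual_length, unsigned tag_length,
                          unsigned k) {
    assert(!isDiscardedSketch(actual_length, tag_length, k));
    return tag_length == 0 ? actual_length : tag_length;
}
```

Under these assumptions, a read of 10 characters is rejected when reads of length 20 are expected or when k = 12, while a read of 30 characters is accepted and truncated to 20.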
bool gkarrays::gkArrays::isLarge ( )
Returns:
true if the number of P-positions is greater than 2^32
bool gkarrays::gkArrays::isPposition ( uintSA  pos)
Returns:
true iff the position does not lie in the last threshold - 1 characters of a read, i.e. if it is a P-position.
bool gkarrays::gkArrays::isStranded ( )
Returns:
true iff the Gk arrays have been built as a strand-dependent index, in which case a k-mer and its reverse complement are not considered equal.
bool gkarrays::gkArrays::isTheFirstMemberOfPair ( uint  i)
Parameters:
i: the number of the tag in the index
Returns:
true if the tag is the first member of its pair in the case of paired-end files, false otherwise.
