lncrnapy.algorithms

Algorithm

Contains the Algorithm base class for the classification of RNA transcripts as either protein-coding or long non-coding.

class lncrnapy.algorithms.algorithm.Algorithm(model, feature_extractors, used_features=None)

Base class for algorithms for the classification of RNA transcripts as either protein-coding or long non-coding.

`model`

Underlying classification model. Can be a trained torch.nn.Module object with a single, sigmoid-activated output node (lncrnapy Model recommended), or a scikit-learn-style model with .fit and .classify methods.

`feature_extractors`

Feature extractor or list of feature extractors that are applied to the data if a feature in used_features is missing in the input.

Type:

list

`used_features`

Specifies which feature names (data columns) serve as input variables for the model. If None, will use all features from feature_extractors.

Type:

list[str]
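As an illustration of the default described above, the following sketch resolves used_features from the extractors when None is passed. It is not lncrnapy's code: `FakeExtractor`, its `name` attribute, and `resolve_used_features` are hypothetical stand-ins, assuming each extractor produces one named column.

```python
class FakeExtractor:
    """Hypothetical extractor that produces a single named feature column."""

    def __init__(self, name):
        self.name = name


def resolve_used_features(feature_extractors, used_features=None):
    """Return the explicit feature list, or every extractor's feature if None."""
    if used_features is not None:
        return list(used_features)
    return [extractor.name for extractor in feature_extractors]


extractors = [FakeExtractor("orf_length"), FakeExtractor("fickett_score")]
all_features = resolve_used_features(extractors)            # default: all
subset = resolve_used_features(extractors, ["orf_length"])  # explicit subset
print(all_features)
print(subset)
```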

feature_extraction(data)

Calls upon the object’s feature extractors if a feature in the used_features attribute is missing in data.
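The conditional behaviour described above can be sketched as follows. This is a minimal illustration and not lncrnapy's implementation: the data is modelled as a plain dict of columns, and `LengthExtractor`, its `name` attribute, and its `calculate` method are hypothetical.

```python
class LengthExtractor:
    """Hypothetical extractor that adds a 'length' column."""

    name = "length"

    def __init__(self):
        self.calls = 0  # counts how often the extractor actually ran

    def calculate(self, data):
        self.calls += 1
        return [len(seq) for seq in data["sequence"]]


def feature_extraction(data, feature_extractors, used_features):
    """Run the extractors only if a used feature is missing from data."""
    if all(feature in data for feature in used_features):
        return data  # nothing missing, so the extractors are skipped
    for extractor in feature_extractors:
        data[extractor.name] = extractor.calculate(data)
    return data


extractor = LengthExtractor()
data = {"sequence": ["AUGGCC", "AUG"]}
data = feature_extraction(data, [extractor], ["length"])
data = feature_extraction(data, [extractor], ["length"])  # already present
print(data["length"], extractor.calls)
```

In this sketch the second call leaves the data untouched because the requested column already exists.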

fit(data)

Fits model on data, extracting features first if necessary. Will only fit on features as specified in the used_features attribute. This method is disabled when the model attribute is a torch.nn.Module instance.
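To make the restriction to used_features concrete, here is a small sketch of selecting only the used columns before fitting. The dict-of-columns data layout, the `label` column, `select_features`, and the toy `RecordingModel` are assumptions made for illustration; the check that disables fitting for torch.nn.Module models is not shown.

```python
class RecordingModel:
    """Toy scikit-learn-style model that only records the input width."""

    def fit(self, X, y):
        self.n_features_ = len(X[0])
        return self


def select_features(data, used_features):
    """Build one input row per transcript from the used columns only."""
    columns = [data[name] for name in used_features]
    return [list(row) for row in zip(*columns)]


data = {
    "orf_length": [900, 60, 1200],
    "gc_content": [0.52, 0.41, 0.48],
    "unused": [1, 2, 3],   # present in the data but never shown to the model
    "label": [1, 0, 1],    # hypothetical target column
}
X = select_features(data, ["orf_length", "gc_content"])
model = RecordingModel().fit(X, data["label"])
print(X)
print(model.n_features_)
```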

predict(data)

Classifies data, extracting features first if necessary. Will only use features as specified in the used_features attribute.

Traditional

Pre-implemented coding/non-coding RNA classifiers.

Note that all classifiers based on algorithms presented in related works should be considered loose adaptations. We do not guarantee that our implementations match the performance reported in the original works.

class lncrnapy.algorithms.traditional.CNCI(ant_ref)

Coding Non-Coding Index (CNCI)

References

CNCI: Sun et al. (2013) https://doi.org/10.1093/nar/gkt646

class lncrnapy.algorithms.traditional.CNIT(ant_ref)

Coding-Non-Coding Identifying Tool (CNIT)

References

CNIT: Guo et al. (2019) https://doi.org/10.1093/nar/gkz400

class lncrnapy.algorithms.traditional.CPAT(fickett_ref, hexamer_ref)

Coding-Potential Assessment Tool (CPAT).

References

CPAT: Wang et al. (2013) https://doi.org/10.1093/nar/gkt006

class lncrnapy.algorithms.traditional.CPC(database, **kwargs)

Adaptation of Coding Potential Calculator (CPC). This adaptation differs from the original in two ways: 1) we use ORF length instead of the log-odds score; 2) we do not make use of the ORF integrity feature, as all identified ORFs will have a start and stop codon.
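The ORF length feature can be illustrated with a short sketch. It is not the library's ORF finder: `longest_orf_length` is a hypothetical helper that assumes forward-strand reading frames only, an ATG start codon, and a length that includes the stop codon. Because only start-to-stop ORFs are reported, an integrity feature would be constant, which is why it carries no information here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(sequence):
    """Length in nt of the longest ATG...stop ORF in the three forward frames.

    Returns 0 when no ORF with both a start and a stop codon exists.
    """
    sequence = sequence.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG since the last stop codon
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


complete = longest_orf_length("CCATGAAATTTTAGGG")  # ATG AAA TTT TAG
incomplete = longest_orf_length("ATGAAACCC")       # start codon, no stop
print(complete, incomplete)
```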

References

CPC: Kong et al. (2007) https://doi.org/10.1093/nar/gkm391

class lncrnapy.algorithms.traditional.CPC2(fickett_ref)

Adaptation of Coding Potential Calculator version 2 (CPC2). An important difference between this implementation and the original is that we do not make use of the ORF integrity feature, as all ORFs will have a start and stop codon.

References

CPC2: Kang et al. (2017) https://doi.org/10.1093/nar/gkx428

class lncrnapy.algorithms.traditional.CPPred(fickett_ref, hexamer_ref)

Adaptation of Coding Potential Prediction (CPPred). We replace C and T of CTD features with mono- and dimer frequencies, respectively.
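As a rough illustration of the replacement features, the sketch below computes monomer (k=1) and dimer (k=2) frequencies for a sequence. `kmer_frequencies` is a hypothetical helper, not the lncrnapy extractor; it assumes a DNA alphabet and skips windows containing other characters.

```python
from itertools import product


def kmer_frequencies(sequence, k):
    """Relative frequency of every ACGT k-mer, using overlapping windows."""
    sequence = sequence.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # ignore windows with ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    return {kmer: (count / total if total else 0.0)
            for kmer, count in counts.items()}


monomers = kmer_frequencies("AACG", 1)  # 4 values, one per nucleotide
dimers = kmer_frequencies("AACG", 2)    # 16 values, one per dinucleotide
print(monomers["A"], dimers["AA"])
```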

References

CPPred: Tong et al. (2019) https://doi.org/10.1093/nar/gkz087

class lncrnapy.algorithms.traditional.DeepCPP(fickett_ref, hexamer_ref, zhang_ref)

Deep neural network for coding potential prediction (DeepCPP).

References

DeepCPP: Zhang et al. (2020) https://doi.org/10.1093/bib/bbaa039

class lncrnapy.algorithms.traditional.FEELnc(kmer_refs)

FlExible Extraction of LncRNAs (FEELnc)

References

FEELnc: Wucher et al. (2017) https://doi.org/10.1093/nar/gkw1306

class lncrnapy.algorithms.traditional.LncADeep(fickett_ref, hexamer_ref, database, **kwargs)

Adaptation of the LncADeep algorithm, a feature-based classifier based on a Deep Belief Network. We replace HMMER with BLASTX, and implement the full-length variant as explained in the paper.

References

LncADeep: Yang et al. (2018) https://doi.org/10.1093/bioinformatics/bty428

class lncrnapy.algorithms.traditional.LncFinder(orf_6mer_ref, acguD_4mer_ref, acguACGU_3mer_ref)

LncFinder algorithm. Utilizes log distance of (ORF) k-mer frequencies, as well as secondary structure elements and EIIP-derived physico-chemical features.

References

LncFinder: Han et al. (2018) https://doi.org/10.1093/bib/bby065

class lncrnapy.algorithms.traditional.PLEK

Predictor of long non-coding RNAs and messenger RNAs based on an improved k-mer scheme (PLEK)

References

PLEK: Li et al. (2014) https://doi.org/10.1186/1471-2105-15-311

feature_extraction(data)

Calls upon the object’s feature extractors if a feature in the used_features attribute is missing in data.

Overridden for the PLEK algorithm so that the PLEK-specific nucleotide frequencies are not required in the input; they are calculated instead.
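For context, the PLEK-specific frequencies can be sketched as weighted k-mer frequencies for k = 1 to 5. The weighting used here (each k-mer frequency multiplied by 1/4^(5-k)) reflects our reading of Li et al. (2014) and should be treated as an assumption; `plek_features` is a hypothetical helper and not the lncrnapy implementation.

```python
from itertools import product


def plek_features(sequence, max_k=5):
    """Weighted frequencies of all ACGT k-mers for k = 1..max_k."""
    sequence = sequence.upper()
    features = {}
    for k in range(1, max_k + 1):
        windows = len(sequence) - k + 1     # number of sliding windows
        weight = 1 / 4 ** (max_k - k)       # assumed PLEK calibration weight
        counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
        for i in range(max(windows, 0)):
            kmer = sequence[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
        for kmer, count in counts.items():
            features[kmer] = weight * count / windows if windows > 0 else 0.0
    return features


features = plek_features("ACGTACGTAC")
print(len(features))  # 4 + 16 + 64 + 256 + 1024 k-mers
```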

class lncrnapy.algorithms.traditional.PLncPro(database, **kwargs)

Plant Long Non-Coding RNA Prediction by Random fOrest (PLncPRO)

References

PLncPro: Singh et al. (2017) https://doi.org/10.1093/nar/gkx866

class lncrnapy.algorithms.traditional.iSeeRNA(database, **kwargs)

Adaptation of the iSeeRNA algorithm. An important difference is that the conservation score feature is replaced by the number of BLASTX hits.

References

iSeeRNA: Sun et al. (2013) https://doi.org/10.1186/1471-2164-14-S2-S7