lncrnapy.algorithms
Algorithm
Contains the Algorithm base class for the classification of RNA transcripts as either protein-coding or long non-coding.
- class lncrnapy.algorithms.algorithm.Algorithm(model, feature_extractors, used_features=None)
Base class for algorithms for the classification of RNA transcripts as either protein-coding or long non-coding.
- `model`
Underlying classification model. Can be a trained torch.nn.Module object with a single, sigmoid-activated output node (an lncrnapy Model is recommended), or a scikit-learn-style model with .fit and .classify methods.
- `feature_extractors`
Feature extractor or list of feature extractors that are applied to the data if any feature in used_features is missing from the input.
- Type:
list
- `used_features`
Specifies which feature names (data columns) serve as input variables for the model. If None, all features produced by feature_extractors are used.
- Type:
list[str]
- feature_extraction(data)
Calls the object’s feature extractors if any feature in the used_features attribute is missing from data.
- fit(data)
Fits model on data, extracting features first if necessary. Will only fit on features as specified in the used_features attribute. This method is disabled when the model attribute is a torch.nn.Module instance.
- predict(data)
Classifies data, extracting features first if necessary. Will only use features as specified in the used_features attribute.
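The control flow described above (extract features only when a used feature is missing, then fit or classify on the used_features columns only) can be sketched without lncrnapy itself. This is a minimal, hypothetical re-creation under stated assumptions: data is a plain dict of columns instead of lncrnapy's data object, labels sit in an assumed 'label' column (1 = coding), and LengthExtractor, ThresholdModel and SketchAlgorithm are illustrative names, not part of the library. The torch.nn.Module path is not covered.

```python
from typing import Dict, List, Optional


class LengthExtractor:
    """Hypothetical feature extractor that adds a 'length' column."""

    name = "length"

    def calculate(self, data: Dict[str, list]) -> Dict[str, list]:
        data["length"] = [len(seq) for seq in data["sequence"]]
        return data


class ThresholdModel:
    """Hypothetical scikit-learn-style model exposing .fit and .classify."""

    def fit(self, x: List[List[float]], y: List[int]) -> None:
        # Place the threshold halfway between the class means of the first feature.
        coding = [row[0] for row, label in zip(x, y) if label == 1]
        noncoding = [row[0] for row, label in zip(x, y) if label == 0]
        self.threshold = (sum(coding) / len(coding) + sum(noncoding) / len(noncoding)) / 2

    def classify(self, x: List[List[float]]) -> List[int]:
        return [1 if row[0] > self.threshold else 0 for row in x]


class SketchAlgorithm:
    """Hypothetical re-creation of the documented Algorithm control flow."""

    def __init__(self, model, feature_extractors, used_features: Optional[List[str]] = None):
        self.model = model
        if not isinstance(feature_extractors, list):
            feature_extractors = [feature_extractors]
        self.feature_extractors = feature_extractors
        if used_features is None:
            # Default: use every feature the extractors produce (assumed .name attribute).
            used_features = [fe.name for fe in feature_extractors]
        self.used_features = used_features

    def feature_extraction(self, data):
        # Run the extractors only when at least one required column is missing.
        if any(name not in data for name in self.used_features):
            for fe in self.feature_extractors:
                data = fe.calculate(data)
        return data

    def _matrix(self, data) -> List[List[float]]:
        # Keep only the used_features columns, one row per transcript.
        columns = [data[name] for name in self.used_features]
        return [list(row) for row in zip(*columns)]

    def fit(self, data) -> None:
        data = self.feature_extraction(data)
        self.model.fit(self._matrix(data), data["label"])

    def predict(self, data) -> List[int]:
        data = self.feature_extraction(data)
        return self.model.classify(self._matrix(data))


algorithm = SketchAlgorithm(ThresholdModel(), LengthExtractor())
algorithm.fit({"sequence": ["ATG" * 100, "ACGU" * 5, "ATG" * 80, "AC" * 10],
               "label": [1, 0, 1, 0]})
print(algorithm.predict({"sequence": ["A" * 200, "A" * 50]}))  # [1, 0]
```

The sketch reruns all extractors whenever any used feature is missing, which keeps it short; a real implementation may be more selective.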
Traditional
Pre-implemented coding/non-coding RNA classifiers.
Note that all classifiers based on algorithms presented in related works should be considered loose adaptations. We do not guarantee that our implementations achieve exactly the same performance as the original works.
- class lncrnapy.algorithms.traditional.CNCI(ant_ref)
Coding Non-Coding Index (CNCI)
References
CNCI: Sun et al. (2013) https://doi.org/10.1093/nar/gkt646
- class lncrnapy.algorithms.traditional.CNIT(ant_ref)
Coding-Non-Coding Identifying Tool (CNIT)
References
CNIT: Guo et al. (2019) https://doi.org/10.1093/nar/gkz400
- class lncrnapy.algorithms.traditional.CPAT(fickett_ref, hexamer_ref)
Coding-Potential Assessment Tool (CPAT).
References
CPAT: Wang et al. (2013) https://doi.org/10.1093/nar/gkt006
- class lncrnapy.algorithms.traditional.CPC(database, **kwargs)
Adaptation of Coding Potential Calculator (CPC). This adaptation differs from the original in two ways: 1) we use ORF length instead of the log-odds score; 2) we do not make use of the ORF integrity feature, as all identified ORFs have a start and a stop codon.
References
CPC: Kong et al. (2007) https://doi.org/10.1093/nar/gkm391
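To illustrate the ORF length feature and why an ORF integrity feature becomes redundant, the sketch below only counts ORFs that run from an ATG to an in-frame stop codon, so every ORF it reports is complete by construction. It is a hypothetical helper, not lncrnapy's ORF finder: it scans the three forward frames only and measures length in nucleotides including the stop codon, both of which are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length (nt, stop codon included) of the longest ATG...stop ORF on the forward strand."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # ORFs without a stop codon are never counted
    return best


print(longest_orf_length("CCATGAAATAGCC"))  # 9
```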
- class lncrnapy.algorithms.traditional.CPC2(fickett_ref)
Adaptation of Coding Potential Calculator version 2 (CPC2). An important difference between this implementation and the original is that we do not make use of the ORF integrity feature, as all identified ORFs have a start and a stop codon.
References
CPC2: Kang et al. (2017) https://doi.org/10.1093/nar/gkx428
- class lncrnapy.algorithms.traditional.CPPred(fickett_ref, hexamer_ref)
Adaptation of Coding Potential Prediction (CPPred). We replace the C (composition) and T (transition) components of the CTD features with monomer and dimer frequencies, respectively.
References
CPPred: Tong et al. (2019) https://doi.org/10.1093/nar/gkz087
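As a rough picture of the substitution described above, monomer frequencies (standing in for Composition) and dimer frequencies (standing in for Transition) can both be computed with one sliding-window counter. This is a hypothetical helper under the assumption of overlapping windows and normalisation by the number of valid k-mers; lncrnapy's own feature extractors may normalise differently.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int) -> dict:
    """Relative frequencies of all 4**k k-mers, counted with a step-1 sliding window."""
    seq = seq.upper().replace("U", "T")
    counts = dict.fromkeys(("".join(p) for p in product("ACGT", repeat=k)), 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip k-mers containing N or other symbols
            counts[word] += 1
            total += 1
    return {word: (count / total if total else 0.0) for word, count in counts.items()}


monomers = kmer_frequencies("ACGUACGA", 1)  # Composition-like features
dimers = kmer_frequencies("ACGUACGA", 2)    # Transition-like features
```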
- class lncrnapy.algorithms.traditional.DeepCPP(fickett_ref, hexamer_ref, zhang_ref)
Deep neural network for coding potential prediction (DeepCPP).
References
DeepCPP: Zhang et al. (2020) https://doi.org/10.1093/bib/bbaa039
- class lncrnapy.algorithms.traditional.FEELnc(kmer_refs)
FlExible Extraction of LncRNAs (FEELnc)
References
FEELnc: Wucher et al. (2017) https://doi.org/10.1093/nar/gkw1306
- class lncrnapy.algorithms.traditional.LncADeep(fickett_ref, hexamer_ref, database, **kwargs)
Adaptation of the LncADeep algorithm, a feature-based classifier based on a Deep Belief Network. We replace HMMER with BLASTX and implement the full-length variant described in the paper.
References
LncADeep: Yang et al. (2018) https://doi.org/10.1093/bioinformatics/bty428
- class lncrnapy.algorithms.traditional.LncFinder(orf_6mer_ref, acguD_4mer_ref, acguACGU_3mer_ref)
LncFinder algorithm. Utilizes log distance of (ORF) k-mer frequencies, as well as secondary structure elements and EIIP-derived physico-chemical features.
References
LncFinder: Han et al. (2018) https://doi.org/10.1093/bib/bby065
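To give an idea of what "EIIP-derived" means, the sketch below encodes each nucleotide by its electron-ion interaction pseudopotential (EIIP) value and measures the period-3 signal that coding regions tend to show. It is an illustration in the spirit of LncFinder, not a copy of its (or lncrnapy's) feature set: the choice of returning signal power at frequency 1/3, mean spectral power and their ratio is an assumption, and the helper name is hypothetical.

```python
import cmath
import math

# EIIP value per nucleotide (Nair and Sreenadhan, 2006); U is treated as T.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_period3_features(seq: str):
    """Return (signal at 1/3, mean power, signal-to-noise ratio) of an EIIP-encoded sequence."""
    seq = seq.upper().replace("U", "T")
    x = [EIIP[base] for base in seq if base in EIIP]
    if not x:
        return 0.0, 0.0, 0.0
    # Fourier coefficient at a frequency of 1/3 cycles per nucleotide.
    coefficient = sum(value * cmath.exp(-2j * math.pi * i / 3) for i, value in enumerate(x))
    signal = abs(coefficient) ** 2
    # By Parseval's theorem, the mean of |X_k|^2 over all N frequencies equals sum(x_i^2).
    mean_power = sum(value * value for value in x)
    return signal, mean_power, signal / mean_power


print(eiip_period3_features("ATG" * 30)[2] > eiip_period3_features("A" * 90)[2])  # True
```

A sequence with a strict three-nucleotide periodicity yields a clearly positive signal, while a homopolymer yields essentially none.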
- class lncrnapy.algorithms.traditional.PLEK
Predictor of long non-coding RNAs and messenger RNAs based on an improved k-mer scheme (PLEK)
References
PLEK: Li et al. (2014) https://doi.org/10.1186/1471-2105-15-311
- feature_extraction(data)
Calls the object’s feature extractors if any feature in the used_features attribute is missing from data.
Overridden for the PLEK algorithm so that precomputed PLEK-specific nucleotide frequencies are not required; they are calculated instead.
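The nucleotide frequencies mentioned above follow PLEK's improved k-mer scheme. Based on our reading of Li et al. (2014), each k-mer count for k = 1..5 is divided by the number of sliding windows (L - k + 1) and multiplied by a weight of 1/4^(5-k). The sketch below is a hypothetical helper under that assumption and is not the lncrnapy implementation; it yields 4 + 16 + 64 + 256 + 1024 = 1364 features.

```python
from itertools import product


def plek_features(seq: str, max_k: int = 5) -> list:
    """Weighted k-mer frequencies for k = 1..max_k, in the style of PLEK's k-mer scheme."""
    seq = seq.upper().replace("U", "T")
    features = []
    for k in range(1, max_k + 1):
        windows = len(seq) - k + 1  # number of k-mers seen by a step-1 sliding window
        weight = 1 / 4 ** (max_k - k)  # shorter k-mers are down-weighted
        counts = dict.fromkeys(("".join(p) for p in product("ACGT", repeat=k)), 0)
        for i in range(max(windows, 0)):
            word = seq[i:i + k]
            if word in counts:  # k-mers containing N or other symbols are skipped
                counts[word] += 1
        for count in counts.values():
            features.append(count / windows * weight if windows > 0 else 0.0)
    return features


print(len(plek_features("ACGT" * 10)))  # 1364
```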
- class lncrnapy.algorithms.traditional.PLncPro(database, **kwargs)
Plant Long Non-Coding RNA Prediction by Random fOrest
References
PLncPro: Singh et al. (2017) https://doi.org/10.1093/nar/gkx866
- class lncrnapy.algorithms.traditional.iSeeRNA(database, **kwargs)
Adaptation of the iSeeRNA algorithm. An important difference is that the conservation score feature is replaced by the number of BLASTX hits.
References
iSeeRNA: Sun et al. (2013) https://doi.org/10.1186/1471-2164-14-S2-S7
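As an illustration of the replacement feature, the number of BLASTX hits per transcript can be derived from BLAST+ tabular output (-outfmt 6, whose first, second and eleventh columns are query ID, subject ID and e-value). This is a hypothetical helper, not lncrnapy code: counting distinct subjects rather than individual HSPs, and the default e-value cutoff of 1e-10, are assumptions.

```python
import csv
import io
from collections import defaultdict


def count_blastx_hits(tabular: str, max_evalue: float = 1e-10) -> dict:
    """Count distinct subject hits per query in BLAST+ -outfmt 6 output."""
    hits = defaultdict(set)
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue <= max_evalue:
            hits[query].add(subject)
    return {query: len(subjects) for query, subjects in hits.items()}


example = (
    "tx1\tP1\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-30\t200\n"
    "tx1\tP1\t80.0\t50\t10\t0\t400\t550\t120\t170\t1e-12\t90\n"
    "tx1\tP2\t70.0\t60\t18\t0\t1\t180\t5\t65\t1e-20\t100\n"
    "tx2\tP3\t40.0\t30\t18\t0\t1\t90\t5\t35\t0.5\t20\n"
)
print(count_blastx_hits(example))  # {'tx1': 2}
```

Transcripts without any hit are absent from the result, so a lookup such as counts.get(transcript_id, 0) gives them a feature value of zero.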