Usage
How-to-use
- Pre-process: scikit-ribo-build.py
- Fit model: scikit-ribo-run.py
Detailed usage
Pre-process:
scikit-ribo-build.py
scikit-ribo-build.py -g gtf-file -f fasta-file -p prefix -r rna-fold-file -t TPM-file -o index-path
required arguments:
-g G Gtf file, required
-f F Fasta file, required
-p P Prefix to use, required
-r R Rnafold file, required
-t T TPM of RNAseq sample, required
-o O Output path of the built indexes, required
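As a rough illustration, the pre-processing call above could be assembled from Python as sketched below. Only the flags come from the usage line; the `build_index_command` helper and every file name are hypothetical placeholders, not part of scikit-ribo. The sketch only builds and prints the argument list, and the actual invocation is left commented out.

```python
import shlex


def build_index_command(gtf, fasta, prefix, rnafold, tpm, out_dir):
    """Assemble the scikit-ribo-build.py argument list (hypothetical helper)."""
    return [
        "scikit-ribo-build.py",
        "-g", gtf,        # GTF annotation
        "-f", fasta,      # FASTA sequence
        "-p", prefix,     # prefix for the generated index files
        "-r", rnafold,    # RNAfold output file
        "-t", tpm,        # TPM file from the RNA-seq sample
        "-o", out_dir,    # where the built indexes are written
    ]


# All paths below are placeholders for illustration only.
build_cmd = build_index_command(
    gtf="genes.gtf",
    fasta="genome.fa",
    prefix="sample",
    rnafold="sample.rnafold_lbox.txt",
    tpm="tpm.txt",
    out_dir="index/",
)
print(" ".join(shlex.quote(arg) for arg in build_cmd))

# To execute the step (requires scikit-ribo on the PATH), one could use:
# import subprocess
# subprocess.run(build_cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when paths contain spaces.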
Fit model:
scikit-ribo-run.py
scikit-ribo-run.py -i bam-file -f index-path -p prefix -o output-path
required arguments:
-i I Input bam file, required
-f F path to the Folder of BED/index files generated by the pre-processing module, required
-p P Prefix for BED/index files, required
-o O Output path, recommend using the sample id, required
optional arguments:
-h, --help show this help message and exit
-q Q minimum mapQ allowed, Default: 20
-s S Shortest read length allowed, Default: 10
-l L Longest read length allowed, Default: 35
-c enable cross validation for glmnet
-r setting this flag will enable the RelE mode
-u U Un-mappable regions
For details on executing the two modules, please refer to the template shell script.
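The model-fitting call, including its optional arguments, might be assembled along the following lines. This is a minimal sketch: the `run_model_command` helper, its parameter names, and the file names are assumptions made for illustration, while the flags are the ones listed above. Note that `-f` is expected to point at the folder written by the pre-processing step and `-p` at the same prefix used there.

```python
def run_model_command(bam, index_dir, prefix, out_dir, min_mapq=None,
                      min_len=None, max_len=None, cross_validate=False,
                      rele=False, unmappable=None):
    """Assemble the scikit-ribo-run.py argument list (hypothetical helper)."""
    cmd = ["scikit-ribo-run.py", "-i", bam, "-f", index_dir,
           "-p", prefix, "-o", out_dir]
    # Optional arguments that take a value are added only when supplied,
    # so the tool's own defaults apply otherwise.
    for flag, value in (("-q", min_mapq), ("-s", min_len),
                        ("-l", max_len), ("-u", unmappable)):
        if value is not None:
            cmd += [flag, str(value)]
    # Boolean switches: cross-validation for glmnet and the RelE mode.
    if cross_validate:
        cmd.append("-c")
    if rele:
        cmd.append("-r")
    return cmd


# Placeholder file names; the index folder and prefix match the build step.
run_cmd = run_model_command("sample.bam", "index/", "sample", "sample_out/",
                            min_mapq=20, cross_validate=True)
print(" ".join(run_cmd))
```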
Preparing Input Data
RNAfold:
call_rnafold.py
python call_rnafold.py -f <prefix>.expandCDS.fasta -r rnafold-binary -p prefix -n num-processes -o output-folder
Resulting file: <output-folder>/<prefix>.rnafold_lbox.txt
Pre-built files can also be downloaded from: https://github.com/hanfang/scikit-ribo/tree/master/data/prebuilt_rnafold
- TPM: Gene-level quantification from Salmon or Kallisto
Output format
- Translation efficiency estimates for the genes: genesTE.csv
gene | log2_TE |
---|---|
YAL001C | -0.7444 |
YAL002W | -0.9811 |
YAL003W | 2.0833 |
... | ... |
- Translation elongation rate for 61 sense codons: codons.csv
codon | codon_dwell_time |
---|---|
AAA | 0.9795 |
AAC | -0.9811 |
... | ... |
TTT | 2.0833 |
- Diagnostic plots of the models
asite_feature_importances.pdf: Feature importance plot for the random forest classifier.
asite_roc.pdf: ROC curve for the random forest classifier.
asite_3offset.pdf: Distribution of A-site by read length and 3’ phase.
asite_5offset.pdf: Distribution of A-site by read length and 5’ phase.
- (optional) riboseq.lambda_cv.pdf: If cross-validation is enabled, the cross-validation curve is plotted.
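The two tabular outputs described above can be post-processed with a few lines of Python. The sketch below assumes that genesTE.csv and codons.csv are comma-separated and carry a header row with the column names shown in the tables; neither detail is confirmed here, so adjust the delimiter and field names if the real files differ. The `read_two_column_csv` helper is hypothetical, and the inline sample text simply reuses the example rows from the tables.

```python
import csv
import io


def read_two_column_csv(handle, key_field, value_field):
    """Map key_field -> float(value_field) for each row (hypothetical helper)."""
    reader = csv.DictReader(handle)
    return {row[key_field]: float(row[value_field]) for row in reader}


# Sample rows copied from the tables above; the comma-separated layout
# with a header line is an assumption.
genes_text = "gene,log2_TE\nYAL001C,-0.7444\nYAL002W,-0.9811\nYAL003W,2.0833\n"
codons_text = "codon,codon_dwell_time\nAAA,0.9795\nAAC,-0.9811\nTTT,2.0833\n"

log2_te = read_two_column_csv(io.StringIO(genes_text), "gene", "log2_TE")

# Undo the log2 transform to obtain translation efficiency on a linear scale.
te = {gene: 2.0 ** value for gene, value in log2_te.items()}

dwell = read_two_column_csv(io.StringIO(codons_text), "codon", "codon_dwell_time")

# Codon with the largest reported dwell-time value in the sample rows.
largest_dwell = max(dwell, key=dwell.get)

for gene in sorted(te):
    print(gene, te[gene])
print("largest dwell-time value:", largest_dwell)
```

With real output, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`, where `path` points at genesTE.csv or codons.csv under the output path given to `-o`.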