Usage

How-to-use

  • Pre-process: scikit-ribo-build.py
  • Fit model: scikit-ribo-run.py

Detailed usage

  • Pre-process: scikit-ribo-build.py:

    scikit-ribo-build.py -g gtf-file -f fasta-file -p prefix -r rna-fold-file -t TPM-file -o index-path
    

required arguments:

-g G        Gtf file, required
-f F        Fasta file, required
-p P        Prefix to use, required
-r R        Rnafold file, required
-t T        TPM of RNAseq sample, required
-o O        Output path of the built indexes, required

  • Fit model: scikit-ribo-run.py:

    scikit-ribo-run.py -i bam-file -f index-path -p prefix -o output-path
    

required arguments:

-i I        Input bam file, required
-f F        path to the Folder of BED/index files generated by the pre-processing module, required
-p P        Prefix for BED/index files, required
-o O        Output path, recommend using the sample id, required

optional arguments:

-h, --help  show this help message and exit
-q Q        minimum mapQ allowed, Default: 20
-s S        Shortest read length allowed, Default: 10
-l L        Longest read length allowed, Default: 35
-c          enable cross validation for glmnet
-r          setting this flag will enable the RelE mode
-u U        Un-mappable regions

For details on executing the two modules, refer to the template shell script.
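The two steps above can be chained from a small driver. The following is a minimal sketch, not part of scikit-ribo: the helper names `build_cmd` and `run_cmd` and all file names are hypothetical, and both scripts are assumed to be on the PATH. It only assembles argument lists from the flags documented above and prints them, without running anything.

```python
import shlex


def build_cmd(gtf, fasta, prefix, rnafold, tpm, index_dir):
    """Assemble the pre-processing command; all six flags are required."""
    return ["scikit-ribo-build.py", "-g", gtf, "-f", fasta, "-p", prefix,
            "-r", rnafold, "-t", tpm, "-o", index_dir]


def run_cmd(bam, index_dir, prefix, out_dir, mapq=None, cross_validate=False):
    """Assemble the model-fitting command; -q and -c are optional."""
    cmd = ["scikit-ribo-run.py", "-i", bam, "-f", index_dir,
           "-p", prefix, "-o", out_dir]
    if mapq is not None:
        cmd += ["-q", str(mapq)]   # minimum mapQ allowed
    if cross_validate:
        cmd.append("-c")           # enable cross validation for glmnet
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    build = build_cmd("genes.gtf", "genome.fasta", "sample",
                      "sample.rnafold_lbox.txt", "sample.tpm.txt", "index")
    fit = run_cmd("sample.bam", "index", "sample", "sample_out",
                  cross_validate=True)
    print(shlex.join(build))
    print(shlex.join(fit))
```

To actually execute the steps, each list can be passed to `subprocess.run(cmd, check=True)`. The index path given to `-o` in the first step is the same path given to `-f` in the second, and the prefix must match between the two.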

Preparing Input Data

  • RNAfold: call_rnafold.py:

    python call_rnafold.py  -f <prefix>.expandCDS.fasta -r rnafold-binary -p prefix -n num-processes -o output-folder
    

Resulting file: <output-folder>/<prefix>.rnafold_lbox.txt

Pre-built files can also be downloaded from: https://github.com/hanfang/scikit-ribo/tree/master/data/prebuilt_rnafold

  • TPM: Gene-level quantification from Salmon or Kallisto

Output format

  • Translation efficiency estimates for the genes: genesTE.csv

    gene     log2_TE
    YAL001C  -0.7444
    YAL002W  -0.9811
    YAL003W  2.0833
    ...      ...

  • Translation elongation rate for 61 sense codons: codons.csv

    codon  codon_dwell_time
    AAA    0.9795
    AAC    -0.9811
    ...    ...
    TTT    2.0833

  • Diagnostic plots of the models
  1. asite_feature_importances.pdf: Feature importance plot for the random forest classifier.
  2. asite_roc.pdf: ROC curve for the random forest classifier.
  3. asite_3offset.pdf: Distribution of A-site by read length and 3’ phase.
  4. asite_5offset.pdf: Distribution of A-site by read length and 5’ phase.
  5. (optional) riboseq.lambda_cv.pdf: If cross-validation is enabled, the cross-validation curve is plotted.
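
The two-column outputs listed above (genesTE.csv and codons.csv) can be loaded for downstream ranking with a few lines of Python. This is a minimal sketch under stated assumptions: the delimiter of the real files is not specified here, so the hypothetical helper `read_two_column` accepts either commas or whitespace, and it assumes exactly one header row. The sample rows are the ones shown above.

```python
import io
import re


def read_two_column(handle):
    """Return {name: value} from a two-column table with one header row.

    Assumption: fields are separated by commas and/or whitespace.
    """
    values = {}
    header_seen = False
    for line in handle:
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if len(fields) != 2:
            continue               # skip blank or malformed lines
        if not header_seen:
            header_seen = True     # first valid row is the header
            continue
        values[fields[0]] = float(fields[1])
    return values


# Sample rows copied from the genesTE.csv excerpt above.
sample = io.StringIO(
    "gene log2_TE\n"
    "YAL001C -0.7444\n"
    "YAL002W -0.9811\n"
    "YAL003W 2.0833\n"
)
te = read_two_column(sample)

# Rank genes from highest to lowest log2 translation efficiency.
ranked = sorted(te, key=te.get, reverse=True)

# Convert from log2 scale to a linear-scale value.
linear_te = {gene: 2 ** value for gene, value in te.items()}
```

For a real run, replace the in-memory sample with `open("genesTE.csv")`; the location of the file relative to the output path is not stated here, so the path is left to the reader. The same helper applies to codons.csv, where the second column is `codon_dwell_time`.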