To upload new genome references (or new genome versions), please follow these instructions:
1 - Download the genome and annotation references for your species of interest from Ensembl (https://ftp.ensembl.org/). To do this:
Select the directory ´´pub/´´
Select the release folder of your choice (e.g., release-110)
Select the ´´gtf/´´ folder to download annotation files, or the ´´fasta/´´ folder to download genome files (the genome files are in the ´´dna/´´ subfolder of each organism).
Select your organism of interest
Download the gtf.gz file for the annotation reference
Download the toplevel.fa.gz file for the genome reference
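The same files can also be fetched from the command line. The sketch below only builds the two download URLs for a given release, species and assembly; the function name ensembl_urls is hypothetical, and it assumes the Ensembl FTP layout seen for the mouse example used in this FAQ (gtf/<organism>/ and fasta/<organism>/dna/). Please check the resulting URLs in a browser for other species.

```shell
#!/usr/bin/env bash
# Sketch only: build Ensembl FTP download URLs for one species and release.
# Assumption: the directory layout matches the mouse example of this FAQ.

# ensembl_urls <release> <Species_name> <assembly>
# Prints the annotation (gtf.gz) URL, then the genome (toplevel.fa.gz) URL.
ensembl_urls() {
  local release="$1" species="$2" assembly="$3"
  local base="https://ftp.ensembl.org/pub/release-${release}"
  local dir
  # Ensembl directory names use the species name in lower case.
  dir="$(printf '%s' "$species" | tr '[:upper:]' '[:lower:]')"
  printf '%s\n' "${base}/gtf/${dir}/${species}.${assembly}.${release}.gtf.gz"
  printf '%s\n' "${base}/fasta/${dir}/dna/${species}.${assembly}.dna.toplevel.fa.gz"
}

# Example (uncomment to download the mouse files, requires curl):
# ensembl_urls 110 Mus_musculus GRCm39 | while read -r url; do curl -fLO "$url"; done
```

Keeping the URL construction separate from the download makes it easy to verify the links before transferring the (large) genome file.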
...
2 - Decompress the gtf and fasta files and rename them as follows:
gtf file - annotation_organism_ercc_sirv_biotyped.gtf
Example: Mus_musculus.GRCm39.110.gtf.gz renamed to annotation_organism_ercc_sirv_biotyped.gtf
fasta file - annotation_organism_ercc_sirv.fa
Example: Mus_musculus.GRCm39.dna.toplevel.fa.gz renamed to annotation_organism_ercc_sirv.fa
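Because the target names no longer end in .gz, the downloaded files have to be decompressed as part of the renaming. A minimal sketch, assuming gunzip is available; the function name prepare_reference is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch only: decompress the Ensembl downloads under the expected names.

# prepare_reference <annotation.gtf.gz> <genome.fa.gz>
prepare_reference() {
  local gtf_gz="$1" fa_gz="$2"
  # gunzip -c writes the decompressed data to stdout and keeps the .gz file.
  gunzip -c "$gtf_gz" > annotation_organism_ercc_sirv_biotyped.gtf || return 1
  gunzip -c "$fa_gz" > annotation_organism_ercc_sirv.fa || return 1
}

# Example with the mouse files of this FAQ:
# prepare_reference Mus_musculus.GRCm39.110.gtf.gz Mus_musculus.GRCm39.dna.toplevel.fa.gz
```

The original .gz files are left untouched, so the step can be repeated if something goes wrong.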
...
3 - Create a gene_description.txt file.
The gene_description.txt file is a tab-separated file that contains two columns: Gene ID and Gene name.
To generate the gene_description.txt file, please use the command presented below. It extracts the Gene ID and Gene name information from the annotation file.
...
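For illustration, one possible way to build such a two-column file with awk is sketched below. This is not necessarily the command used by the platform; it assumes Ensembl-style attributes (gene_id "..."; gene_name "...";) in the ninth GTF column, writes no header line, and falls back to the Gene ID when a gene has no name. The function name make_gene_description is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: extract "Gene ID <TAB> Gene name" from the gene lines of a GTF.
# Assumptions: Ensembl-style attributes, no header line in the output,
# Gene ID reused as name when gene_name is absent.

# make_gene_description <annotation.gtf>   (result is written to stdout)
make_gene_description() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    $0 !~ /^#/ && $3 == "gene" {
      id = ""; name = ""
      n = split($9, attrs, ";")          # one attribute per element
      for (i = 1; i <= n; i++) {
        a = attrs[i]
        sub(/^ +/, "", a)                # trim leading spaces
        if (a ~ /^gene_id "/)   { id = a;   sub(/^gene_id "/, "", id);     sub(/"$/, "", id) }
        if (a ~ /^gene_name "/) { name = a; sub(/^gene_name "/, "", name); sub(/"$/, "", name) }
      }
      if (id != "") print id, (name != "" ? name : id)
    }' "$1"
}

# Example:
# make_gene_description annotation_organism_ercc_sirv_biotyped.gtf > gene_description.txt
```

Only lines whose feature type is gene are read, so each gene appears once in the output.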
4 - Create a .bed file from the gtf file.
To create a .bed file, please download the UCSC tools gtfToGenePred and genePredToBed and use the following command lines:
Code Block |
---|
gtfToGenePred annotation_organism_ercc_sirv_biotyped.gtf annotation_organism_ercc_sirv.genepred |
genePredToBed annotation_organism_ercc_sirv.genepred annotation_organism_ercc_sirv.bed |
...
5 - Generate a final directory containing both genome and annotation files. To do this you need to:
...
First, copy the files:
Code Block |
---|
cp gene_description.txt annotation_organism_ercc_sirv.fa annotation_organism_ercc_sirv_biotyped.gtf annotation_organism_ercc_sirv.bed ./Directory_folder/ |
Then, compress the resulting folder (tar.gz):
Code Block |
---|
tar czfv Directory_folder.tar.gz Directory_folder |
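The copy and compression steps can also be combined into one small script. The sketch below assumes the four files from the previous steps are in the current directory; the function name build_reference_archive is hypothetical. It creates the folder if needed and lists the archive content at the end as a quick check.

```shell
#!/usr/bin/env bash
# Sketch only: gather the four reference files in one folder and compress it.
# Assumption: all four files are present in the current directory.

# build_reference_archive <folder_name>
build_reference_archive() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  cp gene_description.txt \
     annotation_organism_ercc_sirv.fa \
     annotation_organism_ercc_sirv_biotyped.gtf \
     annotation_organism_ercc_sirv.bed \
     "$dir"/ || return 1
  tar -czf "${dir}.tar.gz" "$dir" || return 1
  # List the archive content to confirm that every file was included.
  tar -tzf "${dir}.tar.gz"
}

# Example:
# build_reference_archive Directory_folder
```

If one of the files is missing, cp fails and no archive is written, which avoids uploading an incomplete reference.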
...
6 - Upload the compressed folder (Directory_folder.tar.gz) to Kangooroo as described in the following FAQ: How do I upload my files in the Kangooroo platform?
7 - Tag the reference as a genome file.
By default, all uploaded files have a designated type marked as “File”. To be used as a reference by the pipeline, genome files must be marked as “Genome”. To change the file type, click on the file type icon in the type column to switch between “File” and “Genome”. Once marked as “Genome”, the file can be used as a reference in all your projects. A tutorial video on how to tag your file as a “Genome” type file is available here.