Minimal Steps For LiftOver

From genomewiki
Revision as of 15:26, 10 June 2007

This doc is a stripped down version of /kent/src/hg/doc/liftOver.txt.

The following describes the minimal steps needed to create a liftOver chain file that converts annotations between bacterial genome builds (genome size ~4Mb).

[1] Split the query (NEW) genome build by FASTA record using faSplit:

$ faSplit sequence NEW.build 2 chr

NOTES: The NEW build contains 2 FASTA records, so I passed '2' to faSplit to split the build into two files, chr0.fa and chr1.fa, each containing one FASTA record. Step [3] also needs .lft files that describe the two sequences. If you break the sequences into chunks using the 'size' parameter, just use the -lift option. Otherwise, you need to make your own .lft files. For example, chr0.lft contains:

0       chr 3061531 chr      3061531

where the columns are:

offset old_seq_name old_size new_seq_name new_size
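
When the split files each hold one whole FASTA record, the .lft line can be generated rather than typed by hand. The following is a minimal sketch, not part of the original procedure: the function name make_lft is hypothetical, and it assumes the file contains exactly one record, that the offset is 0, and that the old and new sequence names are the same (as in the chr0.lft example above).

```shell
# make_lft FASTA
# Print one lift-spec line for a FASTA file assumed to hold a single record.
# Column order: offset, old name, old size, new name, new size.
make_lft() {
    awk '
        /^>/ { name = substr($1, 2); next }          # name = first word after ">"
        { gsub(/[ \t\r]/, ""); len += length($0) }   # add up sequence characters
        END { printf "0\t%s\t%d\t%s\t%d\n", name, len, name, len }
    ' "$1"
}
```

A possible use would be `make_lft chr0.fa > chr0.lft`, repeated for chr1.fa.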

Step [6] requires chrom.sizes files that contain the sequence lengths of the FASTA records in each build:

$ cd path_to_OLD_build
$ twoBitInfo OLD.2bit chrom.sizes
$ cd path_to_NEW_build
$ twoBitInfo NEW.2bit chrom.sizes
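
To illustrate what a chrom.sizes file holds (one sequence name and its length per line, tab-separated), the sketch below derives the same two columns directly from a FASTA file. It is only a sanity check under the assumption that the build is also available as FASTA; fasta_sizes is a hypothetical helper name, and twoBitInfo remains the tool used in these steps.

```shell
# fasta_sizes FASTA
# Print "name<TAB>length" for every record in a FASTA file,
# i.e. the same two columns that a chrom.sizes file holds.
fasta_sizes() {
    awk '
        /^>/ { if (name != "") printf "%s\t%d\n", name, len
               name = substr($1, 2); len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (name != "") printf "%s\t%d\n", name, len }
    ' "$1"
}
```

Comparing the sorted output of `fasta_sizes NEW.build` with the sorted chrom.sizes file should show no differences if both describe the same sequences.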

[2] BLAT query sequences from [1] against the OLD build:

$ blat path_to_OLD_build/OLD.2bit path_to_NEW_build/chr0.fa OLD.chr0.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap
$ blat path_to_OLD_build/OLD.2bit path_to_NEW_build/chr1.fa OLD.chr1.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap

NOTES: I did not use an ooc file because the genome build is small. It is probably a good idea to use an ooc file for larger genomes, i.e. > X(?) Mb.
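
With more than a couple of split files, the blat commands in step [2] can be generated in a loop. The sketch below only prints the commands so they can be inspected first; blat_cmds is a hypothetical helper, the options are copied from step [2], and it assumes the split files end in .fa and that paths contain no spaces.

```shell
# blat_cmds OLD_2BIT FASTA...
# Print (do not run) one blat command per split FASTA file, using the
# same options as step [2]. Output names follow the OLD.<name>.psl pattern.
blat_cmds() {
    old2bit=$1
    shift
    for fa in "$@"; do
        name=$(basename "$fa" .fa)
        echo "blat $old2bit $fa OLD.$name.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap"
    done
}
```

For example, `blat_cmds path_to_OLD_build/OLD.2bit path_to_NEW_build/chr*.fa` lists the commands, and piping that output to `sh` would run them.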

[3] Use liftUp to change the coordinate system. Requires .lft files created in step [1]:

$ liftUp -pslQ chr1.psl chr1.lft warn OLD.chr1.psl 
$ liftUp -pslQ chr0.psl chr0.lft warn OLD.chr0.psl 

[4] Chain together alignments from [3] using axtChain:

$ axtChain -linearGap=medium -psl chr0.psl path_to_OLD_build/OLD.2bit path_to_NEW_build/NEW.2bit chr0.chain
$ axtChain -linearGap=medium -psl chr1.psl path_to_OLD_build/OLD.2bit path_to_NEW_build/NEW.2bit chr1.chain

NOTES: Note the -psl argument, which allows axtChain to accept psl as input. I haven't tested this using blastz instead of BLAT. I figure you can convert the lav output from blastz using lavToAxt and then run axtChain without the -psl option.

[5] Combine and sort chain files from [4]:

$ chainMergeSort *.chain | chainSplit chain stdin

NOTES: This creates a directory 'chain' which contains a chr.chain file. The OLD build used here contained only one chromosome. Not sure if axtChain will create one chain file for each target chromosome.
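
One way to see how many per-target files to expect in the 'chain' directory is to list the target names in the chain headers. As far as I know, chainSplit splits on the target sequence by default, so each distinct target name should correspond to one output file. The sketch below relies on the chain header layout (the third field is the target name); chain_targets is a hypothetical helper name.

```shell
# chain_targets CHAIN_FILE...
# List the distinct target sequence names found in chain header lines.
# A header line has the form:
#   chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
chain_targets() {
    awk '$1 == "chain" { print $3 }' "$@" | sort -u
}
```

Running `chain_targets chr0.chain chr1.chain` on the files from step [4] should print a single name when the OLD build has one chromosome.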

[6] Make alignment nets from chains in [5]:

$ cd chain
$ mkdir ../net
$ chainNet chr.chain path_to_OLD_build/chrom.sizes path_to_NEW_build/chrom.sizes ../net/chr.net /dev/null

NOTES: This step requires the chrom.sizes files from step [1].

[7] Create liftOver chain file:

$ mkdir ../over
$ netChainSubset ../net/chr.net chr.chain ../over/chr.chain

[8] Use your new liftOver chain file:

$ liftOver to_be_converted.bed ../over/chr.chain conversions.bed unmapped
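
After running liftOver it is worth checking how many records were converted and how many were not. The sketch below counts records in the two output files; liftover_summary is a hypothetical helper, and it assumes that lines starting with "#" in the unmapped file are reason comments rather than records.

```shell
# liftover_summary MAPPED_BED UNMAPPED
# Count records in the two liftOver output files. Lines starting with "#"
# are treated as comments and skipped (assumption), as are blank lines.
liftover_summary() {
    mapped=$(awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1")
    unmapped=$(awk '!/^#/ && NF { n++ } END { print n + 0 }' "$2")
    echo "mapped=$mapped unmapped=$unmapped"
}
```

For the command above this would be called as `liftover_summary conversions.bed unmapped`. A large unmapped count may suggest that the chain file covers less of the genome than expected.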