Conservation Track QA: Difference between revisions

(Replacing page with 'This page is no longer maintained.')
 
(38 intermediate revisions by 3 users not shown)
== QAing Conservation and Most Conserved tracks for a new assembly ==
This page is no longer maintained.
(using the ponAbe2 (orangutan) assembly as an example throughout)<BR>
 
=== Make a list of tables: ===
* multiz8way
* multiz8wayFrames
* multiz8waySummary
* phastCons8way
* phastConsElements8way
* seq
* extFile
 
=== Make a list of files to go to hgnfs1: ===
* /gbdb/ponAbe2/multiz8way/phastCons8way.wib
* /gbdb/ponAbe2/multiz8way/anno/maf/*
 
=== Make a list of files to go to hgdownload: ===
* /usr/local/apache/htdocs/goldenPath/ponAbe2/phastCons8way/*.*
* /usr/local/apache/htdocs/goldenPath/ponAbe2/phastCons8way/phastConsScores/*
* /usr/local/apache/htdocs/goldenPath/ponAbe2/multiz8way/*.*
* /usr/local/apache/htdocs/goldenPath/ponAbe2/multiz8way/maf/*
 
=== Make a list of all organisms in the Conservation track: ===
{|
| orangutan || Pongo pygmaeus abelii || July 2007 || ponAbe2
|-
| human || Homo sapiens || Mar 2006 || hg18
|-
| chimpanzee || Pan troglodytes || Mar 2006 || panTro2
|-
| rhesus || Macaca mulatta || Jan 2006 || rheMac2
|-
| marmoset || Callithrix jacchus || June 2007 || calJac1
|-
| mouse || Mus musculus || July 2007 || mm9
|-
| opossum || Monodelphis domestica || Jan 2006 || monDom4
|-
| platypus || Ornithorhynchus anatinus || Mar 2007 || ornAna1
|}
 
=== Make a list of all organisms for which there are nets & chains: ===
(put in order of furthest from this species to closest)<BR>
* ornAna1
* monDom4
* mm9
* rheMac2
* panTro2
* hg18
 
 
=== Check the following in the files: ===
==== Check annotated maf files for overlapping blocks: ====
*Note that upstream*.maf files do not need to be checked in this way.
[hgwdev:/gbdb/ponAbe2/multiz8way/anno/maf>
foreach f (*.maf)
  echo -n "${f}: "
  mafFilter -overlap -minRow=1 $f > /dev/null
end
 
''If there are 'rejected blocks', contact the developer.''
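
As a cross-check on the mafFilter run above, the following is a minimal sketch of an independent overlap scan. It is not mafFilter's algorithm: it assumes that the first "s" line after each "a" line is the reference species, that the reference is on the + strand, and that blocks are sorted by start within each sequence. The function name maf_overlaps is hypothetical, and the sketch is written for a POSIX shell rather than tcsh.

```shell
# Report reference rows that start before the previous block on the same
# sequence has ended. Works on an uncompressed MAF file.
# Assumptions (not from mafFilter): reference row is the first "s" line of
# each block, is on the + strand, and blocks are sorted by start.
maf_overlaps() {
  awk '
    /^a/ { want = 1; next }                      # a new alignment block begins
    want && $1 == "s" {
      want = 0                                   # only look at the first row
      src = $2; start = $3 + 0; size = $4 + 0
      if ((src in last) && start < last[src])
        printf "overlap: %s %d < %d\n", src, start, last[src]
      if (!(src in last) || start + size > last[src])
        last[src] = start + size                 # furthest end seen so far
    }
  ' "$1"
}
```

Running maf_overlaps chr1.maf prints one line per suspect block and nothing when no overlaps are found. Any output here is only a hint; mafFilter remains the reference check.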
 
 
==== Read both README files: ====
/goldenPath/ponAbe2/phastCons8way/README.txt<BR>
/goldenPath/ponAbe2/multiz8way/README.txt
 
==== Check upstream files to make sure that the species name doesn't appear in an "s" line: ====
[hgwdev:~/goldenPath/ponAbe2/multiz8way/maf> zcat upstream*.maf.gz | grep "s ponAbe2" | wc -l<BR>
0<BR>
''If this is not zero, contact the developer.''
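
The same check can be wrapped so it is easy to repeat for another assembly. This is a sketch only: count_species_rows is a hypothetical helper, and it is slightly stricter than the grep above because it matches the name field exactly (either the bare database name or the database name followed by a dot).

```shell
# Count "s" rows whose source name is the given database (e.g. ponAbe2)
# or starts with "<db>." in one or more gzipped MAF files.
# Usage: count_species_rows ponAbe2 upstream*.maf.gz
count_species_rows() {
  db=$1; shift
  gzip -dc "$@" | awk -v db="$db" '
    $1 == "s" && ($2 == db || index($2, db ".") == 1) { n++ }
    END { print n + 0 }'                         # print 0 when nothing matched
}
```

As with the grep, anything other than 0 for the assembly being QAed is a reason to contact the developer.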
 
==== Check upstream files to make sure gene names haven't been truncated (to 9 chars): ====
[hgwdev:~/goldenPath/ponAbe2/multiz8way/maf> zcat upstream*.maf.gz | head
##maf version=1 scoring=zero
a score=0.000000
s CG13384-RD_up_1000_chr2L_8383467_f 0 1000 + 1000
''If the gene names are short (9 characters), contact the developer.''
 
To generate a list of gene names and pipe to a file called "test":
zcat upstream1000.maf.gz | grep "NM_.*" | cut -f2 -d" " > test
 
To see how many gene names are 10 or more characters long:
grep -c ".\{10,\}" test
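
A quick way to spot truncation across all genes, not only the NM_ ones, is to tabulate name lengths. The sketch below assumes the gene row is the first "s" line after each "a" line, as in the example output above; gene_name_lengths is a hypothetical helper name.

```shell
# Print "length count" pairs for the first "s" row of every block in one or
# more gzipped upstream MAF files, sorted by length.
# Assumption: the first "s" row of each block carries the gene name.
gene_name_lengths() {
  gzip -dc "$@" | awk '
    /^a/ { want = 1; next }                      # a new alignment block begins
    want && $1 == "s" { want = 0; count[length($2)]++ }
    END { for (len in count) print len, count[len] }' | sort -n
}
```

If every name falls at length 9, the names have probably been truncated.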
 
==== Check upstream files to make sure sequence hasn't been reverse-complemented incorrectly: ====
Reverse-complement is relative to a reference, so be clear about which one: the MAF sequence is supposed to read in the direction of transcription.  For a negative-strand gene, that means it is the reverse complement of the genome sequence.  It should be r-c relative to the genome, not r-c relative to the direction of transcription.
 
From the MAF file documentation:
strand -- Either '+' or '-'. If '-', then the alignment is to the reverse-complemented source.
 
Search the MySQL database for a gene on the minus strand, find that gene in the upstream1000.maf.gz file, and check that it has been reverse-complemented correctly.
 
e.g.
mysql> select name, chrom, strand from ensGene where strand = "-" limit 1\G<BR>
name: ENSORLT00000000020<BR>
chrom: chr1<BR>
strand: -<BR>
 
[hgwdev:~/goldenPath/oryLat2/multiz5way> zcat ensGene.upstream1000.maf.gz | grep "ENSORLT00000000020"<BR>
s ENSORLT00000000020 0 1000 + 1000 gacactgaaggacgtGGACGTTATTTACCAACATCAAAGCACACAAATATAtggcacagaaac [ -clip - ]
 
Compare this sequence with the sequence just upstream of this gene in the browser.
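
To make the comparison less error-prone, the genome (+ strand) sequence copied from the browser can be reverse-complemented on the command line and compared with the MAF row. A minimal sketch; revcomp is a hypothetical helper, and it handles only the letters A, C, G and T in either case (other characters such as N pass through unchanged).

```shell
# Reverse-complement a DNA string given as the first argument.
# Case is preserved, so soft-masked (lowercase) bases stay lowercase.
revcomp() {
  printf '%s\n' "$1" |
    awk '{ out = ""
           for (i = length($0); i > 0; i--) out = out substr($0, i, 1)
           print out }' |
    tr 'ACGTacgt' 'TGCAtgca'
}
```

For a minus-strand gene, the output of revcomp on the + strand genome sequence of the upstream region should match the start of the "s" row in the upstream MAF file.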
 
==== Check one maf file: ====
[hgwdev:~/goldenPath/ponAbe2/multiz8way/maf> zcat chrX.maf.gz  | head
##maf version=1 scoring=autoMZ.v1
a score=21236.000000
s ponAbe2.chrX    2 249 + 156195299 cagtggcatgatcacagatgactgcagcctcggcctccatagc
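
Beyond eyeballing the first few lines, the "s" rows can be given a basic consistency check. This sketch relies on the MAF format definitions (seven fields per "s" line; size equals the number of non-dash characters in the alignment text; start plus size cannot exceed srcSize). maf_check_rows is a hypothetical helper, not a kent utility.

```shell
# Sanity-check every "s" row of a gzipped MAF file.
# Prints a message per bad row and returns non-zero if any were found.
maf_check_rows() {
  gzip -dc "$1" | awk '
    $1 == "s" {
      if (NF != 7) {
        print "line " NR ": expected 7 fields, found " NF; bad++; next
      }
      text = $7
      gsub(/-/, "", text)                        # drop gap characters
      if (length(text) != $4 + 0) {
        print "line " NR ": size " $4 " does not match sequence text"; bad++
      }
      if ($3 + $4 > $6 + 0) {
        print "line " NR ": start+size exceeds srcSize " $6; bad++
      }
    }
    END { exit (bad > 0) }'
}
```

Example: maf_check_rows chrX.maf.gz && echo OK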
 
==== Read through both description pages: ====
* Conservation track:
** Check image that displays on conservation details page.
** Check "Gene tracks used for codon translation" table against make doc.
** Make sure organisms are listed (in all places) in the correct phylogenetic order.
** Make sure that this page includes all the extra sections (if the multizs have been annotated).
** Make sure there is a tree model available.
* Most Conserved track:
** Make sure the text refers to the correct species.
 
==== Check trackDb.ra file: ====
* Conservation track:
** Make sure there is a speciesCodonDefault entry (usually this species).
** Make sure Jim has signed off on the species listed in the speciesDefaultOff entry.
* Most Conserved track:
 
 
 
=== Figure out extFile and seq tables: ===
* If they are standard maf files, there will be no entries in the seq table.
* There may be more than one set of entries in the extFile table.  Make sure you only push the set that pertains to the actual files you are pushing to hgnfs1 (e.g. /gbdb/ponAbe2/multiz8way/anno/maf/*)
* These are the ones that will need pushing to beta:
mysql> select path from extFile where path like "%anno/maf%";
 
You can use this script to copy the rows from dev to beta:
copyExtSeqRows.csh
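
Before pushing, it can help to confirm that every path selected from extFile actually exists on disk. A small sketch; missing_paths is a hypothetical helper that reads one path per line on standard input. How the paths are produced is an assumption: for example, the output of the select statement above saved to a text file with the column header removed.

```shell
# Read file paths (one per line) on stdin and report those not found on disk.
missing_paths() {
  while IFS= read -r p; do
    [ -e "$p" ] || printf 'missing: %s\n' "$p"
  done
}
```

Example: missing_paths < extFilePaths.txt (the file name is illustrative). No output means every listed file is present.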
 
=== Test in the Genome Browser: ===
* Zoom out past 1M bps (this tests the multiz*waySummary table)
* Find example areas of all annotation types (check against the maf file for that location):
** pale yellow bar
** green square brackets
** vertical blue bar
** gaps
* Check out codon translation for a few species.
 
 
=== Test in the tables: ===
* joinerCheck
 
* featureBits
[hgwdev:~/qa/tracks/conservation/ponAbe2>  nice featureBits ponAbe2 multiz8way gap -bed=output.bed
162920397 bases of 3093572278 (5.266%) in intersection
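
When comparing featureBits results between runs or assemblies, the percentage can be recomputed from the raw counts. This sketch assumes the output line has the shape shown above ("N bases of M (P%) in intersection"); featurebits_pct is a hypothetical helper.

```shell
# Recompute the coverage percentage from a featureBits summary line on stdin.
featurebits_pct() {
  awk '/bases of/ { printf "%.3f\n", 100 * $1 / $4 }'
}
```

For the line above, echo "162920397 bases of 3093572278 (5.266%) in intersection" | featurebits_pct prints 5.266.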
 
* countPerChrom.csh ponAbe2 multiz8way
 
* Find out how phastCons was run (from the make doc).  See if the species listed in the --not-informative list make sense.  Species on that list do not add to the phastCons wiggle.
 
[[Category:Browser QA]]

Latest revision as of 19:32, 10 March 2011
