Crispr QA: Difference between revisions
Revision as of 23:33, 31 August 2017
In addition to the checks listed here, be sure to follow the regular new track checklist.
Background
The track consists of regions in the genome that are targetable by the Cas9 enzyme from S. pyogenes. A targetable sequence is any 20bp sequence with an NGG motif (the Protospacer Adjacent Motif, PAM) on its 3' end. The track shows only those found within exons plus some length of flanking sequence (200bp for the regular track, 10kbp for the crispr10K track, etc). Researchers construct RNA complementary to the 20bp guide sequence, complex it with the Cas9 enzyme, and inject the complex into the cell to edit DNA near the guide location. Different microbes use different PAM sequences and different Cas enzymes, so the range of editable sequence varies; this track shows just one example.
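As a small illustration of the definition above, the sketch below tests whether a 23bp sequence (a 20bp guide followed by a 3bp PAM) ends in NGG. The function name has_ngg_pam is hypothetical, and the check assumes the sequence is written 5' to 3' on the strand the guide lies on.

```shell
# Hypothetical helper: succeed if the argument is 20 bases followed by an NGG PAM.
# Assumes the sequence is written 5'->3' on the guide's own strand.
has_ngg_pam() {
    # 20 guide bases + 1 arbitrary base (the N of NGG) = 21, then a literal GG
    printf '%s\n' "$1" | grep -Eiq '^[ACGT]{21}GG$'
}

# Example usage:
# has_ngg_pam GATTACAGATTACAGATTACTGG && echo "looks like a valid target"
```

This can be handy when spot-checking sequences pulled from the bigBed, though negative-strand items need to be reverse-complemented first.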
The following sites contain good histories/explanation:
https://en.wikipedia.org/wiki/CRISPR
http://science.sciencemag.org/content/346/6213/1258096.full
Table/File setup
The Crispr track (and its variations Crispr 10K, CrisprKmers, etc) each consist of 3 tables and 2 files:
MySQL tables:
- crisprRanges (or crispr10KRanges, etc) - a simple bed3 table which contains the regions surveyed for guides
- crisprTargets - a one-row table pointing to a bigBed file (/gbdb/$db/crispr/crispr.bb)
- locusName - table describing each base of a given genome, and whether that base (or sequence of bases) is in an exon, intron, intergenic, etc
GBDB Files:
- /gbdb/$db/crispr/crispr.bb - the meat of the track; this huge file contains all of the 23bp CRISPR/Cas9 guide sequences (20bp guide plus 3bp PAM)
- /gbdb/$db/crispr/crisprDetails.tab - an even larger file describing the off-target locations of each of the guides in the bigBed
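A quick sanity check that both files are in place might look like the sketch below. The function name check_crispr_files and the optional root argument are hypothetical. The paths follow the layout listed above, so variants such as crispr10K may need different file names.

```shell
# Hypothetical helper: verify the two GBDB files for a crispr track exist and are non-empty.
# $1 = database (e.g. mm10); $2 = GBDB root, assumed here to default to /gbdb.
check_crispr_files() {
    db=$1
    root=${2:-/gbdb}
    status=0
    for f in crispr.bb crisprDetails.tab; do
        path="$root/$db/crispr/$f"
        if [ -s "$path" ]; then
            echo "ok: $path"
        else
            echo "MISSING or empty: $path"
            status=1
        fi
    done
    return $status
}

# Example usage:
# check_crispr_files mm10
```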
Special notes about normal QA checks
- crisprRanges and locusName follow the normal track QA checklist.
- These tables should be in tablesIgnored for all.joiner
- countPerChrom and runBits:
  - Compare crisprRanges to the gene track it was made from. They should be somewhat similar.
  - The crisprTargets track is a bigBed, so I like to convert the bigBed to a bed file, run it through bedSingleCover, and then run featureBits on the result against the crisprRanges table. The two should be fairly similar:
    $ featureBits db crisprRanges targets.singleCover.bed
  - Neither track should overlap with the gap track.
  - crisprTargets should not fall outside crisprRanges, since crisprTargets is all of the guides within crisprRanges. See below for how to check.
- all details check is slightly different, see below.
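The bigBed comparison above can be sketched as follows. The awk function single_cover is a hypothetical stand-in for bedSingleCover, assuming that tool's job is to collapse overlapping items so each base is counted once. covered_bases is likewise a hypothetical helper for eyeballing total coverage. The kent commands are shown only as comments, and the file names in them are placeholders.

```shell
# Typical flow (kent utilities, shown as comments; file names are placeholders):
#   bigBedToBed /gbdb/$db/crispr/crispr.bb targets.bed
#   single_cover targets.bed > targets.singleCover.bed
#   featureBits $db crisprRanges targets.singleCover.bed

# Hypothetical stand-in for bedSingleCover: merge overlapping or abutting items per chromosome.
single_cover() {
    LC_ALL=C sort -k1,1 -k2,2n "$1" | awk -F'\t' -v OFS='\t' '
        $1 != chrom || $2 + 0 > e {
            # new chromosome, or a gap before this item: flush the current interval
            if (chrom != "") print chrom, s, e
            chrom = $1; s = $2 + 0; e = $3 + 0
            next
        }
        $3 + 0 > e { e = $3 + 0 }   # extend the current interval
        END { if (chrom != "") print chrom, s, e }'
}

# Hypothetical helper: total bases covered by a (single-coverage) bed file or stdin.
covered_bases() {
    awk -F'\t' '{ n += $3 - $2 } END { print n + 0 }' "$@"
}
```

Comparing the covered_bases total for the targets against the same total for a bed dump of crisprRanges gives a rough cross-check of the featureBits numbers.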
Special QA notes
- To check the coordinates of the bigBed file, first check the coordinates of the crisprRanges table with checkTableCoords. When that comes back OK, make a bed file from crisprRanges and compare it to the crisprTargets bed file with bedtools intersect:
/cluster/bin/bedtools/bedtools intersect -v -a mm10.crispr10K.bed -b mm10.crispr10KRanges.bed
This intersects the two files and outputs only the crisprTargets items that do not overlap crisprRanges (i.e. bad coords).
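Where bedtools is not available, the same "no overlap" test can be approximated in plain awk. The sketch below uses a hypothetical function name, targets_outside, and compares every target against every range on its chromosome, so it is only practical for small samples. It assumes both files are tab-separated and that the ranges file is non-empty.

```shell
# Hypothetical helper mirroring "bedtools intersect -v -a targets.bed -b ranges.bed":
# print targets that overlap no range. Naive all-against-all scan; use on small samples only.
# Assumes the ranges file ($2) is non-empty.
targets_outside() {
    awk -F'\t' '
        NR == FNR {                      # first file read by awk: the ranges
            n[$1]++
            s[$1, n[$1]] = $2 + 0
            e[$1, n[$1]] = $3 + 0
            next
        }
        {                                # second file: the targets
            hit = 0
            for (i = 1; i <= n[$1] + 0; i++)
                if ($2 + 0 < e[$1, i] && $3 + 0 > s[$1, i]) { hit = 1; break }
            if (!hit) print
        }' "$2" "$1"
}

# Example usage:
# targets_outside sampleTargets.bed mm10.crispr10KRanges.bed
```

Like -v on its own, this flags only items with no overlap at all. A guide that hangs partly off the edge of a range still counts as overlapping.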
- The coloring scheme is described in the description page. A few guides of each type should be checked. From a bed file of crisprTargets, you can extract the guides of each type like so:
awk -F '\t' '{if ($9 == "0,200,0") print $0}' crisprTargets.bed > greenItems.bed
$9 is the normal bigBed color column, and in this case it indicates the efficiency of the guide RNA, or how well the guide will actually cut at that location.
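Rather than repeating the awk command once per color, the bed file can be split into one file per color in a single pass. This is a sketch only: the function name split_by_color and the color_R_G_B.bed output naming are hypothetical choices.

```shell
# Hypothetical helper: write each item to a per-color file named color_R_G_B.bed.
# $1 = bed file with the color in column 9; $2 = output directory (defaults to .)
split_by_color() {
    bed=$1
    outdir=${2:-.}
    awk -F'\t' -v dir="$outdir" '{
        color = $9
        gsub(",", "_", color)            # commas are awkward in file names
        out = dir "/color_" color ".bed"
        print > out
    }' "$bed"
}

# Example usage:
# split_by_color crisprTargets.bed colorFiles
```

This keeps one output file open per distinct color, which is fine for the handful of colors the track uses.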
- To get color counts:
for color in $(cat crispr10KColors); do printf "number of %s color: " "$color"; awk -F'\t' '{print $9}' mm10.crispr10K.bed | grep -cxF -- "$color"; done > itemColorCounts
- To mimic countPerChrom measurements:
for chrom in {1..19} X Y M; do echo "chr$chrom"; awk -v chr="$chrom" -F'\t' '{if ($1 == "chr"chr) {sum+=1}} END {print sum+0}' mm10.crispr10K.bed; done > chromCounts
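Both the color counts and the per-chromosome counts are tallies of one column, so a single pass over the file can produce either without a hard-coded color or chromosome list. The function name count_by_column below is hypothetical. Unlike the loops above, it reports only values that actually occur, so a chromosome with no items will be absent rather than shown as 0.

```shell
# Hypothetical helper: count items per distinct value of a given column.
# $1 = tab-separated bed file; $2 = column number (1 = chrom, 9 = color)
count_by_column() {
    awk -F'\t' -v col="$2" '
        { c[$col]++ }
        END { for (k in c) printf "%s\t%d\n", k, c[k] }' "$1" | LC_ALL=C sort
}

# Example usage:
# count_by_column mm10.crispr10K.bed 1 > chromCounts
# count_by_column mm10.crispr10K.bed 9 > itemColorCounts
```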
- Take a few random lines (shuf -n 3) from each of these color files and check every aspect of the bigBed file:
- Check that the scores are correct.
- Check that the mouseOver text in hgTracks is correct.
- Check that the off-target counts match what's displayed in the full list. Sometimes the page may say something like "1 off-target with one mismatch" while the full list shows none.
- Make sure the list of off-targets shows up. The off-target information is stored in an external file, and there have been problems with the indices into it.
- Check a few off-targets from the table. Make sure the sequence displayed matches the sequence in the browser, and check that the locus is correct. Note that this can be confusing when the guide, the off-target, or both are on the negative strand. Some guides have hundreds of off-targets; only 2-3 need to be checked.
- Check the speed of the track and details page. This is a large bigBed, so it's good to note any performance issues.
- This track can cause problems in the Table Browser (TB) and Data Integrator (DI); profiling these errors further will be helpful for future use.
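For the spot checks above, two small helpers can save some typing. Both names (sample_positions and revcomp) are hypothetical. sample_positions assumes a bed file with the strand in column 6 and converts the 0-based bed start to the 1-based position the browser displays. revcomp helps when comparing a negative-strand guide or off-target against the forward-strand sequence.

```shell
# Hypothetical helper: print N random items (default 3) as browser positions plus strand.
# BED starts are 0-based, so 1 is added to the start for display.
sample_positions() {
    shuf -n "${2:-3}" "$1" |
        awk -F'\t' '{ printf "%s:%d-%d\t%s\n", $1, $2 + 1, $3, $6 }'
}

# Hypothetical helper: reverse-complement a DNA sequence.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Example usage:
# sample_positions greenItems.bed 3
# revcomp CCAGTAATCTGTAATCTGTAATC
```

The positions can be pasted straight into the hgTracks position box.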
Push Notes
- when pushing the track to hgwbeta, be sure to have the correct release tags on the trackDb stanzas so hgnfs1 files don't leak out.