HgTablesTest details: Difference between revisions
Latest revision as of 23:17, 18 November 2020
hgTablesTest - what it is actually testing.
For each org/db/group/track/table, it does this:
- For each org/db, it gets a 5MB test region from the middle of the first chrom in the chromInfo table.
- Can filter by -org= -db= or specify number to check -orgs=N -dbs=N
- Can filter by specifying a single -group= -track= -table=
- Can filter by specifying the number to check, -groups=N -tracks=N -tables=N
  Defaults to all groups, and the first 4 tracks and the first 2 tables.
  Since testing just the first 4 tracks all the time does not get much coverage of the rest of the system I have recently added the ability for it to shuffle the track and table lists or not.
  -seed=N - option for reproducibility and debugging.
  -noShuffle - do not shuffle tracks and tables lists.
- Recursively selects the track/table in the hgTables drop-downs.
- Checks with the htmlCheck library all pages fetched by the robot.
- Presses the schema button to bring up the schema page. Because the schema page includes the track description at the bottom, it ends up checking the html description which is located under makeDb/trackDb/ and which gets built into the trackDb.html field. It turns out that it stops at the first error, so actually testing and fixing goes faster by just running the htmlCheck utility directly on the .html files under makeDb/trackDb/. You can quickly find out if the fix worked, and if there are any other errors, without waiting for a whole other build-cycle of 3 weeks.
- Presses the summary/statistics button.
- testAllFields - chooses "all fields from selected table", "get output". Counts the rows returned and keeps as expectedCount for further steps.
- testOneField - chooses "Select Fields from primary and related tables", "get output". It automatically checks the first field found and submits. It compares the rows returned to the expected count.
- If no BED output is available, this is a signal that it cannot limit the output to the 5MB test position, which means that the entire table will be scanned. The table is skipped if over 500K rows, which it checks in the database.
- If BED output is available (and output is limited to 5MB test region), it then proceeds to test these:
  - testOutSequence - chooses "sequence", "get output", fetches, compares output rowCount to expectedCount.
  - testOutBed - chooses "BED", "get output", fetches, compares output rowCount to expectedCount.
  - testOutHyperlink - chooses "hyperlinks", "get output", fetches, compares output <A> tags count to expectedCount.
  - testOutGff - chooses "GTF", "get output", fetches. No other checking. (internally calls everything GFF not GTF)
  - testOutCustomTrack - chooses "custom track", "get output", "CT in Table Browser". Checks that group "user" now exists which is the group where user-created subtracks go. Because previous tests may also create custom tracks, this check is not completely foolproof.
  "CDS FASTA from multiple alignments" output type is NOT tested.
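As a rough illustration of the test-region step, the sketch below computes a 5MB window centered on a chromosome. The formula is an assumption inferred from the screen output shown in Ex2 (chr1:121978212-126978211 for hg38 chr1, which is 248,956,422 bases long), not taken from the hgTablesTest source, and test_region is a hypothetical helper name.

```shell
# Sketch only: derive a 5MB test window centered on a chromosome.
# Assumption: the window is centered on size/2 and printed with a 1-based start.
test_region() {
    chrom=$1
    size=$2
    window=5000000
    mid=$((size / 2))
    start=$((mid - window / 2 + 1))
    end=$((mid + window / 2))
    echo "${chrom}:${start}-${end}"
}

test_region chr1 248956422   # prints chr1:121978212-126978211
```

Chromosomes shorter than the window are not handled in this sketch.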
What it is NOT testing:
- identifiers (names/accessions)
- filter
- intersection
- correlation
But, at the end, just once it does these special tests on uniProt db:
- joining
- filter
- identifier
HGDB_PROF (or HGDB_CONF)
It joins uniProt.taxon and compares the number of rows returned to the table size, which is fetched from the database. If you are connected to the hgwdev database while hitting the hgwbeta URL, the table can differ between the two servers, and hgTablesTest complains about it. You can address this by setting the environment variable HGDB_PROF=someprofile, where someprofile is defined in your .hg.conf file and points to the database you are testing against, which would be hgwbeta. Alternatively, you can point HGDB_CONF to .hg.conf.beta, which points db.* to hgwbeta.
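A minimal sketch of the HGDB_CONF approach, setting the variable for a single command only. It assumes that .hg.conf.beta lives in your home directory and already points db.* to hgwbeta; run_against_beta is a hypothetical wrapper name, not part of the kent tools.

```shell
# Sketch only: run one command with HGDB_CONF pointing at the beta config.
# Assumption: $HOME/.hg.conf.beta exists and its db.* settings point to hgwbeta.
run_against_beta() {
    HGDB_CONF="$HOME/.hg.conf.beta" "$@"
}

# Usage (requires hgTablesTest on your PATH), reusing the command from Ex1:
# run_against_beta hgTablesTest hgwbeta.soe.ucsc.edu/cgi-bin/hgTables -db=hg38 -group=genes -track=knownGene -table=knownGene tempLog
```

The same pattern should work with HGDB_PROF=someprofile in place of HGDB_CONF if you prefer a profile in your regular .hg.conf.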
Hard vs Soft Errors
In the output log, problems are broken down for reporting into hard and soft errors. Most of the errors are soft errors. A hard error occurs if there was an errAbort while fetching, or the page variable is null, or the page->status returned from the hgTables CGI is not 200 OK.
Errors you can ignore
Ex error1
allFields n/a hg38 rep chainSelf chainSelfLink carefulAlloc: Allocated too much memory - more than 500,000,000 bytes (734,348,198)
This error is just saying the track has too many things to access in the Table Browser. In this instance the issue is that this is the self-alignment track, and it is in an area of a lot of repeats, near the centromere, so the track has a lot of items here.
chainSelf errors come in different forms and are often false positives, with no discoverable problem.
Ex error2
summaryStats Mouse mm10 rna intronEst est Error near line 169 of hgwbeta.cse.ucsc.edu/cgi-bin/hgTables:<li>Can\x27t\x20start\x20query\x3A\x3CBR\x3Eselect\x20tStart\x2CtEnd\x2CqName\x LI outside of any of DIR MENU OL UL
This is a known bug (http://redmine.soe.ucsc.edu/issues/19111) with the est table on mm10 where somebody forgot about split-chrom tables. The table 'mm10.est' doesn't exist, since it is split across each chromosome, so the real table names are chr1_est etc.
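To make the split-table naming concrete, here is a small sketch that prints the per-chromosome table names for a split table. The helper name split_table_names and the chromosome list passed to it are illustrative assumptions; only the chr1_est pattern comes from the text above.

```shell
# Sketch only: print per-chromosome names for a split table.
# First argument is the base table name, the rest are chromosome names.
split_table_names() {
    base=$1
    shift
    for chrom in "$@"; do
        echo "${chrom}_${base}"
    done
}

split_table_names est chr1 chr2 chrX
```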
Ex error3
Error near line 163 of hgwbeta.cse.ucsc.edu/cgi-bin/hgTables: </BLOCKQUOTE></TD><TD><TT>varchar(255)</TT></TD> <TD><A HREF="/cgi-bin/hgTables </BLOCKQUOTE> without preceding <BLOCKQUOTE>
This error is actually a data bug (http://redmine.soe.ucsc.edu/issues/18762) -- the stray "</BLOCKQUOTE>" is in the intron column of the tRNAs table.
Ex error4
Example Running hgTablesTest
During the builds, a script called doRobots.csh is run by the Build Meister.
If you see an error in the logs, it can be helpful to rerun the hgTablesTest on that specific item.
Ex1
The following will run a test on the beta site against the hg38 database, selecting the genes group, the knownGene track and the knownGene table, and put the output into a file called tempLog.
hgTablesTest hgwbeta.soe.ucsc.edu/cgi-bin/hgTables -db=hg38 -group=genes -track=knownGene -table=knownGene tempLog
While it is running, important errors should show up, and you can look into the results to see the grand total:
grand total
Total: 31 tests, 0 soft errors, 0 hard errors, 19.07 seconds
Ex2
Here's another example from a real log. Hint: grep -v "0 hard errors" to make results easier to read.
cat /hive/groups/browser/newBuild/kent/src/utils/qa/weeklybld/logs/v407.preview2.hgTables.log | grep -v "0 hard errors" | less
type subtotals
allFields: 62 tests, 0 soft errors, 1 hard errors, 60.15 seconds
schema: 68 tests, 0 soft errors, 2 hard errors, 56.70 seconds
summaryStats: 58 tests, 0 soft errors, 1 hard errors, 59.71 seconds
organism subtotals
n/a: 753 tests, 0 soft errors, 4 hard errors, 629.07 seconds
db subtotals
hg38: 740 tests, 0 soft errors, 4 hard errors, 619.33 seconds
group subtotals
rep: 61 tests, 0 soft errors, 2 hard errors, 101.25 seconds
varRep: 74 tests, 0 soft errors, 2 hard errors, 54.39 seconds
track subtotals
chainSelf: 18 tests, 0 soft errors, 2 hard errors, 68.17 seconds
dbSnp153Composite: 23 tests, 0 soft errors, 2 hard errors, 16.31 seconds
table subtotals
chainSelfLink: 4 tests, 0 soft errors, 2 hard errors, 22.21 seconds
dbSnp153BadCoords: 11 tests, 0 soft errors, 1 hard errors, 7.72 seconds
dbSnp153Mult: 11 tests, 0 soft errors, 1 hard errors, 7.87 seconds
grand total
Total: 753 tests, 0 soft errors, 4 hard errors, 629.07 seconds
From the above you can see that db hg38 had 4 hard errors and that the track chainSelf in group rep was the source of two.
Here we run hg38 rep chainSelf chainSelfLink to recreate the issue:
hgTablesTest hgwbeta.soe.ucsc.edu/cgi-bin/hgTables -db=hg38 -group=rep -track=chainSelf -table=chainSelfLink tempLog2
This gives this screen output:
Running on machine hgwdev
Testing URL hgwbeta.soe.ucsc.edu/cgi-bin/hgTables
Connecting as hgcat@localhost to database server Localhost via UNIX socket
Testing hg38 at position chr1:121978212-126978211
Testing n/a hg38 rep chainSelf chainSelfLink
carefulAlloc: Allocated too much memory - more than 500,000,000 bytes (604,788,212). Exiting.
That "Allocated too much memory - more than 500,000,000 bytes" message is Ex error1 under "Errors you can ignore" on this page.
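One caveat with the grep hint used in this example: the plain pattern "0 hard errors" also matches lines that report 10, 20 and so on hard errors, so those lines would be hidden too. A minimal sketch of a stricter filter, assuming the count is always preceded by a space as in the log lines above (filter_hard_errors is a hypothetical helper name):

```shell
# Sketch only: drop lines that report exactly zero hard errors.
# The leading space in the pattern keeps lines such as "10 hard errors".
filter_hard_errors() {
    grep -v ' 0 hard errors' "$@"
}

# Usage: filter_hard_errors tempLog2 | less
```

Like the original hint, this keeps the subtotal headings and the error detail lines, since only lines containing a zero count are removed.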
Ex3
In Ex2 there are also 2 hard errors on the dbSnp153Composite track.
Here we run hg38 varRep dbSnp153Composite dbSnp153Mult to recreate the issue:
hgTablesTest hgwbeta.soe.ucsc.edu/cgi-bin/hgTables -db=hg38 -group=varRep -track=dbSnp153Composite -table=dbSnp153Mult tempLog3
Looking at tempLog3 we see the error is on the schema:
cat tempLog3 | grep -v "0 hard errors"
schema n/a hg38 varRep dbSnp153Composite dbSnp153Mult Error near line 537 of hgwbeta.soe.ucsc.edu/cgi-bin/hgTables: < 1%).</td> </tr> <tr> <td>refIsSingleton</td> <td class="number">3 Space not allowed between opening bracket < and tag name
Ex4
In Ex2 there are also 2 hard errors on the dbSnp153Composite track.
Here we run hg38 varRep dbSnp153Composite dbSnp153BadCoords to recreate the issue:
hgTablesTest hgwbeta.soe.ucsc.edu/cgi-bin/hgTables -db=hg38 -group=varRep -track=dbSnp153Composite -table=dbSnp153BadCoords tempLog4
Looking at tempLog4 we see the error is on the schema:
cat tempLog4 | grep -v "0 hard errors"
schema n/a hg38 varRep dbSnp153Composite dbSnp153BadCoords Error near line 537 of hgwbeta.soe.ucsc.edu/cgi-bin/hgTables: < 1%).</td> </tr> <tr> <td>refIsSingleton</td> <td class="number">3 Space not allowed between opening bracket < and tag name
For Ex3 and Ex4, the next step would be to let the developer of the dbSnp153Composite track know about the errors on their track description page.