HgTablesTest details
hgTablesTest - what it is actually testing.
For each org/db/group/track/table, it does the following:
- For each org/db, it gets a 5 Mb test region from the middle of the first chrom in the chromInfo table.
- What is tested can be filtered and limited with these options:
  - Filter by organism or database with -org= and -db=, or set how many to check with -orgs=N and -dbs=N.
  - Filter to a single group, track, or table with -group=, -track=, and -table=.
  - Set how many to check with -groups=N, -tracks=N, and -tables=N. The defaults are all groups, the first 4 tracks, and the first 2 tables.
  - Since testing just the first 4 tracks every time does not give much coverage of the rest of the system, I have recently added the ability to shuffle the track and table lists or not:
    - -seed=N sets the random seed, for reproducibility and debugging.
    - -noShuffle turns off shuffling of the track and table lists.
- Recursively selects the track/table in the hgTables drop-downs.
- Checks all pages fetched by the robot with the htmlCheck library.
- Presses the schema button to bring up the schema page. Because the schema page includes the track description at the bottom, this also checks the HTML description, which lives under makeDb/trackDb/ and gets built into the trackDb.html field. Since the check stops at the first error, it is faster to test fixes by running the htmlCheck utility directly on the .html files under makeDb/trackDb/: you can quickly find out whether the fix worked, and whether there are any other errors, without waiting 3 weeks for another build cycle.
- Presses the summary/statistics button.
- testAllFields - chooses "all fields from selected table", "get output". Counts the rows returned and keeps that number as expectedCount for the later steps.
- testOneField - chooses "Select Fields from primary and related tables", "get output". It automatically checks the first field found and submits, then compares the number of rows returned to expectedCount.
- If no BED output is available, it cannot limit the output to the 5 Mb test region, so the entire table would be scanned. In that case the table is skipped if it has more than 500K rows, which it checks in the database.
- If BED output is available (so output is limited to the 5 Mb test region), it then runs these tests:
  - testOutSequence - chooses "sequence", "get output", fetches, and compares the output row count to expectedCount.
  - testOutBed - chooses "BED", "get output", fetches, and compares the output row count to expectedCount.
  - testOutHyperlink - chooses "hyperlinks", "get output", fetches, and compares the number of <A> tags in the output to expectedCount.
  - testOutGff - chooses "GTF", "get output", and fetches. No other checking is done. (Internally the code calls everything GFF, not GTF.)
  - testOutCustomTrack - chooses "custom track", "get output", "CT in Table Browser". Checks that the group "user", where user-created custom tracks go, now exists. Because earlier tests may also have created custom tracks, this check is not completely foolproof.
  - The "CDS FASTA from multiple alignments" output type is NOT tested.
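The selection and gating rules above can be sketched in a few lines. This is a hypothetical Python illustration, not the tool's C code: the function names `pick` and `planned_tests` and the test labels are made up for the example, and it assumes, as the list reads, that the 500K-row limit applies only to tables without BED output.

```python
import random

MAX_SCAN_ROWS = 500_000  # from the text: big tables without BED output are skipped


def pick(items, limit, seed=None, shuffle=True):
    """Return up to `limit` items, optionally after a seeded shuffle.

    shuffle=False mimics -noShuffle (keep list order); a fixed seed mimics
    -seed=N, so the same shuffled order comes back on every run.
    The input list is copied, never modified.
    """
    chosen = list(items)
    if shuffle:
        random.Random(seed).shuffle(chosen)
    return chosen[:limit]


def planned_tests(has_bed_output, table_rows):
    """Hypothetical per-table plan for the field and output tests above."""
    if not has_bed_output and table_rows > MAX_SCAN_ROWS:
        # No way to restrict the query to the test region: skip big tables.
        return []
    tests = ["allFields", "oneField"]
    if has_bed_output:
        # Output is limited to the 5 Mb region, so the output formats are tried too.
        tests += ["outSequence", "outBed", "outHyperlink", "outGff",
                  "outCustomTrack"]
    return tests


if __name__ == "__main__":
    tracks = ["knownGene", "refGene", "rmsk", "chainSelf", "dbSnp153Composite"]
    print(pick(tracks, 4, shuffle=False))  # first 4, in list order
    print(pick(tracks, 4, seed=42))        # same 4 every run for seed 42
    print(planned_tests(False, 734_000))   # empty list: table skipped
```

Seeding a private random.Random instance, rather than the global generator, keeps the shuffled order reproducible no matter what other random calls happen elsewhere.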
What it is NOT testing:
- identifiers (names/accessions)
- filter
- intersection
- correlation
However, at the end it runs these special tests just once, on the uniProt db:
- joining
- filter
- identifier
HGDB_PROF (or HGDB_CONF)
It joins uniProt.taxon and compares the number of rows returned to the table size, which it fetches from the database. If your database connection goes to the hgwdev database while you are testing an hgwbeta URL, those two tables can differ, and hgTablesTest complains about it. You can address this by setting the environment variable HGDB_PROF=someprofile, where someprofile is defined in your .hg.conf file and points to the database you are testing against, which here would be hgwbeta. Alternatively, you can point HGDB_CONF to a .hg.conf.beta file whose db.* settings point to hgwbeta.
Hard vs Soft Errors
In the output log, problems are reported as either hard or soft errors. Most errors are soft errors. A hard error occurs if an errAbort happened while fetching, if the page variable is null, or if the page->status returned by the hgTables CGI is not 200 OK.
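The three hard-error conditions can be written as one predicate. This is a minimal Python sketch: `Page` is a hypothetical stand-in for the fetched page (the real tool is C and checks a page pointer and page->status), and `err_aborted` stands for "an errAbort was caught during the fetch".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical stand-in for a page fetched from hgTables."""
    status: int
    html: str = ""


def is_hard_error(err_aborted: bool, page: Optional[Page]) -> bool:
    """True if any of the documented hard-error conditions holds."""
    if err_aborted:              # errAbort while fetching
        return True
    if page is None:             # page variable is null
        return True
    return page.status != 200    # hgTables did not return 200 OK
```

By the description above, any other reported problem would count as a soft error.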
Errors you can ignore
Ex error1
allFields n/a hg38 rep chainSelf chainSelfLink carefulAlloc: Allocated too much memory - more than 500,000,000 bytes (734,348,198)
This error just means the track has too many items in the test region for the Table Browser to handle within its memory limit. In this instance the track is the self-alignment track, and the test region is in a repeat-rich area near the centromere, so the track has a very large number of items there.
chainSelf errors come in different forms and are often false positives, with no discoverable problem.
Ex error2
summaryStats Mouse mm10 rna intronEst est Error near line 169 of hgwbeta.cse.ucsc.edu/cgi-bin/hgTables:<li>Can\x27t\x20start\x20query\x3A\x3CBR\x3Eselect\x20tStart\x2CtEnd\x2CqName\x LI outside of any of DIR MENU OL UL
This is a known bug with the est table on mm10, where somebody forgot about split-chrom tables. The table mm10.est doesn't exist because est is split by chromosome, so the real table names are chr1_est and so on.
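The message in Ex error2 is hard to read because its characters were written as \xNN hex escapes. A small Python helper (hypothetical, not part of the kent tools) turns them back into text; it leaves a truncated escape, such as the bare \x at the end of that log line, as-is.

```python
import re

# A backslash, an "x", then exactly two hex digits, e.g. \x27 for an apostrophe.
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def decode_hex_escapes(text: str) -> str:
    """Replace every complete \\xNN escape with the character it encodes."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    logged = r"Can\x27t\x20start\x20query\x3A\x3CBR\x3Eselect\x20tStart\x2CtEnd\x2CqName\x"
    print(decode_hex_escapes(logged))
```

Applied to the Ex error2 message it prints Can't start query:<BR>select tStart,tEnd,qName\x, which shows the query that failed.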
Ex error3
Error near line 163 of hgwbeta.cse.ucsc.edu/cgi-bin/hgTables: </BLOCKQUOTE></TD><TD><TT>varchar(255)</TT></TD> <TD><A HREF="/cgi-bin/hgTables </BLOCKQUOTE> without preceding <BLOCKQUOTE>
This error is actually a data bug: the stray "</BLOCKQUOTE>" is in the intron column of the tRNAs table.
Ex error4
Examples of running hgTablesTest
During the builds, the Build Meister runs the robots, including hgTablesTest, through a script called doRobots.csh.
If you see an error in the logs, it can help to rerun hgTablesTest on just that item.
Ex1
The following runs a test against the hg38 database on the beta site, selecting the genes group and the knownGene track and table, and writes the output to a file called tempLog.
hgTablesTest hgwbeta.soe.ucsc.edu/cgi-bin/hgTables -db=hg38 -group=genes -track=knownGene -table=knownGene tempLog
While it runs, important errors are shown on screen, and afterwards you can look in the results file for the grand total:
grand total
Total: 31 tests, 0 soft errors, 0 hard errors, 19.07 seconds
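If you rerun single items often, the command line in Ex1 can be built programmatically. This Python sketch is a hypothetical convenience wrapper, not part of the kent tools: `rerun_command` only assembles the arguments shown in Ex1 and, optionally, sets HGDB_CONF (described above) so that the tool's own database lookups go to the server behind the URL being tested. The ~/.hg.conf.beta path is an example.

```python
import os

BETA_URL = "hgwbeta.soe.ucsc.edu/cgi-bin/hgTables"


def rerun_command(db, group, track, table, log_file, url=BETA_URL, hg_conf=None):
    """Return (argv, env) for rerunning hgTablesTest on a single table."""
    argv = ["hgTablesTest", url, f"-db={db}", f"-group={group}",
            f"-track={track}", f"-table={table}", log_file]
    env = dict(os.environ)
    if hg_conf is not None:
        # Point the tool's own database queries at the matching server.
        env["HGDB_CONF"] = hg_conf
    return argv, env


if __name__ == "__main__":
    argv, env = rerun_command("hg38", "genes", "knownGene", "knownGene", "tempLog",
                              hg_conf=os.path.expanduser("~/.hg.conf.beta"))
    print(" ".join(argv))
```

To actually run it, pass the result to subprocess.run(argv, env=env).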
Ex2
Here's another example from a real log. Hint: use grep -v "0 hard errors" to filter out the lines without hard errors and make the results easier to read.
cat /hive/groups/browser/newBuild/kent/src/utils/qa/weeklybld/logs/v407.preview2.hgTables.log | grep -v "0 hard errors" | less
type subtotals
allFields: 62 tests, 0 soft errors, 1 hard errors, 60.15 seconds
schema: 68 tests, 0 soft errors, 2 hard errors, 56.70 seconds
summaryStats: 58 tests, 0 soft errors, 1 hard errors, 59.71 seconds
organism subtotals
n/a: 753 tests, 0 soft errors, 4 hard errors, 629.07 seconds
db subtotals
hg38: 740 tests, 0 soft errors, 4 hard errors, 619.33 seconds
group subtotals
rep: 61 tests, 0 soft errors, 2 hard errors, 101.25 seconds
varRep: 74 tests, 0 soft errors, 2 hard errors, 54.39 seconds
track subtotals
chainSelf: 18 tests, 0 soft errors, 2 hard errors, 68.17 seconds
dbSnp153Composite: 23 tests, 0 soft errors, 2 hard errors, 16.31 seconds
table subtotals
chainSelfLink: 4 tests, 0 soft errors, 2 hard errors, 22.21 seconds
dbSnp153BadCoords: 11 tests, 0 soft errors, 1 hard errors, 7.72 seconds
dbSnp153Mult: 11 tests, 0 soft errors, 1 hard errors, 7.87 seconds
grand total
Total: 753 tests, 0 soft errors, 4 hard errors, 629.07 seconds
From the above you can see that db hg38 had 4 hard errors, and that the track chainSelf in group rep was the source of two of them.
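A small parser can make the same check as the grep hint, but on the numbers rather than the text. One difference worth knowing: grep -v "0 hard errors" also hides lines with 10, 20, or 100 hard errors, because those contain the same substring; comparing the parsed count avoids that. This is a hypothetical Python sketch that assumes every entry has the "name: N tests, N soft errors, N hard errors, N seconds" shape seen in the log above.

```python
import re
from typing import List, Tuple

# One subtotal entry, e.g.
# "chainSelf: 18 tests, 0 soft errors, 2 hard errors, 68.17 seconds"
_ENTRY = re.compile(
    r"(?P<name>[\w./-]+): (?P<tests>\d+) tests, (?P<soft>\d+) soft errors, "
    r"(?P<hard>\d+) hard errors, (?P<secs>[\d.]+) seconds")


def hard_error_entries(log_text: str) -> List[Tuple[str, int]]:
    """Return (name, hard error count) for each entry with at least one hard error."""
    return [(m["name"], int(m["hard"]))
            for m in _ENTRY.finditer(log_text)
            if int(m["hard"]) > 0]


if __name__ == "__main__":
    sample = ("track subtotals chainSelf: 18 tests, 0 soft errors, 2 hard errors, "
              "68.17 seconds dbSnp153Composite: 23 tests, 0 soft errors, "
              "2 hard errors, 16.31 seconds")
    for name, hard in hard_error_entries(sample):
        print(f"{name}: {hard} hard errors")
```

Section headings such as "track subtotals" are simply skipped, since they do not match the entry pattern.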
Here we run hg38 rep chainSelf chainSelfLink to recreate the issue:
hgTablesTest hgwbeta.soe.ucsc.edu/cgi-bin/hgTables -db=hg38 -group=rep -track=chainSelf -table=chainSelfLink tempLog2
This gives this screen output:
Running on machine hgwdev
Testing URL hgwbeta.soe.ucsc.edu/cgi-bin/hgTables
Connecting as hgcat@localhost to database server Localhost via UNIX socket
Testing hg38 at position chr1:121978212-126978211
Testing n/a hg38 rep chainSelf chainSelfLink carefulAlloc: Allocated too much memory - more than 500,000,000 bytes (604,788,212). Exiting.
That "Allocated too much memory - more than 500,000,000 bytes" message is the same error as Ex error1 under Errors you can ignore on this page.
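The "Testing hg38 at position chr1:121978212-126978211" line in the screen output shows the 5 Mb test region taken from the middle of the first chromosome. Assuming the window is centred on the midpoint of the chromInfo size, and using the hg38 chr1 length of 248,956,422 bases, this Python sketch reproduces that position. The function name is hypothetical, and the clamping for chromosomes shorter than 5 Mb is an assumption of this sketch; the tool may handle them differently.

```python
HALF_WIDTH = 2_500_000  # half of the 5 Mb test region, in bases


def middle_region(chrom: str, chrom_size: int) -> str:
    """Return a position string for a 5 Mb window centred on the chromosome.

    start/end are computed as 0-based, half-open coordinates and printed the
    way browser positions are written: 1-based start, inclusive end.
    """
    mid = chrom_size // 2
    start = max(0, mid - HALF_WIDTH)
    end = min(chrom_size, mid + HALF_WIDTH)
    return f"{chrom}:{start + 1}-{end}"


if __name__ == "__main__":
    print(middle_region("chr1", 248_956_422))  # chr1:121978212-126978211
```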
Ex3
In Ex2 there are also 2 hard errors on the dbSnp153Composite track.
Here we run hg38 varRep dbSnp153Composite dbSnp153Mult to recreate the issue:
hgTablesTest hgwbeta.soe.ucsc.edu/cgi-bin/hgTables -db=hg38 -group=varRep -track=dbSnp153Composite -table=dbSnp153Mult tempLog3
Looking at tempLog3, we see the error is on the schema page:
cat tempLog3 | grep -v "0 hard errors"
schema n/a hg38 varRep dbSnp153Composite dbSnp153Mult Error near line 537 of hgwbeta.soe.ucsc.edu/cgi-bin/hgTables: < 1%).</td> </tr> <tr> <td>refIsSingleton</td> <td class="number">3 Space not allowed between opening bracket < and tag name
Ex4
The other dbSnp153Composite hard error in Ex2 is on the dbSnp153BadCoords table.
Here we run hg38 varRep dbSnp153Composite dbSnp153BadCoords to recreate the issue:
hgTablesTest hgwbeta.soe.ucsc.edu/cgi-bin/hgTables -db=hg38 -group=varRep -track=dbSnp153Composite -table=dbSnp153BadCoords tempLog4
Looking at tempLog4, we see the same error on the schema page:
cat tempLog4 | grep -v "0 hard errors"
schema n/a hg38 varRep dbSnp153Composite dbSnp153BadCoords Error near line 537 of hgwbeta.soe.ucsc.edu/cgi-bin/hgTables: < 1%).</td> </tr> <tr> <td>refIsSingleton</td> <td class="number">3 Space not allowed between opening bracket < and tag name
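For Ex3 and Ex4, the next step is to let the developer of the dbSnp153Composite track know about the errors in its track description page. Here htmlCheck objects to the literal "< 1%" in the description; writing it as "&lt; 1%" would avoid the error.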