Browser Mirrors

From genomewiki

Please note: the information on this page may still be useful, but it is becoming dated. The currently recommended UCSC procedures can be found in the scripts directory of the source tree.

To make full use of scripts such as these, you should be familiar with shell programming and understand what the scripts do, so that you can customize them for your particular installation. They will not work unmodified out of the box.

This page contains information for users interested in mirroring the UCSC Genome Browser on their own servers. See also http://genome.ucsc.edu/mirror.html

Browser_Installation contains the complete installation instructions; Minimal_Browser_Installation also contains some useful information.

Partial Mirrors

A complete mirror of all assemblies requires a large amount of disk space (currently on the order of a terabyte). However, it is not difficult to set things up so that only a subset of the assemblies is mirrored. The following scripts and auxiliary files are used for this purpose at Cornell (http://genome-mirror.bscb.cornell.edu).

  • doDownloads.sh - Script for downloading selected files from hgdownload.soe.ucsc.edu
  • doUpdateDb.sh - Script for updating local MySQL databases with downloaded files.
  • databases - File identifying which databases to mirror.
  • gbdb.exclude - File identifying directories to be excluded when rsyncing /gbdb.
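The Cornell scripts themselves are not reproduced on this page. As a rough illustration of how the databases and gbdb.exclude files can drive a partial mirror, the sketch below prints (but does not run) the rsync commands such a download step might issue: one transfer of all of /gbdb filtered by gbdb.exclude, and one transfer per listed database of its MySQL table files into a staging directory. The databases file format (one name per line, # comments), the upstream rsync modules, and the staging path STAGE are assumptions, not details taken from doDownloads.sh.

```bash
#!/bin/bash
# Hypothetical sketch of a partial-mirror download plan (NOT Cornell's
# doDownloads.sh). It only prints the rsync commands it would run.
set -euo pipefail

UPSTREAM=rsync://hgdownload.soe.ucsc.edu              # assumed rsync host
STAGE=${STAGE:-/usr/data/mirror-download/mysql}       # hypothetical staging dir

# Print database names from a "databases" file. Assumed format: one name
# per line, "#" starts a comment, blank lines are ignored.
list_databases() {
    sed -e 's/#.*//' -e 's/[[:space:]]//g' "$1" | grep -v '^$' || true
}

# One transfer for all of /gbdb, filtered by gbdb.exclude and without
# --delete (so local files survive), then one per database for its
# MySQL table files.
plan_downloads() {
    local dbfile=$1 exclude=$2 db
    echo "rsync -av --exclude-from=$exclude $UPSTREAM/gbdb/ /gbdb/"
    list_databases "$dbfile" | while IFS= read -r db; do
        echo "rsync -av $UPSTREAM/mysql/$db/ $STAGE/$db/"
    done
}
```

Printing rather than executing makes the plan easy to review; in a real script the echo lines would become the rsync invocations themselves.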

doDownloads.sh and doUpdateDb.sh are run nightly via cron, with the following crontab entry:

0 0 * * * /usr/data/mirror-download/doAll.sh 

where doAll.sh is a simple wrapper for doDownloads.sh and doUpdateDb.sh:

#!/bin/bash -e

# do downloads and updates
# for use with cron

echo "#####################################"
/usr/data/mirror-download/doDownloads.sh

echo "#####################################"
/usr/data/mirror-download/doUpdateDb.sh

echo "#####################################"
echo "Successfully updated mirror."
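The doAll.sh wrapper above runs the two steps in sequence and, because of -e, stops at the first failure. One possible refinement, sketched here as a hypothetical variant rather than the Cornell script, adds timestamps for the cron log and uses flock(1) from util-linux so that a run lasting longer than a day cannot overlap the next one. MIRROR_DIR and LOCK are assumed variables, with MIRROR_DIR defaulting to the path used above.

```bash
#!/bin/bash
# Hypothetical doAll.sh variant (not the Cornell script): same two steps,
# plus timestamps for the cron log and a lock against overlapping runs.
set -euo pipefail

MIRROR_DIR=${MIRROR_DIR:-/usr/data/mirror-download}   # assumed variable
LOCK=${LOCK:-/tmp/mirror-update.lock}                 # hypothetical lock file

stamp() { date '+%Y-%m-%d %H:%M:%S'; }

run_mirror_update() {
    (
        # Take an exclusive lock on fd 9; give up at once if it is held.
        if ! flock -n 9; then
            echo "$(stamp) previous update still running; skipping" >&2
            exit 1
        fi
        echo "$(stamp) starting downloads"
        "$MIRROR_DIR/doDownloads.sh"
        echo "$(stamp) starting database updates"
        "$MIRROR_DIR/doUpdateDb.sh"
        echo "$(stamp) Successfully updated mirror."
    ) 9> "$LOCK"   # the lock is released when the subshell exits
}
```

To use it as doAll.sh, call run_mirror_update at the end of the file. The cron entry can then append all output to a log (the log file name here is hypothetical):

0 0 * * * /usr/data/mirror-download/doAll.sh >> /usr/data/mirror-download/mirror.log 2>&1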

To mirror a new database, simply add its name (e.g., galGal3) to the databases file and, if necessary, delete it from gbdb.exclude. Similarly, to stop mirroring a database, delete its entry from databases and add it to gbdb.exclude. (If desired, you will need to delete the current files and database manually.) Note that doDownloads.sh and doUpdateDb.sh can be run manually, either on all databases or only on selected ones (see the -d option). The dry-run (-n) option is handy for seeing what they will do without actually making any changes.
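Adding or dropping a database comes down to editing two plain-text files. Assuming both files hold one entry per line and that a database is excluded from /gbdb by a line containing just its name (neither format is documented on this page), these hypothetical helpers make the edits idempotent, so running one twice has no further effect. They do not delete existing files or databases; as noted above, that remains a manual step.

```bash
#!/bin/bash
# Hypothetical helpers for starting/stopping mirroring of one database.
# Assumes "databases" and "gbdb.exclude" each hold one entry per line.
set -euo pipefail

MIRROR_DIR=${MIRROR_DIR:-/usr/data/mirror-download}

# Append an entry to a file unless an identical line is already there.
add_entry() {
    local file=$1 entry=$2
    grep -qxF -- "$entry" "$file" 2>/dev/null || printf '%s\n' "$entry" >> "$file"
}

# Delete every line exactly equal to the entry; a missing file is left alone.
remove_entry() {
    local file=$1 entry=$2 tmp
    [ -f "$file" ] || return 0
    tmp=$(mktemp)
    grep -vxF -- "$entry" "$file" > "$tmp" || true   # grep exits 1 if no lines remain
    cat "$tmp" > "$file"                              # overwrite in place, keeping permissions
    rm -f "$tmp"
}

start_mirroring() {
    add_entry "$MIRROR_DIR/databases" "$1"
    remove_entry "$MIRROR_DIR/gbdb.exclude" "$1"
}

stop_mirroring() {
    remove_entry "$MIRROR_DIR/databases" "$1"
    add_entry "$MIRROR_DIR/gbdb.exclude" "$1"
}
```

For example, start_mirroring galGal3 adds galGal3 to databases and removes it from gbdb.exclude; running the download script with -n afterward shows what the next update would fetch.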

These programs are a work in progress. One minor problem is that they contain hardcoded paths to /usr/data/mirror-download, which is the working directory on our system; this is easy to change. Another issue is that we want to allow local tracks in addition to mirrored tracks. As a result, we cannot use the --delete option when we rsync files to /gbdb, and we cannot simply overwrite the hgcentral tables with the hgcentral.sql dump provided on hgdownload (see the comments in the scripts). Relatedly, updates to the trackDb tables on hgdownload currently overwrite our local versions of these tables, and we have to regenerate them from our local trackDb.ra files each time this happens. Addressing these problems will require some programming so that updated data from hgdownload can be merged with our own local data.

Users who do not need to maintain their own local files need not worry about these issues, but they may want to edit the scripts to use the --delete option with rsync and to download and load hgcentral.sql.
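For a mirror with no local tracks, the two changes suggested above might look like the following sketch: rsync each mirrored /gbdb directory with --delete, so files removed upstream are also removed locally, and reload the hgcentral database from UCSC's hgcentral.sql dump. The rsync module and the hgcentral.sql URL are assumptions to check against http://genome.ucsc.edu/mirror.html, the temporary file path is arbitrary, and DRY_RUN is a hypothetical switch that prints the commands instead of running them.

```bash
#!/bin/bash
# Sketch for a mirror with NO local tracks: /gbdb/<db> becomes an exact
# copy of upstream, and hgcentral is reloaded from UCSC's dump.
set -euo pipefail

UPSTREAM="rsync://hgdownload.soe.ucsc.edu/gbdb"                     # assumed module
HGCENTRAL_URL="http://hgdownload.soe.ucsc.edu/admin/hgcentral.sql"  # assumed URL
DRY_RUN="${DRY_RUN:-0}"                                             # hypothetical switch

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# --delete removes local files that no longer exist upstream; this is only
# safe when nothing local lives under /gbdb/$db.
sync_gbdb() {
    local db=$1
    run rsync -av --delete "$UPSTREAM/$db/" "/gbdb/$db/"
}

# Overwrites the local hgcentral tables with the upstream copy.
reload_hgcentral() {
    local sql=${1:-/tmp/hgcentral.sql}
    run wget -q -O "$sql" "$HGCENTRAL_URL"
    run mysql hgcentral -e "source $sql"
}
```

Syncing per database, rather than all of /gbdb at once, keeps --delete from touching the directories of assemblies you do not mirror.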