Minimal Browser Installation: Difference between revisions

Latest revision as of 06:48, 1 September 2018

NOTE: This page is not necessarily maintained with the most up-to-date information. The information in our mirror instructions is current.

Minimal Browser Installation

A browser installation usually needs only a subset of the data in the UCSC Genome Browser, typically all data for a selection of genomes. A mirror with a subset of genomes is often called a "partial mirror". Rather than rsyncing everything for some genomes, as documented in the Mirror Instructions or on this wiki (Browser installation), you may want only a subset of the data from one genome, or you may want to create a genome browser for an entirely new genome. This is more work but possible. We call this a "Minimal Browser" below. This page describes a minimal browser that displays only the genome sequence and the gaps. It is admittedly not a very useful genome browser, but it will get you started.

A license is required for commercial download and/or installation of the Genome Browser binaries and source code. No license is needed for academic, nonprofit, and personal use. To purchase a license, see our License Instructions.

See the full discussion in the README.* files, and the scripts that assist with these browser installation procedures, in the source tree directories src/product/ and src/product/scripts/

A minimal browser database needs six tables:

grp
chromInfo
trackDb
hgFindSpec
gold
gap
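The list above can be turned into a quick check. The following is a minimal sketch, not part of the UCSC tooling: the function name missing_tables and the database name myGenome are hypothetical, and the usage line assumes hgsql is configured and passes the -N flag (suppress column names) through to the MySQL client.

```shell
# The six tables named in the list above.
REQUIRED_TABLES="grp chromInfo trackDb hgFindSpec gold gap"

# Reads table names on stdin, one per line, and prints every
# required table that does not appear in that input.
missing_tables() {
    present=$(cat)
    for t in $REQUIRED_TABLES; do
        printf '%s\n' "$present" | grep -qxF "$t" || printf '%s\n' "$t"
    done
}

# Usage (myGenome is a placeholder database name):
#   hgsql -N -e "show tables" myGenome | missing_tables
```

Empty output means all six tables are present.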

The gateway page needs the hgcentral database to function. The hgcentral database can be copied directly from the MySQL data files on the FTP server ftp://hgdownload.soe.ucsc.edu/mysql/hgcentral or loaded from the SQL text file at http://hgdownload.soe.ucsc.edu/admin/hgcentral.sql

Enter a defaultGenome=<your species> specification in your /cgi-bin/hg.conf file. See notes in the src/product/ex.hg.conf file for this option.
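One way to set this option without editing the file by hand is sketched below. The helper set_default_genome is hypothetical, and it assumes hg.conf holds one key=value setting per line; check src/product/ex.hg.conf for the values your installation expects.

```shell
# Sets defaultGenome in an hg.conf file: removes any existing
# defaultGenome= line and appends the new value at the end.
#   $1 = path to hg.conf, $2 = genome name
set_default_genome() {
    conf=$1
    genome=$2
    [ -f "$conf" ] || return 1
    tmp="$conf.tmp.$$"
    # grep exits non-zero when no lines remain; that is not an error here.
    grep -v '^defaultGenome=' "$conf" > "$tmp" || true
    printf 'defaultGenome=%s\n' "$genome" >> "$tmp"
    mv "$tmp" "$conf"
}

# Usage (path and genome name are placeholders):
#   set_default_genome path/to/cgi-bin/hg.conf "My species"
```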

For the /gbdb/ data area, at a minimum you will need the .2bit file or the nib files for the assembly. This is either:

/gbdb/<database>/<database>.2bit
or for older genome assemblies:
/gbdb/<database>/nib/*.nib
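To see which of the two layouts an assembly has on disk, a small check along these lines may help. This is a sketch under the path conventions shown above; the function name find_sequence is hypothetical.

```shell
# Prints the location of the sequence data for one assembly:
# the .2bit file if present, otherwise the nib directory if it
# contains at least one .nib file. Returns 1 if neither exists.
#   $1 = gbdb root (for example /gbdb), $2 = database name
find_sequence() {
    root=$1
    db=$2
    if [ -f "$root/$db/$db.2bit" ]; then
        printf '%s\n' "$root/$db/$db.2bit"
    elif ls "$root/$db/nib/"*.nib >/dev/null 2>&1; then
        printf '%s\n' "$root/$db/nib"
    else
        return 1
    fi
}

# Usage (myGenome is a placeholder database name):
#   find_sequence /gbdb myGenome || echo "no sequence files found"
```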

Various tracks use other files in this directory. You need those files only for the tracks you intend to display.

For the genbank sequences, you can check the gbExtFile table for your database to see exactly which files are used by that assembly in /gbdb/genbank/
Extract the "path" column from that table and use that list in a --files-from specification for your rsync.
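The extraction step can be sketched as follows. This is an illustration only: the function name paths_to_files_from is hypothetical, and it assumes the "path" values begin with /gbdb/genbank/, which you should confirm against your own gbExtFile table. The rsync module path in the usage comment is likewise an assumption.

```shell
# Reads file paths on stdin, keeps those under the given prefix,
# strips the prefix, and prints a sorted, de-duplicated list that
# is suitable for rsync --files-from.
#   $1 = prefix to strip (default: /gbdb/genbank/)
paths_to_files_from() {
    prefix=${1:-/gbdb/genbank/}
    while IFS= read -r p; do
        case $p in
            "$prefix"*) printf '%s\n' "${p#"$prefix"}" ;;
        esac
    done | sort -u
}

# Usage (myGenome and the rsync source are placeholders):
#   hgsql -N -e "select path from gbExtFile" myGenome \
#       | paths_to_files_from > genbank.files
#   rsync -av --files-from=genbank.files \
#       rsync://hgdownload.soe.ucsc.edu/gbdb/genbank/ /gbdb/genbank/
```

Paths outside the prefix are dropped rather than passed to rsync, so an unexpected entry in the table does not pull files from another directory.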

You will also need the /gbdb/hgFixed/ directory. An hg19 installation additionally requires the /gbdb/hg19Patch5/ directory and the corresponding database.

Additional databases

Mirroring a single genome requires a few extra databases to enable the full functionality of that genome database. These databases contain data that are not specific to a single genome assembly. Your particular genome may need one or more of the following databases:

go080130
hgFixed
proteins090821
sp090821

Create these symlinks in your MySQL data directory:

go -> go080130
proteome -> proteins090821
uniProt -> sp090821
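The three symlinks listed above could be created with a helper like this one. It is a sketch: the function name link_shared_dbs is hypothetical, the data directory location depends on your MySQL installation, and the command must run as a user allowed to write there.

```shell
# Creates the go, proteome, and uniProt symlinks inside the given
# MySQL data directory, pointing at the dated databases.
#   $1 = MySQL data directory
link_shared_dbs() {
    (
        cd "$1" || exit 1
        ln -s go080130 go &&
        ln -s proteins090821 proteome &&
        ln -s sp090821 uniProt
    )
}

# Usage (the directory shown is an assumption about your installation):
#   link_shared_dbs /var/lib/mysql
```

The links are relative, so they stay valid if the data directory is later moved as a whole.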

The specific selection of the go, proteins, and uniProt databases can be found in the hgcentral gdbPdb table:

 hgsql -e "select * from gdbPdb;" hgcentral

The symlinks remain as indicated above; other genomes will reference a specific protein, go, or uniProt database explicitly.

You may need to add MySQL GRANT permissions for these new databases if your read-only MySQL user is granted access only to a specific list of databases.
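The statements for such grants can be generated rather than typed. The sketch below only prints SQL; the function name grant_statements is hypothetical, and the user name readonly and host localhost in the usage line are assumptions about your setup, not values taken from this page.

```shell
# Prints one GRANT SELECT statement per database name.
#   $1 = MySQL user, $2 = host, remaining arguments = database names
grant_statements() {
    user=$1
    host=$2
    shift 2
    for db in "$@"; do
        printf "GRANT SELECT ON \`%s\`.* TO '%s'@'%s';\n" "$db" "$user" "$host"
    done
}

# Usage (review the output before piping it to MySQL as an admin user):
#   grant_statements readonly localhost go080130 hgFixed proteins090821 sp090821
```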

Partial Mirrors

The currently recommended UCSC browser mirror procedures can be found in the src/product/scripts/ directory of the source tree.

To make full use of scripts such as these, you should be familiar with shell programming and understand what the scripts are doing, so that you can customize them for your particular installation. They will not work out of the box without changes.

User notes

I made hgBlat work on my local browser installation by putting the full hostnames into hgcentral.blatServers, e.g. 'blat4' was replaced by 'blat4.soe.ucsc.edu'. I wonder if it wouldn't be a good idea to mention this in the mirroring instructions somewhere. --- max

Before you start using our blat servers, you need to verify with us that you have permission. We can't have everyone with a mirror site simply use our blat servers; the load would take them down for everyone. See also: Kent Informatics for a commercial blat license.

A nice command from Paul McKenna: UPDATE blatServers SET host=concat(host,'.soe.ucsc.edu'); Max 15:11, 3 February 2007 (PST)
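Note that the UPDATE above appends the domain every time it runs, so running it twice would produce names ending in .soe.ucsc.edu.soe.ucsc.edu. The sketch below shows the same renaming with a guard against that; the function name qualify_host is hypothetical, and the permission requirement described above still applies.

```shell
# Prints a fully qualified host name: names that already contain
# a dot are left alone, short names get the domain appended.
#   $1 = host name, $2 = domain (default: soe.ucsc.edu)
qualify_host() {
    case $1 in
        *.*) printf '%s\n' "$1" ;;
        *)   printf '%s.%s\n' "$1" "${2:-soe.ucsc.edu}" ;;
    esac
}

# Usage:
#   qualify_host blat4
```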

@@ Line 1: / Line 1: @@
+'''NOTE''': This page is not necessarily maintained with the most up-to-date information.  The information in [http://hgwdev.gi.ucsc.edu/goldenPath/help/mirror.html our mirror instructions] is current.
 ==Minimal Browser Installation==
-Usually a browser installation wants to be a subset of genomes compared to the entire [http://genome.ucsc.edu/| UCSC Genome Browser]
+Usually a browser installation wants to be a subset of all data from a selection of genomes compared to the entire [http://genome.ucsc.edu/ UCSC Genome Browser]. A mirror with a subset of genomes is often called a "partial mirror". Instead of the entire rsync of everything from some genomes as documented on [http://genome.ucsc.edu/admin/mirror.html Mirror Instructions] or on this wiki ([[Browser installation]]), sometimes you want only a subset of data from one genome or want to create a genome browser for an entirely new genome. This is more work but possible. We call this a "Minimal Browser" in the following. This page shows you a minimal browser that displays only the genome sequence and the gaps. It is admittedly not a very useful genome browser but will get you started.
-Instead of the entire rsync of everything mentioned in
+A '''license''' is required for commercial download and/or installation of the Genome Browser binaries and source code. No license is needed for academic, nonprofit, and personal use. To purchase a license, see our [http://genome.ucsc.edu/license/index.html License Instructions].
-the [http://genome.ucsc.edu/admin/mirror.html Mirror Instructions,]
-a subset of data can be downloaded.
-A minimal browser database needs five tables:
+Please note the full discussion in the README.* files and scripts to assist with these browser installation procedures in the source tree directory:
+[http://genome-source.soe.ucsc.edu/gitlist/kent.git/tree/master/src/product src/product/] and
+[http://genome-source.soe.ucsc.edu/gitlist/kent.git/tree/master/src/product/scripts src/product/scripts/]
+A minimal browser database needs six tables:
 <UL>
 <LI>grp</LI>
@@ Line 13: / Line 18: @@
 <LI>trackDb</LI>
 <LI>hgFindSpec</LI>
-<LI>any other table, for example gap</LI>
+<LI>gold</LI>
+<LI>gap</LI>
 </UL>
-The gateway page needs the hgcentral database to function.  The hgcentral database can by copied directly from the MySQL data files from the ftp server ftp://hgdownload.cse.ucsc.edu/mysql/hgcentral or loaded from the SQL text file at http://hgdownload.cse.ucsc.edu/admin/hgcentral.sql
+The gateway page needs the hgcentral database to function.  The hgcentral database can by copied directly from the MySQL data files from the ftp server ftp://hgdownload.soe.ucsc.edu/mysql/hgcentral or loaded from the SQL text file at http://hgdownload.soe.ucsc.edu/admin/hgcentral.sql
-Currently (August 2006) the gateway page expects the human hg18 database to exist in order to function without difficulty.  This concept of the default genome needs to be a configuration item in the '''cgi-bin/hg.conf''' file to avoid this dependency.  This needs to be fixed in '''src/hg/lib/hdb.c'''.
+Enter a '''defaultGenome=<your species>''' specification in your '''/cgi-bin/hg.conf''' file.  See notes in the
+[http://genome-source.soe.ucsc.edu/gitlist/kent.git/blob/master/src/product/ex.hg.conf src/product/ex.hg.conf] file for this option.
 For the /gbdb/ data area, at a minimum you will need the .2bit file or the nib files for the assembly.  This is either:<BR><pre>
 /gbdb/<database>/<database>.2bit
-or
+or for older genome assemblies:
 /gbdb/<database>/nib/*.nib
 </pre>
@@ Line 30: / Line 37: @@
 For the genbank sequences, you can check the gbExtFile table for your database to see exactly which files are used by that assembly in '''/gbdb/genbank/'''<BR>
 Extract the "path" column from that table and use that list in a '''--files-from''' specification for your rsync.
+You will also need the /gbdb/hgFixed/ and the hg19 installation requires the /gbdb/hg19Patch5/ directory and database.
+==Additional databases==
+To mirror a single genome, there are a few extra databases that are required to
+enable the full functions of that single genome database.  These databases contain data that
+are not specific to a single genome assembly.  Your particular genome may need one or more
+of the following databases:
+<UL>
+<LI>go080130</LI>
+<LI>hgFixed</LI>
+<LI>proteins090821</LI>
+<LI>sp090821</LI>
+</UL>
+With symlinks in your MySQL data directory:
+<UL>
+<LI>go -> go080130</LI>
+<LI>proteome -> proteins090821</LI>
+<LI>uniProt -> sp090821</LI>
+</UL>
+The specific selection of the go, proteins, and uniProt databases can be found
+in the hgcentral gdbPdb table:
+  hgsql -e "select * from gdbPdb;" hgcentral
+The symlinks remain as indicated above, other genomes will reference a specific
+protein, go, or uniProt database explicitly.
+You may need to add MySQL GRANT permissions for these new databases if your read-only
+MySQL user has a specific list of database accesses.
+==Partial Mirrors==
+The currently recommended UCSC browser mirror  procedures can be found in the source tree:
+[http://genome-source.soe.ucsc.edu/gitlist/kent.git/tree/master/src/product/scripts scripts] directory.
+To fully utilize scripts such as these, you should be familiar with shell programming and you should
+be able to understand what the scripts are doing so you can customize them to your particular installation.
+They are not going to work blindly out of the box.<BR>
+==See also==
+[[Building a new genome database]]
+[[Browser Installation]]
+[[Browser Mirrors]]
+==User notes==
+I made hgBlat work on my local browser installation by putting the full hostnames into hgcentral.blatservers, e.g. 'blat4' was replaced by the output of `blat4.soe.ucsc.edu`. I wonder if it wouldn't be a good idea to mention this in the mirroring instructions somewhere. --- max
+----
+Before you start using our blat servers, you need to verify with us that you have permission.  We can't have everyone with a mirror site simply use our blat servers, the load would take them down for everyone.  See also: [http://www.kentinformatics.com/ Kent Informatics] for a commercial blat license.
+----
+A nice command from Paul McKenna: UPDATE blatServers SET host=concat(host,'.soe.ucsc.edu'); [[User:Max|Max]] 15:11, 3 February 2007 (PST)
 [[Category:Technical FAQ]]
 [[Category:Mirror Site FAQ]]
+[[Category:Browser Linked]]
+[[Category:Installation]]
