Latest revision as of 06:48, 1 September 2018
NOTE: This page is not necessarily maintained with the most up-to-date information. The information in our mirror instructions is current.
Minimal Browser Installation
Usually a browser installation holds only a subset of the data from a selection of genomes, rather than the entire UCSC Genome Browser. A mirror with a subset of genomes is often called a "partial mirror". Instead of rsyncing everything for some genomes, as documented in the Mirror Instructions or on this wiki (Browser installation), sometimes you want only a subset of the data from one genome, or you want to create a genome browser for an entirely new genome. This is more work, but it is possible. We call this a "Minimal Browser" below. This page describes a minimal browser that displays only the genome sequence and the gaps. It is admittedly not a very useful genome browser, but it will get you started.
A license is required for commercial download and/or installation of the Genome Browser binaries and source code. No license is needed for academic, nonprofit, and personal use. To purchase a license, see our License Instructions.
See the full discussion in the README.* files, and the scripts that assist with these browser installation procedures, in the source tree directories src/product/ and src/product/scripts/
A minimal browser database needs six tables:
- grp
- chromInfo
- trackDb
- hgFindSpec
- gold
- gap
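As a quick check of the list above, the following is a minimal sketch that reports which of the six tables are absent from a database. The function name missing_tables and the database name myDb are hypothetical; the usage line assumes hgsql passes -N (skip column names) through to the MySQL client.

```shell
# The six tables a minimal browser database needs (from the list above).
REQUIRED_TABLES="grp chromInfo trackDb hgFindSpec gold gap"

# Reads table names on stdin, one per line, and prints each required
# table that does not appear in that list.
missing_tables() {
    present=$(cat)
    for t in $REQUIRED_TABLES; do
        printf '%s\n' "$present" | grep -qxF "$t" || printf '%s\n' "$t"
    done
}

# Possible usage (myDb is a placeholder database name):
#   hgsql -N -e "show tables" myDb | missing_tables
```

An empty result means all six tables are present.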
The gateway page needs the hgcentral database to function. The hgcentral database can be copied directly from the MySQL data files on the ftp server ftp://hgdownload.soe.ucsc.edu/mysql/hgcentral or loaded from the SQL text file at http://hgdownload.soe.ucsc.edu/admin/hgcentral.sql
Enter a defaultGenome=<your species> specification in your /cgi-bin/hg.conf file. See notes in the src/product/ex.hg.conf file for this option.
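One way to make that hg.conf change repeatable is sketched below. The function name set_default_genome is hypothetical, and the sketch assumes the setting is written as a single defaultGenome=<value> line; check the notes in ex.hg.conf before relying on it.

```shell
# Sets defaultGenome in the hg.conf file named by $1 to the value $2.
# Any existing defaultGenome= line is dropped and a new one is appended.
set_default_genome() {
    conf=$1
    genome=$2
    tmp=$(mktemp) || return 1
    # grep -v exits non-zero when it prints nothing; that is not an error here.
    grep -v '^defaultGenome=' "$conf" > "$tmp" || true
    printf 'defaultGenome=%s\n' "$genome" >> "$tmp"
    # Write back through the original file so its permissions are kept.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Possible usage (path and genome name are placeholders):
#   set_default_genome /path/to/cgi-bin/hg.conf "Mouse"
```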
For the /gbdb/ data area, at a minimum you will need the .2bit file or the nib files for the assembly. This is either:
/gbdb/<database>/<database>.2bit
or for older genome assemblies:
/gbdb/<database>/nib/*.nib
Various tracks use other files in this directory. If you don't care about all the tracks, you won't need other files here.
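To see which of the two sequence layouts a local /gbdb/ area actually has, a small check along these lines may help. The function name sequence_source is hypothetical, and the second argument exists only so the gbdb root can be overridden.

```shell
# Prints "2bit", "nib", or "none" depending on which sequence files
# exist for database $1 under the gbdb root $2 (default: /gbdb).
sequence_source() {
    db=$1
    root=${2:-/gbdb}
    if [ -f "$root/$db/$db.2bit" ]; then
        echo "2bit"
    elif ls "$root/$db/nib/"*.nib >/dev/null 2>&1; then
        echo "nib"
    else
        echo "none"
    fi
}

# Possible usage (myDb is a placeholder database name):
#   sequence_source myDb
```

The .2bit file is checked first, so a directory holding both layouts reports "2bit".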
For the genbank sequences, you can check the gbExtFile table for your database to see exactly which files are used by that assembly in /gbdb/genbank/
Extract the "path" column from that table and use that list in a --files-from specification for your rsync.
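The extraction step could look roughly like this. It is a sketch under two assumptions that the page does not confirm: that the "path" values begin with /gbdb/genbank/, and that the genbank files are served from the rsync location shown in the usage comment. The function name genbank_files_from and the database name myDb are hypothetical.

```shell
# Reads gbExtFile "path" values on stdin and prints them relative to
# /gbdb/genbank/, sorted and de-duplicated, which is the form that
# rsync --files-from expects. Lines without that prefix are dropped.
genbank_files_from() {
    sed -n 's|^/gbdb/genbank/||p' | sort -u
}

# Possible usage (myDb and the rsync source are assumptions):
#   hgsql -N -e "select path from gbExtFile" myDb | genbank_files_from > files.txt
#   rsync -av --files-from=files.txt \
#       rsync://hgdownload.soe.ucsc.edu/gbdb/genbank/ /gbdb/genbank/
```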
You will also need the /gbdb/hgFixed/ directory, and an hg19 installation also requires the /gbdb/hg19Patch5/ directory and database.
Additional databases
To mirror a single genome, there are a few extra databases that are required to enable the full functions of that single genome database. These databases contain data that are not specific to a single genome assembly. Your particular genome may need one or more of the following databases:
- go080130
- hgFixed
- proteins090821
- sp090821
These are used with the following symlinks in your MySQL data directory:
- go -> go080130
- proteome -> proteins090821
- uniProt -> sp090821
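The three symlinks above can be created with a short helper such as the one below. The function name link_shared_databases is hypothetical, the data directory location depends on your MySQL installation, and the sketch relies on ln accepting -n, which GNU and BSD ln do.

```shell
# Creates the go, proteome, and uniProt symlinks inside the MySQL
# data directory given as $1. Existing links of those names are replaced.
link_shared_databases() {
    datadir=$1
    # Targets are relative, so each link resolves inside the data directory.
    ln -sfn go080130 "$datadir/go"
    ln -sfn proteins090821 "$datadir/proteome"
    ln -sfn sp090821 "$datadir/uniProt"
}

# Possible usage (the path is a placeholder for your MySQL data directory):
#   link_shared_databases /var/lib/mysql
```

Depending on the installation, the links may also need to be owned by the user that runs the MySQL server.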
The specific selection of the go, proteins, and uniProt databases can be found in the hgcentral gdbPdb table:
hgsql -e "select * from gdbPdb;" hgcentral
The symlinks remain as indicated above; other genomes will reference a specific proteins, go, or uniProt database explicitly.
You may need to add MySQL GRANT permissions for these new databases if your read-only MySQL user has a specific list of database accesses.
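If the read-only user is granted access database by database, the statements can be generated rather than typed by hand. This is only a sketch: the function name grant_select_sql is hypothetical, and the user name, host, and the way the output is fed to MySQL are assumptions to adapt to your setup.

```shell
# Prints one GRANT SELECT statement per database name, for the
# read-only user $1 at host $2. Remaining arguments are database names.
grant_select_sql() {
    user=$1
    host=$2
    shift 2
    for db in "$@"; do
        printf "GRANT SELECT ON %s.* TO '%s'@'%s';\n" "$db" "$user" "$host"
    done
}

# Possible usage (user name and host are placeholders):
#   grant_select_sql readonly localhost go080130 hgFixed proteins090821 sp090821 \
#       | mysql -u root -p
```

Review the generated statements before running them.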
Partial Mirrors
The currently recommended UCSC browser mirror procedures can be found in the src/product/scripts directory of the source tree.
To make full use of scripts such as these, you should be familiar with shell programming and able to understand what the scripts are doing, so that you can customize them for your particular installation. They will not work unmodified out of the box.
See also
Building a new genome database
Browser Installation
Browser Mirrors
User notes
I made hgBlat work on my local browser installation by putting the full hostnames into hgcentral.blatServers, e.g. 'blat4' was replaced by 'blat4.soe.ucsc.edu'. I wonder if it wouldn't be a good idea to mention this in the mirroring instructions somewhere. --- max
Before you start using our blat servers, you need to verify with us that you have permission. We can't have everyone with a mirror site simply use our blat servers; the load would take them down for everyone. See also: Kent Informatics for a commercial blat license.
A nice command from Paul McKenna: UPDATE blatServers SET host=concat(host,'.soe.ucsc.edu'); Max 15:11, 3 February 2007 (PST)