How to add a track to a mirror: Difference between revisions

Revision as of 16:31, 7 January 2011

This describes how to add a track to the UCSC genome browser. The two major steps in adding a track are creating a table containing the track information, and putting a description of the track in trackDb. The browser has one mysql database for each version of each genome that it displays. Both the track table and the track description live in this database.

Introduction

Please note the UCSC comments about adding a track in the source tree file: src/product/README.trackDb.

MySQL Preliminaries

Before you get started it is good to look at these databases a little, and make sure that you have update access to them. You could do this directly with the 'mysql' command, but let's do it instead with the 'hgsql' command, which will keep you from having to type your user name and password all the time.

Assuming you've got the browser source already installed in ~/kent/src, do the following to create hgsql:

cd ~/kent/src/lib
make

This make almost always goes smoothly on Linux. You may need to remove the '-ggdb' flag in the makefile on other systems, and possibly set up a MACHTYPE environment variable, and then mkdir $MACHTYPE on some systems. Next,

cd ../hg/lib
make

The main problem that can happen with this make is if the mysql libraries and include files are not found. See kent/src/README for details. The next step is

cd ../hgsql
make
rehash

The hgsql program is just a thin wrapper around mysql. It looks for the password and username in the file ~/.hg.conf. Here's the necessary parts of .hg.conf:

# db.host is the name of the MySQL host to connect to
db.host=localhost
# db.user is the username is use when connecting to the host
db.user=mysqlUserName
# this is the password to use with the above hostname
db.password=mysqlPassword

This .hg.conf is similar to the cgi-bin/hg.conf file that the browser uses, but it need not contain everything that file does. Also it's advisable to have a read-only user/password in cgi-bin/hg.conf while you'll want a read-write user/password in ~/.hg.conf. Setting this up can involve doing some 'grants' in mysql. See the documentation at www.mysql.com for how do set up various users.

Assuming your mysql and .hg.conf are set up, and that you already have a mirror site going then the command

hgsql database

(where database is something like hg13 or mm2) should bring you to the mysql prompt. Do

mysql> show tables;

and you should see a large list of tables. When you've finished adding a track, the track(s) for your tables will be among them. Also try doing:

mysql> describe trackDb;

This will list the fields of the trackDb table, which has a row for each track. Then do

mysql> select tableName,shortLabel,type from trackDb;

This will show you some of the key fields from this table. You won't be updating this table directly, but it can be handy to look at it sometimes for debugging purposes. Some other useful mysql commands are

mysql> select count(*) from sex;

This will count up all the items in the sex table. You thought there were only two? This table reflects the diversity of the sex fields in genbank. Try

mysql> select * from sex;

to see the full diversity.

Well, enough of that non-normalized nightmare. To get out do:

mysql> quit

Loading the main track table

The UCSC Genome Browser Database is usually loaded from a text file of some sort. The most popular types of text files are .psl files for blat alignments, .bed files for a wide variety of data, and GTF files for gene predictions. See http://genome.ucsc.edu/goldenPath/help/customTrack.html for further information on these formats. For now we'll assume you have a file in one of these formats that you want to add to the browser.

Creating the loader programs

cd ~/kent/src/hg/makeDb
cd hgLoadPsl
make
cd ../hgLoadBed
make
cd ../ldHgGene
make
cd ../hgPepPred
make

The makeDb directory also contains loaders for a number of more specialized tables including hgLoadOut for RepeatMasker data. There is also a .doc file describing in detail how we created each database in the files named things like makeHg13.doc and makeMm2.doc.

Loading a bed file

Loading a bed file is the most straightforward. First decide on the name you want to call the table. Then do

hgLoadBed database tableName file.bed

Type hgLoadBed with no arguments for further information. Database will be something like hg13 or mm2.

Loading a psl file

Loading a psl file is also easy. Make sure that the psl file is sorted by chromosome (tName) and start position (tStart). Use kent/src/hg/pslSort or just plain Unix sort for this if necessary.

If the number of alignments is somewhat modest (say less than 500,000) then do

hgLoadPsl database -table=tableName file.psl -tNameIx

This will load everything into one big table. For huge numbers of alignments the browser will be faster if you first split up the data into one file for each chromosome. Name these files chr1_tableName.psl, chr2_tableName.psl, and so forth. Then do

hgLoadPsl database chr*_tableName.psl

This will end up making a separate table for each chromosome.

Unfortunately it is still a bit complicated to make the details pages for a psl format track to include the alignments themselves. Please contact us at UCSC if this is a priority for you and we will try to make it easier.

Loading a GTF (or GFF) file

Generally GTF is a much more tightly defined standard than GFF, so GTF files are more likely to work without tweaking. However most reasonable GFFs will work as well. To load do

ldHgGene database tableName file(s).gtf

This will make a gene-prediction type table. You often will want to create an associated predicted peptide table as well. To do this do

hgPepPred database generic tableNamePep file(s).fa

The first word after the '>' in the fasta files should use the same symbol as the 'group' in GFF files or 'transcript_id' in GTF files.

Updating trackDb

Your data will not display in the browser until you load it into trackDb. To do this first

cd ~/kent/src/hg/makeDb/trackDb

and look at the file trackDb.ra, and read the README file. Then decide whether your new track should be global, organism-specific, or assembly-specific, and edit the corresponding trackDb.ra file. Generally it's good to find an existing track as similar as possible to the track you want to add, copy and paste it, and modify the copy. Then put any explanatory text you want on the track in trackName.html in the appropriate directory. After this do a

make alpha

to update the trackDb table, or a

make

to update trackDb_user (where user is you Unix username).

If multiple engineers are working on the project you can set up cgi-bin-user directories with hg.conf files that will tell the browser to use trackDb_user instead of trackDb to avoid conflicts with other engineers code.

After the 'make alpha' the browser should show your track.

How to add a track to a mirror: Difference between revisions

Revision as of 16:31, 7 January 2011

Contents

Introduction

MySQL Preliminaries

Loading the main track table

Creating the loader programs

Loading a bed file

Loading a psl file

Loading a GTF (or GFF) file

Updating trackDb

Navigation menu

Page actions

Page actions

Personal tools

Navigation

Search

related sites

hosted projects

Tools

@@ Line 1: / Line 1: @@
-I. Introduction.
+This describes how to add a track to the UCSC genome browser.  The two major steps in adding a track are creating a table containing the track information, and putting a description of the track in trackDb.  The browser has one mysql database for each version of each genome that it displays.  Both the track table and the track description live in this database.
-Please note the UCSC comments about adding a track in the source
+==Introduction==
-tree file: [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/product/README.trackDb src/product/README.trackDb].
-This describes how to add a track to the UCSC genome browser.
-The two major steps in adding a track are creating a table
-containing the track information, and putting a description
-of the track in trackDd.  The browser has one mysql database for
-each version of each genome that it displays.  Both the track
-table and the track description live in this database.  The current
-human genome database is hg13, while the current mouse database is
-mm2.
+Please note the UCSC comments about adding a track in the source tree file: [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/product/README.trackDb src/product/README.trackDb].
-II. MySQL Preliminaries.
+==MySQL Preliminaries==
-Before you get started it is good to look at these databases a little,
+Before you get started it is good to look at these databases a little, and make sure that you have update access to them.  You could do this
-and make sure that you have update access to them.  You could do this
+directly with the 'mysql' command,  but let's do it instead with the 'hgsql' command, which will keep you from having to type your user name
-directly with the 'mysql' command,  but let's do it instead with the
-'hgsql' command, which will keep you from having to type your user name
 and password all the time.
-Assuming you've got the browser source already installed in ~/kent/src
+Assuming you've got the browser source already installed in ~/kent/src, do the following to create hgsql:
-do the following to create hgsql
+ cd ~/kent/src/lib
-	cd ~/kent/src/lib
+ make
-	make
+This make almost always goes smoothly on Linux.  You may need to remove the '-ggdb' flag in the makefile on other systems, and possibly
-This make almost always goes smoothly on Linux.  You may need to
+set up a MACHTYPE environment variable, and then mkdir $MACHTYPE on some systems.  Next,
-remove the '-ggdb' flag in the makefile on other systems, and possibly
+ cd ../hg/lib
-set up a MACHTYPE environment variable, and then mkdir $MACHTYPE on
+ make
-some systems.  Next
+The main problem that can happen with this make is if the mysql libraries and include files are not found.   See kent/src/README for details. The
-	cd ../hg/lib
-	make
-The main problem that can happen with this make is if the mysql libraries
-and include files are not found.   See kent/src/README for details. The
 next step is
-	cd ../hgsql
+ cd ../hgsql
-	make
+ make
-	rehash
+ rehash
-The hgsql program is just a thin wrapper around mysql.  It looks for the
+The hgsql program is just a thin wrapper around mysql.  It looks for the password and username in the file ~/.hg.conf.   Here's the necessary
-password and username in the file ~/.hg.conf.   Here's the necessary
 parts of .hg.conf:
-# db.host is the name of the MySQL host to connect to
+<pre># db.host is the name of the MySQL host to connect to
 db.host=localhost
 # db.user is the username is use when connecting to the host
 db.user=mysqlUserName
 # this is the password to use with the above hostname
-db.password=mysqlPassword
+db.password=mysqlPassword</pre>
-This .hg.conf is similar to the cgi-bin/hg.conf file that the
+This .hg.conf is similar to the cgi-bin/hg.conf file that the browser uses,  but it need not contain everything that file does.
-browser uses,  but it need not contain everything that file does.
+Also it's advisable to have a read-only user/password in cgi-bin/hg.conf while you'll want a read-write user/password in ~/.hg.conf.  Setting
-Also it's advisable to have a read-only user/password in cgi-bin/hg.conf
+this up can involve doing some 'grants' in mysql.  See the documentation at www.mysql.com for how do set up various users.
-while you'll want a read-write user/password in ~/.hg.conf.  Setting
-this up can involve doing some 'grants' in mysql.  See the documentation
-at www.mysql.com for how do set up various users.
-Assuming your mysql and .hg.conf are set up, and that you already
+Assuming your mysql and .hg.conf are set up, and that you already have a mirror site going then the command
-have a mirror site going then the command
+ hgsql database
-	hgsql database
+(where database is something like hg13 or mm2) should bring you to the mysql prompt.  Do
-(where database is something like hg13 or mm2) should bring you to the
+ mysql> show tables;
-mysql prompt.  Do
+and you should see a large list of tables.  When you've finished adding a track, the track(s) for your tables will be among them.  Also try
-	mysql> show tables;
-and you should see a large list of tables.  When you've finished adding
-a track, the track(s) for your tables will be among them.  Also try
 doing:
-	mysql> describe trackDb;
+ mysql> describe trackDb;
-This will list the fields of the trackDb table, which has a row for
+This will list the fields of the trackDb table, which has a row for each track.  Then do
-each track.  Then do
+ mysql> select tableName,shortLabel,type from trackDb;
-	mysql> select tableName,shortLabel,type from trackDb;
+This will show you some of the key fields from this table.  You won't be updating this table directly, but it can be handy to look at it
-This will show you some of the key fields from this table.  You won't
-be updating this table directly, but it can be handy to look at it
 sometimes for debugging purposes.   Some other useful mysql commands are
-        mysql> select count(*) from sex;
+ mysql> select count(*) from sex;
-This will count up all the items in the sex table.  You thought there
+This will count up all the items in the sex table.  You thought there were only two?  This table reflects the diversity of the sex fields in
-were only two?  This table reflects the diversity of the sex fields in
 genbank.  Try
-	mysql> select * from sex;
+ mysql> select * from sex;
-to see the full diversity.   Well, enough of that non-normalized nightmare.
+to see the full diversity.
-To get out do:
-        mysql> quit
+Well, enough of that non-normalized nightmare. To get out do:
+ mysql> quit
-III. Loading the main track table.
+==Loading the main track table==
-The UCSC Genome Browser Database is usually loaded from
+The UCSC Genome Browser Database is usually loaded from a text file of some sort.  The most popular types of text files are .psl files for blat alignments, .bed files for a wide variety of data, and GTF files for gene predictions.  See [http://genome.ucsc.edu/goldenPath/help/customTrack.html http://genome.ucsc.edu/goldenPath/help/customTrack.html] for further information on these formats.  For now we'll assume you have a file in one of these formats that you want to add to the browser.
-a text file of some sort.  The most popular types of
-text files are .psl files for blat alignments,  .bed files
-for a wide variety of data, and GTF files for gene predictions.
-See http://genome.ucsc.edu/goldenPath/help/customTrack.html
-for further information on these formats.  For now we'll
-assume you have a file in one of these formats that you
-want to add to the browser.
-A) Creating the loader programs
+===Creating the loader programs===
-	cd ~/kent/src/hg/makeDb
+ cd ~/kent/src/hg/makeDb
-	cd hgLoadPsl
+ cd hgLoadPsl
-	make
+ make
-	cd ../hgLoadBed
+ cd ../hgLoadBed
-	make
+ make
-	cd ../ldHgGene
+ cd ../ldHgGene
-	make
+ make
-	cd ../hgPepPred
+ cd ../hgPepPred
-   The makeDb directory also contains loaders for a number of more
+ make
-   specialized tables including hgLoadOut for RepeatMasker data.  There
+The makeDb directory also contains loaders for a number of more specialized tables including hgLoadOut for RepeatMasker data.  There is also a .doc file describing in detail how we created each database in the files named things like makeHg13.doc and makeMm2.doc.
-   is also a .doc file describing in detail how we created each database
-   in the files named things like makeHg13.doc and makeMm2.doc.
-B) Loading a bed file
+===Loading a bed file===
-   Loading a bed file is the most straightforward.  First decide on
+Loading a bed file is the most straightforward.  First decide on the name you want to call the table.  Then do
-   the name you want to call the table.  Then do
+ hgLoadBed database tableName file.bed
-       hgLoadBed database tableName file.bed
+Type hgLoadBed with no arguments for further information.  Database will be something like hg13 or mm2.
-   type hgLoadBed with no arguments for further information.
-   Database will be something like hg13 or mm2.
-C) Loading a psl file
+===Loading a psl file===
-   Loading a psl file is also easy.  Make sure that the psl file
+Loading a psl file is also easy.  Make sure that the psl file is sorted by chromosome (tName) and start position (tStart).  Use kent/src/hg/pslSort or just plain Unix sort for this if necessary.
-   is sorted by chromosome (tName) and start position (tStart).  Use
-   kent/src/hg/pslSort or just plain Unix sort for this if necessary.
-   If the number of alignments is somewhat modest (say less than
-,000) then do
-   	hgLoadPsl database -table=tableName file.psl -tNameIx
-   This will load everything into one big table.  For huge numbers of
-   alignments the browser will be faster if you first split up the
-   data into one file for each chromosome.  Name these files
-   chr1_tableName.psl  chr2_tableName.psl and so forth.  Then do
-        hgLoadPsl database chr*_tableName.psl
-   This will end up making a separate table for each chromosome.
-   Unfortunately it is still a bit complicated to make the details
-   pages for a psl format track to include the alignments themselves.
-   Please contact us at UCSC if this is a priority for you and we
-   will try to make it easier.
-D) Loading a GTF (or GFF) file
+If the number of alignments is somewhat modest (say less than 500,000) then do
-   Generally GTF is a much more tightly defined standard than GFF, so
+  hgLoadPsl database -table=tableName file.psl -tNameIx
-   GTF files are more likely to work without tweaking.  However most
+This will load everything into one big table.  For huge numbers of alignments the browser will be faster if you first split up the data into one file for each chromosome.  Name these files ''chr1_tableName.psl'', ''chr2_tableName.psl'', and so forth.  Then do
-   reasonable GFFs will work as well.  To load do
+ hgLoadPsl database chr*_tableName.psl
-        ldHgGene database tableName file(s).gtf
+This will end up making a separate table for each chromosome.
-   This will make a gene-prediction type table. You often will want
-   to create an associated predicted peptide table as well. To do this
-   do
-        hgPepPred database generic tableNamePep file(s).fa
-   The first word after the '>' in the fasta files should use the same
-   symbol as the 'group' in GFF files or 'transcript_id' in GTF files.
-IV. Updating trackDb
+Unfortunately it is still a bit complicated to make the details pages for a psl format track to include the alignments themselves.  Please contact us at UCSC if this is a priority for you and we will try to make it easier.
-Your data will not display in the browser until you load it
+===Loading a GTF (or GFF) file===
-into trackDb. To do this first
+Generally GTF is a much more tightly defined standard than GFF, so GTF files are more likely to work without tweaking.  However most reasonable GFFs will work as well.  To load do
-	cd ~/kent/src/hg/makeDb/trackDb
+ ldHgGene database tableName file(s).gtf
-and look at the file trackDb.ra,  and read the README file.
+This will make a gene-prediction type table. You often will want to create an associated predicted peptide table as well. To do this do
-Then decide whether your new track should be global, organism
+ hgPepPred database generic tableNamePep file(s).fa
-specific, or assembly specific, and edit the corresponding
+The first word after the '>' in the fasta files should use the same symbol as the 'group' in GFF files or 'transcript_id' in GTF files.
-trackDb.ra file.  Generally it's good to find an existing
-track as similar as possible to the track you want to add,
+==Updating trackDb==
-copy and paste it, and modify the copy.  Then put any explanitory
-text you want on the track in trackName.html in the appropriate
+Your data will not display in the browser until you load it into trackDb. To do this first
-directory.  After this do a
+ cd ~/kent/src/hg/makeDb/trackDb
-	make alpha
+and look at the file trackDb.ra, and read the README file.  Then decide whether your new track should be global, organism-specific, or assembly-specific, and edit the corresponding trackDb.ra file.  Generally it's good to find an existing track as similar as possible to the track you want to add,
+copy and paste it, and modify the copy.  Then put any explanatory text you want on the track in trackName.html in the appropriate directory.  After this do a
+ make alpha
 to update the trackDb table,  or a
-        make
+ make
 to update trackDb_user (where user is you Unix username).
-If multiple engineers are working on the project you can set
-up cgi-bin-user directories with hg.conf files that will
+If multiple engineers are working on the project you can set up cgi-bin-user directories with hg.conf files that will tell the browser to use trackDb_user instead of trackDb to avoid conflicts with other engineers code.
-tell the browser to use trackDb_user instead of trackDb
-to avoid conflicts with other engineers code.
 After the 'make alpha' the browser should show your track.
--Jim Kent   Feb 14, 2003.
 [[Category:Technical FAQ]]