Assembly Hubs: Difference between revisions

From genomewiki
(filling in the genomes.txt section)

Revision as of 22:34, 17 April 2013

Overview

The Assembly Hub function, new in the UCSC Genome Browser as of early 2013, allows you to display your novel genome sequence in the browser.

Web Server

To display your novel genome sequence, you use a web server at your institution to supply your files to the UCSC Genome Browser. Establish a hierarchy of directories and files to host your novel genome sequence. For example:

myHub/ - directory to organize your files on this hub
     hub.txt – primary reference text file to define the hub, refers to:
     genomes.txt – definitions for each genome assembly on this hub
          newOrg1/ - directory of files for this specific genome assembly
               newOrg1.2bit – ‘2bit’ file constructed from your fasta sequence
               description.html – information about this assembly for users
               trackDb.txt – definitions for tracks on this genome assembly
               groups.txt – definitions for track groups on this assembly
               bigWig and bigBed files – data for tracks on this assembly
               external track hub data tracks can be displayed on this assembly

The URL to reference this hub would be: http://yourLab.yourInstitution.edu/myHub/hub.txt
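As a rough sketch of how the hierarchy above could be laid out before you fill in the files, the following Python creates the directories and empty placeholder text files. The function name make_hub_skeleton is hypothetical and not part of any UCSC tooling; the names myHub and newOrg1 are the placeholders from the listing above. The .2bit, bigWig and bigBed files are not created because they must be built from your own data.

```python
from pathlib import Path


def make_hub_skeleton(root, genome):
    """Create the hub directory layout with empty placeholder text files.

    Hypothetical helper: it only mirrors the listing above and writes no content.
    """
    root = Path(root)
    genome_dir = root / genome
    genome_dir.mkdir(parents=True, exist_ok=True)
    files = [
        root / "hub.txt",                 # primary hub definition
        root / "genomes.txt",             # one stanza per genome assembly
        genome_dir / "description.html",  # information about the assembly
        genome_dir / "trackDb.txt",       # track definitions
        genome_dir / "groups.txt",        # track group definitions
    ]
    for path in files:
        path.touch(exist_ok=True)
    # Report what exists now, relative to the hub root
    return sorted(path.relative_to(root).as_posix() for path in files)


if __name__ == "__main__":
    for name in make_hub_skeleton("myHub", "newOrg1"):
        print(name)
```

The resulting myHub directory would then be placed under the document root of your web server so that hub.txt is reachable at the URL shown above.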

You can view a working example hierarchy of files at: Plants

hub.txt

The initial file hub.txt is the primary URL reference for your assembly hub. The format of the file:

hub hubName
shortLabel genome
longLabel Comment describing this hub contents
genomesFile genomes.txt
email contactEmail@institution.edu

The shortLabel is the name that will appear in the genome pull-down menu at the UCSC gateway page. Example: Plants

The genomesFile is a reference to the next definition file in this chain, which describes the assemblies and tracks available at this hub. Typically genomes.txt is at the same directory level as hub.txt, but the setting can also be a relative path to a file at a different directory level.

The email address provides users a contact point for queries related to this assembly hub.
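To illustrate how the hub.txt settings chain to genomes.txt, here is a minimal Python sketch that reads the "key value" lines and resolves genomesFile relative to the hub URL. The helper parse_stanza is hypothetical, skipping lines that start with # is an assumption, and using urljoin is only an approximation of how a relative reference would be resolved; the browser's own resolution logic is not shown in this page.

```python
from urllib.parse import urljoin


def parse_stanza(text):
    """Parse 'key value' lines into a dict; the value is the rest of the line."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # assumption: '#' marks a comment
            continue
        key, _, value = line.partition(" ")
        settings[key] = value.strip()
    return settings


# The example hub.txt from above, with its placeholder values
HUB_TXT = """\
hub hubName
shortLabel genome
longLabel Comment describing this hub contents
genomesFile genomes.txt
email contactEmail@institution.edu
"""

hub_url = "http://yourLab.yourInstitution.edu/myHub/hub.txt"
hub = parse_stanza(HUB_TXT)

# A relative genomesFile is interpreted against the location of hub.txt
genomes_url = urljoin(hub_url, hub["genomesFile"])
print(genomes_url)
```

With this approach a value such as ../other/genomes.txt would resolve to a directory one level above myHub, which is the "different directory level" case mentioned above.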

genomes.txt

The genomes.txt file (http://genome-test.cse.ucsc.edu/~hiram/hubs/Plants/genomes.txt) lists the genome assemblies and tracks available at this assembly hub. The example file indicates the typical contents:

genome ricCom1
trackDb ricCom1/trackDb.txt
groups ricCom1/groups.txt
description July 2011 Castor bean
twoBitPath ricCom1/ricCom1.2bit
organism Ricinus communis
defaultPos EQ973772:1000000-2000000
orderKey 4800
scientificName Ricinus communis
htmlPath ricCom1/description.html
  • The genome name is equivalent to the UCSC database name. The genome browser displays this name in its page titles.
  • The trackDb refers to a file which defines the tracks to place on this genome assembly. The format of this file is described in the Track Hub help documentation (http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html).
  • The groups refers to a file which defines the track groups on this genome browser. Track groups are the sections of related tracks grouped together under the primary genome browser graphics display image.
  • The description will be displayed for user information on the gateway page and most title pages of this genome assembly browser. It is the name displayed in the assembly pull-down menu on the browser gateway page.
  • The twoBitPath refers to the .2bit file containing the sequence for this assembly. Typically this file is constructed from the original fasta files for the sequence using the kent program faToTwoBit.
  • The organism string is displayed along with the description on most title pages in the genome browser. Adjust your names in organism and description until they are appropriate. This example is very close to what the genome browser normally displays. This organism name is the name that appears in the genome pull-down menu on the browser gateway page.
  • The defaultPos specifies the default position the genome browser will open when a user first views this assembly. This is usually selected to highlight a popular gene or region of interest in the genome assembly.
  • The orderKey is used, together with the orderKey values of the other genome definitions at this hub, to order the entries in the genome pull-down menu.
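
The settings described above can be read with a few lines of Python. This is only a sketch under stated assumptions: the helpers parse_genomes and parse_position are hypothetical, stanzas are assumed to be separated by blank lines, the second stanza (newOrg1) is invented purely for illustration, and sorting by ascending orderKey is an assumption about how the menu is ordered.

```python
def parse_genomes(text):
    """Split genomes.txt text into blank-line separated stanzas of settings."""
    stanzas, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:  # a blank line closes the current stanza
                stanzas.append(current)
                current = {}
            continue
        key, _, value = line.partition(" ")
        current[key] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def parse_position(pos):
    """Split a 'name:start-end' position into (name, start, end)."""
    name, _, span = pos.rpartition(":")
    start, _, end = span.partition("-")
    return name, int(start), int(end)


# First stanza: the ricCom1 example above. Second stanza: hypothetical.
GENOMES_TXT = """\
genome ricCom1
trackDb ricCom1/trackDb.txt
groups ricCom1/groups.txt
description July 2011 Castor bean
twoBitPath ricCom1/ricCom1.2bit
organism Ricinus communis
defaultPos EQ973772:1000000-2000000
orderKey 4800
scientificName Ricinus communis
htmlPath ricCom1/description.html

genome newOrg1
trackDb newOrg1/trackDb.txt
twoBitPath newOrg1/newOrg1.2bit
organism Hypothetical organism
defaultPos scaffold1:1-10000
orderKey 4700
"""

genomes = parse_genomes(GENOMES_TXT)
# Assumption: smaller orderKey values are listed first in the pull-down menu
ordered = sorted(genomes, key=lambda g: int(g["orderKey"]))
for g in ordered:
    print(g["genome"], g["organism"], parse_position(g["defaultPos"]))
```

A check like this can catch a malformed defaultPos or a missing orderKey before the hub is loaded in the browser.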