Public Hub Guidelines: Difference between revisions

From genomewiki

Revision as of 21:03, 29 August 2016

Suggestions for Public Hubs

Based on common problems seen in hubs, this page outlines recommendations from the UCSC Genome Browser engineers. Please note that hosting hub files over HTTP tends to work better than FTP because of the difference in the number of open TCP connections needed. (As a reference for interpreting trackDb.txt lines, use the Hub Track Database Definition glossary.)

Guidelines for User Experience:

  • Have no more than 10 tracks set to display by default when your hub is first opened.
  • Add a descriptionUrl HTML page to hub.txt that includes a link to a description page with search terms or to a full-text paper describing what your hub is all about. The descriptionUrl page is indexed by our hub search function, so more terms on that page will increase the chances of hits.
  • Have a description page for every configuration page (composite or standalone track).
  • The description page should preferably contain UCSC's standard Description, Methods, Contacts... sections as defined here under "html" and here is an example template.
  • The description page MUST have a contact email address prominently displayed.
  • Note that multiple composites/tracks can use the same description page by using the html setting.
  • Related tracks should be combined into composites where appropriate. The hub track group should not be overwhelming with individual tracks when they can be combined into a meaningful composite organization. Such use of composites will make user configuration easier.
  • Extremely large hubs may use superTracks as well to achieve a meaningful hierarchy.
  • The shortLabel text should be under 17 characters, or meaningful information may be cut off from display when tracks are set to "dense" visibility.
  • The length for a longLabel should be limited to around 75 characters.
  • It is best to avoid setting a composite track and all of the corresponding subtracks to the same visibility. When you have composite tracks that are hidden by default, it is best to still designate some subtracks to display when the composite track is turned on (visibility dense, versus the default of hide). This provides an example of your track data to users who turn on your composite track. If no subtracks are turned on by default, a user who changes your composite track visibility to "show" won't see anything.
  • If you are making an assembly hub, add a gateway page for each assembly by including an htmlPath line in genomes.txt for each genome not in the Browser. See the Assembly Hubs Wiki (http://genomewiki.ucsc.edu/index.php/Assembly_Hubs).
  • With assembly hubs, we strongly encourage using the following settings in genomes.txt (the last three make it easier to find assembly hub species through the hgGateway search):
    • defaultPos, scientificName, organism, description
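Several of the checks above are mechanical and can be automated before a hub is submitted. The following is a minimal Python sketch, not an official UCSC tool: the function names, the simplified stanza parser, and the decision to count only top-level stanzas (those without a parent line) as default-visible tracks are all assumptions made for illustration. The limits (17, 75, 10) and the setting names are taken from the guidelines above.

```python
# Hypothetical helpers; limits and setting names come from the guidelines above.
SHORT_LABEL_MAX = 17      # shortLabel should be under 17 characters
LONG_LABEL_MAX = 75       # longLabel should be limited to around 75 characters
MAX_DEFAULT_VISIBLE = 10  # no more than 10 tracks displayed by default
RECOMMENDED_GENOME_SETTINGS = (
    "defaultPos", "scientificName", "organism", "description", "htmlPath",
)


def parse_stanzas(text):
    """Split trackDb.txt or genomes.txt text into stanzas (setting -> value)."""
    stanzas, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:                  # a blank line ends the current stanza
            if current:
                stanzas.append(current)
                current = {}
            continue
        if line.startswith("#"):      # skip comment lines
            continue
        key, _, value = line.partition(" ")
        current[key] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def check_track_db(text):
    """Return a list of warnings for the contents of a trackDb.txt file."""
    warnings = []
    visible = 0
    for stanza in parse_stanzas(text):
        name = stanza.get("track", "?")
        short_len = len(stanza.get("shortLabel", ""))
        long_len = len(stanza.get("longLabel", ""))
        if short_len >= SHORT_LABEL_MAX:
            warnings.append(f"{name}: shortLabel has {short_len} characters")
        if long_len > LONG_LABEL_MAX:
            warnings.append(f"{name}: longLabel has {long_len} characters")
        # Simplification: only top-level stanzas count as default-visible tracks.
        if "parent" not in stanza and stanza.get("visibility", "hide") != "hide":
            visible += 1
    if visible > MAX_DEFAULT_VISIBLE:
        warnings.append(f"{visible} tracks are displayed by default")
    return warnings


def missing_genome_settings(text):
    """Map each genome in genomes.txt to the recommended settings it lacks."""
    return {
        stanza.get("genome", "?"): [
            key for key in RECOMMENDED_GENOME_SETTINGS if key not in stanza
        ]
        for stanza in parse_stanzas(text)
    }
```

Both functions take the file contents as a string, so they can be run on local copies of trackDb.txt and genomes.txt before the hub is made public.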

Guidelines for Composites:

  • Have multi-view only when there is more than one view. Views ideally give alternate access to the same data (e.g. signals and called peaks). Keep in mind that the value of views is that they allow for more than one data/configuration type (e.g. bigBed and bigWig) in a single composite. All subtracks of a view must have the same data type. Likewise, all subtracks of a non-multi-view composite must be the same type.
  • Never represent the same subgroup in both view and as a dimension (e.g. NOT dimensions dimX=view). For that matter a subgroup should never be in two dimensions (e.g. NOT dimensions dimX=cell dimY=mark dimA=cell). The composite will appear to function but multiple ways of selecting the same thing will create a confusing and inconsistent User Interface.
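The rule about not reusing a subgroup can be checked directly against the value of a dimensions line. The sketch below is illustrative only: the function name is hypothetical, and it assumes the view subgroup uses the conventional tag "view".

```python
def dimension_problems(dimensions_value):
    """Check the value of a 'dimensions' line, e.g. 'dimX=cell dimY=mark'."""
    problems = []
    seen = {}  # subgroup tag -> first dimension that used it
    for pair in dimensions_value.split():
        dim, _, subgroup = pair.partition("=")
        if subgroup == "view":  # assumption: the view subgroup is tagged "view"
            problems.append(f"{dim} uses the view subgroup as a dimension")
        if subgroup in seen:
            problems.append(
                f"{subgroup} appears in both {seen[subgroup]} and {dim}"
            )
        else:
            seen[subgroup] = dim
    return problems


# The two counter-examples from the guideline above, plus a valid line.
for value in ("dimX=view", "dimX=cell dimY=mark dimA=cell", "dimX=cell dimY=mark"):
    print(value, "->", dimension_problems(value))
```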

Guidelines for Using Dimensions:

  • There should be no dimensions with a single entry (do not have only one cell line represented in dimX=cell), unless data growth is expected to fill in additional entries.
  • Using only one dimension: preferably use dimX (e.g. dimensions dimX=cell). This saves vertical User Interface space, but is not always the best choice.
  • Using two dimensions: use dimX and dimY (e.g. dimensions dimX=cell dimY=mark)
  • Using more than two: use dimX, dimY on the most important dimensions. Then use dimA,B,C as needed on lesser dimensions. (e.g. dimensions dimX=cell dimY=mark dimA=donor_id)
  • The A,B,C.. dimensions should probably use filterComposite (e.g. filterComposite dimA)
  • Each dimension and the view should be represented in sortOrder, ideally in order of dimX, dimY, dimA,B,C, view (e.g. sortOrder cell_type=+ mark=+ donor_id=+ view=+). But the hub user may wish for a different sortOrder, which is fine.
  • Tags of subGroup/dimension should be short and sweet with no special characters. Labels, however, can have HTML codes embedded (e.g. NOT CPG_methylation_%=CPG_methylation_% RATHER mpct=CPG_methylation_&#37;)
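The sortOrder and tag recommendations can also be checked mechanically. This is a rough Python sketch under stated assumptions: the function names are hypothetical, the view subgroup is assumed to be tagged "view", and "no special characters" is interpreted here as letters, digits, and underscores only, which is stricter than the guideline itself states.

```python
import re

# Assumption: a "simple" tag contains only letters, digits, and underscores.
TAG_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_simple_tag(tag):
    """Return True if a subGroup tag has no special characters."""
    return TAG_PATTERN.fullmatch(tag) is not None


def sort_order_gaps(dimensions_value, sort_order_value, has_views=True):
    """List subgroups used in 'dimensions' (and the view) missing from 'sortOrder'."""
    expected = [pair.partition("=")[2] for pair in dimensions_value.split()]
    if has_views:
        expected.append("view")  # assumption: the view subgroup is tagged "view"
    sorted_groups = {pair.partition("=")[0] for pair in sort_order_value.split()}
    return [group for group in expected if group not in sorted_groups]


print(is_simple_tag("mpct"))               # tag from the RATHER example above
print(is_simple_tag("CPG_methylation_%"))  # tag from the NOT example above
print(sort_order_gaps("dimX=cell dimY=mark dimA=donor_id", "cell=+ mark=+ view=+"))
```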

Miscellaneous Guidelines:

  • Metadata lines can be used, but users should be well aware that support for them may be replaced by another system in the future.

Public Hub Examples

The browser's public hubs provide excellent resources to see how others have created hub structures. As a reference for interpreting trackDb.txt lines, use the Hub Track Database Definition glossary. One example of hub configuration and documentation is the ENCODE Analysis hub:

http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/integration_data_jan2011/hub.txt

http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/integration_data_jan2011/genomes.txt

http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/integration_data_jan2011/hg19/trackDb.txt

http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/integration_data_jan2011/hg19/uniformTfbs.html

http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/integration_data_jan2011/hg19/uniformRNA.html

Regarding creating meaningful html documentation, if you are creating a hub based on a paper, we suggest the paper's abstract as a useful start for your track's Description section. The Methods section should have more detail, and please include a contact for questions. Lastly, it is best to assume a broad audience of students as well as researchers. For example, it is best to spell out common acronyms for those who may be new to genomics.

Help for connection issues

Sometimes the servers hosting public hubs will experience administrative changes and no longer successfully serve up hub files. In most cases it is likely that new firewalls are limiting access at the institution and causing these connection problems. Please ask your institution's administrators to add the IP addresses below as exceptions that are not limited.

These IP addresses are currently used by official genome browser mirrors:

  • 128.114.119.* = genome.ucsc.edu
  • 129.70.40.120 = European mirror, genome-euro.ucsc.edu
  • 134.160.84.67 = Asian mirror, genome-asia.ucsc.edu
  • 132.249.245.79 = genome-test.ucsc.edu, used by developers and for debugging

Although our site makes many requests to an institution, each is small and quickly satisfied by the server, so the total load on your web server should be limited, and system administrators will likely have no issue with adding this exception.
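When debugging a blocked hub, it can help to confirm whether a client address seen in the web server logs belongs to one of the browser sites listed above. The following is a small sketch using Python's standard ipaddress module. The function and constant names are hypothetical, the addresses are copied from the list in this revision and may have changed since, and 128.114.119.* is written as the equivalent /24 network.

```python
import ipaddress

# Addresses copied from the list above; they may be out of date.
BROWSER_NETWORKS = [
    ipaddress.ip_network("128.114.119.0/24"),   # 128.114.119.* = genome.ucsc.edu
    ipaddress.ip_network("129.70.40.120/32"),   # genome-euro.ucsc.edu
    ipaddress.ip_network("134.160.84.67/32"),   # genome-asia.ucsc.edu
    ipaddress.ip_network("132.249.245.79/32"),  # genome-test.ucsc.edu
]


def is_browser_address(ip_text):
    """Return True if the address belongs to one of the listed browser sites."""
    address = ipaddress.ip_address(ip_text)
    return any(address in network for network in BROWSER_NETWORKS)


# Example: classify a few client addresses taken from a server log.
for client in ("128.114.119.42", "129.70.40.120", "129.70.40.121"):
    print(client, is_browser_address(client))
```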