Gus

GENOMIC UNIFIED SCHEMA (GUS) 3.5

Minimum dependencies

GUS supports Oracle and PostgreSQL, but since we chose to use PostgreSQL, this manual only covers PostgreSQL installation. Check the GUS home page or the GUS project on SourceForge for further guidance on installing and using GUS with Oracle.

Perl 5.8.1 or above;
– Perl modules: DBI, DBD::Oracle and/or DBD::Pg, Parse::Yapp, XML::Simple, and XML::Parser or XML::SAX;
Java 1.5 or above, including the SDK. We experienced a problem with the JDBC driver included in Java 1.4_06 with GUS installed on Fedora Core 4; you could try that version, but we recommend using Java 1.5.
PostgreSQL 7.4 or above;
Apache Ant 1.6.2 or above;
Tomcat 5.0 or above (if you wish to install the WDK).

Command lines starting with # require root access, and those starting with $ may be executed by regular users. We will use $YOUR_PATH as notation for any directory you choose. We strongly suggest installing all dependencies in a single directory, e.g. /usr/local/.

PostgreSQL Installation

  1. Download PostgreSQL latest stable version;
  2. Decompress it;
  3. Enter the PostgreSQL directory;
  4. Execute the following commands:
$ ./configure
$ su
# gmake
# gmake install
# adduser postgres

From this point on, we create the database itself. Depending on the work your group develops, the database may need a large amount of hard disk space. With that prerequisite in mind, the database may be placed anywhere on your hard disk.

# mkdir /$YOUR_PATH/data
# chown postgres /$YOUR_PATH/data
# su - postgres
$ /usr/local/pgsql/bin/initdb -D /$YOUR_PATH/data

Now you can start PostgreSQL

$ /usr/local/pgsql/bin/postmaster -D /$YOUR_PATH/data -l logfile 2>&1 &

or

$ export PGDATA=/$YOUR_PATH/data
$ /usr/local/pgsql/bin/pg_ctl start -l filename.log &

pg_ctl is a wrapper that starts the postmaster process.
The PGDATA variable can be set in the user's profile file, so you do not have to export it every time.

Check the logfile to see whether the database started successfully.
You can also check the status of PostgreSQL by typing:

$ export PGDATA=/$YOUR_PATH/data
$ /usr/local/pgsql/bin/pg_ctl status

To stop PostgreSQL, type:

$ export PGDATA=/$YOUR_PATH/data
$ /usr/local/pgsql/bin/pg_ctl stop

To view other pg_ctl options:

$ /usr/local/pgsql/bin/pg_ctl --help

Create a test database and connect to it:

$ /$YOUR_PATH/pgsql/bin/createdb test
$ /$YOUR_PATH/pgsql/bin/psql test

Edit .bash_profile or /etc/profile (where [POSTGRES_PATH] is the directory where you installed PostgreSQL on your server):

export PSQL_HOME=/PATH/[POSTGRES_PATH]/
export PATH=$PSQL_HOME/bin:$PATH

For easier GUS administration, the dependency paths should be included either in the .bash_profile of each user who uses or administers GUS (/home/username/.bash_profile), or in /etc/profile as a global setting for all users. Alternatively, you can create a group for GUS users and grant the necessary permissions to that group, as sketched below.
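A minimal sketch of the group-based approach (the group name gusdev and the user name are arbitrary examples):

# groupadd gusdev                # create a group for GUS users/administrators
# usermod -a -G gusdev username  # add each GUS user to the group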

Adding a user and granting permissions

I needed a way to grant permissions on all tables of a given schema. I didn't really know how to do it, so I followed someone's advice. In the script below I grant "SELECT" permission on the Dots, Core and Sres schemas to a role called "aluno". I later added three users to that role.

SELECT 'grant select on ' || n.nspname || '.' || c.relname || ' to \"aluno\";' 
FROM pg_class c, pg_namespace n
WHERE c.relnamespace = n.oid
AND c.relname NOT LIKE 'pg_%' and  c.relkind in ('r','v','s')
AND n.nspname in ('dots','core','sres');
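The query above only generates the GRANT statements; it does not execute them. A minimal sketch of one way to run the generated statements, assuming the query is saved in gengrants.sql and the database is $GUS_DB:

$ psql -At -d $GUS_DB -f gengrants.sql -o grants.sql   # -A -t strip headers, leaving one GRANT per line
$ psql -d $GUS_DB -f grants.sql                        # execute the generated GRANTs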

ANT Installation

  1. Download Apache ANT;
  2. Decompress using the following command:
# tar xzvf apache-ant-x.x.x-bin.tar.gz

Add the following to .bash_profile or /etc/profile:

export ANT_HOME=/$YOUR_PATH/apache-ant-x.x.x/
export PATH=$ANT_HOME/bin:$PATH

JAVA Installation

  1. Download the latest Java version and run the installer in your chosen directory:
# sh jdk-x_x_x_x-linux-i586.bin 

Add the following to .bash_profile or /etc/profile:

export JAVA_HOME=/$YOUR_PATH/jdkx.x.x_xx/
export PATH=$JAVA_HOME/bin:$PATH

DBD-Pg module

In Ubuntu 7.10, the DBD-Pg module should be installed like this:


sudo apt-get install libdbd-pg-perl
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  libdbd-pg-perl
0 upgraded, 1 newly installed, 0 to remove and 4 not upgraded.
Need to get 129kB of archives.
After unpacking 385kB of additional disk space will be used.
Get:1 http://br.archive.ubuntu.com gutsy/universe libdbd-pg-perl 1.49-2build1 [129kB]
Fetched 129kB in 1s (88.0kB/s)     
Selecting previously deselected package libdbd-pg-perl.
(Reading database ... 33537 files and directories currently installed.)
Unpacking libdbd-pg-perl (from .../libdbd-pg-perl_1.49-2build1_i386.deb) ...
Setting up libdbd-pg-perl (1.49-2build1) ...
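On other distributions, the module can usually be installed from CPAN instead (a sketch; it assumes Perl and the PostgreSQL client libraries are already installed):

# perl -MCPAN -e 'install DBD::Pg'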

JDBC driver Installation

  1. Download latest JDBC driver (PostgreSQL compatible) and copy to /$YOUR_PATH/jdk1.5.0_04/lib/;
  2. Add the following to .bash_profile or /etc/profile:
CLASSPATH=$CLASSPATH:$JAVA_HOME/lib/postgresql-x.X-xxx.jar
export CLASSPATH

GUS 3.0 Installation

Old manual generated by Poliana


GUS 3.5 Installation

Create a /GUS/ directory in your chosen path:

$ mkdir GUS

Create a database “$GUS_DB” in PostgreSQL ($GUS_DB stands for any name you want):

$ createdb $GUS_DB

Create a database user “$GUS_USER”, also in PostgreSQL ($GUS_USER stands for any username you might be creative enough to think of):

$ createuser $GUS_USER

Our research group wrote a script to facilitate this process. Copy this script to $GUS_HOME, and fill in the following lines in build.properties:

# Properties for the build.xml buildfile

# Directories to be created
dir.project_home = $YOUR_GUS_PATH/project_home
dir.gus_home     = $YOUR_GUS_PATH/gus_home

Where $YOUR_GUS_PATH is the directory where GUS will be installed (e.g., /GUS/). And further down, in the same file:

# Property values for this file
        gus.propriedade.dbVendor.valor         = Postgres
        gus.propriedade.dbiDsn.valor           = dbi:Pg:dbname=gustest
        gus.propriedade.jdbcDsn.valor          = jdbc:postgresql:gustest
        gus.propriedade.databaseLogin.valor    = usertest
        gus.propriedade.databasePassword.valor = usertest
        gus.propriedade.userName.valor         = dba
        gus.propriedade.group.valor            = dba
        gus.propriedade.project.valor          = Database administration

All these lines come pre-filled in the script with the values listed above by default. They should be adjusted to match the database and the user you created in Postgres, as shown above. Next, change the permissions on the /GUS path to give access to the GUS administrators:

# chown -R username:group /usr/local/GUS

Add the following lines to the .bash_profile of each user administering GUS (or, as root, to /etc/profile as a global default):

export GUS_HOME=/$YOUR_PATH/GUS/gus_home
export PROJECT_HOME=/$YOUR_PATH/GUS/project_home
export GUS_CONFIG_FILE=/$YOUR_PATH/GUS/gus.properties
export PERL5LIB=$GUS_HOME/lib/perl
export PATH=$GUS_HOME/bin:$PROJECT_HOME/install/bin:$PATH

To register the basic plugins, use the following command:

$ ga +meta --commit
Bug:

[amanda1@kineto4 ~]$ ga +create GUS::Supported::Plugin::LoadGusXml
FATAL: Database version does not match required version for Plugin.
Database Version:     Plugin Version: 3.5
FATAL: Failed running 'ga +meta --commit' with stderr:

Solution: insert the missing version row manually by running the following against your GUS database (the timestamp is only an example):


INSERT INTO "databaseversion" ("database_version_id", "version", "modification_date", "user_read", "user_write", "group_read", "group_write", "other_read", "other_write", "row_user_id", "row_group_id", "row_project_id", "row_alg_invocation_id") VALUES (1, 3.5, '2007-08-02 16:02:31.694305', 1, 1, 1, 1, 1, 0, 1, 1, 1, 1)
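One way to apply the fix (a sketch, assuming you saved the statement above as fix_databaseversion.sql and use the gustest database and usertest user from the examples above):

$ psql -U usertest -d gustest -f fix_databaseversion.sql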

GUS Install debugging

GUS Install troubleshooting

WDK Installation

The GUS WDK is designed to accelerate the creation of "data mining" websites. It can work on any relational database system and on any schema.

Download latest WDK version;
Decompress WDK_version_x-x.tar.gz in /$PROJECT_HOME;
Execute:

build WDKToySite install -append;

Edit the following lines of toyModel-config.xml, located in /$GUS_HOME/config:

	<modelConfig login="usertest"
	             password="usertest"
	             connectionUrl="jdbc:postgresql:gustest"
	             queryInstanceTable="QueryInstance"
	             maxQueryParams="50"
	             platformClass="org.gusdb.wdk.model.implementation.PostgreSQL"
	             initialSize="3"
	             maxActive="20"
	             maxIdle="3"
	             minIdle="3"
	             maxWait="500"/>

Create a Postgres schema for the GUS DB:

$ psql -U usertest gustest
gustest=# CREATE SCHEMA usertest;
gustest=# \q

Execute the following commands: wdkCache, wdkTestDb, wdkSanityTest

$ wdkCache -model toyModel -new
$ wdkTestDb -model toyModel -create
  • NOTE: As of July 21st, 2007, the creation of the test database does not work for PostgreSQL. It throws the following error message:
ERROR: column "row_project_id" is of type numeric but expression is of type character varying
java.sql.SQLException: ERROR: column "row_project_id" is of type numeric but expression is of type character varying
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1471)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1256)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:175)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:392)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:330)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:282)
        at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:233)
        at org.gusdb.wdk.model.implementation.SqlUtils.getResultSet(SqlUtils.java:152)
        at org.gusdb.wdk.model.test.TestDBManager.main(TestDBManager.java:116)

To get around it, you can manually add data to the tables.

  1. Manually edit each of the files in WDKToySite/Model/data/testTables and remove its first line (a sketch of this step follows the list).
  2. Run the command "wdkTestDb -model toyModel -create" 8 times to create the 8 tables.
  3. Load the data using \copy from Postgres, as described below.
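A minimal sketch of step 1, assuming the original files live under WDKToySite/Model/data/testTables and the header-stripped copies are kept in a separate directory:

$ mkdir ~/modifiedTestTables
$ for f in WDKToySite/Model/data/testTables/*; do sed '1d' "$f" > ~/modifiedTestTables/"$(basename "$f")"; done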


cd /home/kary1/data/modifiedTestTables

[kary1@kineto4 modifiedTestTables]$ psql
Password: 
Welcome to psql 8.2.4, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit


kary1=> \copy wdktestassembly from 'WDKTestAssembly' with null as ''
kary1=> \copy wdktestest from 'WDKTestEst' with null as ''
kary1=> \copy wdktestassemblysequence from 'WDKTestAssemblySequence' with null as ''
kary1=> \copy wdktestlibrary from 'WDKTestLibrary' with null as ''
kary1=> \copy wdktestzipcode from 'WDKTestZipCode'
kary1=> \copy wdktestgoterm from 'WDKTestGoTerm'
kary1=> \copy wdktesttaxonname from 'WDKTestTaxonName'
kary1=> \copy wdktestexternalnasequence from 'WDKTestExternalNaSequence'

Now you can proceed to the sanity tests.

$ wdkSanityTest -model toyModel

The last command should print a message like the following (the number of queries passed may change from version to version):

14 queries passed, 0 queries failed
2 records passed, 0 records failed
6 questions passed, 0 questions failed
Sanity Test PASSED


For further documentation on the WDK, check the GUS Web Development Kit website.

TOMCAT Installation

Download latest Apache Tomcat version;
Decompress in your chosen directory;
Edit /$YOUR_PATH/jakarta-tomcat-x.x.xx/conf/server.xml, starting at line 376:

<Context path="/wdktoysite"
         docBase="/usr/local/GUS/gus_home/webapp"
         debug="1"
         privileged="false" allowLinking="true" >
    <Logger className="org.apache.catalina.logger.FileLogger"
            prefix="wdktoysite-log."
            suffix=".txt"
            timestamp="true"
            docBase="/usr/local/GUS/gus_home/webapp/logs" />
</Context>

</Host>

In /$YOUR_PATH/jakarta-tomcat-x.x.xx/conf/tomcat-users.xml, add the following line before the closing tag:

<user username="admin" password="admin" roles="manager,role1,tomcat"/>
</tomcat-users>

Inside /$GUS_HOME, create a directory named /webapp, and inside it, create a directory named /logs;
Still in /$GUS_HOME, create a file named webapp.conf, using the following command line:

$ echo "webappTargetDir=/usr/local/GUS/project_home/webapp/" > webapp.conf

Then, build WDKToySite:

$ build WDKToySite webinstall -append -webPropFile webapp.conf

Inside $GUS_HOME/webapp execute the following commands:

$ cd WEB-INF
$ cp web.xml.toy web.xml
$ cd tags/site
$ cp footer.tag.toy footer.tag
$ cp header.tag.toy header.tag

Initialize TOMCAT:

# /$YOUR_PATH/jakarta-tomcat-x.x.xx/bin/startup.sh

Make sure port 8080 is accepting connections:

# /sbin/iptables -I INPUT -p tcp --dport 8080 -j ACCEPT

To test your WDK, open a browser and access http://localhost:8080/wdktoysite.

Plugins

GUS comes with several plugins, and different groups have different needs, so the plugins you need may not all be listed below. Search the mailing list for help with plugins not listed here, or email the gusdev mailing list to get help from the community. The plugins are listed in the following order because some entries made by one plugin may be needed by the next one. The descriptions are currently copied from the help text of each plugin. Any value marked with a $ (as in $YOUR_DB_NAME) is a field that should be filled in with information from your project, or from yourself.

GUS::Supported::Plugin::LoadGusXml

DESCRIPTION

Inserts data in a simple GUS XML format into GUS.

	$ ga +create GUS::Supported::Plugin::LoadGusXml --commit

	$ ga GUS::Supported::Plugin::LoadGusXml --filename 
	/$YOUR_PATH/ExternalDatabase.xml --commit

	$ ga GUS::Supported::Plugin::LoadGusXml --filename 
	/$YOUR_PATH/ExternalDatabaseRls.xml --commit

	$ ga GUS::Supported::Plugin::LoadGusXml --filename 
	/$YOUR_PATH/SequenceType.xml --commit

The XML files can be found at http://www.gusdb.org/wiki/index.php/Bootstrap%20data. You may map your own XML files, but that may require some experience, not to mention some knowledge of the GUS Schema. We suggest, however, that you add the following entries to ExternalDatabase.xml and ExternalDatabaseRls.xml:

  • gb (GenBank);
  • emb (EMBL Data Library);
  • dbj (DNA Database of Japan);
  • pir (NBRF PIR);
  • prf (Protein Research Foundation);
  • sp (Swiss-Prot);
  • pdb (Brookhaven Protein Data Bank);
  • pat (Patents);
  • bbs (GenInfo Backbone Id);
  • gnl (general database reference);
  • ref (NCBI Reference Sequence);
  • lcl (local sequence identifier);
  • genpept (GenPept);
  • tpe (Third Party Annotation, EMBL);
  • tpd (Third Party Annotation, DDBJ);
  • tpg (Third Party Annotation, GenBank).

These entries refer to several database repositories that are needed when the LoadNRDB plugin is used (below). In the XML files provided by the GUS Schema, the entries needed for it are marked with (nrDB). However, a few entries are missing: genpept, lcl, tpd, tpe, tpg. There are two ways to solve this:

1) Edit the ExternalDatabase.xml and ExternalDatabaseRls.xml files. The original files hold several entries for external databases that you will probably not use, so there is no real need to load them all. You can edit both files so that only the databases you will actually use are loaded. Below is a model for both files:

This is the model for ExternalDatabase.xml:

<SRes::ExternalDatabase>
  <external_database_id>$DATABASE_ID</external_database_id>
  <name>$DATABASE_NAME</name>
</SRes::ExternalDatabase>

Where $DATABASE_NAME stands for the name of the external database you need, and $DATABASE_ID for its number. The plugin GUS::Supported::Plugin::InsertExternalDatabase (described below) will check for repeated external database ids, and will crash if it finds one.

The following XML structure refers to ExternalDatabaseRls.xml:

<SRes::ExternalDatabaseRelease>
  <external_database_id>$DATABASE_ID</external_database_id>
  <id_type>$ID_TYPE</id_type>
  <description>$DESCRIPTION</description>
  <version>$VERSION</version>
</SRes::ExternalDatabaseRelease>

Where $DATABASE_ID must be the same one present in ExternalDatabase.xml (or the plugin will crash); $ID_TYPE describes the type of entry (such as accession for GenBank entries); $DESCRIPTION is where you can type a short description of your release; and $VERSION stands for the version of your release. Filling in each of these fields for each external database you will be using, and loading a single XML file with all the external databases your group needs, may feel handier than repeating many command lines, which is the alternative below.

2) Use InsertExternalDatabase and InsertExternalDatabaseRls to load your external databases. Both plugins exist to do the work of loading this information into the correct tables in GUS. If you use a lot of external databases, loading all that information at once in an XML file (as described above) may feel a lot handier than repeating a lot of command lines. These plugins are useful, however, if you only need to load one or two extra databases and don't want to bother mapping them in an XML file (both plugins are described below).

GUS::Supported::Plugin::InsertExternalDatabase

DESCRIPTION

Simple plugin that is the easiest way to create a row representing a new database that exists in the outside world that will be imported into GUS. This entry serves as a stable identifier across multiple releases of the database (which are stored in SRes.ExternalDatabaseRelease and point back to the entry created by this plugin). Protects against making multiple entries in GUS for an external database that already exists there. Creates a new entry in table SRes.ExternalDatabase to represent a new source of data imported into GUS.

	$ ga +create GUS::Supported::Plugin::InsertExternalDatabase --commit

 	
	$ ga GUS::Supported::Plugin::InsertExternalDatabase --name $EXT_DATABASE_NAME
    --commit

GUS::Supported::Plugin::InsertExternalDatabaseRls

DESCRIPTION

Simple plugin that is the easiest way to create a row representing a new release of a database from an external source. Protects against making an entry for a version that already exists for a particular database. Creates new entry in table SRes.ExternalDatabaseRelease for new external database versions.

 	$ ga +create GUS::Supported::Plugin::InsertExternalDatabaseRls --commit

	$ ga GUS::Supported::Plugin::InsertExternalDatabaseRls --databaseName
    $YOUR_DATABASE_NAME  --databaseVersion $VERSION --commit

GUS::Supported::Plugin::LoadTaxon

DESCRIPTION

Populates the SRes::Taxon, SRes::GeneticCode, and SRes::TaxonName tables using the nodes.dmp, names.dmp, gencode.dmp and merged.dmp files downloaded from NCBI. The NCBI taxonomy files can be found on the NCBI taxonomy FTP site.


	$ ga +create GUS::Supported::Plugin::LoadTaxon --commit

	$ ga GUS::Supported::Plugin::LoadTaxon --nodes /$YOUR_PATH/nodes.dmp 
	--names /$YOUR_PATH/names.dmp --gencode /$YOUR_PATH/gencode.dmp 
	--merged /$YOUR_PATH/merged.dmp --commit


GUS::Supported::Plugin::LoadRow

DESCRIPTION

This plugin either inserts or updates a row in the database, depending upon whether you provide a primary key as one of the attributes. If you do, that row will be read, updated with the attribute values you supply, and written. Otherwise, a new row will be inserted.

	$ ga +create GUS::Supported::Plugin::LoadRow --commit

	$ ga GUS::Supported::Plugin::LoadRow --tablename Core::UserInfo 
	--attrlist "login,password,first_name,last_name,e_mail" --valuelist 
	$A_LOGIN_HERE^^^$A_PASSWORD_HERE^^^$YOUR_1ST_NAME^^^$YOUR_LAST_NAME^^^
	$A_EMAIL  --commit

	$ ga GUS::Supported::Plugin::LoadRow --tablename Core::GroupInfo 
	--attrlist name --valuelist $A_GROUP_NAME --commit
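As a concrete illustration with made-up values (note that ^^^ separates the values, in the same order as the fields given in --attrlist):

	$ ga GUS::Supported::Plugin::LoadRow --tablename Core::UserInfo 
	--attrlist "login,password,first_name,last_name,e_mail" 
	--valuelist "jdoe^^^secret^^^John^^^Doe^^^jdoe@example.org" --commit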

If you want to test, you may use our XML file below:

        
	$ echo "<Core::ProjectInfo>
	  <name>test</name>
	  <description>No projects were hurt or killed during this test</description>
	</Core::ProjectInfo>
	//" > testproject.xml

Then load it with the following command:

	$ ga GUS::Supported::Plugin::LoadGusXml --filename testproject.xml 
	--commit

GUS::Supported::Plugin::InsertSequenceOntologyOBO

DESCRIPTION

Extracts the id, name, and def fields from a so.obo file and inserts new entries into the SRes.SequenceOntology table in the form so_id, ontology_name, so_version, so_cvs_version, term_name, definition. The InsertSequenceOntologyOBO plugin replaced the old InsertSequenceOntology because the old .definition format became outdated. The new plugin uses the .obo format, available from the Sequence Ontology project.

The following change has to be made in the database in order for the sequence ontology to be loaded properly:

	gustest=# ALTER TABLE sres.sequenceontology ALTER definition TYPE  
	character varying(2000);

Also, several constraints have to be dropped for the process to work properly. They are located in the sres.goevidencecode, sres.gorelationship, sres.gorelationshiptype, sres.gosynonym, and sres.goterm tables. Following is an example, using the sres.gorelationship table:

 
gustest=# \d sres.gorelationship
                    Table "sres.gorelationship"
         Column          |            Type             | Modifiers
-------------------------+-----------------------------+-----------
 go_relationship_id      | numeric(10,0)               | not null
 parent_term_id          | numeric(10,0)               | not null
 child_term_id           | numeric(10,0)               | not null
 go_relationship_type_id | numeric(10,0)               | not null
 modification_date       | timestamp without time zone | not null
 user_read               | numeric(1,0)                | not null
 user_write              | numeric(1,0)                | not null
 group_read              | numeric(1,0)                | not null
 group_write             | numeric(1,0)                | not null
 other_read              | numeric(1,0)                | not null
 other_write             | numeric(1,0)                | not null
 row_user_id             | numeric(12,0)               | not null
 row_group_id            | numeric(4,0)                | not null
 row_project_id          | numeric(4,0)                | not null
 row_alg_invocation_id   | numeric(12,0)               | not null
Indexes:
    "gorelationship_pk" PRIMARY KEY, btree (go_relationship_id)
    "gorelationship_ind01" btree (parent_term_id, child_term_id)
    "gorelationship_ind02" btree (child_term_id, parent_term_id)
    "gorelationship_ind03" btree (go_relationship_type_id)
Foreign-key constraints:
    "gorelationship_fk02" FOREIGN KEY (child_term_id) REFERENCES sres.goterm(go_term_id)
    "gorelationship_fk03" FOREIGN KEY (go_relationship_type_id) REFERENCES
 sres.gorelationshiptype(go_relationship_type_id)
    "gorelationship_fk01" FOREIGN KEY (parent_term_id) REFERENCES sres.goterm(go_term_id)

The foreign keys should be dropped with the following command (e.g., dropping gorelationship_fk02):

gustest=# alter table sres.gorelationship drop constraint gorelationship_fk02;

These are the commands for InsertSequenceOntologyOBO:

	$ ga +create GUS::Supported::Plugin::InsertSequenceOntologyOBO --commit

	$ ga GUS::Supported::Plugin::InsertSequenceOntologyOBO --inputFile 
	/home/thiago/work/GO/so.definition --soCvsVersion '1.42' --soVersion '1.419' 
	--user $USER --group $GROUP --commit

GUS::Community::Plugin::GBParser

DESCRIPTION

Module used to parse GenBank records into the DB. Replaced by InsertSequenceFeatures.

	$ ga +create GUS::Community::Plugin::GBParser --commit

	$ ga GUS::Community::Plugin::GBParser --file 
	$YOUR_PATH/$ANY_FILE.gb --db_rel_id 22  --gbRel '148.0' 
	--commit

Where db_rel_id refers to the “external_database_id” column in the sres.externaldatabase table, and gbRel is the GenBank release number found on the NCBI FTP site. GBParser is the old plugin used by GUS to load sequences in GenBank format into its tables. The GBParser plugin is no longer supported by the GUS community because it can no longer handle all the features present in the GenBank format. Its successor, InsertSequenceFeatures, handles features using an XML file to map them.
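If you are unsure which id to pass, you can look it up directly (a sketch, assuming the gustest database and usertest user from earlier):

$ psql -U usertest -d gustest -c "SELECT external_database_id, name FROM sres.externaldatabase;"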

GUS::Supported::Plugin::LoadFastaSequences

DESCRIPTION

Inserts or updates sequences from a FASTA file or a set of FASTA files. A set of regular expressions provided on the command line extracts various pieces of information from the definition lines of the input sequences to store in the database.

	$ ga +create GUS::Supported::Plugin::LoadFastaSequences --commit

	$ ga GUS::Supported::Plugin::LoadFastaSequences  --externalDatabaseName 
	'$YOUR_EXT_DB_NAME' --externalDatabaseVersion '$YOUR_EXT_DB_VERSION'
	--regexSourceId 'gb\|(\w+)' --sequenceFile /$YOUR_PATH/$YOUR_SEQUENCE_FILE.fasta 
	--regexChromosome 'chromosome\s(\w+)' --regexDesc '\|\s(.*$)'
	--sequenceTypeName 'DNA' --tableName 'DoTS::ExternalNASequence' --regexSeqVersion 
	'gb\|\w+\.(\d+)' --commit

GUS::Supported::Plugin::LoadNRDB

DESCRIPTION

Plugin that inserts and updates rows in ExternalAASequence and NRDBEntry using data in the nr protein files downloaded from NCBI.

	$ ga GUS::Supported::Plugin::LoadNRDB --externalDatabaseName "$YOUR_NR_NAME"
 	--externalDatabaseVersion "$NR_VERSION" --gitax gi_taxid_prot.dmp --nrdbFile
        /$YOUR_PATH/$YOUR_NR_FILE --sourceDB "$YOUR_DATABASE_ID:$YOUR_DATABASE_NAME" 
        --tempLogin $YOUR_LOGIN --tempPassword $YOUR_PASSWD --dbiStr 'dbi:Pg:dbname=$YOUR_DB_NAME'
        --maketemp --restartTempTable --plugin --verbose --commit

Note: the nr database needs an external database reference, which should be created with the InsertExternalDatabase plugin or loaded in ExternalDatabase.xml (as described before). The several databases that make up nr also need external database references. They are: gb (GenBank), emb (EMBL Data Library), dbj (DNA Database of Japan), pir (NBRF PIR), prf (Protein Research Foundation), sp (Swiss-Prot), pdb (Brookhaven Protein Data Bank), pat (Patents), bbs (GenInfo Backbone Id), gnl (general database reference), ref (NCBI Reference Sequence), lcl (local sequence identifier), genpept (GenPept), and tpe, tpd, tpg (the Third Party Annotation divisions of EMBL, DDBJ and GenBank, respectively).

GUS::Supported::Plugin::InsertSequenceFeatures

DESCRIPTION

Insert files containing NA sequence and features, that are in a format handled by bioperl's Bio::SeqIO parser package (eg, genbank, embl, TIGR). See that package for more details about what the bioperl parser can handle. Also supports files in GFF2 and GFF3 format.

(Note that, like all plugins that start with Insert, this plugin only inserts rows and never updates them. This means that the plugin cannot be used to update an existing set of sequences in the database. It assumes a database management strategy that rebuilds from scratch the set of sequences instead of attempting the much more difficult task of updating. This strategy will work for most datasets, but not for huge ones such as the human genome.)

The sequence level attributes that are currently supported are: taxon, SO assignment, comments, keywords, secondary accessions and references. The sequence may be absent from the input file. But, in that case, it must have been loaded into GUS already. (The seqIdColumn argument indicates which column of the subclass specified by the naSequenceSubclass argument should be matched against the input to locate the proper sequence in the database.)

This plugin is designed to be flexible in its mapping from the features and qualifiers found in the input to the tables in GUS. The flexibility is specified in the "XML mapping file" provided on the command line (--mapFile). A default mapping is provided as a reference (see $GUS_HOME/config/genbank2gus.xml). That file provides a mapping for the complete genbank feature table. It is often the case that your input may come from unofficial sources that, while conforming to one of the bioperl supported formats, invent features or qualifiers, and stuff unexpected values into already existing qualifiers. In that case, you will need to write your own mapping file. Use the provided default as an example. The main purpose of the file is to specify which subclass of NAFeature should store the input feature, and which columns should store which qualifiers.

Another way to generate a template mapping file is by using the command reportFeatureQualifiers. This command analyzes a set of input files, and reports the feature/qualifier structure found in them. It can output the report in simple text form, or in XML that is a template for the mapping file.

There is an additional level of configurability provided by "pluggable" qualifier handlers. As you will see in the default XML file, some qualifiers do not fit neatly into a column in the feature table. Those that don't are called "special cases." In the XML file you can declare the availability of one or more special case qualifier handlers to handle those special cases. The default file declares one of these at the top of the file. Your XML file can declare additional handlers that you write. (See the code in the default handler to learn how to write your own.) In the XML file, qualifiers that need special handling specify the name of the handler object and a method in it to call to handle the qualifier.

You can also use a qualifier handler to modify or even ignore the feature that the qualifier belongs to. An argument to the handler is $feature, so if you need to modify it directly you can. Also, if the handler returns undef, that is a signal to force the entire feature to be ignored. An advanced use of the handler exploits the fact that the handler method is an instance method on a handler object that lives throughout the plugin run. So, you can store state in the handler object during one call to a handler method, and that state will be available to other calls. This is a way to combine data from more than one qualifier.

	$ ga +create GUS::Supported::Plugin::InsertSequenceFeatures --commit

	$ ga GUS::Supported::Plugin::InsertSequenceFeatures --mapfile
	$YOUR_PATH/genbank2gus.xml --seqFile YOUR_GB_SEQUENCE --fileFormat $FILE_FORMAT 
        --extDbName "$EXT_DB_NAME" --extDbRlsVer "$EXT_DB_RLS_VERSION" --soCvsVersion 1.41

GUS::Supported::Plugin::InsertBlastSimilarities

This plugin loads BLAST results from a condensed file format into the DoTS.Similarity table. It was a common practice in several GUS user groups to first load all sequences into GUS, and only then dump them into a FASTA file for BLASTing. When sequences are loaded into the database, they receive a GUS id (na_sequence_id, aa_sequence_id) that is a primary key to a table in the GUS schema. The FASTA file dumped from the database ($GUS_HOME/bin/dumpSequencesFromTable.pl) would have these ids, and only these ids, in the defline (the header of each FASTA record). A more modern approach uses a source_id (e.g. a GenBank accession) as the main identifier, and regular expressions to extract it from the deflines.

ga GUS::Supported::Plugin::InsertBlastSimilarities --file file
    --subjectTable tablename [--subjectTableSrcIdCol string]
    [--subjectExtDbRlsVer string] [--subjectExtDbName string] --queryTable
    tablename [--queryTableSrcIdCol string] [--queryExtDbRlsVer string]
    [--queryExtDbName string] [--batchSize integer] [--noHSPs ]
    [--testnumber integer] [--restartAlgInvs string-list] [--subjectsLimit
    integer] [--hspsLimit integer] [--minSubjects integer] [--maxSubjects
    integer] [--subjectPvalue float] [--subjectPctIdent float]
    [--subjectMatchLength integer] [--hspPvalue float] [--hspPctIdent float]
    [--hspMatchLength integer] [--commit ] [--debug ] [--sqlVerbose ]
    [--verbose ] [--veryVerbose ] [--user string] [--group string]
    [--project string] [--comment string] [--algoinvo integer]
    [--gusconfigfile file]

The required commandline arguments are the following:

 --file *file* (Required)

        Input file containing BLAST results in condensed form

        format: The format produced by the blastSimilarities command (a line
        describing the query, one line per Subject hit, followed by one for
        each HSP)

This "BLAST condensed" file should look like this:

            >97336279 (2 subjects)
               Sum: smart00177:270:2e-24:1:72:63:278:1:72:49:54:0:+3
                HSP1: smart00177:49:54:72:270:2e-24:1:72:63:278:0:+3
               Sum: pfam00025:85:4e-17:1:70:72:278:1:70:41:51:0:+3
                HSP1: pfam00025:41:51:70:85:4e-17:1:70:72:278:0:+3
            >97336342 (0 subjects)

Below, each value and its meaning are represented separately for better visualization:

          >97336344 (1 subjects)          ->              >querySourceId (# subjects)

          Sum:                            ->              Sum:

           COG5096:                       ->                subjectSourceId:
           48:                            ->                score:
           4e-06:                         ->                pValue:
           65:                            ->                minSubjectStart:
           99:                            ->                maxSubjectEnd:
           257:                           ->                minQueryStart:
           361:                           ->                maxQueryEnd:
           1:                             ->                numberOfMatches:
           35:                            ->                totalMatchLength:
           20:                            ->                numberIdentical:                 
           28:                            ->                numberPositive:
           0:                             ->                isReversed: 
           +2                             ->                readingFrame

The same follows for HSP:

         HSP1:                            ->               HSP:
          
          COG5096:                        ->                subjectSourceId:
          20:                             ->                numberIdentical:            
          28:                             ->                numberPositive:
          35:                             ->                matchLength:
          48:                             ->                score:
          4e-06:                          ->                pvalue:
          65:                             ->                subjectStart:
          99:                             ->                subjectEnd:
          257:                            ->                queryStart:
          361:                            ->                queryEnd:
          0:                              ->                isReversed:
          +2                              ->                readingFrame   

This condensed file may be generated from default BLAST output files using the script parseBlastFilesForSimilarity.pl, which can be found in $GUS_HOME/bin.

An example follows:

echo /home/john_doe/random_blast_run.blast | parseBlastFilesForSimilarity.pl --regex="(\S+).*" 
--outputFile random_blast_run.condensed

The regex argument aims to extract the source_ids from the BLAST output file.

 --subjectTable *tablename* (Required)

        Subjects are taken from this table (schema::table format, eg,
        DoTS::ExternalNaSequence)

This argument tells the plugin which table inside the GUS schema holds the subject sequences (assuming you have already inserted them, either with InsertSequenceFeatures or LoadFastaSequences).

 --queryTable *tablename* (Required)

        Queries are taken from this table (schema::table format, eg,
        DoTS::ExternalNaSequence)

This argument tells the plugin which table inside the GUS schema holds the query sequences, which, again, should have been inserted previously, either with InsertSequenceFeatures or LoadFastaSequences.

The subjectTable and queryTable arguments indicate which tables the subject and query sequences are in, respectively. (This is because the Similarity table records similarities between table/row pairs in GUS; therefore, the plugin must be told which tables the sequences are in.) The plugin assumes all query sequences are in a single table, which may be different from the single table where all the subject sequences are stored. A complete invocation is sketched below.
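A minimal sketch of a complete invocation, assuming nucleotide queries stored in DoTS::ExternalNASequence and protein subjects stored in DoTS::ExternalAASequence (adjust both tables to wherever your sequences actually live):

	$ ga GUS::Supported::Plugin::InsertBlastSimilarities --file random_blast_run.condensed 
	--queryTable DoTS::ExternalNASequence --subjectTable DoTS::ExternalAASequence --commit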

GUS::GOPredict::Plugin::LoadGoOntology

DESCRIPTION

This plugin populates the sres.GoTerm, sres.GoRelationship and sres.GoSynonym tables. First, it is necessary to download the GO ontology data (distributed by the Gene Ontology project) and save it in /usr/local/src/gus-related/ontology/. The files needed are:

   * process.ontology
   * component.ontology
   * function.ontology

Register the LoadGoOntology plugin:

$ ga +create GUS::GOPredict::Plugin::LoadGoOntology --commit

Add data to sres.GoRelationshipType by creating an XML file with the following content:

<GUS::Model::SRes::GORelationshipType>
  <go_relationship_type_id>1</go_relationship_type_id>
  <name>isa</name>
</GUS::Model::SRes::GORelationshipType>

<GUS::Model::SRes::GORelationshipType>
  <go_relationship_type_id>2</go_relationship_type_id>
  <name>partof</name>
</GUS::Model::SRes::GORelationshipType>
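Load this file with the LoadGusXml plugin described earlier (the file name gorelationshiptype.xml is just an example):

$ ga GUS::Supported::Plugin::LoadGusXml --filename gorelationshiptype.xml --commit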

Finally, upload the ontology data with the following command:

ga GUS::GOPredict::Plugin::LoadGoOntology --file_path=/usr/local/src/gus-related/ontology/ \
                                          --process_db_id=3003 \
                                          --process_ext_db_rel=165 \
                                          --function_db_id=3001 \
                                          --function_ext_db_rel=164 \
                                          --component_db_id=3002 \
                                          --component_ext_db_rel=166 \
                                          --commit \
                                          >& /home/apps/GUS/dev/uploads/go-ontology-upload.out

Creating your own plugin

My initial suggestion is that you take a look at the LoadDepreciatedFeatureIds and GenerateKeywords plugins. The first is simpler, and the second has some functions that may be useful.

They are located at: /home/pablo1/GUS/project_home/TcruziDBData/Load/plugin/perl/LoadDepreciatedFeatureIds.pm and /home/pablo1/GUS/project_home/BiowebDB/Load/plugin/perl/GenerateKeywords.pm

Structure

The plugin you are going to write has the same structure.

  • It starts with a package definition:
    package BiowebDB::Load::Plugin::GenerateKeywords;
  • It uses the Plugin superclass:
    use GUS::PluginMgr::Plugin;
  • And it includes modules that provide access to each GUS table you are going to use:
    use GUS::Model::DoTS::IndexWord;
  • The "plugin global variables" section looks much the same in every plugin. That is where the plugin documentation that GUS reads is kept.
  • The $argsDeclaration variable holds the declaration of the arguments we receive via the command line.
  • The "new" method is standard; there is no need to change it.
  • The "run" method is where the plugin code itself goes. It is good practice, of course, to create procedures (subs) to make the code easier to test and read. You probably already know all this.

Example

Let us describe the problem through an example: a plugin that maps alternative identifiers to the main identifier of our database (source_id).

  1. Input parameters:
mapFile (a tab-delimited file with source_id, alternative_id)
sourceExtDbName (name of the external database the source_ids came from; example: pfa_gb)
sourceExtDbRlsVer (version of the database above)
altExtDbName (name of the external database the alternative_ids came from; this one will have to be created; example: plasmo_db)
altExtDbRlsVer (version of the database above)
  2. Read $mapFile, building a hash of the form $idMap{$source_id} = \@alternative_ids.
  3. For each $source_id, use retrieveFromDB() to fetch a record from DoTS.IdentifierGroup if it exists, or create one if it does not:
my $iw = GUS::Model::DoTS::IdentifierGroup->new({'source_id' => $source_id,}); 
$iw->retrieveFromDB();
  4. Insert a record in DoTS.Identifier for each $alternative_id in @alternative_ids, with the proper mapping to DoTS.IdentifierGroup.
  5. Print a success or failure message, with the count of inserted records.

How to install and use the new plugin

Run the installation to copy it to $GUS_HOME:

build TcruziDBData/Load install -append

Mappings: ApiDB, TcruziDB and ProtozoaDB

The purpose of the InsertExternalDatabaseLink plugin is to load into the sres.externaldatabaselink table the information needed to map our sequences to sequences in external databases, so that a link to the corresponding external database URL can be built from the id_url column of the sres.externaldatabaserelease table.

1) The current database has no entries for the following databases: GeneDB_Tcruzi, TcruziDB, PlasmoDB, Supfamily.

So they will have to be loaded into the sres.externaldatabase and sres.externaldatabaserelease tables.

The GeneDB_Pfalciparum database is already loaded in sres.externaldatabase, so only the corresponding entry in sres.externaldatabaserelease still needs to be loaded.

To do this, use the GUS plugins InsertExternalDatabase and InsertExternalDatabaseRls.

Example:

ga GUS::Supported::Plugin::InsertExternalDatabase --name GeneDB_Tcruzi  --commit

ga GUS::Supported::Plugin::InsertExternalDatabaseRls --databaseName GeneDB_Tcruzi --databaseVersion 2.1 
--idUrl 'http://www.genedb.org/genedb/Search?submit=Search+for&organism=tcruzi&desc=yes&wildcard=yes&name=' --commit
 
ga GUS::Supported::Plugin::InsertExternalDatabase --name TcruziDB --commit

ga GUS::Supported::Plugin::InsertExternalDatabaseRls --databaseName TcruziDB --databaseVersion 5.1 
--idUrl 'http://tcruzidb.org/tcruzidb/showRecord.do?name=GeneRecordClasses.GeneRecordClass&primary_key=' --commit
 
ga GUS::Supported::Plugin::InsertExternalDatabase --name PlasmoDB --commit

ga GUS::Supported::Plugin::InsertExternalDatabaseRls --databaseName PlasmoDB --databaseVersion 5.3 
--idUrl 'http://www.plasmodb.org/plasmo/showRecord.do?name=GeneRecordClasses.GeneRecordClass&primary_key=' --commit
 
ga GUS::Supported::Plugin::InsertExternalDatabase --name Supfam_pfal --commit

ga GUS::Supported::Plugin::InsertExternalDatabaseRls --databaseName Supfam_pfal --databaseVersion 1.69 
--idUrl 'http://www.supfam.org/SUPERFAMILY/cgi-bin/gene.cgi?genome=pl;seqid=' --commit

ga GUS::Supported::Plugin::InsertExternalDatabase --name Supfam_tcruzi --commit

ga GUS::Supported::Plugin::InsertExternalDatabaseRls --databaseName Supfam_tcruzi --databaseVersion 1.69 
--idUrl 'http://www.supfam.org/SUPERFAMILY/cgi-bin/gene.cgi?genome=uz;seqid=' --commit

ga GUS::Supported::Plugin::InsertExternalDatabaseRls --databaseName GeneDB_Pfalciparum --databaseVersion 2.1 
--idUrl 'http://www.genedb.org/genedb/Search?submit=Search+for&organism=malaria&desc=yes&wildcard=yes&name=' --commit

2) Now we can run the plugin that loads the sres.externaldatabaselink table.

Plugin definition:

$ ga TcruziDBData::Load::Plugin::InsertExternalDatabaseLink --mapFile $MAPFILE --sourceExtDbName $DATABASENAME 
--extDbRlsVer $VERSIONEXTERNALDB --tableName $TABLENAME --commit

$MAPFILE is a tab-delimited file containing the data that maps each source_id to the identifier in the external database. $DATABASENAME is the name of the external database. $VERSIONEXTERNALDB is the version of the external database mentioned above. $TABLENAME is the name of the table the GUS-generated sequence_id comes from (e.g. dots.TranslatedAASequence).

Example:

ga TcruziDBData::Load::Plugin::InsertExternalDatabaseLink --mapFile pfal_accession_mapping.txt --sourceExtDbName GeneDB_Pfalciparum --extDbRlsVer 2.1 
--tableName TranslatedAASequence --commit

ga TcruziDBData::Load::Plugin::InsertExternalDatabaseLink --mapFile tcruzi_accession_mapping.txt --sourceExtDbName GeneDB_Tcruzi --extDbRlsVer 2.1 
--tableName TranslatedAASequence --commit

Details of the ExternalDatabaseLink table

The SRes.ExternalDatabaseLink table contains the information needed to map the sequences loaded in the ProtozoaDB database to other external databases (such as Superfamily, GeneDB, TcruziDB, PlasmoDB).

The columns are defined as follows:

link_table_id => id of the table referenced to obtain our sequence ids (e.g. the id of dots.translatedaasequence)
link_primary_id => id of the sequences in our database (e.g. the dots.translatedaasequence.aa_sequence_id column)
external_database_release_id => foreign key referencing sres.externaldatabaserelease
external_primary_identifier => identifier of the sequences in the external database (e.g. for tcr_gb, an example would be Tc00.1047053506173.10)
external_secondary_identifier => GenBank accession number (e.g. dots.translatedaasequence.source_id)

Mappings: Cdhits

The purpose of the InsertCdhit plugin is to load into the sres.externaldatabaselink table the results of cd-hit, that is, the identity percentage of the proteins grouped in each cluster.

1) Insert into the sres.externaldatabase table one entry for each cd-hit run. The name should describe the database and the identity percentage (e.g. tcruzi100), together with its corresponding entry in sres.externaldatabaserelease.

Example:

ga GUS::Supported::Plugin::InsertExternalDatabase --name tcruzi100 --commit
ga GUS::Supported::Plugin::InsertExternalDatabaseRls --databaseName tcruzi100 --databaseVersion 1.0 --commit

2) Run the InsertCdhit plugin with the following parameters:

ga TcruziDBData::Load::Plugin::InsertCdhit --mapFile $MAPFILE --sourceExtDbName $DATABASENAME --extDbRlsVer $DATABASEVERSION --tableName $TABLENAME --commit

Example:

ga TcruziDBData::Load::Plugin::InsertCdhit --mapFile ptnas_GUS_gb_nr.100.fasta.clstr --sourceExtDbName tcruzi100 --extDbRlsVer 1.0 --tableName 
TranslatedAASequence --commit

Details of the data in the sres.externaldatabaselink table

link_table_id => id of the table referenced to obtain our sequence ids (e.g. the id of dots.translatedaasequence)
link_primary_id => id of the sequences in our database (e.g. the dots.translatedaasequence.aa_sequence_id column)
external_database_release_id => foreign key referencing sres.externaldatabaserelease
external_primary_identifier => GenBank accession number (e.g. dots.translatedaasequence.source_id = XP_843163.1)
external_secondary_identifier => number of the cluster the sequence belongs to.

Query to retrieve the grouping by cluster number:

Example:

select edbl.external_primary_identifier from sres.externaldatabaselink edbl, sres.externaldatabaserelease edbr, sres.externaldatabase edb where 
edb.name = 'tcruzi100' AND edb.external_database_id = edbr.external_database_id AND edbr.external_database_release_id = edbl.external_database_release_id 
AND edbl.external_secondary_identifier = 0;

 external_primary_identifier
-----------------------------
 AAZ14281.1
 XP_843163.1
(2 rows)

XML mappings to load GenBank sequences

Extending GUS Schema

Refer to this older material: http://www.gusdb.org/documentation/3.5/developers/html/ch02.html


  • Create the tables, define primary keys and sequences that will generate unique numbers for the primary keys:
CREATE TABLE Dots.Identifier (
    identifier_id numeric(3,0) NOT NULL,
    alternative_id character varying(32)  NOT NULL,
    identifier_group_id numeric(3,0),
    external_database_release_id numeric(10,0),		-- will allow us to link back to the source
    modification_date timestamp without time zone NOT NULL,
    user_read numeric(1,0) NOT NULL,
    user_write numeric(1,0) NOT NULL,
    group_read numeric(1,0) NOT NULL,
    group_write numeric(1,0) NOT NULL,
    other_read numeric(1,0) NOT NULL,
    other_write numeric(1,0) NOT NULL,
    row_user_id numeric(12,0) NOT NULL,
    row_group_id numeric(4,0) NOT NULL,
    row_project_id numeric(4,0) NOT NULL,
    row_alg_invocation_id numeric(12,0) NOT NULL
);

CREATE TABLE Dots.IdentifierGroup (
    identifier_group_id numeric(3,0),
    source_id character varying(32),			-- preferred source_id for those IDs in the group
    modification_date timestamp without time zone NOT NULL,
    user_read numeric(1,0) NOT NULL,
    user_write numeric(1,0) NOT NULL,
    group_read numeric(1,0) NOT NULL,
    group_write numeric(1,0) NOT NULL,
    other_read numeric(1,0) NOT NULL,
    other_write numeric(1,0) NOT NULL,
    row_user_id numeric(12,0) NOT NULL,
    row_group_id numeric(4,0) NOT NULL,
    row_project_id numeric(4,0) NOT NULL,
    row_alg_invocation_id numeric(12,0) NOT NULL
);

alter table Dots.Identifier add primary key (identifier_id);
alter table Dots.IdentifierGroup add primary key (identifier_group_id);
alter table Dots.Identifier add constraint identifier_group_id foreign key (identifier_group_id) references Dots.IdentifierGroup (identifier_group_id);

CREATE SEQUENCE Dots.identifier_sq;
CREATE SEQUENCE Dots.identifiergroup_sq;

Make sure everything was created fine.
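One quick way to check (a sketch, assuming the gustest database from earlier):

$ psql -d gustest -c '\d dots.identifier'
$ psql -d gustest -c '\d dots.identifiergroup'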

  • Describe the tables in the GUS Core Schema so that the installation is able to automatically generate the correct objects:
INSERT INTO Core.tableinfo VALUES              
(963, 'Identifier', 'Standard', 'IDENTIFIER_ID', 3, 1, 0, NULL,  
NULL, 1, '2007-09-12 10:52', 1,1,1,1,1,0,1,1,1,1);

INSERT INTO Core.tableinfo VALUES              
(964, 'IdentifierGroup', 'Standard', 'IDENTIFIER_GROUP_ID', 3, 1, 0, NULL,  
NULL, 1, '2007-09-12 10:55', 1,1,1,1,1,0,1,1,1,1);
  • Find where in the GUS 3.5 installation the schema installation instructions are kept and add the information there. In the past, they had a set of SQL files that created the GUS schema when you executed:

build GUS -installDBSchema

I could not find the SQL files, so my guess is that they now use an XML file found here:

/home/pablo1/GUS/project_home/GusSchema/Definition/config

WARNING: If this file is not modified, it is likely that the next time we install GUS the tables will not be created and, therefore, the plugin will not work.

  • Run the gus installation again:
build GUS install -append
  • Make sure the objects were created in the respective directory:
/home/pablo1/GUS/project_home/GUS/Model/lib/perl/DoTS

Quick Guide to Some Common Queries on GUS

Some queries are executed frequently and are worth being kept in a handy place.

List all external data sources

SELECT d.name, r.version, r.external_database_release_id
FROM Sres.ExternalDatabase d, Sres.ExternalDatabaseRelease r
WHERE d.external_database_id = r.external_database_id

Find a protein by accession number

SELECT tas.source_id, tas.aa_sequence_id
FROM
Dots.TranslatedAASequence tas,
Sres.ExternalDatabaserelease r,
Sres.ExternalDatabase d
WHERE
  d.name = 'Pfalciparum'
  AND r.version = '159.t'
  AND d.external_database_id = r.external_database_id
  AND r.external_database_release_id = tas.external_database_release_id
  AND tas.source_id LIKE '%001351080%'


Get Genes that Have a Translation in the Database

SELECT distinct gf.source_id
FROM dots.ExternalNaSequence enas,
     dots.GeneFeature gf,
     dots.Transcript t,
     dots.TranslatedAaFeature taf,
     dots.rnatype r,
     core.TableInfo ti
WHERE enas.na_sequence_id = gf.na_sequence_id
     AND gf.na_feature_id = r.parent_id
     AND r.na_feature_id = t.parent_id
     AND t.na_feature_id = taf.na_feature_id
     AND ti.name = 'TranslatedAASequence'
ORDER BY gf.source_id 

Mapping of Used Tables

ProtozoaDB_used_tables

ProtozoaDB

