TROPLIST summary of: Xeno Affy data ortholog conversions
Kristen Kroll
kkroll@wustl.edu
Thu Nov 8 19:46:12 EST 2007
Here is a summary of replies to this query that others may also find
helpful. The discussion of this query is now complete. Since it is fairly
long, I'll send summaries of the other queries as separate emails. In the
future, I think I will post each query separately from the start to simplify things.
> 1. Large scale Xenopus gene-list correlation to gene symbols for mammalian
> orthologs/nearest homolog.
>
> Does anyone know of a program that can do a large scale correlation of
> Xenopus laevis or tropicalis Unigene #s (or other Xenopus gene IDs, Affy
> tags) to the official gene symbol of their mammalian ortholog or nearest
> homolog?
>
> It would seem this should be SO simple to do, but... The annotation available
> through Affymetrix for their Xenopus laevis arrays has improved, but it does
> not facilitate defining gene symbols of mouse/human orthologs for Xenopus
> genes on a large scale. I have spoken with Affy tech support, and they say
> that the consortia arrays (like the laevis arrays) do not yet have this
> feature, and using their site/annotation queries engine we were not able to
> find a way to look this up.
>
> The many other "gene/ortholog conversion" tools I find on the web will not
> accept Xenopus gene lists, or (for programs like DAVID Bioinformatics) will
> only convert one Xenopus gene ID to a different one, but not to the
> ortholog. It is frustrating, since this information exists and must be
> built into sites like Metazome or XGI, but there does not seem to be a web
> tool to batch search and bulk extract it, unless I have just missed finding
> this.
>
> For single genes (or small lists) one can of course use the Unigene database
> (NCBI), clicking on the mouse/human ortholog links for our Xenopus genes
> (where these exist) and collecting the gene symbol for each ortholog. This
> gene-by-gene approach is impractical for large datasets, however, and it
> seems there must be a web-based tool that will retrieve this information from the
> appropriate databases and match it to the Xenopus Affy tags/Unigene numbers, but I
> have been unable to find one. If anyone has information or suggestions, I'd
> greatly appreciate it.
Hi Kris,
1. Large-scale Xenopus gene list correlations - I think we ran into the
same issues that you mentioned, and one of our bioinformatics guys wrote a
script to search for orthologs based on the Xenopus laevis probe set ID.
Copy and paste the list of probe set IDs, one per line (copying and pasting
from an Excel file makes this easy). It will give you the Affy data plus a
list of the orthologs, percent identity, etc. Take a look and see if this
might be helpful for you. Let me know if you have any questions or if you
run into any problems. The guy who wrote it is out today, but if you have
any suggestions for making it better we could incorporate them into new
versions. You should be able to export the entire data set to an Excel file
(link in the upper right side of the page). This script will give you all
orthologs, from yeast and plants through mammals.
The address to get to this is:
http://www.hartwellcenter.org/hcnetdat/webFront/searchFrogOrtholog.php
2. We have not tried insulators, as we are doing enhancer traps and so are
looking for exactly what you don't want!
3. We have run into the same issues with "off the rack frogs" ourselves;
we tend to raise all our own now. We have also found the store-bought males
to be less than useful. You may want to try NASCO females and use
"in house" males to fertilize. We do all of our injections using natural
matings these days, as it saves on good males!
Hope this helps,
best regards,
Paul
> Paul E. Mead, Ph.D.
> Hi Paul,
> I've used one of our datasets this evening to see what I could get from the
> ortholog converter. It generated putative ortholog information for the entire
> dataset I put in as a test, except for some of the "transcribed loci", where
> we've had no success finding orthologs in manual searches either (I think some of
> those are misannotations and will fall out of the Xenopus gene collection over
> time). Here are some suggestions for improving the versatility of the converter
> and reducing the amount of work that needs to be done manually:
>
> 1. The exported data would be easier to manipulate if one could export
> ortholog data for one selected species only into a table. Because the
> information for all of the species is in the same column (with 4-5 ortholog
> lines in Excel associated with one Xl unigene number), I had to move
> some lines to match up the Xenopus gene information with the ortholog set.
> Then I had to use autofill for each Xenopus Affy ID to fill the blank lines so
> there was a 1:1 match between the Xenopus Affy tag and each species' ortholog.
> This was needed so that I could sort the Excel spreadsheet by species
> and run each species' accession numbers through a gene conversion program to
> get corresponding gene symbols/unigene IDs for the mouse/human genes. These are
> the input for the pathway analysis/functional clustering programs we are using,
> which did not seem able to work with the multi-species list of
> accession numbers the converter generated.
>
> 2. It would be useful to add the Official Gene Symbol (and perhaps also the
> unigene ID) next to the accession number column for each of the orthologs.
> This would standardize the output so that it can be input into
> functional analysis software like the Ingenuity Pathways Analysis (IPA) suite
> without further manipulation. The accession numbers the ortholog finder
> currently provides do allow one to derive this information, but it requires a
> little work:
>
> (a) The tag in front of the accession number ("pir:" or "ref:") that the
> ortholog converter gives confused the gene conversion programs I tested. To
> get around this, I pasted the list into Word and used the replace feature to
> replace the tagged numbers with their untagged counterparts. So it would be good
> to remove this tag from the accession numbers or to put it in a separate
> column.
>
> (b) The accession numbers the ortholog finder pulls are a mix of protein
> database IDs (pir: and sp:) and nucleic acid IDs (ref:). I think this directly
> reflects the way these links/data currently exist in the Unigene database.
> However, putting both pir and ref types of data in a single gene list
> confuses some of the annotation programs, especially when the input data
> is from multiple species (the Excel download feature of the ortholog
> converter currently puts orthologs from multiple species into a single
> table column). All of the "pir" protein entries defined as orthologs that I
> looked up in NCBI are reported as removed or replaced, so I think many of
> those numbers may no longer be present in the NCBI protein database, and a
> current identifier would need to be found for each. IPA appears to have
> disregarded all of the "pir" accession numbers in my test run, and that is
> probably why.
>
> What I was able to do was obtain gene symbols and other information for almost
> all of the accession numbers the ortholog converter assigned by using DAVID.
> The DAVID converter considered the pir numbers "ambiguous" and still managed to
> assign them information. DAVID allows input of multiple types of ID
> simultaneously, and it assigned gene symbols and additional
> identifying information to >95% of the accession numbers I submitted. So at
> least some converters can work with the "pir" protein-type IDs as well as the
> range of accession number types provided by the ortholog converter. Getting
> meaningful data from this did require that I presort the ortholog data by species
> and input accession numbers for only one species at a time, though, and this
> required manipulating the exported table data in Excel, as described above.
>
> So if it were possible to add Gene Symbol and/or Unigene information for each
> ortholog (in addition to the accession numbers) and to allow download of
> orthologs for a single species as a table, those two changes would improve the
> versatility of this program and reduce the manual manipulation needed to get the
> data. Even without these changes, I'm very pleased that it was possible to
> start with a large laevis Affy dataset and end up with fairly
> complete ortholog information in various species, even if it did
> take some additional manipulation.
>
> On the frogs: we'd been setting up the matings much as you describe, several
> possible pairs to a bin, but during the day. The bins are on a black
> countertop, but no doubt we also disturbed them quite a bit going in and out
> doing IVF. Maybe we should try overnight natural matings to see whether they
> go better, and plan on using IVF to generate embryos for injection in the short
> term.
>
> If your bioinformatics gurus improve the converter or are not sure what I was
> getting at in the text above, perhaps you could put us in touch and I will try
> to give a clearer description. Thanks again for generating this; I suspect it
> will be useful to many people using the Xenopus Affy arrays. I can post our
> correspondence as a digest if you think it would help to spread the word;
> just let me know if that is okay.
>
> Cheers,
> Kris
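The fill-down and split-by-species steps Kris describes in point 1 can also be scripted rather than done by hand in Excel. This is a minimal sketch, not the converter's own export logic: it assumes the exported table was saved as CSV with three hypothetical columns named `affy_id`, `species` and `accession`, where only the first ortholog row for each probe set carries the Affy ID. The probe set IDs and accessions in the example are made-up placeholders.

```python
import csv
import io
import sys
from collections import defaultdict


def split_by_species(csv_text):
    """Fill blank Affy IDs down, then group (affy_id, accession) pairs by species.

    Assumes hypothetical CSV columns: affy_id, species, accession.
    """
    per_species = defaultdict(list)
    current_affy_id = None  # stays None until the first non-blank Affy ID is seen
    for row in csv.DictReader(io.StringIO(csv_text)):
        affy_id = row["affy_id"].strip()
        if affy_id:
            # A new probe set starts on this row.
            current_affy_id = affy_id
        # Blank Affy ID cells inherit the most recent one (the Excel "autofill" step).
        species = row["species"].strip()
        per_species[species].append((current_affy_id, row["accession"].strip()))
    return dict(per_species)


def write_species_table(pairs, out):
    """Write one species' 1:1 Affy ID / accession table as CSV to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["affy_id", "accession"])
    writer.writerows(pairs)


if __name__ == "__main__":
    # Placeholder data in the hypothetical export layout.
    exported = (
        "affy_id,species,accession\n"
        "PROBE_A_at,mouse,ref:ACC1\n"
        ",human,ref:ACC2\n"
        "PROBE_B_at,mouse,pir:ACC3\n"
    )
    tables = split_by_species(exported)
    print(tables["human"])  # [('PROBE_A_at', 'ref:ACC2')]
    write_species_table(tables["mouse"], sys.stdout)
```

Each per-species table can then be submitted to a conversion tool such as DAVID on its own, which matches the one-species-at-a-time input Kris found necessary.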
> Hi Kris,
>
> Thanks for the detailed e-mail. I agree that it would be good to sort the
> orthologs by species into separate columns, and perhaps add a pull-down list
> up front of the species you may be interested in (e.g., only tetrapod
> orthologs, etc., similar to the choices available in the Metazome search
> feature).
>
> The tag issue will be simple to resolve as well, and we can easily add the
> unigene/gene symbol information. We will end up with lots of columns,
> but we should be able to come up with a way of displaying only the data sets
> that are of interest to each user; some may only be interested in vertebrate
> orthologs and not want all the C. elegans and other species data.
>
> I will pass along these suggestions to the bioinformatics guys; I'll try to
> meet with them later today. I have a grant deadline in two weeks but will try
> to get on with the changes to the converter.
>
> Our chip experiments have been on hold a bit until the new laevis chips come
> out. It is not that we did not get plenty of data from the first-version
> chips, but as we were planning some other experiments I decided to wait for the
> version 2.0 chips to come out.
>
> Sure, go ahead and post a synopsis of our chat on the troplist - always good
> to spread the word.
>
> Best regards,
>
> Paul
>
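Until the converter's output changes, the tag clean-up Kris did with Word's replace feature in point 2(a), and the protein/nucleotide split from point 2(b), can be done with a few lines of code. This is a hedged sketch: the function names are hypothetical, treating `pir` and `sp` tags as protein IDs follows Kris's description, and because RefSeq (`ref`) holds both molecule types, it assumes the standard RefSeq prefixes (NP_/XP_ for proteins; NM_, XM_, NR_, XR_ for nucleotide records) to decide which is which.

```python
PROTEIN_TAGS = {"pir", "sp"}               # protein-database tags, per Kris's description
REFSEQ_PROTEIN_PREFIXES = ("NP_", "XP_")   # RefSeq protein accessions


def split_tag(accession):
    """Split 'tag:ID' into (tag, ID); an untagged accession gets the tag ''."""
    tag, sep, bare = accession.strip().partition(":")
    if not sep:
        return "", tag  # no colon: partition leaves the whole string in the first slot
    return tag.lower(), bare.strip()


def id_kind(tag, bare):
    """Classify an accession as 'protein', 'nucleotide' or 'unknown'."""
    if tag in PROTEIN_TAGS:
        return "protein"
    if tag == "ref":
        # RefSeq holds both kinds; only NP_/XP_ are treated as protein here,
        # everything else (NM_, XM_, NR_, XR_, ...) as nucleotide.
        return "protein" if bare.startswith(REFSEQ_PROTEIN_PREFIXES) else "nucleotide"
    return "unknown"


def clean_accessions(lines):
    """Turn raw converter accessions into (tag, bare_id, kind) rows."""
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines from pasted lists
        tag, bare = split_tag(line)
        rows.append((tag, bare, id_kind(tag, bare)))
    return rows


if __name__ == "__main__":
    # Placeholder accessions, not real converter output.
    for row in clean_accessions(["pir:ABC1", "ref:NM_0001", "ref:NP_0002", "sp:Q0003"]):
        print("\t".join(row))
```

The rows can then be filtered by kind, so that, for example, only the nucleotide IDs for one species are sent to a tool that cannot handle mixed ID types.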
Kris comments/Note: If others want to try this and add their comments,
that would be great. Also, for those doing trop arrays, I did notice that
the g:Convert site claims to convert tropicalis data to orthologs in various
other species, so this may be useful for some people as well. The web link
is:
http://www.bioinf.ebc.ee/gprofiler/gconvert.cgi
> Hi Kris,
> Paul Jorgensen from the Kirschner lab forwarded me your message. I think I
> can help you out with the first issue of mapping Xenopus Unigene IDs to
> mammalian gene symbols. I would recommend fetching the information from
> Ensembl (http://www.ensembl.org/index.html) using BioMart. Here's what I was
> able to get:
>
> 1. mapping Ensembl gene IDs to Unigene IDs
> Ensembl Gene ID Unigene ID
> ENSXETG00000003219
> ENSXETG00000024160 Str.4741
> ENSXETG00000024163
> ENSXETG00000024161 Str.37520
> ENSXETG00000013019
> ENSXETG00000013022
> ENSXETG00000013028
> ENSXETG00000013030
> ENSXETG00000018803 Str.10811
> ENSXETG00000013035 Str.13817
>
> 2. mapping the same Ensembl gene IDs to external IDs (that is, gene symbols)
> Ensembl Gene ID Mouse External ID Chicken External ID Chimp External ID
> ENSXETG00000003219
> ENSXETG00000024160 Pex11b PEX11B
> ENSXETG00000024163
> ENSXETG00000024161 Lmna LMNA
> ENSXETG00000013019 Cldn14 CLDN14 CLDN14
> ENSXETG00000013022 Sim2 SIM2 SIM2
> ENSXETG00000013028 Hlcs HLCS HLCS
> ENSXETG00000013030 Ripply3 DSCR6
> ENSXETG00000018803 Pigp PIGP XR_025247.1
> ENSXETG00000013035 Dscr3 DSCR3 DSCR3
>
> I'm happy to do the matching for you. I would just need to know which
> Unigene IDs or other identifiers you need information for, and then it's
> just a matter of writing a few scripts. I'm currently collaborating with
> Paul and Marc on designing new arrays for tropicalis and laevis with the
> goal of comparing gene expression across development, so similar kinds of
> bioinformatics analyses are running on our end and this is not a problem.
>
> Cheers,
> Itai
Itai Yanai <yanai@mcb.harvard.edu>
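Itai's two BioMart tables share the Ensembl gene ID column, so joining them on that column yields the Unigene-to-mammalian-symbol list from Kris's original question. A minimal sketch of that join, assuming the two exports have already been read into Python dictionaries (BioMart results can be exported as tab-separated text; the dictionary and function names here are illustrative). The rows are transcribed from the tables above, taking the first symbol column of the second table to be the mouse one; that column assignment is an assumption, since the flattened table does not show which cells are empty.

```python
def unigene_to_symbol(ensembl_to_unigene, ensembl_to_symbol):
    """Join two Ensembl-keyed mappings into a Unigene ID -> gene symbol mapping.

    Genes with no Unigene ID or no symbol drop out of the result.
    """
    result = {}
    for ensembl_id, unigene_id in ensembl_to_unigene.items():
        symbol = ensembl_to_symbol.get(ensembl_id, "")
        if unigene_id and symbol:
            result[unigene_id] = symbol
    return result


if __name__ == "__main__":
    # A few rows transcribed from Itai's first table; "" means no Unigene ID.
    ensembl_to_unigene = {
        "ENSXETG00000003219": "",
        "ENSXETG00000024160": "Str.4741",
        "ENSXETG00000024161": "Str.37520",
        "ENSXETG00000013019": "",
        "ENSXETG00000018803": "Str.10811",
        "ENSXETG00000013035": "Str.13817",
    }
    # Assumes the first symbol column of the second table is the mouse one.
    ensembl_to_mouse = {
        "ENSXETG00000024160": "Pex11b",
        "ENSXETG00000024161": "Lmna",
        "ENSXETG00000013019": "Cldn14",
        "ENSXETG00000018803": "Pigp",
        "ENSXETG00000013035": "Dscr3",
    }
    print(unigene_to_symbol(ensembl_to_unigene, ensembl_to_mouse))
    # {'Str.4741': 'Pex11b', 'Str.37520': 'Lmna', 'Str.10811': 'Pigp', 'Str.13817': 'Dscr3'}
```

A plain dictionary keeps only one symbol per Unigene ID, so a cluster that maps to several Ensembl genes would need a list of symbols instead.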
Hi Itai,
Thanks for the offer. Since posting my Troplist request, I have also heard
from Paul Mead's bioinformatics guys, who appear to have worked up a script.
Maybe I could put you all in touch, and you can see if there is an optimal
way to do this. I "test drove" his script with my dataset last night and tried
to describe the issues I ran into.
I will forward my conversation with Paul. I can also attach the Unigene list
for one of our datasets if you want to give the conversion a try; let me know
what you find or develop. If this results in useful tools, we can link them to
the Xenbase page so that others can do these conversions readily as well.
Maybe Paul's new and improved version of his site will do it, but if you
want to chime in and see whether there are ways to improve it, that would
be welcome.
Cheers,
Kris