Applications and Extensions of pClust to Big Microbial Proteomic Data

Author(s): Lockwood, Svetlana
Language:
English
Source:
ProQuest LLC. 2016. Ph.D. Dissertation, Washington State University.
Publication Date:
2016
Document Type:
Dissertations/Theses - Doctoral Dissertations
Online Access:
http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:10139743

Additional Information
- Availability:
  ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
- Peer Reviewed:
  N
- Source:
  100
- Subject Terms:
  Data; Microbiology; Comparative Analysis; Identification; Data Analysis; Genetics
- ISBN:
  978-1-339-95226-0
- Abstract:
  The goal of the biological sciences is to understand the biomolecular mechanics of living organisms. Proteins serve as the foundation for the functional analysis of organisms, and sequence analysis has proven invaluable in answering questions about individual organisms. The first step in any sequence analysis is alignment, and even modestly sized studies commonly involve hundreds of thousands of protein sequences. In multigenome studies, the time required for sequence alignment becomes paramount, and heuristic algorithms are frequently used, sacrificing accuracy for speed. At the same time, new algorithms have appeared that not only perform highly efficiently but also guarantee optimal solutions. However, the adoption of these algorithms is hindered by the absence of a generalized analysis pipeline and of user-friendly computational tools. In this dissertation we present applications of existing, computationally efficient algorithms to multigenome studies, applying our pClust pipeline to various sets of microbial organisms. The computational time is significantly improved and the results are more accurate than those obtained by traditional methods. The first study is a baseline comparison on a small set of 11 microorganisms. It compares pClust results to existing scientific knowledge and finds them consistent, while also providing new insights. The second study addresses the identification of common tick-transmissibility mechanisms across different species. It involves a larger set of 108 microbial genomes with approximately 127K protein sequences. Traditionally, a study of this scope would have required hours or even days of CPU time on high-performance computers to produce an all-versus-all sequence alignment. Using pClust, sequence alignment and clustering took less than 10 minutes on a desktop computer.
For this study we also developed a graphical user interface for pClust to make the new algorithms more accessible to microbiologists. The third study analyzes the set of all proteobacterial genomes. The study comprised 2,326 complete genomes containing 8.7M protein sequences. The alignment was performed using the pGraph-Tascel algorithm on high-performance computers. This is the first study of its kind. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
- Abstract:
  As Provided
- Accession Number:
  ED571068
