dgfr: an R package to assess sequence diversity of gene families.
- Additional Information
- Source:
Publisher: BioMed Central; Country of Publication: England; NLM ID: 100965194; Publication Model: Electronic; Cited Medium: Internet; ISSN: 1471-2105 (Electronic); Linking ISSN: 14712105; NLM ISO Abbreviation: BMC Bioinformatics; Subsets: MEDLINE
- Publication Information:
Original Publication: [London] : BioMed Central, 2000-
- Subject Terms:
- Abstract:
Background: Gene families are groups of homologous genes that often have similar biological functions. These families are formed by gene duplication events throughout evolution, resulting in multiple copies of an ancestral gene. Over time, these copies can acquire mutations and structural variations, resulting in members that may vary in size, motif ordering and sequence. Multigene families have been described in a broad range of organisms, from single-celled bacteria to complex multicellular organisms, and have been linked to an array of phenomena, such as host-pathogen interactions, immune evasion and embryonic development. Despite the importance of gene families, few approaches have been developed for estimating and graphically visualizing their diversity patterns and expression profiles in genome-wide studies.
Results: Here, we introduce an R package named dgfr, which estimates and enables the visualization of sequence divergence within gene families, as well as the visualization of secondary data such as gene expression. The package takes as input a multi-fasta file containing the coding sequences (CDS) or amino acid sequences from a multigene family, performs a pairwise alignment among all sequences, and estimates the distances between them. These distances are then subjected to dimension reduction, optimal cluster determination, and gene assignment to each cluster. The result is a dataset that allows for the visualization of sequence divergence and expression within the gene family, as well as an approximation of the number of clusters present in the family.
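The workflow described in the Results (multi-fasta input, pairwise distances, cluster assignment) can be illustrated with a minimal Python sketch. This is not the dgfr implementation or its API. It makes several labeled assumptions: plain Levenshtein edit distance stands in for an alignment-derived distance, the dimension reduction and optimal cluster determination steps are omitted, and a fixed distance threshold with single-linkage grouping replaces them. All function names, the `threshold` parameter, and the example sequences are hypothetical.

```python
from itertools import combinations

# Hypothetical toy gene family; not data from the paper.
EXAMPLE_FASTA = """\
>geneA
ATGGCCAAGT
>geneB
ATGGCCAAGA
>geneC
TTTTCGCGCG
>geneD
TTTTCGCGCC
"""


def parse_fasta(text):
    """Parse multi-FASTA text into a dict of {sequence id: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # id is the first token of the header
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def edit_distance(a, b):
    """Levenshtein distance (assumed stand-in for an alignment-based distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def pairwise_distances(records):
    """Edit distance for every unordered pair, scaled by the longer sequence."""
    dist = {}
    for x, y in combinations(sorted(records), 2):
        longest = max(len(records[x]), len(records[y])) or 1
        dist[(x, y)] = edit_distance(records[x], records[y]) / longest
    return dist


def cluster(records, dist, threshold=0.3):
    """Single-linkage grouping: join any pair whose distance is <= threshold."""
    parent = {name: name for name in records}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for (x, y), d in dist.items():
        if d <= threshold:
            parent[find(x)] = find(y)

    groups = {}
    for name in records:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    seqs = parse_fasta(EXAMPLE_FASTA)
    print(cluster(seqs, pairwise_distances(seqs)))
```

A fixed threshold keeps the sketch short; the package itself is described as determining the optimal number of clusters after dimension reduction, which this example does not attempt.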
Conclusions: dgfr provides a way to estimate and study the diversity of gene families, as well as visualize the dispersion and secondary profile of the sequences. The dgfr package is available at https://github.com/lailaviana/dgfr under the GPL-3 license.
(© 2024. The Author(s).)
- References:
Infect Immun. 2012 Jul;80(7):2258-64. (PMID: 22431647)
Bioinformatics. 2009 Sep 15;25(18):2397-403. (PMID: 19605421)
Mol Biochem Parasitol. 1993 Jun;59(2):293-303. (PMID: 8341326)
Mol Biochem Parasitol. 2001 May;114(2):143-50. (PMID: 11378194)
PLoS Pathog. 2021 Jan 28;17(1):e1009254. (PMID: 33508020)
- Grant Information:
MR/T016019/1 MRC New Investigator Research Grant; APQ-01822-18 Fundação de Amparo à Pesquisa do Estado de Minas Gerais; 310531/2023-3 Conselho Nacional de Desenvolvimento Científico e Tecnológico
- Contributed Indexing:
Keywords: Clustering; Gene families; Sequence diversity
- Publication Date:
Date Created: 20240606; Date Completed: 20240606; Latest Revision: 20240609
- Publication Date:
20240609
- Accession Number:
PMCID: PMC11155016; DOI: 10.1186/s12859-024-05826-2; PMID: 38844845