Methods for Finding Introns and Exons in Genomic Sequences: Code Examples and Tools - Fcodenotes

To find introns and exons, you typically need genomic sequence data and the corresponding gene annotations. Below are several methods commonly used in bioinformatics to locate introns and exons, with code examples:

Using Biopython:

from Bio import SeqIO
from Bio.SeqFeature import FeatureLocation

# Load every sequence in the genome FASTA, keyed by sequence ID
genome = SeqIO.to_dict(SeqIO.parse("genome.fasta", "fasta"))

# Bio.SeqIO has no GFF parser, so read the tab-separated GFF columns directly.
# Note: many GFF files list exons only, in which case `introns` stays empty.
exons = []
introns = []
with open("annotations.gff") as handle:
    for line in handle:
        columns = line.rstrip("\n").split("\t")
        # Skip comments, blank lines, and rows for sequences missing from the FASTA
        if line.startswith("#") or len(columns) < 9 or columns[0] not in genome:
            continue
        seqid, feature_type, start, end, strand = (columns[i] for i in (0, 2, 3, 4, 6))
        if feature_type not in ("exon", "intron"):
            continue
        # GFF is 1-based and inclusive; Biopython locations are 0-based and end-exclusive
        location = FeatureLocation(int(start) - 1, int(end), strand=-1 if strand == "-" else 1)
        sequence = location.extract(genome[seqid].seq)
        (exons if feature_type == "exon" else introns).append(sequence)

# Print the exons and introns
print("Exons:")
for exon in exons:
    print(exon)
print("Introns:")
for intron in introns:
    print(intron)
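Many GFF3 files list exons but contain no explicit intron features, so introns often have to be derived from the exon coordinates. The following is a minimal sketch of one way to do that using only the standard library: it groups exon rows by their Parent attribute and treats the gaps between consecutive exons of the same transcript as introns. The function name `introns_by_transcript` and the sample lines are illustrative assumptions, not part of Biopython or of any real annotation.

```python
from collections import defaultdict


def introns_by_transcript(gff_lines):
    """Return {transcript_id: [(start, end), ...]} of intron coordinates.

    Coordinates stay 1-based and inclusive, as in GFF3.
    """
    exons = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9 or columns[2] != "exon":
            continue
        # Column 9 holds key=value pairs separated by semicolons
        attributes = dict(
            pair.split("=", 1) for pair in columns[8].split(";") if "=" in pair
        )
        # An exon shared by several transcripts lists them comma-separated
        for parent in attributes.get("Parent", "").split(","):
            if parent:
                exons[parent].append((int(columns[3]), int(columns[4])))

    introns = {}
    for parent, intervals in exons.items():
        intervals.sort()
        # Keep only real gaps between neighbouring exons
        introns[parent] = [
            (left_end + 1, right_start - 1)
            for (_, left_end), (right_start, _) in zip(intervals, intervals[1:])
            if right_start - left_end > 1
        ]
    return introns


# Made-up sample rows, for illustration only
sample = [
    "##gff-version 3",
    "chr1\tdemo\texon\t100\t200\t.\t+\t.\tID=e1;Parent=tx1",
    "chr1\tdemo\texon\t301\t400\t.\t+\t.\tID=e2;Parent=tx1",
    "chr1\tdemo\texon\t501\t600\t.\t+\t.\tID=e3;Parent=tx1,tx2",
]
result = introns_by_transcript(sample)
print(result)
```

Grouping by transcript matters because exons from different transcripts of the same gene can overlap, and gaps computed across transcripts would not correspond to real introns.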

Using BEDTools:
The BEDTools suite provides a set of powerful command-line tools for genomic analysis. You can use the bedtools command to extract exons and introns from genomic annotations in BED format.

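# Extract exon sequences; with BED12 input, -split joins the exon blocks of each record
bedtools getfasta -fi genome.fasta -bed annotations.bed -name -split > exons.fasta
# Extract introns
# Note: bedtools complement returns every interval NOT covered by the input,
# which includes intergenic regions, so it does not yield introns alone.
# Subtracting exon intervals from gene intervals keeps introns only
# (genes.bed and exons.bed are assumed, separately prepared files):
bedtools subtract -a genes.bed -b exons.bed > introns.bed
bedtools getfasta -fi genome.fasta -bed introns.bed > introns.fasta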
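The -split option depends on the block columns of a BED12 record, which encode the exon structure of a transcript. As a rough illustration of how those columns map to exons and introns, here is a small Python sketch that decodes a single BED12 line. The function name `bed12_exons_introns` and the sample record are hypothetical and are not part of BEDTools.

```python
def bed12_exons_introns(bed12_line):
    """Split one BED12 record into exon and intron intervals.

    Returned intervals are (start, end) pairs in BED convention:
    0-based start, end-exclusive.
    """
    fields = bed12_line.rstrip("\n").split("\t")
    chrom_start = int(fields[1])
    block_count = int(fields[9])
    # blockSizes and blockStarts are comma-separated and may end with a comma
    block_sizes = [int(x) for x in fields[10].split(",") if x][:block_count]
    block_starts = [int(x) for x in fields[11].split(",") if x][:block_count]

    # blockStarts are offsets relative to chromStart
    exons = [
        (chrom_start + offset, chrom_start + offset + size)
        for offset, size in zip(block_starts, block_sizes)
    ]
    # With half-open intervals, an intron runs from one exon's end
    # to the next exon's start
    introns = [
        (left_end, right_start)
        for (_, left_end), (right_start, _) in zip(exons, exons[1:])
    ]
    return exons, introns


# Made-up record with three blocks, for illustration only
record = "chr1\t1000\t2000\ttx1\t0\t+\t1000\t2000\t0\t3\t100,200,300,\t0,400,700,"
exons, introns = bed12_exons_introns(record)
print("exons:", exons)
print("introns:", introns)
```

Because BED intervals are 0-based and end-exclusive, no +1 or -1 adjustment is needed when turning exon boundaries into intron boundaries, unlike with 1-based GFF coordinates.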

Using the Ensembl REST API:
Ensembl provides a REST API that gives programmatic access to genomic data. You can use the API to retrieve gene annotations and the corresponding sequences.

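import requests

# Specify the gene ID
gene_id = "ENSG00000157764"
# Fetch the gene with its transcripts and exons, requesting JSON explicitly
response = requests.get(
    f"https://rest.ensembl.org/lookup/id/{gene_id}?expand=1",
    headers={"Content-Type": "application/json"},
)
response.raise_for_status()
gene_data = response.json()

# Handle each transcript separately: mixing exons from different transcripts
# would produce gaps that are not real introns
for transcript in gene_data["Transcript"]:
    # Sort by genomic start so neighbouring exons are adjacent on either strand
    exons = sorted(transcript["Exon"], key=lambda exon: exon["start"])
    # An intron is the gap between consecutive exons (1-based, inclusive coordinates).
    # Only coordinates are available here; the lookup response carries no sequence.
    introns = [
        (left["end"] + 1, right["start"] - 1)
        for left, right in zip(exons, exons[1:])
    ]
    # Print exons and introns
    print(f"Transcript {transcript['id']}")
    print("Exons:")
    for exon in exons:
        print(exon["id"], exon["start"], exon["end"])
    print("Introns:")
    for intron_start, intron_end in introns:
        print(intron_start, intron_end)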
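The lookup response carries coordinates rather than sequence, so obtaining intron sequences takes a second request. One option is the sequence/region endpoint of the Ensembl REST API. The sketch below builds such a request with the standard library; the helper names `region_url` and `fetch_region_sequence` and the example coordinates are assumptions made for illustration, and the download step requires network access.

```python
import urllib.request


def region_url(species, seq_region_name, start, end, strand=1):
    """Build a sequence/region URL for 1-based, inclusive coordinates."""
    region = f"{seq_region_name}:{start}..{end}:{strand}"
    return (
        f"https://rest.ensembl.org/sequence/region/{species}/{region}"
        "?content-type=text/plain"
    )


def fetch_region_sequence(species, seq_region_name, start, end, strand=1):
    """Download the sequence of one region as plain text (needs network access)."""
    url = region_url(species, seq_region_name, start, end, strand)
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8").strip()


# Hypothetical intron coordinates, for illustration only
url = region_url("human", "7", 1000, 1100, strand=-1)
print(url)
```

A call such as `fetch_region_sequence("human", "7", intron_start, intron_end, strand)` could then be made for each intron, taking the chromosome name and strand from the gene record. Requests to a public service are best spaced out to stay within its rate limits.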