Extracting chromosome information from a VCF file with Python and plotting it
from collections import Counter

import matplotlib.pyplot as plt


# Function to extract the chromosome (CHROM) column from a VCF file
def extract_chromosome(vcf_file):
    chromosomes = []
    with open(vcf_file, 'r') as f:
        for line in f:
            # Skip header lines (starting with '#') and blank lines
            if line.startswith('#') or not line.strip():
                continue
            # Split the line into tab-separated fields
            fields = line.strip().split('\t')
            # The first field of a VCF record is the chromosome
            chromosome = fields[0]
            # Append the chromosome information to the list
            chromosomes.append(chromosome)
    return chromosomes


# Function to create a bar plot of chromosome frequencies
def plot_chromosome_frequencies(chromosomes):
    # Nothing to plot if the file contained no variant records
    if not chromosomes:
        print('No variant records found.')
        return
    # Count records per chromosome in one pass, most frequent first
    sorted_chromosomes = Counter(chromosomes).most_common()
    chromosome_names, counts = zip(*sorted_chromosomes)
    # Create bar plot
    plt.figure(figsize=(10, 6))
    plt.bar(chromosome_names, counts, color='skyblue')
    plt.xlabel('Chromosome')
    plt.ylabel('Frequency')
    plt.title('Chromosome Frequencies in VCF File')
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()


# usage
if __name__ == "__main__":
    vcf_file = 'sample.vcf'
    chromosomes = extract_chromosome(vcf_file)
    plot_chromosome_frequencies(chromosomes)
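
The plot above orders bars by frequency. If you would rather see chromosomes in karyotype order (1, 2, ..., 22, X, Y, M), something like the following sketch might help. The helper names `karyotype_key` and `ordered_counts` are my own and not part of the script above, and the sketch assumes chromosome names look like `chr1` or `1`. Anything that does not match (unplaced contigs, for example) is placed at the end.

```python
from collections import Counter


def karyotype_key(name):
    """Hypothetical sort key: numbered chromosomes, then X, Y, M/MT, then the rest."""
    # Assumption: an optional 'chr' prefix followed by a number or a letter code
    label = name[3:] if name.lower().startswith("chr") else name
    if label.isdigit():
        return (0, int(label), "")
    special = {"X": 1, "Y": 2, "M": 3, "MT": 3}
    if label.upper() in special:
        return (1, special[label.upper()], "")
    # Unrecognized names (e.g. unplaced contigs) sort last, alphabetically
    return (2, 0, label)


def ordered_counts(chromosomes):
    """Return (chromosome, count) pairs sorted in karyotype order."""
    counts = Counter(chromosomes)
    return sorted(counts.items(), key=lambda item: karyotype_key(item[0]))
```

To use it, unpack the pairs from `ordered_counts(chromosomes)` with `zip(*...)` and pass the names and counts to `plt.bar`, in place of the frequency-sorted list.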