
Converting a VCF file to a pandas DataFrame

by 코딩하는 미토콘드리아 bioinformatics 2024. 3. 15.

VCF -> DataFrame

 

1. Load the VCF file

2. Drop the metadata lines (those starting with ##)

3. Read the remaining data into a pandas DataFrame

 

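import io
import pandas as pd


def read_vcf(path):
    """Read a VCF file into a pandas DataFrame, skipping '##' metadata lines."""
    with open(path, 'r') as f:
        # Keep the '#CHROM ...' header line; drop only the '##' metadata lines.
        lines = [line for line in f if not line.startswith('##')]
    return pd.read_csv(
        io.StringIO(''.join(lines)),
        # CHROM stays a string so names like 'X' or 'MT' are handled uniformly;
        # QUAL stays a string because it may be '.' (missing).
        dtype={'#CHROM': str, 'POS': int, 'ID': str, 'REF': str, 'ALT': str,
               'QUAL': str, 'FILTER': str, 'INFO': str},
        sep='\t'
    ).rename(columns={'#CHROM': 'CHROM'})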
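
As a quick check of the three steps, here is a minimal usage sketch. The two-record VCF text is made up for illustration (it is not from a real dataset), and writing it to a temporary file is just one convenient way to get a path to pass to read_vcf. The function is repeated so the snippet runs on its own.

```python
import io
import os
import tempfile

import pandas as pd


def read_vcf(path):
    # Same function as in the post, repeated so this snippet is self-contained.
    with open(path, 'r') as f:
        lines = [l for l in f if not l.startswith('##')]
    return pd.read_csv(
        io.StringIO(''.join(lines)),
        dtype={'#CHROM': str, 'POS': int, 'ID': str, 'REF': str, 'ALT': str,
               'QUAL': str, 'FILTER': str, 'INFO': str},
        sep='\t'
    ).rename(columns={'#CHROM': 'CHROM'})


# Hypothetical VCF content: two metadata lines, the header line, two records.
vcf_text = (
    "##fileformat=VCFv4.2\n"
    "##source=example\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\trs1\tA\tG\t50\tPASS\tDP=10\n"
    "X\t200\t.\tC\tT\t.\tPASS\tDP=5\n"
)

# Write the text to a temporary file so read_vcf can open it by path.
with tempfile.NamedTemporaryFile('w', suffix='.vcf', delete=False) as tmp:
    tmp.write(vcf_text)
    tmp_path = tmp.name

try:
    df = read_vcf(tmp_path)
finally:
    os.remove(tmp_path)  # clean up the temporary file

print(df)
```

The '##' lines are gone, '#CHROM' has been renamed to 'CHROM', and POS is parsed as an integer while the other fixed columns stay as strings. Any per-sample columns after INFO (FORMAT and the sample names) would be read with pandas' default type inference, since they are not listed in dtype.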

 

 

Reference: https://dmnfarrell.github.io/bioinformatics/multi-sample-vcf-dataframe

 

