
Extracting only PASS records from the FILTER column of a VCF file

by 코딩하는 미토콘드리아 bioinformatics 2024. 3. 13.

 

Method 1: Python script

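#extract.py
# Count the records whose FILTER column (7th field) is PASS.

cnt = 0

with open("sample1.vcf", "r") as fr:
    for line in fr:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        l = line.rstrip("\n").split("\t")  # VCF columns are tab-separated
        if l[6] == "PASS":
            cnt += 1

print(cnt)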
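
The script above only counts PASS records. To actually write them to a new file with the header intact, a minimal sketch could look like the following. The names `filter_pass_lines` and `write_pass_vcf` are made up for illustration, and the code assumes an uncompressed, tab-separated VCF.

```python
def filter_pass_lines(lines):
    """Yield header lines plus records whose FILTER column (7th field) is PASS."""
    for line in lines:
        if line.startswith("#"):
            # Keep meta-information lines and the #CHROM header so the output stays a valid VCF.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 6 and fields[6] == "PASS":
            yield line


def write_pass_vcf(in_path, out_path):
    """Copy the header and PASS records from in_path to out_path (hypothetical helper)."""
    with open(in_path) as fr, open(out_path, "w") as fw:
        fw.writelines(filter_pass_lines(fr))
```

Calling `write_pass_vcf("sample1.vcf", "sample1_pass.vcf")` should produce roughly what the awk command in Method 2 produces.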

 

Method 2: Linux command line

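# keep header lines (starting with "#") and records whose 7th column (FILTER) is PASS
awk -F '\t' '{if($0 ~ /^#/) print; else if($7 == "PASS") print}' sample1.vcf > sample1_pass.vcf
#or, records only (the header is dropped, so the output is not a valid VCF on its own)
awk -F '\t' '$7=="PASS" {print $0}' sample1.vcf > sample1_pass.vcf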

 

Method 3: bcftools

bcftools view -i 'FILTER="PASS"' sample1.vcf > sample1_pass.vcf
#or, also keep records with no filter applied (".")
bcftools view -i 'FILTER="PASS" || FILTER="."' sample1.vcf.gz
#or
bcftools view -f 'PASS,.' sample1.vcf.gz
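
As a rough cross-check of the bcftools output, the same "PASS or ." selection can be counted in Python, including for a `.vcf.gz` input. This is only a sketch: `count_by_filter` is a hypothetical helper, and it relies on Python's `gzip` module being able to read bgzip-compressed files, since those are multi-member gzip streams.

```python
import gzip


def count_by_filter(path, accepted=("PASS", ".")):
    """Count records whose FILTER column (7th field) is one of `accepted`."""
    # Pick the opener from the file extension (an assumption of this sketch).
    opener = gzip.open if path.endswith(".gz") else open
    count = 0
    with opener(path, "rt") as fr:
        for line in fr:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) > 6 and fields[6] in accepted:
                count += 1
    return count
```

The result of `count_by_filter("sample1.vcf.gz")` can be compared with `bcftools view -H -f 'PASS,.' sample1.vcf.gz | wc -l`; the two numbers should agree.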

 

 

Reference: https://samtools.github.io/bcftools/bcftools.html

 
