Python programming for Bioinformatics - Exercise 1

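In this series I create and solve bioinformatics exercises in Python that could plausibly be applied in real-world work.

The exercise is solved when you turn the input data into the output format described below.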

INPUT

HMMPFAM_human.txt (1.73 MB)

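1. Problem description:

1) For each IPI (the first field), collect its Domains (the second field). Deduplicate the Domains per IPI: a Domain that appears several times for the same IPI is counted only once.

2) For each IPI in the first field, add up all values of the third field. Every row is added, whether or not its Domain is a duplicate.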
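To make the two rules concrete, here is a small sketch on a few made-up rows. The IPI IDs, Domain names, and values are hypothetical and are not taken from HMMPFAM_human.txt. It shows that a `set` collapses the repeated Domain (rule 1), while the sum still counts every row (rule 2).

```python
from collections import defaultdict

# Hypothetical sample rows: (IPI, Domain, value). Not real data from HMMPFAM_human.txt.
rows = [
    ("IPI0001", "PF_A", 10),
    ("IPI0001", "PF_A", 5),   # repeated Domain: stored once for rule 1, still summed for rule 2
    ("IPI0001", "PF_B", 2),
    ("IPI0002", "PF_C", 7),
]

domains = defaultdict(set)  # rule 1: distinct Domains per IPI
totals = defaultdict(int)   # rule 2: sum of the third field per IPI

for ipi, domain, value in rows:
    domains[ipi].add(domain)
    totals[ipi] += value

print(sorted(domains["IPI0001"]))  # ['PF_A', 'PF_B']
print(totals["IPI0001"])           # 17
```

IPI0001 ends up with 2 distinct Domains but a total of 10 + 5 + 2 = 17.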

OUTPUT

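2. Result file:

• Output only the IPIs that have exactly 4 distinct Domains.

• Separate the fields with tabs.

• Field 1: IPI_number

• Field 2: the 4 Domains, separated by commas (so the field contains three commas).

• Field 3: the sum of all third-field values in the input file for that IPI.

=> The result has 663 lines, with no header.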
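The output rules can be captured in one small helper. This is a sketch: `format_line` is a hypothetical name, and sorting the Domains is an added assumption that only makes the order reproducible, since the exercise does not specify an order. The Domains are joined with ", " as in the code walkthrough.

```python
def format_line(ipi, domains, total):
    """Return one tab-separated output line, or None if the IPI does not have exactly 4 Domains."""
    if len(domains) != 4:
        return None
    # Field 2: 4 Domains joined by ", " gives exactly three commas.
    return f"{ipi}\t{', '.join(sorted(domains))}\t{total}"


line = format_line("IPI0001", {"PF_D", "PF_A", "PF_C", "PF_B"}, 42)
print(repr(line))                   # 'IPI0001\tPF_A, PF_B, PF_C, PF_D\t42'
print(line.split("\t")[1].count(","))  # 3
print(format_line("IPI0002", {"PF_A"}, 7))  # None
```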

Code walkthrough

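# Load the data and build both dictionaries in a single pass.
# Each line is tab-separated: IPI, Domain, integer value (any further fields are ignored).
from collections import defaultdict

f_name = 'HMMPFAM_human.txt'

domain_dict = defaultdict(set)  # IPI -> set of distinct Domains (rule 1)
sum_dict = defaultdict(int)     # IPI -> sum of the third field over all rows (rule 2)

with open(f_name) as f:  # the file is closed automatically
	for line in f:
		line = line.rstrip('\n')
		if not line:  # skip blank lines, e.g. an empty last line
			continue
		split_field = line.split('\t')
		ipi, domain, value = split_field[0], split_field[1], split_field[2]
		domain_dict[ipi].add(domain)  # a repeated Domain is stored only once
		sum_dict[ipi] += int(value)   # every row is added, duplicates included

# Print only the IPIs with exactly 4 distinct Domains.
# A set has no fixed order, so the Domains are sorted to make the output reproducible.
for ipi, domains in domain_dict.items():
	if len(domains) == 4:
		print(f"{ipi}\t{', '.join(sorted(domains))}\t{sum_dict[ipi]}")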
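A Python `set` has no fixed iteration order, so the Domain order in field 2 can differ between runs. If you would rather keep the Domains in the order they first appear in the input, a plain `dict` can serve as an insertion-ordered set (dicts keep insertion order in Python 3.7+). The sketch below also wraps the logic in a function that accepts any iterable of lines, so it can be tried on an in-memory sample through `io.StringIO` before running it on HMMPFAM_human.txt. The function name `summarize`, its `n_domains` parameter, and the sample lines are hypothetical.

```python
import io


def summarize(lines, n_domains=4):
    """Yield output lines for IPIs with exactly n_domains distinct Domains.

    Domains keep their first-seen order because a dict is used as an ordered set.
    Assumes tab-separated lines of the form IPI<TAB>Domain<TAB>integer value.
    """
    domains = {}  # IPI -> {Domain: None}, i.e. an insertion-ordered set of Domains
    totals = {}   # IPI -> running sum of the third field
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        ipi, domain, value = line.split("\t")[:3]
        domains.setdefault(ipi, {})[domain] = None
        totals[ipi] = totals.get(ipi, 0) + int(value)
    for ipi, ds in domains.items():
        if len(ds) == n_domains:
            yield f"{ipi}\t{', '.join(ds)}\t{totals[ipi]}"


# Hypothetical in-memory sample instead of the real file.
sample = io.StringIO(
    "IPI1\tPF_D\t1\n"
    "IPI1\tPF_A\t2\n"
    "IPI1\tPF_D\t3\n"
    "IPI1\tPF_C\t4\n"
    "IPI1\tPF_B\t5\n"
    "IPI2\tPF_A\t6\n"
)
for out in summarize(sample):
    print(repr(out))  # 'IPI1\tPF_D, PF_A, PF_C, PF_B\t15'
```

On the real data you would pass the open file instead, e.g. `with open('HMMPFAM_human.txt') as f:` followed by `for out in summarize(f): print(out)`. Only the Domain order changes compared with the set-based version. The set of selected IPIs and their sums stay the same.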


Coding Mitochondria's Bioinformatics Lab