Python pandas: working with DataFrame rows

by 코딩하는 미토콘드리아 bioinformatics 2023. 11. 12.

Working with DataFrame rows

 

Get the row index and labels

idx = df.index # get row index
label = df.index[0] # first row label
label = df.index[-1] # last row label
l = df.index.tolist() # get as a list
a = df.index.values # get as an array
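
As a quick illustration of these accessors, here is a minimal sketch on a made-up three-row DataFrame (the labels 'a', 'b', 'c' and the column name 'x' are assumptions for the example only):

```python
import pandas as pd

# Hypothetical data: labels and column name are invented for illustration
df = pd.DataFrame({'x': [10, 20, 30]}, index=['a', 'b', 'c'])

first_label = df.index[0]        # first row label
last_label = df.index[-1]        # last row label
labels_list = df.index.tolist()  # plain Python list of labels
labels_array = df.index.values   # NumPy array of labels
```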


Change the (row) index

df.index = idx # new ad hoc index
df = df.set_index('A') # col A new index
df = df.set_index(['A', 'B']) # MultiIndex
df = df.reset_index() # replace index with default 0..n-1
# note: the old index is stored as a column in df
df.index = range(len(df)) # set with a range
df = df.reindex(index=range(len(df)))
df = df.set_index(keys=['r1','r2','etc'])
df.rename(index={'old':'new'}, inplace=True)
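
To see how set_index, reset_index and rename interact, here is a small sketch on invented data (the column names 'A' and 'B' and the labels are assumptions):

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'A': ['x', 'y', 'z'], 'B': [1, 2, 3]})

indexed = df.set_index('A')                     # column A becomes the row index
restored = indexed.reset_index()                # the old index comes back as column A
renamed = indexed.rename(index={'x': 'first'})  # relabel a single row
```

Each call returns a new DataFrame, so `df` itself is unchanged unless you reassign it.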

 

Add rows

import pandas as pd
df = pd.concat([original_df, more_rows_in_df]) # DataFrame.append was removed in pandas 2.0
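
In pandas 2.0 and later, DataFrame.append is no longer available, and pd.concat is the usual way to stack rows. A minimal sketch with made-up frames, showing the effect of ignore_index:

```python
import pandas as pd

# Hypothetical frames for illustration
original_df = pd.DataFrame({'a': [1, 2]})
more_rows_in_df = pd.DataFrame({'a': [3]})

# Keeps each frame's own row labels, so label 0 appears twice here
stacked = pd.concat([original_df, more_rows_in_df])
# Renumbers the rows 0..n-1
renumbered = pd.concat([original_df, more_rows_in_df], ignore_index=True)
```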


Drop rows (by label)

df = df.drop('row_label')
df = df.drop(['row1','row2']) # multi-row
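
By default drop raises a KeyError when a label is missing. A small sketch on invented labels, using errors='ignore' to skip missing labels instead:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'x': [1, 2, 3]}, index=['row1', 'row2', 'row3'])

kept = df.drop(['row1', 'row2'])            # only 'row3' remains
safe = df.drop('missing', errors='ignore')  # no KeyError, nothing dropped
```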


Boolean row selection based on column values

df = df[df['col2'] >= 0.0]
df = df[(df['col3']>=1.0) | (df['col1']<0.0)]
df = df[df['col'].isin([1,2,5,7,11])]
df = df[~df['col'].isin([1,2,5,7,11])]
df = df[df['col'].str.contains('hello')]
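
A worked sketch of these masks on made-up data (the values are assumptions). Each comparison needs its own parentheses because | and & bind more tightly than >= and <:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    'col1': [-1.0, 0.5, 2.0],
    'col3': [0.0, 3.0, 0.5],
    'col': ['hello world', 'bye', None],
})

either = df[(df['col3'] >= 1.0) | (df['col1'] < 0.0)]  # OR of two conditions
both = df[(df['col3'] >= 1.0) & (df['col1'] > 0.0)]    # AND of two conditions
# na=False treats missing values as non-matches
greeting = df[df['col'].str.contains('hello', na=False)]
```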


Select rows using isin over multiple columns

from pandas import DataFrame
# make up some data
data = {1:[1,2,3], 2:[1,4,9], 3:[1,8,27]}
df = DataFrame(data)
# multi-column isin
lf = {1:[1, 3], 3:[8, 27]} # look for
f = df[df[list(lf)].isin(lf).all(axis=1)]
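
Stepping through the same data makes the logic easier to follow: isin with a dict checks each listed column against its own list, and all(axis=1) keeps only rows where every listed column matched. A sketch reusing the values above:

```python
import pandas as pd

data = {1: [1, 2, 3], 2: [1, 4, 9], 3: [1, 8, 27]}
df = pd.DataFrame(data)
lf = {1: [1, 3], 3: [8, 27]}  # values to look for, per column

mask = df[list(lf)].isin(lf)  # one boolean column per key in lf
f = df[mask.all(axis=1)]      # rows where every listed column matched
```

Only the last row (3, 9, 27) passes, because it is the only one where column 1 is in [1, 3] and column 3 is in [8, 27].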


Select rows using an index

idx = df[df['col'] >= 2].index
print(df.loc[idx]) # .ix was removed in pandas 1.0; use .loc for labels
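
The .ix indexer no longer exists in current pandas, and .loc handles label-based lookups. A minimal sketch on invented data, also showing that the boolean mask can be passed to .loc directly:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'col': [1, 2, 3]}, index=['a', 'b', 'c'])

idx = df[df['col'] >= 2].index     # labels of the matching rows
by_label = df.loc[idx]             # label-based lookup
by_mask = df.loc[df['col'] >= 2]   # same rows without the intermediate index
```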


Select a slice of rows by integer position

df = df[:] # all rows (use df.copy() for an independent copy)
df = df[0:2] # rows 0 and 1
df = df[2:3] # row 2 (the third row)
df = df[-1:] # the last row
df = df[:-1] # all but the last row
df = df[::2] # every 2nd row (0 2 ..)
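
The same slices can be written with .iloc, which makes it explicit that positions rather than labels are meant. A sketch on made-up data whose integer labels deliberately differ from the positions:

```python
import pandas as pd

# Hypothetical data: labels run 3..0 while positions run 0..3
df = pd.DataFrame({'x': [10, 20, 30, 40]}, index=[3, 2, 1, 0])

first_two = df.iloc[0:2]    # positions 0 and 1, regardless of labels
last_row = df.iloc[-1:]     # last position
every_other = df.iloc[::2]  # positions 0 and 2
```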


Select a slice of rows by label/index

df = df['a':'c'] # rows 'a' through 'c'
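
Unlike positional slices, label slices include the end label. A small sketch on invented labels contrasting the two:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'x': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])

by_label = df.loc['a':'c']   # includes 'c'
by_position = df.iloc[0:2]   # excludes position 2
```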


Add a column-totals row to a DataFrame

import pandas as pd
# Option 1: use dictionary comprehension
sums = {col: df[col].sum() for col in df}
sums_df = pd.DataFrame(sums, index=['Total'])
df = pd.concat([df, sums_df])
# Option 2: All done with pandas
df = pd.concat([df, pd.DataFrame(df.sum(),
 	columns=['Total']).T])
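
Another way to get the same totals row, offered here as an alternative sketch rather than the post's own method, is to assign to a new label with .loc. The data is made up:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

df.loc['Total'] = df.sum()  # adds a row labelled 'Total'
```

This modifies df in place, so run it once only, or the totals row will itself be included in the next sum.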


Iterate over DataFrame rows

for index, row in df.iterrows():
    pass # row is a Series; replace pass with your per-row logic
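
A runnable sketch of row iteration on invented data. iterrows yields (label, Series) pairs, while itertuples yields namedtuples and is usually faster; the column names 'label' and 'value' are assumptions:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'label': ['a', 'b'], 'value': [1, 2]})

total = 0
for index, row in df.iterrows():
    total += row['value']  # row is a Series keyed by column name

# itertuples exposes each column as an attribute of the tuple
labels = [t.label for t in df.itertuples(index=False)]
```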


Sort DataFrame rows by values (sort_values)

df = df.sort_values(by=df.columns[0],
 			ascending=False)
df.sort_values(by=['col1', 'col2'], inplace=True)
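
In current pandas, DataFrame.sort no longer exists and sort_values takes its place. A minimal sketch on made-up data:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'col1': [2, 1, 2], 'col2': [1, 5, 0]})

by_first = df.sort_values(by=df.columns[0], ascending=False)  # first column, descending
by_both = df.sort_values(by=['col1', 'col2'])                 # col1, then col2 breaks ties
```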


Sort a DataFrame by its row index

df.sort_index(inplace=True) # sort by row
df = df.sort_index(ascending=False)


Random selection of rows

import random as r
k = 20 # pick a number
selection = r.sample(range(len(df)), k)
df_sample = df.iloc[selection, :] # get copy
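
pandas also has a built-in DataFrame.sample that does the same job in one call. A sketch on invented data, where random_state is set only to make the draw repeatable:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'x': range(100)})

k = 20
df_sample = df.sample(n=k, random_state=0)  # k distinct rows, drawn without replacement
```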


Drop duplicates in the row index

df['index'] = df.index # 1 create new col
df = df.drop_duplicates(subset='index',
         keep='last')# 2 use new col
del df['index'] # 3 del the col
df.sort_index(inplace=True)# 4 tidy up
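
A shorter alternative, sketched here on made-up data, skips the temporary column by masking with Index.duplicated. With keep='last', every occurrence of a label except the last is flagged:

```python
import pandas as pd

# Hypothetical data: label 'a' appears twice
df = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'b', 'a'])

deduped = df[~df.index.duplicated(keep='last')].sort_index()
```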


Test whether two DataFrames have the same row index

len(a)==len(b) and all(a.index==b.index)
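
Index.equals performs the same check in one call and handles differing lengths without the explicit len comparison. A sketch on invented frames:

```python
import pandas as pd

# Hypothetical frames for illustration
a = pd.DataFrame({'x': [1, 2]}, index=['r1', 'r2'])
b = pd.DataFrame({'y': [3, 4]}, index=['r1', 'r2'])
c = pd.DataFrame({'y': [3, 4]}, index=['r2', 'r1'])

same = a.index.equals(b.index)       # same labels in the same order
different = a.index.equals(c.index)  # same labels, different order
```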


Get the integer position of a row or column index label

i = df.index.get_loc('row_label')
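
The same get_loc call works on df.columns for column labels. A minimal sketch with made-up labels:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['other', 'row_label'])

i = df.index.get_loc('row_label')  # position of a row label
j = df.columns.get_loc('b')        # position of a column label
```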


Get the integer positions of rows that meet a condition

import numpy as np
a = np.where(df['col'] >= 2)[0] # numpy array of positions
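
Called with a single argument, np.where returns a tuple of arrays (one per dimension), so the positions sit in element 0. A sketch on invented data, with np.flatnonzero as a way to get the array directly:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'col': [1, 2, 3]})

mask = (df['col'] >= 2).to_numpy()
result = np.where(mask)           # a tuple holding one array
positions = np.flatnonzero(mask)  # the array of positions itself
```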


Check whether row index values are unique or monotonic

if df.index.is_unique: pass # ...
b = df.index.is_monotonic_increasing
b = df.index.is_monotonic_decreasing


Find duplicates in the row index

if df.index.has_duplicates:
    print(df.index.duplicated())
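
Index.duplicated returns a boolean array that flags every repeat after the first occurrence. A sketch on made-up data that also pulls out the repeated labels:

```python
import pandas as pd

# Hypothetical data: label 'a' appears twice
df = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'b', 'a'])

flags = df.index.duplicated()          # True where a label was already seen
dup_labels = df.index[flags].unique()  # the labels that repeat
```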

 

 

Reference: https://www.geeksforgeeks.org/pandas-cheat-sheet

 
