
python pandas DataFrame: combining with merge, join, and groupby

by 코딩하는 미토콘드리아 bioinformatics 2023. 11. 12.

DataFrame combining/grouping

 

Merging on the index

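import pandas as pd

# outer merge using the index of both frames
df_new = pd.merge(left=df1, right=df2,
                  how='outer', left_index=True,
                  right_index=True)

# join column 'col1' of df1 against the index of df2
df_new = df1.join(other=df2, on='col1',
                  how='outer')

# join columns 'a' and 'b' of df1 against a two-level index of df2
df_new = df1.join(other=df2, on=['a', 'b'],
                  how='outer')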
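As a rough illustration of the index merge above, here is a minimal sketch on two tiny frames. The data and index labels are made up for this example and are not from the original post.

```python
import pandas as pd

# Hypothetical sample data: the two frames share only labels 'b' and 'c'.
df1 = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'b', 'c'])
df2 = pd.DataFrame({'y': [10, 20, 30]}, index=['b', 'c', 'd'])

# An outer merge on the index keeps every label found in either frame;
# cells with no match are filled with NaN.
df_new = pd.merge(left=df1, right=df2, how='outer',
                  left_index=True, right_index=True)

# DataFrame.join aligns on the index by default, so this should be equivalent.
df_joined = df1.join(other=df2, how='outer')

print(df_new)
```

With how='inner' instead, only the shared labels 'b' and 'c' would remain.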


Merging on columns

df_new = pd.merge(left=df1, right=df2,
                  how='left', left_on='col1',
                  right_on='col2')
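To make the left merge on differently named key columns more concrete, here is a small sketch. The frames `left` and `right` and their values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: the key is called 'col1' on the left
# and 'col2' on the right.
left = pd.DataFrame({'col1': ['k1', 'k2', 'k3'], 'a': [1, 2, 3]})
right = pd.DataFrame({'col2': ['k1', 'k2', 'k4'], 'b': [10, 20, 40]})

# A left merge keeps every row of `left`; 'k3' has no partner in `right`,
# so its 'col2' and 'b' cells become NaN. 'k4' is dropped.
df_new = pd.merge(left=left, right=right, how='left',
                  left_on='col1', right_on='col2')

print(df_new)
```

Both key columns appear in the result because their names differ; one of them can be dropped afterwards if it is not needed.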


Combining with concatenation

df = pd.concat([df1, df2], axis=0)  # top/bottom
# DataFrame.append was removed in pandas 2.0; use pd.concat instead
df = pd.concat([df1, df2, df3])     # top/bottom
df = pd.concat([df1, df2], axis=1)  # left/right
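The difference between the two axes can be seen in a short sketch like the following. The frames are made up for illustration, and `ignore_index=True` is an optional extra that the snippets above do not use.

```python
import pandas as pd

# Hypothetical sample data.
df1 = pd.DataFrame({'a': [1, 2]})
df2 = pd.DataFrame({'a': [3, 4]})

# axis=0 stacks rows; the original index labels are kept, so they repeat.
stacked = pd.concat([df1, df2], axis=0)

# ignore_index=True renumbers the rows 0..n-1 instead.
stacked_reset = pd.concat([df1, df2], axis=0, ignore_index=True)

# axis=1 places frames side by side, aligned on the row index.
side = pd.concat([df1, df2.rename(columns={'a': 'b'})], axis=1)

print(stacked_reset)
print(side)
```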


Combining with combine_first

from functools import reduce

df = df1.combine_first(other=df2)
# combine several frames with functools.reduce()
df = reduce(lambda x, y: x.combine_first(y),
            [df1, df2, df3, df4, df5])
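combine_first fills the missing values of the calling frame with values from the other frame, so earlier frames in the list take priority. A minimal sketch with made-up data:

```python
from functools import reduce

import numpy as np
import pandas as pd

# Hypothetical sample data: each frame fills in gaps left by the earlier ones.
df1 = pd.DataFrame({'v': [1.0, np.nan, np.nan]})
df2 = pd.DataFrame({'v': [9.0, 2.0, np.nan]})
df3 = pd.DataFrame({'v': [9.0, 9.0, 3.0]})

# Values already present in an earlier frame are never overwritten;
# only NaN cells are filled from the next frame in the list.
df = reduce(lambda x, y: x.combine_first(y), [df1, df2, df3])

print(df)
```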

 

 

DataFrame Groupby (split-apply-combine)

 

Grouping

gb = df.groupby('cat')             # by one column
gb = df.groupby(['c1', 'c2'])      # by two columns
gb = df.groupby(level=0)           # by the first index level
gb = df.groupby(level=['a', 'b'])  # by named MultiIndex levels
print(gb.groups)


Selecting a group

dfa = df.groupby('cat').get_group('a')
dfb = df.groupby('cat').get_group('b')
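To show what `groups` and `get_group` return, here is a small sketch. The frame is hypothetical; only the column names 'cat' and 'col1' follow the snippets above.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'cat': ['a', 'b', 'a', 'b'],
                   'col1': [1, 2, 3, 4]})

gb = df.groupby('cat')

# gb.groups maps each group key to the row labels belonging to that group.
groups = {key: list(labels) for key, labels in gb.groups.items()}

# get_group returns the sub-DataFrame for a single key.
dfa = gb.get_group('a')

print(groups)
print(dfa)
```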


Applying an aggregate function

# apply to one column ...
s = df.groupby('cat')['col1'].sum()
# pass the function name as a string; recent pandas versions warn
# when NumPy functions such as np.sum are passed to agg
s = df.groupby('cat')['col1'].agg('sum')
# apply to every column in the DataFrame
s = df.groupby('cat').agg('sum')
df_summary = df.groupby('cat').describe()
df_row_1s = df.groupby('cat').head(1)
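A minimal sketch of these aggregations on made-up data. The values are hypothetical; the column names follow the snippets above.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'cat': ['a', 'b', 'a', 'b'],
                   'col1': [1, 2, 3, 4],
                   'col2': [10.0, 20.0, 30.0, 40.0]})

# Sum of one column per group: a Series indexed by the group key.
s = df.groupby('cat')['col1'].sum()

# Sum of every non-key column per group: a DataFrame indexed by the group key.
totals = df.groupby('cat').agg('sum')

# head(1) is not an aggregation: it returns the first row of each group
# with the original row labels.
first_rows = df.groupby('cat').head(1)

print(s)
print(totals)
print(first_rows)
```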


Applying multiple aggregate functions

import numpy as np

gb = df.groupby('cat')
# apply multiple functions to one column
dfx = gb['col2'].agg(['sum', 'mean'])
# apply multiple functions to multiple columns
dfy = gb.agg({
    'cat': np.count_nonzero,
    'col1': ['sum', 'mean', 'std'],
    'col2': ['min', 'max']
})
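Passing a dict of lists to agg produces two-level column labels of the form (column, function). The sketch below, on made-up data, shows one possible way to flatten them. The flattening step is an addition for illustration and is not part of the original cheat sheet.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'cat': ['a', 'b', 'a', 'b'],
                   'col1': [1, 2, 3, 4],
                   'col2': [10.0, 20.0, 30.0, 40.0]})

dfy = df.groupby('cat').agg({
    'col1': ['sum', 'mean'],
    'col2': ['min', 'max'],
})

# The columns are a MultiIndex such as ('col1', 'sum'); joining the two
# levels with '_' gives flat names that are easier to reference later.
dfy.columns = ['_'.join(pair) for pair in dfy.columns]

print(dfy)
```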


Applying transform functions

# transform to group z-scores, which have
# a group mean of 0, and a std dev of 1.
zscore = lambda x: (x-x.mean())/x.std()
dfz = df.groupby('cat').transform(zscore)
# replace missing data with group mean
mean_r = lambda x: x.fillna(x.mean())
dfm = df.groupby('cat').transform(mean_r)
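Unlike an aggregation, transform returns a result with the same shape and index as its input. A small sketch on made-up data, selecting the numeric columns explicitly (an assumption for clarity, the snippets above transform every column):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'col2' has one missing value per group.
df = pd.DataFrame({'cat': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'col1': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
                   'col2': [1.0, np.nan, 3.0, 5.0, 5.0, np.nan]})

# Group-wise z-score: each group ends up with mean 0 and sample std 1.
zscore = lambda x: (x - x.mean()) / x.std()
dfz = df.groupby('cat')[['col1']].transform(zscore)

# Replace missing values with the mean of their own group.
mean_r = lambda x: x.fillna(x.mean())
dfm = df.groupby('cat')[['col2']].transform(mean_r)

print(dfz)
print(dfm)
```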


Applying filtering functions

# select groups with more than 10 members
eleven = lambda x: (len(x['col1']) >= 11)
df11 = df.groupby('cat').filter(eleven)
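filter calls the function once per group and keeps or drops each group as a whole. A reduced sketch with a threshold of 3 instead of 11, on made-up data:

```python
import pandas as pd

# Hypothetical sample data: group 'a' has three rows, group 'b' has two.
df = pd.DataFrame({'cat': ['a', 'a', 'a', 'b', 'b'],
                   'col1': [0, 1, 2, 3, 4]})

# The function receives each group's sub-DataFrame and must return a bool.
at_least_three = lambda x: len(x) >= 3

# Rows of groups that fail the test are dropped; the rest keep their labels.
kept = df.groupby('cat').filter(at_least_three)

print(kept)
```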


Grouping by row index

df = df.set_index(keys='cat')
s = df.groupby(level=0)['col1'].sum()
dfg = df.groupby(level=0).sum()
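Once 'cat' has been moved into the index, grouping with level=0 should give the same totals as grouping by the column. A quick check on made-up data:

```python
import pandas as pd

# Hypothetical sample data.
raw = pd.DataFrame({'cat': ['a', 'b', 'a'],
                    'col1': [1, 2, 3]})

# Group by the column directly ...
by_column = raw.groupby('cat')['col1'].sum()

# ... or move 'cat' into the index and group by index level 0.
indexed = raw.set_index(keys='cat')
by_level = indexed.groupby(level=0)['col1'].sum()

print(by_level)
```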

 

 

Reference: https://www.geeksforgeeks.org/pandas-cheat-sheet

 


 
