DataFrame joining/grouping
Merge on indexes
df_new = pd.merge(left=df1, right=df2,
how='outer', left_index=True,
right_index=True)
df_new = df1.join(other=df2, on='col1',
how='outer')
df_new = df1.join(other=df2,on=['a','b'],
how='outer')
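As a minimal sketch of the index merge above, assuming two small hypothetical frames `df1` and `df2` whose index labels overlap only on 'b' (the column names `x` and `y` are made up for illustration):

```python
import pandas as pd

# Hypothetical sample frames; only the label 'b' appears in both indexes.
df1 = pd.DataFrame({'x': [1, 2]}, index=['a', 'b'])
df2 = pd.DataFrame({'y': [10, 20]}, index=['b', 'c'])

# An outer merge on the index keeps every label from both frames;
# cells with no match are filled with NaN.
df_new = pd.merge(left=df1, right=df2, how='outer',
                  left_index=True, right_index=True)

# DataFrame.join matches on the index when `on` is not given.
df_joined = df1.join(df2, how='outer')
```

Both results should contain the labels 'a', 'b' and 'c', with only row 'b' fully populated.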
Merge on columns
df_new = pd.merge(left=df1, right=df2,
how='left', left_on='col1',
right_on='col2')
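A short sketch of the column merge above, assuming hypothetical key values and value columns `v1` and `v2`. With `how='left'`, every row of `df1` is kept, and rows without a match in `df2` get NaN in the right-hand columns:

```python
import pandas as pd

# Hypothetical sample data: 'k2' has no partner in df2.
df1 = pd.DataFrame({'col1': ['k1', 'k2', 'k3'], 'v1': [1, 2, 3]})
df2 = pd.DataFrame({'col2': ['k1', 'k3'], 'v2': [10, 30]})

# The key columns have different names, so both are named explicitly.
df_new = pd.merge(left=df1, right=df2, how='left',
                  left_on='col1', right_on='col2')
```

Because the key columns are named differently, both `col1` and `col2` appear in the result.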
Merge by concatenation
df=pd.concat([df1,df2],axis=0)#top/bottom
# DataFrame.append was removed in pandas 2.0; use pd.concat instead
df = pd.concat([df1, df2, df3]) #top/bottom
df=pd.concat([df1,df2],axis=1)#left/right
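A small sketch of both concatenation directions, using hypothetical frames. `ignore_index=True` is an optional extra, not part of the lines above, that renumbers the rows after stacking:

```python
import pandas as pd

# Hypothetical sample frames.
df1 = pd.DataFrame({'a': [1, 2]})
df2 = pd.DataFrame({'a': [3, 4]})
df3 = pd.DataFrame({'b': [5, 6]})

# axis=0 stacks rows top/bottom; ignore_index renumbers them 0..3.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# axis=1 places frames left/right, aligning rows on the index.
side_by_side = pd.concat([df1, df3], axis=1)
```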
Merge with combine_first
df = df1.combine_first(other=df2)
# multi-combine with python reduce()
from functools import reduce
df = reduce(lambda x, y:
x.combine_first(y),
[df1, df2, df3, df4, df5])
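To illustrate how `combine_first` fills gaps, here is a minimal sketch with three hypothetical single-column frames. Each missing value is taken from the first later frame that has one at that position:

```python
from functools import reduce

import numpy as np
import pandas as pd

# Hypothetical sample frames with progressively fewer gaps.
df1 = pd.DataFrame({'v': [1.0, np.nan, np.nan]})
df2 = pd.DataFrame({'v': [9.0, 2.0, np.nan]})
df3 = pd.DataFrame({'v': [8.0, 8.0, 3.0]})

# Values already present in an earlier frame win; later frames
# only fill what is still missing.
df = reduce(lambda x, y: x.combine_first(y), [df1, df2, df3])
```

Here row 0 comes from `df1`, row 1 from `df2` and row 2 from `df3`.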
DataFrame Groupby (split-apply-combine)
Grouping
gb = df.groupby('cat') # by one column
gb = df.groupby(['c1','c2']) # by 2 cols
gb = df.groupby(level=0) # multi-index gb
gb = df.groupby(level=['a','b']) # mi gb
print(gb.groups)
Selecting a group
dfa = df.groupby('cat').get_group('a')
dfb = df.groupby('cat').get_group('b')
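A minimal sketch of inspecting and selecting groups, assuming a hypothetical frame with a `cat` key column:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'cat': ['a', 'b', 'a'], 'col1': [1, 2, 3]})

gb = df.groupby('cat')

# gb.groups maps each group key to the row labels in that group.
rows_in_a = list(gb.groups['a'])

# get_group returns the rows of a single group as a DataFrame.
dfa = gb.get_group('a')
```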
Applying an aggregating function
# apply to a column ...
s = df.groupby('cat')['col1'].sum()
s = df.groupby('cat')['col1'].agg('sum')
# apply to every column in the DataFrame
s = df.groupby('cat').agg('sum')
df_summary = df.groupby('cat').describe()
df_row_1s = df.groupby('cat').head(1)
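A short sketch of the aggregations above on hypothetical data. Aggregating one column gives a Series indexed by group key, while aggregating the whole group gives a DataFrame:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    'cat': ['a', 'b', 'a', 'b'],
    'col1': [1, 2, 3, 4],
    'col2': [10.0, 20.0, 30.0, 40.0],
})

# One column: a Series with one value per group.
s = df.groupby('cat')['col1'].sum()

# Every non-key column: a DataFrame with one row per group.
totals = df.groupby('cat').sum()

# head(1) keeps the first row of each group with its original label.
first_rows = df.groupby('cat').head(1)
```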
Applying multiple aggregating functions
gb = df.groupby('cat')
# apply multiple functions to one column
dfx = gb['col2'].agg(['sum', 'mean'])
# apply to multiple fns to multiple cols
dfy = gb.agg({
'cat': np.count_nonzero,
'col1': ['sum', 'mean', 'std'],
'col2': ['min', 'max']
})
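The dictionary form above produces multi-level column labels. As a possible alternative that is not part of the original cheat sheet, named aggregation (keyword arguments of the form `name=(column, function)`) yields flat column names. The output names `total` and `avg` below are hypothetical:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    'cat': ['a', 'b', 'a', 'b'],
    'col1': [1, 2, 3, 4],
    'col2': [10.0, 20.0, 30.0, 40.0],
})

# Each keyword becomes an output column: name=(input column, function).
summary = df.groupby('cat').agg(
    total=('col1', 'sum'),
    avg=('col2', 'mean'),
)
```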
Applying transform functions
# transform to group z-scores, which have
# a group mean of 0, and a std dev of 1.
zscore = lambda x: (x-x.mean())/x.std()
dfz = df.groupby('cat').transform(zscore)
# replace missing data with group mean
mean_r = lambda x: x.fillna(x.mean())
dfm = df.groupby('cat').transform(mean_r)
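A minimal sketch of both transforms on hypothetical data. Unlike an aggregation, `transform` returns a result with the same length and index as the input, so it can be assigned straight back as a new column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two groups of three rows each.
df = pd.DataFrame({
    'cat': ['a', 'a', 'a', 'b', 'b', 'b'],
    'col1': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    'col2': [1.0, np.nan, 3.0, 4.0, np.nan, 8.0],
})

# Z-score within each group (Series.std uses the sample std, ddof=1).
zscore = lambda x: (x - x.mean()) / x.std()
z = df.groupby('cat')['col1'].transform(zscore)

# Replace missing values with the mean of their own group.
mean_r = lambda x: x.fillna(x.mean())
filled = df.groupby('cat')['col2'].transform(mean_r)
```

In this data each group's z-scores come out as -1, 0 and 1, and the two gaps in `col2` are filled with the group means 2 and 6.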
Applying filtering functions
# select groups with more than 10 members
eleven = lambda x: (len(x['col1']) >= 11)
df11 = df.groupby('cat').filter(eleven)
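A small sketch of group filtering, using a hypothetical threshold of 3 rows instead of 11 so the sample data stays short. The function receives each group as a DataFrame and returns True to keep all of its rows:

```python
import pandas as pd

# Hypothetical sample data: group 'a' has 3 rows, group 'b' has 1.
df = pd.DataFrame({'cat': ['a', 'b', 'a', 'a'],
                   'col1': [1, 2, 3, 4]})

# Keep only the groups with at least 3 members.
at_least_3 = lambda g: len(g) >= 3
kept = df.groupby('cat').filter(at_least_3)
```

The result keeps the original row labels and drops every row of the small group.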
Group by a row index
df = df.set_index(keys='cat')
s = df.groupby(level=0)['col1'].sum()
dfg = df.groupby(level=0).sum()
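A brief sketch of grouping on the row index, assuming hypothetical data. Once `cat` has been moved into the index, `level=0` refers to it:

```python
import pandas as pd

# Hypothetical sample data with 'cat' moved into the index.
df = pd.DataFrame({'cat': ['a', 'b', 'a'], 'col1': [1, 2, 3]})
df = df.set_index(keys='cat')

# level=0 groups rows by their index label.
s = df.groupby(level=0)['col1'].sum()
```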
Reference: https://www.geeksforgeeks.org/pandas-cheat-sheet