
Python pandas: Checking DataFrame Structure and Statistics

by 코딩하는 미토콘드리아 bioinformatics 2023. 10. 21.

Getting an overview of a DataFrame's structure and statistics

 

1. Inspecting DataFrame contents and structure

df.info() # index range, column dtypes, non-null counts
dfh = df.head(k) # first k rows
dft = df.tail(k) # last k rows
dfs = df.describe() # summary stats for numeric columns
top_left_corner_df = df.iloc[:4, :4] # first 4 rows and 4 columns by position
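
The calls above can be tried on a small frame. The following is a minimal sketch; the column names and values are hypothetical sample data, not from the original post.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "gene": ["A", "B", "C", "D", "E", "F"],
    "length": [1200, 850, 430, 2210, 990, 1570],
    "gc": [0.41, 0.52, 0.47, 0.39, 0.55, 0.44],
})

df.info()                 # prints index range, column dtypes, non-null counts
head3 = df.head(3)        # first 3 rows
tail2 = df.tail(2)        # last 2 rows
stats = df.describe()     # summary statistics, numeric columns only by default
corner = df.iloc[:2, :2]  # top-left 2x2 block, selected by position

print(stats)
print(corner)
```

Note that `describe()` skips the string column `gene` here, because by default it summarizes only numeric columns when the frame has mixed types.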


2. Non-indexing DataFrame attributes

df = df.T # transpose rows and columns
l = df.axes # list of the row and column indexes
(r_idx, c_idx) = df.axes # unpack the same list
s = df.dtypes # Series of column data types
b = df.empty # True if the DataFrame has no elements
i = df.ndim # number of axes (it is 2)
t = df.shape # (row-count, column-count)
i = df.size # row-count * column-count
a = df.values # numpy array for df (df.to_numpy() is the recommended form)
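
A short sketch of what these attributes return, using a hypothetical 3x2 frame. It also shows that `df.empty` only reports whether the frame is empty; it does not create anything.

```python
import pandas as pd

# Hypothetical sample frame with string row labels.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]}, index=["a", "b", "c"])

row_idx, col_idx = df.axes     # [row index, column index]
print(list(row_idx), list(col_idx))
print(df.shape)                # (rows, columns)
print(df.size)                 # rows * columns
print(df.ndim)                 # always 2 for a DataFrame
print(df.empty)                # False, the frame holds data
print(pd.DataFrame().empty)    # True, no rows and no columns
print(df.T.shape)              # transposing swaps the two dimensions
arr = df.to_numpy()            # NumPy array holding the same values
```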

 

3. DataFrame utility methods

df = df.copy() # copy the DataFrame
df = df.rank() # rank each col (default)
df = df.sort_values(by=col) # sort rows by one column
df = df.sort_values(by=[col1, col2]) # sort by col1, then col2
df = df.sort_index() # sort rows by index label
df = df.astype(dtype) # type conversion
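
A minimal sketch of the sorting, ranking, and conversion methods on a hypothetical frame. The column names `group` and `score` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample frame with an unsorted integer index.
df = pd.DataFrame(
    {"group": ["b", "a", "b", "a"], "score": [3, 1, 2, 4]},
    index=[10, 30, 20, 40],
)

backup = df.copy()                               # independent copy of the data
by_score = df.sort_values(by="score")            # ascending by one column
by_both = df.sort_values(by=["group", "score"])  # by group first, then score
by_index = df.sort_index()                       # order rows by index label
ranks = df["score"].rank()                       # 1.0 marks the smallest value
as_float = df.astype({"score": "float64"})       # convert a single column

print(by_both)
print(ranks)
```

Each of these methods returns a new object and leaves `df` unchanged, which is why the cheat sheet assigns the result back to `df`.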


4. Iterating over a DataFrame

df.items() # (col-index, Series) pairs; iteritems() was removed in pandas 2.0
df.iterrows() # (row-index, Series) pairs

for (name, series) in df.items():
    print('\nCol name: ' + str(name))
    print('1st value: ' + str(series.iat[0]))
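
The two iteration styles can be compared side by side. This is a small sketch on hypothetical data, using `items()` for columns and `iterrows()` for rows.

```python
import pandas as pd

# Hypothetical sample frame.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"])

# Column-wise: items() yields (column label, Series) pairs.
first_values = {name: int(series.iat[0]) for name, series in df.items()}

# Row-wise: iterrows() yields (index label, Series) pairs.
row_sums = {label: int(row.sum()) for label, row in df.iterrows()}

print(first_values)
print(row_sums)
```

For large frames, vectorized calls such as `df.sum(axis=1)` are usually much faster than looping with `iterrows()`.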


5. Math and statistics over the whole DataFrame

df = df.abs() # absolute values 
df = df.add(o) # add df, Series or value
s = df.count() # non NA/null values
df = df.cummax() # (cols default axis)
df = df.cummin() # (cols default axis)
df = df.cumsum() # (cols default axis)
df = df.diff() # 1st diff (col def axis)
df = df.div(o) # div by df, Series, value
df = df.dot(o) # matrix dot product
s = df.max() # max of axis (col def)
s = df.mean() # mean (col default axis)
s = df.median() # median (col default)
s = df.min() # min of axis (col def)
df = df.mul(o) # mul by df Series val
s = df.sum() # sum axis (cols default)
df = df.where(df > 0.5, other=np.nan) # keep values > 0.5, NaN elsewhere (needs: import numpy as np)
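
A brief sketch of how the default axis and NaN handling behave, on a hypothetical frame with one missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with one missing value in column "b".
df = pd.DataFrame({"a": [1.0, -2.0, 3.0], "b": [0.2, np.nan, 0.8]})

col_sum = df.sum()            # one value per column; NaN is skipped by default
row_mean = df.mean(axis=1)    # axis=1 reduces across columns, one value per row
non_null = df.count()         # number of non-NA values per column
running = df["a"].cumsum()    # cumulative sum down column "a"
delta = df["a"].diff()        # difference from the previous row (first is NaN)
masked = df.where(df > 0.5)   # keep values > 0.5, NaN elsewhere
scaled = df.mul(10)           # element-wise multiplication by a scalar

print(col_sum)
print(masked)
```

Because the comparison `df > 0.5` is False for NaN, missing values stay NaN in `masked`.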


6. Selecting and filtering rows/columns by index label

df = df.filter(items=['a', 'b']) # by col
df = df.filter(items=[5], axis=0) # by row
df = df.filter(like='x') # keep x in col
df = df.filter(regex='x') # regex in col
df = df.loc[df.index % 5 == 0] # every 5th row label; df.select() was removed in pandas 1.0
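
A minimal sketch of label-based filtering on a hypothetical frame with the default integer index. The column names are assumptions chosen so that `like` and `regex` select different subsets.

```python
import pandas as pd

# Hypothetical sample frame: 12 rows labelled 0..11.
df = pd.DataFrame({"ax": range(12), "bx": range(12), "c": range(12)})

cols_exact = df.filter(items=["ax", "c"])   # exact column labels
row_5 = df.filter(items=[5], axis=0)        # exact row label
has_x = df.filter(like="x")                 # column labels containing "x"
starts_a = df.filter(regex="^a")            # column labels matching a regex
every_5th = df.loc[df.index % 5 == 0]       # row labels divisible by 5

print(every_5th)
```

`filter()` looks only at labels, never at cell values; filtering on values is done with boolean indexing instead.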

 

 

Reference: https://www.geeksforgeeks.org/pandas-cheat-sheet/

 
