
Checking basic statistics of a pandas DataFrame in Python

by 코딩하는 미토콘드리아 bioinformatics 2023. 11. 18.

DataFrame basic statistics

 

Summary statistics

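s = df['col1'].describe()   # one column -> Series of summary statistics
df1 = df.describe()         # all numeric columns -> DataFrame of summary statistics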
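As a minimal illustration, the sketch below runs `describe()` on a small made-up DataFrame (the column names `col1`, `col2`, `grade` and all values are assumptions for the example). By default only numeric columns are summarized; a text column gets count/unique/top/freq instead, and `include="all"` puts both kinds in one table.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
df = pd.DataFrame({
    "col1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "col2": [10, 20, 20, 40, 60],
    "grade": ["a", "b", "a", "c", "a"],
})

# Numeric columns only (default): count, mean, std, min, 25%, 50%, 75%, max
num_summary = df.describe()

# A text column is summarized with count, unique, top, freq
cat_summary = df["grade"].describe()

# include="all" combines both; stats that don't apply show as NaN
all_summary = df.describe(include="all")

print(num_summary)
print(cat_summary)
```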

 

 

Key statistical methods

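df.corr()  # pairwise correlation between columns (Pearson by default)
df.cov()   # pairwise covariance between columns
df.kurt()  # kurtosis of each column (axis=0 by default)
(df - df.mean()).abs().mean()  # mean absolute deviation (df.mad() was removed in pandas 2.0)
df.sem()   # standard error of the mean of each column
df.var()   # variance of each column (axis=0 by default)
# pandas 2.x: pass numeric_only=True when df also has non-numeric columns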
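The sketch below applies these methods to a hypothetical toy DataFrame that also contains a text column, which is why `numeric_only=True` is passed (this assumes pandas 1.5 or later). Because `y` is exactly `2 * x`, the correlation is 1 and the covariance is twice the variance of `x`. It also shows a replacement for the removed `mad()` and checks `sem()` against its definition, the sample standard deviation divided by the square root of the count.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: y = 2 * x, plus a non-numeric column
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 10.0],
    "y": [2.0, 4.0, 6.0, 8.0, 20.0],
    "label": ["p", "q", "p", "q", "p"],
})

corr = df.corr(numeric_only=True)   # corr(x, y) == 1 since y = 2x
cov = df.cov(numeric_only=True)     # cov(x, y) == 2 * var(x)
var = df.var(numeric_only=True)     # sample variance (ddof=1)
sem = df.sem(numeric_only=True)     # standard error of the mean
kurt = df.kurt(numeric_only=True)   # excess kurtosis (0 for a normal distribution)

# Mean absolute deviation around the mean, computed by hand
num = df.select_dtypes("number")
mad = (num - num.mean()).abs().mean()

# sem by definition: sample std / sqrt(number of non-missing values)
manual_sem = num.std() / np.sqrt(num.count())
```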


Value counts

s = df['col1'].value_counts()  # frequency of each distinct value, most common first
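A minimal sketch on a made-up Series (the values are assumptions for the example) showing the two `value_counts()` options that are most often needed: `normalize=True` for relative frequencies and `dropna=False` to also count missing values, which are excluded by default.

```python
import pandas as pd

# Hypothetical toy data with one missing value
s = pd.Series(["a", "b", "a", "c", "a", None], name="col1")

counts = s.value_counts()                 # counts, sorted descending; NaN excluded
shares = s.value_counts(normalize=True)   # proportions of the non-missing values
with_na = s.value_counts(dropna=False)    # missing values counted as their own row

print(counts)
print(shares)
```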

 

Cross-tabulation (frequency table)

ct = pd.crosstab(index=df['a'], columns=df['b'])  # rows: values of 'a', columns: values of 'b'
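The sketch below builds a cross-tabulation from a hypothetical two-column DataFrame (`a` and `b` with made-up values). Each cell counts how often a pair of values occurs together; `margins=True` adds row and column totals labeled `All`, and `normalize="index"` converts each row to proportions.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "a": ["x", "x", "y", "y", "y"],
    "b": ["u", "v", "u", "u", "v"],
})

# Frequency of each (a, b) combination
ct = pd.crosstab(index=df["a"], columns=df["b"])

# Same table with row/column totals in an "All" row and column
ct_tot = pd.crosstab(index=df["a"], columns=df["b"], margins=True)

# Each row rescaled so that it sums to 1
ct_row = pd.crosstab(index=df["a"], columns=df["b"], normalize="index")

print(ct_tot)
```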


Quantiles and ranks

quants = [0.05, 0.25, 0.5, 0.75, 0.95]
q = df.quantile(quants)  # one row per requested quantile
r = df.rank()            # rank of each value within its column
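To make the defaults concrete, here is a sketch on a single made-up column. With the default `interpolation="linear"`, quantile q sits at position q·(n−1) in the sorted values and is interpolated between neighbors; for n = 5, q = 0.95 falls at position 3.8, i.e. 80% of the way from 40 to 50. With the default `method="average"`, tied values share the mean of the ranks they occupy.

```python
import pandas as pd

# Hypothetical toy data, already sorted for readability
df = pd.DataFrame({"col1": [10, 20, 20, 40, 50]})

quants = [0.05, 0.25, 0.5, 0.75, 0.95]
q = df.quantile(quants)          # linear interpolation between sorted values

r = df.rank()                    # ties get the average rank (2 and 3 -> 2.5)
r_min = df.rank(method="min")    # ties get the lowest rank they span instead

print(q)
print(r)
```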


Histogram binning

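import numpy as np
count, bins = np.histogram(df['col1'])          # 10 equal-width bins (default)
count, bins = np.histogram(df['col1'], bins=5)  # 5 equal-width bins
count, bins = np.histogram(df['col1'],
                           bins=[-3, -2, -1, 0, 1, 2, 3, 4])  # explicit bin edges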
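`np.histogram` returns the counts and the bin edges, so `bins` always has one more element than `count`. In the sketch below (the data values are made up), every bin is half-open `[left, right)` except the last, which also includes its right edge; with `bins=5` the edges are spread evenly from the minimum to the maximum of the data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"col1": [-2.5, -1.2, -0.3, 0.1, 0.4, 0.9, 1.5, 3.2]})

# 8 explicit edges -> 7 bins: [-3,-2), [-2,-1), ..., [2,3), [3,4]
edges = [-3, -2, -1, 0, 1, 2, 3, 4]
count, bins = np.histogram(df["col1"], bins=edges)

# 5 equal-width bins between min (-2.5) and max (3.2) -> 6 edges
count5, bins5 = np.histogram(df["col1"], bins=5)

print(count)   # number of values falling in each bin
print(bins5)   # the automatically chosen edges
```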


Regression

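import statsmodels.formula.api as smf
# Ordinary least squares: col1 as a linear function of col2 and col3 (intercept added automatically)
result = smf.ols(formula="col1 ~ col2 + col3", data=df).fit()
print(result.params)     # fitted coefficients, including "Intercept"
print(result.summary())  # full regression table (R-squared, standard errors, p-values, ...)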
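To see what `result.params` contains, the sketch below computes the same kind of ordinary least squares estimates with NumPy's `np.linalg.lstsq` instead of statsmodels (a swapped-in approach for illustration, not how the statsmodels call works internally). The data is synthetic and noise-free, built so that `col1 = 1 + 2*col2 - 3*col3`, so the recovered coefficients should be close to 1, 2 and -3.

```python
import numpy as np
import pandas as pd

# Synthetic data: col1 is an exact linear function of col2 and col3
rng = np.random.default_rng(0)
df = pd.DataFrame({"col2": rng.normal(size=50), "col3": rng.normal(size=50)})
df["col1"] = 1 + 2 * df["col2"] - 3 * df["col3"]

# Design matrix for "col1 ~ col2 + col3": a column of ones (intercept) plus predictors
X = np.column_stack([np.ones(len(df)), df["col2"], df["col3"]])
y = df["col1"].to_numpy()

# Least squares: choose beta to minimize ||y - X @ beta||^2
beta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
params = pd.Series(beta, index=["Intercept", "col2", "col3"])
print(params)  # approximately 1, 2, -3
```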

 

Reference: https://www.geeksforgeeks.org/pandas-cheat-sheet

 
