
Python pandas: Creating a DataFrame and Importing CSV

by 코딩하는 미토콘드리아 bioinformatics 2023. 10. 21.

Creating a DataFrame from data

data -> DataFrame

 

1. Instantiate an empty DataFrame

import pandas as pd
from pandas import DataFrame, Series

df = DataFrame()

2. Create a DataFrame from a CSV file

df = pd.read_csv('file.csv') # most common usage
df = pd.read_csv('file.csv', header=0,
 	index_col=0, quotechar='"', sep=':',
 	na_values = ['na', '-', '.', ''])
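To see what the extra options do without needing a file on disk, the same arguments can be tried on an in-memory string. This is only a sketch: the sample text and its column names are made up for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical colon-separated text; 'na', '-' and '.' mark missing values.
text = """id:name:score:grade
1:"Kim, A":3.5:na
2:Lee:-:B
3:Park:.:C
"""

df_demo = pd.read_csv(
    StringIO(text),
    header=0,        # first line holds the column names
    index_col=0,     # use the 'id' column as the row index
    quotechar='"',   # character that wraps quoted fields
    sep=':',         # fields are separated by colons
    na_values=['na', '-', '.', ''],  # extra strings to read as NaN
)

print(df_demo)
print(df_demo.isna().sum())  # count of missing values per column
```

The strings in na_values are added to the markers pandas already treats as missing, so the score column can still be read as numeric.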

3. Load data into a DataFrame from inline CSV text

from io import StringIO
data = """, Animal, Cuteness, Desirable
row-1, dog, 8.7, True
row-2, cat, 9.5, True
row-3, bat, 2.6, False"""
df = pd.read_csv(StringIO(data), header=0,
 	index_col=0, skipinitialspace=True)
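The skipinitialspace=True argument matters here because every comma in the text is followed by a space. The sketch below reads the same text with and without it so the difference in column names is visible; the variable names with_skip and without_skip are just for this illustration.

```python
from io import StringIO

import pandas as pd

data = """, Animal, Cuteness, Desirable
row-1, dog, 8.7, True
row-2, cat, 9.5, True
row-3, bat, 2.6, False"""

with_skip = pd.read_csv(StringIO(data), header=0,
                        index_col=0, skipinitialspace=True)
without_skip = pd.read_csv(StringIO(data), header=0, index_col=0)

print(list(with_skip.columns))     # names without leading spaces
print(list(without_skip.columns))  # names keep the space after each comma
print(with_skip.dtypes)
```

Without the option, a lookup such as df['Animal'] would fail because the stored column name starts with a space.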

4. Create a DataFrame from a Microsoft Excel file

# each Excel sheet -> one entry in a Python dictionary
workbook = pd.ExcelFile('file.xlsx')
d = {} # empty dictionary
for sheet_name in workbook.sheet_names:
 	df = workbook.parse(sheet_name)
 	d[sheet_name] = df

5. Combine Series data into a DataFrame

# Example 1 ...
s1 = Series(range(6))
s2 = s1 * s1
s2.index = s2.index + 2 # misalign indexes
df = pd.concat([s1, s2], axis=1)
# Example 2 ...
s3 = Series({'Tom':1, 'Dick':4, 'Har':9})
s4 = Series({'Tom':3, 'Dick':2, 'Mar':5})
df = pd.concat({'A':s3, 'B':s4 }, axis=1)
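What both examples have in common is that pd.concat with axis=1 aligns the Series on their index labels and fills the gaps with NaN. A small sketch that repeats the two examples and inspects the result (the names wide and named are only for this illustration):

```python
import pandas as pd

s1 = pd.Series(range(6))
s2 = s1 * s1
s2.index = s2.index + 2          # labels 2..7 instead of 0..5

wide = pd.concat([s1, s2], axis=1)
print(wide)                      # rows are the union of labels 0..7
print(wide.isna().sum())         # each column has NaN where its Series had no label

s3 = pd.Series({'Tom': 1, 'Dick': 4, 'Har': 9})
s4 = pd.Series({'Tom': 3, 'Dick': 2, 'Mar': 5})
named = pd.concat({'A': s3, 'B': s4}, axis=1)
print(named)                     # dictionary keys become the column names
```

Passing a dictionary instead of a list is a convenient way to name the columns in the same call.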

6. Create a DataFrame from a Python dictionary

df = DataFrame({
 	'col0' : [1.0, 2.0, 3.0, 4.0],
 	'col1' : [100, 200, 300, 400]
 })

7. Create a DataFrame from row data in a Python dictionary

df = DataFrame.from_dict({
    # rows as dictionaries
    'row0' : {'col0':0, 'col1':'A'},
    'row1' : {'col0':1, 'col1':'B'}
    }, orient='index')
df = DataFrame.from_dict({ # data by row
    # rows as lists
    'row0' : [1, 1+1j, 'A'],
    'row1' : [2, 2+2j, 'B']
    }, orient='index')
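When the rows are plain lists, the resulting columns are simply numbered. As a sketch, the columns argument of from_dict (accepted together with orient='index') can name them; the column names used here are illustrative and not from the original cheat sheet.

```python
import pandas as pd

rows = {
    'row0': [1, 1 + 1j, 'A'],
    'row1': [2, 2 + 2j, 'B'],
}

# Without `columns`, the columns are numbered 0, 1, 2.
by_row = pd.DataFrame.from_dict(rows, orient='index')

# With `columns`, each list position gets a name (names are hypothetical).
labelled = pd.DataFrame.from_dict(
    rows, orient='index', columns=['int', 'complex', 'string'])

print(by_row)
print(labelled)
```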

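8. Generate data (for testing and practice)

import numpy as np

# create a simple data set
df = DataFrame(np.random.rand(50,5))

# with a monthly time series row index
df = DataFrame(np.random.rand(500,5))
df.index = pd.date_range('1/1/2005',
 	periods=len(df), freq='M') # pandas >= 2.2 prefers freq='ME'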

# alphabetic row and col indexes and a "groupable" variable
import string
import random

rows = 52
cols = 5

assert 1 <= rows <= 52 # one row label per ASCII letter
df = DataFrame(np.random.randn(rows, cols),
 	columns=['c'+str(i) for i in range(cols)],
 	index=list((string.ascii_uppercase +
 		string.ascii_lowercase)[0:rows]))
df['groupable'] = [random.choice('abcde')
 	for _ in range(rows)]
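The "groupable" column exists so that the test frame can be used to practice grouping. A minimal sketch of that use, assuming fixed seeds for reproducibility (the seeds, the name df_test, and the use of np.random.default_rng are choices made for this illustration):

```python
import random
import string

import numpy as np
import pandas as pd

random.seed(0)                    # fixed seeds so reruns give the same frame
rng = np.random.default_rng(0)

rows, cols = 52, 5
letters = (string.ascii_uppercase + string.ascii_lowercase)[:rows]
df_test = pd.DataFrame(rng.standard_normal((rows, cols)),
                       columns=['c' + str(i) for i in range(cols)],
                       index=list(letters))
df_test['groupable'] = [random.choice('abcde') for _ in range(rows)]

# Mean of every numeric column within each group label.
group_means = df_test.groupby('groupable').mean()
# Number of rows that fell into each group.
group_sizes = df_test.groupby('groupable').size()

print(group_means)
print(group_sizes)
```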

 

Saving a DataFrame

 

1. Save a DataFrame to a CSV file

df.to_csv('name.csv', encoding='utf-8')
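By default to_csv writes the row index as the first column, so reading the file back needs index_col=0 to restore it. A small round-trip sketch, using an in-memory buffer instead of 'name.csv' so that it runs without touching the disk (the sample frame is made up):

```python
from io import StringIO

import pandas as pd

df_out = pd.DataFrame({'col0': [1.0, 2.0], 'col1': [100, 200]},
                      index=['row0', 'row1'])

buffer = StringIO()
df_out.to_csv(buffer)            # the index is written as the first column
buffer.seek(0)                   # rewind before reading

df_back = pd.read_csv(buffer, index_col=0)   # restore the index on load
print(df_back)
```

If the index carries no information, df.to_csv('name.csv', index=False) leaves it out of the file.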


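2. Save DataFrames to an Excel workbook

# df1 and df2 are existing DataFrames; an Excel engine such as openpyxl must be installed
from pandas import ExcelWriter
with ExcelWriter('filename.xlsx') as writer: # the file is saved when the block exits
    df1.to_excel(writer, sheet_name='Sheet1')
    df2.to_excel(writer, sheet_name='Sheet2')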


3. Convert to Python objects

d = df.to_dict() # to dictionary
s = df.to_string() # to string (named s so the built-in str is not shadowed)
m = df.to_numpy() # to NumPy array (as_matrix() was removed in pandas 1.0)
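The shape of each converted object is easiest to see on a tiny frame. The sketch below uses a made-up two-row DataFrame and also shows the orient='records' variant of to_dict, which is an addition to the calls listed above.

```python
import pandas as pd

df_small = pd.DataFrame({'col0': [1.0, 2.0], 'col1': [100, 200]},
                        index=['row0', 'row1'])

by_column = df_small.to_dict()                  # {column: {index: value}}
records = df_small.to_dict(orient='records')    # one dict per row
text = df_small.to_string()                     # printable table as a str
array = df_small.to_numpy()                     # 2-D NumPy array

print(by_column)
print(records)
print(text)
print(array.shape, array.dtype)
```

Note that to_numpy returns a single array with one common dtype, so mixed numeric columns are upcast.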

 

 

Reference: https://www.geeksforgeeks.org/pandas-cheat-sheet/ (Pandas Cheat Sheet for Data Science in Python)

 
