[Tips] Working with Various Data Formats in Python

Posted by Euisuk's Dev Log on May 24, 2024

์›๋ณธ ๊ฒŒ์‹œ๊ธ€: https://velog.io/@euisuk-chung/๊ฟ€ํŒ-Python์œผ๋กœ-๋‹ค์–‘ํ•œ-๋ฐ์ดํ„ฐ-๋‹ค๋ฃจ๊ธฐ

Hello 🤗 Today's post introduces how to load and save a variety of file formats with Python!

When you do data analysis or machine learning in Python, you run into many kinds of files. Especially now that we have entered the era of LLMs (Large Language Models) such as ChatGPT, there are more and more occasions to load and save file types beyond the familiar CSV, such as PDF and PPT.

I'm taking this opportunity to organize these snippets so I can keep coming back to them as a reference ✍ Let's take a look together 📖

  1. Loading and Saving CSV Files (.csv)

A CSV (Comma-Separated Values) file stores data as values separated by commas. It is mainly used for data exchange and simple data storage.

You can read and save it with the code below:

import csv

# Load a CSV file (newline='' lets the csv module handle line endings itself)
with open('example.csv', mode='r', newline='', encoding='utf-8') as file:
    csv_reader = csv.reader(file)
    for row in csv_reader:
        print(row)

# Save a CSV file (note: mode='w' overwrites example.csv)
with open('example.csv', mode='w', newline='', encoding='utf-8') as file:
    csv_writer = csv.writer(file)
    csv_writer.writerow(['Name', 'Age', 'City'])
    csv_writer.writerow(['Alice', '24', 'New York'])
    csv_writer.writerow(['Bob', '30', 'Los Angeles'])
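
When the first row of a CSV is a header, it is often handier to work with dictionaries than with plain lists. Here is a minimal sketch using `csv.DictWriter` and `csv.DictReader` from the standard library. The helper names `save_rows` and `load_rows`, the file name `people.csv`, and the sample data are assumptions made up for this illustration.

```python
import csv
import tempfile
from pathlib import Path


def save_rows(path, rows, fieldnames):
    """Write a list of dicts to a CSV file, header row first."""
    with open(path, mode="w", newline="", encoding="utf-8") as file:
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def load_rows(path):
    """Read a CSV file with a header row into a list of dicts."""
    with open(path, mode="r", newline="", encoding="utf-8") as file:
        return list(csv.DictReader(file))


# Hypothetical sample data (every value is a string, as CSV has no types)
people = [
    {"Name": "Alice", "Age": "24", "City": "New York"},
    {"Name": "Bob", "Age": "30", "City": "Los Angeles"},
]

# Work in a temporary directory so nothing on disk is overwritten
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "people.csv"
    save_rows(csv_path, people, fieldnames=["Name", "Age", "City"])
    loaded = load_rows(csv_path)

print(loaded[0]["Name"])
```

Keep in mind that everything read back from a CSV is a string, so numeric columns such as `Age` need an explicit `int(...)` conversion.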

  2. Loading and Saving Excel Files (.xlsx)

An Excel file is the spreadsheet file used by Microsoft Excel. It supports multiple sheets and a variety of formatting.

You can read and save it with the code below:

import pandas as pd

# Load an Excel file (pandas needs an engine such as openpyxl to read .xlsx)
df = pd.read_excel('example.xlsx')
print(df)

# Save an Excel file without the index column
df.to_excel('example_saved.xlsx', index=False)

  3. Loading and Saving JSON Files (.json)

A JSON (JavaScript Object Notation) file stores data in a structured text format. It is widely used for exchanging data, especially in web applications.

You can read and save it with the code below:

import json

# Load a JSON file
with open('example.json', 'r', encoding='utf-8') as file:
    data = json.load(file)
    print(data)

# Save a JSON file (ensure_ascii=False keeps non-ASCII text readable)
with open('example_saved.json', 'w', encoding='utf-8') as file:
    json.dump(data, file, indent=4, ensure_ascii=False)
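
One thing to watch out for: JSON only has a handful of types, so a save-and-load round trip does not always give back exactly what you put in. The sketch below uses the in-memory variants `json.dumps` and `json.loads` to show this. The `record` dictionary is a made-up example.

```python
import json

# Hypothetical record containing a tuple value and an integer key
record = {"name": "Alice", "scores": (90, 85), 1: "integer key"}

# Serialize to a string and parse it back
text = json.dumps(record, ensure_ascii=False, indent=4)
restored = json.loads(text)

# Tuples come back as lists, and non-string keys come back as strings
print(restored)
```

If you need tuples, sets, or integer keys to survive unchanged, you have to convert them back yourself after loading, or use a Python-specific format such as pickle.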

  4. Loading and Saving Image Files (.png)

An image file stores photo or picture data. There are many formats, but JPG and PNG are the most common.

You can read and save it with the code below:

from PIL import Image

# Load an image file (JPG, PNG, etc.)
image = Image.open('example.png')
image.show()

# Save an image file (the format is inferred from the file extension)
image.save('example_saved.png')

  5. Loading and Saving Text Files (.txt)

A text file stores plain text data. It is one of the most basic file formats.

You can read and save it with the code below:

# Load a text file
with open('example.txt', 'r', encoding='utf-8') as file:
    content = file.read()
    print(content)

# Save a text file (mode 'w' overwrites any existing content)
with open('example.txt', 'w', encoding='utf-8') as file:
    file.write("This is the text to be saved.")
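
Two variations come up often with text files: appending instead of overwriting, and reading line by line instead of all at once. Here is a small sketch of both. The file name `notes.txt` and its contents are assumptions for illustration only.

```python
import tempfile
from pathlib import Path

# Work in a temporary directory so nothing on disk is overwritten
with tempfile.TemporaryDirectory() as tmp_dir:
    text_path = Path(tmp_dir) / "notes.txt"

    # Path.write_text creates (or overwrites) the file in one call
    text_path.write_text("first line\nsecond line\n", encoding="utf-8")

    # Mode 'a' appends to the end instead of overwriting
    with open(text_path, "a", encoding="utf-8") as file:
        file.write("third line\n")

    # Iterating over the file object yields one line at a time
    with open(text_path, "r", encoding="utf-8") as file:
        lines = [line.rstrip("\n") for line in file]

print(lines)
```

Reading line by line like this avoids loading the whole file into memory, which helps with large logs or corpora.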

  6. Loading and Saving YAML Files (.yaml)

YAML (YAML Ain't Markup Language) is a format often used for configuration files and data serialization. It is designed to be easy for people to read and write.

You can read and save it with the code below:

import yaml

# Load a YAML file (safe_load only builds plain Python objects)
with open('example.yaml', 'r', encoding='utf-8') as file:
    data = yaml.safe_load(file)
    print(data)

# Save a YAML file (allow_unicode=True keeps non-ASCII text readable)
with open('example_saved.yaml', 'w', encoding='utf-8') as file:
    yaml.safe_dump(data, file, allow_unicode=True)

  7. Loading and Saving Pickle Files (.pkl)

A pickle file stores serialized Python objects so that they can be loaded again later.

You can read and save it with the code below:

import pickle

# Load a pickle file (only unpickle data from sources you trust)
with open('example.pkl', 'rb') as file:
    data = pickle.load(file)
    print(data)

# Save a pickle file
with open('example_saved.pkl', 'wb') as file:
    pickle.dump(data, file)
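
Unlike text-based formats, pickle keeps Python-specific types such as sets and tuples intact. The following is a minimal round-trip sketch. The `original` dictionary and the file name `state.pkl` are made up for this example.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical object mixing a set, a tuple, and a list
original = {"ids": {1, 2, 3}, "point": (0.5, 1.5), "tags": ["a", "b"]}

# Work in a temporary directory so nothing on disk is overwritten
with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = Path(tmp_dir) / "state.pkl"

    # HIGHEST_PROTOCOL picks the newest pickle format this Python supports
    with open(pkl_path, "wb") as file:
        pickle.dump(original, file, protocol=pickle.HIGHEST_PROTOCOL)

    with open(pkl_path, "rb") as file:
        restored = pickle.load(file)

print(restored == original)
```

The trade-off is that pickle files are only readable from Python, and loading one can execute arbitrary code, so they are best kept for your own intermediate results.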

  8. Loading and Saving SQLite Database Files (.db)

A SQLite database file is the file used by SQLite, a lightweight SQL database management system. It serves as a small, fast, self-contained database.

You can read and save it with the code below:

import sqlite3

# Load from a SQLite file (assumes example_table already exists)
conn = sqlite3.connect('example.db')
cursor = conn.cursor()
cursor.execute("SELECT * FROM example_table")
rows = cursor.fetchall()
for row in rows:
    print(row)
conn.close()

# Save to a SQLite file
conn = sqlite3.connect('example.db')
cursor = conn.cursor()
cursor.execute("INSERT INTO example_table (name, age) VALUES (?, ?)", ('Charlie', 35))
conn.commit()  # changes are not persisted until commit
conn.close()
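
The snippet above only works if `example_table` already exists. Here is a self-contained sketch that creates the table first and then inserts and queries rows. It uses an in-memory database (`":memory:"`) so it leaves no file behind, and the column types (`TEXT`, `INTEGER`) and sample rows are assumptions, since the post does not specify a schema.

```python
import sqlite3

# ":memory:" creates a throwaway database; use a file path to persist data
conn = sqlite3.connect(":memory:")
try:
    # Assumed schema: the column types are a guess for illustration
    conn.execute(
        "CREATE TABLE IF NOT EXISTS example_table (name TEXT, age INTEGER)"
    )

    # executemany inserts several rows using the same parameterized statement
    conn.executemany(
        "INSERT INTO example_table (name, age) VALUES (?, ?)",
        [("Alice", 24), ("Bob", 30), ("Charlie", 35)],
    )
    conn.commit()

    # Placeholders (?) keep values separate from the SQL text
    rows = conn.execute(
        "SELECT name, age FROM example_table WHERE age >= ? ORDER BY age",
        (30,),
    ).fetchall()
finally:
    conn.close()

print(rows)
```

Wrapping the work in `try`/`finally` makes sure the connection is closed even if one of the statements raises an error.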

  9. Loading and Saving HDF5 Files (.h5)

HDF5 (Hierarchical Data Format version 5) is a file format for storing and managing large amounts of data efficiently. It is often used for scientific data.

You can read and save it with the code below:

import h5py

# Load an HDF5 file ([:] reads the whole dataset into a NumPy array)
with h5py.File('example.h5', 'r') as file:
    data = file['dataset_name'][:]
    print(data)

# Save an HDF5 file
with h5py.File('example_saved.h5', 'w') as file:
    file.create_dataset('dataset_name', data=data)

  10. Loading and Saving MATLAB Files (.mat)

A MAT file is the file format used by MATLAB. It stores variables from the MATLAB workspace.

You can read and save it with the code below:

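from scipy.io import loadmat, savemat

# Load a MATLAB file (returns a dict mapping variable names to arrays)
data = loadmat('file.mat')
print(data)

# Save a MATLAB file, skipping the metadata keys that loadmat adds
# (__header__, __version__, __globals__)
variables = {name: value for name, value in data.items() if not name.startswith('__')}
savemat('file_saved.mat', variables)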

  11. Loading and Saving XML Files (.xml)

XML (eXtensible Markup Language) is a markup language for describing structured data, and it is used as a data exchange format in many settings.

You can read and save it with the code below:

import xml.etree.ElementTree as ET

# Load an XML file
tree = ET.parse('example.xml')
root = tree.getroot()
for child in root:
    print(child.tag, child.attrib)

# Save an XML file
root = ET.Element("root")
child = ET.SubElement(root, "child")
child.text = "This is the text to be saved."
tree = ET.ElementTree(root)
tree.write('example_saved.xml', encoding='utf-8', xml_declaration=True)
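
To see writing and reading work together, here is a small round-trip sketch that builds a tree with attributes, saves it, and parses it back. The element names (`people`, `person`), the `age` attribute, and the file name `people.xml` are all made up for this example.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Build a tree: <people><person age="24">Alice</person>...</people>
root = ET.Element("people")
for name, age in [("Alice", "24"), ("Bob", "30")]:
    person = ET.SubElement(root, "person", attrib={"age": age})
    person.text = name

# Work in a temporary directory so nothing on disk is overwritten
with tempfile.TemporaryDirectory() as tmp_dir:
    xml_path = str(Path(tmp_dir) / "people.xml")
    ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=True)

    # Parse the file back and grab its root element
    parsed_root = ET.parse(xml_path).getroot()

# findall returns the direct children with the given tag
people = [(p.text, p.get("age")) for p in parsed_root.findall("person")]
print(people)
```

Attribute values and element text are always strings in XML, so numbers like `age` need converting after parsing.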

  12. Loading and Saving PPTX Files (.pptx)

PPTX is the Microsoft PowerPoint presentation file format. You can load and save it with the python-pptx library.

You can read and save it with the code below:

from pptx import Presentation

# Load a PPTX file and print the text of every shape that has a text frame
presentation = Presentation('example.pptx')
for slide in presentation.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            print(shape.text_frame.text)

# Save a PPTX file
presentation.save('example_saved.pptx')

  13. Loading PDF Files (.pdf)

PDF (Portable Document Format) is an electronic document format developed by Adobe Systems. You can load it with the PyPDF2 library.

You can read and save it with the code below:

import PyPDF2  # written for PyPDF2 3.x; PdfFileReader/getPage/numPages were removed in 3.0

def load_pdf(file_path):
    # Passing the path (instead of an open file handle) means the pages
    # stay readable after this function returns
    reader = PyPDF2.PdfReader(file_path)
    return list(reader.pages)

def create_pdf(pages):
    writer = PyPDF2.PdfWriter()
    for page in pages:
        writer.add_page(page)
    return writer

def save_pdf(writer, output_path):
    with open(output_path, 'wb') as file:
        writer.write(file)

# Full example
input_path = 'example.pdf'
output_path = 'new_example.pdf'

# Load the PDF
pages = load_pdf(input_path)

# Build the new PDF
pdf_writer = create_pdf(pages)

# Save the PDF
save_pdf(pdf_writer, output_path)

That covers how to load and save a variety of file formats with Python. With these snippets you can handle many types of data with little effort.

If you have questions or a file format you would like me to cover, leave a comment! 😎

Thank you!


