Hello, fellow learners. Over the past few days several teachers from other hospitals have asked me how to build their own relational database. Data analysis has become an independent research direction, and building your own database is well worth doing, so here is a walkthrough.

First, open psql and run the following:

```sql
postgres=# CREATE TABLESPACE sicdb_space LOCATION 'E:/pgsql_data';
CREATE TABLESPACE
postgres=# CREATE DATABASE sicdb TABLESPACE sicdb_space;
CREATE DATABASE
```

Then find the matching connection string and run the following in Python:

```python
import glob
import os

import pandas as pd
from sqlalchemy import create_engine, text
from sqlalchemy.exc import SQLAlchemyError

# PostgreSQL database connection string (replace the password with your own)
connection_string = 'postgresql://postgres:3027291@localhost:5432/sicdb'

# Create the engine
engine = create_engine(connection_string)

# Directory where your .csv.gz files are located (update this to your actual path)
directory = 'E:/pgsql_data/salzburg'

# Check if cases.csv.gz exists in the directory
if not glob.glob(f'{directory}/cases.csv.gz'):
    print("File cases.csv.gz not found in the directory.")
    exit()

# Size of each chunk (number of rows); adjust to your system's capabilities
chunk_size = 5000

# Loop through each .csv.gz file in the directory
for filepath in glob.glob(f'{directory}/*.csv.gz'):
    # Use the file name without extension as the table name
    table_name = os.path.basename(filepath).replace('.csv.gz', '')
    print(f"Starting to process {filepath} into {table_name}...")

    chunk_counter = 0
    primary_key = None
    primary_key_set = False

    try:
        # Establish a connection
        conn = engine.connect()
        # Read and insert the CSV file in chunks
        for chunk in pd.read_csv(filepath, compression='gzip', chunksize=chunk_size):
            if primary_key is None:
                primary_key = chunk.columns[0]
                # This basic script does not handle interruptions,
                # so delete the table if it already exists
                conn.execute(text(f'DROP TABLE IF EXISTS {table_name};'))
                conn.commit()

            trans = conn.begin()
            chunk.to_sql(name=table_name, con=conn, if_exists='append', index=False)
            trans.commit()

            # Pandas does not set a primary key. For query-time reasons this
            # cannot be done at the end, so do it after the first chunk.
            if primary_key_set is False:
                print("Setting primary key...")
                conn.execute(text(
                    f'ALTER TABLE {table_name} ADD PRIMARY KEY ("{primary_key}");'))
                conn.commit()
                # Set the most important indices
                important_indices = ["CaseID", "PatientID", "DataID", "LaboratoryID",
                                     "RefID", "DrugID", "Offset", "FieldID"]
                for index in important_indices:
                    if index in chunk.columns and index != primary_key:
                        print(f"Setting {index} as index in {table_name}")
                        conn.execute(text(f'CREATE INDEX ON {table_name} ("{index}");'))
                        conn.commit()
                primary_key_set = True

            chunk_counter += 1
            row_count = chunk_counter * chunk_size
            print(f"Processed chunk {chunk_counter} ({row_count} rows) for {table_name}")

        print(f"Finished adding data to {table_name}")
    except SQLAlchemyError as e:
        print(f"An error occurred: {e}")
    finally:
        # Ensure the connection is closed
        conn.close()
    print(f"Finished processing {filepath}.")
```

How was that? Satisfying, right? Heh.

If you need to start over, look up the tablespace and then drop the database and the tablespace (use DROP TABLESPACE rather than deleting rows from the pg_tablespace system catalog by hand, which PostgreSQL will refuse):

```sql
SELECT oid, spcname FROM pg_tablespace WHERE spcname = 'sicdb_space';
DROP DATABASE IF EXISTS sicdb;
DROP TABLESPACE IF EXISTS sicdb_space;
```

Restart the PostgreSQL service from the operating-system command line:

```sh
net stop postgresql-x64-15
net start postgresql-x64-15
```

Recreate the tablespace and database:

```sql
CREATE TABLESPACE sicdb_space LOCATION 'E:/sicdb_folder';
CREATE DATABASE sicdb TABLESPACE sicdb_space;
```
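The chunked-loading pattern above can be tried without a live PostgreSQL instance. Here is a minimal sketch (the 12-row fixture and temporary path are made up for illustration) that writes a small cases.csv.gz and streams it back in fixed-size chunks, deriving the table name the same way the loader does:

```python
import os
import tempfile

import pandas as pd

# Build a tiny cases.csv.gz fixture (hypothetical data, for illustration only)
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'cases.csv.gz')
pd.DataFrame({'CaseID': range(12), 'Offset': range(12)}).to_csv(
    path, index=False, compression='gzip')

# Stream the file back in fixed-size chunks, as the loader does
chunk_size = 5
total_rows = 0
chunk_no = 0
for chunk_no, chunk in enumerate(
        pd.read_csv(path, compression='gzip', chunksize=chunk_size), start=1):
    total_rows += len(chunk)
    print(f'chunk {chunk_no}: {len(chunk)} rows')

# Derive the table name the same way the loader does
table_name = os.path.basename(path).replace('.csv.gz', '')
print(table_name, total_rows)  # cases 12
```

A larger chunk_size means fewer insert round trips but more memory per chunk, which is why the loader leaves it tunable.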
From: 脑系科数据科学 > 《2024python和R》