Building Your Own Database with Python and PostgreSQL

脑系科数据科学 · 2024-05-24 · Published from Germany

Hello everyone. Over the past few days, several colleagues from other hospitals have asked me how to go about building their own relational database. Data analysis has become a research direction in its own right, and building your own database is well worth the effort, so here is a walkthrough.

First, open your psql session and run the following:

postgres=# CREATE TABLESPACE sicdb_space LOCATION 'E:/pgsql_data';

CREATE TABLESPACE

postgres=# CREATE DATABASE sicdb TABLESPACE sicdb_space;

CREATE DATABASE
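
Before loading any data, you can verify that both objects exist; psql's built-in \db and \l meta-commands list tablespaces and databases respectively:

postgres=# \db sicdb_space
postgres=# \l sicdb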

Then, substituting your own connection string, run the following Python script (it uses the SQLAlchemy 2.0-style API):

import pandas as pd
from sqlalchemy import create_engine, text
from sqlalchemy.exc import SQLAlchemyError
import glob
import os

# PostgreSQL connection string: postgresql://user:password@host:port/dbname
# (replace the user name and password with your own)
connection_string = 'postgresql://postgres:3027291@localhost:5432/sicdb'

# Create the engine
engine = create_engine(connection_string)

# Directory where your .csv.gz files are located (update this to your actual directory)
directory = 'E:/pgsql_data/salzburg'

# Check if cases.csv.gz exists in the directory
if not glob.glob(f'{directory}/cases.csv.gz'):
    print("File cases.csv.gz not found in the directory.")
    raise SystemExit(1)

# Size of each chunk (number of rows)
chunk_size = 5000  # Adjust based on your system's capabilities

# Loop through each .csv.gz file in the directory
for filepath in glob.glob(f'{directory}/*.csv.gz'):
    # Use the file name without its extension as the table name
    table_name = os.path.basename(filepath).replace('.csv.gz', '')
    print(f"Starting to process {filepath} into {table_name}...")

    chunk_counter = 0
    row_count = 0
    primary_key = None
    primary_key_set = False

    try:
        # Establish a connection (closed automatically when the block exits)
        with engine.connect() as conn:
            # Read and insert the CSV file in chunks
            for chunk in pd.read_csv(filepath, compression='gzip', chunksize=chunk_size):
                if primary_key is None:
                    primary_key = chunk.columns[0]
                    # This basic script does not handle interruptions, so delete the table if it exists
                    conn.execute(text(f'DROP TABLE IF EXISTS {table_name};'))
                    conn.commit()

                # Insert one chunk per transaction
                trans = conn.begin()
                chunk.to_sql(name=table_name, con=conn, if_exists='append', index=False)
                trans.commit()

                # pandas does not set a primary key. For query-time reasons this cannot
                # wait until the end, so do it right after the first chunk
                if primary_key_set is False:
                    print("Setting primary key...")
                    conn.execute(text(f'ALTER TABLE {table_name} ADD PRIMARY KEY ("{primary_key}");'))
                    conn.commit()

                    # Set the most important indices
                    important_indices = ["CaseID", "PatientID", "DataID", "LaboratoryID",
                                         "RefID", "DrugID", "Offset", "FieldID"]
                    for index in important_indices:
                        if index in chunk.columns and index != primary_key:
                            print(f"Setting {index} as index in {table_name}")
                            conn.execute(text(f'CREATE INDEX ON {table_name} ("{index}");'))
                            conn.commit()

                    primary_key_set = True

                chunk_counter += 1
                row_count += len(chunk)  # count actual rows, not chunks * chunk_size
                print(f"Processed chunk {chunk_counter} ({row_count} rows) for {table_name}")

            print(f"Finished adding data to {table_name}")
    except SQLAlchemyError as e:
        print(f"An error occurred: {e}")

    print(f"Finished processing {filepath}.")


How about that? Satisfying, right? Heh.

If the tablespace ever gets into a bad state and you need to rebuild from scratch, first look it up and drop it. Note that pg_tablespace is a system catalog, so do not DELETE from it directly; drop the database first, then the tablespace:

SELECT oid, spcname FROM pg_tablespace WHERE spcname = 'sicdb_space';

DROP DATABASE IF EXISTS sicdb;

DROP TABLESPACE sicdb_space;

Restart the PostgreSQL service from the OS command line (the service name depends on your installed version; adjust postgresql-x64-15 accordingly):

net stop postgresql-x64-15

net start postgresql-x64-15

Recreate the tablespace and database:

CREATE TABLESPACE sicdb_space LOCATION 'E:/sicdb_folder';

CREATE DATABASE sicdb TABLESPACE sicdb_space;
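
With the tablespace and database back in place, re-run the Python import script above to reload the tables.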
