[Original] Python | Converting quiz questions into dictionaries with Python

算法与编程之美 (The Beauty of Algorithms and Programming), 2020-10-28

Problem description

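First, a word about JSON files. A JSON file stores simple data structures and objects and is commonly used to exchange data in web applications. Its format resembles the familiar dictionary structure, for example: {'title': '关于《花间集》说法错误的是', 'content': {'A': '作者是赵崇佐', 'B': '收录当时流行歌曲歌词'}, 'true_choice': 'C', 'type': '单选题'} (here the 'type' value 单选题 means "single-choice question"). Today's task is to read the questions stored in a Word document and convert them into the format shown above.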
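
To make the resemblance concrete, here is a minimal sketch using Python's standard json module. It reuses the sample record above, written as strict JSON text (which requires double quotes), and parses it into an ordinary dictionary. The variable names are only illustrative.

```python
import json

# JSON text for the sample record above; JSON itself requires double quotes.
json_text = """
{
  "title": "关于《花间集》说法错误的是",
  "content": {"A": "作者是赵崇佐", "B": "收录当时流行歌曲歌词"},
  "true_choice": "C",
  "type": "单选题"
}
"""

# json.loads parses the text into ordinary Python objects (here, nested dicts).
question = json.loads(json_text)
print(type(question).__name__)
print(question["content"]["A"])
```

Once parsed, the record can be read with normal dictionary indexing, which is why the rest of this post builds plain dictionaries.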

Solution

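To process a Word document in Python, we first need the docx library (installed as the python-docx package) to read the document's content. Once the text has been read, regular expressions can extract and clean up each piece of information, and the result is finally stored and output in dictionary form.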

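Step 1: import the docx library, read the text of every question, and store each question as a separate entry in a list so that the next step can process it.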

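import docx

# s is the path to the Word file
file = docx.Document(s)
# Collect the text of every paragraph in the document
paragraphs_text = [paragraph.text for paragraph in file.paragraphs]
# Questions are separated by empty paragraphs; l holds one list of lines per question
l = []
a = 0
for i in range(len(paragraphs_text)):
    if paragraphs_text[i] == '':
        if i > a:  # skip empty groups caused by consecutive blank paragraphs
            l.append(paragraphs_text[a:i])
        a = i + 1  # the next question starts after the blank paragraph
# Keep the last question even if no empty paragraph follows it
if a < len(paragraphs_text):
    l.append(paragraphs_text[a:])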
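
The grouping logic can be tried without a Word file. The sketch below is an assumption-laden stand-in, not the author's code: split_into_groups is a hypothetical helper, and the sample list is made-up data imitating the paragraph texts a document might yield.

```python
def split_into_groups(lines):
    """Split a list of strings into groups separated by empty strings."""
    groups = []
    current = []
    for line in lines:
        if line == "":
            # A blank line closes the current group; repeated blanks are skipped.
            if current:
                groups.append(current)
                current = []
        else:
            current.append(line)
    # Keep the final group even when no blank line follows it.
    if current:
        groups.append(current)
    return groups


# Made-up sample data standing in for paragraph texts read from a document.
sample = ["1.Question one", "A.foo", "B.bar", "", "", "2.Question two", "A.baz"]
print(split_into_groups(sample))
```

Collecting lines into a running group avoids index arithmetic, so a missing trailing blank line or several blank lines in a row cannot lose or duplicate a question.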

Step 2: use regular expressions to extract and clean up the information, then store it in dictionary form and output it.

import re

result = []
for questions in l:
    val = {}
    content = {}
    for strs in questions:
        if re.match(r'\d', strs):
            # A line starting with a digit is the question title
            val['title'] = strs
        elif re.match('[A-D]', strs):
            # Option line: drop the letter and the separator after it
            content[strs[0]] = strs[2:]
        elif re.match('答案：', strs):
            # Text after "答案：" (answer)
            val['true_choice'] = strs[3:]
        elif re.match('题型：', strs):
            # Text after "题型：" (question type)
            val['type'] = strs[3:]
    if len(content) > 1:
        # Use the 'content' key so the output matches the target format
        val['content'] = content
    result.append(val)
return result  # this fragment sits inside the function f(s) of the complete code
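
As an alternative to fixed slices such as strs[2:], capture groups can pull out the option letter and text in one match. This is only a sketch under assumed line layouts (an option letter followed by one separator character; answer and type lines labelled 答案： and 题型：); parse_question, OPTION_RE, FIELD_RE and FIELD_KEYS are hypothetical names, and the sample lines are made up from the example record.

```python
import re

# Assumed layout: option letter A-D, one separator character, then the option text.
OPTION_RE = re.compile(r"([A-D])[.、．:：\s]\s*(.*)")
# Assumed layout: a Chinese label (answer / question type) and a full-width colon.
FIELD_RE = re.compile(r"(答案|题型)：\s*(.*)")
FIELD_KEYS = {"答案": "true_choice", "题型": "type"}


def parse_question(lines):
    """Turn the lines of one question into a dictionary."""
    record = {}
    options = {}
    for line in lines:
        option = OPTION_RE.match(line)
        field = FIELD_RE.match(line)
        if re.match(r"\d", line):
            record["title"] = line
        elif option:
            options[option.group(1)] = option.group(2)
        elif field:
            record[FIELD_KEYS[field.group(1)]] = field.group(2)
    if options:
        record["content"] = options
    return record


sample = [
    "1.关于《花间集》说法错误的是",
    "A.作者是赵崇佐",
    "B.收录当时流行歌曲歌词",
    "答案：C",
    "题型：单选题",
]
print(parse_question(sample))
```

Because the separator is matched explicitly, a line that merely begins with a capital letter but lacks a separator is not mistaken for an option.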

The complete code is as follows:

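import re

import docx


def f(s):
    """Read the Word file at path s and return a list of question dictionaries."""
    file = docx.Document(s)
    # Collect the text of every paragraph in the document
    paragraphs_text = [paragraph.text for paragraph in file.paragraphs]
    # Split the paragraphs into questions at each empty paragraph
    l = []
    a = 0
    for i in range(len(paragraphs_text)):
        if paragraphs_text[i] == '':
            if i > a:  # skip empty groups caused by consecutive blank paragraphs
                l.append(paragraphs_text[a:i])
            a = i + 1
    # Keep the last question even if no empty paragraph follows it
    if a < len(paragraphs_text):
        l.append(paragraphs_text[a:])
    result = []
    for questions in l:
        val = {}
        content = {}
        for strs in questions:
            if re.match(r'\d', strs):
                # A line starting with a digit is the question title
                val['title'] = strs
            elif re.match('[A-D]', strs):
                # Option line: drop the letter and the separator after it
                content[strs[0]] = strs[2:]
            elif re.match('答案：', strs):
                # Text after "答案：" (answer)
                val['true_choice'] = strs[3:]
            elif re.match('题型：', strs):
                # Text after "题型：" (question type)
                val['type'] = strs[3:]
        if len(content) > 1:
            # Use the 'content' key so the output matches the target format
            val['content'] = content
        result.append(val)
    return result


print(f("D:/print2.docx"))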
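
If the goal is a JSON file like the one described at the start, the returned list could be written out with the standard json module. A minimal sketch follows: the questions list is a stand-in for the value returned by f, and the output path is a hypothetical temporary location.

```python
import json
import os
import tempfile

# Stand-in for the list returned by f(); values reuse the sample question.
questions = [
    {
        "title": "关于《花间集》说法错误的是",
        "content": {"A": "作者是赵崇佐", "B": "收录当时流行歌曲歌词"},
        "true_choice": "C",
        "type": "单选题",
    }
]

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), "questions.json")

# ensure_ascii=False writes the Chinese text as-is instead of \uXXXX escapes.
with open(out_path, "w", encoding="utf-8") as fh:
    json.dump(questions, fh, ensure_ascii=False, indent=2)

# Reading the file back yields an equal list of dictionaries.
with open(out_path, encoding="utf-8") as fh:
    loaded = json.load(fh)
```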