什么是CSV CSV广泛用于不同体系结构的应用程序之间交换数据表格信息,解决不兼容数据格式的互通问题,一般按照传输双方既定标准进行格式定义,而其本身并无明确格式标准。 CSV用逗号分隔字段的基本思想是清楚的,但是当字段数据也可能包含逗号或者甚至嵌入换行符时,该想法变得复杂。 CSV实现可能无法处理这些字段数据,或者可能会使用引号来包围字段。引用并不能解决所有问题:有些字段可能需要嵌入引号,因此CSV实现可能包含转义字符或转义序列。 RFC 4180提出了MIME类型(”text/csv”)对于CSV格式的标准,可以作为一般使用的常用定义,满足大多数实现似乎遵循的格式。 CSV的格式规范 1. 每一行记录位于一个单独的行上,用回车换行符CRLF(也就是\r\n)分割。 Each record is located on a separate line, delimited by a line break (CRLF). For example: aaa,bbb,ccc CRLF The last record in the file may or may not have an ending line break. For example: aaa,bbb,ccc CRLF There maybe an optional header line appearing as the first line of the file with the same format as normal record lines. This header will contain names corresponding to the fields in the file and should contain the same number of fields as the records in the rest of the file (the presence or absence of the header line should be indicated via the optional “header” parameter of this MIME type). For example: field_name,field_name,field_name CRLF Within the header and each record, there may be one or more fields, separated by commas. Each line should contain the same number of fields throughout the file. Spaces are considered part of a field and should not be ignored. The last field in the record must not be followed by a comma. For example: aaa,bbb,ccc Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields. For example: "aaa","bbb","ccc" CRLF Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example:(下面原文的例子可能有些问题) "aaa","b CRLF If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example: "aaa","b""bb","ccc" 纯文本,使用某个字符集,比如ASCII、Unicode、EBCDIC或GB2312; 正如CSV并不明确的格式,CSV文件的解析同样没有标准方法,一般可以自己实现读写,网上也有很多种不同语言的实现版本。例如opencsv、csvreader等。它们可能会与RFC中的规定有所出入,例如在csvreader中有要求: 前缀和后缀的空格字符,逗号和制表符,与逗号或记录分隔符相邻的内容将被修剪。 使用时需要注意。 |
|