Python import csv to list

I have a CSV file with about 2000 records.

Each record has a string and a category:

This is the first line,Line1
This is the second line,Line2
This is the third line,Line3

I need to read this file into a list that looks like this:

data = [('This is the first line', 'Line1'),
        ('This is the second line', 'Line2'),
        ('This is the third line', 'Line3')]

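How can I import this CSV into the list I need using Python?

Then use the csv module: docs.python.org/2/library/csv.html

If there is an answer that suits your question, please accept it.

Possible duplicate of How do I read and write CSV files with Python?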

Answer #1

Use the csv module:

import csv

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    data = list(reader)

print(data)

Output:

[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]

If you need tuples:

import csv

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    data = [tuple(row) for row in reader]

print(data)

Output:

[('This is the first line', 'Line1'), ('This is the second line', 'Line2'), ('This is the third line', 'Line3')]

Old Python 2 answer, also using the csv module:

import csv
with open('file.csv', 'rb') as f:
    reader = csv.reader(f)
    your_list = list(reader)

print your_list
# [['This is the first line', 'Line1'],
#  ['This is the second line', 'Line2'],
#  ['This is the third line', 'Line3']]

Why do you use 'rb' instead of 'r'?

– imrek
May 21, 2015 at 14:28

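@DrunkenMaster, b causes the file to be opened in binary mode as opposed to text mode. On some systems, text mode means that \n is converted to the platform-specific newline when reading or writing. See the docs.

– Maciej Gol
May 24, 2015 at 8:12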

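This does not work in Python 3.x: "csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)" See below for the answer that works in Python 3.x.

– Gilbert
May 30, 2016 at 18:12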
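The error quoted in the comment above can be reproduced without a file on disk. The following is a minimal sketch, assuming Python 3; io.StringIO and io.BytesIO are stand-ins (not part of the original answer) for a file opened in text mode and with 'rb' respectively, and the sample row is taken from the question.

```python
import csv
import io

SAMPLE = "This is the first line,Line1\n"

def read_rows(stream):
    """Return all rows that csv.reader yields from an open stream."""
    return list(csv.reader(stream))

# Text stream: comparable to open('file.csv', newline='') in Python 3.
text_rows = read_rows(io.StringIO(SAMPLE, newline=""))

# Byte stream: comparable to open('file.csv', 'rb'); csv.reader rejects it.
try:
    read_rows(io.BytesIO(SAMPLE.encode("utf-8")))
    bytes_error = None
except csv.Error as exc:
    bytes_error = str(exc)

print(text_rows)
print(bytes_error)
```

In Python 3 the reader only accepts an iterator of strings, which is why the 'rb' variant is limited to Python 2.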

To save a few seconds of debugging time, you should probably add a note to the first solution, such as "Python 2.x version".

– 天堂
Jan 30, 2017 at 9:03

How can I use the first solution but with only some of the columns from the CSV file?

– Sigur
May 6, 2017 at 3:13
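One possible way to keep only some columns is to pick them by position while building the list. This is a sketch under assumptions: the three-column sample and the positions in WANTED are hypothetical, and an in-memory string stands in for the file.

```python
import csv
import io

# Hypothetical three-column data; with a real file, use
# open('file.csv', newline='') instead of io.StringIO.
SAMPLE = "first text,Line1,extra\nsecond text,Line2,more\n"
WANTED = (0, 2)  # zero-based positions of the columns to keep (assumed)

def read_columns(stream, wanted):
    """Read CSV rows, keeping only the columns at the given positions."""
    return [tuple(row[i] for i in wanted) for row in csv.reader(stream)]

data = read_columns(io.StringIO(SAMPLE, newline=""), WANTED)
print(data)
```

A row shorter than the largest requested position would raise IndexError, so this form suits files whose rows all have the same width.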

Answer #2

Updated for Python 3:

import csv

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    your_list = list(reader)

print(your_list)

Output:

[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]

'r' is the default mode, so there is no need to specify it. The docs also mention that if csvfile is a file object, it should be opened with newline=''.

– AMC
Jan 6 at 1:40
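The newline='' advice quoted above matters mostly when a quoted field contains a line break of its own. The sketch below is an illustration, not part of the original answer: it writes one such row to a temporary file (the file name is generated, and the field content is made up) and reads it back, opening the file with newline='' both times.

```python
import csv
import os
import tempfile

# One field deliberately contains an embedded "\r\n" line break.
rows = [["first part\r\nsecond part", "Line1"]]

# A throwaway file; the name is generated, not taken from the question.
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
try:
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    with open(path, newline="") as f:  # "r" is the default mode
        round_trip = list(csv.reader(f))
finally:
    os.remove(path)

print(round_trip == rows)
```

With newline='' the csv module sees the line endings untranslated, so the embedded break comes back exactly as it was written.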

Answer #3

Pandas is pretty good at dealing with data. Here is one example of how to use it:

import pandas as pd

# Read the CSV into a pandas data frame (df)
#   With a df you can do many things
#   most important: visualize data with Seaborn
df = pd.read_csv('filename.csv', delimiter=',')

# Or export it in many ways, e.g. a list of tuples
tuples = [tuple(x) for x in df.values]

# or export it as a list of dicts
dicts = df.to_dict('records')

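One big advantage is that pandas automatically deals with header rows.

If you haven't heard of Seaborn, I recommend having a look at it.

See also: How do I read and write CSV files with Python?

Pandas #2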

import pandas as pd

# Get data - reading the CSV file
import mpu.pd
df = mpu.pd.example_df()

# Convert
dicts = df.to_dict('records')

The content of df is:

     country   population population_time    EUR
0    Germany   82521653.0      2016-12-01   True
1     France   66991000.0      2017-01-01   True
2  Indonesia  255461700.0      2017-01-01  False
3    Ireland    4761865.0             NaT   True
4      Spain   46549045.0      2017-06-01   True
5    Vatican          NaN             NaT   True

The content of dicts is:

[{'country': 'Germany', 'population': 82521653.0, 'population_time': Timestamp('2016-12-01 00:00:00'), 'EUR': True},
 {'country': 'France', 'population': 66991000.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': True},
 {'country': 'Indonesia', 'population': 255461700.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': False},
 {'country': 'Ireland', 'population': 4761865.0, 'population_time': NaT, 'EUR': True},
 {'country': 'Spain', 'population': 46549045.0, 'population_time': Timestamp('2017-06-01 00:00:00'), 'EUR': True},
 {'country': 'Vatican', 'population': nan, 'population_time': NaT, 'EUR': True}]

import pandas as pd

# Get data - reading the CSV file
import mpu.pd
df = mpu.pd.example_df()

# Convert
lists = [[row[col] for col in df.columns] for row in df.to_dict('records')]

The content of lists is:

[['Germany', 82521653.0, Timestamp('2016-12-01 00:00:00'), True],
 ['France', 66991000.0, Timestamp('2017-01-01 00:00:00'), True],
 ['Indonesia', 255461700.0, Timestamp('2017-01-01 00:00:00'), False],
 ['Ireland', 4761865.0, NaT, True],
 ['Spain', 46549045.0, Timestamp('2017-06-01 00:00:00'), True],
 ['Vatican', nan, NaT, True]]

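tuples = [tuple(x) for x in df.values] can be written as tuples = list(df.itertuples(index=False)). Note that the Pandas docs discourage the use of .values in favour of .to_numpy(). The third example confuses me. First, because the variable is named tuples, which would imply that it is a list of tuples, whereas it is actually a list of lists. Second, because as far as I can tell the entire expression can be replaced with df.to_list(). I also don't know whether the second example is really relevant here.

– AMC
Jan 6 at 2:05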

Answer #4

Update for Python 3:

import csv
from pprint import pprint

with open('text.csv', newline='') as file:
    reader = csv.reader(file)
    res = list(map(tuple, reader))

pprint(res)

Output:

[('This is the first line', ' Line1'),
 ('This is the second line', ' Line2'),
 ('This is the third line', ' Line3')]

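If csvfile is a file object, it should be opened with newline=''. (csv module documentation)

Why use list(map()) over a list comprehension? Also, notice the whitespace at the beginning of each element of the second column.

– AMC
Jan 6 at 17:14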

Answer #5

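If you are sure there are no commas in your input other than the one separating the category, you can read the file line by line, split on the comma, and then push the result to a list.

That said, it looks like you are dealing with a CSV file, so you might consider using the module built for it.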

Answer #6

# text is assumed to hold the file's contents as a single string
result = []
for line in text.splitlines():
    result.append(tuple(line.split(",")))

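Can you please add some explanation to this post? Code alone is (sometimes) good, but code plus explanation is (most of the time) better.

– Barranka
Jul 9, 2014 at 20:29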

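I know Barranka's comment is over a year old, but for anyone who stumbles upon this and can't figure it out: for line in text.splitlines(): puts each individual line into the temporary variable "line". line.split(",") creates a list of the strings separated by commas. tuple(~) turns that list into a tuple and append(~) adds it to the result. After the loop, result is a list of tuples, with one tuple per line and each tuple element being one element from the csv file.

– Louis
Oct 18, 2015 at 10:05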

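In addition to what @Louis said, there is no need to use .read().splitlines(); you can iterate over the lines of the file directly: for line in in_file: res.append(tuple(line.rstrip().split(","))) Also, note that using .split(',') means that each element of the second column will begin with extra whitespace.

– AMC
Jan 6 at 17:22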

Addendum to the code I shared above: line.rstrip() -> line.rstrip('\n').

– AMC
Jan 6 at 17:29
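Put together, the suggestion in the two comments above might look like the sketch below. The name in_file comes from the comment; io.StringIO is only a stand-in for an open file, and the sample lines follow the format shown in the question.

```python
import io

# Stand-in for an open file; the sample mirrors the question's format.
in_file = io.StringIO(
    "This is the first line,Line1\n"
    "This is the second line,Line2\n"
)

res = []
for line in in_file:  # a file object yields one line at a time
    # Remove only the trailing newline, then split on commas.
    fields = line.rstrip("\n").split(",")
    res.append(tuple(fields))

print(res)
```

Using rstrip('\n') rather than rstrip() leaves any trailing spaces in the last field untouched. The approach still breaks if a field itself contains a comma.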

Answer #7

A simple loop would suffice:

lines = []
with open('test.txt', 'r') as f:
    for line in f.readlines():
        l,name = line.strip().split(',')
        lines.append((l,name))

print(lines)

What if some of the entries have commas in them?

– Tony Ennis
Feb 16, 2016 at 17:59

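@TonyEnnis Then you would need a more advanced processing loop. The answer by Maciej above shows how to use the csv parser that ships with Python to do this. That parser most likely has all of the logic you need.

– Hunter McMillen
Feb 16, 2016 at 18:21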
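To make the difference concrete, here is a small sketch comparing a plain split with the csv parser on a record whose text contains a comma. The sample record is hypothetical, and it assumes such a field is wrapped in double quotes, as CSV writers usually do.

```python
import csv
import io

# Hypothetical record whose text contains a comma, so it is quoted.
SAMPLE = '"Hello, world",Line1\n'

# Plain split: cuts the quoted text in two and keeps the quote characters.
naive = SAMPLE.rstrip("\n").split(",")

# csv.reader: understands the quoting and returns two clean fields.
parsed = next(csv.reader(io.StringIO(SAMPLE, newline="")))

print(naive)
print(parsed)
```

The plain split yields three pieces, whereas the csv module returns the text and the category as two fields.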

Answer #8

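As already said in the comments, you can use the csv library in Python. csv means comma-separated values, which seems to be exactly your case: a label and a value separated by a comma.

Since this is a category and a value, I would rather use a dictionary than a list of tuples.

Anyway, the code below shows both ways: d is the dictionary and l is the list of tuples.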

import csv

file_name = "test.txt"
try:
    csvfile = open(file_name, 'rt')
except FileNotFoundError:
    print("File not found")
    raise SystemExit(1)  # stop here: csvfile was never assigned
csvReader = csv.reader(csvfile, delimiter=",")
d = dict()
l =  list()
for row in csvReader:
    d[row[1]] = row[0]
    l.append((row[0], row[1]))
print(d)
print(l)

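Why not use a context manager to handle the file? Why mix two different variable naming conventions? Isn't (row[0], row[1]) weaker/more error-prone than just using tuple(row)?

– AMC
Jan 6 at 1:22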

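Why do you think doing tuple(row) is less error-prone? What variable naming convention are you referring to? Please link an official Python naming convention. As far as I know, try-except is a good way to handle files: what do you mean by context handler?

– Francesco Boi
Jan 6 at 15:41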

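"Why do you think doing tuple(row) is less error-prone?" Because it doesn't require you to write out every single index by hand. If you make a mistake, or the number of elements changes, you have to go back and change your code. The try-except is fine; context managers are the with statement. You can find plenty of resources on the subject, like this one.

– AMC
Jan 6 at 15:44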

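I don't see how the context manager would be better than the good ol' try-except block. As for the other point, the positive aspect of tuple(row) is that you type less code; beyond that, if the number of elements (I guess you mean the number of columns) changes, mine is better because it extracts only the desired values, while the other extracts every column. Without any specific requirement you cannot say which is better, so arguing about it is a waste of time: in this case both are valid.

– Francesco Boi
Jan 6 at 16:09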

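"I don't see how the context manager would be better than the good ol' try-except block." Please see my previous comment: the context manager would not replace the try-except.

– AMC
Jan 6 at 17:07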
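The point made in the last comment, that the two constructs do different jobs, can be sketched as follows. The function name load_rows and the file name are hypothetical and not taken from the answer; the with statement takes care of closing the file, while try-except handles the missing-file case.

```python
import csv

def load_rows(file_name):
    """Return the CSV rows as tuples, or None if the file does not exist."""
    try:
        # "with" closes the file even if reading raises midway.
        with open(file_name, newline="") as csv_file:
            return [tuple(row) for row in csv.reader(csv_file)]
    except FileNotFoundError:
        # try/except handles the missing file; it does not replace "with".
        return None

missing = load_rows("no_such_file_for_this_sketch.csv")  # hypothetical name
print(missing)
```

Catching FileNotFoundError specifically, rather than using a bare except, keeps unrelated errors visible.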

Answer #9

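Unfortunately, I find none of the existing answers particularly satisfying.

Here is a straightforward and complete Python 3 solution, using the csv module.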

import csv

with open('../resources/temp_in.csv', newline='') as f:
    reader = csv.reader(f, skipinitialspace=True)
    rows = list(reader)

print(rows)

Notice the skipinitialspace=True argument. This is necessary because, unfortunately, OP's CSV contains whitespace after each comma.

Output:

[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]

Answer #10

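Extending your requirements a bit, and assuming you do not care about the order of the lines and want them grouped by category, the following solution may work for you: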

>>> fname = "lines.txt"
>>> from collections import defaultdict
>>> dct = defaultdict(list)
>>> with open(fname) as f:
...     for line in f:
...         text, cat = line.rstrip("\n").split(",", 1)
...         dct[cat].append(text)
...
>>> dct
defaultdict(<type 'list'>, {' CatA': ['This is the first line', 'This is the another line'], ' CatC': ['This is the third line'], ' CatB': ['This is the second line', 'This is the last line']})

This way you get all the relevant lines in the dictionary, under the key that is their category.

Answer #11

Here is the easiest way in Python 3.x to import a CSV into a multidimensional array, and it is only 4 lines of code without importing anything!

#pull a CSV into a multidimensional array in 4 lines!

L=[]                            #Create an empty list for the main array
for line in open('log.txt'):    #Open the file and read all the lines
    x=line.rstrip()             #Strip the \n from each line
    L.append(x.split(','))      #Split each line into a list and add it to the
                                #Multidimensional array
print(L)

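Careful, it's a list, not an array! Why not use a context manager to properly handle the file object? Note that this solution leaves extra whitespace on the second item of each row, and that it will fail if any of the data contains a comma.

– AMC
Jan 6 at 1:19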

Answer #12

You can use the list() function to convert the csv reader object to a list:

import csv

with open('input.csv', newline='') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    rows = list(reader)
    print(rows)

Answer #13

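Next is a piece of code that uses the csv module but extracts the contents of file.csv into a list of dicts, using the first line, which is the header of the CSV table.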

import csv

def csv2dicts(filename):
    # Python 3: open in text mode with newline='' as the csv docs advise
    with open(filename, newline='') as f:
        reader = csv.reader(f)
        lines = list(reader)
    if len(lines) < 2:
        return None  # need a header row plus at least one data row
    names = lines[0]
    if len(names) < 1:
        return None
    dicts = []
    for values in lines[1:]:
        if len(values) != len(names):
            return None  # row width differs from the header
        dicts.append(dict(zip(names, values)))
    return dicts

if __name__ == '__main__':
    your_list = csv2dicts('file.csv')
    print(your_list)

Why not just use csv.DictReader?

– AMC
Jan 6 at 0:10
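For comparison, the csv.DictReader route suggested in the comment above could look like this sketch. The header names text and category are assumptions (the question's file has no header row), and an in-memory string stands in for file.csv.

```python
import csv
import io

# Hypothetical data with a header row; the column names are assumptions.
SAMPLE = (
    "text,category\n"
    "This is the first line,Line1\n"
    "This is the second line,Line2\n"
)

def csv_to_dicts(stream):
    """Map each data row to a dict keyed by the header row."""
    return [dict(row) for row in csv.DictReader(stream)]

dicts = csv_to_dicts(io.StringIO(SAMPLE, newline=""))
print(dicts)
```

Unlike the function in the answer, DictReader does not return None for rows whose length differs from the header: missing values are filled with restval (None by default) and surplus values are collected under restkey.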
