I learned about NumPy while looking for a more efficient way to read a text file, so I decided to keep some notes here for convenient access.
1. Input and Output
1.1 Load data from a text file
np.loadtxt loads data from a simply formatted text file and returns an array of type ndarray. Note that each row in the text file must have the same number of values (no data may be missing).
import numpy as np
np.loadtxt(fname,            # file, str, or generator: the file, filename, or generator to read.
           dtype=float,      # data-type of the resulting array; default: float.
           comments='#',     # str or sequence of str: the character(s) that mark the start of a comment.
           delimiter=None,   # str: the string separating values; by default, any whitespace.
           converters=None,  # dict mapping column number to a conversion function. E.g., if column 0 is a date string: converters = {0: datestr2num}.
           skiprows=0,       # int: skip the first skiprows lines.
           usecols=None,     # sequence: which columns to read. E.g., `usecols = (1,4,5)` extracts the 2nd, 5th and 6th columns.
           unpack=False,     # bool: if True, the returned array is transposed, so that columns may be unpacked using x, y, z = loadtxt(...).
           ndmin=0)          # int: the returned array will have at least ndmin dimensions. Legal values: 0 (default), 1 or 2.
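To make the parameters above concrete, here is a minimal sketch. The sample data are made up for illustration, and io.StringIO stands in for a real file on disk.

```python
import io
import numpy as np

# Hypothetical sample data; io.StringIO stands in for a real file.
text = io.StringIO(
    "# x y z\n"
    "1.0 2.0 3.0\n"
    "4.0 5.0 6.0\n"
)

# Lines starting with '#' are treated as comments; whitespace is the default delimiter.
data = np.loadtxt(text)
print(data.shape)  # (2, 3)

# Rewind, then read only columns 0 and 2 and unpack them into one array per column.
text.seek(0)
x, z = np.loadtxt(text, usecols=(0, 2), unpack=True)
print(x)  # [1. 4.]
print(z)  # [3. 6.]
```

With unpack=True the result is transposed, which is why each name on the left receives a whole column.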
1.2 Save an array to a text file
np.savetxt saves an array to a text file.
np.savetxt(fname,          # filename or file handle; if the filename ends in '.gz', the file is automatically saved in compressed gzip format.
           X,              # array_like: data to be saved to a text file.
           fmt='%.18e',    # str or sequence of strs: a single format, or a list of specifiers, one per column. For complex X, e.g. ['%.3e + %.3ej', '(%.15e%+.15ej)'] for 2 columns.
           delimiter=' ',  # str: string or character separating columns.
           newline='\n',   # str: string or character separating lines.
           header='',      # str: string written at the beginning of the file.
           footer='',      # str: string written at the end of the file.
           comments='# ')  # str: string prepended to the header and footer strings, to mark them as comments.
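As a rough illustration of how fmt, delimiter, header and comments interact, the sketch below writes a small made-up array to an in-memory buffer instead of a file. It assumes a reasonably recent NumPy, where savetxt accepts a text-mode file handle such as io.StringIO.

```python
import io
import numpy as np

# Hypothetical data, chosen only to show the per-column formats.
data = np.array([[1.5, 2.25],
                 [3.0, 4.125]])

buf = io.StringIO()
# One format specifier per column; the header is prefixed with the default comments string '# '.
np.savetxt(buf, data, fmt=['%.1f', '%.3f'], delimiter=',', header='a,b')

out = buf.getvalue()
print(out)
# # a,b
# 1.5,2.250
# 3.0,4.125
```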
1.3 An example
Here is an example that shows how to read and write a CSV file with NumPy.
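import numpy as np
header = ['hopcount', 'r1', 'r2', 'r3', 'r4', 'r5']
fname = 'hopcount_created.csv'
# table is assumed to be a 2-D integer array holding the rows listed below
# write to a file; savetxt prepends '# ' to the header line
np.savetxt(fname, table, fmt='%d', delimiter=',', header=','.join(header)) # header must be a str
# read from a file; the header line is skipped because it starts with '#'
table = np.loadtxt(fname, dtype=int, delimiter=',')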
The contents of hopcount_created.csv:
# hopcount,r1,r2,r3,r4,r5
1,4900,4834,4836,4860,4860
2,13894,13244,13254,13607,13724
3,30155,25789,25804,28619,29423
4,55506,42344,42289,51220,53589
5,77348,54515,55165,70919,74873
6,80973,58049,59442,75744,79389
7,68230,55699,57578,67883,68917
8,50773,51123,52449,54718,52662
9,37409,46793,47001,41331,38791
10,23385,38989,38149,27519,24708
11,13484,31091,28411,16723,14412
12,6541,21284,19205,8318,7020
13,2808,11936,10906,3664,3006
14,963,6051,6007,1223,995
15,268,2914,3444,289,268
16,46,1279,1750,46,46
17,0,516,692,0,0
18,0,189,257,0,0
19,0,44,44,0,0
2. genfromtxt
numpy.genfromtxt(fname, dtype=float, comments='#', delimiter=None, skip_header=0, skip_footer=0, converters=None, missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=None, replace_space='_', autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=None, usemask=False, loose=True, invalid_raise=True, max_rows=None)
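Unlike loadtxt, genfromtxt can cope with rows that have missing values. The sketch below uses made-up CSV text with one empty field to show the default behaviour (the gap becomes nan) and the filling_values parameter; io.StringIO again stands in for a real file.

```python
import io
import numpy as np

# Hypothetical CSV text; the second data row has an empty middle field.
text = io.StringIO(
    "a,b,c\n"
    "1,2,3\n"
    "4,,6\n"
)

# With the default float dtype, a missing field is read as nan.
data = np.genfromtxt(text, delimiter=',', skip_header=1)
print(data)

# Rewind and replace missing fields with -1 instead.
text.seek(0)
filled = np.genfromtxt(text, delimiter=',', skip_header=1, filling_values=-1)
print(filled)
```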
3. Indexing
An array is indexed as x[obj]. Three kinds of indexing are available: field access, basic slicing, and advanced indexing. Which one occurs depends on obj.
The basic slice syntax is start:stop:step.
Basic slicing and advanced indexing
import numpy as np
table = np.array([[ 1,  2,  3],
                  [ 4,  5,  6],
                  [ 7,  8,  9],
                  [10, 11, 12]])
# get the ith row
>>> table[2]
array([7, 8, 9])
# get the ith column
>>> table[:,2]
array([ 3, 6, 9, 12])
# get an element (table[2, 2] is equivalent and avoids creating the intermediate row)
>>> table[2][2]
9
# get a range of rows and columns
>>> table[2:4, 1:3] # rows 2-3 and columns 1-2 (the stop index is exclusive)
array([[ 8, 9],
[11, 12]])
# get a subarray with specific rows and columns
>>> table[[[1],[3]], [0,2]] # rows 1 and 3, columns 0 and 2: the (2, 1) row index broadcasts against the (2,) column index
array([[ 4, 6],
[10, 12]])
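>>> table[[1,3], [[0],[2]]] # the (2,) row index broadcasts against the (2, 1) column index, giving the transpose of the previous result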
array([[ 4, 10],
[ 6, 12]])
# Advanced indexing with paired indices
>>> table[[1,3], [0,2]] # the row and column indices are paired element-wise: row 1 with column 0, and row 3 with column 2
array([ 4, 12])
>>> table[[1,3], [0]] # the single column index 0 is broadcast to both rows
array([ 4, 10])
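The nested-list form used above to pick specific rows and columns is easy to get wrong. As an alternative sketch, np.ix_ builds the broadcastable index arrays for you; the variable names rows and cols are only illustrative.

```python
import numpy as np

table = np.array([[ 1,  2,  3],
                  [ 4,  5,  6],
                  [ 7,  8,  9],
                  [10, 11, 12]])

rows, cols = [1, 3], [0, 2]

# np.ix_ reshapes the index lists so that every row is combined with every column.
sub = table[np.ix_(rows, cols)]
print(sub)
# [[ 4  6]
#  [10 12]]

# Without np.ix_, the two lists are paired element-wise instead.
pairs = table[rows, cols]
print(pairs)  # [ 4 12]
```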