I learned about NumPy while looking for a more efficient way to read text files, so I am putting some notes here for easy reference.
1. Input and Output
1.1 Load data from a text file
np.loadtxt loads data from a simply formatted text file and returns it as an ndarray. Every row in the file must have the same number of values, because loadtxt cannot handle missing data (np.genfromtxt can).
import numpy as np
np.loadtxt(fname,           # file, str, or generator; the file, filename, or generator to read. Files ending in .gz or .bz2 are decompressed first.
           dtype=float,     # data-type of the resulting array; default: float.
           comments='#',    # str or sequence; the character(s) marking the start of a comment.
           delimiter=None,  # str; the string separating values. By default, any whitespace.
           converters=None, # dict mapping column number to a function, e.g. if column 0 is a date string: converters={0: datestr2num} (matplotlib.dates.datestr2num).
           skiprows=0,      # int; skip the first skiprows lines.
           usecols=None,    # sequence; which columns to read, e.g. usecols=(1, 4, 5) extracts the 2nd, 5th and 6th columns.
           unpack=False,    # bool; if True, the returned array is transposed, so that columns can be unpacked with x, y, z = loadtxt(...).
           ndmin=0)         # int; the returned array will have at least ndmin dimensions. Legal values: 0 (default), 1 or 2.
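A minimal sketch of skiprows, usecols and unpack working together. It assumes a tiny made-up CSV held in an io.StringIO object, which loadtxt reads like an open text file:

```python
from io import StringIO

import numpy as np

# Made-up CSV with one header line; StringIO stands in for a real file.
text = StringIO("id,x,y\n1,0.5,10\n2,1.5,20\n3,2.5,30\n")

# skiprows=1 drops the header line, usecols=(1, 2) keeps the 2nd and 3rd columns,
# and unpack=True transposes the result so each column gets its own name.
x, y = np.loadtxt(text, delimiter=",", skiprows=1, usecols=(1, 2), unpack=True)

print(x)  # [0.5 1.5 2.5]
print(y)  # [10. 20. 30.]
```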
1.2 Save an array to a text file
np.savetxt is used to save an array to a text file.
np.savetxt(fname,          # filename or file handle; if the filename ends in '.gz', the file is automatically saved in compressed gzip format.
           X,              # 1-D or 2-D array_like; the data to be saved.
           fmt='%.18e',    # str or sequence of strs; one format for all columns, or one per column, e.g. ['%.3e + %.3ej', '(%.15e%+.15ej)'] for 2 complex columns.
           delimiter=' ',  # str; string or character separating columns.
           newline='\n',   # str; string or character separating lines.
           header='',      # str; string written at the beginning of the file.
           footer='',      # str; string written at the end of the file.
           comments='# ')  # str; string prepended to the header and footer strings to mark them as comments.
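A small sketch of per-column formats and the header prefix. The sample array is made up, and an io.StringIO buffer stands in for a file so the output can be printed:

```python
from io import StringIO

import numpy as np

data = np.array([[1.5, 2.0],
                 [3.25, 4.0]])

buf = StringIO()  # any writable text file handle works; a filename works too
# One format specifier per column; the header gets the comments='# ' prefix.
np.savetxt(buf, data, fmt=["%.2f", "%d"], delimiter=",", header="x,y")

print(buf.getvalue(), end="")
# # x,y
# 1.50,2
# 3.25,4
```

Passing a filename ending in '.gz' instead of a buffer would compress the output, and np.loadtxt decompresses such a file transparently when reading it back.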
1.3 An example
Here is an example that shows how to read and write a CSV file with NumPy.
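import numpy as np

# table: a 2-D integer ndarray holding the data shown below (19 rows x 6 columns)
header = ['hopcount', 'r1', 'r2', 'r3', 'r4', 'r5']
fname = 'hopcount_created.csv'
# write to a file; header must be a str, so the column names are joined with commas
np.savetxt(fname, table, fmt='%d', delimiter=',', header=','.join(header))
# read from a file; savetxt prefixes the header with '# ', so loadtxt skips it as a comment
table = np.loadtxt(fname, dtype=int, delimiter=',')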
The contents of hopcount_created.csv:
# hopcount,r1,r2,r3,r4,r5
1,4900,4834,4836,4860,4860
2,13894,13244,13254,13607,13724
3,30155,25789,25804,28619,29423
4,55506,42344,42289,51220,53589
5,77348,54515,55165,70919,74873
6,80973,58049,59442,75744,79389
7,68230,55699,57578,67883,68917
8,50773,51123,52449,54718,52662
9,37409,46793,47001,41331,38791
10,23385,38989,38149,27519,24708
11,13484,31091,28411,16723,14412
12,6541,21284,19205,8318,7020
13,2808,11936,10906,3664,3006
14,963,6051,6007,1223,995
15,268,2914,3444,289,268
16,46,1279,1750,46,46
17,0,516,692,0,0
18,0,189,257,0,0
19,0,44,44,0,0
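To check the round trip in a self-contained way, here is a sketch that uses only the first three rows of the data above. Writing into a temporary directory is an illustration choice, not part of the original example:

```python
import os
import tempfile

import numpy as np

header = ['hopcount', 'r1', 'r2', 'r3', 'r4', 'r5']
# First three rows of hopcount_created.csv as an integer array.
table = np.array([[1, 4900, 4834, 4836, 4860, 4860],
                  [2, 13894, 13244, 13254, 13607, 13724],
                  [3, 30155, 25789, 25804, 28619, 29423]])

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, 'hopcount_created.csv')
    np.savetxt(fname, table, fmt='%d', delimiter=',', header=','.join(header))
    # The header line starts with '# ', so loadtxt skips it as a comment.
    loaded = np.loadtxt(fname, dtype=int, delimiter=',')
    # unpack=True returns one 1-D array per column.
    hopcount, r1, r2, r3, r4, r5 = np.loadtxt(fname, dtype=int, delimiter=',', unpack=True)

print(np.array_equal(loaded, table))  # True
print(r1)  # [ 4900 13894 30155]
```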
2. genfromtxt
numpy.genfromtxt(fname, dtype=<type 'float'>, comments='#', delimiter=None, skip_header=0, skip_footer=0, converters=None, missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=None, replace_space='_', autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=None, usemask=False, loose=True, invalid_raise=True, max_rows=None)
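Unlike loadtxt, genfromtxt tolerates missing values: by default an empty field counts as missing. A minimal sketch on made-up data, showing filling_values and names=True:

```python
from io import StringIO

import numpy as np

# Made-up data: row 2 has an empty 'b' field, row 3 an empty 'c' field.
text = "a,b,c\n1,2,3\n4,,6\n7,8,\n"

# skip_header=1 drops the name line; filling_values replaces every missing value.
filled = np.genfromtxt(StringIO(text), delimiter=",", skip_header=1, filling_values=-1)
print(filled.tolist())  # [[1.0, 2.0, 3.0], [4.0, -1.0, 6.0], [7.0, 8.0, -1.0]]

# names=True reads the first line as field names and returns a structured array;
# without filling_values, missing float fields become nan.
rec = np.genfromtxt(StringIO(text), delimiter=",", names=True)
print(rec["b"].tolist())  # [2.0, nan, 8.0]
```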
3. Indexing
The basic slice syntax is start:stop:step, where start is included and stop is excluded.
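A short sketch of start:stop:step on a 1-D array (np.arange only builds sample data). It also shows that a basic slice is a view of the original array:

```python
import numpy as np

x = np.arange(10)      # [0 1 2 3 4 5 6 7 8 9]

print(x[2:8:2])        # start=2, stop=8 (excluded), step=2 -> [2 4 6]
print(x[::3])          # omitted start/stop cover the whole axis -> [0 3 6 9]
print(x[::-1])         # a negative step walks backwards -> [9 8 7 6 5 4 3 2 1 0]
print(x[-3:])          # negative indices count from the end -> [7 8 9]

# Basic slices return views: writing through the slice changes x itself.
s = x[1:3]
s[0] = 100
print(x[1])            # 100
```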
x[obj]
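There are three kinds of indexing: field access, basic slicing, and advanced indexing. Which one occurs depends on obj: a string selects a field of a structured array, integers and slices give basic slicing, and lists or integer/boolean arrays trigger advanced indexing.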
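Field access only applies to structured arrays, whose elements are records with named fields. A minimal sketch; the field names and records are made up for illustration:

```python
import numpy as np

# Hypothetical structured array: each element has a name, an age and a weight.
people = np.array([("Alice", 25, 55.0), ("Bob", 32, 72.5)],
                  dtype=[("name", "U10"), ("age", "i4"), ("weight", "f8")])

print(people["age"])      # a string obj returns one field for every record: [25 32]
print(people[1]["name"])  # Bob
```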
Basic slicing and advanced indexing
import numpy as np
table = np.array([[ 1,  2,  3],
                  [ 4,  5,  6],
                  [ 7,  8,  9],
                  [10, 11, 12]])
# get the row at index 2 (the third row)
>>> table[2]
array([7, 8, 9])
# get the column at index 2 (the third column)
>>> table[:,2]
array([ 3, 6, 9, 12])
# get an element; table[2][2] also works but builds the intermediate row table[2] first
>>> table[2, 2]
9
# get a range of rows and columns (stop indices are excluded)
>>> table[2:4, 1:3]  # rows 2-3, columns 1-2
array([[ 8,  9],
       [11, 12]])
# get a subarray with specific rows and columns (advanced indexing with broadcasting)
>>> table[[[1],[3]], [0,2]]  # rows 1 and 3, columns 0 and 2; the (2, 1) row index broadcasts against the (2,) column index
array([[ 4,  6],
       [10, 12]])
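>>> table[[1,3], [[0],[2]]]  # now the column index has shape (2, 1), so result[i, j] = table[[1,3][j], [0,2][i]]: the transpose of the previous result
array([[ 4, 10],
       [ 6, 12]])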
# Advanced indexing with index arrays of the same shape: row and column indices are paired elementwise
>>> table[[1,3], [0,2]]  # selects table[1, 0] and table[3, 2]
array([ 4, 12])
>>> table[[1,3], [0]]  # [0] broadcasts to [0, 0], selecting table[1, 0] and table[3, 0]
array([ 4, 10])
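The nested-list trick used for table[[[1],[3]], [0,2]] can also be written with np.ix_, which builds the broadcastable index arrays for you. The sketch rebuilds the same table so it runs on its own, and also shows that advanced indexing returns a copy rather than a view:

```python
import numpy as np

table = np.array([[ 1,  2,  3],
                  [ 4,  5,  6],
                  [ 7,  8,  9],
                  [10, 11, 12]])

# np.ix_ turns 1-D index lists into arrays shaped (2, 1) and (1, 2),
# so they broadcast to every combination of the chosen rows and columns.
rows, cols = np.ix_([1, 3], [0, 2])
print(rows.shape, cols.shape)  # (2, 1) (1, 2)
print(table[rows, cols])
# [[ 4  6]
#  [10 12]]

# Advanced indexing returns a copy: modifying it leaves table unchanged.
sub = table[[1, 3], [0, 2]]
sub[0] = -1
print(table[1, 0])  # 4
```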