Quality assurance
Quality assurance and data cleaning.
- permute.qa.find_consecutive_duplicate_rows(x, as_string=False)
Find rows of x that duplicate the row immediately preceding them.
- permute.qa.find_duplicate_rows(x, as_string=False)
Find rows which are duplicated anywhere in x.
Notes
If you load a file, for example nsgk.csv, into a 2D array x and find ‘16,20,2,8’ in the list returned by
find_duplicate_rows(x, as_string=True)
you can locate the offending rows in the original file with something like:

    $ grep -n --context=1 '16,20,2,8' nsgk.csv
    12512-16,15,2,8
    12513:16,20,2,8
    12514-16,45,2,8
    --
    12532-17,17,2,8
    12533:16,20,2,8
    12534-17,24,2,8
References
http://stackoverflow.com/questions/8560440/removing-duplicate-columns-and-rows-from-a-numpy-2d-array
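The following is a minimal usage sketch of the two functions above, assuming nsgk.csv is a plain comma-separated file of integers; loading it with numpy.genfromtxt is one convenient option, not part of the permute.qa API, and the behaviour of find_consecutive_duplicate_rows is inferred from its name.

    import numpy as np
    from permute.qa import find_consecutive_duplicate_rows, find_duplicate_rows

    # Load the CSV as a 2D integer array.
    x = np.genfromtxt("nsgk.csv", delimiter=",", dtype=int)

    # Rows duplicated in x, returned as comma-joined strings such as
    # '16,20,2,8' when as_string=True.
    duplicates = find_duplicate_rows(x, as_string=True)

    # Rows that repeat the immediately preceding row (assumed behaviour,
    # inferred from the function name).
    consecutive = find_consecutive_duplicate_rows(x, as_string=True)

    # Each string can then be traced back to the original file, e.g.
    #   grep -n --context=1 '16,20,2,8' nsgk.csv
    for row in duplicates:
        print(row)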