A fast CSV command line toolkit written in Rust.

Overview

xsv is a command line program for indexing, slicing, analyzing, splitting and joining CSV files. Commands should be simple, fast and composable:

  1. Simple tasks should be easy.
  2. Performance trade-offs should be exposed in the CLI.
  3. Composition should not come at the expense of performance.

This README contains information on how to install xsv, in addition to a quick tour of several commands.

Dual-licensed under MIT or the UNLICENSE.

Available commands

  • cat - Concatenate CSV files by row or by column.
  • count - Count the rows in a CSV file. (Instantaneous with an index.)
  • fixlengths - Force a CSV file to have same-length records by either padding or truncating them.
  • flatten - A flattened view of CSV records. Useful for viewing one record at a time. e.g., xsv slice -i 5 data.csv | xsv flatten.
  • fmt - Reformat CSV data with different delimiters, record terminators or quoting rules. (Supports ASCII delimited data.)
  • frequency - Build frequency tables of each column in CSV data. (Uses parallelism to go faster if an index is present.)
  • headers - Show the headers of CSV data. Or show the intersection of all headers between many CSV files.
  • index - Create an index for a CSV file. This is very quick and provides constant time indexing into the CSV file.
  • input - Read CSV data with exotic quoting/escaping rules.
  • join - Inner, outer and cross joins. Uses a simple hash index to make it fast.
  • partition - Partition CSV data based on a column value.
  • sample - Randomly draw rows from CSV data using reservoir sampling (i.e., use memory proportional to the size of the sample).
  • reverse - Reverse order of rows in CSV data.
  • search - Run a regex over CSV data. Applies the regex to each field individually and shows only matching rows.
  • select - Select or re-order columns from CSV data.
  • slice - Slice rows from any part of a CSV file. When an index is present, this only has to parse the rows in the slice (instead of all rows leading up to the start of the slice).
  • sort - Sort CSV data.
  • split - Split one CSV file into many CSV files of N chunks.
  • stats - Show basic types and statistics of each column in the CSV file. (i.e., mean, standard deviation, median, range, etc.)
  • table - Show aligned output of any CSV data using elastic tabstops.
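
The sample entry above mentions reservoir sampling. As a rough illustration of that technique (this is not xsv's actual code, which is written in Rust), here is a minimal Python sketch of the classic "Algorithm R". The function name reservoir_sample and the seed parameter are invented for this example. Memory use stays proportional to the sample size k, no matter how many rows are streamed through.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k rows chosen uniformly at random from a stream (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Row i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


# Hypothetical usage: draw 10 "rows" from a stream of 100,000 integers.
picked = reservoir_sample(range(100_000), 10, seed=42)
print(len(picked))
```

Only the reservoir list is kept in memory, so the input can be an arbitrarily long iterator.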

A whirlwind tour

Let's say you're playing with some of the data from the Data Science Toolkit, which contains several CSV files. Maybe you're interested in the population counts of each city in the world. So grab the data and start examining it:

$ curl -LO https://burntsushi.net/stuff/worldcitiespop.csv
$ xsv headers worldcitiespop.csv
1   Country
2   City
3   AccentCity
4   Region
5   Population
6   Latitude
7   Longitude

The next thing you might want to do is get an overview of the kind of data that appears in each column. The stats command will do this for you:

$ xsv stats worldcitiespop.csv --everything | xsv table
field       type     min            max            min_length  max_length  mean          stddev         median     mode         cardinality
Country     Unicode  ad             zw             2           2                                                   cn           234
City        Unicode   bab el ahmar  Þykkvibaer     1           91                                                  san jose     2351892
AccentCity  Unicode   Bâb el Ahmar  ïn Bou Chella  1           91                                                  San Antonio  2375760
Region      Unicode  00             Z9             0           2                                        13         04           397
Population  Integer  7              31480498       0           8           47719.570634  302885.559204  10779                   28754
Latitude    Float    -54.933333     82.483333      1           12          27.188166     21.952614      32.497222  51.15        1038349
Longitude   Float    -179.983333    180            1           14          37.08886      63.22301       35.28      23.8         1167162

The xsv table command takes any CSV data and formats it into aligned columns using elastic tabstops. You'll notice that it even gets alignment right with respect to Unicode characters.

So, this command takes about 12 seconds to run on my machine, but we can speed it up by creating an index and re-running the command:

$ xsv index worldcitiespop.csv
$ xsv stats worldcitiespop.csv --everything | xsv table
...

Which cuts it down to about 8 seconds on my machine. (And creating the index takes less than 2 seconds.)

Notably, the same type of "statistics" command in another CSV command line toolkit takes about 2 minutes to produce similar statistics on the same data set.

Creating an index gives us more than just faster statistics gathering. It also makes slice operations extremely fast because only the sliced portion has to be parsed. For example, let's say you wanted to grab the last 10 records:

$ xsv count worldcitiespop.csv
3173958
$ xsv slice worldcitiespop.csv -s 3173948 | xsv table
Country  City               AccentCity         Region  Population  Latitude     Longitude
zw       zibalonkwe         Zibalonkwe         06                  -19.8333333  27.4666667
zw       zibunkululu        Zibunkululu        06                  -19.6666667  27.6166667
zw       ziga               Ziga               06                  -19.2166667  27.4833333
zw       zikamanas village  Zikamanas Village  00                  -18.2166667  27.95
zw       zimbabwe           Zimbabwe           07                  -20.2666667  30.9166667
zw       zimre park         Zimre Park         04                  -17.8661111  31.2136111
zw       ziyakamanas        Ziyakamanas        00                  -18.2166667  27.95
zw       zizalisari         Zizalisari         04                  -17.7588889  31.0105556
zw       zuzumba            Zuzumba            06                  -20.0333333  27.9333333
zw       zvishavane         Zvishavane         07      79876       -20.3333333  30.0333333

These commands are instantaneous because they run in time and memory proportional to the size of the slice (which means they will scale to arbitrarily large CSV data).
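
To make the idea concrete, here is a small Python sketch of how an offset index can turn slicing into a seek plus a short read. It is only an illustration under simplifying assumptions: one record per line (no newlines inside quoted fields), an in-memory file, and made-up helper names build_index and slice_rows. It does not reflect xsv's real index format.

```python
import io
from typing import BinaryIO, List


def build_index(f: BinaryIO) -> List[int]:
    """Record the byte offset at which each line (record) starts."""
    offsets: List[int] = []
    f.seek(0)
    while True:
        pos = f.tell()
        line = f.readline()
        if not line:
            break
        offsets.append(pos)
    return offsets


def slice_rows(f: BinaryIO, offsets: List[int], start: int, length: int) -> List[bytes]:
    """Read `length` data rows beginning at zero-based data row `start`."""
    # offsets[0] is the header line, so data row n starts at offsets[n + 1].
    first = start + 1
    if first >= len(offsets):
        return []
    f.seek(offsets[first])  # jump straight to the slice, skipping earlier rows
    count = min(length, len(offsets) - first)
    return [f.readline().rstrip(b"\r\n") for _ in range(count)]


data = io.BytesIO(b"Country,City\nzw,ziga\nzw,zimbabwe\nzw,zuzumba\n")
index = build_index(data)
print(slice_rows(data, index, 1, 2))
```

Building the index costs one pass over the file. After that, each slice only touches the bytes it returns.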

Switching gears a little bit, you might not always want to see every column in the CSV data. In this case, maybe we only care about the country, city and population. So let's take a look at 10 random rows:

$ xsv select Country,AccentCity,Population worldcitiespop.csv \
  | xsv sample 10 \
  | xsv table
Country  AccentCity       Population
cn       Guankoushang
za       Klipdrift
ma       Ouled Hammou
fr       Les Gravues
la       Ban Phadèng
de       Lüdenscheid      80045
qa       Umm ash Shubrum
bd       Panditgoan
us       Appleton
ua       Lukashenkivske

Whoops! It seems some cities don't have population counts. How pervasive is that?

$ xsv frequency worldcitiespop.csv --limit 5
field,value,count
Country,cn,238985
Country,ru,215938
Country,id,176546
Country,us,141989
Country,ir,123872
City,san jose,328
City,san antonio,320
City,santa rosa,296
City,santa cruz,282
City,san juan,255
AccentCity,San Antonio,317
AccentCity,Santa Rosa,296
AccentCity,Santa Cruz,281
AccentCity,San Juan,254
AccentCity,San Miguel,254
Region,04,159916
Region,02,142158
Region,07,126867
Region,03,122161
Region,05,118441
Population,(NULL),3125978
Population,2310,12
Population,3097,11
Population,983,11
Population,2684,11
Latitude,51.15,777
Latitude,51.083333,772
Latitude,50.933333,769
Latitude,51.116667,769
Latitude,51.133333,767
Longitude,23.8,484
Longitude,23.2,477
Longitude,23.05,476
Longitude,25.3,474
Longitude,23.1,459

(The xsv frequency command builds a frequency table for each column in the CSV data. This one only took 5 seconds.)
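
For readers who want to see the shape of the computation, the following Python sketch builds the same kind of per-column frequency table with one counter per column. It is an illustration only: the function name frequency is made up, the "(NULL)" label for empty fields is copied from the output above, and nothing here says how xsv implements it internally.

```python
import csv
import io
from collections import Counter
from typing import List, Tuple


def frequency(text: str, limit: int) -> List[Tuple[str, str, int]]:
    """Return (field, value, count) rows for the `limit` most common values per column."""
    reader = csv.reader(io.StringIO(text))
    headers = next(reader)
    counters = [Counter() for _ in headers]
    for record in reader:
        for counter, value in zip(counters, record):
            # Empty fields are counted under a "(NULL)" label.
            counter[value or "(NULL)"] += 1
    table: List[Tuple[str, str, int]] = []
    for name, counter in zip(headers, counters):
        for value, count in counter.most_common(limit):
            table.append((name, value, count))
    return table


sample_csv = "Country,Population\ncn,\nru,\ncn,100\n"
for row in frequency(sample_csv, 1):
    print(",".join(str(cell) for cell in row))
```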

So it seems that most cities do not have a population count associated with them at all. No matter—we can adjust our previous command so that it only shows rows with a population count:

$ xsv search -s Population '[0-9]' worldcitiespop.csv \
  | xsv select Country,AccentCity,Population \
  | xsv sample 10 \
  | xsv table
Country  AccentCity       Population
es       Barañáin         22264
es       Puerto Real      36946
at       Moosburg         4602
hu       Hejobaba         1949
ru       Polyarnyye Zori  15092
gr       Kandíla          1245
is       Ólafsvík         992
hu       Decs             4210
bg       Sliven           94252
gb       Leatherhead      43544

Erk. Which country is at? No clue, but the Data Science Toolkit has a CSV file called countrynames.csv. Let's grab it and do a join so we can see which countries these are. (The join below assumes the ten sampled rows above have been saved to sample.csv.)

$ curl -LO https://gist.githubusercontent.com/anonymous/063cb470e56e64e98cf1/raw/98e2589b801f6ca3ff900b01a87fbb7452eb35c7/countrynames.csv
$ xsv headers countrynames.csv
1   Abbrev
2   Country
$ xsv join --no-case Country sample.csv Abbrev countrynames.csv | xsv table
Country  AccentCity       Population  Abbrev  Country
es       Barañáin         22264       ES      Spain
es       Puerto Real      36946       ES      Spain
at       Moosburg         4602        AT      Austria
hu       Hejobaba         1949        HU      Hungary
ru       Polyarnyye Zori  15092       RU      Russian Federation | Russia
gr       Kandíla          1245        GR      Greece
is       Ólafsvík         992         IS      Iceland
hu       Decs             4210        HU      Hungary
bg       Sliven           94252       BG      Bulgaria
gb       Leatherhead      43544       GB      Great Britain | UK | England | Scotland | Wales | Northern Ireland | United Kingdom

Whoops, now we have two columns called Country and an Abbrev column that we no longer need. This is easy to fix by re-ordering columns with the xsv select command:

$ xsv join --no-case Country sample.csv Abbrev countrynames.csv \
  | xsv select 'Country[1],AccentCity,Population' \
  | xsv table
Country                                                                              AccentCity       Population
Spain                                                                                Barañáin         22264
Spain                                                                                Puerto Real      36946
Austria                                                                              Moosburg         4602
Hungary                                                                              Hejobaba         1949
Russian Federation | Russia                                                          Polyarnyye Zori  15092
Greece                                                                               Kandíla          1245
Iceland                                                                              Ólafsvík         992
Hungary                                                                              Decs             4210
Bulgaria                                                                             Sliven           94252
Great Britain | UK | England | Scotland | Wales | Northern Ireland | United Kingdom  Leatherhead      43544

Perhaps we can do this with the original CSV data? Indeed we can—because joins in xsv are fast.

$ xsv join --no-case Abbrev countrynames.csv Country worldcitiespop.csv \
  | xsv select '!Abbrev,Country[1]' \
  > worldcitiespop_countrynames.csv
$ xsv sample 10 worldcitiespop_countrynames.csv | xsv table
Country                      City                   AccentCity             Region  Population  Latitude    Longitude
Sri Lanka                    miriswatte             Miriswatte             36                  7.2333333   79.9
Romania                      livezile               Livezile               26      1985        44.512222   22.863333
Indonesia                    tawainalu              Tawainalu              22                  -4.0225     121.9273
Russian Federation | Russia  otar                   Otar                   45                  56.975278   48.305278
France                       le breuil-bois robert  le Breuil-Bois Robert  A8                  48.945567   1.717026
France                       lissac                 Lissac                 B1                  45.103094   1.464927
Albania                      lumalasi               Lumalasi               46                  40.6586111  20.7363889
China                        motzushih              Motzushih              11                  27.65       111.966667
Russian Federation | Russia  svakino                Svakino                69                  55.60211    34.559785
Romania                      tirgu pancesti         Tirgu Pancesti         38                  46.216667   27.1

The !Abbrev,Country[1] syntax means, "remove the Abbrev column and remove the second occurrence of the Country column." Since we joined with countrynames.csv first, the first Country name (fully expanded) is now included in the CSV data.

This xsv join command takes about 7 seconds on my machine. The performance comes from constructing a very simple hash index of one of the CSV data files given. The join command does an inner join by default, but it also supports left, right and full outer joins.
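
As a rough picture of what a "very simple hash index" join looks like, here is a Python sketch of an inner hash join with an optional case-insensitive key, similar in spirit to --no-case. The function name inner_join and its parameters are hypothetical, and the choice to index the second input is an assumption made for the example, not a statement about which side xsv indexes.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def inner_join(left_text: str, left_key: str,
               right_text: str, right_key: str,
               no_case: bool = False) -> List[List[str]]:
    """Inner-join two CSV strings on the named key columns using a hash index."""
    def norm(value: str) -> str:
        return value.lower() if no_case else value

    # Build the hash index over the right-hand input: key -> matching records.
    right_reader = csv.reader(io.StringIO(right_text))
    right_headers = next(right_reader)
    rk = right_headers.index(right_key)
    index: Dict[str, List[List[str]]] = defaultdict(list)
    for record in right_reader:
        index[norm(record[rk])].append(record)

    # Stream the left-hand input and look each key up in the index.
    left_reader = csv.reader(io.StringIO(left_text))
    left_headers = next(left_reader)
    lk = left_headers.index(left_key)
    joined = [left_headers + right_headers]
    for record in left_reader:
        for match in index.get(norm(record[lk]), []):
            joined.append(record + match)
    return joined


cities = "Country,AccentCity\nat,Moosburg\nxx,Nowhere\n"
names = "Abbrev,Country\nAT,Austria\n"
for row in inner_join(cities, "Country", names, "Abbrev", no_case=True):
    print(",".join(row))
```

Each lookup is a single dictionary access, so the join runs in time roughly linear in the combined size of the two inputs, at the cost of holding one input's records in memory.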

Installation

Binaries for Windows, Linux and macOS are available from GitHub.

If you're a macOS Homebrew user, then you can install xsv from homebrew-core:

$ brew install xsv

If you're a macOS MacPorts user, then you can install xsv from the official ports:

$ sudo port install xsv

If you're a Nix/NixOS user, you can install xsv from nixpkgs:

$ nix-env -i xsv

Alternatively, you can compile from source by installing Cargo (Rust's package manager) and installing xsv using Cargo:

$ cargo install xsv

Compiling from this repository also works similarly:

$ git clone https://github.com/BurntSushi/xsv
$ cd xsv
$ cargo build --release

Compilation will probably take a few minutes depending on your machine. The binary will end up in ./target/release/xsv.

Benchmarks

I've compiled some very rough benchmarks of various xsv commands.

Motivation

Here are several valid criticisms of this project:

  1. You shouldn't be working with CSV data because CSV is a terrible format.
  2. If your data is gigabytes in size, then CSV is the wrong storage type.
  3. Various SQL databases provide all of the operations available in xsv with more sophisticated indexing support. And the performance is a zillion times better.

I'm sure there are more criticisms, but the impetus for this project was a 40GB CSV file that was handed to me. I was tasked with figuring out the shape of the data inside of it and coming up with a way to integrate it into our existing system. It was then that I realized that every single CSV tool I knew about was woefully inadequate. They were just too slow or didn't provide enough flexibility. (Another project I had consisted of a few dozen CSV files. They were smaller than 40GB, but they were each supposed to represent the same kind of data. But they all had different and unintuitive column names. Useful CSV inspection tools were critical here, and they had to be reasonably fast.)

The key ingredients for helping me with my task were indexing, random sampling, searching, slicing and selecting columns. All of these things made dealing with 40GB of CSV data (or dozens of CSV files) a bit more manageable.

Getting handed a large CSV file once was enough to launch me on this quest. From conversations I've had with others, CSV data files this large don't seem to be a rare event. Therefore, I believe there is room for a tool that has a hope of dealing with data that large.

Naming collision

This project is unrelated to another similar project with the same name: https://mj.ucw.cz/sw/xsv/

Issues
  • xsv partition subcommand

    READY FOR MERGE (I hope).

    I've tried to follow the standard xsv coding style and to re-use existing support code where possible.

    But I want to show my initial work and ask for any general feedback now.

    One interesting wrinkle that I noticed: The split command has a --output flag, but it ignores it. I'm thinking that perhaps instead of having a --filename TEMPLATE argument, both split and partition should have an --output argument that defaults to {}.csv using my new FilenameTemplate type. Would this be a reasonable approach?

    TODO

    • [x] Empty strings in partition column
    • [x] Create output directory if it does not exist
    • [x] Sanitize filenames to contain only shell-safe characters
    • [x] Collisions between sanitized field values
    • [x] Test files with no headers & partitioning based on column number
    • [x] Test --filename argument, including prefix
    • [x] Invalid --filename arguments, including no {} or two {}
    • [x] Modify both partition and split to use the same filename template system, possibly as --output instead of --filename?
    • More as I think of them
    opened by emk 25
  • thread '<main>' panicked at 'index out of bounds: the len is 0 but the index is 9', src/select.rs:352

    I caught this panic while performing a join:

    xsv join --no-case url images.csv url machines.csv
    thread '<main>' panicked at 'index out of bounds: the len is 0 but the index is 9', src/select.rs:352
    Image,index,url,Year,Manufacturer,Model,Title,Serial,Stock,Pricing,Description,index,url
    

    I was able to get by it by flipping the two CSVs to be joined.

    opened by zacstewart 13
  • Example for input

    I can't find any examples of how this is used, and this repo is the only source of documentation.

    opened by andrewnguyen42 12
  • Workaround for rust 1.27.0 not compiling on macOS 10.10

    xsv 0.13.0 depends on Rust and requires the latest version (1.27.0 as of now). I've found that on my machine Rust fails to compile, probably because I'm running 10.10 Yosemite on an ancient Mac.

    https://github.com/rust-lang/rust/issues/51838

    Luckily there is a workaround for this.

    In the general case, you want to install package X, package X depends on the latest version of package Y, and for some reason (poverty, laziness) you can't use the latest version of package Y on your machine. Just go to /usr/local/Cellar/Y and symlink the last known good version to the latest version. This was inspired by https://stackoverflow.com/questions/19664535/how-can-i-prevent-homebrew-from-upgrading-vtk-dependency-for-pcl/19665408#19665408

    In this specific case

    cd /usr/local/Cellar/rust
    ln -s 1.24.1 1.27.0

    This tricks homebrew into thinking it already has 1.27.0 and thus won't download it, fail to compile and end up all fubar.

    question 
    opened by STA-WSYNC 9
  • Color xsv table output

    It would be useful if the output of xsv table could be colored, to clearly differentiate between the headers and data.

    opened by beojan 8
  • Add homebrew-core installation instructions

    xsv was added to homebrew-core in https://github.com/Homebrew/homebrew-core/pull/11427, so this change mentions that installation option in the readme. The wording was adapted from ripgrep's readme: https://github.com/BurntSushi/ripgrep/tree/685cc6c5622b02fd5a53c8bc953176b159c780e4#installation

    opened by josephfrazier 8
  • Compilation failure under rustc 1.0.0-beta.2

    This could be me doing something wrong...

    $ cargo build --release
       Compiling byteorder v0.3.7
       Compiling threadpool v0.1.4
       Compiling streaming-stats v0.1.23
       Compiling regex v0.1.28
       Compiling rustc-serialize v0.3.12
       Compiling libc v0.1.6
       Compiling log v0.3.1
    /Users/luis.casillas/.cargo/registry/src/github.com-1ecc6299db9ec823/streaming-stats-0.1.23/src/lib.rs:1:1: 1:41 error: unstable feature
    /Users/luis.casillas/.cargo/registry/src/github.com-1ecc6299db9ec823/streaming-stats-0.1.23/src/lib.rs:1 #![feature(collections, core, std_misc)]
                                                                                                             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    note: this feature may not be used in the beta release channel
    error: aborting due to previous error
    Build failed, waiting for other jobs to finish...
    Could not compile `streaming-stats`.
    
    $ git log | head -n 5
    commit c19cf0bf43d001ee0555d943eb6703c058152877
    Author: Andrew Gallant <[email protected]>
    Date:   Thu Apr 16 17:50:25 2015 -0400
    
        crates.io merit badge
    
    opened by ldcasillas-progreso 7
  • Q: Does xsv have an equivalent operation to csvkit's csvclean ?

    I need a fast filter to parse csv file lines and drop those that are unparsable, as with https://csvkit.readthedocs.io/en/1.0.2/scripts/csvclean.html. csvclean works in a shell pipe (PR 781) but is limited in speed.

    Does xsv have a similar method?

    opened by kmatt 7
  • Partition into files based on columns?

    Wow, xsv is cool!

    I was exploring Pachyderm for large-scale data ingestion tasks, and one use-case popped up fairly often. This is basically a map/reduce implementation using an immutable, versioned cluster file system, and worker jobs running inside of Docker containers. You read files from /pfs/$INPUT_NAME/* and write them to /pfs/out.

    Imagine an input file which looks like:

    value,count
    FL,11957
    CA,11816
    TX,10157
    IL,5633
    OH,5556
    GA,4535
    NY,4088
    MI,4008
    NJ,3890
    CO,3690
    

    I would love to be able to write something like:

    xsv partition value /pfs/out
    

    ...and create a file named /pfs/out/FL containing 11957, a file named /pfs/out/CA containing 11816, and so on. It would also be OK if the files contained the partitioning column: FL,11957 and CA,11816. If there are more than two columns, they should all be included. All rows with the same value column should be in the appropriate output file.

    Does xsv have support for anything like this? Would you like support for something like this? If not, no worries, it would be a trivial standalone tool using your csv library. But I thought I'd check whether you wanted it upstream first.

    enhancement 
    opened by emk 7
  • Please make xsv table command optionally output the delimiter

    As per the subject. I'd like to keep the delimiter if possible

    opened by chrisbra 6
  • Support quartiles

    May close #87. Depends on BurntSushi/rust-stats/pull/15.

    This PR adds the option xsv stats --quartiles. Using quartiles Q1 and Q3, we can roughly estimate outliers. IMHO it's a handy option for preliminary CSV inspection.

    opened by m15a 0
  • `NaN`s break `xsv sort -N`

    NaNs, empty or broken values seem to silently break the entire sorting.

    $ printf '3\n5\n2\n2.01\nNaN\n2\n0\n' | xsv sort -n -N
    2
    2.01
    3
    5
    NaN
    0
    2
    

    I expect them to be grouped somewhere before or after the proper numeric values, and the subset of values that do parse as numbers to be sorted properly, somewhat like /usr/bin/sort -n.

    opened by vi 0
  • Exclude header option

    TL;DR: A command option to exclude the header line for CSV files that use headers incorrectly.

    My apologies if this is a duplicate. I did search the issues but couldn't find anything. I work in an industry where CSV is the primary means for transferring data between industry participants. The irony is that ALL the CSV files are invalid. The first line of each file doesn't describe the columns; it contains metadata. This causes almost all xsv commands to fail with "found record with 64 fields, but the previous record has 7 fields" errors. My proposal is to add a skip-header option. An enhancement to this could be to add dummy header labels, possibly using the common spreadsheet method of alphabetic labels. I know the fixlengths command offers a workaround, but it results in some not so pleasant output for other commands. I'm prepared to have a crack at this myself, but first: is there an appetite for adding this functionality?

    opened by hmaddocks 4
  • bash: xsv: command not found... in CentOS 7

    (phosa) [[email protected] scratch3]$ git clone git://github.com/BurntSushi/xsv
    Cloning into 'xsv'...
    remote: Enumerating objects: 23, done.
    remote: Counting objects: 100% (23/23), done.
    remote: Compressing objects: 100% (18/18), done.
    remote: Total 2706 (delta 3), reused 17 (delta 3), pack-reused 2683
    Receiving objects: 100% (2706/2706), 568.35 KiB | 0 bytes/s, done.
    Resolving deltas: 100% (1962/1962), done.
    (phosa) [[email protected] scratch3]$ cd xsv
    (phosa) [[email protected] xsv]$ cargo build --release
      Downloaded filetime v0.1.15
      Downloaded rand v0.5.5
      Downloaded crossbeam-channel v0.2.4
      Downloaded csv-index v0.1.5
      Downloaded serde_derive v1.0.75
      Downloaded streaming-stats v0.2.0
      Downloaded tabwriter v1.1.0
      Downloaded threadpool v1.7.1
      Downloaded byteorder v1.2.4
      Downloaded num_cpus v1.8.0
      Downloaded regex v1.0.3
      Downloaded docopt v1.0.1
      Downloaded csv v1.0.1
      Downloaded crossbeam-epoch v0.5.2
      Downloaded parking_lot v0.6.3
      Downloaded smallvec v0.6.5
      Downloaded quote v0.6.8
      Downloaded rand_core v0.2.1
      Downloaded libc v0.2.43
      Downloaded crossbeam-utils v0.5.0
      Downloaded num-traits v0.2.5
      Downloaded syn v0.14.9
      Downloaded proc-macro2 v0.4.13
      Downloaded unreachable v1.0.0
      Downloaded lazy_static v1.1.0
      Downloaded lock_api v0.1.3
      Downloaded parking_lot_core v0.2.14
      Downloaded regex-syntax v0.6.2
      Downloaded memchr v2.0.1
      Downloaded memoffset v0.2.1
      Downloaded arrayvec v0.4.7
      Downloaded csv-core v0.1.4
      Downloaded aho-corasick v0.6.6
      Downloaded utf8-ranges v1.0.0
      Downloaded scopeguard v0.3.3
      Downloaded owning_ref v0.3.3
      Downloaded void v1.0.2
      Downloaded rand v0.4.3
      Downloaded ucd-util v0.1.1
      Downloaded nodrop v0.1.12
      Downloaded version_check v0.1.4
      Downloaded stable_deref_trait v1.1.1
      Downloaded cfg-if v0.1.5
      Downloaded serde v1.0.75
       Compiling version_check v0.1.4
       Compiling proc-macro2 v0.4.13
       Compiling void v1.0.2
       Compiling unicode-xid v0.1.0
       Compiling libc v0.2.43
       Compiling stable_deref_trait v1.1.1
       Compiling serde v1.0.75
       Compiling nodrop v0.1.12
       Compiling ucd-util v0.1.1
       Compiling scopeguard v0.3.3
       Compiling num-traits v0.2.5
       Compiling regex v1.0.3
       Compiling utf8-ranges v1.0.0
       Compiling cfg-if v0.1.5
       Compiling rand_core v0.2.1
       Compiling memoffset v0.2.1
       Compiling crossbeam-utils v0.5.0
       Compiling unicode-width v0.1.5
       Compiling strsim v0.7.0
       Compiling byteorder v1.2.4
       Compiling owning_ref v0.3.3
       Compiling arrayvec v0.4.7
       Compiling unreachable v1.0.0
       Compiling regex-syntax v0.6.2
       Compiling rand v0.4.3
       Compiling memchr v2.0.1
       Compiling num_cpus v1.8.0
       Compiling rand v0.5.5
       Compiling filetime v0.1.15
       Compiling smallvec v0.6.5
       Compiling lazy_static v1.1.0
       Compiling tabwriter v1.1.0
       Compiling lock_api v0.1.3
       Compiling csv-core v0.1.4
       Compiling aho-corasick v0.6.6
       Compiling threadpool v1.7.1
       Compiling parking_lot_core v0.2.14
       Compiling parking_lot v0.6.3
       Compiling thread_local v0.3.6
       Compiling crossbeam-epoch v0.5.2
       Compiling crossbeam-channel v0.2.4
       Compiling streaming-stats v0.2.0
       Compiling quote v0.6.8
       Compiling syn v0.14.9
       Compiling csv v1.0.1
       Compiling serde_derive v1.0.75
       Compiling csv-index v0.1.5
       Compiling docopt v1.0.1
       Compiling xsv v0.13.0 (/scratch3/xsv)
        Finished release [optimized + debuginfo] target(s) in 8m 09s
    (phosa) [[email protected] scratch3]$ curl -LO https://burntsushi.net/stuff/worldcitiespop.csv
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  144M  100  144M    0     0  16.1M      0  0:00:08  0:00:08 --:--:-- 26.3M
    (phosa) [[email protected] scratch3]$ xsv headers worldcitiespop.csv
    bash: xsv: command not found...
    
    
    opened by monacv 4
  • Looking for maintainership

    As it seems @BurntSushi the original author of this software doesn’t have time to create new releases of this tool at the moment — which is perfectly fine and great that he openly communicates this!

    Since this is useful and popular software and there are many contributors to this project, it would be sad to see it decline. As such, this seems a great opportunity to spread the maintenance burden over more people.

    In order to do that we’ll need:

    • Someone (ideally more than one person) to step up as maintainer for this project. Maybe @Yomguithereal and/or @jqnatividad who are the maintainers of the currently most active forks?
    • @BurntSushi to hand over the necessary resources once a new team has been found. Are you willing to do that?

    Although I’m a developer myself I haven’t done anything with Rust yet, but I’m happy to help out with administrative tasks.

    Also please let me know if this steps on anyone’s toes. That’s definitely not the intention!

    opened by torotil 3
  • Option to align decimal points in printed statistics?

    In my program to summarize data, the decimal points of the printed summary statistics are aligned, for example

                     var       mean         sd        min        max
                       x1     0.4933     0.2770     0.0006     0.9995
                       x2     5.0344     2.8776     0.0022     9.9953
                       x3    49.5909    28.5406     0.0142    99.9723
    

    Is there a way to get this behavior using xsv stats?

    opened by Beliavsky 0
  • Raw value output from fmt

    I have a CSV where one of the fields is actually JSON:

    foo,"[""bar"",""baz""]",bazinga
    

    (Before anyone tells me this is dumb: Yes, I agree. But it's what I have to work with, for various reasons.)

    I would like to extract just the JSON column for further processing as JSON, but there doesn't seem to be a way to convince xsv fmt to output the "raw" text value; it is always quoted for CSV purposes.

    I imagine it would look something like this, e.g. via an -r or --raw flag:

    $ echo 'foo,"[""bar"",""baz""]",bazinga' | xsv select 2 | xsv fmt -r
    ["bar","baz"]
    

    Or, as a hack, fmt --quote could accept an empty string I guess?

    (I know that such output would not be suitable for further processing by xsv, but that's kind of the point.)

    For prior art on this, see for instance the --raw-output / -r flag in jq.

    opened by amake 2
  • Feature Request: `teip` like feature

    Teip is a command line utility that runs other commands on only a specific column of the input data. Example (from the teip docs):

    $ echo "100 200 300 400" | teip -f 3 sed 's/./@/g'
    

    This runs sed on only the third column of the input, so the output is:

    100 200 @@@ 400
    

    I wish there were a subcommand that did the same for CSV columns: strip the header, unescape the strings, pass them to the subcommand, escape the output again, and put it back into the CSV.
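
The steps listed above can be sketched in a few lines of Python. This is only an illustration of the idea, not an xsv feature: `map_column` is a hypothetical name, and it applies a Python callable where teip would spawn an external command.

```python
import csv
import io
import re


def map_column(csv_text, column, func, has_header=True):
    """Apply func to one column (1-based), leaving the header row untouched."""
    reader = csv.reader(io.StringIO(csv_text))  # unescapes each field
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")  # re-escapes on output
    for i, record in enumerate(reader):
        if not (has_header and i == 0):
            record[column - 1] = func(record[column - 1])
        writer.writerow(record)
    return out.getvalue()


def mask(value):
    """Equivalent of sed 's/./@/g' for a single field."""
    return re.sub(".", "@", value)


data = "a,b,c,d\n100,200,300,400\n"
print(map_column(data, 3, mask), end="")
```

Because the writer quotes fields again on the way out, a transformed value that contains a comma or a quote still produces valid CSV.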

    opened by ahrzb 0
  • Feature Request: Use modes for stats

    The underlying rust-stats library supports modes (https://github.com/BurntSushi/rust-stats/pull/9), returning multiple modes if more than one is found.

    Right now, it returns nothing when there are multiple modes (https://github.com/BurntSushi/rust-stats/issues/8).
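
To make the difference concrete, here is a minimal Python sketch of the multi-mode behavior. It is not the rust-stats API; the function name `modes` and the choice to return the result sorted are assumptions made for the illustration.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency, sorted."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(v for v, n in counts.items() if n == top)


print(modes(["a", "b", "a", "b", "c"]))  # two values tie for most frequent
print(modes(["x", "y", "x"]))            # a single mode
```

Under this definition a column in which every value is unique reports all of its values as modes, so a real implementation may want to special-case that.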

    opened by jqnatividad 0
  • Feature Request: Add variance to stats

    The underlying rust-stats library supports computing variance. It would be nice if variance were added to the standard "streaming" stats that xsv calculates.
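
For readers unfamiliar with why variance fits the streaming model, the sketch below uses Welford's algorithm, a standard single-pass technique. It is an illustration only: the class name `OnlineVariance` is hypothetical, nothing here is taken from rust-stats, and reporting the population variance (dividing by n) is an assumption.

```python
class OnlineVariance:
    """Single-pass mean and variance using Welford's algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Population variance (assumption: divide by n, not n - 1)."""
        return self.m2 / self.n if self.n else float("nan")


acc = OnlineVariance()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    acc.add(value)
print(acc.mean, acc.variance())
```

Only three numbers are kept per column, so memory use stays constant no matter how many rows are read.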

    opened by jqnatividad 0
Releases(0.13.0)
Owner
Andrew Gallant
I love to code.
A modern replacement for ‘ls’.

exa exa is a modern replacement for ls. README Sections: Options — Installation — Development exa is a modern replacement for the venerable file-listi

Benjamin Sago 13.5k Jun 6, 2021
cloc counts blank lines, comment lines, and physical lines of source code in many programming languages.

cloc Count Lines of Code cloc counts blank lines, comment lines, and physical lines of source code in many programming languages. Latest release: v1.9

null 11.7k Jun 7, 2021
Git repository summary on your terminal

A command-line Git information tool written in Rust Onefetch is a command-line Git information tool written in Rust that displays project information

Ossama Hjaji 3.1k Jun 6, 2021
Linux Kernel Manager and Activity Monitor 🐧💻

Linux Kernel Manager and Activity Monitor 🐧💻 The kernel is the part of the operating system that facilitates interactions between hardware and soft

Orhun Parmaksız 1.2k Jun 1, 2021
A command-line benchmarking tool

hyperfine 中文 A command-line benchmarking tool. Demo: Benchmarking fd and find: Features Statistical analysis across multiple runs. Support for arbitra

David Peter 7.9k Jun 6, 2021
🌸 A command-line fuzzy finder

fzf is a general-purpose command-line fuzzy finder. It's an interactive Unix filter for command-line that can be used with any list; files, command hi

Junegunn Choi 37.1k Jun 6, 2021
Terminal-based CPU stress and monitoring utility

The Stress Terminal UI: s-tui Stress-Terminal UI, s-tui, monitors CPU temperature, frequency, power and utilization in a graphical way from the termin

Alex Manuskin 2.6k Jun 3, 2021
An RSS/Atom feed reader for text terminals

Newsboat Newsboat is an RSS/Atom feed reader for the text console. It's an actively maintained fork of Newsbeuter. A feed reader pulls updates directl

Newsboat 1.6k Jun 6, 2021
Glances an Eye on your system. A top/htop alternative for GNU/Linux, BSD, Mac OS and Windows operating systems.

Glances - An eye on your system Summary Glances is a cross-platform monitoring tool which aims to present a large amount of monitoring information thr

Nicolas Hennion 18.5k Jun 6, 2021
Command-line program to download videos from YouTube.com and other video sites

youtube-dl - download videos from youtube.com or other video platforms INSTALLATION DESCRIPTION OPTIONS CONFIGURATION OUTPUT TEMPLATE FORMAT SELECTION

youtube-dl 95.9k Jun 5, 2021
The next gen ls command

LSD (LSDeluxe) Table of Contents Description Screenshot Installation Configuration External Configurations Required Optional F.A.Q. Contributors Credi

Pierre Peltier 5.8k Jun 6, 2021
A smarter cd command for your terminal

zoxide A smarter cd command for your terminal zoxide is a blazing fast replacement for your cd command, inspired by z and z.lua. It keeps track of the

Ajeet D'Souza 2.7k Jun 6, 2021
Magnificent app which corrects your previous console command.

The Fuck The Fuck is a magnificent app, inspired by a @liamosaur tweet, that corrects errors in previous console commands. Is The Fuck too slow? Try t

Vladimir Iakovlev 62.3k Jun 5, 2021
🔖 Browser-independent bookmark manager

buku buku in action! Introduction buku is a powerful bookmark manager written in Python3 and SQLite3. When I started writing it, I couldn't find a fle

Piña Colada 4.5k Jun 6, 2021