Call me Ishmael. Dix is a utility for quantifying large amounts of plaintext data using a revolutionary metric: Moby-Dicks.

datascopeanalytics/dix

dix

Call me Ishmael. Dix is a utility for quantifying large amounts of plaintext data using a revolutionary metric: Moby-Dicks.

Motivation

Have you ever found yourself analyzing text data and thinking, "Wow, this data is BIG. This is some BIG DATA"?

Of course you have. And if you're like us, you're frustrated with the current tools and metrics at your disposal. How do you quantify how big your data is? Bits, bytes, and word counts just don't cut it in the fast-moving Data Age.

It's time for a new standard. One that's timeless, yet fully capable of expressing bigness. That's why we created dix.

About

dix is a command-line utility that quantifies the size of plaintext data in relation to Herman Melville's classic novel Moby-Dick; or, The Whale, first published in 1851 and considered one of the Great American Novels.

Moby-Dick is a sizeable book.
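To make the metric concrete, here is a minimal sketch of how a size in Moby-Dicks could be computed. It is not dix's actual implementation: the function name `moby_dicks`, the choice of counting whitespace-separated words, and the reference figure `MOBY_DICK_WORDS` (a rough word count for the novel) are all assumptions made for illustration.

```python
# Hypothetical sketch, not dix's actual code.
# Assumed reference size: a rough word count for Moby-Dick.
MOBY_DICK_WORDS = 210000


def moby_dicks(text, reference_words=MOBY_DICK_WORDS):
    """Return the size of `text` as a fraction of the reference word count."""
    # Count whitespace-separated tokens, similar in spirit to `wc -w`.
    word_count = len(text.split())
    return word_count / float(reference_words)


if __name__ == "__main__":
    sample = "Call me Ishmael. " * 1000  # 3,000 words
    print("%.4f Moby-Dicks" % moby_dicks(sample))
```

Keeping the reference size as a parameter means the same function could measure text against any other yardstick, simply by passing a different word count.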

Installation

Prerequisites: Python 2.6+ and wc (which is included on most *nix OSes)

Run sudo pip install dix to install dix from PyPI (dix needs sudo access to set permissions so you can run it from anywhere).

More installation options coming soon.

Examples

dix is run from the command line on a plaintext file, as follows.

$> dix text.txt

You can also pipe things into dix if desired:

$> echo "for there is no folly of the beast of the earth..." | dix
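Accepting either a filename or piped input is a common pattern for command-line tools. The sketch below shows one way it could be done in Python; the helper `read_text` and its parameters are hypothetical and are not taken from dix's source.

```python
import io
import sys


def read_text(path=None, stream=None):
    """Return text from `path` if given, otherwise from `stream` (default: stdin)."""
    if path is not None:
        # A filename was supplied, as in `dix text.txt`.
        with io.open(path, encoding="utf-8") as handle:
            return handle.read()
    if stream is None:
        # No filename: fall back to whatever was piped in.
        stream = sys.stdin
    return stream.read()


# Simulate piped input with an in-memory stream instead of a real pipe.
piped = io.StringIO(u"for there is no folly of the beast of the earth...")
print("%d words read" % len(read_text(stream=piped).split()))
```

Passing the stream as an argument, rather than reading `sys.stdin` directly, keeps the helper easy to exercise without a real pipe.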

dix also supports a multitude of options. For example, if you feel bad about the size of your data, choose a smaller unit of comparison:

$> dix --tiny text.txt

You can see all the options and how to use them by calling dix -h.

Advanced usage

You can also redirect the output of dix. For example, pipe dix to cowsay for a more pleasing visual experience:

$> curl -s 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvsection=0&titles=Moby-Dick&rvprop=content&format=json' | python -m json.tool | grep "*" | dix | cowsay

 ____________________________________ 
/ 0.0022 Moby-Dicks                  \
|                                    |
\ You call that BIG data?! Please... /
 ------------------------------------ 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

Contribution

We welcome issues and pull requests if you find problems with dix or want to enhance it! You can also reach its creators at [email protected].
