forked from LiveRamp/mockrdd

RajaShyam/mockrdd

 
 

mockrdd

A Python3 module for testing PySpark code.


The mockrdd.MockRDD class offers behavior similar to pyspark.RDD, with the following extra benefits:

  • Extensive sanity checks to identify invalid inputs
  • More meaningful error messages for debugging issues
  • Straightforward to run within pdb
  • Removes Spark dependencies from development and testing environments
  • No Spark overhead when running through a large test suite

See our blog post "Introducing MockRDD for testing PySpark code" for additional details.

Here's a simple example of using MockRDD in a test.

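from mockrdd import MockRDD

def job(rdd):
    # Double every element, then keep only results greater than 3.
    return rdd.map(lambda x: x * 2).filter(lambda x: x > 3)

# An empty RDD stays empty.
assert job(MockRDD.empty()).collect() == []
# 1 doubles to 2, which the filter drops.
assert job(MockRDD.of(1)).collect() == []
# 2 doubles to 4, which passes the filter.
assert job(MockRDD.of(2)).collect() == [4]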

Conventionally, you'd include a main function that creates an RDD connected to the appropriate sources and sinks. The tests would live in a separate file and use the unittest module to define test cases.
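The layout described above might look roughly like the sketch below. It is only an illustration: the main function, its input_path and output_path parameters, the application name, and the JobTest class are hypothetical names, and the job and test code are shown in one block for brevity even though they would normally live in separate files. The fallback MockRDD class is a stand-in that exists only so the sketch runs where mockrdd is not installed; it mimics just the calls used here and is not the library's implementation.

```python
import unittest

try:
    from mockrdd import MockRDD
except ImportError:
    # Stand-in for environments without mockrdd; NOT the real implementation.
    class MockRDD:
        def __init__(self, items):
            self._items = list(items)

        @classmethod
        def empty(cls):
            return cls([])

        @classmethod
        def of(cls, *items):
            return cls(items)

        def map(self, f):
            return MockRDD(f(x) for x in self._items)

        def filter(self, f):
            return MockRDD(x for x in self._items if f(x))

        def collect(self):
            return list(self._items)


# --- job code (would normally live in its own module) ---

def job(rdd):
    # Double every element, then keep only results greater than 3.
    return rdd.map(lambda x: x * 2).filter(lambda x: x > 3)


def main(input_path, output_path):
    # Hypothetical entry point; paths and app name are assumptions.
    # pyspark is imported here so the tests never need Spark installed.
    from pyspark import SparkContext

    sc = SparkContext(appName="example-job")
    try:
        job(sc.textFile(input_path).map(int)).saveAsTextFile(output_path)
    finally:
        sc.stop()


# --- test code (would normally live in a separate test file) ---

class JobTest(unittest.TestCase):
    def test_empty_input(self):
        self.assertEqual(job(MockRDD.empty()).collect(), [])

    def test_small_value_is_dropped(self):
        self.assertEqual(job(MockRDD.of(1)).collect(), [])

    def test_large_value_is_doubled(self):
        self.assertEqual(job(MockRDD.of(2)).collect(), [4])
```

With the tests in their own file, they can be run with python -m unittest. Because main imports pyspark lazily, the test run never touches Spark.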

See the docstring of mockrdd.MockRDD for more information.
