Generate Millions of Records in Seconds – Faker Module in Python

Introduction

Generating large volumes of data is often tedious and time-consuming, especially when you need dummy data to test an application, fill database tables, run machine learning algorithms without a proper dataset, or performance-test an application. This tutorial shows an effective way to generate a large amount of data in seconds with a Python library called 'Faker'.

Installing the Faker Module

The Faker module can be installed using pip:

pip install Faker

Alternatively, in an Anaconda environment, you can install it using:

 conda install -c conda-forge faker 

Once the Faker module is installed, import it into your Python code.

Working with Faker

Generating Names

Use Faker() to initialize the fake data generator; thereafter you can generate data by calling the method that corresponds to the kind of data you want.


Each call to faker.name() generates a new name, so calling it in a loop produces a different name on every iteration. In the same way, we can generate male and female names, first names, and last names separately, as below.

 from faker import Faker
 faker = Faker()
 print(f'Name: {faker.name()}')
 print(f'First name: {faker.first_name()}')
 print(f'Last name: {faker.last_name()}')
 print(f'Male name: {faker.name_male()}')
 print(f'Female name: {faker.name_female()}')

This produces the below output.

Name: Ms. Kimberly Nguyen DVM 
First name: Veronica 
Last name: Walker 
Male name: Eric Gonzalez 
Female name: Pamela Rhodes 

Generate Personal Data

Just like names, personal information such as email, address, and job can also be generated.

 from faker import Faker
 faker = Faker()
 print(f'email: {faker.email()}')
 print(f'Job: {faker.job()}')
 print(f'Phone Number: {faker.phone_number()}')
 print(f'Address: {faker.address()}')

This produces the below output.

email: qsilva@yahoo.com
Job: Trading standards officer
Phone Number: 4344803335
Address: 7145 Stevens Dam Ericside, DC 11673 

Generate Complete User Profile

Faker can generate simple and extended user profiles with a single line of code using the simple_profile() and profile() methods. Both methods return the data as a dictionary, so you can access each value by its key. Furthermore, if you require gender-specific profiles, you can pass 'M' or 'F' as the sex argument (for example, simple_profile(sex='M')) for male and female profiles respectively.

 from faker import Faker
 faker = Faker()
 print("Generating Simple Profile")
 sprof = faker.simple_profile()
 for k, v in sprof.items():
     print(k,":", v)
 print("-------------------\n")
 print("Generating Extended Profile")
 eprof = faker.profile()
 for k, v in eprof.items():
     print(k,":", v)
 print("-------------------\n")
 print("Generating Simple Profile for Male Candidate")
 mprof = faker.simple_profile('M')
 for k, v in mprof.items():
     print(k,":", v)

This produces the below output.

 Generating Simple Profile
 username : james12 
 name : Gary Phillips
 sex : M 
 address : 36883 Richard Points Suite 768 Ortizchester,MI  05275 
 mail : deborahsalinas@yahoo.com 
 birthdate : 1998-09-29 
 ------------------- 

 Generating Extended Profile
 job : Immunologist
 company : Barnes-Aguirre
 ssn : 230-11-9680
 residence : 3115 Watts Fields Port Alexanderton,NJ 90199                  
 blood_group : AB+ 
 website : ['http://nelson.org/']
 username : blairmatthew 
 name : Daniel Palmer 
 sex : M 
 address : Unit 1106 Box 9188 DPO AE 96606 
 mail : dominique96@hotmail.com 
 birthdate : 1984-02-23 
------------------- 

 Generating Simple Profile for Male Candidate
 username : hernandezjocelyn
 name : Dustin Joyce
 sex : M
 address : 62277 Graham Trace Suite 755 Foleyville, OH 18360 
 mail : jeremyhawkins@hotmail.com 
 birthdate : 1930-12-18 

Localization

Faker can generate localized data when a locale is passed as an argument. If no argument is provided, it uses the en_US locale by default. It supports many languages, such as French, Spanish, Japanese, Arabic, German, and Hindi.


Generate Numbers, Date and Time

Faker can generate an integer using random_int() and a single digit using random_digit(). You can optionally pass minimum and maximum values to random_int() to restrict the result to a range. A dummy date and time within the current month can be obtained using date_time_this_month(). Faker also provides dates within the current year, century, and so on. The code snippet below also uses Faker to produce weekly time series data.

from faker import Faker
faker = Faker()
print(f'Random int: {faker.random_int(0, 100)}')
print(f'Random digit: {faker.random_digit()}')
print(f'Date this month: {faker.date_this_month()}')
print(f'Datetime this month: {faker.date_time_this_month()}')
print(f'Datetime this year:{faker.date_time_this_year()}')
print("------------------------------\n")

print ("Generate Time series - Weekly Data")
interval = 60*60*24*7 # Every Week
time_series = faker.time_series(start_date='-30d', end_date='now', precision=interval)
for week in time_series:
    print(week[0])

This produces the below output.

Random int: 66 
Random digit: 3
Date this month: 2019-12-02 
Datetime this month: 2019-12-02 22:49:24 
Datetime this year: 2019-03-27 13:47:27 
------------------------------ 

Generate Time series - Weekly Data 
2019-11-03 00:26:43
2019-11-10 00:26:43 
2019-11-17 00:26:43 
2019-11-24 00:26:43 
2019-12-01 00:26:43 

Generate Network and IP Data

Faker can also generate network-related data. Attributes such as hostnames, domain names, and IPv4, IPv6, and MAC addresses can be generated easily using the corresponding methods. Likewise, it contains methods for generating hash values such as md5, sha1, and sha256, which are handy as placeholder values when testing applications that work with hashes.

 faker = Faker()
 print(f'Host name: {faker.hostname()}')
 print(f'Domain name: {faker.domain_name()}')
 print(f'IPv4: {faker.ipv4()}')
 print(f'IPv6: {faker.ipv6()}')
 print(f'MAC address: {faker.mac_address()}')
 print(f'md5: {faker.md5()}')
 print(f'sha1: {faker.sha1()}')
 print(f'sha256: {faker.sha256()}')

This produces the below output.

 Host name: lt-01.henderson.biz
 Domain name: duarte.com 
 IPv4: 207.176.251.64 
 IPv6: 89b9:3673:df14:7fc8:3cdb:f35e:8498:3c91 
 MAC address: ed:0b:d9:9d:84:fc 
 md5: 4c76255c5a084e94eb05c40915c9a052 
 sha1: a4479b2558d9f0172506d18225885c55fa6ea3be 
 sha256: 85193bfb658cac5547c66935651d69d2ccbba6beb381a3968974fd65f0a55cd9 

Generating Sentences from a List

You can generate random fake sentences using faker.sentence(), which strings together a random selection of words to form a sentence. Faker also lets you choose the words a sentence is built from by passing a list of words to the sentence() method through the ext_word_list argument.

 from faker import Faker
 faker = Faker()
 print(faker.sentence()) 
 word_list = ["word1","word2","word3"]
 for i in range(0, 3): 
     print(faker.sentence(ext_word_list=word_list)) 

This produces the below output. Remember that the generated sentences are purely random, so your output will differ.

Amount strategy say control pass price.
Word2 word1 word1 word1 word3 word1 word3. 
Word2 word2 word2 word3 word1. 
Word1 word3 word3 word3 word2 word3 word2. 

Conclusion

In this tutorial, we have learned how to use the Faker module to generate fake data in Python. You can run the above code in a loop, store the results in files or a database, and use the data for testing your applications. Hope this tutorial was helpful and informative. Happy Learning!!!
