We have to learn some basic programming concepts before doing anything in Machine Learning, Data Science, or any other project.
This notebook assumes that you have some basic grounding in programming, with some idea of variables & data types, arrays, flow control (if | else | elif | while | for | break | continue), importing libraries & functions, and functions (creating functions | passing arguments | returning values).
TIP: Visit Python Tutor Visualizer to visualise the code and understand what happens behind the scenes when each line of Python code runs.
2+2
Commonly used math operators
Order of Precedence
The order of operations (also called precedence) of Python math operators is similar to that of mathematics. The ** operator is evaluated first; the *, /, //, and % operators are evaluated next, from left to right; and the + and - operators are evaluated last (also from left to right). Parentheses can be used to override the usual precedence.
2+3*6
(2 + 3) * 6
2**5
22/7
22//7
22 % 7
Common Data Types
int
0,50,-100
float
1.25, 3.14, -1.0, -0.5, 0.0
str
'a', 'aa', 'aaa', 'Hello!', '11 cats'
bool
True, False
Strings can be enclosed in single or double quotes.
The meaning of an operator (e.g. +) may change based on the data types of the values next to it.
# Addition for int or float
1 + 2
# Concatenation for string data type
'Bruce' + 'Wayne'
If you try to use the + operator on a string and an integer value, Python will not know how to handle this, and it will display an error message (a TypeError).
'Bruce' + 42
The * operator, when used on one string value and one integer value, becomes the string replication operator.
2*5
'Batman'*5
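Variables
Variable names are case sensitive.
We can name a variable anything as long as it obeys the following three rules:
- It can be only one word (no spaces).
- It can use only letters, numbers, and the underscore (_) character.
- It can't begin with a number.
e.g. of invalid variable names:
current-balance, current balance, 4account, total_$um, 'batman'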
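The naming rules can also be checked in code. Below is a minimal sketch, assuming a hypothetical helper named is_valid_variable_name (it is not part of the original notebook). It relies on the built-in str.isidentifier() method and the keyword module; note that isidentifier() is slightly more permissive than the rules above, as it also accepts non-English letters.

```python
import keyword

def is_valid_variable_name(name):
    """Return True if `name` can be used as a Python variable name."""
    # str.isidentifier() checks the character rules; keyword.iskeyword()
    # rejects reserved words such as 'for' or 'class'.
    return name.isidentifier() and not keyword.iskeyword(name)

candidates = ['current_balance', 'current-balance', 'current balance',
              '4account', 'total_$um', 'account4', 'for']
for candidate in candidates:
    print(candidate, '->', is_valid_variable_name(candidate))
```

Reserved words such as for pass the three rules but still cannot be used as variable names, which is why the sketch checks for keywords separately.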
Hello world - First Python program
# This program says hello and asks for my name.
print('Hello world!')
print('What is your name?') # ask for their name
myName = input()
print('It is good to meet you, ' + myName)
print('The length of your name is:')
print(len(myName))
print('What is your age?') # ask for their age
myAge = input()
print (type(myAge))
print('You will be ' + str(int(myAge) + 1) + ' in a year.')
The following line is called a comment. Python ignores comments, and you can use them to write notes or remind yourself what the code is trying to do. Any text for the rest of the line following a hash mark # is part of a comment.
# This program says hello and asks for my name.
To comment across multiple lines, we can use triple quotes (either single ''' or double """). Strictly speaking, this creates a string that is never used, so Python evaluates and discards it and it behaves like a comment.
'''This line is part of a multiline comment
This line is also part of the multiline comment
Again, this line as well is part of the multiline comment'''
print ('Hello')
print() function
The print() function displays the string value inside the parentheses on the screen. Notice that the quotes are not printed to the screen. They just mark where the string begins and ends; they are not part of the string value.
print('Hello world!')
print('What is your name?') # ask for their name
input() function
The input() function waits for the user to type some text on the keyboard and press the ENTER key.
The user input will then be stored in the myName variable.
myName = input()
The expression 'It is good to meet you, ' + myName is evaluated and printed on the screen.
print('It is good to meet you, ' + myName)
len() function
The len() function returns the number of characters in the given string.
print('The length of your name is:')
print(len(myName))
The type() function returns the data type of the value given.
print('What is your age?') # ask for their age
myAge = input()
print (type(myAge)) # the input() function always returns a str value
str(), int(), float() functions
These functions are used to change the data types of the values.
As the input() function returns str data, the myAge variable is of str data type.
To add a numerical value to it, the value should be converted to int.
After performing the addition, the int value is converted back to str again so that it can be concatenated with other strings.
print('You will be ' + str(int(myAge) + 1) + ' in a year.')
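If the user types something that is not a whole number (e.g. ten), int() raises a ValueError and the program stops. The sketch below shows one possible way to guard the conversion; the function name age_next_year and the hint message are assumptions made for illustration, and the try/except construct it uses appears again in the tic-tac-toe program at the end of this notebook.

```python
def age_next_year(age_text):
    """Return the 'next year' message, or a hint if age_text is not a whole number."""
    try:
        age = int(age_text)    # str -> int; raises ValueError for text like 'ten'
    except ValueError:
        return 'Please enter your age as a whole number.'
    # int -> str so that the number can be concatenated with the other strings
    return 'You will be ' + str(age + 1) + ' in a year.'

print(age_next_year('29'))
print(age_next_year('ten'))
```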
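Boolean operators (and, or, not) and comparison operators (<, >, <=, >=, ==, !=)
- Boolean values: True, False
- The boolean operators and, or, not evaluate to True or False
- The comparison operators <, >, <=, >=, ==, != evaluate to True or False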
5 > 6
5 <= 6, 3>5, 3!=5, 3 == 5
5<6 and 3>5, 5<6 or 3>5
not (True), not(False)
if, else, elif
if
If the if condition evaluates to True, the block of statements indented inside it gets executed.
A block is indented with Tab and can be de-indented with Shift+Tab.
Note: There should be at least one line of indented code after the if condition.
age = 5
if age < 10: # age < 10 evaluates to True
    print ('age is less than 10') # if True, this code will get executed
age = 15
if age < 10: # age < 10 evaluates to False
    print ('age is less than 10') # Doesn't get printed as the age variable is more than 10
else
If the if condition evaluates to True, the block of statements indented inside if gets executed.
If it evaluates to False, the block of statements indented inside else gets executed.
age = 18
if age < 18:
    print ('You are not an adult')
else:
    print ('You are an adult')
elif
If the if condition evaluates to True, the block of statements indented inside if gets executed.
Otherwise, the elif condition is evaluated and, if it turns out to be True, the block of statements indented inside elif gets executed.
If both the if and elif conditions evaluate to False, the block of statements indented inside else gets executed.
age = 10
if age > 18:
    print ('You are an adult')
elif age > 12:
    print ('You are a teenager')
else:
    print ('You are neither adult nor teen. Just a kid!')
We can have multiple elif statements after an if statement. In the entire if-elif-else combo, only one of the blocks will be executed. The conditions are checked from top to bottom, and the block of the first condition that evaluates to True is the one that runs.
age = 15
if age > 60:
    print ('You are very old')
elif age > 18:
    print ('You are an adult')
elif age > 12:
    print ('You are a teenager')
else:
    print ('You are neither adult nor teen. Just a kid!')
Nested if
We can have if-else inside another if-else and we can keep nesting them. The deeper the nest gets, the harder it will be to debug and trace.
user_name = input('Enter user name: ') # james bond
pass_word = input('Enter pass word: ') # 007
if user_name == 'james bond':
    if pass_word == '007':
        print ('Access granted')
    else:
        print ('Access denied. Wrong password')
else:
    print ('Access denied. Wrong username')
The above code could be rewritten as follows using boolean and comparison operators, making it easier to read and follow.
user_name = input('Enter user name: ') # james bond
pass_word = input('Enter pass word: ') # 007
if user_name == 'james bond' and pass_word == '007':
    print ('Access granted')
else:
    print ('Access denied. Wrong login details')
while loop
The while loop keeps on running as long as the condition evaluates to True.
i = 10
print ('Loop start')
while i>0: # the while loop keeps running as long as i>0
    print (i)
    i = i - 1 # every time the loop runs, the value of `i` decreases by 1
print ('Loop end') # This line of code is outside the loop. Will run only when the while loop ends
What happens if you run while False: and while True:?
for loop
The for loop runs for a fixed number of iterations.
range(10) returns a sequence of numbers from 0 to 9.
Each time the for loop runs, the iterator variable i takes the next available value from the sequence.
for i in range(10):
    print (i)
Apart from the range() function, we can use many other data types in a for loop.
The example below creates a list type containing 4 names and uses a for loop to read through all the names in the list, one at a time.
Note: We will see more about list data types later.
for name in ['Batman', 'Superman', 'Ironman', 'Thanos']:
    print (name)
The same works with the string data type.
Note: We will see more about string data types later.
for letter in 'Bruce Wayne':
    print (letter)
break
break is used to exit the loop (while or for) that the program is currently running, ahead of time, based on certain conditions.
print ('Loop start')
for letter in 'Bruce Wayne':
    print (letter)
    if letter == 'e':
        print ('Break the loop')
        break # this command breaks the for loop (takes the program outside the loop)
print ('Loop end')
continue
continue is used to skip the rest of the code inside the loop (for or while) and move on to the next iteration, based on certain conditions.
The program below skips printing the vowels and prints the rest of the characters.
print ('Loop start')
for letter in 'Bruce Wayne':
    if letter == 'a' or letter =='e' or letter =='i' or letter =='o' or letter =='u':
        continue # this command skips lines inside the for loop below this line and starts the next iteration
    print (letter)
print ('Loop end')
in
The in operator can be used to check whether a value is inside another, and it evaluates to a bool (True, False).
Note: We will see more about in, str & list data types later.
'a' in 'James'
'b' in 'James', 'Ja' in 'James', 'James' in 'James', 'JameS' in 'James'
print ('Loop start')
for letter in 'Bruce Wayne':
    if letter in 'aeiou':
        continue # this command skips lines inside the for loop below this line and starts the next iteration
    print (letter)
print ('Loop end')
import
Python comes with built-in functions such as print(), input(), len(), str(), int(), float().
Python also comes with a set of modules called the standard library. For example, the math module contains mathematics-related functions and the random module contains random-number-related functions.
Functions are often used as "black boxes". Often, all we need to know about a function is its inputs (parameter values), its outputs, and what it does in general. We don't have to understand the code inside each and every function.
random library
import random # Before using a function from the standard library, we have to import its module
print (random.randint(1,10)) # Randomly generates an integer from 1 to 10 (both inclusive)
print (random.randint(1,10)) # Randomly generates an integer from 1 to 10 (both inclusive)
math library
import math
math.sqrt(2)
math.pi
def
A function is like a mini program inside a program.
Function without arguments and return
# create a function named 'welcome'. It neither takes any input nor returns any output
def welcome():
    print ('hi')
    print ('mi casa es tu casa')
# invoking/calling a function
welcome()
welcome()
welcome()
arguments
def hello(name):
    print ('Hello ' + name + '!')
    print ('Bienvenido a casa')
hello('Scare Crow')
return
len() - returns the length of the argument passed inside
len('Python')
>>> 6
def greet(lan):
    if lan == 'english':
        return 'Welcome'
    elif lan == 'spanish':
        return 'Bienvenido'
    elif lan == 'french':
        return 'Bonjour'
    else:
        return 'Welcome'
greet('english')
greet('french')
Local scope - Variables inside the function exist in local scope
Global scope - Variables that are assigned outside all functions exist in global scope
A variable lives in exactly one scope, either local or global; it cannot be in both (although a local and a global variable may share the same name).
Note: We will see more about local and global scope later.
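A minimal sketch of the two scopes is shown below. The names counter, show_local and show_global are made up for this illustration.

```python
counter = 10              # assigned outside all functions -> global scope

def show_local():
    counter = 99          # creates a new local variable; the global one is untouched
    return counter

def show_global():
    return counter        # no local assignment here, so the global value is read

print(show_local())
print(show_global())
print(counter)            # the global variable still holds its original value
```

Assigning to counter inside show_local() does not change the global counter, because the assignment creates a separate variable in the function's local scope.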
Lists
We have already seen the bool, int, float, str built-in data types. list is my personal favorite data type in Python. Two other cool built-in data types that we will explore in further sections are tuple and dict.
list is very helpful for handling large amounts of data and hierarchical structures.
str and tuple have a lot of list-like features, which will also be covered in further sections below.
A list can contain another list and many other different data types inside it.
A list begins with [ and ends with ].
The values inside a list are called items.
animals = ['cat', 'bat', 'rat', 'cow', 'dog']
animals
Getting individual values in a List with indices
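The index number can only be an integer, and indexing starts at 0.
To ponder: What happens when you try to access an element with an index number that doesn't exist in the list? 🤔
animals[0]
animals[1]
animals[1.0] # raises TypeError as the index is not an integer
animals[-1] # Negative indices count from the end of the list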
animals = ['cat', 'bat', 'rat', 'cow', 'dog']
animals
animals[1] = 'horse' # Override the item in index 1 of list
print (animals)
num = [[1,2,3,4,5],[6,7,8,9,10]]
num[0]
num[0][0]
num[1][0]
Syntax: list_name[start:stop(exclusive):jump]
Default values
start = 0
stop = length of the list (so the last item is included)
jump = 1
animals = ['cat', 'bat', 'rat', 'cow', 'dog', 'rat']
animals[0:2] # Returns list items from index = 0 to index = 2-1
animals[0:5:2]
animals[0:5:1]
animals[::]
animals[::-1] # Traverse in reverse direction
for loop with list
animals = ['cat', 'bat', 'rat', 'cow', 'dog', 'rat']
for animal in animals:
    print (animal)
len()
animals = ['cat', 'bat', 'rat', 'cow', 'dog', 'rat']
len(animals)
empty list
We can create an empty list using the [] notation or the list() function.
# empty list
animals = [] # or try # list()
print ('Data type =', type(animals))
print ('No of elements =', len(animals))
animals = ['cat', 'rat', 'cow', 'dog', 'rat']
birds = ['crow', 'pigeon', 'sparrow']
animals + birds # Joins the 2 list and returns a new list
del
animals = ['cat', 'rat', 'cow', 'dog', 'rat']
del animals[2] # delete the list item at index 2
animals
in & not in operators
animals = ['cat', 'rat', 'cow', 'dog', 'rat']
print ('cat' in animals)
print ('crow' not in animals)
list methods
append()
animals = ['cat', 'rat', 'cow', 'dog', 'rat']
animals.append('Gorilla')
print (animals)
insert()
animals = ['cat', 'rat', 'cow', 'dog', 'rat']
animals.insert(2, 'Gorilla')
print (animals)
remove()
animals = ['cat', 'rat', 'cow', 'dog', 'rat']
animals.remove('rat') # removes only the first matching item
print (animals)
index()
If an item appears more than once in the list, the index() method returns its smallest/first position.
animals = ['cat', 'rat', 'cow', 'dog', 'rat']
print (animals.index('rat'))
reverse()
animals = ['cat', 'rat', 'cow', 'dog', 'rat']
animals.reverse()
print (animals)
sort()
animals = ['cat', 'rat', 'cow', 'dog', 'rat']
animals.sort()
print (animals)
animals.sort(reverse=True)
print (animals)
count()
animals = ['cat', 'rat', 'cow', 'dog', 'rat']
print (animals.count('rat'))
For more details on list, visit the Python documentation or Python List Methods.
Tuples
tuple data type
The tuple data type is almost identical to the list data type, except in 2 ways:
- Tuples are identified by parentheses ( and ), whereas lists are identified by square brackets [ and ].
- Tuples are immutable: their items cannot be changed, added, or removed after creation.
If you have only one value in the tuple, indicate that by adding a trailing comma after the value inside the parentheses.
('hello',) is a tuple with only one value.
tuple_1 = ('easter_eggs', 1, 3.14)
print (tuple_1)
type(tuple_1)
tuple_1[0]
tuple_1[2]
tuple_1[2] = 'Cant change the tuple item value as they are immutable' # raises TypeError
Convert list to tuple
animals_list = ['cat', 'rat', 'cow', 'dog', 'rat']
animals_tuple = tuple(animals_list)
animals_tuple
type(animals_list), type(animals_tuple)
Convert tuple to list
animals = ('cat', 'rat', 'cow', 'dog', 'rat')
print (list(animals))
Creating a tuple is faster than creating a list
import timeit
print ('List execution time :', timeit.timeit('[1,2,3,1,7,1,2,3,1,7]', number=100000))
print ('Tuple execution time:',timeit.timeit('(1,2,3,1,7,1,2,3,1,7)', number=100000))
for loop with tuple
animals = ('cat', 'rat', 'cow', 'dog', 'rat')
for animal in animals:
    print (animal)
len()
animals = ('cat', 'rat', 'cow', 'dog', 'rat')
print (len(animals))
empty tuple
We can create an empty tuple using the () notation or the tuple() function.
# empty tuple
animals = () # or try # tuple()
print ('Data type =', type(animals))
print ('No of elements =', len(animals))
in & not in operators
animals = ('cat', 'rat', 'cow', 'dog', 'rat')
print ('cat' not in animals)
tuple assignment
a = 1
b = 2
a,b = b,a # swap a,b values using tuple assignment
print ('a = ' + str(a))
print ('b = ' + str(b))
name,height, weight = ('Christian Bale', 1.86, 90)
print (name)
print (height)
print (weight)
Dictionaries
We have already seen the bool, int, float, list, tuple built-in data types. dict (dictionary) is my second favorite data type in Python, next to list.
dict provides a flexible way to access and organize data.
Dictionaries are identified by curly brackets { and } with key:value format inside, whereas lists are identified by [ and ] and tuples by ( and ).
animal_sounds = {'cat': 'meow',
'dog': 'lol',
'cow': 'moo',
'ducks': 'quack'
}
print (animal_sounds)
animal_sounds['cow']
To ponder: What happens when you try to access an element with a key that doesn't exist in the dictionary? 🤔
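list vs dict
In the list data type, we access its elements using the index number that starts from 0.
In the dict data type, we access the items/values using keys.
The indices of a dictionary are called keys.
A key together with its associated value is called a key-value pair.
Unlike list indices, keys can be of many different data types.
Unlike list items, dict items are not accessed by position (although since Python 3.7 a dict does remember the order in which its keys were inserted). Though dictionaries are not indexed by position, their flexibility allows us to organize data in powerful ways.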
keys(), values(), items()
These 3 dictionary methods return list-like values (view objects) which can be used in loops.
animal_sounds = {'cat': 'meow',
'dog': 'lol',
'cow': 'moo',
'ducks': 'quack'
}
print (animal_sounds)
animal_sounds.keys() # returns a list-like view of the keys
animal_sounds.values() # returns a list-like view of the values
animal_sounds.items() # returns a list-like view of the key-value pairs
for key in animal_sounds.keys():
    print (key, ',', animal_sounds[key])
for key,value in animal_sounds.items():
    print (key, ',', value)
len()
animal_sounds = {'cat': 'meow',
'dog': 'lol',
'cow': 'moo',
'ducks': 'quack'
}
print (len(animal_sounds))
empty dictionary
We can create an empty dictionary using the {} notation or the dict() function.
animal_sounds = {} # or try # dict()
print ('Data type =', type(animal_sounds))
print ('No of elements =', len(animal_sounds))
in & not in operators
data = {'name':'Christian Bale',
'height': 1.86,
'weight': 90}
print (data)
90 in data.keys(), 90 in data.values()
get()
The get() method takes 2 arguments:
- the key of the value to retrieve
- a fallback value to return if the key doesn't exist
data = {'name':'Christian Bale',
'height': 1.86,
'weight': 90}
print (data)
data['age'] # raises KeyError: without the get() method, there's no fallback value
data.get('age', 30) # As the dictionary doesn't have the age key, it returns the fallback value provided
data.get('name', 'Unknown')
Strings
We have already seen the bool, int, float, list, tuple & dict built-in data types. str (string) is my third favorite data type in Python, next to list & dict.
Text such as emails, documents, and scraped web pages is one of the most common forms of data our programs need to handle. With Python, we can do a lot of cool stuff with strings.
str data type
The str data type starts and ends with quotes (could be single or double).
The str data type is immutable.
str1 = 'hi welcome to my home'
str2 = "hola bienvenido a mi casa"
print (str1, type(str1))
print (str2, type(str2))
len()
str1 = 'hi welcome to my home'
len(str1)
empty string
We can create an empty string using the '' notation or the str() function.
empty_string = '' # No characters inside empty string # or try # str()
print (len(empty_string))
str index
batman = 'Bruce Wayne!'
print (batman)
Letter | B | r | u | c | e | (space) | W | a | y | n | e | ! |
Index no | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
Index no (Reverse) | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 |
The space and the exclamation point are part of the string, and each is counted as one character.
batman[0]
batman[-12]
batman[5]
As the string data type is immutable, an individual index position cannot be assigned a new value. In order to do that, the entire string should be overwritten.
batman[5] = '-' # raises TypeError as str is immutable
batman = 'Bruce-Wayne!' # Overwriting the string with a new value
print (batman)
str slicing
str slicing is very similar to slicing a list data type.
batman = 'Bruce Wayne!'
print (batman[1:5])
str slicing returns an empty string when there is no element to slice in the given range.
print ('-' + batman[12:15] + '-') # returns an empty string
in & not in operators
batman = 'Bruce Wayne!'
print ('Bruce' in batman)
print ('bruce' in batman) # Python is case-sensitive
print ('Wayne' not in batman)
isXXX() methods
There are several string methods that have names beginning with the word is. These methods return a bool value that describes the nature of the string. Here are some common isX string methods:
isupper() returns True if the string has at least one letter and all the letters are uppercase.
islower() returns True if the string has at least one letter and all the letters are lowercase.
isalpha() returns True if the string consists only of letters and is not blank.
isalnum() returns True if the string consists only of letters and numbers and is not blank.
isdecimal() returns True if the string consists only of decimal digit characters and is not blank.
isspace() returns True if the string consists only of spaces, tabs, and new-lines and is not blank.
istitle() returns True if the string consists only of words that begin with an uppercase letter followed by only lowercase letters.
WHY WE NEED IT? The isX string methods are helpful when you need to validate user input & to make the code smarter and more generic/foolproof.
spam = 'Hello world!'
spam.islower()
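As an illustration of input validation with the isX methods, here is a small sketch. The function name validate_age and the upper limit of 120 are assumptions made for this example, not part of the original notebook.

```python
def validate_age(text):
    """Return the age as an int if `text` looks like a valid age, else None."""
    text = text.strip()          # ignore spaces around the typed value
    if not text.isdecimal():     # rejects '', 'ten', '-5' and '3.5'
        return None
    age = int(text)              # safe now: text contains only digits
    if age > 120:                # hypothetical sanity limit
        return None
    return age

for typed in ['42', ' 7 ', 'ten', '-5', '3.5', '']:
    print(repr(typed), '->', validate_age(typed))
```

Checking with isdecimal() before calling int() means the conversion never runs on text that would make it fail.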
Set
We have already seen the bool, int, float, str, list, tuple & dict built-in data types.
A set, unlike lists or tuples, cannot contain duplicate values.
The values stored in a set must be hashable, which in practice means immutable values (e.g. str or tuple).
Sets are identified with curly (flower) brackets { and }, similar to a dictionary. However, they are different from a dictionary as a set doesn't have key-value pairs.
set data type
dataScientist = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
print (dataScientist)
Take note that unlike list & tuple, the set values are not ordered. The order in which the set values are created may differ from the order in which they are stored/printed.
len()
dataScientist = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
len(dataScientist)
Empty set
Unlike list, dict, tuples, there is no literal notation for an empty set. We have to explicitly write it as set(), because {} creates an empty dictionary.
dataScientist_dict = {} # Creates an empty dictionary
print ('Data type =', type(dataScientist_dict))
print('No of elements =', len(dataScientist_dict))
dataScientist_set = set() # Creates an empty set
print ('Data type =', type(dataScientist_set))
print('No of elements =', len(dataScientist_set))
Convert list to sets
dataEngineer = set(['Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'])
print (dataEngineer)
Convert sets to list
dataEngineer = {'Scala', 'Python', 'Hadoop', 'Java', 'Git', 'SQL'}
dataEngineer_list = list(dataEngineer)
print (dataEngineer_list)
Convert sets to Ordered list
dataEngineer = {'Scala', 'Python', 'Hadoop', 'Java', 'Git', 'SQL'}
print (sorted(dataEngineer)) # Sort in alphabetical order (A-Z)
sorted(dataEngineer, reverse=True) # Sort in reverse alphabetical order (Z-A)
Remove duplicates from list
num = [1,1,1,1,2,2,2,3,4,4,5,5,5]
print ('Before :', num)
print ('After :', list(set(num))) # Convert the value to set and back to list again (duplicates are removed)
Add values to set
dataScientist = {'Python', 'R', 'SQL', 'Git', 'Tableau'}
dataScientist.add('SAS')
print (dataScientist)
Remove values from set
There are 2 ways to remove values from a set:
- remove() method - The shortfall of using this is that if we try to remove a value that is not in the set, we will get a KeyError.
- discard() method - This will not throw an error if the value is not found. This is comparable to the get() method in dict, which also avoids a KeyError.
dataScientist = {'Python', 'R', 'SQL', 'Git', 'Tableau'}
dataScientist.remove('Python')
print(dataScientist)
dataScientist.remove('Java') # Throws KeyError as the value is not inside the set
dataScientist.discard('Java') # No error even though the value is not inside the set
Set operations
A common use of sets in Python is computing standard math operations such as union, intersection, difference, and symmetric difference.
set.union(A,B) - All values that are members of A or B or both
set.intersection(A,B) - All values that are members of both A and B
set.difference(A,B) - All values of A that are not in B
set.symmetric_difference(A,B) - All values which are in one of the sets, but not in both; equivalent to set.union(A,B) - set.intersection(A,B)
dataScientist = set(['Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'])
dataEngineer = set(['Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'])
# For visualization purpose - 2 Circle VENN diagrams
import matplotlib.pyplot as plt
from matplotlib_venn import venn2, venn2_circles
plt.figure(figsize=(7,7))
subset_10 = '\n\n'.join(list(set.difference(dataScientist, dataEngineer)))
subset_01 = '\n\n'.join(list(set.difference(dataEngineer,dataScientist)))
subset_11 = '\n\n'.join(list(set.intersection(dataScientist, dataEngineer)))
v = venn2(subsets={'10': 3, '01': 3, '11': 3}, set_labels = ('dataScientist', 'dataEngineer'))
c = venn2_circles(subsets=(1, 1, 1), linestyle='solid', linewidth=1.0)
v.get_label_by_id('10').set_text(subset_10);v.get_label_by_id('10').set_fontsize(12)
v.get_label_by_id('01').set_text(subset_01);v.get_label_by_id('01').set_fontsize(12)
v.get_label_by_id('11').set_text(subset_11);v.get_label_by_id('11').set_fontsize(12)
plt.show()
set.union(dataScientist, dataEngineer)
set.intersection(dataScientist, dataEngineer)
set.difference(dataScientist, dataEngineer)
set.symmetric_difference(dataScientist, dataEngineer)
set.union(dataScientist, dataEngineer) - set.intersection(dataScientist, dataEngineer)
subset = {'R', 'SAS'}
set.issubset(subset, dataScientist), set.issubset(subset, dataEngineer)
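The same operations can also be written with operators, which many people find shorter to read. The sketch below reuses the two sets from above; the result variable names (union, common, only_scientist, either_not_both) are chosen just for this illustration.

```python
dataScientist = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS'}
dataEngineer = {'Python', 'Java', 'Scala', 'Git', 'SQL', 'Hadoop'}

union = dataScientist | dataEngineer             # same as set.union(A, B)
common = dataScientist & dataEngineer            # same as set.intersection(A, B)
only_scientist = dataScientist - dataEngineer    # same as set.difference(A, B)
either_not_both = dataScientist ^ dataEngineer   # same as set.symmetric_difference(A, B)

# sorted() is used only to print the values in a predictable order
print(sorted(union))
print(sorted(common))
print(sorted(only_scientist))
print(sorted(either_not_both))
print({'R', 'SAS'} <= dataScientist)             # same as set.issubset(subset, A)
```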
in and not in operators
in <=> element ∈ set
not in <=> element ∉ set
possibleSet = {'Python', 'R', 'SQL', 'Git', 'Tableau', 'SAS', 'Java', 'Spark', 'Scala'}
# Membership test
'Python' in possibleSet, 'C++' not in possibleSet
To download all the files used in the Python Refresher project, visit link here
.txt files
open()
Python has a built-in function open() with which we can open a file and read its contents.
The function returns a file handle if the requested file exists & we have proper permission to read the file.
file handle - A variable used to manipulate the file. Equivalent to File > Open in Microsoft Word.
Note: The handle doesn't hold all the file data. It simply establishes a connection to the file.
f_name = 'files/read-file-example.txt' # Forward slashes in the path work on Windows, macOS & Linux
f_handle = open(f_name)
print (f_handle)
file handle
In its simplest form, a file handle for reading data works sort of like a pipe: the operating system puts data (usually strings) in one end, and we extract data out of the other end, usually one line at a time.
We can use a for loop to read the contents of the file line by line.
for line in f_handle:
    print (line)
Why doesn't open() read the entire file?
Some files might be quite large with many GBs of data. open() just creates a handle to read the file contents. Then, the for loop actually reads the data from the file. Python takes care of splitting the data in the file into separate lines using the newline character \n. Because the for loop reads the data one line at a time, it can efficiently read and count the lines in very large files without running out of main memory (RAM) to store the data.
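Counting lines this way can be sketched as follows. To keep the example self-contained it first writes a small throwaway file using the standard tempfile module; the file contents and the variable names sample_path and line_count are assumptions made for illustration. The with statement used here closes the file automatically when its block ends.

```python
import os
import tempfile

# Create a small throwaway file so that the example can run anywhere.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first line\nsecond line\nthird line\n')
    sample_path = tmp.name

line_count = 0
with open(sample_path) as f_handle:
    for line in f_handle:     # only one line is held in memory at a time
        line_count += 1

print('Number of lines:', line_count)
os.remove(sample_path)        # clean up the throwaway file
```

The same loop works unchanged on a file with many GBs of data, because the whole file is never loaded at once.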
readline() - To read the text file one line at a time
f_name = 'files/read-file-example.txt'
f_handle = open(f_name)
print (f_handle.readline())
print (f_handle.readline())
readlines() - To read all the lines in the text file into a list
f_name = 'files/read-file-example.txt'
f_handle = open(f_name)
print (f_handle.readlines())
read() entire file
If we know the file is relatively small compared to the size of our main memory, we can read the whole file into one string using the read() method on the file handle.
f_name = 'files/read-file-example.txt'
f_handle = open(f_name)
file_content = f_handle.read()
print (file_content)
write() a line into file
line1 = 'This is Line 1.\n'
To write a file, we have to open it with mode 'w' as the second parameter.
f_name = 'files/write-file-example-1.txt'
fhandle = open(f_name, 'w')
The write() method of the file handle object puts data into the file, returning the number of characters written.
fhandle.write(line1) # Writes the line into the text file and returns the no of characters written
close() the file
When we are done writing, we have to explicitly close the file to make sure that the last bit of data is physically written from the buffer to the disk, so it will not be lost if the power goes off.
fhandle.close()
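It is easy to forget the close() call. A common alternative is the with statement, which closes the file automatically when its block ends, even if an error occurs inside it. The sketch below writes into a temporary directory (an assumption made so the example does not touch the files/ folder used in this notebook).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # hypothetical throwaway path inside a temporary directory
    f_name = os.path.join(tmp_dir, 'write-file-example.txt')

    with open(f_name, 'w') as fhandle:
        chars_written = fhandle.write('This is Line 1.\n')
    # Leaving the with block has closed the file and flushed the buffer to disk.

    with open(f_name) as fhandle_in:
        content = fhandle_in.read()

print(chars_written)      # number of characters written
print(fhandle.closed)     # the handle was closed without calling close()
print(content)
```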
Write multiple lines into a file using a for loop and write()
line1 = 'This is Line 1.\n'
line2 = 'This is Line 2.\n'
line3 = 'This is Line 3 (last line).\n'
lines = [line1, line2, line3]
print (lines)
f_name = 'files/write-file-example-2.txt'
fhandle = open(f_name, 'w')
for line in lines:
    fhandle.write(line)
fhandle.close()
writelines()
Write multiple lines into a file
f_name = 'files/write-file-example-2.txt'
fhandle = open(f_name, 'w')
fhandle.writelines(lines)
fhandle.close()
open() Mode
Including a mode argument is optional because a default value of 'r' will be assumed if it is omitted. The 'r' value stands for read mode, which is just one of many.
The modes are:
========= ===============================================================
Character Meaning
--------- ---------------------------------------------------------------
'r'       open for reading (default)
'w'       open for writing, truncating the file first
'x'       create a new file and open it for writing
'a'       open for writing, appending to the end of the file if it exists
'b'       binary mode
't'       text mode (default)
'+'       open a disk file for updating (reading and writing)
'U'       universal newline mode (deprecated)
========= ===============================================================
To append to a file
'a' mode doesn't overwrite the existing contents of the file.
f_name = 'files/write-file-example-2.txt'
fhandle = open(f_name, 'a')
fhandle.writelines(lines)
fhandle.close()
What is a .csv file?
Python csv module
header = ['Hero Name', 'symbol', 'height', 'weight']
data = [['Batman', 'bat', 1.82, 85],
['Ironman', 'S', 1.78, 300],
['Spiderman', 'Spider', 1.62, 70]]
csv.writer()
import csv
f_name = 'files/write-csv-example-1.csv' # Forward slashes in the path work on Windows, macOS & Linux
f_handle = open(f_name, 'w', newline='')
csv_writer = csv.writer(f_handle)
csv_writer.writerow(header)
for line in data:
    csv_writer.writerow(line)
f_handle.close()
csv.reader()
import csv
f_name = 'files/write-csv-example-1.csv' # Forward slashes in the path work on Windows, macOS & Linux
f_handle = open(f_name, 'r', newline='')
csv_reader = csv.reader(f_handle)
for line in csv_reader:
    print (line)
f_handle.close()
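csv.reader() returns each row as a list, so the columns have to be picked by position. The csv module also provides csv.DictReader, which uses the header row as keys. The sketch below is only an illustration: it reads from an in-memory string (via io.StringIO) instead of the file written above, and the sample rows are copied from the data list.

```python
import csv
import io

# In-memory stand-in for the csv file, so the example runs without any files
csv_text = ('Hero Name,symbol,height,weight\n'
            'Batman,bat,1.82,85\n'
            'Spiderman,Spider,1.62,70\n')

csv_reader = csv.DictReader(io.StringIO(csv_text))
heroes = list(csv_reader)     # each row is a dict keyed by the header names

for hero in heroes:
    # every value is read back as str, so numbers must be converted explicitly
    print(hero['Hero Name'], float(hero['height']))
```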
zip()
x = [1,2,3]
y = [4,5,6]
z = [7,8,9]
zipped = zip(x,y,z)
print (zipped, type(zipped))
for i in zipped:
    print (i)
unzip a list
zip() in conjunction with the * operator can be used to unzip a list as shown below
zipped = zip(x,y,z)
l1, l2, l3 = zip(*zipped)
print (l1, l2, l3)
If the lengths of the iterables are not equal, zip stops at the shortest iterable and ignores the extra elements of the longer ones.
names = ['Clark Kent', 'Bruce Wayne', 'Peter Parker', 'Tony Stark']
special_names = ['Super Man', 'Bat Man', 'Spider Man']
color = ['Blue', 'Black']
zipped = zip(names,special_names,color)
for i in zipped:
    print (i) # As color is the shortest iterable, zip stops with 2 iterations
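Two more things are handy to know about zip(). In Python 3 it returns an iterator that can be read only once, and pairing it with dict() is a quick way to build a lookup table from two lists. The names hero_lookup, first_pass and second_pass below are made up for this sketch.

```python
names = ['Clark Kent', 'Bruce Wayne', 'Peter Parker']
special_names = ['Super Man', 'Bat Man', 'Spider Man']

# Build a dictionary: each name becomes a key, each special name its value
hero_lookup = dict(zip(names, special_names))
print(hero_lookup['Bruce Wayne'])

# A zip object is an iterator: once it has been read, it is empty
zipped = zip(names, special_names)
first_pass = list(zipped)
second_pass = list(zipped)
print(len(first_pass), len(second_pass))
```

This is why the unzip example recreates zipped before using it a second time.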
email parsing & scraping
We will work on the mbox.txt file, which holds a collection of publicly available email messages that are concatenated and stored as plain text in a single file. For further details on mbox, refer to https://en.wikipedia.org/wiki/Mbox.
We are going to parse and scrape through the txt file to extract information about senders, their email addresses, the number of emails sent by each sender, and the time at which the emails were sent.
To do this, we will need a basic knowledge of for loop, string slicing, splitting & searching, list, dict.
f_name = 'files/mbox.txt' # Forward slashes in the path work on Windows, macOS & Linux
f_handle = open(f_name)
email_list = []
for row in f_handle:
    # Extract the rows that start with the word 'From ' and store them in a list
    if row.startswith('From '):
        email = row.strip()
        email_list.append(email)
f_handle.close()
Find the number of times an email was received from each sender
email_dict = {}
for row in email_list:
    email = row.split(' ')[1] # the sender's address is the second word in the row
    email_dict[email] = email_dict.get(email, 0) + 1
email_dict
email_dict.values()
Finding the email id that sent the max number of emails
max_email = max(email_dict.values())
for email, count in email_dict.items():
    if count == max_email:
        print (email, count)
        break
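The same result can be obtained in one step with the key argument of max(). The sketch below uses a small made-up dictionary (the addresses and counts are hypothetical, not taken from mbox.txt) so that it can run on its own.

```python
# Hypothetical counts, standing in for the email_dict built from mbox.txt
email_dict = {'alice@example.com': 3,
              'bob@example.com': 7,
              'carol@example.com': 5}

# max() walks over the keys and compares them by their counts
top_sender = max(email_dict, key=email_dict.get)
print(top_sender, email_dict[top_sender])
```

Like the loop with break, this reports only one sender when several are tied for the highest count.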
Find the total number of emails received on an hourly basis
email_hrs = {}
for row in email_list:
    hr = int(row.split()[5].split(':')[0]) # the 6th word is the time (hh:mm:ss); keep the hour
    email_hrs[hr] = email_hrs.get(hr, 0) + 1
email_hrs
import matplotlib.pyplot as plt
x = list(email_hrs.keys())
height = list(email_hrs.values())
plt.bar(x, height )
plt.title('Emails received in each hour')
plt.xlabel('Hour of the day'); plt.ylabel('No of emails received')
plt.show()
tic-tac-toe
# -*- coding: utf-8 -*-
"""
Created on Thu Jun 14 21:20:40 2018
@author: Prasanth
"""
# Create an empty tic tac toe board
theBoard = {1: ' ', 2: ' ', 3: ' ',
4: ' ', 5: ' ', 6: ' ',
7: ' ', 8: ' ', 9: ' '}
# List of winning combinations
winningCombos = [[1,2,3],
[4,5,6],
[7,8,9],
[1,4,7],
[2,5,8],
[3,6,9],
[1,5,9],
[3,5,7]]
def printBoard(board):
    print (board[1] + '|' + board[2] + '|' + board[3] + '\n' +
           '-+-+-\n' +
           board[4] + '|' + board[5] + '|' + board[6] + '\n' +
           '-+-+-\n' +
           board[7] + '|' + board[8] + '|' + board[9])
def resetBoard():
    theBoard = {1: ' ', 2: ' ', 3: ' ',
                4: ' ', 5: ' ', 6: ' ',
                7: ' ', 8: ' ', 9: ' '}
    return theBoard
printBoard(theBoard)
player = 'X'
winFlag = 0
counter = 0
while True:
    try:
        user_input = input('Turn for ' + player + ". Move on which space? ")
        user_input = int(user_input)
        # Reject spaces that are outside 1-9 or already taken
        if theBoard.get(user_input,'') != ' ':
            print ('Invalid input : Try again')
            continue
    except ValueError:
        print ('Invalid input : Try again')
        continue
    except:
        quit()
        break
    theBoard[user_input] = player
    printBoard(theBoard)
    # If the player wins, inform
    for combo in winningCombos:
        if theBoard[combo[0]] == theBoard[combo[1]] == theBoard[combo[2]] == player:
            print ("player " + player + " wins")
            winFlag = 1
            break
    counter += 1
    if winFlag == 1 or counter == 9:
        user_input = input('Game over. Do you want to continue (y or n): ')
        if user_input == 'n' :
            break
        else:
            winFlag = 0
            counter = 0
            theBoard = resetBoard()
            print ('\n'*3)
            printBoard(theBoard)
    # toggle player
    if player == 'X':
        player = 'O'
    else:
        player = 'X'
Encryption & Decryption
Will be added soon
import sys
print ("Python version: {}".format(sys.version))
import numpy as np
print ("Numpy version: {}".format(np.__version__))
import matplotlib
print ("Matplotlib version: {}".format(matplotlib.__version__))
import sklearn
print ("Scikit-learn version: {}".format(sklearn.__version__))
import scipy
print ("Scipy version: {}".format(scipy.__version__))
import pandas as pd
print ("Pandas version: {}".format(pd.__version__))
import mglearn
print ("Mglearn version: {}".format(mglearn.__version__))
import graphviz
print ("Graphviz version: {}".format(graphviz.__version__))
import tensorflow as tf
print ("Tensorflow version: {}".format(tf.__version__))