A10: Python Project

Assignment Series #A10 – Journeyman’s Piece

You are given two logfiles in different formats. Both logs carry the same attributes:

  • timestamp (e.g. 2020-07-22 07:22:37.822863)
  • error category (Exception or ERROR)
  • error message (Error1, Error2, …)
  • user (oracle, administrator, …)
  • id (3521, 2294, …)

sample_1.log has a JSON-like pattern and looks like this:

2020-07-22 07:22:37.822863: { "level": "Exception", "message": "Error4", "user": "oracle", "id": "3521" }  
2020-09-22 12:31:44.789319: { "level": "Exception", "message": "Error1", "user": "administrator", "id": "4371" }  
2021-04-06 22:51:10.999642: { "level": "ERROR", "message": "Error3", "user": "azureuser", "id": "2294" }  
2020-07-19 15:45:58.576940: { "level": "Exception", "message": "Error2", "user": "hacker40", "id": "8677" }  
2021-01-23 14:07:18.922480: { "level": "ERROR", "message": "Error5", "user": "hacker40", "id": "1865" }  
2020-08-19 05:47:46.983299: { "level": "Exception", "message": "Error4", "user": "pi", "id": "8993" }  
2021-03-25 13:13:06.012237: { "level": "ERROR", "message": "Error5", "user": "mysql", "id": "3561" }  
2020-05-05 06:37:50.976402: { "level": "ERROR", "message": "Error5", "user": "pi", "id": "1754" }
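Everything after the first ": " (colon followed by a space) in such a line is valid JSON, so one possible way to parse it, sketched here as a suggestion rather than the required approach, is a single split plus json.loads:

```python
import json
from datetime import datetime

def parse_json_line(line):
    """Split the timestamp off the JSON payload and parse both parts.
    The first ': ' (colon followed by a space) occurs only after the
    microseconds, so a single split with maxsplit=1 is safe here."""
    ts_str, payload = line.split(": ", 1)
    record = json.loads(payload)
    record["timestamp"] = datetime.fromisoformat(ts_str)
    return record
```

fromisoformat accepts the "YYYY-MM-DD HH:MM:SS.ffffff" shape directly, so no explicit format string is needed for this file.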

sample_2.log has a more general format and looks like this:

ERROR at 19:30:13 on Thu, 02/07/2020  - Error1 - tracking id is 4126 - user is puppet.  
ERROR at 12:08:30 on Mon, 26/10/2020  - Error1 - tracking id is 5567 - user is ftp.  
Exception at 21:35:12 on Sun, 28/06/2020  - Error5 - tracking id is 8077 - user is vagrant.  
ERROR at 06:36:05 on Sat, 11/07/2020  - Error1 - tracking id is 5218 - user is puppet.  
Exception at 17:40:33 on Fri, 01/01/2021  - Error3 - tracking id is 8252 - user is mysql.  
Exception at 13:49:18 on Wed, 06/05/2020  - Error2 - tracking id is 8369 - user is hacker15.  
Exception at 21:09:05 on Sat, 16/05/2020  - Error1 - tracking id is 5091 - user is adm.  
ERROR at 10:39:13 on Thu, 29/04/2021  - Error2 - tracking id is 4225 - user is oracle.
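This format can be taken apart with plain string operations; the sketch below assumes the field order and the " - " separators seen in the samples above hold for every line:

```python
from datetime import datetime

def parse_plain_line(line):
    """Parse e.g. 'ERROR at 19:30:13 on Thu, 02/07/2020  - Error1 - ...'."""
    head, message, id_part, user_part = line.rstrip(".").split(" - ")
    level, _, when = head.partition(" at ")
    return {
        "level": level,
        "message": message,
        "id": int(id_part.split()[-1]),    # "tracking id is 4126" -> 4126
        "user": user_part.split()[-1],     # "user is puppet" -> "puppet"
        "timestamp": datetime.strptime(when.strip(), "%H:%M:%S on %a, %d/%m/%Y"),
    }
```

The .strip() matters: the double space before each " - " leaves a trailing blank on the date part, and strptime rejects unconverted characters.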

Your goal is to write a logfile generator for each output type and then a Python script that grabs both logfiles, consolidates them into a single format, and sorts the events by date.

Logfile Generator 1 (gen_1.py)

import random
import datetime

levels = ["ERROR", "Exception"]
error_messages = ["Error1", "Error2", "Error3", "Error4", "Error5"]
with open("users.txt") as file:
    users = file.read().splitlines()

def create_log_entry():
    dt = datetime.datetime(2020, 5, 1) + random.random() * datetime.timedelta(days=365)
    level = random.choice(levels)
    message = random.choice(error_messages)

    user = random.choice(users)

    entry_id = random.randint(1000, 10000)  # renamed to avoid shadowing the built-in id()
    log_entry = f'{dt}: {{ "level": "{level}", "message": "{message}", "user": "{user}", "id": "{entry_id}" }}'
    return log_entry

logs = []
for i in range(50):
    logs.append(create_log_entry())
with open("input1.log", "w") as logfile:
    for line in logs:
        logfile.write(line + "\n")

Logfile Generator 2 (gen_2.py)

import random
import datetime

levels = ["ERROR", "Exception"]
error_messages = ["Error1", "Error2", "Error3", "Error4", "Error5"]
with open("users.txt") as file:
    users = file.read().splitlines()

def create_log_entry():
    dt = datetime.datetime(2020, 5, 1) + random.random() * datetime.timedelta(days=365)
    level = random.choice(levels)
    message = random.choice(error_messages)

    user = random.choice(users)

    entry_id = random.randint(1000, 10000)  # renamed to avoid shadowing the built-in id()
    log_entry = f'{level} at {dt:%H:%M:%S on %a, %d/%m/%Y}  - {message} - tracking id is {entry_id} - user is {user}.'

    return log_entry

logs = []
for i in range(50):
    logs.append(create_log_entry())
with open("input2.log", "w") as logfile:
    for line in logs:
        logfile.write(line + "\n")
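Both generators read usernames from a users.txt file that is not part of the assignment text. A minimal stand-in can be created like this; the name list is an assumption, drawn from the usernames that appear in the sample logs:

```python
# Write a minimal users.txt (one username per line) for the generators above.
# The exact names are an assumption taken from the sample output.
users = ["oracle", "administrator", "azureuser", "hacker40", "pi", "mysql",
         "puppet", "ftp", "vagrant", "adm", "root", "user", "ansible", "info"]
with open("users.txt", "w") as f:
    f.write("\n".join(users) + "\n")
```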

Output from Generator 1

2021-01-31 17:17:23.631168: { "level": "ERROR", "message": "Error2", "user": "root", "id": "5219" }
2021-02-03 15:21:35.674267: { "level": "Exception", "message": "Error2", "user": "user", "id": "8611" }
2021-02-09 04:17:00.832823: { "level": "Exception", "message": "Error5", "user": "ansible", "id": "8817" }
2021-03-24 09:58:03.258777: { "level": "Exception", "message": "Error2", "user": "info", "id": "4091" }
2020-12-03 16:08:58.087425: { "level": "Exception", "message": "Error1", "user": "user", "id": "2269" }
2020-06-28 21:23:20.974961: { "level": "Exception", "message": "Error4", "user": "hacker50", "id": "1958" }

Output from Generator 2

Exception at 08:49:17 on Thu, 24/09/2020  - Error1 - tracking id is 2920 - user is user.
ERROR at 19:30:13 on Thu, 02/07/2020  - Error1 - tracking id is 4126 - user is puppet.
ERROR at 12:08:30 on Mon, 26/10/2020  - Error1 - tracking id is 5567 - user is ftp.
Exception at 21:35:12 on Sun, 28/06/2020  - Error5 - tracking id is 8077 - user is vagrant.
ERROR at 06:36:05 on Sat, 11/07/2020  - Error1 - tracking id is 5218 - user is puppet.
Exception at 17:40:33 on Fri, 01/01/2021  - Error3 - tracking id is 8252 - user is mysql.

Consolidation Script (consolidate.py)

import re
from datetime import date

with open("input1.log") as file:
    log = file.read().splitlines()

# Example line:
# 2020-06-24 19:07:54.153862: { "level": "Exception", "message": "Error4", "user": "oracle", "id": "9293" }
# The pattern and the lookup table only need to be built once, outside the loop.
pattern = r'(\d{4}-\d{2}-\d{2}) (\d\d:\d\d:\d\d\.\d+): { "level": "([^"]+)", "message": "([^"]+)", "user": "([^"]+)", "id": "([^"]+)" }'

DAY_OF_WEEK = {0: "Mon", 1: "Tue", 2: "Wed", 3: "Thu", 4: "Fri", 5: "Sat", 6: "Sun"}

for e in log:
    match = re.search(pattern, e)

    date_ = match.group(1)
    time_ = match.group(2)
    level = match.group(3)
    message = match.group(4)
    user = match.group(5)
    entry_id = match.group(6)

    day = DAY_OF_WEEK[date.fromisoformat(date_).weekday()]

    time_ = time_.split('.')[0]                    # drop the microseconds
    dmy = '/'.join(reversed(date_.split('-')))     # 2020-06-24 -> 24/06/2020

    # {level:<9} pads ERROR to the width of Exception so the columns line up
    new_format = f'{level:<9} at {time_} on {day}, {dmy}  - {message} - tracking id is {entry_id} - user is {user}'
    print(new_format)
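The script above only converts input1.log and does not yet sort. Once every event is in the sample_2 style, the date can be pulled back out to serve as a sort key; a sketch, using inline sample lines instead of the input files:

```python
import re
from datetime import datetime

def event_key(line):
    """Extract the event datetime from a line in the consolidated format."""
    m = re.search(r'at (\d\d:\d\d:\d\d) on \w+, (\d\d/\d\d/\d{4})', line)
    return datetime.strptime(m.group(2) + " " + m.group(1), "%d/%m/%Y %H:%M:%S")

lines = [
    "ERROR at 12:08:30 on Mon, 26/10/2020  - Error1 - tracking id is 5567 - user is ftp.",
    "Exception at 21:35:12 on Sun, 28/06/2020  - Error5 - tracking id is 8077 - user is vagrant.",
]
for line in sorted(lines, key=event_key):
    print(line)
```

In the full solution, lines would come from both converted logfiles appended into one list before the sorted() call.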

Output from the Consolidation Script

ERROR     at 01:00:13 on Sat, 02/05/2020  - Error4 - tracking id is 4578 - user is hacker25.
Exception at 07:58:10 on Tue, 05/05/2020  - Error2 - tracking id is 9393 - user is oracle
Exception at 13:49:18 on Wed, 06/05/2020  - Error2 - tracking id is 8369 - user is hacker15.
Exception at 06:42:46 on Tue, 12/05/2020  - Error1 - tracking id is 9312 - user is user.
Exception at 21:09:05 on Sat, 16/05/2020  - Error1 - tracking id is 5091 - user is adm.
Exception at 09:28:12 on Sun, 17/05/2020  - Error3 - tracking id is 7282 - user is ec2-user
Exception at 09:25:46 on Mon, 18/05/2020  - Error1 - tracking id is 7894 - user is ftp
Exception at 04:57:46 on Wed, 20/05/2020  - Error2 - tracking id is 4166 - user is root
ERROR     at 18:01:17 on Fri, 22/05/2020  - Error5 - tracking id is 1452 - user is user.
ERROR     at 08:26:01 on Thu, 04/06/2020  - Error1 - tracking id is 5332 - user is hacker50
ERROR     at 03:14:23 on Fri, 05/06/2020  - Error3 - tracking id is 1377 - user is vagrant
Exception at 18:57:30 on Fri, 05/06/2020  - Error4 - tracking id is 8949 - user is adm
ERROR     at 11:05:02 on Mon, 08/06/2020  - Error1 - tracking id is 1225 - user is root
ERROR     at 19:57:14 on Sat, 13/06/2020  - Error3 - tracking id is 4795 - user is adm.
ERROR     at 02:24:28 on Tue, 16/06/2020  - Error2 - tracking id is 5996 - user is admin.
ERROR     at 05:59:18 on Thu, 18/06/2020  - Error1 - tracking id is 7570 - user is info.
ERROR     at 16:05:12 on Fri, 19/06/2020  - Error3 - tracking id is 2196 - user is azureuser.
Exception at 20:09:20 on Sun, 21/06/2020  - Error5 - tracking id is 8771 - user is test
Exception at 21:23:20 on Sun, 28/06/2020  - Error4 - tracking id is 1958 - user is hacker50
Exception at 21:35:12 on Sun, 28/06/2020  - Error5 - tracking id is 8077 - user is vagrant.

PDF Report:

A10_Python#1