Assignment Series #A10 – Journeyman’s Piece

You are given two logfiles in different formats. Both logs contain the same attributes:
- timestamp (e.g. 2020-07-22 07:22:37.822863)
- error level ("Exception" or "ERROR")
- error message (Error1, Error2, …)
- user (oracle, administrator, …)
- id (3521, 2294, …)
sample_1.log uses a JSON-like pattern and looks like this:
2020-07-22 07:22:37.822863: { "level": "Exception", "message": "Error4", "user": "oracle", "id": "3521" }
2020-09-22 12:31:44.789319: { "level": "Exception", "message": "Error1", "user": "administrator", "id": "4371" }
2021-04-06 22:51:10.999642: { "level": "ERROR", "message": "Error3", "user": "azureuser", "id": "2294" }
2020-07-19 15:45:58.576940: { "level": "Exception", "message": "Error2", "user": "hacker40", "id": "8677" }
2021-01-23 14:07:18.922480: { "level": "ERROR", "message": "Error5", "user": "hacker40", "id": "1865" }
2020-08-19 05:47:46.983299: { "level": "Exception", "message": "Error4", "user": "pi", "id": "8993" }
2021-03-25 13:13:06.012237: { "level": "ERROR", "message": "Error5", "user": "mysql", "id": "3561" }
2020-05-05 06:37:50.976402: { "level": "ERROR", "message": "Error5", "user": "pi", "id": "1754" }
sample_2.log has a more general format and looks like this:
ERROR at 19:30:13 on Thu, 02/07/2020 - Error1 - tracking id is 4126 - user is puppet.
ERROR at 12:08:30 on Mon, 26/10/2020 - Error1 - tracking id is 5567 - user is ftp.
Exception at 21:35:12 on Sun, 28/06/2020 - Error5 - tracking id is 8077 - user is vagrant.
ERROR at 06:36:05 on Sat, 11/07/2020 - Error1 - tracking id is 5218 - user is puppet.
Exception at 17:40:33 on Fri, 01/01/2021 - Error3 - tracking id is 8252 - user is mysql.
Exception at 13:49:18 on Wed, 06/05/2020 - Error2 - tracking id is 8369 - user is hacker15.
Exception at 21:09:05 on Sat, 16/05/2020 - Error1 - tracking id is 5091 - user is adm.
ERROR at 10:39:13 on Thu, 29/04/2021 - Error2 - tracking id is 4225 - user is oracle.
Your goal is to write a logfile generator for each of the two formats, and then a Python script that reads both logfiles, consolidates them into a single format, and sorts the events by date.
Logfile Generator 1 (gen_1.py)
import random
import datetime

levels = ["ERROR", "Exception"]
error_messages = ["Error1", "Error2", "Error3", "Error4", "Error5"]

# users.txt holds one user name per line
with open("users.txt", "r") as file:
    users = file.read().splitlines()

def create_log_entry():
    # Random timestamp within one year starting 2020-05-01
    dt = datetime.datetime(2020, 5, 1) + random.random() * datetime.timedelta(days=365)
    level = random.choice(levels)
    message = random.choice(error_messages)
    user = random.choice(users)
    entry_id = random.randint(1000, 10000)
    return f'{dt}: {{ "level": "{level}", "message": "{message}", "user": "{user}", "id": "{entry_id}" }}'

logs = [create_log_entry() for _ in range(50)]

with open("input1.log", "w") as logfile:
    for line in logs:
        logfile.write(line + "\n")
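Both generators expect a users.txt file, which is not part of the assignment. A minimal sketch that creates one is below; the name list is only an assumption, drawn from names that appear in the sample logs:

```python
# Hypothetical helper: write a users.txt for the generators to read.
# The exact list of names is an assumption, taken from the sample output.
users = [
    "oracle", "administrator", "azureuser", "hacker40", "pi", "mysql",
    "root", "user", "ansible", "info", "hacker50", "puppet", "ftp",
    "vagrant", "adm", "admin", "test",
]

with open("users.txt", "w") as f:
    f.write("\n".join(users) + "\n")
```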
Logfile Generator 2 (gen_2.py)
import random
import datetime

levels = ["ERROR", "Exception"]
error_messages = ["Error1", "Error2", "Error3", "Error4", "Error5"]

# users.txt holds one user name per line
with open("users.txt", "r") as file:
    users = file.read().splitlines()

def create_log_entry():
    # Random timestamp within one year starting 2020-05-01
    dt = datetime.datetime(2020, 5, 1) + random.random() * datetime.timedelta(days=365)
    level = random.choice(levels)
    message = random.choice(error_messages)
    user = random.choice(users)
    entry_id = random.randint(1000, 10000)
    timestamp = dt.strftime("%H:%M:%S on %a, %d/%m/%Y")
    return f'{level} at {timestamp} - {message} - tracking id is {entry_id} - user is {user}.'

logs = [create_log_entry() for _ in range(50)]

with open("input2.log", "w") as logfile:
    for line in logs:
        logfile.write(line + "\n")
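As a sanity check, the strftime format string used in gen_2.py also works in the other direction with strptime, which will matter later when the consolidated events have to be sorted by date. The sample value is taken from sample_2.log:

```python
import datetime

# Round-trip a format-2 timestamp. strptime derives the date from
# %d/%m/%Y; the %a weekday field is parsed but not used for the result.
stamp = "19:30:13 on Thu, 02/07/2020"
dt = datetime.datetime.strptime(stamp, "%H:%M:%S on %a, %d/%m/%Y")
print(dt)  # 2020-07-02 19:30:13
```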
Output from Generator 1
2021-01-31 17:17:23.631168: { "level": "ERROR", "message": "Error2", "user": "root", "id": "5219" }
2021-02-03 15:21:35.674267: { "level": "Exception", "message": "Error2", "user": "user", "id": "8611" }
2021-02-09 04:17:00.832823: { "level": "Exception", "message": "Error5", "user": "ansible", "id": "8817" }
2021-03-24 09:58:03.258777: { "level": "Exception", "message": "Error2", "user": "info", "id": "4091" }
2020-12-03 16:08:58.087425: { "level": "Exception", "message": "Error1", "user": "user", "id": "2269" }
2020-06-28 21:23:20.974961: { "level": "Exception", "message": "Error4", "user": "hacker50", "id": "1958" }
Output from Generator 2
Exception at 08:49:17 on Thu, 24/09/2020 - Error1 - tracking id is 2920 - user is user.
ERROR at 19:30:13 on Thu, 02/07/2020 - Error1 - tracking id is 4126 - user is puppet.
ERROR at 12:08:30 on Mon, 26/10/2020 - Error1 - tracking id is 5567 - user is ftp.
Exception at 21:35:12 on Sun, 28/06/2020 - Error5 - tracking id is 8077 - user is vagrant.
ERROR at 06:36:05 on Sat, 11/07/2020 - Error1 - tracking id is 5218 - user is puppet.
Exception at 17:40:33 on Fri, 01/01/2021 - Error3 - tracking id is 8252 - user is mysql.
Consolidation Script (consolidate.py)
import re
from datetime import date

DAY_OF_WEEK = {0: "Mon", 1: "Tue", 2: "Wed", 3: "Thu", 4: "Fri", 5: "Sat", 6: "Sun"}

# Matches one format-1 entry, e.g.:
# 2020-06-24 19:07:54.153862: { "level": "Exception", "message": "Error4", "user": "oracle", "id": "9293" }
pattern = (r'(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})\.\d+: '
           r'{ "level": "(.+?)", "message": "(.+?)", "user": "(.+?)", "id": "(.+?)" }')

with open("input1.log", "r") as file:
    log = file.read().splitlines()

for e in log:
    match = re.search(pattern, e)
    date_, time, level, message, user, id_ = match.groups()
    # Look up the abbreviated weekday name from the ISO date
    day = DAY_OF_WEEK[date.fromisoformat(date_).weekday()]
    # Reorder 2020-06-24 into 24/06/2020 to match the second format
    year, month, day_of_month = date_.split('-')
    date_slash = f'{day_of_month}/{month}/{year}'
    new_format = f'{level} at {time} on {day}, {date_slash} - {message} - tracking id is {id_} - user is {user}.'
    print(new_format)
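The script above only handles input1.log; parsing input2.log and the final chronological sort are still left to do. One possible sketch follows — the regex, helper name, and sorting approach are my own, not prescribed by the assignment:

```python
import re
from datetime import datetime

# Matches one format-2 entry, e.g.:
# ERROR at 19:30:13 on Thu, 02/07/2020 - Error1 - tracking id is 4126 - user is puppet.
PATTERN_2 = re.compile(
    r'(ERROR|Exception) at (\d{2}:\d{2}:\d{2}) on (\w{3}, \d{2}/\d{2}/\d{4})'
    r' - (\w+) - tracking id is (\d+) - user is ([\w-]+)\.'
)

def parse_entry_2(line):
    """Return (event datetime, original line) for one format-2 entry."""
    level, time, date_, message, id_, user = PATTERN_2.search(line).groups()
    dt = datetime.strptime(f'{time} on {date_}', "%H:%M:%S on %a, %d/%m/%Y")
    return dt, line

# Tiny demo with two entries from the sample output; the full script would
# also parse the converted input1.log lines and merge them in before sorting.
entries = [
    "ERROR at 12:08:30 on Mon, 26/10/2020 - Error1 - tracking id is 5567 - user is ftp.",
    "Exception at 21:35:12 on Sun, 28/06/2020 - Error5 - tracking id is 8077 - user is vagrant.",
]
for dt, line in sorted(parse_entry_2(e) for e in entries):
    print(line)  # chronological order: the June entry comes first
```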
Output from Consolidation script:
ERROR at 01:00:13 on Sat, 02/05/2020 - Error4 - tracking id is 4578 - user is hacker25.
Exception at 07:58:10 on Tue, 05/05/2020 - Error2 - tracking id is 9393 - user is oracle
Exception at 13:49:18 on Wed, 06/05/2020 - Error2 - tracking id is 8369 - user is hacker15.
Exception at 06:42:46 on Tue, 12/05/2020 - Error1 - tracking id is 9312 - user is user.
Exception at 21:09:05 on Sat, 16/05/2020 - Error1 - tracking id is 5091 - user is adm.
Exception at 09:28:12 on Sun, 17/05/2020 - Error3 - tracking id is 7282 - user is ec2-user
Exception at 09:25:46 on Mon, 18/05/2020 - Error1 - tracking id is 7894 - user is ftp
Exception at 04:57:46 on Wed, 20/05/2020 - Error2 - tracking id is 4166 - user is root
ERROR at 18:01:17 on Fri, 22/05/2020 - Error5 - tracking id is 1452 - user is user.
ERROR at 08:26:01 on Thu, 04/06/2020 - Error1 - tracking id is 5332 - user is hacker50
ERROR at 03:14:23 on Fri, 05/06/2020 - Error3 - tracking id is 1377 - user is vagrant
Exception at 18:57:30 on Fri, 05/06/2020 - Error4 - tracking id is 8949 - user is adm
ERROR at 11:05:02 on Mon, 08/06/2020 - Error1 - tracking id is 1225 - user is root
ERROR at 19:57:14 on Sat, 13/06/2020 - Error3 - tracking id is 4795 - user is adm.
ERROR at 02:24:28 on Tue, 16/06/2020 - Error2 - tracking id is 5996 - user is admin.
ERROR at 05:59:18 on Thu, 18/06/2020 - Error1 - tracking id is 7570 - user is info.
ERROR at 16:05:12 on Fri, 19/06/2020 - Error3 - tracking id is 2196 - user is azureuser.
Exception at 20:09:20 on Sun, 21/06/2020 - Error5 - tracking id is 8771 - user is test
Exception at 21:23:20 on Sun, 28/06/2020 - Error4 - tracking id is 1958 - user is hacker50
Exception at 21:35:12 on Sun, 28/06/2020 - Error5 - tracking id is 8077 - user is vagrant.
PDF Report:

