Creating a pre-commit rules environment





Step 1: Install pre-commit

Run in terminal:

pip install pre-commit


Step 2: Create a file named .pre-commit-config.yaml in your project root and put this inside:


repos:
  - repo: https://github.com/psf/black
    rev: 24.3.0
    hooks:
      - id: black

  - repo: https://github.com/pycqa/isort
    rev: 5.13.2
    hooks:
      - id: isort

  - repo: https://github.com/pycqa/flake8
    rev: 6.1.0
    hooks:
      - id: flake8

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0
    hooks:
      - id: mypy

  - repo: https://github.com/PyCQA/bandit
    rev: 1.7.9
    hooks:
      - id: bandit
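
Note: the rev values pin each hook to a fixed release; the versions above are examples and may not be the latest. To bump every hook to its newest tag, pre-commit has a built-in command:

pre-commit autoupdate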




Step 3: Install the hooks

Run:

pre-commit install


Step 4: Run hooks on all files (first time setup)

Run:

pre-commit run --all-files




Now, every time you do git commit, the following will run automatically:


  • Black → formats your code
  • isort → organizes imports
  • Flake8 → checks style issues
  • Mypy → checks type hints
  • Bandit → scans for security issues
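
If any hook fails, the commit is blocked until the files pass. You can also run a single hook on demand by its id, for example:

pre-commit run black --all-files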




Python code checker

There are several libraries and tools in Python that help you check, analyze, and improve your code quality. Here are the most widely used ones, with short examples:




1. pylint – checks for errors and enforces coding standards.

Example:

pylint my_script.py

This will scan your code and show warnings like unused variables, bad naming, or missing docstrings.
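
For instance, pylint would flag a file like this (a hypothetical my_script.py) for the unused variable and the missing docstrings:

  def greet(name):
      unused = 42  # flagged as unused-variable
      print("Hello", name)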




2. flake8 – focuses on style and PEP8 compliance.

Example:

flake8 my_script.py

It will flag things like extra spaces, long lines, or inconsistent indentation.
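
A couple of lines of the kind flake8 complains about (codes shown for illustration):

  x=1           # E225: missing whitespace around operator
  y = ( 1, 2 )  # E201/E202: whitespace just inside the parentheses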




3. black – auto-formats your code to a single consistent style.

Example:

black my_script.py

It rewrites your file with consistent formatting (indentation, spacing, quotes).
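
For example, black would normalize spacing and layout like this (a hypothetical before/after):

  # before
  def add ( a,b ):
      return a+b

  # after
  def add(a, b):
      return a + b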




4. isort – automatically sorts and organizes imports.

Example:

isort my_script.py

It arranges imports alphabetically and groups them properly.
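
A minimal before/after sketch (standard-library imports only, so the ordering is unambiguous):

  # before
  import sys
  import os
  import json

  # after
  import json
  import os
  import sys

With mixed sources, isort also separates standard-library, third-party, and local imports into groups.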




5. mypy – checks type hints to catch type errors before running.

Example:

mypy my_script.py

If your function expects a list of strings but you pass integers, it will warn you.
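
A small sketch of the kind of mismatch mypy catches before runtime:

  def total_length(items: list[str]) -> int:
      return sum(len(item) for item in items)

  total_length([1, 2, 3])  # mypy flags the int elements as incompatible with str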




6. bandit – scans for common security issues.

Example:

bandit -r .

This checks all files in your project for unsafe code patterns like hardcoded passwords.
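
Two patterns bandit typically reports (illustrative snippet, not real credentials):

  import subprocess

  password = "hunter2"  # B105: hardcoded password string
  subprocess.call("ls -l /tmp", shell=True)  # B602: subprocess call with shell=True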




7. coverage.py – measures how much of your code is covered by tests.

Example:

coverage run -m pytest

coverage report

It shows which lines of code were tested and which were not.
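
For example, with a minimal pytest-style test file (a hypothetical test_math.py), the two commands above run the test and then print per-file percentages of executed lines:

  # test_math.py
  def add(a, b):
      return a + b

  def test_add():
      assert add(2, 3) == 5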




So in short:


  • pylint / flake8 → code style and errors
  • black / isort → auto-formatting and import order
  • mypy → type checking
  • bandit → security issues
  • coverage.py → test coverage




Recommended order in practice:


  1. black → format
  2. isort → fix imports
  3. flake8/pylint → style & logic issues
  4. mypy → type checking
  5. bandit → security scan
  6. coverage.py → testing completeness




10 Tips to Optimize Your Python Coding







Writing Python code that is clean, efficient, and easy to maintain requires more than just knowing the syntax. It’s about building good habits that help you and others understand, test, and reuse your work. Below are ten practical tips to optimize your Python coding, each supported with examples you can apply in real projects.


1. Use if __name__ == "__main__" for Safer Execution

When writing modules, always protect the entry point with an if __name__ == "__main__" block. Without it, any function call at the top level of the module will execute every time the module is imported, which can lead to unintended side effects.

  # module.py
  def connect():
      print("Connected!")

  if __name__ == "__main__":
      connect()



This ensures your code runs only when intended, avoids duplicate execution, and signals to other developers that this script was meant to be run directly.


2. Define a Clear Main Function

Even in small scripts, create a main() function to serve as the central entry point. This makes your code easier to follow and mirrors conventions used in other programming languages like Java or C++.

  def greet():
      print("Hello!")

  def goodbye():
      print("Goodbye!")

  def main():
      greet()
      goodbye()

  if __name__ == "__main__":
      main()



This structure creates a clear separation between definition and execution, making your program more organized and testable.


3. Keep Functions Simple and Reusable

Avoid writing functions that try to handle everything at once. Instead, break logic into smaller reusable parts. This improves readability and makes it easier to modify or extend functionality later.

  def is_adult(age: int, has_id: bool) -> bool:
      return has_id and age >= 21

  def is_banned(name: str) -> bool:
      return name.lower() == "bob"

  def enter_club(name: str, age: int, has_id: bool) -> None:
      if is_banned(name):
          print(f"{name}, you are not allowed in.")
      elif is_adult(age, has_id):
          print(f"Welcome to the club, {name}!")
      else:
          print(f"Sorry {name}, you cannot enter.")



Breaking logic apart increases reusability and avoids bloated, “do-everything” functions.


4. Leverage Type Annotations

Python is dynamically typed, but using type hints clarifies intent, prevents errors, and improves IDE support. They help others understand what your functions expect and return.

  def uppercase_elements(elements: list[str]) -> list[str]:
      return [el.upper() for el in elements]

  names = ["alice", "bob", "charlie"]
  print(uppercase_elements(names))  # ['ALICE', 'BOB', 'CHARLIE']



Static analyzers like mypy can catch issues before runtime, reducing the risk of silent bugs.


5. Adopt List Comprehensions for Cleaner Loops

List comprehensions make code more concise and often faster than traditional loops. Instead of writing multiple lines, you can express filtering and transformation in one.

  people = ["James", "Charlotte", "Stephanie", "Mario", "Sandra"]
  long_names = [name for name in people if len(name) > 7]
  print(long_names)  # ['Charlotte', 'Stephanie']



Use descriptive variable names to maintain readability.


6. Avoid Hardcoding Magic Values

Magic values make code harder to maintain. Instead, define constants with clear names.

  LEGAL_AGE = 21
  BANNED_USERS = {"bob"}

  def is_adult(age: int, has_id: bool) -> bool:
      return has_id and age >= LEGAL_AGE

  def is_banned(name: str) -> bool:
      return name.lower() in BANNED_USERS



This improves readability and allows you to change values in a single place if requirements shift.


7. Use Meaningful Variable and Function Names

Short, unclear names confuse collaborators. Opt for descriptive identifiers that explain intent without requiring extra comments.

  # Bad
  def f(a, b):
      return a + b

  # Good
  def calculate_total(price: float, tax: float) -> float:
      return price + tax



Names are the first form of documentation — make them count.


8. Write Docstrings for Clarity

For functions that perform more complex logic, provide docstrings that explain purpose, inputs, and outputs. This avoids confusion and speeds up collaboration.

  def calculate_discount(price: float, discount_rate: float) -> float:
      """
      Calculate the final price after applying a discount.

      Args:
          price (float): Original price of the item.
          discount_rate (float): Discount as a decimal (e.g., 0.2 for 20%).

      Returns:
          float: Final price after discount.
      """
      return price * (1 - discount_rate)




Even simple comments save future developers (and your future self) time.


9. Handle Errors Gracefully

Use exceptions to manage errors instead of letting your program crash. This makes your code more robust and user-friendly.

  def safe_divide(a: float, b: float) -> float:
      try:
          return a / b
      except ZeroDivisionError:
          print("Error: division by zero is not allowed.")
          return float("inf")

  print(safe_divide(10, 0))



Good error handling prevents edge cases from breaking your program.


10. Optimize with Built-in Functions and Libraries

Python's standard library and built-ins are often more efficient than reinventing the wheel. Use tools like sum(), max(), min(), and any() to replace manual loops.

  numbers = [2, 4, 6, 8]
  print(sum(numbers))                 # 20
  print(max(numbers))                 # 8
  print(any(n > 5 for n in numbers))  # True



Built-ins are optimized in C, making them faster than equivalent Python loops.


Final Thoughts

By combining these ten practices, from structuring your scripts with if __name__ == "__main__" to writing reusable functions, leveraging type hints, and handling errors gracefully, you can dramatically improve the readability, reliability, and maintainability of your Python code. These aren't just tricks; they're habits that separate quick hacks from professional-quality software.









Python code visualizer

https://cscircles.cemc.uwaterloo.ca/visualize


Animated comparison: multithreading vs multiprocessing

import time
import random
import threading
import multiprocessing as mp

import matplotlib.pyplot as plt
import matplotlib.animation as animation
import psutil


# -----------------------------
# Worker function
# -----------------------------
def worker(task_id, sleep_time, results, lock):
    start = time.perf_counter() * 1000  # ms
    time.sleep(sleep_time)              # simulate work
    end = time.perf_counter() * 1000
    with lock:
        results.append((task_id, start, end - start))


# -----------------------------
# Multithreading: many small tasks
# -----------------------------
def run_multithreading(num_threads=4, num_tasks=80, results=None, lock=None):
    threads = []
    for i in range(num_tasks):
        t = threading.Thread(
            target=worker,
            args=(i % num_threads, random.uniform(0.005, 0.02), results, lock)
        )
        threads.append(t)
        t.start()
    for t in threads:
        t.join()


# -----------------------------
# Multiprocessing: few long tasks
# -----------------------------
def run_multiprocessing(num_procs=4, results=None, lock=None):
    procs = []
    for i in range(num_procs):
        p = mp.Process(
            target=worker,
            args=(i, random.uniform(0.2, 0.4), results, lock)
        )
        procs.append(p)
        p.start()
    for p in procs:
        p.join()


# -----------------------------
# Live plotter
# -----------------------------
def animate_execution(mode="threading", duration=2):
    # duration is accepted for symmetry with the static demo but unused here;
    # the animation runs until the window is closed
    colors = ['#7fcfd4', '#fff29b', '#c8c0ff', '#ff8f80']

    # Shared results: a plain list + Lock for threads,
    # Manager proxies for processes
    if mode == "threading":
        results = []
        lock = threading.Lock()
        task_runner = threading.Thread(
            target=run_multithreading, args=(4, 80, results, lock)
        )
    else:
        manager = mp.Manager()
        results = manager.list()
        lock = manager.Lock()
        task_runner = mp.Process(
            target=run_multiprocessing, args=(4, results, lock)
        )

    task_runner.start()

    # Set up the figure: timeline on top, CPU utilization below
    fig, (ax_timeline, ax_cpu) = plt.subplots(2, 1, figsize=(10, 6))
    cpu_timestamps, cpu_data = [], []

    # Animation update function: redraw both axes each frame
    def update(frame):
        ax_timeline.clear()
        ax_timeline.set_title(f"{mode} timeline (live)")
        ax_timeline.set_xlabel("time (ms)")
        ax_timeline.set_ylabel("worker")

        # Draw the intervals completed so far
        for task_id, start, dur in list(results):
            ax_timeline.broken_barh(
                [(start, dur)], (task_id + 0.1, 0.8),
                facecolors=colors[task_id % len(colors)]
            )
        ax_timeline.grid(True, linestyle=":", alpha=0.5)

        # Per-core CPU usage
        cpu_data.append(psutil.cpu_percent(percpu=True))
        cpu_timestamps.append(time.perf_counter() * 1000)

        ax_cpu.clear()
        for core in range(len(cpu_data[0])):
            core_usage = [row[core] for row in cpu_data]
            ax_cpu.plot(cpu_timestamps, core_usage, label=f"core {core}")
        ax_cpu.set_title("CPU utilization (live)")
        ax_cpu.set_xlabel("time (ms)")
        ax_cpu.set_ylabel("CPU %")
        ax_cpu.legend(fontsize="x-small", ncol=2)

    # Keep a reference to the animation so it is not garbage-collected
    ani = animation.FuncAnimation(fig, update, interval=100)
    plt.tight_layout()
    plt.show()

    task_runner.join()


# -----------------------------
# Main
# -----------------------------
if __name__ == "__main__":
    print("Running live multithreading demo...")
    animate_execution(mode="threading", duration=2)

    print("Running live multiprocessing demo...")
    animate_execution(mode="multiprocessing", duration=2)


Multithreading vs multiprocessing


  • Multithreading: runs 80 small tasks with short sleeps → many short alternating bars per worker, like a checkerboard pattern.
  • Multiprocessing: runs one long task per process → big contiguous blocks; each process holds its slice until it finishes.
  • CPU usage: plotted in the third subplot to compare how the system's cores are actually loaded in each mode.





import time
import random
import threading
import multiprocessing as mp

import matplotlib.pyplot as plt
import psutil


# -----------------------------
# Worker function
# -----------------------------
def worker(task_id, sleep_time, results, lock):
    start = time.perf_counter() * 1000  # ms
    time.sleep(sleep_time)              # simulate work
    end = time.perf_counter() * 1000
    with lock:
        results.append((task_id, start, end - start))


# -----------------------------
# Multithreading: many small tasks
# -----------------------------
def run_multithreading(num_threads=4, num_tasks=80):
    results = []
    lock = threading.Lock()
    threads = []
    for i in range(num_tasks):
        # short bursts, interleaved across worker ids
        t = threading.Thread(
            target=worker,
            args=(i % num_threads, random.uniform(0.005, 0.02), results, lock)
        )
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return results


# -----------------------------
# Multiprocessing: few long tasks
# -----------------------------
def run_multiprocessing(num_procs=4):
    manager = mp.Manager()
    results = manager.list()
    lock = manager.Lock()
    procs = []
    for i in range(num_procs):
        # each process does a single long job
        p = mp.Process(
            target=worker,
            args=(i, random.uniform(0.2, 0.4), results, lock)
        )
        procs.append(p)
        p.start()
    for p in procs:
        p.join()
    return list(results)


# -----------------------------
# Collect CPU usage while a workload runs
# -----------------------------
def run_with_cpu_sampling(run_fn, interval=0.05):
    """Sample per-core CPU usage in a background thread while run_fn executes,
    so the CPU trace actually covers the workload."""
    cpu_data, timestamps = [], []
    stop = threading.Event()

    def sampler():
        start = time.perf_counter()
        while not stop.is_set():
            cpu_data.append(psutil.cpu_percent(percpu=True))
            timestamps.append((time.perf_counter() - start) * 1000)  # ms
            time.sleep(interval)

    t = threading.Thread(target=sampler)
    t.start()
    results = run_fn()
    stop.set()
    t.join()
    return results, timestamps, cpu_data


# -----------------------------
# Plotting
# -----------------------------
def plot_results(results, mode="threading", cpu_timestamps=None, cpu_data=None):
    colors = ['#7fcfd4', '#fff29b', '#c8c0ff', '#ff8f80']
    num_workers = len(set(r[0] for r in results))

    fig, axs = plt.subplots(1, 3, figsize=(16, 5))

    # --- Timeline ---
    for task_id, start, dur in results:
        axs[0].broken_barh(
            [(start, dur)], (task_id + 0.1, 0.8),
            facecolors=colors[task_id % len(colors)]
        )
    axs[0].set_xlabel("time (ms)")
    axs[0].set_ylabel("worker")
    axs[0].set_yticks(range(num_workers))
    axs[0].set_title(f"{mode} timeline")
    axs[0].grid(True, linestyle=":", alpha=0.5)

    # --- Work totals per worker ---
    totals = {}
    for task_id, _, dur in results:
        totals[task_id] = totals.get(task_id, 0) + dur
    axs[1].bar(
        list(totals.keys()),
        list(totals.values()),
        color=[colors[k % len(colors)] for k in totals.keys()],
        edgecolor="k"
    )
    axs[1].set_xlabel("worker")
    axs[1].set_ylabel("time (ms)")
    axs[1].set_title(f"{mode} total work")

    # --- CPU usage ---
    if cpu_timestamps and cpu_data:
        for core in range(len(cpu_data[0])):
            core_usage = [row[core] for row in cpu_data]
            axs[2].plot(cpu_timestamps, core_usage, label=f"core {core}")
        axs[2].set_xlabel("time (ms)")
        axs[2].set_ylabel("CPU %")
        axs[2].set_title("CPU utilization")
        axs[2].legend(fontsize="x-small", ncol=2)
    else:
        axs[2].set_visible(False)

    plt.tight_layout()
    plt.show()


# -----------------------------
# Main
# -----------------------------
if __name__ == "__main__":
    # --- Multithreading: many small jobs ---
    print("Running multithreading...")
    thread_results, cpu_t, cpu_d = run_with_cpu_sampling(run_multithreading)
    plot_results(thread_results, mode="threading",
                 cpu_timestamps=cpu_t, cpu_data=cpu_d)

    # --- Multiprocessing: few long jobs ---
    print("Running multiprocessing...")
    proc_results, cpu_t, cpu_d = run_with_cpu_sampling(run_multiprocessing)
    plot_results(proc_results, mode="multiprocessing",
                 cpu_timestamps=cpu_t, cpu_data=cpu_d)


Data engineering

https://www.linkedin.com/posts/iamarifalam_%F0%9D%90%93%F0%9D%90%A1%F0%9D%90%9E-%F0%9D%90%8E%F0%9D%90%8D%F0%9D%90%8B%F0%9D%90%98-%F0%9D%90%83%F0%9D%90%9A%F0%9D%90%AD%F0%9D%90%9A-%F0%9D%90%84%F0%9D%90%A7%F0%9D%90%A0%F0%9D%90%A2%F0%9D%90%A7%F0%9D%90%9E%F0%9D%90%9E%F0%9D%90%AB-activity-7364920061062541312-2KNU?utm_source=share&utm_medium=member_ios&rcm=ACoAAAK9IekB_8VSR9kYCe66_RGeIqPa8Tk1zVw


Most beginners waste months learning random tools.

Here’s the real sequence that actually lands jobs:


𝟏. 𝐏𝐫𝐨𝐠𝐫𝐚𝐦𝐦𝐢𝐧𝐠 𝐅𝐨𝐮𝐧𝐝𝐚𝐭𝐢𝐨𝐧

Start with Python + SQL. Don’t chase 5 languages.

Example: Write a SQL query to analyze sales data, then transform it into a Pandas dataframe.


Free resource: https://lnkd.in/duzDYxyW
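
A minimal sketch of that example, assuming a hypothetical SQLite file sales.db with a sales(region, amount) table:

  import sqlite3

  import pandas as pd

  conn = sqlite3.connect("sales.db")
  query = """
      SELECT region, SUM(amount) AS total_sales
      FROM sales
      GROUP BY region
      ORDER BY total_sales DESC
  """
  df = pd.read_sql_query(query, conn)  # SQL result lands directly in a DataFrame
  conn.close()
  print(df.head())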


𝟐. 𝐃𝐚𝐭𝐚 𝐌𝐨𝐝𝐞𝐥𝐢𝐧𝐠 & 𝐖𝐚𝐫𝐞𝐡𝐨𝐮𝐬𝐞𝐬

Learn how companies store data. Understand Star Schema vs Snowflake.

Think of Netflix: they model users, shows, and watch history so recommendations work fast.


Free resource: https://lnkd.in/gWivy67u


𝟑. 𝐄𝐓𝐋 / 𝐄𝐋𝐓 𝐏𝐢𝐩𝐞𝐥𝐢𝐧𝐞𝐬

Learn how raw messy data becomes clean, structured data. Tools like Airflow and dbt matter more than shiny dashboards.


Free resource: https://lnkd.in/g-Wzxx9s


𝟒. 𝐁𝐢𝐠 𝐃𝐚𝐭𝐚 𝐒𝐲𝐬𝐭𝐞𝐦𝐬

Handle data at scale with Spark, Kafka, Hadoop.

Example: Twitter streams millions of tweets; Kafka pipelines process them in real time.


Free resource: https://lnkd.in/gizvbK3B


𝟓. 𝐂𝐥𝐨𝐮𝐝 𝐃𝐚𝐭𝐚 𝐄𝐧𝐠𝐢𝐧𝐞𝐞𝐫𝐢𝐧𝐠

AWS, Azure, GCP all have their data stacks. Learn one deeply.

Companies don’t hire tool collectors; they hire people who can deliver value.


Free resource: https://lnkd.in/dKHXFDNR


𝟔. 𝐃𝐚𝐭𝐚 𝐆𝐨𝐯𝐞𝐫𝐧𝐚𝐧𝐜𝐞 & 𝐎𝐩𝐬

Logging, monitoring, and security make you senior-level.

Example: An e-commerce site failing to mask customer PII → million-dollar fines.


Free resource: https://lnkd.in/grni-NfF


𝟕. 𝐅𝐢𝐧𝐚𝐥 𝐏𝐫𝐨𝐣𝐞𝐜𝐭 (𝐲𝐨𝐮𝐫 𝐠𝐨𝐥𝐝𝐞𝐧 𝐭𝐢𝐜𝐤𝐞𝐭)

Build an end-to-end pipeline:

Pull data → clean it → store it in a warehouse → expose it via an API → build a dashboard on top.


Free resource: https://lnkd.in/gBGCsnrx


𝐓𝐡𝐢𝐬 𝐢𝐬 𝐧𝐨𝐭 𝐚 𝐭𝐨𝐨𝐥 𝐜𝐡𝐞𝐜𝐤𝐥𝐢𝐬𝐭. 𝐓𝐡𝐢𝐬 𝐢𝐬 𝐡𝐨𝐰 𝐝𝐚𝐭𝐚 𝐞𝐧𝐠𝐢𝐧𝐞𝐞𝐫𝐬 𝐚𝐫𝐞 𝐚𝐜𝐭𝐮𝐚𝐥𝐥𝐲 𝐡𝐢𝐫𝐞𝐝.


Save this roadmap. Read it again in 6 months and you’ll see the difference.


