Section 1: Arrays & Strings
Q1. What is the difference between an Array and a Linked List?
Arrays store elements in contiguous memory locations, providing O(1) random access via index. Insertion and deletion are O(n) because elements must be shifted. Linked Lists use nodes connected by pointers — each node stores data and a reference to the next node. Insertion/deletion at a known position is O(1), but accessing an element by index is O(n) since you must traverse from the head.
Arrays are cache-friendly (sequential memory) and preferred when you need fast reads. Linked Lists excel when data size is unpredictable and frequent insertions/deletions are needed. In practice, dynamic arrays (ArrayList, Python list, vector) combine the benefits of both.
Q2. How do you find the second largest element in an array?
The optimal approach uses a single pass with two variables — O(n) time, O(1) space:
def second_largest(arr):
first = second = float('-inf')
for num in arr:
if num > first:
second = first
first = num
elif num > second and num != first:
second = num
return second if second != float('-inf') else None
Avoid sorting (O(n log n)) when a single-pass O(n) solution exists. Interviewers test whether you can optimize beyond the obvious approach.
Q3. How do you check if a string is a palindrome?
Use the two-pointer technique — one pointer at the start, one at the end, moving inward:
def is_palindrome(s):
s = s.lower().replace(' ', '')
left, right = 0, len(s) - 1
while left < right:
if s[left] != s[right]:
return False
left += 1
right -= 1
return True
Time: O(n), Space: O(1). The Pythonic s == s[::-1] works but uses O(n) extra space. Interviewers prefer the two-pointer approach because it demonstrates algorithmic thinking.
Q4. How do you find duplicate elements in an array?
Multiple approaches with different trade-offs:
- HashSet: O(n) time, O(n) space — iterate and check membership.
- Sorting: O(n log n) time, O(1) space — sort then check adjacent pairs.
- Floyd's Cycle Detection: O(n) time, O(1) space — for arrays where values are in range [1, n].
def find_duplicates(arr):
seen = set()
duplicates = []
for num in arr:
if num in seen:
duplicates.append(num)
seen.add(num)
return duplicates
Section 2: Linked Lists, Stacks & Queues
Q5. How do you reverse a Linked List?
Iterative approach — O(n) time, O(1) space:
def reverse_list(head):
prev = None
current = head
while current:
next_node = current.next
current.next = prev
prev = current
current = next_node
return prev
This is one of the most frequently asked interview questions. You maintain three pointers — prev, current, and next_node — and reverse links one by one. Be ready to also explain the recursive approach.
Q6. How do you detect a cycle in a Linked List?
Use Floyd's Cycle Detection (Tortoise and Hare): two pointers move at different speeds. If a cycle exists, they will eventually meet.
def has_cycle(head):
slow = fast = head
while fast and fast.next:
slow = slow.next
fast = fast.next.next
if slow == fast:
return True
return False
Time: O(n), Space: O(1). A HashSet approach uses O(n) space. Floyd's is the gold standard answer.
Q7. What is a Stack and what are its real-world applications?
A Stack is a LIFO (Last In, First Out) data structure. The last element added is the first removed. Core operations: push(), pop(), peek() — all O(1).
Real-world applications:
- Browser back button: Each page visit is pushed; back pops the last page.
- Undo/Redo: Editors use two stacks — one for undo, one for redo.
- Function call stack: Each function call is pushed onto the call stack; return pops it.
- Expression evaluation: Infix to postfix conversion, bracket matching.
- DFS traversal: Depth-first search uses a stack (explicitly or via recursion).
Q8. What is the difference between a Stack and a Queue?
Stack (LIFO): Last element added is first removed. Think of a stack of plates. Operations: push/pop from the top.
Queue (FIFO): First element added is first removed. Think of a line at a ticket counter. Operations: enqueue at rear, dequeue from front.
Variants: Priority Queue (elements dequeued by priority, not order — implemented via heap), Deque (double-ended queue — insert/remove from both ends), Circular Queue (wraps around to reuse space).
Section 3: Trees & Graphs
Q9. What is a Binary Search Tree (BST)?
A BST is a binary tree where for every node: all values in the left subtree are smaller, and all values in the right subtree are larger. This property enables O(log n) search, insert, and delete operations (on average). In the worst case (skewed tree), operations degrade to O(n).
Self-balancing variants like AVL Trees and Red-Black Trees guarantee O(log n) by automatically rebalancing after insertions/deletions. Java's TreeMap and C++'s std::map use Red-Black Trees internally.
Q10. What are BFS and DFS? When do you use each?
BFS (Breadth-First Search): Explores nodes level by level using a queue. Finds the shortest path in unweighted graphs. Use for: shortest path, level-order traversal, finding connected components.
DFS (Depth-First Search): Explores as deep as possible before backtracking, using a stack (or recursion). Use for: cycle detection, topological sorting, maze solving, connected components in directed graphs.
Time complexity for both: O(V + E) where V = vertices, E = edges. BFS uses O(V) space for the queue; DFS uses O(V) for the recursion stack.
Q11. How do you find the height of a binary tree?
def height(root):
if root is None:
return -1 # or 0, depending on convention
return 1 + max(height(root.left), height(root.right))
Classic recursion problem. Time: O(n) — visit every node. Space: O(h) where h is tree height (recursion stack). Interviewers often follow up with: "What if the tree has millions of nodes?" — then use an iterative BFS approach with a queue.
Section 4: Recursion & Dynamic Programming
Q12. What is Recursion and what are its pitfalls?
Recursion is when a function calls itself to solve smaller instances of the same problem. Every recursive solution has a base case (stopping condition) and a recursive case.
Pitfalls: Stack overflow for deep recursion (Python default limit: 1000); redundant computation without memoization (naive Fibonacci is O(2^n)); difficult debugging. Always consider whether an iterative approach is simpler. Use recursion when the problem has a natural recursive structure (trees, divide-and-conquer, backtracking).
Q13. What is Dynamic Programming and when should you use it?
DP is an optimization technique for problems with two properties: overlapping subproblems (same subproblems solved multiple times) and optimal substructure (optimal solution can be built from optimal solutions of subproblems).
Two approaches:
- Top-Down (Memoization): Recursive + cache results in a dictionary/array.
- Bottom-Up (Tabulation): Iterative — fill a table from smallest subproblem to largest.
# Fibonacci — Bottom-Up DP: O(n) time, O(1) space
def fib(n):
if n <= 1:
return n
a, b = 0, 1
for _ in range(2, n + 1):
a, b = b, a + b
return b
Classic DP problems: Climbing Stairs, Coin Change, Longest Common Subsequence, Knapsack, Edit Distance.
Q14. Solve the Climbing Stairs problem.
You can climb 1 or 2 steps at a time. How many distinct ways to reach step n?
def climb_stairs(n):
if n <= 2:
return n
a, b = 1, 2
for _ in range(3, n + 1):
a, b = b, a + b
return b
# climb_stairs(5) → 8
This is essentially Fibonacci. The key insight: ways(n) = ways(n-1) + ways(n-2). Recognizing this pattern is what interviewers test.
Section 5: OOPs Concepts
Q15. What are the four pillars of Object-Oriented Programming?
- Encapsulation: Bundling data (attributes) and methods that operate on that data into a single unit (class). Access modifiers (private, protected, public) control visibility. Protects internal state from outside interference.
- Abstraction: Hiding implementation complexity and exposing only the essential interface. Abstract classes and interfaces define what an object does without specifying how.
- Inheritance: Creating new classes (child) from existing classes (parent), inheriting their properties and methods. Promotes code reuse. Types: single, multi-level, hierarchical, multiple (via interfaces).
- Polymorphism: Same interface, different implementations. Compile-time (method overloading — same name, different parameters) and Runtime (method overriding — child class redefines parent method).
Q16. What is the difference between an Abstract Class and an Interface?
- Abstract Class: Can have both abstract (unimplemented) and concrete (implemented) methods. Can have state (instance variables). A class can extend only one abstract class. Use when classes share common behavior.
- Interface: Only method signatures (in traditional OOP). No state. A class can implement multiple interfaces. Use to define a contract that unrelated classes must follow.
In modern languages (Java 8+, C# 8+), interfaces can have default method implementations, blurring the distinction. The key difference remains: single inheritance for abstract classes vs. multiple inheritance for interfaces.
Q17. What are SOLID principles?
- S — Single Responsibility: A class should have only one reason to change.
- O — Open/Closed: Open for extension, closed for modification.
- L — Liskov Substitution: Subtypes must be substitutable for their base types without breaking behavior.
- I — Interface Segregation: Prefer many small, specific interfaces over one large, general-purpose interface.
- D — Dependency Inversion: Depend on abstractions, not concrete implementations.
SOLID principles are essential for writing maintainable, testable, scalable code. Interviewers at senior levels expect you to apply these in system design discussions.
Section 6: DBMS & Operating System
Q18. What is Normalization in DBMS?
Normalization organizes database tables to reduce redundancy and dependency. Key normal forms:
- 1NF: Each column contains atomic (indivisible) values. No repeating groups.
- 2NF: 1NF + no partial dependency — every non-key column depends on the entire primary key.
- 3NF: 2NF + no transitive dependency — non-key columns depend only on the primary key, not on other non-key columns.
- BCNF: Stricter 3NF — every determinant is a candidate key.
In practice, most production databases are normalized to 3NF. Denormalization (intentionally adding redundancy) is used for read-heavy analytical workloads where join performance matters.
Q19. What are ACID properties in databases?
- Atomicity: A transaction is all-or-nothing. If any part fails, the entire transaction rolls back.
- Consistency: A transaction moves the database from one valid state to another, maintaining all constraints.
- Isolation: Concurrent transactions don't interfere with each other. Isolation levels: Read Uncommitted, Read Committed, Repeatable Read, Serializable.
- Durability: Once committed, data persists even after system crashes (written to disk/logs).
ACID is critical for relational databases (SQL Server, PostgreSQL, MySQL). NoSQL databases often relax ACID for availability and partition tolerance (CAP theorem).
Q20. What is the difference between a Process and a Thread?
Process: An independent program in execution with its own memory space, code, data, and system resources. Processes are isolated — one crash doesn't affect others. Inter-process communication (IPC) is expensive (pipes, sockets, shared memory).
Thread: A lightweight unit of execution within a process. Threads share the same memory space and resources of their parent process. Context switching between threads is faster than between processes. However, shared memory means you need synchronization (locks, mutexes, semaphores) to avoid race conditions.
Modern applications use multi-threading for parallelism (web servers handling multiple requests) and multi-processing for isolation (Chrome runs each tab as a separate process).
Q21. What is Deadlock and how do you prevent it?
Deadlock occurs when two or more processes are waiting for each other to release resources, creating a circular dependency where none can proceed. Four necessary conditions (Coffman conditions):
- Mutual Exclusion: Resources cannot be shared.
- Hold and Wait: Process holds resources while waiting for more.
- No Preemption: Resources cannot be forcibly taken.
- Circular Wait: Circular chain of processes waiting for each other.
Prevention: Break any one condition. Most practical approach: resource ordering — always acquire locks in the same global order to prevent circular wait.
Section 7: System Design
Q22. How would you design a URL shortener (like bit.ly)?
Key components:
- Encoding: Generate a unique short code (6-8 chars) using Base62 encoding of an auto-increment ID or a hash (MD5/SHA256 truncated).
- Database: Store mapping (short_code → long_url). Use a relational DB or key-value store (Redis for caching, DynamoDB for storage).
- Read path: User hits short URL → lookup short_code → 301 redirect to long_url. Cache aggressively (90%+ read traffic).
- Scale: Horizontal scaling with load balancer, database sharding by short_code range, Redis cache layer, CDN for popular URLs.
- Analytics: Track clicks, referrers, geolocation asynchronously via message queue (Kafka).
Estimated scale: 100M URLs, 10:1 read/write ratio, ~1000 writes/sec, ~10000 reads/sec.
Q23. What is the CAP Theorem?
The CAP Theorem states that a distributed system can guarantee only two of three properties simultaneously:
- Consistency: Every read receives the most recent write.
- Availability: Every request receives a response (not necessarily the latest data).
- Partition Tolerance: The system continues operating despite network failures between nodes.
Since network partitions are inevitable in distributed systems, the real choice is between CP (Consistency + Partition Tolerance) — e.g., MongoDB, HBase — and AP (Availability + Partition Tolerance) — e.g., Cassandra, DynamoDB. Most modern systems use tunable consistency to balance based on the use case.
Q24. What is Load Balancing and what are common strategies?
Load balancing distributes incoming traffic across multiple servers to ensure no single server is overwhelmed. Strategies:
- Round Robin: Requests distributed sequentially across servers. Simple but ignores server load.
- Least Connections: Routes to the server with fewest active connections. Better for varying request durations.
- IP Hash: Routes based on client IP — ensures same client always hits same server (useful for session affinity).
- Weighted: Assigns weights based on server capacity — powerful servers get more traffic.
Tools: NGINX, HAProxy, AWS ALB/NLB, Azure Load Balancer, Cloudflare. Layer 4 (TCP) load balancing is faster; Layer 7 (HTTP) enables content-based routing.
Q25. What is Caching and when should you use it?
Caching stores frequently accessed data in a fast-access layer (memory) to reduce latency and backend load. Cache layers in a typical system:
- Browser Cache: Static assets (CSS, JS, images) cached on client side.
- CDN Cache: Content cached at edge locations globally (Cloudflare, Azure CDN).
- Application Cache: In-memory cache (Redis, Memcached) for DB query results, API responses, session data.
- Database Cache: Query cache, buffer pool within the database engine.
Cache invalidation strategies: TTL (Time to Live), Write-Through (update cache on write), Write-Behind (async cache update), Cache-Aside (lazy loading — populate on first miss). Remember: "There are only two hard things in computer science: cache invalidation and naming things."