
Working with Large JSON Files: Performance Guide 2026

Learn to handle large JSON files efficiently. Covers streaming parsers, memory optimization, and specialized tools for big data.

Big JSON Team · Technical Writer · 12 min read · advanced

Expert in JSON data manipulation, API development, and web technologies. Passionate about creating tools that make developers' lives easier.

The Challenge

Large JSON files (100MB+) can cause:

  • Memory exhaustion (a full parse typically needs 2-3x the file size in RAM)
  • Slow parsing that blocks your application
  • Editor crashes
  • Difficult debugging, since you can't simply open the file to inspect it

Solutions

Big JSON Viewer handles files up to several hundred MB:

  • Lazy loading
  • Virtual scrolling
  • Memory-efficient rendering
  • Search across the whole document

Visit bigjson.online

Streaming Parsers

Python ijson

import ijson

with open('large.json', 'rb') as f:
    for item in ijson.items(f, 'items.item'):
        process(item)
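If the structure is irregular or you only need a few scattered fields, ijson also exposes the raw parse events via ijson.parse, which yields (prefix, event, value) tuples. A minimal sketch; the '.amount' field name is just an illustrative assumption:

import ijson

with open('large.json', 'rb') as f:
    # Each event is a (prefix, event, value) tuple, e.g.
    # ('items.item.amount', 'number', Decimal('9.99'))
    for prefix, event, value in ijson.parse(f):
        if event == 'number' and prefix.endswith('.amount'):
            print(prefix, value)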

Node.js stream-json

const fs = require('fs');
const { parser } = require('stream-json');
const { streamArray } = require('stream-json/streamers/StreamArray');

fs.createReadStream('large.json')
  .pipe(parser())
  .pipe(streamArray())
  .on('data', ({ value }) => {
    // Your per-record handler (avoid naming it `process`,
    // which shadows Node's global process object)
    handleRecord(value);
  });
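One caveat: a bare 'data' handler applies no backpressure when the per-record work is asynchronous. A sketch using stream.pipeline with an object-mode Writable, where handleRecordAsync is a hypothetical async worker:

const fs = require('fs');
const { pipeline, Writable } = require('stream');
const { parser } = require('stream-json');
const { streamArray } = require('stream-json/streamers/StreamArray');

pipeline(
  fs.createReadStream('large.json'),
  parser(),
  streamArray(),
  new Writable({
    objectMode: true,
    write({ value }, _enc, done) {
      // Calling done() only after the async work finishes pauses
      // the file read until this record has been handled
      handleRecordAsync(value).then(() => done(), done);
    },
  }),
  (err) => {
    if (err) console.error('Pipeline failed:', err);
  }
);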

Command Line with jq

# Print the first 10 array items, one per line
jq -c '.items[]' large.json | head -10

# Count the elements of a field
jq '.data | length' large.json

# Filter items
jq '.items[] | select(.active)' large.json
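Note that the filters above still make jq parse the entire document into memory before filtering. For files that genuinely don't fit, jq's --stream mode reads incrementally; this idiom reassembles the elements one at a time, assuming the document's top level is itself an array:

# True streaming: rebuild each top-level array element without
# loading the whole file
jq -cn --stream 'fromstream(1 | truncate_stream(inputs))' large.json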

Memory Optimization

Process in Chunks

import json

def process_jsonl_chunks(filename, chunk_size=10000):
    chunk = []
    with open(filename) as f:
        for line in f:
            chunk.append(json.loads(line))
            if len(chunk) >= chunk_size:
                yield chunk
                chunk = []
    if chunk:  # don't drop the final partial chunk
        yield chunk

for chunk in process_jsonl_chunks('data.jsonl'):
    process_batch(chunk)

Generator Pattern

import ijson

def stream_records(filename):
    with open(filename, 'rb') as f:
        for record in ijson.items(f, 'records.item'):
            yield record

total = sum(r['amount'] for r in stream_records('large.json'))
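Since generators evaluate lazily, they also compose without materializing intermediate lists; for instance, filtering before aggregating keeps memory flat (the 'active' field here is illustrative):

# Chain a filter onto the generator; nothing is buffered in memory
active = (r for r in stream_records('large.json') if r.get('active'))
total = sum(r['amount'] for r in active)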

Splitting Large Files

import ijson
import json

def split_json_streaming(input_file, output_prefix, records_per_file=100000):
    chunk = []
    file_num = 1
    with open(input_file, 'rb') as f:
        # use_float=True yields plain floats instead of Decimal,
        # so json.dump can serialize the items directly
        for item in ijson.items(f, 'items.item', use_float=True):
            chunk.append(item)
            if len(chunk) >= records_per_file:
                with open(f'{output_prefix}_{file_num:04d}.json', 'w') as out:
                    json.dump(chunk, out)
                chunk = []
                file_num += 1
    if chunk:  # flush the remaining records
        with open(f'{output_prefix}_{file_num:04d}.json', 'w') as out:
            json.dump(chunk, out)

JSON Lines Format

One JSON object per line - easier to stream:

import json

# Convert a JSON array to JSONL
with open('data.json') as f:
    data = json.load(f)

with open('data.jsonl', 'w') as f:
    for item in data['items']:
        f.write(json.dumps(item) + '\n')
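The conversion above still loads the entire document with json.load, which defeats the purpose for truly large inputs. A streaming variant using ijson, assuming the same top-level 'items' array, keeps memory roughly constant:

import ijson
import json

with open('data.json', 'rb') as src, open('data.jsonl', 'w') as dst:
    # use_float=True makes ijson yield plain floats instead of Decimal,
    # which json.dumps can serialize without a custom encoder
    for item in ijson.items(src, 'items.item', use_float=True):
        dst.write(json.dumps(item) + '\n')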

Performance Comparison

| Method | 100MB | 1GB | Memory |
|--------|-------|-----|--------|
| json.load() | 5s | 50s+ | 2-3x file size |
| ijson | 15s | 150s | ~50MB |
| jq | 3s | 30s | Streams |
| Big JSON Viewer | 2s | 10s | Optimized |

Best Practices

  • Check the file size before choosing a tool
  • Use streaming for anything over 100MB
  • Convert to JSONL for repeated processing
  • Consider a database for very large datasets (see below)
  • Monitor memory usage

Database Alternative

For very large datasets, load the JSON into a database instead:

import sqlite3
import json
import ijson

conn = sqlite3.connect('data.db')
cursor = conn.cursor()
cursor.execute('''
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        data TEXT
    )
''')

with open('large.json', 'rb') as f:
    for item in ijson.items(f, 'items.item', use_float=True):
        cursor.execute('INSERT INTO items (data) VALUES (?)',
                       (json.dumps(item),))
conn.commit()
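Continuing from the snippet above, SQLite's built-in JSON functions (available in any modern SQLite build) can then filter and extract fields in SQL without reparsing in Python; the 'name' and 'active' fields are illustrative:

# Query inside the stored JSON blobs directly in SQL
cursor.execute(
    "SELECT json_extract(data, '$.name') "
    "FROM items "
    "WHERE json_extract(data, '$.active') = 1"
)
for (name,) in cursor.fetchall():
    print(name)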

Monitoring Performance

import time
import psutil

def measure_memory():
    process = psutil.Process()
    return process.memory_info().rss / 1024 / 1024  # MB

start_mem = measure_memory()
start_time = time.time()

# Process data
for item in stream_records('large.json'):
    process(item)

print(f"Time: {time.time() - start_time:.2f}s")
print(f"Memory: {measure_memory() - start_mem:.2f}MB")

Conclusion

For files over 100MB, use Big JSON Viewer for viewing and ijson or jq for processing. Don't let file size slow you down!
