
Working with Large JSON Files: A Performance Guide for 2026

Learn how to handle large JSON files efficiently. Covers streaming parsers, memory optimization, and dedicated tools for big data.

Big JSON Team · 12 min read · Advanced

The Challenge

Large JSON files (100MB+) can cause:

  • Memory problems
  • Slow parsing
  • Editor crashes
  • Painful debugging
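Most of these problems come from eager parsing: json.load() materializes the entire document as Python objects, which typically needs 2-3x the file size in RAM (see the comparison table below). A minimal sketch of the approach that breaks down on large files:

import json

# Eager parsing: the whole file is read and converted to Python
# objects at once. For a 1GB file this can mean 2-3GB of RAM.
with open('large.json') as f:
    data = json.load(f)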

Solutions

Big JSON Viewer (Recommended)

It can handle files up to several hundred MB thanks to:

  • Lazy loading
  • Virtual scrolling
  • Memory efficiency
  • Search functionality

Visit bigjson.online
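Big JSON Viewer's internals aren't shown here, but the lazy-loading idea is easy to approximate yourself: only parse what is actually being looked at. A toy sketch (not the tool's actual implementation) that lists top-level keys with ijson without building any values in memory:

import ijson

def top_level_keys(path):
    # Walk the low-level event stream; scalar events never
    # materialize whole subtrees, so memory stays flat
    with open(path, 'rb') as f:
        for prefix, event, value in ijson.parse(f):
            if prefix == '' and event == 'map_key':
                yield value

print(list(top_level_keys('large.json')))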

Streaming Parsers

Python ijson

import ijson

# Stream elements of the top-level 'items' array one at a time
with open('large.json', 'rb') as f:
    for item in ijson.items(f, 'items.item'):
        process(item)

Node.js stream-json

const fs = require('fs');
const { parser } = require('stream-json');
const { streamArray } = require('stream-json/streamers/StreamArray');

// Stream elements of the top-level array one at a time
fs.createReadStream('large.json')
  .pipe(parser())
  .pipe(streamArray())
  .on('data', ({ value }) => {
    process(value);
  });

Command Line with jq

# Stream processing: emit array items one per line
# (for truly huge files, jq also has a --stream mode)
jq -c '.items[]' large.json | head -10

# Count the elements of a field
jq '.data | length' large.json

# Filter
jq '.items[] | select(.active)' large.json

Memory Optimization

Process in Chunks

import json

def process_jsonl_chunks(filename, chunk_size=10000):
    # Accumulate JSONL records into fixed-size batches
    chunk = []
    with open(filename) as f:
        for line in f:
            chunk.append(json.loads(line))
            if len(chunk) >= chunk_size:
                yield chunk
                chunk = []
    if chunk:  # don't drop the final partial batch
        yield chunk

for chunk in process_jsonl_chunks('data.jsonl'):
    process_batch(chunk)

Generator Pattern

import ijson

def stream_records(filename):
    # Lazily yield records so only one lives in memory at a time
    with open(filename, 'rb') as f:
        for record in ijson.items(f, 'records.item'):
            yield record

total = sum(r['amount'] for r in stream_records('large.json'))

Splitting Large Files

import ijson
import json

def split_json_streaming(input_file, output_prefix, records_per_file=100000):
    chunk = []
    file_num = 1
    with open(input_file, 'rb') as f:
        # use_float=True yields floats instead of Decimal, so json.dump works
        for item in ijson.items(f, 'items.item', use_float=True):
            chunk.append(item)
            if len(chunk) >= records_per_file:
                with open(f'{output_prefix}_{file_num:04d}.json', 'w') as out:
                    json.dump(chunk, out)
                chunk = []
                file_num += 1
    if chunk:  # write any remaining records
        with open(f'{output_prefix}_{file_num:04d}.json', 'w') as out:
            json.dump(chunk, out)

JSON Lines Format

One JSON object per line, which is much easier to stream:

import json

# Convert to JSONL
with open('data.json') as f:
    data = json.load(f)

with open('data.jsonl', 'w') as f:
    for item in data['items']:
        f.write(json.dumps(item) + '\n')
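Reading it back is just as simple, since every line parses independently; a minimal sketch:

import json

# Stream records one at a time; memory stays flat regardless of file size
with open('data.jsonl') as f:
    for line in f:
        record = json.loads(line)
        process(record)  # process() is the same placeholder used above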

Performance Comparison

| Method          | 100MB | 1GB  | Memory         |
|-----------------|-------|------|----------------|
| json.load()     | 5s    | 50s+ | 2-3x file size |
| ijson           | 15s   | 150s | ~50MB          |
| jq              | 3s    | 30s  | Streaming      |
| Big JSON Viewer | 2s    | 10s  | Optimized      |

Best Practices

  • Know the file size first
  • Use streaming for files > 100MB (see the sketch after this list)
  • Convert to JSONL for repeated processing
  • Consider a database for very large data
  • Monitor memory usage
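A minimal sketch of the first two points, using the 100MB rule of thumb above to pick a strategy (the helper name is illustrative):

import os

def choose_strategy(path, threshold_mb=100):
    # Check the size before parsing, then load eagerly or stream
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return 'stream' if size_mb >= threshold_mb else 'load'

print(choose_strategy('large.json'))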
Database Alternative

For very large datasets, load the data into a database:

import sqlite3
import json
import ijson

conn = sqlite3.connect('data.db')
cursor = conn.cursor()

cursor.execute('''
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        data TEXT
    )
''')

# Stream records from the JSON file straight into SQLite
with open('large.json', 'rb') as f:
    for item in ijson.items(f, 'items.item', use_float=True):
        cursor.execute('INSERT INTO items (data) VALUES (?)',
                       (json.dumps(item),))

conn.commit()
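The payoff is that the data becomes queryable without re-parsing the file. A sketch assuming a SQLite build with the JSON1 extension (standard in modern builds) and a hypothetical name field:

import sqlite3

conn = sqlite3.connect('data.db')
cursor = conn.cursor()

# json_extract pulls a field out of the stored JSON text
cursor.execute("SELECT json_extract(data, '$.name') FROM items LIMIT 10")
print(cursor.fetchall())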

Monitoring Performance

import time
import psutil

def measure_memory():
    # Resident set size of the current process, in MB
    process = psutil.Process()
    return process.memory_info().rss / 1024 / 1024

start_mem = measure_memory()
start_time = time.time()

# Process the data
for item in stream_records('large.json'):
    process(item)

print(f"Time: {time.time() - start_time:.2f}s")
print(f"Memory: {measure_memory() - start_mem:.2f}MB")

Conclusion

For files over 100MB, use Big JSON Viewer for viewing and ijson or jq for processing. Don't let file size slow you down!
