Lädt...

🔧 Stop One-By-One Inserts! Bulk Insert Data In Rails With insert_all Or activerecord-import ⚡


Nachrichtenbereich: 🔧 Programmierung
🔗 Quelle: dev.to

If your CSV import is slower than dial-up internet, discover how bulk inserts can rocket your Rails performance. 🐢

1. Why Bulk Inserts Matter?

When dealing with large datasets for example, you want to import a CSV containing more than a thousand of tasks from your department, inserting records one by one (.create or .save) can drastically slow down performance and cause database bottlenecks

Imagine inserting 100,000 records—doing it traditionally means 100,000 separate queries! Instead, bulk inserting reduces this to a single efficient query, saving time and resources.

2. Prerequisites

Before diving in, ensure:

  • You're using Rails 6+ for insert_all
  • You have the activerecord-import gem installed if using an older version which supports Rails 3.x

3. Step-by-Step Guide

3.1 The Problem With Each-By-Each Inserts

For example, you want to create 100k records of tasks by a naive way, this is how it look like:

  # Create 100k records each times
  100_000.times do |i|
    Task.create!({ name: "Task #{i}", description: "This is description of task #{i}", is_finished: false })
  end

I did some benchmark here with this snippet:

  execution_time = Benchmark.measure do
    100_000.times do |i|
      Task.create!({ name: "Task #{i}", description: "This is description of task #{i}", is_finished: false })
    end
  end

  puts "Execution time: #{execution_time.real} seconds"

Benchmark One-by-One

Image shows that the execution of record creation takes 20.99 seconds

✅ Pros:

  1. Models validation or callbacks works - the name field is not null but there is a call with null name, the insert will be raise with errors
  2. Older Rails versions < 6.0 support - Legacy project without bulk insert support.

❌ Cons:

  1. Performance Issue - Each insert requires a separate database transaction, increasing overhead.
  2. Increased Network Traffic - 100k request will be made into the database which increases network costs.

3.2 Solution 1: Use insert_all

The insert_all method allows you to insert multiple records at once, significantly boosting performance.

Example: Using insert_all for create 100k tasks

  tasks = []

  # Initialize 100k tasks hash then add to tasks array
  100_000.times do |i|
    tasks << { name: "Task #{i}", description: "This is description of task #{i}", is_finished: false }
  end

  # Bulk insert all the initialized tasks
  Task.insert_all(tasks)

Insert All

✅ Pros:

  1. Faster than .create - Single SQL query
  2. Rails built-in method - Bulk operations directly in Rails without extra gems.

❌ Cons:

  1. Does not accept ActiveRecord models – Only works with raw hashes.
  2. Bypasses model validations and callbacks – Data integrity must be handled manually.
  3. Cannot handle associations automatically – Requires extra queries to fetch related IDs.

Example: Inserting Tasks that Belong to Department

  departments = [{ name: "Marketing" }] 
  department = Department.insert_all(departments) 
  department_id = department.id # Error raised

  tasks = [{ name: "Task 0", description: "This is the task description", is_finished: false, department_id: ??? }] # missing department_id here

No ID returns error exception

Image shows that no id can be returned after using insert_all

👉 Fix: Manually retrieve department_id before inserting tasks, adding an extra query.

  department = Department.find_by(name: "Marketing") # cost a SQL query
  tasks = [{ name: "Task 0", description: "This is the task description", is_finished: false, department_id: department.id }]

3.3 Solution 2: Best of both worlds - Use activerecord-import gem

The import method allows you to insert models with associations

Example: Using import to create 100k tasks in 1 department

  department = Department.new(name: "Marketing")
  tasks = []

  100_000.times do |i|
    tasks << Task.new({ name: "Task #{i}", description: "This is description of task #{i}", is_finished: false })
  end
  department.tasks = tasks

  execution_time = Benchmark.measure do
    Department.import [department]
  end

  puts "Execution time: #{execution_time.real} seconds"
  • Able to import 1 department with 100k tasks with the use of Active Record Import
    Association Import

  • Observe the performance compared to insert_all is similar to each other ( About 1.44 seconds )

Activerecord-import benchmark

✅ Pros:

  • Works with raw columns and arrays of values (fastest)
  • Works with model objects (faster)
  • Performs validations (fast)
  • Performs on duplicate key updates (requires MySQL, SQLite 3.24.0+, or Postgres 9.5+)

❌ Cons:

  • Need an extra gem installed
  • ActiveRecord callbacks related to creatingupdating, or destroying records (other than before_validation and after_validation) will NOT be called when calling the import method. ( calling separately with run_callbacks as recommended )

4. Conclusion

One rule to remember when inserting a large number of records ( Example: Import a large number of records from a CSV file ) is to avoid creating records one-by-one. Instead, we can consider using activerecord-import or insert_all for a great performance.

5. Reference

...

🔧 [SQL Performance Killers] Individual inserts vs. Bulk inserts


📈 64.43 Punkte
🔧 Programmierung

🔧 Generating SQL Insert Statements from ActiveRecord Models in Ruby on Rails


📈 47.19 Punkte
🔧 Programmierung

🕵️ CVE-2023-22794 | activerecord Gem on Ruby ActiveRecord Comment sql injection


📈 44.75 Punkte
🕵️ Sicherheitslücken

🔧 Getting Stale Data for ActiveRecord Associations in Rails: `Model.reload` to fetch latest data


📈 38.06 Punkte
🔧 Programmierung

🔧 Introducing Rails Vault: simple to add settings to any ActiveRecord model


📈 32.8 Punkte
🔧 Programmierung

🔧 Optimize Database Performance in Ruby on Rails and ActiveRecord


📈 32.8 Punkte
🔧 Programmierung

🔧 SQL e Ruby: Explorando Banco de Dados Além do Rails ActiveRecord


📈 32.8 Punkte
🔧 Programmierung

🔧 What is Rails ActiveRecord?


📈 32.8 Punkte
🔧 Programmierung

🔧 Otimização de consultas com ActiveRecord e índices no Rails


📈 32.8 Punkte
🔧 Programmierung

🕵️ Ruby on Rails up to 3.0.3 SQL ActiveRecord .where sql injection


📈 32.8 Punkte
🕵️ Sicherheitslücken

🕵️ activerecord-session_store up to 1.1.3 on Ruby on Rails timing discrepancy


📈 32.8 Punkte
🕵️ Sicherheitslücken

🕵️ Ruby on Rails prior 0.5.0 ActiveRecord sql injection


📈 32.8 Punkte
🕵️ Sicherheitslücken

🔧 SQL INSERT Statement – How to Insert Data into a Table in SQL


📈 31.4 Punkte
🔧 Programmierung

🔧 LabEx Trending: Bulk Insert Data Into Course Schedule and More


📈 31.11 Punkte
🔧 Programmierung

🔧 LabEx Trending: Bulk Insert Data Into Course Schedule and More


📈 31.11 Punkte
🔧 Programmierung

🕵️ CVE-2023-42426 | Froala Editor 4.1.1 Insert Image Insert link cross site scripting


📈 28.77 Punkte
🕵️ Sicherheitslücken

🕵️ Medium CVE-2021-29933: Insert many project Insert many


📈 28.77 Punkte
🕵️ Sicherheitslücken

🔧 EF Core Bulk Insert: Boost Your Performance With Entity Framework Extensions


📈 28.48 Punkte
🔧 Programmierung

🔧 Bulk Insert/Update task


📈 28.48 Punkte
🔧 Programmierung

🔧 How to Perform Bulk Insert with EF Core


📈 28.48 Punkte
🔧 Programmierung

📰 How to Perform Bulk Insert/Update/Upsert Actions with SQLAlchemy ORM


📈 28.48 Punkte
🔧 AI Nachrichten

🔧 Bulk Email Service Provider &amp; Bulk email Services - Mass email Services


📈 28.2 Punkte
🔧 Programmierung

🍏 DRPU Bulk SMS 10.2.4.4 - Bulk messaging utility.


📈 28.2 Punkte
🍏 iOS / Mac OS

🔧 Python code that inserts a large number of records into a MySQL table.


📈 25.17 Punkte
🔧 Programmierung

🔧 How to find the number of SELECTs, INSERTs, DELETEs, and UPDATEs on a specific table in PostgreSQL?


📈 25.17 Punkte
🔧 Programmierung

🍏 Fix: Vision Pro Lens Glare Using ZEISS Inserts


📈 25.17 Punkte
🍏 iOS / Mac OS

🍏 Hands on with Apple Vision Pro ZEISS optical inserts


📈 25.17 Punkte
🍏 iOS / Mac OS

📰 Sehhilfen für Vision Pro: Optical Inserts von Zeiss brauchen ein Pairing


📈 25.17 Punkte
📰 IT Nachrichten

📰 Cubase 9.5: Musik-DAW mit mehr Inserts und Bezierkurven


📈 25.17 Punkte
📰 IT Nachrichten

matomo