

A Comprehensive Guide to Scraping Instagram Data: How to Bypass Instagram Login While Scraping - Facebook Spy / Meta Spy


Source: dev.to

Meta Spy: https://github.com/DEENUU1/meta-spy
Full code is available here: https://pastebin.com/QMmDUZtj

Info

This article is based on Meta Spy (formerly Facebook Spy), a project I am still actively developing. This week I started adding commands for scraping data from Instagram. My plan is to extend the app to cover all Meta applications and to add a GUI built with the Flet framework, because typing these commands by hand is getting tedious.

How to bypass login?

Bypassing Instagram's login process might sound like a daunting task, but it's surprisingly straightforward. We'll extract the sessionid key from a browser where we're already logged in and integrate it into the Selenium driver. Here's a step-by-step guide:

  1. Launch Instagram in your browser and press F12 to open the Developer Tools.
  2. In the Developer Tools, open the storage panel ("Application" in Chrome, "Storage" in Firefox).
  3. Expand the "Cookies" section and select the cookies for instagram.com.
  4. Copy the sessionid value (a minimal sketch of attaching it in Selenium follows below).
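
Before wiring this into the scraper classes below, here is a minimal standalone sketch of the idea. It assumes the copied value lives in an environment variable named INSTAGRAM_SESSIONID (my own naming convention, not something Meta Spy requires); note that Selenium can only set a cookie for the domain it is currently on, so instagram.com has to be opened first:

import os

from selenium import webdriver

session_id = os.environ["INSTAGRAM_SESSIONID"]  # hypothetical env var holding your sessionid

driver = webdriver.Chrome()
driver.get("https://www.instagram.com/")  # must be on the domain before setting its cookie
driver.add_cookie(
    {"name": "sessionid", "value": session_id, "domain": ".instagram.com"}
)
driver.refresh()  # reload so the authenticated session takes effect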


It's time to write some code

Now that we've covered the initial steps, it's time to dive into the code.

Setting Up Chrome Driver Options

To begin, we'll create a class with a static method that simplifies the configuration of the Chrome driver. This class will serve as the foundation for our scraper.


from typing import List  
from time import sleep  
from selenium.webdriver.common.by import By  
from selenium import webdriver  
from selenium.webdriver.support.ui import WebDriverWait  
from selenium.webdriver.chrome.options import Options

class Scraper:  

    @staticmethod  
    def _chrome_driver_configuration() -> Options:  
        chrome_options = Options()  
        chrome_options.add_argument("--disable-notifications")  
        chrome_options.add_argument("--disable-extensions")  
        chrome_options.add_argument("--disable-popup-blocking")  
        chrome_options.add_argument("--disable-default-apps")  
        chrome_options.add_argument("--disable-infobars")  
        chrome_options.add_argument("--disable-web-security")  
        chrome_options.add_argument(  
            "--disable-features=IsolateOrigins,site-per-process"  
        )  
        chrome_options.add_argument(  
            "--enable-features=NetworkService,NetworkServiceInProcess"  
        )  
        chrome_options.add_argument("--profile-directory=Default")  
        chrome_options.add_experimental_option("excludeSwitches", ["enable-logging"])  
        return chrome_options

Implementing the Base Scraper Class

While this tutorial might appear to introduce more classes than necessary, it aligns with our modular approach to project development. This approach allows us to showcase the complete implementation of specific functionalities.


class BaseInstagramScraper(Scraper):  
    def __init__(self, user_id: str, base_url: str) -> None:  
        super().__init__()  
        self._user_id = user_id  
        self._base_url = base_url.format(self._user_id)  
        self._driver = webdriver.Chrome(options=self._chrome_driver_configuration())  
        self._driver.get(self._base_url)  
        self._wait = WebDriverWait(self._driver, 10)

Scrolling

Retrieving the full content from Instagram profiles requires scrolling, but it's not as simple as a one-time scroll-and-scrape process. When scrolling through a profile, data appears and disappears dynamically. As only a few rows of images are visible at a time, scrolling to the end and scraping the data is not feasible. To address this, we've created a function that provides a callback mechanism for dynamic content retrieval.

A standard scrolling helper would simply scroll to the bottom of the page and then capture whatever is visible. Here, instead, a callback is invoked after every scroll step, so content is captured while it is still present in the DOM.

def scroll_page_callback(driver, callback) -> None:  
    """  
    Scrolls the page to load more data from a website    """    try:  
        last_height = driver.execute_script("return document.body.scrollHeight")  
        consecutive_scrolls = 0  

        while consecutive_scrolls < 3:  
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")  

            sleep(3)  
            new_height = driver.execute_script("return document.body.scrollHeight")  

            if new_height == last_height:  
                consecutive_scrolls += 1  
            else:  
                consecutive_scrolls = 0  

            last_height = new_height  

            callback(driver)  

    except Exception as e:  
        logs.log_error(f"Error occurred while scrolling: {e}")

Scraping data

Now, let's put all the pieces together and explore the main class responsible for scraping Instagram data.

class ProfileScraper(BaseInstagramScraper):  
    def __init__(self, user_id: str) -> None:  
        super().__init__(user_id, base_url=f"https://www.instagram.com/{user_id}/")  
        self._driver.add_cookie(  
            {  
                "name": "sessionid",  
                "value": "your_sessionid_goes_HERE",  
                "domain": ".instagram.com",  
            }  
        )  
        self._refresh_driver()  

    def _refresh_driver(self) -> None:  
        self._driver.refresh()

The ProfileScraper class inherits from BaseInstagramScraper, which already provides the Chrome driver configuration and opens the profile URL. We add the sessionid cookie to the driver, making sure the "value" field contains your own sessionid. Next, we call the method:

self._refresh_driver()

This method refreshes the page so that the newly added cookie is actually applied.

def extract_images(self) -> List[str]:  
    extracted_image_urls = []  
    try:  

        def extract_callback(driver):  
            # These are Instagram's auto-generated CSS class names for post
            # thumbnails; they change over time, so update the selector if
            # no images are returned.
            img_elements = self._driver.find_elements(
                By.CLASS_NAME,
                "x5yr21d.xu96u03.x10l6tqk.x13vifvy.x87ps6o.xh8yej3",
            )
            for img_element in img_elements:  
                src_attribute = img_element.get_attribute("src")  
                if src_attribute and src_attribute not in extracted_image_urls:  
                    #print(f"Extracted image URL: {src_attribute}")  
                    extracted_image_urls.append(src_attribute)  
        scroll_page_callback(self._driver, extract_callback)  

    except Exception as e:  
        print(f"An  error occurred while extracting images: {e}")  

    return extracted_image_urls

The core of this class is the extract_images method, which returns a list of all scraped image URLs. Inside it, the extract_callback function locates the image elements, reads their src attributes, and appends new URLs to the extracted_image_urls list while skipping duplicates (the commented-out print is handy for debugging).

Finally, we call scroll_page_callback with the Chrome driver and the extraction callback as arguments, so the page keeps scrolling while the data is collected.
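
One caveat: the class name passed to find_elements is one of Instagram's auto-generated CSS classes and breaks whenever the frontend is rebuilt. A more forgiving variant (my own suggestion, not part of Meta Spy) is to collect every img tag and filter by its src, since post images are served from cdninstagram.com, as in the sample output below:

        def extract_callback(driver):
            # Fallback selector: take every <img> and keep CDN-hosted photos only.
            for img_element in driver.find_elements(By.TAG_NAME, "img"):
                src_attribute = img_element.get_attribute("src")
                if (
                    src_attribute
                    and "cdninstagram.com" in src_attribute
                    and src_attribute not in extracted_image_urls
                ):
                    extracted_image_urls.append(src_attribute)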

With this comprehensive guide, you're well-equipped to dive into Instagram data scraping with Meta Spy. As we continue developing this project, expect more features and functionalities that expand its capabilities across all Meta applications. And don't forget, our plans to integrate Flet as a GUI promise to make the experience even more user-friendly. Happy scraping!

Running code


if __name__ == "__main__":  
    scraper = ProfileScraper("sawardega_wataha")  
    data = scraper.extract_images()  
    print(len(data))  
    print(data[0])

Pass the user_id of the Instagram account you want to scrape to the ProfileScraper class.

Results

> python .\main.py
33  # the number of scraped URLs
# the full URL of one scraped image
https://scontent-waw1-1.cdninstagram.com/v/t51.2885-15/387688415_1338700880368645_3875950289382108239_n.jpg?stp=dst-jpg_e35&efg=eyJ2ZW5jb2RlX3RhZyI6ImltYWdlX3VybGdlbi4xNDQweDE4MDAuc2RyIn0&_nc_ht=scontent-waw1-1.cdnin
stagram.com&_nc_cat=101&_nc_ohc=-w6WTMiiWj4AX-_Qfkt&edm=ACWDqb8BAAAA&ccb=7-5&ig_cache_key=MzIxMTM2ODUyNjYzMDkzMTEzMA%3D%3D.2-ccb7-5&oh=00_AfDoHMVh0dS6msk5yKaW9d81HCeCSgBUJzW82sKRHYRvwQ&oe=65433911&_nc_sid=ee9879

...
