In Adobe Experience Platform, there are two primary methods for ingesting data: batch ingestion and streaming ingestion. The key difference between them lies in how data is collected, processed, and made available in the platform.
Batch Ingestion
Batch ingestion is the process of loading large, static data files into Adobe Experience Platform. This method is suitable for data that is collected and stored over a period of time, such as daily logs or weekly reports. The data is typically stored in a file format like CSV, JSON, or Parquet, and then uploaded to the platform in batches.
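As a minimal illustration of the file-preparation step, the sketch below writes a list of records to a newline-delimited JSON file using only the standard library. The field names (person_id, email) and the output filename are hypothetical placeholders; in practice they would need to match your target XDM schema.

```python
import json
import tempfile
from pathlib import Path

def write_jsonl(records, path):
    """Write records as newline-delimited JSON, one object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path

# Hypothetical records; field names must match the target XDM schema.
records = [
    {"person_id": "A1", "email": "a@example.com"},
    {"person_id": "B2", "email": "b@example.com"},
]

out = Path(tempfile.gettempdir()) / "daily_upload.jsonl"
write_jsonl(records, out)
```

Newline-delimited JSON keeps each record self-contained, which makes large files easy to split and validate before upload.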
The batch ingestion process typically involves the following steps:
- Prepare the data file(s) in the supported format(s).
- Create a batch ingestion source in Adobe Experience Platform.
- Map the data file(s) to the appropriate Experience Data Model (XDM) schema.
- Schedule or manually trigger the ingestion process.
- Monitor the ingestion process and handle any errors or issues.
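The steps above can be sketched against Adobe's Batch Ingestion API. The endpoint path and header names below reflect the public documentation at the time of writing but should be treated as assumptions and verified against the current API reference; the dataset and batch IDs are placeholders, and the actual HTTP calls are omitted.

```python
# Endpoint and header names follow Adobe's Batch Ingestion API docs;
# verify them against the current documentation before use.
BASE_URL = "https://platform.adobe.io/data/foundation/import"

def build_headers(access_token, api_key, org_id, sandbox="prod"):
    """Assemble the headers AEP APIs expect (all values are placeholders)."""
    return {
        "Authorization": f"Bearer {access_token}",
        "x-api-key": api_key,
        "x-gw-ims-org-id": org_id,
        "x-sandbox-name": sandbox,
        "Content-Type": "application/json",
    }

def build_create_batch_request(dataset_id, fmt="json"):
    """Return the URL and JSON body for registering a new batch."""
    url = f"{BASE_URL}/batches"
    body = {"datasetId": dataset_id, "inputFormat": {"format": fmt}}
    return url, body

def build_complete_batch_url(batch_id):
    """Return the URL that signals a batch is fully uploaded."""
    return f"{BASE_URL}/batches/{batch_id}?action=COMPLETE"

# Example: only the request shapes are shown; the POST/PUT calls
# (e.g. via the requests library) are intentionally omitted.
url, body = build_create_batch_request("5c8c3c555033b814b69f947f")
```

The typical flow is: create a batch, upload one or more files into it, then signal completion so the platform begins validation and ingestion.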
Batch ingestion is well-suited for large volumes of data that do not require real-time processing. It is often used for historical data ingestion, data backfills, or periodic data uploads from various sources.
Streaming Ingestion
Streaming ingestion, on the other hand, is the process of ingesting data into Adobe Experience Platform in real time or near real time. This method is suitable for data that is generated continuously, such as website interactions, mobile app events, or IoT sensor data.
The streaming ingestion process typically involves the following steps:
- Configure a streaming source in Adobe Experience Platform.
- Map the streaming data to the appropriate XDM schema.
- Send the data to the streaming endpoint using a supported protocol (e.g., HTTP API, Amazon Kinesis, Google Cloud Pub/Sub).
- Monitor the ingestion process and handle any errors or issues.
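To make the streaming steps concrete, the sketch below wraps a single XDM event in the envelope that the streaming HTTP endpoint expects. The envelope field names follow Adobe's streaming ingestion documentation but should be verified against the current API reference; the schema ID, IMS org, dataset ID, and event fields are all hypothetical placeholders.

```python
from datetime import datetime, timezone

def build_streaming_payload(schema_id, ims_org, dataset_id, event):
    """Wrap an XDM event in the streaming ingestion envelope.

    The envelope shape follows Adobe's streaming ingestion docs at the
    time of writing; verify field names against the current reference.
    """
    return {
        "header": {
            "schemaRef": {
                "id": schema_id,
                "contentType": "application/vnd.adobe.xed-full+json;version=1",
            },
            "imsOrgId": ims_org,
            "datasetId": dataset_id,
            "source": {"name": "example-website"},
        },
        "body": {
            "xdmMeta": {"schemaRef": {"id": schema_id}},
            "xdmEntity": event,
        },
    }

# Hypothetical page-view event; fields must match your XDM schema.
event = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "eventType": "web.webpagedetails.pageViews",
}
payload = build_streaming_payload(
    "https://ns.adobe.com/example/schemas/abc123",  # placeholder schema ID
    "EXAMPLE_ORG@AdobeOrg",                         # placeholder IMS org
    "5c8c3c555033b814b69f947f",                     # placeholder dataset ID
    event,
)
# A real client would POST this JSON to the inlet URL returned when the
# streaming connection was created (e.g. under dcs.adobedc.net).
```

Because each event is a single small JSON document, clients can send it the moment it occurs, which is what enables the near real-time latency described above.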
Streaming ingestion is well-suited for real-time or near real-time data processing and analysis, enabling organizations to react quickly to customer interactions, detect anomalies, or trigger automated workflows based on incoming data.
Key Considerations
When deciding between batch and streaming ingestion, consider the following factors:
- Data Volume: Batch ingestion is generally more efficient for large volumes of data, while streaming ingestion is better suited for smaller, continuous data streams.
- Latency Requirements: If you need to process and analyze data in real time or near real time, streaming ingestion is the preferred method. If latency is not a critical factor, batch ingestion may be more suitable.
- Data Source: Batch ingestion is often used for ingesting data from sources that generate data in batches, such as logs or reports. Streaming ingestion is more appropriate for sources that generate data continuously, like website interactions or IoT devices.
- Data Processing Requirements: If you need to perform complex transformations or enrichment before the data lands in the platform, batch ingestion may be more suitable, since files can be preprocessed before upload.
Adobe Experience Platform supports both batch and streaming ingestion, so organizations can choose, or combine, the methods that best fit their specific use cases and data requirements.