Extract or rearrange specific pages from a PDF document. This endpoint allows you to create a new PDF containing only selected pages, and you can specify the order in which pages appear in the output.
Endpoint
POST /extractPages
Authentication
Requires a valid API key or OAuth token in the Authorization header:
Authorization: Bearer YOUR_API_KEY
See Authentication for details.
Request Body
Content-Type: application/json or multipart/form-data
| Field | Type | Required | Description |
|---|---|---|---|
| pdf_base64 | string | Conditional | Base64-encoded PDF file |
| pdf_url | string | Conditional | URL to fetch the PDF from |
| file | file | Conditional | PDF file upload (multipart only) |
| pages | array/string | Yes | Page numbers to extract (1-indexed) |
| filename | string | No | Output filename (default: “extracted.pdf”) |
| storage | object | No | Storage options for the generated PDF (see below) |
| return_binary | boolean | No | Return PDF binary even when using storage (default: false) |
Note: You must provide exactly one of: pdf_base64, pdf_url, or file.
Field Details
pdf_base64 (conditional)
A base64-encoded PDF file. Use this when you have the PDF data in memory or need to send it as part of a JSON payload. The encoded string should not include the data URI prefix.
pdf_url (conditional)
A publicly accessible URL where the PDF can be fetched. The server will download the PDF from this URL before processing. Supports redirects.
file (conditional, multipart only)
Direct file upload via multipart form data. This is the simplest option when you have the PDF file available locally.
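The "exactly one input" rule and the base64 format for JSON requests can be sketched as a small client-side helper. This is only an illustration: `build_source_fields` is a hypothetical name, not part of the API, and it covers the two JSON inputs (`pdf_base64` and `pdf_url`) since `file` is sent via multipart instead.

```python
import base64
from typing import Optional


def build_source_fields(
    pdf_bytes: Optional[bytes] = None,
    pdf_url: Optional[str] = None,
) -> dict:
    """Return the source field for a JSON /extractPages request body.

    Hypothetical helper: exactly one of pdf_bytes or pdf_url must be given.
    """
    provided = [value for value in (pdf_bytes, pdf_url) if value is not None]
    if len(provided) != 1:
        raise ValueError("Provide exactly one of pdf_bytes or pdf_url")

    if pdf_url is not None:
        return {"pdf_url": pdf_url}

    # b64encode returns one unwrapped line without a "data:...;base64," prefix.
    return {"pdf_base64": base64.b64encode(pdf_bytes).decode("ascii")}
```

The returned dictionary can be merged with `pages` and the optional fields before being sent as the JSON body.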
pages (required)
Specify which pages to extract and their order:
- JSON format: Array of integers
  [1, 3, 2, 5]
- Multipart format: Comma-separated string
  "1,3,2,5"
Page numbers are 1-indexed (first page is 1, not 0). Pages will appear in the output PDF in the order specified. You can:
- Extract a subset of pages
- Reorder pages by listing them in your desired sequence
- Duplicate pages by including the same page number multiple times
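As a rough sketch of the two `pages` formats, the helpers below turn one Python list into either the JSON array or the multipart string. The function names are hypothetical, and the client-side checks (non-empty list, positive integers) are assumptions about what the server accepts rather than documented behavior.

```python
from typing import List


def pages_for_json(pages: List[int]) -> List[int]:
    """Return the page list for a JSON body after basic sanity checks."""
    if not pages:
        raise ValueError("pages must not be empty")
    if any(not isinstance(p, int) or p < 1 for p in pages):
        raise ValueError("page numbers are 1-indexed positive integers")
    # Order and duplicates are preserved on purpose: both are meaningful.
    return list(pages)


def pages_for_multipart(pages: List[int]) -> str:
    """Return the comma-separated string used in multipart requests."""
    return ",".join(str(p) for p in pages_for_json(pages))
```

For example, `pages_for_multipart([1, 3, 2, 5])` yields `"1,3,2,5"`, matching the multipart format shown above.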
filename (optional)
The filename for the output PDF. Defaults to “extracted.pdf”. The .pdf extension is automatically added if not provided.
Storage Options
Control how the generated PDF is stored and returned.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| storage.mode | string | No | memory | Storage mode: memory, default, or byob |
| storage.filename | string | No | endpoint default | Custom filename (auto-suffixed with timestamp if duplicate) |
| storage.expires_in | integer | No | 3600 | Signed URL expiry in seconds (60-604800) |
| storage.retention_days | integer | No | 14 | Auto-delete document after N days (1-365) |
Storage Modes:
- memory: Return PDF bytes directly in response (no persistence). This is the default for REST API calls.
- default: Store in pdf-mcp S3 bucket, return JSON with document metadata and signed URL.
- byob: Store in your own S3 bucket (requires BYOB configuration).
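The storage fields and their documented ranges could be checked on the client before a request is sent. The following is a minimal sketch; `build_storage_options` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
from typing import Optional

VALID_MODES = {"memory", "default", "byob"}


def build_storage_options(
    mode: str = "memory",
    filename: Optional[str] = None,
    expires_in: int = 3600,
    retention_days: int = 14,
) -> dict:
    """Build the "storage" object, enforcing the ranges from the table above."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if not 60 <= expires_in <= 604800:
        raise ValueError("expires_in must be between 60 and 604800 seconds")
    if not 1 <= retention_days <= 365:
        raise ValueError("retention_days must be between 1 and 365")

    options = {
        "mode": mode,
        "expires_in": expires_in,
        "retention_days": retention_days,
    }
    if filename is not None:
        options["filename"] = filename
    return options
```

The result is meant to be placed under the `storage` key of the JSON request body.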
Example Request
Extract Specific Pages (JSON with Base64)
# First, encode your PDF to base64 as a single line (line wraps would break the JSON string)
PDF_BASE64=$(base64 < document.pdf | tr -d '\n')
curl -X POST https://api.pdf-mcp.io/extractPages \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"pdf_base64": "'"$PDF_BASE64"'",
"pages": [1, 3, 5]
}'
Extract from URL
curl -X POST https://api.pdf-mcp.io/extractPages \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"pdf_url": "https://example.com/documents/report.pdf",
"pages": [1, 2, 3],
"filename": "first-three-pages.pdf"
}'
Reorder Pages
curl -X POST https://api.pdf-mcp.io/extractPages \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"pdf_url": "https://example.com/documents/report.pdf",
"pages": [3, 1, 2]
}'
This extracts pages 3, 1, and 2 (in that order), effectively moving page 3 to the front.
Using File Upload (Multipart)
curl -X POST https://api.pdf-mcp.io/extractPages \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@document.pdf" \
-F "pages=1,3,5"
File Upload with Custom Filename
curl -X POST https://api.pdf-mcp.io/extractPages \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@document.pdf" \
-F "pages=2,4,6,8" \
-F "filename=even-pages.pdf"
With Storage (Persistent PDF)
curl -X POST https://api.pdf-mcp.io/extractPages \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"pdf_url": "https://example.com/documents/report.pdf",
"pages": [1, 3, 5],
"storage": {
"mode": "default",
"filename": "key-pages.pdf",
"expires_in": 86400,
"retention_days": 30
}
}'
Response
Success (memory mode)
When storage.mode is memory or not provided, returns the PDF as a binary file download.
Response Headers:
| Header | Description |
|---|---|
| Content-Type | application/pdf |
| Content-Disposition | attachment; filename="extracted.pdf" (or custom filename) |
Success (with storage)
When storage.mode is default or byob, returns JSON with document metadata and a signed download URL:
{
"success": true,
"document_id": "550e8400-e29b-41d4-a716-446655440000",
"url": "https://s3.eu-central-1.amazonaws.com/...",
"filename": "key-pages.pdf",
"file_size_bytes": 23456,
"page_count": 3,
"storage_mode": "default",
"expires_at": "2024-02-15T10:30:00Z",
"signed_url_expires_at": "2024-01-15T12:00:00Z"
}
| Field | Type | Description |
|---|---|---|
| success | boolean | Always true on success |
| document_id | string | UUID of the stored document |
| url | string | Presigned URL for downloading the PDF |
| filename | string | Document filename |
| file_size_bytes | integer | Size of the PDF in bytes |
| page_count | integer | Number of pages in the PDF |
| storage_mode | string | Storage mode used (default or byob) |
| expires_at | string | Auto-deletion timestamp (ISO 8601), if retention is set |
| signed_url_expires_at | string | Expiration time of the signed URL (ISO 8601) |
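Because a successful response is either raw PDF bytes or JSON metadata, a client may want to branch on the response type. The sketch below assumes that storage responses carry a `Content-Type` of `application/json` (the documentation only states that JSON is returned); `parse_extract_response` is a hypothetical helper name.

```python
import json
from typing import Union


def parse_extract_response(content_type: str, body: bytes) -> Union[bytes, dict]:
    """Return PDF bytes (memory mode) or the metadata dict (storage modes)."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/pdf":
        return body
    if media_type == "application/json":  # assumed header for storage modes
        return json.loads(body.decode("utf-8"))
    raise ValueError(f"Unexpected Content-Type: {content_type!r}")
```

With the `requests` library this might be called as `parse_extract_response(response.headers["Content-Type"], response.content)`.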
Error
{
"error": "Failed to extract pages",
"message": "Error description"
}
Status Codes:
| Code | Description |
|---|---|
| 200 | Success - PDF returned or stored |
| 400 | Bad Request - Invalid input or BYOB not configured |
| 401 | Unauthorized - Missing or invalid Authorization header |
| 402 | Payment Required - Insufficient credits |
| 403 | Forbidden - Invalid API key or OAuth token |
| 500 | Internal Server Error - Page extraction failed |
| 502 | Bad Gateway - Storage operation failed (S3 upload or URL generation) |
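One possible way to surface these status codes in client code is to wrap the error body in an exception. This is a sketch only: `ExtractPagesError`, `error_from_response`, and the hint texts are hypothetical and simply paraphrase the table above.

```python
STATUS_HINTS = {
    400: "check the request fields or the BYOB configuration",
    401: "send a valid Authorization header",
    402: "add credits to the account",
    403: "check the API key or OAuth token",
    500: "page extraction failed on the server",
    502: "the storage operation failed",
}


class ExtractPagesError(Exception):
    """Hypothetical client-side exception for non-200 responses."""

    def __init__(self, status_code: int, error: str, message: str):
        self.status_code = status_code
        self.error = error
        self.message = message
        hint = STATUS_HINTS.get(status_code, "unexpected status code")
        super().__init__(f"{status_code} {error}: {message} ({hint})")


def error_from_response(status_code: int, body: dict) -> ExtractPagesError:
    """Build an exception from the status code and the parsed JSON error body."""
    return ExtractPagesError(
        status_code,
        body.get("error", "Unknown error"),
        body.get("message", ""),
    )
```

A caller would typically raise the returned exception whenever the status code is not 200.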
Page Extraction Details
How Page Extraction Works
The API uses PyPDF to read the source PDF and create a new PDF with only the selected pages. The original PDF is never modified.
Page Number Validation
- Page numbers are 1-indexed (first page is 1)
- Requesting a page number outside the valid range (less than 1 or greater than total pages) will return an error
- Use the Page Count endpoint first if you need to know the total number of pages
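The range rule above can be mirrored on the client once the total page count is known. In this sketch `total_pages` is assumed to come from a prior Page Count call (not shown here), and `out_of_range_pages` is a hypothetical helper name.

```python
from typing import List


def out_of_range_pages(pages: List[int], total_pages: int) -> List[int]:
    """Return the requested page numbers that fall outside 1..total_pages."""
    return [p for p in pages if p < 1 or p > total_pages]
```

An empty result means every requested page exists; anything else lists the page numbers that would likely trigger an error.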
Page Order
Pages are added to the output PDF in the exact order specified in the pages array. This allows you to:
- Extract sequentially: [1, 2, 3] - pages in original order
- Reverse order: [5, 4, 3, 2, 1] - pages in reverse
- Reorder: [3, 1, 4, 2] - custom arrangement
- Skip pages: [1, 3, 5, 7] - every other page
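For longer documents, page lists like the ones above are easier to generate than to type. The helpers below are a sketch with hypothetical names; `total` is the document's page count and is assumed to be known already.

```python
from typing import List


def first_n(n: int, total: int) -> List[int]:
    """First n pages, clamped to the document length."""
    return list(range(1, min(n, total) + 1))


def last_n(n: int, total: int) -> List[int]:
    """Last n pages, clamped to the document length."""
    return list(range(max(1, total - n + 1), total + 1))


def every_other(total: int, start: int = 1) -> List[int]:
    """Every second page: start=1 gives odd pages, start=2 gives even pages."""
    return list(range(start, total + 1, 2))


def reversed_pages(total: int) -> List[int]:
    """All pages in reverse order."""
    return list(range(total, 0, -1))


def move_to_front(page: int, total: int) -> List[int]:
    """All pages, with one page moved to the front."""
    return [page] + [p for p in range(1, total + 1) if p != page]
```

Each function returns a plain list that can be used directly as the `pages` value of a JSON request.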
Use Cases
Split a Long Document
# Extract first 10 pages of a long report
curl -X POST https://api.pdf-mcp.io/extractPages \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"pdf_url": "https://example.com/documents/annual-report.pdf",
"pages": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
"filename": "executive-summary.pdf"
}'
Remove Unwanted Pages
Extract only the pages you want, effectively removing unwanted content:
# Original has 5 pages, skip page 3 (unwanted ad page)
curl -X POST https://api.pdf-mcp.io/extractPages \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@document.pdf" \
-F "pages=1,2,4,5"
Rearrange Document Order
# Move the conclusion (page 10) to the front
curl -X POST https://api.pdf-mcp.io/extractPages \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"pdf_url": "https://example.com/documents/thesis.pdf",
"pages": [10, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"filename": "thesis-reordered.pdf"
}'
Extract Cover Page
curl -X POST https://api.pdf-mcp.io/extractPages \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"pdf_url": "https://example.com/documents/ebook.pdf",
"pages": [1],
"filename": "cover.pdf"
}'
Example: Python Integration
import requests
API_KEY = "YOUR_API_KEY"
API_URL = "https://api.pdf-mcp.io/extractPages"
# Extract pages from a URL
response = requests.post(
    API_URL,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json={
        "pdf_url": "https://example.com/documents/report.pdf",
        "pages": [1, 3, 5],
        "filename": "selected-pages.pdf"
    }
)
if response.status_code == 200:
    with open("selected-pages.pdf", "wb") as f:
        f.write(response.content)
    print("PDF saved successfully!")
else:
    print(f"Error: {response.status_code}")
Using File Upload (Python)
import requests
API_KEY = "YOUR_API_KEY"
API_URL = "https://api.pdf-mcp.io/extractPages"
# Extract pages from a local file
with open("document.pdf", "rb") as f:
    response = requests.post(
        API_URL,
        headers={
            "Authorization": f"Bearer {API_KEY}"
        },
        files={
            "file": ("document.pdf", f, "application/pdf")
        },
        data={
            "pages": "1,2,3",
            "filename": "first-three-pages.pdf"
        }
    )
if response.status_code == 200:
    with open("first-three-pages.pdf", "wb") as f:
        f.write(response.content)
    print("PDF saved successfully!")
else:
    print(f"Error: {response.status_code}")
Example: Node.js Integration
const fs = require('fs');
const API_KEY = 'YOUR_API_KEY';
const API_URL = 'https://api.pdf-mcp.io/extractPages';
async function extractPages(pdfUrl, pages, outputFilename) {
  const response = await fetch(API_URL, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      pdf_url: pdfUrl,
      pages: pages,
      filename: outputFilename
    })
  });
  if (response.ok) {
    const buffer = await response.arrayBuffer();
    fs.writeFileSync(outputFilename, Buffer.from(buffer));
    console.log('PDF saved successfully!');
  } else {
    console.error('Error:', response.status);
  }
}
extractPages(
  'https://example.com/documents/report.pdf',
  [1, 3, 5],
  'selected-pages.pdf'
);
Using File Upload (Node.js)
const fs = require('fs');
const path = require('path');
const API_KEY = 'YOUR_API_KEY';
const API_URL = 'https://api.pdf-mcp.io/extractPages';
async function extractPagesFromFile(filePath, pages, outputFilename) {
  const fileBuffer = fs.readFileSync(filePath);
  const formData = new FormData();
  formData.append('file', new Blob([fileBuffer]), path.basename(filePath));
  formData.append('pages', pages.join(','));
  formData.append('filename', outputFilename);
  const response = await fetch(API_URL, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`
    },
    body: formData
  });
  if (response.ok) {
    const buffer = await response.arrayBuffer();
    fs.writeFileSync(outputFilename, Buffer.from(buffer));
    console.log('PDF saved successfully!');
  } else {
    console.error('Error:', response.status);
  }
}
extractPagesFromFile('document.pdf', [1, 2, 3], 'first-three-pages.pdf');
Tips and Best Practices
Choosing Input Method
- File upload: Best for local files, simplest to implement
- Base64: Best for programmatic access when PDF is already in memory
- URL: Best for processing PDFs already hosted online
Performance Optimization
- Extract only the pages you need to minimize processing time
- For very large PDFs, consider checking the page count first
- Use URL-based input when possible to avoid base64 encoding overhead
Building Dynamic Documents
You can use page extraction to build custom documents by:
- Extracting specific pages from multiple source PDFs
- Using Merge PDFs to combine the extracted pages
Error Handling
- Always validate page numbers are within range before making the request
- Use the Page Count endpoint to verify total pages
- Implement retry logic for URL-based extraction (network failures)
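The retry advice above could be implemented with a small exponential-backoff wrapper. This is a sketch under stated assumptions: `call_with_retry` is a hypothetical helper, `send` is any caller-supplied function that performs the request and returns `(status_code, body)`, and treating 500 and 502 as retryable is a guess, since the documentation does not say which failures are transient.

```python
import time
from typing import Callable, Tuple

RETRYABLE_STATUS = {500, 502}  # assumption: these may be transient


def call_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() up to `attempts` times, doubling the delay between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    result = None
    last_error = None
    for attempt in range(attempts):
        try:
            status, body = send()
        except OSError as exc:  # network-level failures, incl. ConnectionError
            last_error = exc
        else:
            if status not in RETRYABLE_STATUS:
                return status, body
            last_error = None
            result = (status, body)
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)

    # All attempts used: re-raise the last network error or return the last response.
    if last_error is not None:
        raise last_error
    return result
```

Client errors such as 400, 401, 402, and 403 are returned immediately, since repeating an identical request is unlikely to change the outcome.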
Common Patterns
# Extract first page only
"pages": [1]
# Extract last 3 pages (assuming 10 total pages)
"pages": [8, 9, 10]
# Extract odd pages
"pages": [1, 3, 5, 7, 9]
# Extract even pages
"pages": [2, 4, 6, 8, 10]
# Reverse page order
"pages": [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
Related Endpoints
- Page Count - Get the number of pages in a PDF before extraction
- Merge PDFs - Combine multiple PDFs (including extracted pages)
- Extract Text - Extract text content from specific pages
Credit Usage
Approximately 1 credit per page in the output PDF.
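As a back-of-the-envelope sketch, the cost of a request can be estimated from the `pages` list itself. This assumes that duplicated page numbers each count as an output page, which follows from the "per page in the output PDF" wording but is not stated explicitly; `estimate_credits` is a hypothetical helper.

```python
from typing import List


def estimate_credits(pages: List[int]) -> int:
    """Approximate credits: one per output page, duplicates included."""
    return len(pages)
```

For instance, requesting pages `[1, 1, 2]` produces a three-page PDF, so the estimate is 3 credits.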