Extract or rearrange specific pages from a PDF document. This endpoint creates a new PDF that contains only the pages you select, in the order you specify.

Endpoint

POST /extractPages

Authentication

Requires a valid API key or OAuth token in the Authorization header:

Authorization: Bearer YOUR_API_KEY

See Authentication for details.


Request Body

Content-Type: application/json or multipart/form-data

Field          Type          Required     Description
pdf_base64     string        Conditional  Base64-encoded PDF file
pdf_url        string        Conditional  URL to fetch the PDF from
file           file          Conditional  PDF file upload (multipart only)
pages          array/string  Yes          Page numbers to extract (1-indexed)
filename       string        No           Output filename (default: “extracted.pdf”)
storage        object        No           Storage options for the generated PDF (see below)
return_binary  boolean       No           Return PDF binary even when using storage (default: false)

Note: You must provide exactly one of: pdf_base64, pdf_url, or file.
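The "exactly one source" rule can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name check_single_source is hypothetical, and it assumes the request fields are held in a plain dictionary.

```python
# Hypothetical client-side check; the API itself performs its own validation.
SOURCE_FIELDS = ("pdf_base64", "pdf_url", "file")


def check_single_source(fields):
    """Return the name of the one PDF source field, or raise ValueError."""
    provided = [name for name in SOURCE_FIELDS if fields.get(name) is not None]
    if len(provided) != 1:
        raise ValueError(
            f"expected exactly one of {', '.join(SOURCE_FIELDS)}; "
            f"got {provided or 'none'}"
        )
    return provided[0]
```

Calling it with a dictionary that holds both pdf_url and pdf_base64, or neither, raises ValueError before any network traffic happens.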

Field Details

pdf_base64 (conditional)

A base64-encoded PDF file. Use this when you have the PDF data in memory or need to send it as part of a JSON payload. Send only the encoded data, without a data URI prefix (for example, data:application/pdf;base64,).
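As an illustration, a JSON body with pdf_base64 could be prepared as follows. This is a sketch using only the Python standard library; the helper names (encode_pdf, strip_data_uri, build_base64_payload) are hypothetical and not part of the API.

```python
import base64
import json

# Prefix that browsers and some tools prepend to base64 data (must not be sent).
DATA_URI_PREFIX = "data:application/pdf;base64,"


def encode_pdf(pdf_bytes):
    """Base64-encode raw PDF bytes as a single-line ASCII string."""
    return base64.b64encode(pdf_bytes).decode("ascii")


def strip_data_uri(value):
    """Remove a leading data URI prefix if one is present."""
    if value.startswith(DATA_URI_PREFIX):
        return value[len(DATA_URI_PREFIX):]
    return value


def build_base64_payload(pdf_bytes, pages):
    """Build the JSON request body for the pdf_base64 input method."""
    return json.dumps({"pdf_base64": encode_pdf(pdf_bytes), "pages": list(pages)})
```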

pdf_url (conditional)

A publicly accessible URL where the PDF can be fetched. The server will download the PDF from this URL before processing. Supports redirects.

file (conditional, multipart only)

Direct file upload via multipart form data. This is the simplest option when you have the PDF file available locally.

pages (required)

Specify which pages to extract and their order:

  • JSON format: Array of integers [1, 3, 2, 5]
  • Multipart format: Comma-separated string "1,3,2,5"
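The two formats carry the same information, so one page list can be converted to either. A small sketch, with hypothetical helper names:

```python
def pages_for_json(pages):
    """Return the page list as integers, ready for a JSON request body."""
    return [int(p) for p in pages]


def pages_for_multipart(pages):
    """Return the page list as a comma-separated string for a form field."""
    return ",".join(str(int(p)) for p in pages)


print(pages_for_json([1, 3, 2, 5]))       # [1, 3, 2, 5]
print(pages_for_multipart([1, 3, 2, 5]))  # 1,3,2,5
```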

Page numbers are 1-indexed (first page is 1, not 0). Pages will appear in the output PDF in the order specified. You can:

  • Extract a subset of pages
  • Reorder pages by listing them in your desired sequence
  • Duplicate pages by including the same page number multiple times
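To make the ordering rules concrete, the sketch below simulates them on a list of page labels. It only models the behavior described above and does not call the API; apply_page_order is a hypothetical name.

```python
def apply_page_order(source_pages, pages):
    """Simulate extraction: pick 1-indexed pages in the order requested."""
    for n in pages:
        if not 1 <= n <= len(source_pages):
            raise ValueError(f"page {n} is outside 1..{len(source_pages)}")
    return [source_pages[n - 1] for n in pages]


source = ["A", "B", "C", "D", "E"]
print(apply_page_order(source, [1, 3]))        # ['A', 'C']  (subset)
print(apply_page_order(source, [3, 1, 4, 2]))  # ['C', 'A', 'D', 'B']  (reorder)
print(apply_page_order(source, [1, 1, 2]))     # ['A', 'A', 'B']  (duplicate)
```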

filename (optional)

The filename for the output PDF. Defaults to “extracted.pdf”. The .pdf extension is automatically added if not provided.

Storage Options

Control how the generated PDF is stored and returned.

Field                   Type     Required  Default           Description
storage.mode            string   No        memory            Storage mode: memory, default, or byob
storage.filename        string   No        endpoint default  Custom filename (auto-suffixed with timestamp if duplicate)
storage.expires_in      integer  No        3600              Signed URL expiry in seconds (60-604800)
storage.retention_days  integer  No        14                Auto-delete document after N days (1-365)

Storage Modes:

  • memory: Return PDF bytes directly in response (no persistence). This is the default for REST API calls.
  • default: Store in pdf-mcp S3 bucket, return JSON with document metadata and signed URL.
  • byob: Store in your own S3 bucket (requires BYOB configuration).
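The documented modes and ranges can be enforced before sending a request. The following sketch builds a storage object from the table above; build_storage_options is a hypothetical helper, and the server remains the authority on what it accepts.

```python
VALID_MODES = {"memory", "default", "byob"}


def build_storage_options(mode="memory", filename=None,
                          expires_in=3600, retention_days=14):
    """Build the storage object, checking the documented modes and ranges."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if not 60 <= expires_in <= 604800:
        raise ValueError("expires_in must be between 60 and 604800 seconds")
    if not 1 <= retention_days <= 365:
        raise ValueError("retention_days must be between 1 and 365")
    options = {
        "mode": mode,
        "expires_in": expires_in,
        "retention_days": retention_days,
    }
    if filename is not None:
        options["filename"] = filename
    return options
```

For example, build_storage_options("default", "key-pages.pdf", 86400, 30) produces a storage object with a one-day signed URL and 30-day retention.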

Example Request

Extract Specific Pages (JSON with Base64)

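# First, encode your PDF to base64 on a single line
# (GNU base64 wraps its output at 76 characters, which would break the JSON string)
PDF_BASE64=$(base64 < document.pdf | tr -d '\n')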

curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "pdf_base64": "'"$PDF_BASE64"'",
    "pages": [1, 3, 5]
  }'

Extract from URL

curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "pdf_url": "https://example.com/documents/report.pdf",
    "pages": [1, 2, 3],
    "filename": "first-three-pages.pdf"
  }'

Reorder Pages

curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "pdf_url": "https://example.com/documents/report.pdf",
    "pages": [3, 1, 2]
  }'

This extracts pages 3, 1, and 2 (in that order), effectively moving page 3 to the front.

Using File Upload (Multipart)

curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@document.pdf" \
  -F "pages=1,3,5"

File Upload with Custom Filename

curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@document.pdf" \
  -F "pages=2,4,6,8" \
  -F "filename=even-pages.pdf"

With Storage (Persistent PDF)

curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "pdf_url": "https://example.com/documents/report.pdf",
    "pages": [1, 3, 5],
    "storage": {
      "mode": "default",
      "filename": "key-pages.pdf",
      "expires_in": 86400,
      "retention_days": 30
    }
  }'

Response

Success (memory mode)

When storage.mode is memory or not provided, returns the PDF as a binary file download.

Response Headers:

Header               Description
Content-Type         application/pdf
Content-Disposition  attachment; filename="extracted.pdf" (or custom filename)

Success (with storage)

When storage.mode is default or byob, returns JSON with document metadata and a signed download URL:

{
  "success": true,
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "url": "https://s3.eu-central-1.amazonaws.com/...",
  "filename": "key-pages.pdf",
  "file_size_bytes": 23456,
  "page_count": 3,
  "storage_mode": "default",
  "expires_at": "2024-02-15T10:30:00Z",
  "signed_url_expires_at": "2024-01-15T12:00:00Z"
}

Field                  Type     Description
success                boolean  Always true on success
document_id            string   UUID of the stored document
url                    string   Presigned URL for downloading the PDF
filename               string   Document filename
file_size_bytes        integer  Size of the PDF in bytes
page_count             integer  Number of pages in the PDF
storage_mode           string   Storage mode used (default or byob)
expires_at             string   Auto-deletion timestamp (ISO 8601), if retention is set
signed_url_expires_at  string   Expiration time of the signed URL (ISO 8601)
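Because a successful response is either PDF bytes or JSON metadata, a client may want to branch on the response Content-Type. The sketch below works on a raw header value and body so it is independent of any HTTP library. The function name interpret_response is hypothetical, and it assumes the storage response is served as application/json, which this page does not state explicitly.

```python
import json


def interpret_response(content_type, body):
    """Classify a successful response as inline PDF bytes or stored metadata."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/pdf":
        return {"kind": "pdf", "pdf_bytes": body}
    if media_type == "application/json":  # assumption: storage responses are JSON
        meta = json.loads(body.decode("utf-8"))
        return {
            "kind": "stored",
            "document_id": meta["document_id"],
            "url": meta["url"],
            "page_count": meta["page_count"],
        }
    raise ValueError(f"unexpected Content-Type: {content_type}")
```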

Error

{
  "error": "Failed to extract pages",
  "message": "Error description"
}

Status Codes:

Code  Description
200   Success - PDF returned or stored
400   Bad Request - Invalid input or BYOB not configured
401   Unauthorized - Missing or invalid Authorization header
402   Payment Required - Insufficient credits
403   Forbidden - Invalid API key or OAuth token
500   Internal Server Error - Page extraction failed
502   Bad Gateway - Storage operation failed (S3 upload or URL generation)
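One way a client might act on these codes is sketched below. The grouping into actions (for instance, treating 500 and 502 as worth retrying) is an assumption made for illustration, not documented API behavior, and both function names are hypothetical. The error body format follows the JSON shown above.

```python
import json


def classify_status(code):
    """Map a documented status code to a suggested client action (assumed policy)."""
    if code == 200:
        return "ok"
    if code == 400:
        return "fix_request"
    if code in (401, 403):
        return "fix_credentials"
    if code == 402:
        return "add_credits"
    if code in (500, 502):
        return "retry_later"
    return "unexpected"


def describe_error(code, body_text):
    """Format the documented error JSON, falling back to the bare status code."""
    try:
        payload = json.loads(body_text)
    except ValueError:
        payload = None
    if isinstance(payload, dict) and "error" in payload:
        return f"HTTP {code}: {payload['error']} ({payload.get('message', 'no message')})"
    return f"HTTP {code}"
```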

Page Extraction Details

How Page Extraction Works

The API uses PyPDF to read the source PDF and create a new PDF with only the selected pages. The original PDF is never modified.

Page Number Validation

  • Page numbers are 1-indexed (first page is 1)
  • Requesting a page number outside the valid range (less than 1 or greater than total pages) will return an error
  • Use the Page Count endpoint first if you need to know the total number of pages
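The same range check can be run on the client once the total page count is known. In this sketch, total_pages is supplied by the caller (for example, from the Page Count endpoint, whose request format is not covered here), and find_invalid_pages is a hypothetical helper.

```python
def find_invalid_pages(pages, total_pages):
    """Return the requested page numbers that fall outside 1..total_pages."""
    return [
        p for p in pages
        if not (isinstance(p, int) and 1 <= p <= total_pages)
    ]


print(find_invalid_pages([0, 1, 5, 6], total_pages=5))  # [0, 6]
print(find_invalid_pages([1, 3, 5], total_pages=5))     # []
```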

Page Order

Pages are added to the output PDF in the exact order specified in the pages array. This allows you to:

  • Extract sequentially: [1, 2, 3] - pages in original order
  • Reverse order: [5, 4, 3, 2, 1] - pages in reverse
  • Reorder: [3, 1, 4, 2] - custom arrangement
  • Skip pages: [1, 3, 5, 7] - every other page
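For documents of arbitrary length, these page lists can be generated rather than typed by hand. A sketch with hypothetical helper names, where total_pages is assumed to be known in advance:

```python
def sequential(total_pages):
    """All pages in original order."""
    return list(range(1, total_pages + 1))


def reverse_order(total_pages):
    """All pages, last page first."""
    return list(range(total_pages, 0, -1))


def every_other(total_pages, start=1):
    """Every second page: start=1 gives odd pages, start=2 gives even pages."""
    return list(range(start, total_pages + 1, 2))


def last_n(total_pages, count):
    """The final `count` pages (or all pages if the document is shorter)."""
    first = max(1, total_pages - count + 1)
    return list(range(first, total_pages + 1))


print(reverse_order(5))  # [5, 4, 3, 2, 1]
print(every_other(7))    # [1, 3, 5, 7]
print(last_n(10, 3))     # [8, 9, 10]
```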

Use Cases

Split a Long Document

# Extract first 10 pages of a long report
curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "pdf_url": "https://example.com/documents/annual-report.pdf",
    "pages": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "filename": "executive-summary.pdf"
  }'

Remove Unwanted Pages

Extract only the pages you want, effectively removing unwanted content:

# Original has 5 pages, skip page 3 (unwanted ad page)
curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@document.pdf" \
  -F "pages=1,2,4,5"
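When it is easier to name the pages to drop than the pages to keep, the keep-list can be computed. A minimal sketch; pages_excluding is a hypothetical helper and total_pages must be known beforehand.

```python
def pages_excluding(total_pages, unwanted):
    """Return every page from 1..total_pages except the unwanted ones."""
    skip = set(unwanted)
    return [n for n in range(1, total_pages + 1) if n not in skip]


# Matches the request above: a 5-page document without page 3.
print(",".join(str(n) for n in pages_excluding(5, [3])))  # 1,2,4,5
```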

Rearrange Document Order

# Move the conclusion (page 10) to the front
curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "pdf_url": "https://example.com/documents/thesis.pdf",
    "pages": [10, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "filename": "thesis-reordered.pdf"
  }'
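The pages array for this kind of move can also be computed. A sketch under the assumption that the total page count is already known; move_to_front is a hypothetical name.

```python
def move_to_front(total_pages, page):
    """Return a page order with `page` first and the rest in original order."""
    if not 1 <= page <= total_pages:
        raise ValueError(f"page {page} is outside 1..{total_pages}")
    return [page] + [n for n in range(1, total_pages + 1) if n != page]


print(move_to_front(10, 10))  # [10, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```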

Extract Cover Page

curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "pdf_url": "https://example.com/documents/ebook.pdf",
    "pages": [1],
    "filename": "cover.pdf"
  }'

Example: Python Integration

import requests

API_KEY = "YOUR_API_KEY"
API_URL = "https://api.pdf-mcp.io/extractPages"

# Extract pages from a URL
response = requests.post(
    API_URL,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json={
        "pdf_url": "https://example.com/documents/report.pdf",
        "pages": [1, 3, 5],
        "filename": "selected-pages.pdf"
    }
)

if response.status_code == 200:
    with open("selected-pages.pdf", "wb") as f:
        f.write(response.content)
    print("PDF saved successfully!")
else:
    # Error responses carry a JSON body with "error" and "message" fields
    print(f"Error {response.status_code}: {response.text}")

Using File Upload (Python)

import requests

API_KEY = "YOUR_API_KEY"
API_URL = "https://api.pdf-mcp.io/extractPages"

# Extract pages from a local file
with open("document.pdf", "rb") as f:
    response = requests.post(
        API_URL,
        headers={
            "Authorization": f"Bearer {API_KEY}"
        },
        files={
            "file": ("document.pdf", f, "application/pdf")
        },
        data={
            "pages": "1,2,3",
            "filename": "first-three-pages.pdf"
        }
    )

if response.status_code == 200:
    with open("first-three-pages.pdf", "wb") as f:
        f.write(response.content)
    print("PDF saved successfully!")
else:
    # Error responses carry a JSON body with "error" and "message" fields
    print(f"Error {response.status_code}: {response.text}")

Example: Node.js Integration

const fs = require('fs');

const API_KEY = 'YOUR_API_KEY';
const API_URL = 'https://api.pdf-mcp.io/extractPages';

async function extractPages(pdfUrl, pages, outputFilename) {
  const response = await fetch(API_URL, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      pdf_url: pdfUrl,
      pages: pages,
      filename: outputFilename
    })
  });

  if (response.ok) {
    const buffer = await response.arrayBuffer();
    fs.writeFileSync(outputFilename, Buffer.from(buffer));
    console.log('PDF saved successfully!');
  } else {
    // Error responses carry a JSON body with "error" and "message" fields
    console.error('Error:', response.status, await response.text());
  }
}

extractPages(
  'https://example.com/documents/report.pdf',
  [1, 3, 5],
  'selected-pages.pdf'
);

Using File Upload (Node.js)

const fs = require('fs');
const path = require('path');

const API_KEY = 'YOUR_API_KEY';
const API_URL = 'https://api.pdf-mcp.io/extractPages';

async function extractPagesFromFile(filePath, pages, outputFilename) {
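  // Global fetch, FormData, and Blob are available in Node.js 18 and later
  const fileBuffer = fs.readFileSync(filePath);
  const formData = new FormData();
  formData.append('file', new Blob([fileBuffer], { type: 'application/pdf' }), path.basename(filePath));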
  formData.append('pages', pages.join(','));
  formData.append('filename', outputFilename);

  const response = await fetch(API_URL, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`
    },
    body: formData
  });

  if (response.ok) {
    const buffer = await response.arrayBuffer();
    fs.writeFileSync(outputFilename, Buffer.from(buffer));
    console.log('PDF saved successfully!');
  } else {
    // Error responses carry a JSON body with "error" and "message" fields
    console.error('Error:', response.status, await response.text());
  }
}

extractPagesFromFile('document.pdf', [1, 2, 3], 'first-three-pages.pdf');

Tips and Best Practices

Choosing Input Method

  • File upload: Best for local files, simplest to implement
  • Base64: Best for programmatic access when the PDF is already in memory
  • URL: Best for processing PDFs already hosted online

Performance Optimization

  • Extract only the pages you need to minimize processing time
  • For very large PDFs, consider checking the page count first
  • Use URL-based input when possible to avoid base64 encoding overhead
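The base64 overhead mentioned above can be quantified: standard base64 emits 4 output characters for every 3 input bytes, so the request body grows by roughly one third. A small sketch (base64_length is a hypothetical helper):

```python
import base64


def base64_length(n_bytes):
    """Length in characters of the padded base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


# Cross-check the formula against the standard library on a small input.
sample = b"x" * 10
print(base64_length(len(sample)), len(base64.b64encode(sample)))  # 16 16

# A 3,000,000-byte PDF becomes a 4,000,000-character string.
print(base64_length(3_000_000))  # 4000000
```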

Building Dynamic Documents

You can use page extraction to build custom documents by:

  1. Extracting specific pages from multiple source PDFs
  2. Using Merge PDFs to combine the extracted pages

Error Handling

  • Always validate that page numbers are within range before making the request
  • Use the Page Count endpoint to verify total pages
  • Implement retry logic for URL-based extraction (network failures)
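Retry logic can be kept separate from the request itself. The sketch below retries a callable with exponential backoff; the attempt count and delays are illustrative assumptions, and call_with_retries is a hypothetical helper. It catches OSError, which covers Python's built-in connection errors and, in the requests library, requests.RequestException.

```python
import time


def call_with_retries(request_fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call request_fn, retrying on network errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return request_fn()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

A typical use would wrap the request in a lambda, for example call_with_retries(lambda: requests.post(API_URL, headers=headers, json=payload, timeout=30)), with those variables defined as in the Python integration example.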

Common Patterns

# Extract first page only
"pages": [1]

# Extract last 3 pages (assuming 10 total pages)
"pages": [8, 9, 10]

# Extract odd pages
"pages": [1, 3, 5, 7, 9]

# Extract even pages
"pages": [2, 4, 6, 8, 10]

# Reverse page order
"pages": [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

Related Endpoints

  • Page Count - Get the number of pages in a PDF before extraction
  • Merge PDFs - Combine multiple PDFs (including extracted pages)
  • Extract Text - Extract text content from specific pages

Credit Usage

Approximately 1 credit per page in the output PDF.