Extract Pages | pdf-mcp Documentation

Extract or rearrange specific pages from a PDF document. This endpoint creates a new PDF containing only the pages you select, in the order you list them.

Endpoint

POST /extractPages

Authentication

Requires a valid API key or OAuth token in the Authorization header:

Authorization: Bearer YOUR_API_KEY

See Authentication for details.

Request Body

Content-Type: application/json or multipart/form-data

Field	Type	Required	Description
`pdf_base64`	string	Conditional	Base64-encoded PDF file
`pdf_url`	string	Conditional	URL to fetch the PDF from
`file`	file	Conditional	PDF file upload (multipart only)
`pages`	array/string	Yes	Page numbers to extract (1-indexed)
`filename`	string	No	Output filename (default: “extracted.pdf”)
`storage`	object	No	Storage options for the generated PDF (see below)
`return_binary`	boolean	No	Return PDF binary even when using storage (default: `false`)

Note: You must provide exactly one of: pdf_base64, pdf_url, or file.
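The exactly-one-source rule can be checked on the client before a request is sent. Below is a minimal sketch. `build_extract_payload` is a hypothetical helper, not part of any pdf-mcp SDK, and it only covers the two JSON sources because `file` is multipart-only.

```python
from typing import Optional


def build_extract_payload(
    pages: list[int],
    pdf_base64: Optional[str] = None,
    pdf_url: Optional[str] = None,
    filename: Optional[str] = None,
) -> dict:
    """Build a JSON body for POST /extractPages (hypothetical helper)."""
    sources = {"pdf_base64": pdf_base64, "pdf_url": pdf_url}
    provided = {key: value for key, value in sources.items() if value is not None}
    # The endpoint requires exactly one input source per request.
    if len(provided) != 1:
        raise ValueError("Provide exactly one of pdf_base64 or pdf_url")
    payload = {**provided, "pages": pages}
    if filename is not None:
        payload["filename"] = filename
    return payload


print(build_extract_payload([1, 3, 5], pdf_url="https://example.com/documents/report.pdf"))
```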

Field Details

pdf_base64 (conditional)

A base64-encoded PDF file. Use this when you already have the PDF data in memory or need to send everything in a single JSON payload. Send only the encoded data, without a data URI prefix such as `data:application/pdf;base64,`.
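A sketch of preparing `pdf_base64` in Python using the standard `base64` module. `base64.b64encode` emits a single line with no wrapping, which is what a JSON string needs. `strip_data_uri` is a hypothetical helper for values that arrive with a data URI prefix (for example, from a browser).

```python
import base64


def encode_pdf_base64(path: str) -> str:
    """Read a PDF from disk and return plain base64 text (no data URI prefix)."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def strip_data_uri(value: str) -> str:
    """Remove a leading 'data:...;base64,' prefix if one is present."""
    if value.startswith("data:") and "," in value:
        return value.split(",", 1)[1]
    return value
```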

pdf_url (conditional)

A publicly accessible URL from which the server downloads the PDF before processing. Redirects are followed.

file (conditional, multipart only)

Direct file upload via multipart form data. This is the simplest option when you have the PDF file available locally.

pages (required)

Specify which pages to extract and their order:

JSON format: Array of integers [1, 3, 2, 5]
Multipart format: Comma-separated string "1,3,2,5"

Page numbers are 1-indexed (first page is 1, not 0). Pages will appear in the output PDF in the order specified. You can:

Extract a subset of pages
Reorder pages by listing them in your desired sequence
Duplicate pages by including the same page number multiple times
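In multipart requests, the same list is sent as a comma-separated string. The sketch below performs that conversion and rejects page numbers below 1 before anything is sent. `pages_for_multipart` is a hypothetical helper name. Duplicates pass through unchanged, because repeating a page is allowed.

```python
def pages_for_multipart(pages: list[int]) -> str:
    """Convert a page list to the comma-separated form used in multipart requests."""
    for p in pages:
        # Pages are 1-indexed, so reject 0, negatives, booleans and non-integers early.
        if not isinstance(p, int) or isinstance(p, bool) or p < 1:
            raise ValueError(f"invalid page number: {p!r}")
    return ",".join(str(p) for p in pages)


print(pages_for_multipart([1, 3, 2, 5]))  # 1,3,2,5
```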

filename (optional)

The filename for the output PDF. Defaults to “extracted.pdf”. The .pdf extension is automatically added if not provided.

Storage Options

Control how the generated PDF is stored and returned.

Field	Type	Required	Default	Description
`storage.mode`	string	No	`memory`	Storage mode: `memory`, `default`, or `byob`
`storage.filename`	string	No	endpoint default	Custom filename (auto-suffixed with timestamp if duplicate)
`storage.expires_in`	integer	No	3600	Signed URL expiry in seconds (60-604800)
`storage.retention_days`	integer	No	14	Auto-delete document after N days (1-365)

Storage Modes:

memory: Return PDF bytes directly in response (no persistence). This is the default for REST API calls.
default: Store in the pdf-mcp S3 bucket and return JSON with document metadata and a signed URL.
byob: Store in your own S3 bucket (requires BYOB configuration).
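The ranges in the storage table can be enforced on the client, which avoids spending a request only to receive a 400. This is a minimal sketch; `storage_options` is a hypothetical helper. The documentation does not say whether `memory` mode accepts the other fields, so this sketch sends only the mode in that case.

```python
VALID_MODES = {"memory", "default", "byob"}


def storage_options(mode="default", filename=None, expires_in=3600, retention_days=14):
    """Build a `storage` object, checking the documented ranges before sending."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown storage mode: {mode!r}")
    if mode == "memory":
        # Nothing is persisted in memory mode, so this sketch sends only the mode.
        return {"mode": "memory"}
    if not 60 <= expires_in <= 604800:  # 1 minute to 7 days
        raise ValueError("expires_in must be between 60 and 604800 seconds")
    if not 1 <= retention_days <= 365:
        raise ValueError("retention_days must be between 1 and 365")
    options = {"mode": mode, "expires_in": expires_in, "retention_days": retention_days}
    if filename is not None:
        options["filename"] = filename
    return options
```

Pass the result as the `storage` field of the JSON body, for example `storage_options("default", "key-pages.pdf", 86400, 30)`.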

Example Request

Extract Specific Pages (JSON with Base64)

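# Encode the PDF as a single-line base64 string
# (tr removes any line wrapping, which GNU base64 adds by default and which would break the JSON string)
PDF_BASE64=$(base64 < document.pdf | tr -d '\n')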

curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "pdf_base64": "'"$PDF_BASE64"'",
    "pages": [1, 3, 5]
  }'

Extract from URL

curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "pdf_url": "https://example.com/documents/report.pdf",
    "pages": [1, 2, 3],
    "filename": "first-three-pages.pdf"
  }'

Reorder Pages

curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "pdf_url": "https://example.com/documents/report.pdf",
    "pages": [3, 1, 2]
  }'

This produces a PDF containing pages 3, 1, and 2, in that order, which moves page 3 to the front of a three-page document. Any pages not listed are omitted from the output.

Using File Upload (Multipart)

curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@document.pdf" \
  -F "pages=1,3,5"

File Upload with Custom Filename

curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@document.pdf" \
  -F "pages=2,4,6,8" \
  -F "filename=even-pages.pdf"

With Storage (Persistent PDF)

curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "pdf_url": "https://example.com/documents/report.pdf",
    "pages": [1, 3, 5],
    "storage": {
      "mode": "default",
      "filename": "key-pages.pdf",
      "expires_in": 86400,
      "retention_days": 30
    }
  }'

Response

Success (memory mode)

When storage.mode is memory or omitted, the endpoint returns the PDF as a binary file download.

Response Headers:

Header	Description
`Content-Type`	`application/pdf`
`Content-Disposition`	`attachment; filename="extracted.pdf"` (or custom filename)
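In memory mode, the output filename is only available in the `Content-Disposition` header. The standard library can parse it. This is a sketch; `filename_from_disposition` is a hypothetical helper. It falls back to the documented default when no filename parameter is present. With `requests`, you would pass it `response.headers["Content-Disposition"]`.

```python
from email.message import Message


def filename_from_disposition(header: str, fallback: str = "extracted.pdf") -> str:
    """Return the filename parameter of a Content-Disposition header value."""
    msg = Message()
    msg["Content-Disposition"] = header
    # get_filename() unquotes the value and returns None if it is missing.
    return msg.get_filename() or fallback
```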

Success (with storage)

When storage.mode is default or byob, returns JSON with document metadata and a signed download URL:

{
  "success": true,
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "url": "https://s3.eu-central-1.amazonaws.com/...",
  "filename": "key-pages.pdf",
  "file_size_bytes": 23456,
  "page_count": 3,
  "storage_mode": "default",
  "expires_at": "2024-02-15T10:30:00Z",
  "signed_url_expires_at": "2024-01-15T12:00:00Z"
}

Field	Type	Description
`success`	boolean	Always `true` on success
`document_id`	string	UUID of the stored document
`url`	string	Presigned URL for downloading the PDF
`filename`	string	Document filename
`file_size_bytes`	integer	Size of the PDF in bytes
`page_count`	integer	Number of pages in the PDF
`storage_mode`	string	Storage mode used (`default` or `byob`)
`expires_at`	string	Auto-deletion timestamp (ISO 8601), if retention is set
`signed_url_expires_at`	string	Expiration time of the signed URL (ISO 8601)
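A sketch of turning the storage response into a typed object, with field names taken from the table above. The `StoredDocument` class is a hypothetical convenience, not part of the API. Timestamps are normalized because `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onward.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 UTC timestamp such as '2024-01-15T12:00:00Z'."""
    if value is None:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class StoredDocument:
    document_id: str
    url: str
    filename: str
    file_size_bytes: int
    page_count: int
    storage_mode: str
    expires_at: Optional[datetime]
    signed_url_expires_at: Optional[datetime]


def parse_stored_document(body: str) -> StoredDocument:
    data = json.loads(body)
    return StoredDocument(
        document_id=data["document_id"],
        url=data["url"],
        filename=data["filename"],
        file_size_bytes=data["file_size_bytes"],
        page_count=data["page_count"],
        storage_mode=data["storage_mode"],
        # expires_at is only documented as present when retention is set.
        expires_at=parse_timestamp(data.get("expires_at")),
        signed_url_expires_at=parse_timestamp(data["signed_url_expires_at"]),
    )
```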

Error

{
  "error": "Failed to extract pages",
  "message": "Error description"
}

Status Codes:

Code	Description
200	Success - PDF returned or stored
400	Bad Request - Invalid input or BYOB not configured
401	Unauthorized - Missing or invalid Authorization header
402	Payment Required - Insufficient credits
403	Forbidden - Invalid API key or OAuth token
500	Internal Server Error - Page extraction failed
502	Bad Gateway - Storage operation failed (S3 upload or URL generation)
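A sketch of converting non-200 responses into exceptions by using the documented `error`/`message` body. `ExtractPagesError` and `raise_for_error` are hypothetical names. Which statuses are worth retrying is also an assumption. This sketch only treats 502 (storage failure) as possibly transient, because the other codes indicate a problem with the request, the credentials or the credits.

```python
import json

# Assumption: only storage failures (502) are plausibly transient.
RETRYABLE_STATUSES = {502}


class ExtractPagesError(Exception):
    def __init__(self, status: int, error: str, message: str):
        super().__init__(f"{status} {error}: {message}")
        self.status = status
        self.retryable = status in RETRYABLE_STATUSES


def raise_for_error(status: int, body: bytes) -> None:
    """Raise ExtractPagesError for non-200 responses."""
    if status == 200:
        return
    try:
        data = json.loads(body)
        error, message = data.get("error", ""), data.get("message", "")
    except (ValueError, AttributeError):
        # Body is not the documented JSON object (e.g. an HTML page from a proxy).
        error, message = "Unknown error", body[:200].decode("utf-8", "replace")
    raise ExtractPagesError(status, error, message)
```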

Page Extraction Details

How Page Extraction Works

The API uses PyPDF to read the source PDF and create a new PDF with only the selected pages. The original PDF is never modified.

Page Number Validation

Page numbers are 1-indexed (first page is 1)
Requesting a page number outside the valid range (less than 1 or greater than total pages) will return an error
Use the Page Count endpoint first if you need to know the total number of pages
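Once the total page count is known (from the Page Count endpoint, or from a local PDF library), the range check is simple to perform before sending. This is a sketch; `validate_pages` is a hypothetical helper. It reports every offending page at once.

```python
def validate_pages(pages: list[int], total_pages: int) -> None:
    """Raise ValueError if any page is outside 1..total_pages (pages are 1-indexed)."""
    out_of_range = [p for p in pages if not 1 <= p <= total_pages]
    if out_of_range:
        raise ValueError(f"pages out of range 1..{total_pages}: {out_of_range}")
```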

Page Order

Pages are added to the output PDF in the exact order specified in the pages array. This allows you to:

Extract sequentially: [1, 2, 3] - pages in original order
Reverse order: [5, 4, 3, 2, 1] - pages in reverse
Reorder: [3, 1, 4, 2] - custom arrangement
Skip pages: [1, 3, 5, 7] - every other page

Use Cases

Split a Long Document

# Extract first 10 pages of a long report
curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "pdf_url": "https://example.com/documents/annual-report.pdf",
    "pages": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "filename": "executive-summary.pdf"
  }'
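To split a whole document into several parts, send one request per part. This sketch only computes the page lists. Each list becomes the `pages` value of a separate request. `chunk_pages` is a hypothetical helper, and the total page count is assumed to be known in advance.

```python
def chunk_pages(total_pages: int, chunk_size: int) -> list[list[int]]:
    """Split 1..total_pages into consecutive page lists of at most chunk_size pages."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        list(range(start, min(start + chunk_size, total_pages + 1)))
        for start in range(1, total_pages + 1, chunk_size)
    ]


for part in chunk_pages(25, 10):
    print(part[0], "to", part[-1])  # 1 to 10, then 11 to 20, then 21 to 25
```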

Remove Unwanted Pages

Extract only the pages you want, effectively removing unwanted content:

# Original has 5 pages, skip page 3 (unwanted ad page)
curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@document.pdf" \
  -F "pages=1,2,4,5"
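For longer documents, it is easier to list the pages to drop and compute the rest. This is a sketch with a hypothetical helper name. It keeps the original order.

```python
def pages_excluding(total_pages: int, remove: set[int]) -> list[int]:
    """Return every page in 1..total_pages except those in `remove`, in original order."""
    return [p for p in range(1, total_pages + 1) if p not in remove]


print(pages_excluding(5, {3}))  # [1, 2, 4, 5]
```

For multipart requests, join the result with commas, for example `",".join(map(str, pages_excluding(5, {3})))`.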

Rearrange Document Order

# Move the conclusion (page 10) to the front
curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "pdf_url": "https://example.com/documents/thesis.pdf",
    "pages": [10, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "filename": "thesis-reordered.pdf"
  }'
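The same reordering can be generated for any page and target position rather than written out by hand. This is a sketch; `move_page` is a hypothetical helper. It assumes the total page count is known.

```python
def move_page(total_pages: int, page: int, position: int = 1) -> list[int]:
    """Return 1..total_pages with `page` moved to the 1-indexed `position`."""
    if not 1 <= page <= total_pages or not 1 <= position <= total_pages:
        raise ValueError("page and position must be within 1..total_pages")
    order = [p for p in range(1, total_pages + 1) if p != page]
    order.insert(position - 1, page)
    return order


print(move_page(10, 10))  # [10, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```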

Extract Cover Page

curl -X POST https://api.pdf-mcp.io/extractPages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "pdf_url": "https://example.com/documents/ebook.pdf",
    "pages": [1],
    "filename": "cover.pdf"
  }'

Example: Python Integration

import requests

API_KEY = "YOUR_API_KEY"
API_URL = "https://api.pdf-mcp.io/extractPages"

# Extract pages from a URL
response = requests.post(
    API_URL,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json={
        "pdf_url": "https://example.com/documents/report.pdf",
        "pages": [1, 3, 5],
        "filename": "selected-pages.pdf"
    }
)

if response.status_code == 200:
    with open("selected-pages.pdf", "wb") as f:
        f.write(response.content)
    print("PDF saved successfully!")
else:
    print(f"Error {response.status_code}: {response.text}")

Using File Upload (Python)

import requests

API_KEY = "YOUR_API_KEY"
API_URL = "https://api.pdf-mcp.io/extractPages"

# Extract pages from a local file
with open("document.pdf", "rb") as f:
    response = requests.post(
        API_URL,
        headers={
            "Authorization": f"Bearer {API_KEY}"
        },
        files={
            "file": ("document.pdf", f, "application/pdf")
        },
        data={
            "pages": "1,2,3",
            "filename": "first-three-pages.pdf"
        }
    )

if response.status_code == 200:
    with open("first-three-pages.pdf", "wb") as f:
        f.write(response.content)
    print("PDF saved successfully!")
else:
    print(f"Error {response.status_code}: {response.text}")

Example: Node.js Integration

const fs = require('fs');

const API_KEY = 'YOUR_API_KEY';
const API_URL = 'https://api.pdf-mcp.io/extractPages';

async function extractPages(pdfUrl, pages, outputFilename) {
  const response = await fetch(API_URL, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      pdf_url: pdfUrl,
      pages: pages,
      filename: outputFilename
    })
  });

  if (response.ok) {
    const buffer = await response.arrayBuffer();
    fs.writeFileSync(outputFilename, Buffer.from(buffer));
    console.log('PDF saved successfully!');
  } else {
    console.error('Error:', response.status, await response.text());
  }
}

extractPages(
  'https://example.com/documents/report.pdf',
  [1, 3, 5],
  'selected-pages.pdf'
);

Using File Upload (Node.js)

const fs = require('fs');
const path = require('path');

const API_KEY = 'YOUR_API_KEY';
const API_URL = 'https://api.pdf-mcp.io/extractPages';

async function extractPagesFromFile(filePath, pages, outputFilename) {
  const fileBuffer = fs.readFileSync(filePath);
  const formData = new FormData();
  formData.append('file', new Blob([fileBuffer]), path.basename(filePath));
  formData.append('pages', pages.join(','));
  formData.append('filename', outputFilename);

  const response = await fetch(API_URL, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`
    },
    body: formData
  });

  if (response.ok) {
    const buffer = await response.arrayBuffer();
    fs.writeFileSync(outputFilename, Buffer.from(buffer));
    console.log('PDF saved successfully!');
  } else {
    console.error('Error:', response.status, await response.text());
  }
}

extractPagesFromFile('document.pdf', [1, 2, 3], 'first-three-pages.pdf');

Tips and Best Practices

Choosing Input Method

File upload: Best for local files, simplest to implement
Base64: Best for programmatic access when PDF is already in memory
URL: Best for processing PDFs already hosted online

Performance Optimization

Extract only the pages you need to minimize processing time
For very large PDFs, consider checking the page count first
Use URL-based input when possible to avoid base64 encoding overhead
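The base64 overhead is easy to quantify: standard padded base64 turns every 3 input bytes (or part thereof) into 4 output characters, so the payload grows by about a third before JSON framing is added. The short calculation below shows this.

```python
def base64_length(n_bytes: int) -> int:
    """Length of padded base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


pdf_size = 30 * 1024 * 1024  # e.g. a 30 MiB PDF
encoded = base64_length(pdf_size)
print(f"{encoded / pdf_size:.2f}x larger")  # 1.33x larger
```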

Building Dynamic Documents

You can use page extraction to build custom documents by:

Extracting specific pages from multiple source PDFs
Using Merge PDFs to combine the extracted pages

Error Handling

Always validate that page numbers are within range before making the request
Use the Page Count endpoint to verify total pages
Implement retry logic for URL-based extraction (network failures)
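A sketch of retry logic with exponential backoff and jitter. It retries only on network-level failures. In Python, these are raised as `OSError` subclasses such as `ConnectionError` and `TimeoutError`, and `requests.RequestException` also derives from `OSError`. `with_retries` is a hypothetical helper, and the attempt count and delays are arbitrary choices. When storage is enabled, a retry after a timeout may store a second copy if the first attempt actually succeeded.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    send: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `send`, retrying network errors with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts - 1):
        try:
            return send()
        except OSError:
            delay = base_delay * 2 ** attempt  # 0.5s, 1s, 2s, ...
            sleep(delay + random.uniform(0, delay / 2))
    return send()  # final attempt: let any error propagate


# Usage with requests (sketch):
# response = with_retries(lambda: requests.post(API_URL, headers=headers, json=payload, timeout=60))
```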

Common Patterns

# Extract first page only
"pages": [1]

# Extract last 3 pages (assuming 10 total pages)
"pages": [8, 9, 10]

# Extract odd pages
"pages": [1, 3, 5, 7, 9]

# Extract even pages
"pages": [2, 4, 6, 8, 10]

# Reverse page order
"pages": [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

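The patterns above hardcode a 10-page document. Once the real page count is known (for example, from the Page Count endpoint), the same lists can be generated. The helper names below are hypothetical.

```python
def first_pages(n: int) -> list[int]:
    return list(range(1, n + 1))


def last_pages(n: int, total: int) -> list[int]:
    # Clamp at page 1 when n exceeds the document length.
    return list(range(max(1, total - n + 1), total + 1))


def odd_pages(total: int) -> list[int]:
    return list(range(1, total + 1, 2))


def even_pages(total: int) -> list[int]:
    return list(range(2, total + 1, 2))


def reversed_pages(total: int) -> list[int]:
    return list(range(total, 0, -1))
```
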
Related Endpoints

Page Count - Get the number of pages in a PDF before extraction
Merge PDFs - Combine multiple PDFs (including extracted pages)
Extract Text - Extract text content from specific pages

Credit Usage

Approximately 1 credit per page in the output PDF.