DeepLX/translate/translate.go
Vincent Yang 98918dd9f6
feat(translate): migrate to oneshot endpoint (#217)
* feat(translate): migrate to oneshot endpoint to bypass www2 anti-bot

The www2.deepl.com/jsonrpc backends behind LMT_handle_texts /
LMT_handle_jobs now sit behind aggressive WAF + per-IP throttling that
returns HTTP 429 (code 1042911 "Too many requests") within a handful of
calls from any single host — making the free path effectively unusable.

The official DeepL browser extension and iOS app skip that backend
entirely for stateless single-shot translation and POST to a separate
"oneshot" endpoint on a different host pool with its own (much looser)
rate limit. It accepts anonymous traffic with a literal
`Authorization: None` header, returns plain JSON, and supports the same
language pairs.

Switch the free path to:

  POST https://oneshot-free.www.deepl.com/v1/translate
  Authorization: None
  {"text": ["..."], "target_lang": "de", "source_lang": "en"}

Pro users continue to hit oneshot-pro.www.deepl.com with their bearer
token (the `-s` flag now carries an OAuth access token rather than the
legacy dl_session cookie).

This removes:
  - the JSON-RPC envelope (jsonrpc/method/id/params/timestamp wrapper)
  - the `i`-count timestamp trick (getICount + getTimeStamp)
  - the random-id body-spacing trick (handlerBodyMethod)
  - the whatlanggo client-side detection (oneshot detects server-side)

The DeepLXTranslationResult contract is unchanged for service handlers;
Alternatives is now always nil because the oneshot endpoint does not
return alternative translations.

Verified against /translate, /v1/translate and /v2/translate routes
end-to-end (EN/DE/ZH/JA/FR pairs, multi-sentence input, autodetect, 10x
burst) — all 200 OK on an IP that was concurrently being 429'd by www2.

* fix(translate): align oneshot request bytes with the real extension

After capturing the exact bytes the Chrome extension's service-worker
fetch() emits (via an offline echo server pointed at deeplx in place of
oneshot-free.www.deepl.com) and diffing them against what we were
sending, several distinguishable signals remained. Close them all.

Headers
-------
- Origin: chrome-extension://cofdbpoegempjloogbagkncekinflcnj
  (was https://www.deepl.com — a request from www.deepl.com itself
   never lands on the oneshot endpoint, so that origin is unusual.
   The extension ID is the canonical sender.)
- Sec-Fetch-Site: cross-site
  (was same-site — wrong; chrome-extension -> www.deepl.com IS cross-site)
- Drop Referer entirely (extension SW fetch sends none)
- Drop Pragma / Cache-Control / Upgrade-Insecure-Requests / Sec-Fetch-User
  (req.ImpersonateChrome() sets these for top-level navigation; a
   fetch() never sends them — leaving them in is a strong nav-vs-XHR tell)
- Accept-Encoding: gzip, deflate, br
  (was just gzip, Go stdlib default — Chrome 120's fetch() sends all
   three; zstd only landed as a default in Chrome 123+ so leave it off)

Body
----
- Add usage_type: "Translate" and the full app_information object
  (os/os_version/app_version/app_build/instance_id) so the JSON the
  server sees is structurally identical to what background.js IN()
  assembles. Field order in oneshotRequest matches the extension's
  object-literal order so encoding/json produces byte-identical output.
- instance_id is a v4 UUID generated once at process start and reused,
  mirroring the extension's chrome.storage-pinned ID rather than
  rotating per-request (rotation would be a far stronger signal).
- All version strings (TLS handshake, User-Agent, sec-ch-ua,
  app_information.os_version) are pinned to Chrome 120 so they tell
  one consistent story.

Transport
---------
- SetBodyBytes instead of bytes.NewReader so Content-Length is set
  (an io.Reader body forces Transfer-Encoding: chunked, which a
   fetch() with JSON.stringify body never emits)
- Once we set Accept-Encoding manually, the Go stdlib disables its
  transparent decompression and req hands us raw compressed bytes.
  Handle gzip / deflate / br by hand from Content-Encoding.
- DisableAutoReadResponse so we own the body stream end-to-end.

The Chrome 120 TLS ClientHello, HTTP/2 SETTINGS frame, pseudo-header
order and sec-ch-ua claim continue to come from ImpersonateChrome()
unchanged.

Verified end-to-end:
- Outbound bytes (against a local echo server) diff-match the
  extension's observed profile on every header and on body JSON order.
- Live oneshot-free.www.deepl.com calls: 4 language pairs OK,
  /v2/translate official-API compat OK, 10x burst 10/10 200.

* chore(deps): upgrade to latest compatible versions

Run `go get -u ./...` + `go mod tidy`. Direct upgrades:

- github.com/andybalholm/brotli   1.2.0 → 1.2.1
- github.com/tidwall/gjson         1.18.0 → 1.19.0

Indirect (notable):

- github.com/bytedance/sonic       1.15.0 → 1.15.1
- github.com/bytedance/sonic/loader 0.5.0 → 0.5.1
- github.com/bytedance/gopkg       0.1.3 → 0.1.4
- github.com/cloudwego/base64x     0.1.6 → 0.1.7
- github.com/gin-contrib/sse       1.1.0 → 1.1.1
- github.com/go-playground/validator/v10 10.30.1 → 10.30.2
- github.com/goccy/go-json         0.10.5 → 0.10.6
- github.com/klauspost/compress    1.18.4 → 1.18.6
- github.com/mattn/go-isatty       0.0.20 → 0.0.22
- github.com/pelletier/go-toml/v2  2.2.4  → 2.3.1
- golang.org/x/arch                0.24.0 → 0.27.0
- golang.org/x/crypto              0.48.0 → 0.52.0
- golang.org/x/net                 0.51.0 → 0.55.0
- golang.org/x/sys                 0.41.0 → 0.45.0
- golang.org/x/text                0.34.0 → 0.37.0

github.com/imroc/req/v3 (the HTTP client we depend on for Chrome
impersonation) is already on its latest tag v3.57.0 and pins
github.com/quic-go/quic-go to <= v0.57.x — newer quic-go removed
ConnectionTracingID/ConnectionTracingKey, which req's internal/http3
still references. That constraint also holds gin-gonic/gin at v1.11.0
and gin-contrib/cors at v1.7.6 (their later versions pull quic-go
≥ 0.58 transitively). Pin quic-go to v0.57.1 to keep the build green;
revisit when req publishes a release compatible with quic-go ≥ 0.58.

Build + live oneshot end-to-end: 4 language pairs OK, /v2/translate
official-API compat OK, 8x burst 8/8 200.

* fix(translate): seed cookie jar from www.deepl.com on first call

A real chrome-extension fetch() to oneshot-free.www.deepl.com inherits
whatever cookies the browser has on .deepl.com — at minimum
`userCountry=<iso2>` and `verifiedBot=false`, both of which the
deepl.com server sets on any page load. Our outbound bytes were
otherwise extension-identical but went out cookieless, which is a
distinguishable signal.

Wire a process-wide net/http/cookiejar onto the req.Client and trigger
a single warmup GET to https://www.deepl.com/translator on the first
translate call (sync.Once). The Set-Cookie response (userCountry,
verifiedBot) lands on .deepl.com, which the jar then automatically
echoes back on every subsequent POST to oneshot-free.www.deepl.com
(cookies set on .deepl.com match any *.deepl.com subdomain).

Verified outbound:
  Cookie: userCountry=JP; verifiedBot=false

Latency cost: first call after process start pays one extra HTTP GET
(~1s warmup); subsequent calls are unaffected (sync.Once + connection
keep-alive).

Note: we cannot replicate the _ga / _ga_<id> cookies a real user
would also carry — those are set client-side by GA's JS, which a
non-browser HTTP client can't execute. The userCountry+verifiedBot
pair already matches the "first-time visitor with JS disabled" profile,
which is the closest plausible non-browser approximation.
2026-05-22 12:04:44 +08:00

335 lines
11 KiB
Go

/*
* @Author: Vincent Young
* @Date: 2024-09-16 11:59:24
* @LastEditors: Vincent Yang
* @LastEditTime: 2026-05-22 00:00:00
* @FilePath: /DeepLX/translate/translate.go
* @Telegram: https://t.me/missuo
* @GitHub: https://github.com/missuo
*
* Copyright © 2024 by Vincent, All Rights Reserved.
*/
package translate
import (
"compress/flate"
"compress/gzip"
"crypto/rand"
"encoding/hex"
"encoding/json"
"fmt"
"io"
"net/http"
"net/http/cookiejar"
"net/url"
"strings"
"sync"
"time"
"github.com/andybalholm/brotli"
"github.com/imroc/req/v3"
"github.com/tidwall/gjson"
)
// DeepL's interactive web translator migrated to a SignalR/WebSocket
// channel and the legacy LMT_handle_texts backend on www2.deepl.com now
// 429s anonymous traffic within a handful of calls. The official Chrome
// extension instead POSTs to a stateless "oneshot" endpoint that lives
// on a separate rate-limit pool and accepts the literal header
// `Authorization: None` for anonymous requests — that is what we target.
//
// The request we send is reverse-engineered from the extension's
// background.js (Chrome Web Store ID cofdbpoegempjloogbagkncekinflcnj):
// - URL builder → mN() at ~offset 529948
// - body builder → IN() at ~offset 531200
// - fetch wrapper → JO() at ~offset 508659
// - app metadata → Wo() at ~offset 16500
const (
oneshotFreeEndpoint = "https://oneshot-free.www.deepl.com/v1/translate"
oneshotProEndpoint = "https://oneshot-pro.www.deepl.com/v1/translate"
// Pinned to the Chrome version utls bundles into req v3 (HelloChrome_120).
// Keep this in lockstep with the user-agent and app_information.os_version
// so the TLS handshake, UA, and self-reported browser version all agree —
// a mismatch on any one of those is a cheap signal for the WAF.
impersonatedChromeMajor = "120"
chromeExtensionVersion = "1.86.0"
chromeExtensionID = "cofdbpoegempjloogbagkncekinflcnj"
)
// instanceID mirrors the UUID the extension persists in chrome.storage on
// install: stable for the life of the process, reused on every request.
// Rotating it per-request would be a far stronger signal than reusing one.
var instanceID = newInstanceID()
// A real extension fetch() inherits whatever cookies the browser has
// accumulated on .deepl.com. A cold visit to www.deepl.com sets
// userCountry=<iso2> and verifiedBot=false; users who have ever opened
// the site additionally have _ga / _ga_<id> from analytics JS. We share
// a process-wide cookie jar so every oneshot POST automatically carries
// whatever the warmup GET picked up.
var (
cookieJar http.CookieJar
cookieJarOnce sync.Once
cookieWarmer sync.Once
)
func sharedCookieJar() http.CookieJar {
cookieJarOnce.Do(func() {
j, _ := cookiejar.New(nil)
cookieJar = j
})
return cookieJar
}
// warmCookies primes the shared jar by GETting www.deepl.com once.
// The Set-Cookie response (userCountry / verifiedBot) lands on .deepl.com,
// which is the eTLD+1 of oneshot-free.www.deepl.com, so subsequent POSTs
// to the oneshot endpoint will carry those cookies automatically.
func warmCookies(client *req.Client) {
cookieWarmer.Do(func() {
_, _ = client.R().Get("https://www.deepl.com/translator")
})
}
func newInstanceID() string {
b := make([]byte, 16)
if _, err := rand.Read(b); err != nil {
return "00000000-0000-4000-8000-000000000000"
}
b[6] = (b[6] & 0x0f) | 0x40 // RFC 4122 v4
b[8] = (b[8] & 0x3f) | 0x80
s := hex.EncodeToString(b)
return fmt.Sprintf("%s-%s-%s-%s-%s", s[0:8], s[8:12], s[12:16], s[16:20], s[20:32])
}
// langCodeToOneshot translates DeepL's uppercase codes (DE, EN, ZH, ...)
// to the lowercase BCP-47-ish codes the oneshot endpoint requires (de,
// en-US, zh-Hans, ...). Unknown codes fall through lowercased.
var langCodeToOneshot = map[string]string{
"AR": "ar", "BG": "bg", "CS": "cs", "DA": "da", "DE": "de", "EL": "el",
"EN": "en-US", "EN-GB": "en-GB", "EN-US": "en-US",
"ES": "es", "ET": "et", "FI": "fi", "FR": "fr", "HU": "hu",
"ID": "id", "IT": "it", "JA": "ja", "KO": "ko", "LT": "lt", "LV": "lv",
"NB": "nb", "NL": "nl", "PL": "pl",
"PT": "pt-BR", "PT-BR": "pt-BR", "PT-PT": "pt-PT",
"RO": "ro", "RU": "ru", "SK": "sk", "SL": "sl", "SV": "sv",
"TR": "tr", "UK": "uk",
"ZH": "zh-Hans", "ZH-HANS": "zh-Hans", "ZH-HANT": "zh-Hant",
}
func toOneshotLang(code string) string {
if v, ok := langCodeToOneshot[strings.ToUpper(code)]; ok {
return v
}
return strings.ToLower(code)
}
// appInformation matches the snake_case shape produced by background.js
// Wo({isSnakeCase: true}). Values are pinned to the same Chrome version
// as the TLS handshake so the request tells one consistent story.
type appInformation struct {
OS string `json:"os"`
OSVersion string `json:"os_version"`
AppVersion string `json:"app_version"`
AppBuild string `json:"app_build"`
InstanceID string `json:"instance_id"`
}
// oneshotRequest mirrors the body assembled in background.js IN(...).
// Field order matches the extension's object literal so the serialized
// JSON is byte-identical (encoding/json honours struct field order).
type oneshotRequest struct {
Text []string `json:"text"`
TargetLang string `json:"target_lang"`
SourceLang string `json:"source_lang,omitempty"`
UsageType string `json:"usage_type"`
AppInformation appInformation `json:"app_information"`
}
// newOneshotClient configures a req.Client whose outbound profile matches
// a chrome-extension service-worker fetch() byte-for-byte where it can.
// ImpersonateChrome gives us the Chrome 120 TLS ClientHello, HTTP/2
// SETTINGS, pseudo/header order, and a sec-ch-ua/user-agent set tied to
// the same version. It also installs a navigation-flavoured set of common
// headers (pragma, cache-control, upgrade-insecure-requests, sec-fetch-user)
// that a fetch() never emits — wipe those so the WAF cannot tell us apart
// on that axis.
func newOneshotClient(proxyURL string) (*req.Client, error) {
client := req.C().ImpersonateChrome().SetCookieJar(sharedCookieJar())
for _, h := range []string{
"Pragma",
"Cache-Control",
"Upgrade-Insecure-Requests",
"Sec-Fetch-User",
} {
client.Headers.Del(h)
}
// Chrome 120 fetch() advertises gzip/deflate/br (zstd only appeared
// as a default in Chrome 123+). req's default of just "gzip" is a
// distinguishable signal — match Chrome explicitly.
client.SetCommonHeader("Accept-Encoding", "gzip, deflate, br")
if proxyURL != "" {
u, err := url.Parse(proxyURL)
if err != nil {
return nil, err
}
client.SetProxyURL(u.String())
}
return client, nil
}
// callOneshot POSTs to the oneshot endpoint and returns the parsed JSON.
// For anonymous traffic bearerToken is empty and we send the literal
// header `Authorization: None` — replicating the extension's JO() wrapper
// exactly. Omitting that header instead would put the request on a
// different server-side auth branch.
func callOneshot(endpoint string, body []byte, bearerToken, proxyURL string) (gjson.Result, int, error) {
client, err := newOneshotClient(proxyURL)
if err != nil {
return gjson.Result{}, 0, err
}
warmCookies(client) // no-op after the first translation in the process
authValue := "None"
if bearerToken != "" {
authValue = "Bearer " + bearerToken
}
resp, err := client.R().
DisableAutoReadResponse().
SetHeader("Content-Type", "application/json").
SetHeader("Accept", "*/*").
SetHeader("Authorization", authValue).
SetHeader("Origin", "chrome-extension://"+chromeExtensionID).
SetHeader("Sec-Fetch-Site", "cross-site").
SetHeader("Sec-Fetch-Mode", "cors").
SetHeader("Sec-Fetch-Dest", "empty").
SetBodyBytes(body). // SetBodyBytes pins Content-Length; using an
// io.Reader instead forces Transfer-Encoding: chunked, which a
// real fetch() with JSON.stringify body never emits.
Post(endpoint)
if err != nil {
return gjson.Result{}, 0, err
}
defer resp.Body.Close()
// Once we set Accept-Encoding ourselves, Go's HTTP stack stops
// transparently decompressing, so handle gzip/deflate/br by hand.
var reader io.Reader = resp.Body
switch strings.ToLower(resp.Header.Get("Content-Encoding")) {
case "gzip":
gr, err := gzip.NewReader(resp.Body)
if err != nil {
return gjson.Result{}, resp.StatusCode, fmt.Errorf("gzip reader: %w", err)
}
defer gr.Close()
reader = gr
case "deflate":
reader = flate.NewReader(resp.Body)
case "br":
reader = brotli.NewReader(resp.Body)
}
raw, err := io.ReadAll(reader)
if err != nil {
return gjson.Result{}, resp.StatusCode, fmt.Errorf("read response body: %w", err)
}
return gjson.ParseBytes(raw), resp.StatusCode, nil
}
// TranslateByDeepLX performs translation via the DeepL oneshot endpoint.
// Passing dlSession switches to the Pro endpoint; the value is sent
// verbatim as the Bearer token (i.e. it must be an OAuth access token,
// not the legacy dl_session cookie).
func TranslateByDeepLX(sourceLang, targetLang, text string, tagHandling string, proxyURL string, dlSession string) (DeepLXTranslationResult, error) {
if text == "" {
return DeepLXTranslationResult{
Code: http.StatusNotFound,
Message: "No text to translate",
}, nil
}
reqStruct := oneshotRequest{
Text: []string{text},
TargetLang: toOneshotLang(targetLang),
UsageType: "Translate",
AppInformation: appInformation{
OS: "brex_macOS",
OSVersion: "brex_chrome_" + impersonatedChromeMajor + ".0.0.0",
AppVersion: chromeExtensionVersion,
AppBuild: "chrome_web_store",
InstanceID: instanceID,
},
}
if sourceLang != "" && !strings.EqualFold(sourceLang, "auto") {
reqStruct.SourceLang = toOneshotLang(sourceLang)
}
bodyBytes, _ := json.Marshal(reqStruct)
endpoint := oneshotFreeEndpoint
if dlSession != "" {
endpoint = oneshotProEndpoint
}
id := time.Now().UnixMilli()
result, status, err := callOneshot(endpoint, bodyBytes, dlSession, proxyURL)
if err != nil {
return DeepLXTranslationResult{
ID: id,
Code: http.StatusServiceUnavailable,
Message: err.Error(),
}, nil
}
switch status {
case http.StatusOK:
// fall through to body parsing
case http.StatusTooManyRequests:
return DeepLXTranslationResult{
ID: id,
Code: http.StatusTooManyRequests,
Message: "too many requests, your IP has been blocked by DeepL temporarily, please don't request it frequently in a short time",
}, nil
default:
return DeepLXTranslationResult{
ID: id,
Code: http.StatusServiceUnavailable,
Message: fmt.Sprintf("request failed with status code: %d", status),
}, nil
}
translations := result.Get("translations").Array()
if len(translations) == 0 {
return DeepLXTranslationResult{
ID: id,
Code: http.StatusServiceUnavailable,
Message: "Translation failed",
}, nil
}
mainText := translations[0].Get("text").String()
if mainText == "" {
return DeepLXTranslationResult{
ID: id,
Code: http.StatusServiceUnavailable,
Message: "Translation failed",
}, nil
}
if detected := translations[0].Get("detected_source_language").String(); detected != "" {
sourceLang = strings.ToUpper(detected)
}
return DeepLXTranslationResult{
Code: http.StatusOK,
ID: id,
Data: mainText,
Alternatives: nil, // oneshot does not return alternatives
SourceLang: sourceLang,
TargetLang: targetLang,
Method: map[bool]string{true: "Pro", false: "Free"}[dlSession != ""],
}, nil
}