WWW Reader only reads "garbage"

goran · May 29, 2025 at 08:48

Hello everyone,
my WWW Reader now only reads, well, how should I put it, "garbage":

TXT: 29.05.2025, 10:14:09 | https://www.radio-bamberg.de/blitzer-verkehr/ | <US>�<BS><NUL><NUL><NUL><NUL><NUL><NUL><ETX>���r���(����)�w��<EOT>E�<ETB>IԲgI����n��7{�<SUB><ENQ>H�$,�`<ETX>�.�֎�1<DC1>'N�D���s"�<GS>��y�SW��(���

This is just an excerpt, but it is no longer the HTML content. What is the problem here?
And apart from that: I have the instance "WWW Reader"; when adding a new one I could pick the "WWW Ausleser", and there is also the HTTP Client. What is the difference between them?

Regards

tobiasr · May 29, 2025 at 09:12

Are you sure you want the HTML and not rather the output of, e.g., this?

https://www.radio-bamberg.de/cache/playlists/all-channels.json

goran · May 29, 2025 at 09:15

Yes, I want the HTML. I then extract the traffic reports and send them to my phone via Prowl. It all worked perfectly a few days ago…

tobiasr · May 29, 2025 at 09:19

Did you update IP-Symcon, and it has not worked since then?

goran · May 29, 2025 at 09:20

That could roughly coincide, yes.

paresy · May 29, 2025 at 14:04

$ curl https://www.radio-bamberg.de/blitzer-verkehr/          
Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.

I think they have changed something on their end.

paresy

goran · May 29, 2025 at 15:33

That is what I assumed… Question 1: where does the error message come from now?
2. Is this intentional, to interfere with external requests, and how can I work around it?
3. I assumed that the WWW Reader behaves like a browser towards the external site and that the HTML gets evaluated. Is that not the case?

Maybe you can shed some light on my cluelessness…

Regards

Nall-chan · May 29, 2025 at 16:16

It is simply gzip-compressed.
But why is that not detected automatically by curl, Symcon, and the like?
This works:

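// The server sends gzip even without an Accept-Encoding request header,
// so decompress the raw response body explicitly.
$url = 'https://www.radio-bamberg.de/blitzer-verkehr/';
$d = gzdecode(file_get_contents($url));
echo $d;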
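
The snippet above assumes the response is always gzip; if the site ever goes back to sending plain HTML, `gzdecode` would fail. A minimal sketch of a more defensive variant checks for the gzip signature first. The garbled output in the first post, which begins with `<US>`, an unprintable byte and `<BS>`, is consistent with that signature (0x1f 0x8b 0x08). The helper name `decodeIfGzip` is hypothetical and not part of IP-Symcon or PHP.

```php
<?php
/**
 * Return $body decompressed if it starts with the gzip signature,
 * otherwise return it unchanged.
 * decodeIfGzip is a hypothetical helper name used for illustration.
 */
function decodeIfGzip(string $body): string
{
    // A gzip stream starts with the two bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) !== 0) {
        return $body; // not gzip, pass through as-is
    }

    // Suppress the PHP warning; failure is reported via the exception below.
    $decoded = @gzdecode($body);
    if ($decoded === false) {
        throw new RuntimeException('Body looks like gzip but could not be decoded.');
    }
    return $decoded;
}

// Offline demonstration with locally compressed sample data.
echo decodeIfGzip(gzencode('<html>compressed</html>')), PHP_EOL;
echo decodeIfGzip('<html>plain</html>'), PHP_EOL;
```

In a script this would wrap the download, for example `decodeIfGzip(file_get_contents($url))`, so the same code keeps working whether or not the server compresses the response.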

firebuster · May 30, 2025 at 04:54

By now there are many sites that are protected against DDoS attacks by, e.g., Cloudflare. They block exactly this kind of request.
Partly understandable, but mostly just annoying for us.

paresy · May 30, 2025 at 05:01

Though I am surprised that it also affects curl in the terminal, even when I set a proper browser user agent.

paresy

goran · June 1, 2025 at 09:14

OK, thank you all. Then I will just read it with a script. That is no problem either. Can I then somehow pass the result to the Text Parser? I don't think so, or can I?
Otherwise I will just have to solve it in PHP.

paresy · June 1, 2025 at 11:18

Well, you could connect the script to a hook and point the WWW Reader at it.

paresy

tobiasr · June 1, 2025 at 11:35

@paresy can't you fix the WWW Reader (and possibly also Sys_GetURL, or whatever it is called) so that they handle gzip correctly as well?

paresy · June 1, 2025 at 11:40

It does, but the website does something that prevents curl (which we use internally) from processing it correctly. I suspect this is done deliberately.

paresy

Nall-chan · June 1, 2025 at 14:09

There is a curl bug report about this. It also explains that the cause is the website, because it sends gzip even though it was not requested.
The solution would, imho, be to always set the --compressed parameter.

github.com/curl/curl

Raw compressed output when not using --compressed but server returns gzip data

opened 05:42PM - 05 Aug 18 UTC

closed 03:33PM - 06 Aug 18 UTC

nikosdion

HTTP cmdline tool

### I did this

`curl -L -s "https://downloads.joomla.org/latest" -o -"`

### I expected the following

Get a screen full of HTML data. What happened is that I got a gzip stream.

### curl/libcurl version

```
curl 7.58.0 (x86_64-pc-linux-gnu) libcurl/7.58.0 OpenSSL/1.1.0g zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3
Release-Date: 2018-01-24
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy PSL
```

### operating system

Ubuntu 18.04

```
$ uname -a
Linux myhostname 4.15.0-29-generic #31-Ubuntu SMP Tue Jul 17 15:39:52 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
```

### Further debugging

This seems to only happen with sites hosted on Rochen. I've tried the aforementioned site `https://downloads.joomla.org/latest` as well as `https://www.dionysopoulos.me`.

Using `--verbose` I see that cURL does not send an Accept-Encoding header at all. The server returns compressed output with the HTTP header `content-encoding: gzip`. However, this response header seems to be ignored by cURL 7.58.0. It also looks like cURL 7.47.0 (on Ubuntu 16.04) does not exhibit this behavior, i.e. it parses the compressed content correctly.

Passing the `--compressed` option to cURL the server output is uncompressed and returned correctly. In this case I see that the `Accept-Encoding: gzip, deflate` header is sent by cURL. The server still returns compressed output with the HTTP header `content-encoding: gzip` but this time cURL parses everything correctly.

In the name of ruling out a false positive I tried requests to two other sites, `https://www.akeebabackup.com` hosted with a different host (SiteGround, using NginX in front of Apache) and `https://translate.akeeba.com` (self-hosted on Linode using straight up Apache 2.4 Ubuntu 16.04). These servers seem to behave themselves. Without the `--compressed` option the response is sent uncompressed from the server and displayed correctly by cURL. With the `--compressed` option the response is sent compressed with deflate by the server, the `content-encoding: deflate` HTTP header is set and cURL displays the correct, uncompressed output.

I am not sure if this is a cURL bug or an issue with the way Rochen's servers behave when no Accept-Encoding HTTP header is sent at all. As far as I can see, the server's behavior is correct per RFC 2616 but incorrect per RFC 7231 (which obsoleted RFC 2616). I don't know if I'm on the right track because these RFC's are for HTTP/1.1 whereas these servers talk HTTP/2.

If it's indeed a server issue (not respecting RFC 7231) please feel free to tell me so and close this GitHub issue. I'll then file a bug report with the host. Thank you in advance!

Michael
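
For a PHP script, the `--compressed` suggestion roughly corresponds to the `CURLOPT_ENCODING` option of PHP's cURL extension: an empty string makes libcurl send an `Accept-Encoding` header listing every encoding it supports and decode the response on its own. The following is only a sketch under the assumption that the cURL extension is available; `fetchDecoded` is a hypothetical helper name, not an IP-Symcon function, and the 15-second timeout is an arbitrary choice.

```php
<?php
/**
 * Fetch a URL and let libcurl negotiate and undo compression.
 * fetchDecoded is a hypothetical helper name used for illustration.
 */
function fetchDecoded(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects, like curl -L
        CURLOPT_ENCODING       => '',   // empty string: offer all supported encodings and decode the reply
        CURLOPT_TIMEOUT        => 15,   // assumed value, adjust as needed
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $body;
}

// Example call (needs network access):
// echo fetchDecoded('https://www.radio-bamberg.de/blitzer-verkehr/');
```

Compared with calling `gzdecode` by hand, this leaves the choice of encoding to the client and server, so it should also cope with deflate or uncompressed responses.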

goran · June 3, 2025 at 09:35

That works. And I have learned something new again.