Downloading a file with sockets


Josh

I am writing the Linux implementation of our DownloadFile function.

The src URL I am trying to download is "https://raw.githubusercontent.com/Leadwerks/Documentation/master/Images/ultraengine_logo.png".

The URL gets broken up into the domain "raw.githubusercontent.com" and the path "/Leadwerks/Documentation/master/Images/ultraengine_logo.png".

When the file is downloaded, the contents are a web page for a 404 error. What is wrong here?

	bool DownloadFile(const WString& src, const WString& dest)
	{
		WString url = src;
		if (url.Left(7) == L"http://")
		{
			url = url.Right(url.size()-7);
		}
		else if (url.Left(8) == L"https://")
		{
			url = url.Right(url.size()-8);
		}
		WString domain = url;
		WString path = "/index.html";
		int p = url.Find("/");
		if (p > -1)
		{
			domain = url.Left(p);
			path = url.Right(url.size()-p);
		}

		Print(domain); 
		Print(path);

		struct sockaddr_in servaddr;
		struct hostent *hp;
		int sock_id;
		//char message[1024*1024] = {0};
		int msglen;
		WString r = L"GET " + path + L" HTTP/1.0\nFrom: who cares\nUser-Agent: Ultra Engine\n\n";
		std::string ss = r.ToUTF8String();
		const char* request = ss.c_str();

		//Get a socket
		if((sock_id = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		{
			Print("Error: Couldn't get a socket.");
			return false;
		}

		//book uses bzero which my man pages say is deprecated
		//the man page said to use memset instead. :-)
		memset(&servaddr,0,sizeof(servaddr));

		std::string s = domain.ToUTF8String();
		if((hp = gethostbyname(s.c_str())) == NULL)
		{
			Print("Error: Couldn't get an address.");
			return false;
		}

		//bcopy is deprecated also, using memcpy instead
		memcpy((char *)&servaddr.sin_addr.s_addr, (char *)hp->h_addr, hp->h_length);

		//fill in port number and address family
		servaddr.sin_port = htons(80);
		servaddr.sin_family = AF_INET;

		//make the connection
		int err = connect(sock_id, (struct sockaddr *)&servaddr, sizeof(servaddr));
		if (err != 0)
		{
			Print(errno);
			Print(err);
			Print("Error: Couldn't connect (" + String(errno)+ ").");
			return false;
		}

		//NOW THE HTTP PART!!!

		//send the request
		write(sock_id,request,strlen(request));

		//read the response
		auto buffer = CreateBuffer(0);

		int blocksize = 1024*1024;
		int bufferpos;
		
		while (true)
		{
			bufferpos = buffer->GetSize();
			buffer->Resize(buffer->GetSize() + blocksize);
			msglen = read(sock_id,buffer->Data() + bufferpos,blocksize);
			if (msglen == -1)
			{
				return false;
			}
			buffer->Resize(bufferpos + msglen);
			if (msglen == 0) break;
		}

		return buffer->Save(dest);
	}

The returned text looks like this:

HTTP/1.1 404 Not Found
Server: GitHub.com
Content-Type: text/html; charset=utf-8
ETag: "5e6d6874-239b"
Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'; img-src data:; connect-src 'self'
X-GitHub-Request-Id: D6BC:147A:D22AE:142034:60A6037B
Content-Length: 9115
Accept-Ranges: bytes
Date: Thu, 20 May 2021 06:40:38 GMT
Via: 1.1 varnish
Age: 235
Connection: close
X-Served-By: cache-lga21928-LGA
X-Cache: HIT
X-Cache-Hits: 1
X-Timer: S1621492839.513378,VS0,VE0
Vary: Accept-Encoding
X-Fastly-Request-ID: 19663c78bc45f40978e491413f8b81fe66626fbf

404

There isn't a GitHub Pages site here.

If you're trying to publish one, read the full documentation to learn how to set up GitHub Pages for your repository, organization, or user account.

My job is to make tools you love, with the features you want, and performance you can't live without.


I replaced this with a call to curl:

		DeleteFile(dest);
		if (FileType(dest)==1) return false;
		std::string comm = ("curl -o \"" + RealPath(dest) + "\" \"" + src + "\"").ToUTF8String();
		system(comm.c_str());
		return FileType(dest) == 1;
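
One thing to watch when a command string like this is built by hand: a URL or path that contains a double quote, `$`, or a backtick would be interpreted by the shell. Below is a minimal sketch of quoting the arguments, written with plain `std::string` rather than the engine's string classes. `ShellQuote` and `BuildCurlCommand` are hypothetical helper names, not part of Ultra Engine. The curl flags are `-f` (fail on HTTP errors instead of saving the error page), `-s` (silent), `-L` (follow redirects), and `-o` (output file).

```cpp
#include <string>

// Wrap a string in single quotes for a POSIX shell.
// An embedded single quote is written as '\'' (close quote, escaped quote, reopen).
// Hypothetical helper for illustration only.
std::string ShellQuote(const std::string& arg)
{
    std::string out = "'";
    for (char c : arg)
    {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Assemble the curl command line with both arguments quoted.
// Hypothetical helper for illustration only.
std::string BuildCurlCommand(const std::string& url, const std::string& destPath)
{
    return "curl -f -s -L -o " + ShellQuote(destPath) + " " + ShellQuote(url);
}
```

With `-f`, a 404 should make curl exit with an error and write no file, so a check such as `FileType(dest) == 1` would no longer pass just because an error page was saved.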

 



I'm not sure whether you're saying it works now and is no longer a problem. If it is still a problem, I would suggest grabbing the string you use in the C++ version just before you use it and checking it over visually for oddities such as stray spaces. Then paste it back into a browser and make sure it is what you thought it was and that it works. It sounds like it should work, but there may be something silly in the formatting that only looks correct at a quick glance.

Also, it can simplify things to try HTTP rather than HTTPS initially, just to rule out any SSL certificate complaints.

I just noticed you are asking for HTTP 1.0, which could be the problem. I think it is old and might be rejected by some servers. Note that the 404 response comes back as HTTP 1.1.
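
On the response side, the `DownloadFile` function in the question saves everything it reads from the socket, status line and headers included, so an error page ends up in the destination file. Here is a rough sketch of checking the status code and separating the body before saving. `HttpResponse` and `ParseResponse` are hypothetical names, and this assumes the whole response is already in memory and is not chunked or compressed.

```cpp
#include <string>

// Hypothetical container for the two pieces we care about.
struct HttpResponse
{
    int status = 0;    // 0 means the status line could not be parsed
    std::string body;  // bytes after the blank line that ends the headers
};

// Parse "HTTP/x.y NNN reason" and split off the body.
HttpResponse ParseResponse(const std::string& raw)
{
    HttpResponse result;

    // The status code is the three digits after the first space.
    const std::size_t sp = raw.find(' ');
    if (raw.compare(0, 5, "HTTP/") != 0) return result;
    if (sp == std::string::npos || sp + 4 > raw.size()) return result;

    int code = 0;
    for (std::size_t i = 1; i <= 3; ++i)
    {
        const char c = raw[sp + i];
        if (c < '0' || c > '9') return result;
        code = code * 10 + (c - '0');
    }
    result.status = code;

    // Headers end at the first empty line (CRLF CRLF).
    const std::size_t end = raw.find("\r\n\r\n");
    if (end != std::string::npos) result.body = raw.substr(end + 4);
    return result;
}
```

A caller could then save `body` only when `status` is 200 and return false otherwise.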

The way the HTTP request is being built is very manual and feels 'hopeful' that the server will like it. Instead, I would first look at what an actual browser or tool sends and imitate that with regard to the HTTP 1.x version, user agent, and so on.
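
As a rough illustration of what a tool-like request looks like: HTTP/1.1 requires a `Host` header, and each header line ends with CRLF (`\r\n`) rather than a bare `\n`. Servers and CDNs that host many sites on one address use `Host` to pick the site, so a request without it may well be answered with a generic 404 like the one above, though I have not verified that this is the cause here. `BuildGetRequest` is a hypothetical helper name, and the `Accept` and `Connection` headers are my own additions.

```cpp
#include <string>

// Build a minimal HTTP/1.1 GET request.
// Hypothetical helper for illustration; host and path come from splitting the URL.
std::string BuildGetRequest(const std::string& host, const std::string& path)
{
    std::string request;
    request += "GET " + path + " HTTP/1.1\r\n";
    request += "Host: " + host + "\r\n";        // required by HTTP/1.1
    request += "User-Agent: Ultra Engine\r\n";
    request += "Accept: */*\r\n";
    request += "Connection: close\r\n";         // ask the server to close when done
    request += "\r\n";                          // empty line ends the headers
    return request;
}
```

Keep in mind that a plain socket on port 80 cannot speak HTTPS, so for an `https://` URL the server may still answer with a redirect instead of the file unless TLS is added.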

 


Cool, loading a file from a URL works now:

[Image: screenshot from 2021-05-20 02:29:43]

#include "UltraEngine.h"

using namespace UltraEngine;

int main(int argc, const char* argv[])
{
    //Get the displays
    auto displays = GetDisplays();

    //Create window
    auto window = CreateWindow("Ultra Engine", 0, 0, 800, 600, displays[0]);

    //Create user interface
    auto ui = CreateInterface(window);

    //Create a pixmap
    auto pixmap = LoadPixmap("https://raw.githubusercontent.com/Leadwerks/Documentation/master/Assets/Materials/Ground/dirt01.dds");

    //Show the pixmap
    ui->root->SetPixmap(pixmap);

    while (true)
    {
        const Event ev = WaitEvent();
        switch (ev.id)
        {
        case EVENT_WINDOWCLOSE:
            return 0;
            break;
        }
    }

    return 0;
}

 


FYI, curl is sort of why Leadwerks is hard to compile on anything above 18.04. Do you plan to bake curl into the SDK to limit dependencies? I would rather not be forced to keep a two-year-old distro because of a fixed version of a library from a repo.



It's just using the executable installed on the system, not building the library, so the version doesn't matter. I think curl comes on most distros by default.
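
Since this relies on the executable being present, it may be worth checking for it before calling `system`. Here is a small sketch that asks the shell whether a command is on the PATH, using the POSIX `command -v` builtin. `CommandExists` is a hypothetical helper name, and the name is restricted to a conservative character set so it can be placed in the command line unquoted.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Return true if the shell can find an executable with this name.
// Hypothetical helper for illustration only.
bool CommandExists(const std::string& name)
{
    // Accept only simple names so nothing unexpected reaches the shell.
    if (name.empty() || name[0] == '-') return false;
    for (unsigned char c : name)
    {
        if (!std::isalnum(c) && c != '-' && c != '_' && c != '.') return false;
    }

    // "command -v" exits with 0 when the command is found.
    const std::string probe = "command -v " + name + " >/dev/null 2>&1";
    return std::system(probe.c_str()) == 0;
}
```

`DownloadFile` could call `CommandExists("curl")` once and print a clear error if it returns false, rather than failing silently.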

