Read data from a txt file instead of .csv file

I am trying to draw a road network consisting of 2 million records.
Is there a way to read this data from a txt file? The maximum number of rows that can fit into a csv file seems to be 1,048,576.

df = pd.read_csv(r'path\file_name.csv', usecols=['x', 'y'])

Is that not an Excel limitation rather than a csv one? You should be able to read that into a data frame.


It is a limitation in a csv file too.
I checked my files

Could you take a screenshot explaining where you see that limitation?

As far as I know, csv files do not have those limitations. So why do you see it? Is it an OS file size limitation or something else?
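One way to check where the rows go missing is to count the lines in the file directly, without opening it in a spreadsheet. The following is a minimal sketch using only the standard library. The file name `demo_points.csv`, the helper `count_lines`, and the generated values are hypothetical and stand in for the real dataset.

```python
import csv
import os
import tempfile

# Maximum number of rows on a single Excel worksheet.
EXCEL_ROW_LIMIT = 1_048_576


def count_lines(path):
    """Count lines by streaming the file, so it never has to fit in memory."""
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


# Hypothetical demo file: one header line plus 1,100,000 data rows.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_points.csv")
with open(demo_path, "w", encoding="utf-8", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["x", "y"])
    writer.writerows((i, i * 2) for i in range(1_100_000))

line_count = count_lines(demo_path)
print(line_count, line_count > EXCEL_ROW_LIMIT)
```

If the count for the real file stops at 1,048,576, the rows were most likely lost when the file was saved from Excel rather than by the csv format itself.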

Hi Tasneem,

I agree with Mark. I regularly plot larger .csv files, so I imagine your file is getting chopped somewhere in the process.

If you know your .txt file contains more lines, you could adapt your pandas read to load it simply by changing .csv to .txt in the file path.

df = pd.read_csv(r'path\file_name.txt', usecols=['x', 'y'])

Hi Carl

I tried this before and searched for a function that reads a txt file.
I think the function pd.read_csv() itself needs to be replaced,
because when I change the extension to .txt it gives an error that the column names are missing, although they are not.

Hi Mark

This is the row limit. I cannot go further than that, even if I try to upload the file as a model.

Could the problem be that I saved the Excel file as csv? Is there another way to create a csv file?

I suspect, though I don't know for sure, that the file you are loading doesn't actually have headers. Try without usecols and see what you get. If I do the following, it loads into the data frame.

Hi @Tasneem

What you show is an Excel row limit. It has nothing to do with csv.

How to create the .csv file depends on where your data is coming from. If it is data you calculate yourself, you can use Python (and potentially pandas) to create and save it.
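As a rough sketch of that suggestion, pandas can write a csv file directly with `DataFrame.to_csv`, and the result can hold more rows than a single Excel sheet. The column names, the generated values, and the file name `points.csv` below are assumptions for illustration only, not the actual road network data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data: more rows than one Excel worksheet can hold (1,048,576).
n_rows = 1_100_000
points = pd.DataFrame({"x": range(n_rows), "y": range(n_rows)})

# index=False keeps the row index out of the file, so it contains only x and y.
out_path = os.path.join(tempfile.mkdtemp(), "points.csv")
points.to_csv(out_path, index=False)

# Read the file back; no rows are lost on the round trip.
loaded = pd.read_csv(out_path, usecols=["x", "y"])
print(len(loaded))
```

Because the file never passes through Excel, nothing truncates it at the worksheet row limit.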

I would need to know more details on your use case to know how it can be solved.

import datashader as ds
import pandas as pd
from colorcet import fire
from datashader import transfer_functions as tf

df = pd.read_csv(r'C:\Users\tasneem\PycharmProjects\plotting data\THE COMPLETE 2m RECORD.txt', usecols=['dropoff_x', 'dropoff_y'])
# df = pd.read_csv(r'C:\Users\tasneem\Desktop\new.csv', usecols=['dropoff_x', 'dropoff_y'])

df.head()
agg = ds.Canvas().points(df, 'dropoff_x', 'dropoff_y')

tf.set_background(tf.shade(agg, cmap=fire), 'black')

This is my code. If I remove usecols from pd.read_csv(), I have to remove the column names from ds.Canvas().points() as well,
and then it does not work.

Actually, it is a ready-made spatial dataset of a road network (normalized latitude and longitude values) saved as a txt file.
It is similar in concept to OpenStreetMap, which generates a dataset for a geographical area that you specify.

My problem lies in using datashader to draw the dataset as an image. So I am compelled to use agg = ds.Canvas().points(df, 'dropoff_x', 'dropoff_y') to draw the points, and this function has 3 arguments that I have to specify: the data frame, the x coordinate, and the y coordinate.

I tried to split my data into 2 sheets inside a single csv file and then read that file.
But apparently it reads only the first sheet.

After you've removed usecols and run df.head(), what happens? Can you show a screen capture?

Let's see if your data is loading into the data frame first. If the code requires column headers, they can be added afterwards.


It is loading into the data frame.

The data frame doesn't have column headers, so you will need to supply them. The read is failing on usecols because the column names you ask for don't exist in the file.

df.columns = ['x', 'y']
df.tail(3)
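The column names can also be supplied while reading, which avoids the usecols failure in the first place. Below is a minimal sketch with a made-up, comma-separated sample. The values and the names 'x' and 'y' are assumptions, not the contents of the actual file.

```python
import io

import pandas as pd

# Hypothetical two-column sample with no header row (not the thread's data).
raw = "0.1,0.2\n0.3,0.4\n0.5,0.6\n"

# Default behaviour: the first data row is consumed as the header,
# so asking for columns named 'x' and 'y' raises a ValueError.
try:
    pd.read_csv(io.StringIO(raw), usecols=["x", "y"])
    usecols_failed = False
except ValueError:
    usecols_failed = True

# header=None keeps every row as data; names= labels the columns on load.
df = pd.read_csv(io.StringIO(raw), header=None, names=["x", "y"])
print(usecols_failed, df.shape)
```

With `header=None` the first line is kept as data instead of being treated as column names, so no row is lost.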

Just one more thing: without seeing the file you're working with, it also looks like you might need to supply an appropriate separator in the read function, something like sep=' ' for a space. It looks like pandas thought you had one column when there appear to be three: an index and your x, y columns.
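To illustrate the separator point, here is a small sketch with an invented space-separated sample holding an index column followed by x and y. The sample values and the column names 'idx', 'x', and 'y' are assumptions, and the real file's separator may differ. The pattern `sep=r"\s+"` is used here instead of `sep=' '` so that runs of spaces or tabs are also handled.

```python
import io

import pandas as pd

# Hypothetical sample: index, x, y separated by single spaces, no header row.
raw = "0 0.10 0.20\n1 0.30 0.40\n2 0.50 0.60\n"

# With the default comma separator, each line lands in one single column.
one_col = pd.read_csv(io.StringIO(raw), header=None)

# Split on whitespace, name the three columns, and keep only x and y.
df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",
    header=None,
    names=["idx", "x", "y"],
    usecols=["x", "y"],
)
print(one_col.shape, df.shape)
```

A data frame loaded this way has named x and y columns, so it can be passed to ds.Canvas().points(df, 'x', 'y') in the same way as in the code posted earlier in the thread.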

You're right, it is working now. I just added the separator argument and it can read from a txt file.
Thanks a lot!
