Introduction

Different kinds of data are convenient to store in different formats. For example, plain text naturally goes into a text file such as .txt or .docx. But what if the data is a table with many headers and cells? There are several formats for storing such tabular data, and today we will get acquainted with two of them: CSV and the similar TSV. Let's start with CSV.

What is CSV

CSV stands for comma-separated values. A CSV file is a plain text file with one data entry per line, and the elements of each entry are separated by commas. Let's have a look at an example of CSV:

a CSV file opened in LibreOffice Calc on the left and Notepad on the right

Our file is called cars.csv. The picture on the left shows it opened in LibreOffice Calc, and the picture on the right shows the same file opened in Notepad. It contains information about cars: their brands, models, and production years.

The first line holds the column titles separated by commas: Car brand, Car model, and Production year. This header line is optional, so before you manipulate the data, check whether the file has one and handle it separately from the values. All the other lines (entries) contain the actual information about cars.
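To illustrate keeping the header apart from the values, here is a minimal Python sketch using the standard csv module. The rows in CARS_CSV are made up for the example (the real contents of cars.csv are only shown in the picture), and split_header is a hypothetical helper name.

```python
import csv
import io

# Hypothetical contents standing in for cars.csv; the rows are invented.
CARS_CSV = """Car brand,Car model,Production year
Toyota,Corolla,2015
Ford,Focus,2018
"""

def split_header(text):
    """Return (header, rows) for CSV text whose first line is a header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first line: column titles
    rows = list(reader)    # remaining lines: data entries
    return header, rows

header, rows = split_header(CARS_CSV)
print(header)  # ['Car brand', 'Car model', 'Production year']
print(rows)    # [['Toyota', 'Corolla', '2015'], ['Ford', 'Focus', '2018']]
```

If a file has no header line, calling next(reader) would consume the first data entry instead, so this sketch assumes you already know the header is present.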

CSV does not only mean comma-separated values, a convention for tabular data with a comma delimiter. It is also an umbrella term: CSV often denotes data separated by any delimiter. Instead of a comma, the separator can be a space, a semicolon, the tab character used by the TSV format, or any other character the user chooses. Such delimiter-separated files are often given a .csv extension even though the field separator is not a comma.
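As a rough sketch of what this means in practice, Python's csv module accepts a delimiter argument, so the same reader can handle a semicolon-separated file. The sample data and the parse_delimited helper are assumptions made for the example.

```python
import csv
import io

# Hypothetical semicolon-separated data that might still be saved as .csv.
SEMICOLON_DATA = "Car brand;Car model;Production year\nToyota;Corolla;2015\n"

def parse_delimited(text, delimiter):
    """Parse delimited text into a list of rows using the given delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

rows = parse_delimited(SEMICOLON_DATA, ";")
print(rows[1])  # ['Toyota', 'Corolla', '2015']

# With the default comma delimiter, each line stays one unsplit field.
wrong = parse_delimited(SEMICOLON_DATA, ",")
print(wrong[1])  # ['Toyota;Corolla;2015']
```

The second call shows why it is worth checking which delimiter a .csv file really uses before parsing it.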

Now that we are acquainted with CSV, let's find out how to work with it.

Working with CSV

To work with a CSV file, one just needs to open it. The CSV file format is supported by almost all spreadsheets and database management systems, including LibreOffice Calc, Gnumeric, Emacs, Microsoft Excel, Numbers, SpreadsheetPro, CSVed, KSpread, and Google Sheets. CSV import and export are also possible in many engineering packages such as ANSYS and LabVIEW.

For example, this is how a CSV file looks in LibreOffice Calc:

a CSV file opened in LibreOffice Calc

Moreover, one can work with CSV files programmatically. For example, in Python one can import either the csv module or the pandas library to open, edit, and create tables in the CSV format. In Java one may use the CSVReader class from the OpenCSV library, and in C# there is the CsvHelper library. It's also possible to work with CSV files in C++, PHP, and many other programming languages.
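For instance, a small Python sketch with the standard csv module could read a table, keep some entries, and write the result back as CSV. The car data is invented, and newer_than is a hypothetical helper; a real program would typically read from and write to files opened with newline="" rather than in-memory strings.

```python
import csv
import io

# Hypothetical table; the rows are invented for the example.
CARS_CSV = "Car brand,Car model,Production year\nToyota,Corolla,2015\nFord,Focus,2018\n"

def newer_than(text, year):
    """Return CSV text containing only the cars produced after `year`."""
    reader = csv.DictReader(io.StringIO(text))  # header line becomes dict keys
    kept = [row for row in reader if int(row["Production year"]) > year]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(kept)
    return out.getvalue()

result = newer_than(CARS_CSV, 2016)
print(result)
# Car brand,Car model,Production year
# Ford,Focus,2018
```

DictReader is used here so that fields can be addressed by column title instead of by position.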

So, now we know how to work with CSV files. But what about the similar TSV format? Let's take a look at it.

What is TSV

TSV is another popular delimited file format, common enough to have its own name and its own .tsv extension. It stands for tab-separated values. TSV also stores tabular data, but with a different delimiter: each record in the table is a line of a text file, and each field of the record is separated from the others by a tab character, more precisely a horizontal tab. The reason for this choice is that commas are quite common in text data and, under some national standards, in the spelling of numbers, whereas tabs rarely appear inside values. Thus, a TSV file does not need to escape commas in the middle of values.

Opened in a spreadsheet, a TSV file looks exactly the same as a CSV file, and we can open it with the same spreadsheets and database management systems as CSV.
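The point about commas inside values can be sketched in Python by writing the same record with both delimiters. The record is hypothetical (a car model and an engine size written with a decimal comma), and write_row is a helper name chosen for the example.

```python
import csv
import io

def write_row(row, delimiter):
    """Serialize one record as a delimited line using the csv module."""
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerow(row)
    return out.getvalue()

# Hypothetical record: the second value contains a decimal comma.
row = ["Golf", "1,6"]

csv_line = write_row(row, ",")   # the value must be quoted: Golf,"1,6"
tsv_line = write_row(row, "\t")  # no quoting needed: Golf<TAB>1,6

print(repr(csv_line))  # 'Golf,"1,6"\n'
print(repr(tsv_line))  # 'Golf\t1,6\n'
```

With the comma delimiter the writer has to wrap the value in quotes, while with the tab delimiter the value is stored as is.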

Why CSV

CSV and TSV files are often used for data interchange between software with different internal file formats, which is especially useful in business tasks. More specifically, one can move data between spreadsheets and databases from different vendors, or export data such as banking transactions from websites.

That's why some choose CSV. Moreover, the CSV format has several notable advantages. CSV files:

  • Provide a straightforward information schema;
  • Can be viewed and edited even in a text editor;
  • Are human-readable;
  • Are simple to create and parse;
  • Are compact.

On the other hand, there are some disadvantages:

  • There can be only a single sheet in a CSV file;
  • They keep only raw data, without macros or formulas;
  • There is no distinction between text and numeric values;
  • There is no standard way to represent binary data;
  • It is hard to distinguish null values from empty strings.
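Two of these disadvantages, the lost distinction between text and numbers and between null values and empty strings, can be observed with a short Python sketch. It writes a few made-up rows with the standard csv module and reads them back; round_trip is a hypothetical helper name.

```python
import csv
import io

def round_trip(rows):
    """Write rows as CSV text, then parse that text back into rows."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    buf.seek(0)
    return list(csv.reader(buf))

# Hypothetical rows: a number, a null value (None), and an empty string.
original = [["Toyota", 2015, None], ["Ford", 2018, ""]]
restored = round_trip(original)

print(restored)
# [['Toyota', '2015', ''], ['Ford', '2018', '']]
```

After the round trip, the years come back as strings, and both None and the empty string become the same empty field, so the reader has to restore types and null values by its own conventions.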

And now you know what CSV is for and what advantages and disadvantages this format has.

Conclusion

To sum up,

  • One can store tabular data in CSV and TSV formats;
  • CSV stands for comma-separated values and TSV means tab-separated values;
  • These file formats are supported by almost all spreadsheets and database management systems;
  • CSV also denotes the more general concept of "delimited data", where the delimiter can be almost anything.

So, are you up for a challenge? Proceed to the tasks and see how well you understood this topic!
